At Reproducible Builds we just added popcon stats to our issues page, to help 
us better understand which issues to prioritise: 

However, we work on source packages, whereas popcon data is based on binary 
packages. This means the page is currently very inaccurate for some 
packages - for example, it thinks "linux" has a popcon score of 6.

Popcon does provide stats for source packages at 

however, these stats are basically useless - the "score" for each source 
package is simply the sum of the scores of the binary packages produced by 
that source package. This is *not* the correct way to calculate "popularity" 
for a source package, since it is heavily biased in favour of source packages 
with many binary packages that must be co-installed: a single host that 
installs all n of them gets counted n times.

What we really want is the statistic "number of people that have installed 
binary-package-1 OR binary-package-2 OR .. OR binary-package-n". It is 
mathematically impossible to calculate this from the data that popcon 
currently provides at http://popcon.debian.org/, because per-package totals 
do not say which hosts overlap. However, fixing this is easy: we would simply 
need to change the backend to keep a separate "by-source-package" dump of 
data that is based on set-union (logical-disjunction) and not arithmetic-sum.
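
To illustrate the difference between the two ways of counting, here is a 
minimal sketch in Python. Everything in it is made up for illustration: the 
per-host data layout, the package and host names, and the function name 
source_popularity are assumptions, not how the real popcon backend is known 
to store or process submissions.

```python
from collections import defaultdict


def source_popularity(host_installs, binary_to_source):
    """Return (sum_score, union_score), each a dict keyed by source package.

    host_installs:    hypothetical mapping host id -> set of installed binaries
    binary_to_source: hypothetical mapping binary package -> source package
    """
    sum_score = defaultdict(int)    # arithmetic sum over binary packages
    union_hosts = defaultdict(set)  # set of hosts with at least one binary

    for host, binaries in host_installs.items():
        for binary in binaries:
            source = binary_to_source.get(binary)
            if source is None:
                continue  # binary not in the mapping; ignore it
            sum_score[source] += 1
            union_hosts[source].add(host)

    union_score = {src: len(hosts) for src, hosts in union_hosts.items()}
    return dict(sum_score), union_score


# Invented example data: "foo" builds three binaries that are usually
# installed together, "bar" builds a single binary.
binary_to_source = {
    "libfoo1": "foo",
    "libfoo-dev": "foo",
    "foo-utils": "foo",
    "bar": "bar",
}
host_installs = {
    "host-a": {"libfoo1", "libfoo-dev", "foo-utils"},
    "host-b": {"bar"},
    "host-c": {"bar", "libfoo1"},
}

sum_score, union_score = source_popularity(host_installs, binary_to_source)
print(sum_score)    # foo: 4, bar: 2
print(union_score)  # foo: 2, bar: 2
```

With this toy data both source packages are installed on two hosts, but the 
sum-based score makes "foo" look twice as popular as "bar". Note that the 
union count needs the per-host sets; from per-package totals alone one can 
only say it lies between the largest single binary score and the sum.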

I had thought about coding up heuristics to estimate this, but it would be 
better to just have the popcon backend calculate this exactly, for others to 

I'd be happy to submit a patch for the popcon backend, but I could only find 
the client source code here: https://anonscm.debian.org/cgit/popcon/ Could you 
let me know how I could submit a patch for the backend?


GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE

Reproducible-builds mailing list
