Hi,
At Reproducible Builds we just added popcon stats to our issues page, to help
us better understand which issues to prioritise:
https://tests.reproducible-builds.org/debian/index_issues.html

However, we work on source packages, whereas popcon data is based on binary
packages. This means the page is currently very inaccurate for some
packages - for example, it thinks "linux" has a popcon score of 6.
Popcon does provide stats for source packages at
http://popcon.debian.org/source/by_inst
However, these stats are basically useless: the "score" for each source
package is simply the sum of the scores of the binary packages produced
by that source package. This is *not* the correct way to calculate "popularity"
for a source package, since it is heavily biased in favour of source packages
with many binary packages that must be co-installed.
What we really want is the statistic "number of people that have installed
binary-package-1 OR binary-package-2 OR .. OR binary-package-n". It is
mathematically impossible to calculate this from the data that popcon
currently provides at http://popcon.debian.org/. However, fixing this is easy:
we would simply need to change the backend to keep a separate
"by-source-package" dump of data that is based on set-union
(logical-disjunction) and not arithmetic-sum.
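
To illustrate the difference, here is a minimal Python sketch. It is not the
popcon backend; the package names, host ids, and function names are all
hypothetical, and it assumes the backend can see, per submitter, the set of
binary packages installed. It also shows why the union cannot be recovered
from per-binary counts: two different populations have identical per-binary
counts but different union counts.

```python
from collections import Counter

def binary_counts(submissions):
    """Per-binary install counts, i.e. the kind of data published today."""
    counts = Counter()
    for installed in submissions.values():
        counts.update(installed)
    return counts

def score_by_sum(submissions, binaries):
    """Source score as the arithmetic sum of the binary package counts."""
    counts = binary_counts(submissions)
    return sum(counts[b] for b in binaries)

def score_by_union(submissions, binaries):
    """Source score as the number of submitters with at least one binary."""
    wanted = set(binaries)
    return sum(1 for installed in submissions.values() if installed & wanted)

# Hypothetical source package "foo" producing two binary packages.
binaries = ["foo-bin", "foo-data"]

# Population 1: one host installs both binaries, the other installs neither.
co_installed = {"host1": {"foo-bin", "foo-data"}, "host2": set()}
# Population 2: each host installs exactly one of the binaries.
disjoint = {"host1": {"foo-bin"}, "host2": {"foo-data"}}

for name, population in [("co_installed", co_installed), ("disjoint", disjoint)]:
    print(name,
          dict(binary_counts(population)),
          "sum =", score_by_sum(population, binaries),
          "union =", score_by_union(population, binaries))
```

Both populations give each binary a count of 1 and a summed score of 2, yet
the union score is 1 in the first case and 2 in the second, so the backend
has to compute it from the raw per-submitter data.
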
I had thought about coding up heuristics to estimate this, but it would be
better to just have the popcon backend calculate this exactly, for others to
consume.
I'd be happy to submit a patch for the popcon backend, but I could only find
the client source code here: https://anonscm.debian.org/cgit/popcon/
Could you let me know how I could submit a patch for the backend?
X
--
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git
_______________________________________________
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds