Re: Popularity contest for Fedora

2020-12-28 Thread Stephen John Smoogen
On Sun, 27 Dec 2020 at 17:52, Matthew Miller wrote:

> On Sun, Dec 27, 2020 at 07:44:57PM +0100, clime wrote:
> > I think we can simply parse server-side access logs to count package
> > downloads, no?
>
> We can for our primary server, but most people get updates from mirrors
> which we don't run directly. The central mirrorlist (from which I get the
> dnf count data) just redirects people to those mirrors. Even if we could get
> package download counts from the mirrors, they're heavily skewed by:
>
> * public mirrors pulling the whole thing
> * people pulling the whole thing for a private mirror
> * ci and build systems (like, running mock)
> * mysterious bots downloading stuff for whatever reason
> * proxies and caching
>
>
There are a couple of other items which make the picture hard to see and
make even our primary servers' logs close to useless for this. When you look
at the logs, nothing indicates whether a package is being installed, updated,
or pulled in as a dependency. This means that any stats will mostly show
which packages get updated the most during a release or have a lot of
sub-packages which might get pulled in.
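
To make the limitation concrete, here is a minimal Python sketch of the kind
of log counting being discussed. It is not anything Fedora actually runs: the
log lines, paths, and function names are made up for illustration, and it
assumes the common/combined access log format. All it can recover per request
is a package name; nothing in a line says whether that request was an
install, an update, or a dependency.

```python
import re
from collections import Counter

# Request and status fields of a common/combined-format access log line.
# Assumption: only plain "GET <path>.rpm" requests are of interest.
REQUEST_RE = re.compile(r'"GET (?P<path>\S+\.rpm) HTTP/[\d.]+" (?P<status>\d{3})')


def package_name(path):
    """Strip the directory and the trailing -version-release.arch.rpm."""
    filename = path.rsplit("/", 1)[-1]
    stem = filename[: -len(".rpm")]
    parts = stem.rsplit("-", 2)  # name, version, release.arch
    return parts[0] if len(parts) == 3 else stem


def count_downloads(lines):
    """Count successful RPM downloads per package name."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("status") == "200":
            counts[package_name(match.group("path"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines; addresses and paths are invented.
    sample = [
        '203.0.113.7 - - [27/Dec/2020:19:44:57 +0100] '
        '"GET /pub/Packages/g/glibc-2.32-2.fc33.x86_64.rpm HTTP/1.1" 200 100 "-" "libdnf"',
        '203.0.113.9 - - [27/Dec/2020:19:45:02 +0100] '
        '"GET /pub/Packages/f/firefox-84.0-6.fc33.x86_64.rpm HTTP/1.1" 200 100 "-" "libdnf"',
    ]
    print(count_downloads(sample).most_common())
```

A count like this ranks packages by how often their files are fetched, which
is the "most updated, most depended on" ordering described above rather than
a measure of what people chose to install.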

The mirroring effect also has a noise problem where a client gets some of
its packages from one mirror and then gets mostly dependencies from a
second mirror.

Finally, CI and build systems swamp all other downloads from mirrors these
days. Depending on how they are set up, some seem to do a ```yum install *```
before operating. My guess is that at least 60% of all traffic is CI these
days. (I expect that this is also the case for a lot of other
distributions.)

Packages with lots of updates sound like they might be getting more
interest, but a lot of upstreams do 2-week sprint releases, which means
there are lots of regular updates regardless of interest.

All in all, what you get by looking at a mirror's data is a 'reverse
popularity contest'. Packages like the kernel, glibc, firefox, and every
dependency which gets an update sit on top. Packages at the bottom may be
the ones being asked for, but they are also dependencies which aren't pulled
in a lot or don't see an update.

In the end I think popcon might be better, BUT it is also hard to set up
in these days of trolls and GDPR. [Heck, smolt had almost more trolls in it
than regular data by the end of it. So many people set up PDP-11 and VAX
as their hardware running Fedora.]



> and probably more. Popcon and smolt are better because it's actual
> individual system data. On the other hand, they're worse as mentioned
> because opt-in doesn't give a realistic picture.
>
>
> --
> Matthew Miller
> 
> Fedora Project Leader
> ___
> devel mailing list -- devel@lists.fedoraproject.org
> To unsubscribe send an email to devel-le...@lists.fedoraproject.org
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
>


-- 
Stephen J Smoogen.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Popularity contest for Fedora

2020-12-27 Thread Matthew Miller
On Sun, Dec 27, 2020 at 07:44:57PM +0100, clime wrote:
> I think we can simply parse server-side access logs to count package
> downloads, no?

We can for our primary server, but most people get updates from mirrors
which we don't run directly. The central mirrorlist (from which I get the
dnf count data) just redirects people to those mirrors. Even if we could get
package download counts from the mirrors, they're heavily skewed by:

* public mirrors pulling the whole thing
* people pulling the whole thing for a private mirror
* ci and build systems (like, running mock)
* mysterious bots downloading stuff for whatever reason
* proxies and caching

and probably more. Popcon and smolt are better because it's actual
individual system data. On the other hand, they're worse as mentioned
because opt-in doesn't give a realistic picture.
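
As a rough illustration of why the skew sources listed above are hard to
remove, here is a minimal Python sketch of one possible heuristic: drop any
client that fetches more than some number of distinct packages, on the
assumption that it is a mirror sync or a CI farm. The function name, the
`(client, package)` input shape, and the threshold are all hypothetical, not
an existing Fedora tool, and the threshold in particular is an arbitrary
guess.

```python
from collections import Counter, defaultdict


def count_excluding_bulk_clients(requests, max_distinct=500):
    """Count clients per package, skipping clients that look like bulk pullers.

    requests: iterable of (client_address, package_name) pairs.
    max_distinct: hypothetical cutoff; a client requesting more distinct
    packages than this is treated as a mirror or build system.
    """
    packages_by_client = defaultdict(set)
    for client, package in requests:
        packages_by_client[client].add(package)

    counts = Counter()
    for packages in packages_by_client.values():
        if len(packages) > max_distinct:
            continue  # looks like "pulling the whole thing"; ignore it
        counts.update(packages)  # each client counts once per package
    return counts
```

Even with a filter like this, a caching proxy or NAT gateway still appears as
a single client standing in for many systems, and a small CI job looks the
same as a real user, so the result stays skewed.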


-- 
Matthew Miller

Fedora Project Leader


Re: Popularity contest for Fedora

2020-12-27 Thread John Reiser

> I think we can simply parse server-side access logs to count package
> downloads, no?


That ignores the effect of caching proxies, which are prevalent in academic
and corporate environments.


Re: Popularity contest for Fedora

2020-12-27 Thread Vitaly Zaitsev via devel

On 27.12.2020 19:44, clime wrote:

> I think we can simply parse server-side access logs to count package
> downloads, no?


On every third-party mirror?

--
Sincerely,
  Vitaly Zaitsev (vit...@easycoding.org)


Re: Popularity contest for Fedora

2020-12-27 Thread clime
On Sun, 27 Dec 2020 at 17:41, Gary Buhrmaster wrote:
>
> On Sun, Dec 27, 2020 at 3:12 PM Matthew Miller wrote:
>
> > It's been talked about before but no one has done it.
>
> There was also smolt, which collected some
> system information (and could be extended
> to collect more). However, not only did the
> upstream die, but follow-on proposals never
> took off, and they also opened the entire
> can-of-worms regarding an opt-in data
> collection mechanism (and it was agreed
> by most it had to be opt-in) not being able to
> provide useful data to actually make good
> decisions on.  It is also true that many wish
> we did have sufficiently good data in order
> to make good decisions.  Rock, meet hard
> place.

I think we can simply parse server-side access logs to count package
downloads, no?

It probably won't be very precise, but it could be enough to give us a basic idea...

clime



Re: Popularity contest for Fedora

2020-12-27 Thread Gary Buhrmaster
On Sun, Dec 27, 2020 at 3:12 PM Matthew Miller wrote:

> It's been talked about before but no one has done it.

There was also smolt, which collected some
system information (and could be extended
to collect more). However, not only did the
upstream die, but follow-on proposals never
took off, and they also opened the entire
can-of-worms regarding an opt-in data
collection mechanism (and it was agreed
by most it had to be opt-in) not being able to
provide useful data to actually make good
decisions on.  It is also true that many wish
we did have sufficiently good data in order
to make good decisions.  Rock, meet hard
place.


Re: Popularity contest for Fedora

2020-12-27 Thread Matthew Miller
On Sat, Dec 26, 2020 at 05:33:39PM -0600, Ron Olson wrote:
> Has anything like this been considered for Fedora? It would actually
> be kind of nice to see installation statistics of my packages, if
> only to determine if I’m the only one using them. :)

It's been talked about before but no one has done it.

-- 
Matthew Miller

Fedora Project Leader


Re: Popularity contest for Fedora

2020-12-27 Thread Vitaly Zaitsev via devel

On 27.12.2020 00:33, Ron Olson wrote:
> Has anything like this been considered for Fedora? It would actually be
> kind of nice to see installation statistics of my packages, if only to
> determine if I’m the only one using them. :)


Telemetry and user tracking are evil.

--
Sincerely,
  Vitaly Zaitsev (vit...@easycoding.org)