Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-24 Robin H. Johnson
On Mon, May 04, 2020 at 11:57:03PM +0100, Andrey Utkin wrote: > Since it is going to be opt-in and optional anyway, we seem to be fine with > having just partial data. > > I assume we have logs of distfiles downloads from Gentoo infrastructure, and > can negotiate access to relevant logs of our

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-05 Toralf Förster
On 5/5/20 10:26 PM, Daniel Pielmeier wrote: > Actually the maintainer decided to continue the project. > The code is now hosted at Github [1]. > The site moved to a new server and the upload is working again. > > [1] https://github.com/portagefilelist > > -- > Best regards > Daniel Indeed -

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-05 Daniel Pielmeier
On May 5, 2020 7:31:34 PM UTC, "Toralf Förster" wrote: >On 4/26/20 10:08 AM, Michał Górny wrote: >> I don't think we really want to try to investigate >> which files are actually used but focus on what's installed. >Hi, > >I do wonder if the http://www.portagefilelist.de/site/start (package

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-05 Kent Fredric
On Tue, 5 May 2020 02:47:48 +0200 Thomas Deutschmann wrote: > Yes it would be a signal but a useless signal, not? "There are no users reported using this dist, so we can nuke it" is still far far superior to "there are no reverse dependencies, so we can nuke it" *Even* when the former is false

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-05 Toralf Förster
On 4/26/20 10:08 AM, Michał Górny wrote: > I don't think we really want to try to investigate > which files are actually used but focus on what's installed. Hi, I do wonder if the http://www.portagefilelist.de/site/start (package app-portage/pfl) would be part of that or not? The maintainer of

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-05 Jaco Kroon
Hi Michał, and the rest of the Gentoo devs, I've been patiently sitting and watching this discussion. I raised some ideas with another developer (Not Michał) just days before he raised this thread to the ML. I believe all points raised to this point are valid; I'll try to summarise: 1. This

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-05 Nils Freydank
Hi all, I find the idea of having data great, but agree that it can lead to a false sense of having a correct database. Therefore, two thoughts: First, I'd like to propose that you introduce gentoostats as a *strictly timed experiment* and evaluate if it actually changed anything within

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-05 Michał Górny
On Tue, 2020-05-05 at 02:47 +0200, Thomas Deutschmann wrote: > Yes it would be a signal but a useless signal, not? > You seem to aim for arbitrarily blocking developers from making decisions by preventing them from having data. This won't work. Firstly, because *we have* to make decisions, and

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-05 Alec Warner
On Mon, May 4, 2020 at 10:14 PM Matt Turner wrote: > On Mon, May 4, 2020 at 5:48 PM Thomas Deutschmann > wrote: > > > > On 2020-04-26 15:46, Kent Fredric wrote: > > > On Sun, 26 Apr 2020 14:38:54 +0200 > > > Thomas Deutschmann wrote: > > > > > >> Let's assume we will get reports that

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-04 Matt Turner
On Mon, May 4, 2020 at 5:48 PM Thomas Deutschmann wrote: > > On 2020-04-26 15:46, Kent Fredric wrote: > > On Sun, 26 Apr 2020 14:38:54 +0200 > > Thomas Deutschmann wrote: > > > >> Let's assume we will get reports that app-misc/foo is only installed 20 > >> times. If you are going to judge based

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-04 Thomas Deutschmann
On 2020-04-26 15:46, Kent Fredric wrote: > On Sun, 26 Apr 2020 14:38:54 +0200 > Thomas Deutschmann wrote: > >> Let's assume we will get reports that app-misc/foo is only installed 20 >> times. If you are going to judge based on this data, "Obviously, nobody >> is using that package, it's stuck

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-04 Thomas Deutschmann
On 2020-05-05 00:57, Andrey Utkin wrote: > I assume we have logs of distfiles downloads from Gentoo infrastructure, and > can negotiate access to relevant logs of our mirrors. That constitutes partial > data correlated with users' installation activity, as good as it gets. Even if we would have

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-05-04 Andrey Utkin
Since it is going to be opt-in and optional anyway, we seem to be fine with having just partial data. I assume we have logs of distfiles downloads from Gentoo infrastructure, and can negotiate access to relevant logs of our mirrors. That constitutes partial data correlated with users'
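A minimal sketch of how such download logs could be tallied per distfile, assuming the mirrors write Common Log Format lines and serve files under a /distfiles/ path (possibly with hashed subdirectories). The function name count_distfile_downloads is hypothetical, only successful (HTTP 200) GET requests are counted, and a download count is at best a proxy for installations.

```python
import re
from collections import Counter
from posixpath import basename

# Assumption: Common Log Format, e.g.
# 203.0.113.5 - - [04/May/2020:23:57:03 +0100] "GET /distfiles/ab/foo-1.0.tar.gz HTTP/1.1" 200 12345
REQUEST = re.compile(r'"GET (?P<path>/distfiles/\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def count_distfile_downloads(lines):
    """Count successful downloads per distfile name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m and m.group("status") == "200":
            # Strip any hashed subdirectory, keep just the file name.
            counts[basename(m.group("path"))] += 1
    return counts
```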

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Andreas K. Hüttel
On Sunday, 26 April 2020, 12:09:59 EEST, Ulrich Mueller wrote: > > On Sun, 26 Apr 2020, Michał Górny wrote: > > The other major problem is spam protection. The best semi-anonymous way > > I see is to use submitter's IPv4 addresses (can we support IPv6 then?). > > We could set a limit of,

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Samuel Bernardo
Hi everyone, gentoostats is a novelty for me and I'm not aware of previous discussions or implementations. But from what I could gather from the comments and Michał Górny's explanation, I would start by drawing your attention to the octoverse[1] initiative. Maybe collected statistics could be a

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Kent Fredric
On Sun, 26 Apr 2020 10:52:27 +0200 Michał Górny wrote: > Do you have any other idea for spam protection then? What is the realistic risk here for spamming? If the record is well formed, and pertains to known packages, the worst I currently imagine is astroturfing: A single individual

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Kent Fredric
On Sun, 26 Apr 2020 03:39:24 -0700 Brian Dolbec wrote: > We would need that > person/team to only enable their test system for gentoostats/disabled > for deployments. Repeated failure to do that could result in that uuid > being blacklisted. Part of the initial profile details for that >

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Kent Fredric
On Sun, 26 Apr 2020 14:38:54 +0200 Thomas Deutschmann wrote: > Let's assume we will get reports that app-misc/foo is only installed 20 > times. If you are going to judge based on this data, "Obviously, nobody > is using that package, it's stuck on ... safe to remove" your > view is biased: I

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Kent Fredric
On Sun, 26 Apr 2020 10:08:32 +0200 Michał Górny wrote: > A proper solution to cluster problem would probably involve some way to > internally collect and combine data before submission. If you have > large clusters of similar systems, I think you'd want to have all > packages used on
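One possible reading of "collect and combine data before submission", sketched under the assumption that a cluster is reported as a single system whose package list is the union of what its nodes have installed. combine_cluster_reports and the node names are hypothetical, and other aggregation rules are equally conceivable.

```python
def combine_cluster_reports(node_reports):
    """Merge per-node package lists into a single report.

    node_reports maps a (hypothetical) node name to an iterable of
    installed package atoms. The result is the sorted union, so a package
    present on every node of a large cluster still counts only once.
    """
    combined = set()
    for packages in node_reports.values():
        combined.update(packages)
    return sorted(combined)
```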

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Thomas Deutschmann
On 2020-04-26 10:08, Michał Górny wrote: > What do you think? Do you foresee other problems? Do you have > other needs? Can you think of better solutions? While I would really like to have data, I think it's impossible to get correct data and therefore we shouldn't collect any data at all

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Toralf Förster
On 4/26/20 12:25 PM, Michał Górny wrote: > On Sun, 2020-04-26 at 12:15 +0200, Toralf Förster wrote: >> On 4/26/20 10:52 AM, Michał Górny wrote: >>> Do you have any other idea for spam protection then? >> >> IMO there're 2 types of spam: >> >> 1. made by accident (eg. "* * * * *" instead "@weekly"

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Brian Dolbec
On Sun, 26 Apr 2020 11:32:06 +0200 Toralf Förster wrote: > On 4/26/20 11:09 AM, Ulrich Mueller wrote: > > Instead of using the IP address, you could generate a UUID when > > installing the tool. > > like the pfl tool did ? > Like the last gentoostats gsoc project did. As for

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Michał Górny
On Sun, 2020-04-26 at 12:15 +0200, Toralf Förster wrote: > On 4/26/20 10:52 AM, Michał Górny wrote: > > Do you have any other idea for spam protection then? > > IMO there're 2 types of spam: > > 1. made by accident (eg. "* * * * *" instead "@weekly" in crontab) > 2. made intentionally > > The

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Toralf Förster
On 4/26/20 10:52 AM, Michał Górny wrote: > Do you have any other idea for spam protection then? IMO there're 2 types of spam: 1. made by accident (eg. "* * * * *" instead "@weekly" in crontab) 2. made intentionally The 1st can be handled by UUID - just drop any old related dataset from inbox
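The "drop any old related dataset" idea can be sketched as keeping only the newest submission per UUID before counting, so a crontab entry accidentally set to "* * * * *" still contributes one installation per package. The Submission record and both function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Submission:
    system_id: str          # the per-installation UUID
    received: datetime      # when the server got the report
    packages: frozenset     # package atoms reported as installed


def latest_per_system(submissions):
    """Keep only the newest submission for each system ID."""
    latest = {}
    for sub in submissions:
        current = latest.get(sub.system_id)
        if current is None or sub.received > current.received:
            latest[sub.system_id] = sub
    return latest


def install_counts(submissions):
    """Count, per package, how many distinct systems report it installed."""
    counts = Counter()
    for sub in latest_per_system(submissions).values():
        counts.update(sub.packages)
    return counts
```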

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Michał Górny
On Sun, 2020-04-26 at 11:09 +0200, Ulrich Mueller wrote: > > > > > > On Sun, 26 Apr 2020, Michał Górny wrote: > > The other major problem is spam protection. The best semi-anonymous way > > I see is to use submitter's IPv4 addresses (can we support IPv6 then?). > > We could set a limit of, say,

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Toralf Förster
On 4/26/20 11:09 AM, Ulrich Mueller wrote: > Instead of using the IP address, you could generate a UUID when > installing the tool. like the pfl tool did ? -- Toralf PGP 23217DA7 9B888F45
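A minimal sketch of the install-time UUID: a random (version 4) UUID is generated once, stored, and reused for every later submission, so the server can recognise repeat submissions from the same machine without relying on its IP address. The function name and any concrete storage location are hypothetical.

```python
import uuid
from pathlib import Path


def load_or_create_system_id(path) -> uuid.UUID:
    """Return this machine's submitter ID, creating it on first use.

    `path` is wherever the tool keeps its state; any concrete location
    (say, under /var/lib) is an assumption, not something gentoostats defines.
    """
    path = Path(path)
    if path.exists():
        return uuid.UUID(path.read_text().strip())
    system_id = uuid.uuid4()  # random: derived from no hardware or network data
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{system_id}\n")
    return system_id
```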

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Ulrich Mueller
> On Sun, 26 Apr 2020, Michał Górny wrote: > The other major problem is spam protection. The best semi-anonymous way > I see is to use submitter's IPv4 addresses (can we support IPv6 then?). > We could set a limit of, say, 10 submissions per IPv4 address per week. > If some address would

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Michał Górny
On Sun, 2020-04-26 at 10:43 +0200, Toralf Förster wrote: > On 4/26/20 10:08 AM, Michał Górny wrote: > > . This > > involves accepting a privacy policy and setting up a cronjob. The tool > > would suggest a (random?) time for submission to take place periodically > > (say, every week). > > Well,

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Toralf Förster
On 4/26/20 10:08 AM, Michał Górny wrote: > . This > involves accepting a privacy policy and setting up a cronjob. The tool > would suggest a (random?) time for submission to take place periodically > (say, every week). Well, something like "@weekly" should be preferred over eg "42 23 * * *" b/c
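For comparison with "@weekly", a sketch of the randomised schedule quoted above ("The tool would suggest a (random?) time"): the tool picks a random minute, hour and weekday once and emits a fixed weekly crontab line, presumably so that submissions are spread over the week. The function and the command name gentoostats-submit are hypothetical. Note that a line such as "42 23 * * *" fires daily; a weekly entry needs the day-of-week field set.

```python
import random


def suggest_weekly_cron(command: str, rng: random.Random = None) -> str:
    """Return a crontab(5) line running `command` once a week at a random time."""
    rng = rng or random.Random()
    minute = rng.randrange(60)
    hour = rng.randrange(24)
    weekday = rng.randrange(7)  # day-of-week field: 0 is Sunday
    return f"{minute} {hour} * * {weekday} {command}"
```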

Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation

2020-04-26 Alarig Le Lay
Hi, On Sun 26 Apr 2020 10:08:32 GMT, Michał Górny wrote: > The other major problem is spam protection. The best semi-anonymous way > I see is to use submitter's IPv4 addresses (can we support IPv6 then?). > We could set a limit of, say, 10 submissions per IPv4 address per week. > If some
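The proposed limit of 10 submissions per IPv4 address per week can be sketched as a sliding one-week window per address. The open IPv6 question is handled here by treating a whole /64 as one submitter, which is an assumption, as are the helper names rate_limit_key and WeeklyRateLimiter.

```python
import ipaddress
from collections import defaultdict, deque
from datetime import datetime, timedelta

WEEK = timedelta(days=7)
MAX_PER_WEEK = 10  # the figure floated in the proposal


def rate_limit_key(address: str) -> str:
    """Map a submitter address to the bucket it is rate-limited in."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6:
        # Assumption: one IPv6 /64 is treated as one submitter, since a single
        # host can pick any address inside the /64 it was assigned.
        return str(ipaddress.ip_network(f"{ip}/64", strict=False))
    return str(ip)


class WeeklyRateLimiter:
    """Accept at most `limit` submissions per key in any sliding `window`."""

    def __init__(self, limit: int = MAX_PER_WEEK, window: timedelta = WEEK):
        self.limit = limit
        self.window = window
        self._accepted = defaultdict(deque)  # key -> timestamps of accepted submissions

    def allow(self, key: str, now: datetime) -> bool:
        stamps = self._accepted[key]
        # Drop timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False  # rejected submissions are not recorded
        stamps.append(now)
        return True
```

Because rejected attempts are not recorded, a misbehaving client is never locked out for longer than one window; whether that is desirable is a policy choice.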