Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
On 06/08/2011 04:36 PM, Vikraman wrote:
> * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size, and Build time for each installed package

How many operations do you expect at the SQL level for a submission with 1000 packages? Will that be around 1000 inserts?

Best,
Sebastian
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
On Sat, Jun 11, 2011 at 01:10:36AM +0200, Sebastian Pipping wrote:
> On 06/08/2011 04:36 PM, Vikraman wrote:
>> * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size, and Build time for each installed package
>
> How many operations do you expect at the SQL level for a submission with 1000 packages? Will that be around 1000 inserts?

One insert for each package entry, and one insert for every USE flag.

> Best,
> Sebastian

--
Vikraman
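To make that answer concrete: under this scheme the row count is (number of packages) + (total number of USE flags across those packages), so 1000 packages with roughly 30 flags each would mean about 31,000 inserts, not 1000. A minimal sketch using Python's sqlite3 module follows; the `packages`/`useflags` tables and their columns are hypothetical and are not the actual gentoostats schema. Wrapping the whole submission in one transaction keeps the per-row cost low.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one row per installed package, one row per USE flag.
    conn.executescript(
        """
        CREATE TABLE packages (
            pkg_id  INTEGER PRIMARY KEY,
            host_id INTEGER NOT NULL,
            atom    TEXT NOT NULL
        );
        CREATE TABLE useflags (
            pkg_id INTEGER NOT NULL REFERENCES packages(pkg_id),
            flag   TEXT NOT NULL,
            state  TEXT NOT NULL CHECK (state IN ('plus', 'minus', 'unset'))
        );
        """
    )


def store_submission(conn, host_id, packages):
    """Store one host's submission and return the number of rows inserted.

    `packages` maps an atom such as "app-editors/vim-7.3" to a dict of
    USE flag -> state ("plus", "minus" or "unset").
    """
    rows = 0
    with conn:  # one transaction: commits on success, rolls back on error
        for atom, flags in packages.items():
            # One INSERT for the package entry ...
            cur = conn.execute(
                "INSERT INTO packages (host_id, atom) VALUES (?, ?)",
                (host_id, atom),
            )
            pkg_id = cur.lastrowid
            # ... plus one INSERT per USE flag of that package.
            conn.executemany(
                "INSERT INTO useflags (pkg_id, flag, state) VALUES (?, ?, ?)",
                [(pkg_id, flag, state) for flag, state in flags.items()],
            )
            rows += 1 + len(flags)
    return rows
```

For example, a submission of 1000 packages with three flags each produces 1000 × (1 + 3) = 4000 rows.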
[gentoo-dev] Gentoo package statistics -- GSoC 2011
Hi everyone,

I'm working on the `Package statistics` project this year. So far, I have managed to write a client and server[0] to collect the following information from hosts:

* Uname, portage profile, timestamp of portage tree
* ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
* ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
* Repository, Keyword, Useflags (plus,minus,unset), Counter, Size, and Build time for each installed package

Is there a need to collect files installed by a package? Doesn't PFL[1] already provide that?

Please provide some feedback on what other data should be collected, etc.

Also, I'm starting work on the webUI, and would like some recommendations for stats pages, such as:

* Packages installed sorted by users
* Top arches, keywords, profiles
* Most enabled, disabled useflags per package/globally

[0] http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02
[1] http://www.portagefilelist.de/index.php/Main_Page

--
Vikraman
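The "most enabled/disabled USE flags per package/globally" page essentially comes down to counting (flag, state) pairs across submissions. A minimal sketch, assuming each submission has already been parsed into a dict of package name -> {flag: state} using the plus/minus/unset states listed above; the data layout and the function names are hypothetical. Only "plus" and "minus" are counted, and the global count is the number of hosts on which a pair appears for at least one package.

```python
from collections import Counter, defaultdict


def useflag_stats(submissions):
    """Aggregate USE flag choices over many host submissions.

    Returns (global_counts, per_package): both are Counters keyed by
    (flag, state). global_counts counts hosts; per_package[name] counts
    hosts that made that choice for that particular package.
    """
    global_counts = Counter()
    per_package = defaultdict(Counter)
    for host in submissions:
        seen = set()  # each (flag, state) pair counts once per host globally
        for package, flags in host.items():
            for flag, state in flags.items():
                if state not in ("plus", "minus"):
                    continue  # only explicit enable/disable choices are counted
                per_package[package][(flag, state)] += 1
                seen.add((flag, state))
        global_counts.update(seen)
    return global_counts, per_package


def top_flags(counts, state, n=10):
    """The n most frequent flags with the given state, as (flag, count) pairs."""
    ranked = [(flag, c) for (flag, s), c in counts.most_common() if s == state]
    return ranked[:n]
```

The same `top_flags` helper works for the global Counter and for any single package's Counter.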
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
On 6/8/11 4:36 PM, Vikraman wrote:
> I'm working on the `Package statistics` project this year. So far, I have managed to write a client and server[0] to collect the following information from hosts:

Excellent, good luck with the idea! I think that better information about how Gentoo is actually used will greatly help improve it.

> Is there a need to collect files installed by a package? Doesn't PFL[1] already provide that?

Well, PFL is not an official Gentoo project. It might be useful, but I wouldn't say it's a priority.

> Please provide some feedback on what other data should be collected, etc.

In my opinion it's *not* about collecting as much data as possible. I think it's most important to get the core functionality working really well, and to convince as large a percentage of users as possible to enable reporting the statistics (so that the results, hopefully, accurately represent the user base).

Please note that in some cases it may mean collecting _less_ data, or thinking more about the privacy of the users.

For me, as a developer, even a list of packages sorted by popularity (aka Debian/Ubuntu popcon) would be very useful.

Ah, and maybe files in /etc/portage: package.keywords and so on. It could be useful to see what people are masking/unmasking; that may be an indication of stale stabilizations or brokenness hitting the tree. Anyway, I'd call it an enhancement.

> Also, I'm starting work on the webUI, and would like some recommendations for stats pages, such as:
> * Packages installed sorted by users

Cool!

> * Top arches, keywords, profiles

And the percentage of ~arch vs. arch users?

> * Most enabled, disabled useflags per package/globally

Also great, especially the per-package variant. It'd also be useful to have per-profile data, to better tune the profile defaults.

> [0] http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02

I took a quick look at the code. Some random comments:

- it uses the portage Python API a lot, but that API is not stable, or at least not guaranteed to be stable. Have you considered using helpers like portageq (or eventually enhancing those helpers)?
- make the licensing super-clear (a LICENSE file, possibly a header in every source file, and so on)
- how about submitting the data over HTTPS rather than HTTP to better protect privacy?
- don't leave exception handling as a TODO; it should be a part of your design, not an afterthought
- instead of or in addition to the setup.txt file, how about just writing a real setup.py file for distutils?
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
Wasn't there a project like this a couple of years ago that tried to use a cross-distro tool?

--
Gilles Dartiguelongue
e...@gentoo.org
Gentoo
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
On Wed, 2011-06-08 at 17:19 +0200, Paweł Hajdan, Jr. wrote:
> In my opinion it's *not* about collecting as much data as possible. I think it's most important to get the core functionality working really well, and to convince as large a percentage of users as possible to enable reporting the statistics (so that the results, hopefully, accurately represent the user base).
>
> Please note that in some cases it may mean collecting _less_ data, or thinking more about the privacy of the users.

+1 on this. Taking it to the extreme, I'd rather see a properly implemented architecture that is installed on 50% of Gentoo systems and reports just the arch, than something that collects a lot more data and is installed on 50 machines.

Once the framework is in place and there is user uptake, it is easy to slowly extend the statistics collection and gather more useful data.

> For me, as a developer, even a list of packages sorted by popularity (aka Debian/Ubuntu popcon) would be very useful.

That would be useful.

> Ah, and maybe files in /etc/portage: package.keywords and so on. It could be useful to see what people are masking/unmasking; that may be an indication of stale stabilizations or brokenness hitting the tree. Anyway, I'd call it an enhancement.

I'd rather not see this in the initial GSoC project if that means we'll sacrifice a big rollout.

Kind regards,
Hans
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
On 08.06.2011 18:36, Vikraman wrote:
> Hi everyone,
>
> I'm working on the `Package statistics` project this year. So far, I have managed to write a client and server[0] to collect the following information from hosts:
>
> * Uname, portage profile, timestamp of portage tree
> * ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
> * ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
> * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size, and Build time for each installed package

Maybe collect hardware info and kernel configs too? For example cpuinfo, lspci and lsusb(?).

I think that 1-3 months after installing Gentoo, users could (should) receive a news item about participating in the `Package statistics` project. This news item could contain short instructions on how to install and configure the tool. It could also be promoted in other Gentoo projects (for example, with a short wiki page).

And where can I find ebuilds for the `Package statistics` project?

Sorry for my English... and good luck!
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
On Wed, Jun 08, 2011 at 05:19:33PM +0200, Paweł Hajdan, Jr. wrote:
> On 6/8/11 4:36 PM, Vikraman wrote:
>> I'm working on the `Package statistics` project this year. So far, I have managed to write a client and server[0] to collect the following information from hosts:
>
> Excellent, good luck with the idea! I think that better information about how Gentoo is actually used will greatly help improve it.

Well, that information cannot be collected automatically, can it?

>> Is there a need to collect files installed by a package? Doesn't PFL[1] already provide that?
>
> Well, PFL is not an official Gentoo project. It might be useful, but I wouldn't say it's a priority.
>
>> Please provide some feedback on what other data should be collected, etc.
>
> In my opinion it's *not* about collecting as much data as possible. I think it's most important to get the core functionality working really well, and to convince as large a percentage of users as possible to enable reporting the statistics (so that the results, hopefully, accurately represent the user base).
>
> Please note that in some cases it may mean collecting _less_ data, or thinking more about the privacy of the users.
>
> For me, as a developer, even a list of packages sorted by popularity (aka Debian/Ubuntu popcon) would be very useful.
>
> Ah, and maybe files in /etc/portage: package.keywords and so on. It could be useful to see what people are masking/unmasking; that may be an indication of stale stabilizations or brokenness hitting the tree. Anyway, I'd call it an enhancement.
>
>> Also, I'm starting work on the webUI, and would like some recommendations for stats pages, such as:
>> * Packages installed sorted by users
>
> Cool!
>
>> * Top arches, keywords, profiles
>
> And the percentage of ~arch vs. arch users?
>
>> * Most enabled, disabled useflags per package/globally
>
> Also great, especially the per-package variant. It'd also be useful to have per-profile data, to better tune the profile defaults.
>
>> [0] http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02
>
> I took a quick look at the code. Some random comments:
>
> - it uses the portage Python API a lot, but that API is not stable, or at least not guaranteed to be stable. Have you considered using helpers like portageq (or eventually enhancing those helpers)?
> - make the licensing super-clear (a LICENSE file, possibly a header in every source file, and so on)
> - how about submitting the data over HTTPS rather than HTTP to better protect privacy?

Fair points, thanks!

> - don't leave exception handling as a TODO; it should be a part of your design, not an afterthought
> - instead of or in addition to the setup.txt file, how about just writing a real setup.py file for distutils?

Yes, these are part of my sub-goals for next week.

--
Vikraman
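Following the portageq suggestion, the client could shell out to the helper instead of importing portage internals. A sketch, assuming `portageq envvar VAR...` prints one value per line in the order requested (that is my understanding of its output without `-v`; check `portageq --help` on your system). The function name and the `run` parameter are hypothetical; `run` exists only so the parsing can be tested without Portage installed. Errors are raised explicitly rather than left as a TODO.

```python
import subprocess


def portageq_envvars(names, run=subprocess.run):
    """Return {name: value} for Portage configuration variables.

    Uses the portageq helper instead of the portage Python API.
    Assumes `portageq envvar A B ...` prints one value per line, in order.
    """
    names = list(names)
    result = run(
        ["portageq", "envvar", *names],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if portageq exits non-zero
    )
    values = result.stdout.splitlines()
    if len(values) != len(names):
        # e.g. a value containing a newline; refuse to guess the mapping
        raise ValueError("unexpected portageq output: %r" % result.stdout)
    return dict(zip(names, values))
```

On a Gentoo host this would be called as `portageq_envvars(["ARCH", "CHOST", "MAKEOPTS"])`.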
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
On Wed, Jun 08, 2011 at 09:35:26PM +0400, Николай Антонов wrote:
> On 08.06.2011 18:36, Vikraman wrote:
>> Hi everyone,
>>
>> I'm working on the `Package statistics` project this year. So far, I have managed to write a client and server[0] to collect the following information from hosts:
>>
>> * Uname, portage profile, timestamp of portage tree
>> * ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
>> * ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
>> * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size, and Build time for each installed package
>
> Maybe collect hardware info and kernel configs too? For example cpuinfo, lspci and lsusb(?).

That's not part of package statistics. There's the smolt project for hardware statistics.

> I think that 1-3 months after installing Gentoo, users could (should) receive a news item about participating in the `Package statistics` project. This news item could contain short instructions on how to install and configure the tool. It could also be promoted in other Gentoo projects (for example, with a short wiki page).
>
> And where can I find ebuilds for the `Package statistics` project?

The server hasn't been deployed yet, and ebuilds will be available soon!

> Sorry for my English... and good luck!

--
Vikraman
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
On 17:19 Wed 08 Jun, Paweł Hajdan, Jr. wrote:
> On 6/8/11 4:36 PM, Vikraman wrote:
>> I'm working on the `Package statistics` project this year. So far, I have managed to write a client and server[0] to collect the following information from hosts:
>
> Excellent, good luck with the idea! I think that better information about how Gentoo is actually used will greatly help improve it.
>
>> Is there a need to collect files installed by a package? Doesn't PFL[1] already provide that?
>
> Well, PFL is not an official Gentoo project. It might be useful, but I wouldn't say it's a priority.

I would love to see it happen, but it's more important to roll out a minimal working solution now and add on later.

By combining installed files with USE flag settings, this project could actually attempt to work out automatically which USE flags result in which files. That would address one of the biggest objections many people have had to such a package-to-file search engine. It would also be pretty useful for some other GSoC projects, like the ebuild generator and the auto dependency scanner.

--
Thanks,
Donnie

Donnie Berkholz
Sr. Developer, Gentoo Linux
Blog: http://dberkholz.com
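Donnie's idea can be sketched as a set comparison across hosts that have the same package version installed. The input here is hypothetical: one (enabled USE flags, installed files) pair per host, both as Python sets. A file is attributed to a flag only if it appears on every host with that flag enabled and on no host with it disabled. With few reports, or with flags that are always toggled together, the attribution is ambiguous, so treat the output as a heuristic rather than ground truth.

```python
def files_by_useflag(reports):
    """Guess which installed files are controlled by which USE flag.

    `reports` is a list of (enabled_flags, files) pairs, both sets, for ONE
    package version, one pair per submitting host. Returns {flag: files}.
    """
    all_flags = set().union(*(flags for flags, _ in reports))
    result = {}
    for flag in sorted(all_flags):
        with_flag = [files for flags, files in reports if flag in flags]
        without_flag = [files for flags, files in reports if flag not in flags]
        if not with_flag or not without_flag:
            continue  # need hosts on both sides to compare
        always_with = set.intersection(*with_flag)
        ever_without = set().union(*without_flag)
        candidates = always_with - ever_without
        if candidates:
            result[flag] = candidates
    return result
```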
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
On Wed, 2011-06-08 at 23:31 +0530, Vikraman wrote:
>> Excellent, good luck with the idea! I think that better information about how Gentoo is actually used will greatly help improve it.
>
> Well, that information cannot be collected automatically, can it?

You could pop up a window at random times and ask the user. So it can be done. Whether it's a good idea …

Hans
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
On 08/06/11 20:07, Vikraman wrote:
> On Wed, Jun 08, 2011 at 09:35:26PM +0400, Николай Антонов wrote:
>> On 08.06.2011 18:36, Vikraman wrote:
>>> Hi everyone,
>>>
>>> I'm working on the `Package statistics` project this year. So far, I have managed to write a client and server[0] to collect the following information from hosts:
>>>
>>> * Uname, portage profile, timestamp of portage tree
>>> * ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
>>> * ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
>>> * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size, and Build time for each installed package
>>
>> Maybe collect hardware info and kernel configs too? For example cpuinfo, lspci and lsusb(?).
>
> That's not part of package statistics. There's the smolt project for hardware statistics.

Well, there is another reason why you don't want to log that: hardened users. Not having access to the kernel .config helps make the system more resilient to some attacks; as a result, many hardened users are adamant about not having their .config files published.
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
>>> Maybe collect hardware info and kernel configs too? For example cpuinfo, lspci and lsusb(?).
>>
>> That's not part of package statistics. There's the smolt project for hardware statistics.
>
> Well, there is another reason why you don't want to log that: hardened users. Not having access to the kernel .config helps make the system more resilient to some attacks; as a result, many hardened users are adamant about not having their .config files published.

I would really like to see a nice way to set what information I want sent. Perhaps a config file in /etc? Also, an option to see what is being sent would be great. :)

I look forward to contributing my machine's info.

-Ross
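Ross's two requests (choosing what is sent, and previewing it) fit together naturally: filter the payload through a config file, then print the filtered payload instead of submitting it. A sketch using Python's configparser; the `[send]` section, the idea of such a file living somewhere under /etc, and both function names are hypothetical, not part of the existing client. The `default` parameter decides what happens to fields the config does not mention; `default=False` gives an opt-in, more privacy-friendly behaviour.

```python
import configparser
import json


def filter_payload(payload, config_text, default=True):
    """Keep only the top-level fields the user allows in the [send] section.

    Example config:
        [send]
        CFLAGS = no
        SYNC = false
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep keys such as CFLAGS case-sensitive
    parser.read_string(config_text)
    return {
        key: value
        for key, value in payload.items()
        # fallback also covers a missing [send] section
        if parser.getboolean("send", key, fallback=default)
    }


def preview(payload):
    """Render the payload the way a dry-run option might show it."""
    return json.dumps(payload, indent=2, sort_keys=True)
```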