Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-10 Thread Sebastian Pipping
On 06/08/2011 04:36 PM, Vikraman wrote:
 * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size,
   and Build time for each installed package

How many operations do you expect at the SQL level for a submission with
1000 packages?  Will that be around 1000 inserts?

Best,



Sebastian



Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-10 Thread Vikraman
On Sat, Jun 11, 2011 at 01:10:36AM +0200, Sebastian Pipping wrote:
 On 06/08/2011 04:36 PM, Vikraman wrote:
  * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size,
and Build time for each installed package
 
 How many operations do you expect for a submission with 1000 packages
 on SQL level?  Will that be around 1000 inserts?
 

One insert for each package entry, and one insert for every useflag.
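
A minimal sketch of that insert count, assuming a hypothetical two-table
schema (`packages` and `useflags` are illustrative names, not the actual
gentoostats schema) and using SQLite only so the example is self-contained:

```python
import sqlite3


def make_schema(conn):
    """Create the two hypothetical tables used by this sketch."""
    conn.execute("CREATE TABLE packages (id INTEGER PRIMARY KEY, cpv TEXT)")
    conn.execute(
        "CREATE TABLE useflags (package_id INTEGER, flag TEXT, state TEXT)"
    )


def store_submission(conn, packages):
    """Insert one row per package and one row per USE flag.

    `packages` maps a package name to a dict of {flag: state}.
    Returns the number of INSERT statements issued.
    """
    inserts = 0
    cur = conn.cursor()
    for cpv, useflags in packages.items():
        cur.execute("INSERT INTO packages (cpv) VALUES (?)", (cpv,))
        inserts += 1
        package_id = cur.lastrowid
        for flag, state in useflags.items():
            cur.execute(
                "INSERT INTO useflags (package_id, flag, state) "
                "VALUES (?, ?, ?)",
                (package_id, flag, state),
            )
            inserts += 1
    conn.commit()
    return inserts
```

Under this scheme a submission with 1000 packages and, say, five USE flags
per package would issue 1000 + 5000 = 6000 inserts.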

 Best,
 
 
 
 Sebastian
 

-- 
Vikraman




[gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-08 Thread Vikraman
Hi everyone,

I'm working on the `Package statistics` project this year. So far, I
have written a client and server[0] that collect the following
information from hosts:

* Uname, portage profile, timestamp of portage tree
* ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
* ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
* Repository, Keyword, Useflags (plus,minus,unset), Counter, Size,
  and Build time for each installed package

Is there a need to collect the files installed by a package? Doesn't PFL[1]
already provide that?

Please provide some feedback on what other data should be collected, etc.

Also, I'm starting work on the webUI, and would like some
recommendations for stats pages, such as:

* Packages installed sorted by users
* Top arches, keywords, profiles
* Most enabled, disabled useflags per package/globally

[0]
http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02
[1] http://www.portagefilelist.de/index.php/Main_Page

-- 
Vikraman




Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-08 Thread Paweł Hajdan, Jr.
On 6/8/11 4:36 PM, Vikraman wrote:
 I'm working on the `Package statistics` project this year. Till now, I
 have managed to write a client and server[0] to collect the following
 information from hosts:

Excellent, good luck with the idea! I think that better information
about how Gentoo is actually used will greatly help improve it.

 Is there a need to collect files installed by a package ? Doesn't PFL[1]
 already provide that ?

Well, PFL is not an official Gentoo project. It might be useful, but I
wouldn't say it's a priority.

 Please provide some feedback on what other data should be collected, etc.

In my opinion it's *not* about collecting as much data as possible. I
think it's most important to get the core functionality working really
well, and to convince as large a percentage of users as possible to enable
reporting the statistics (so that the results, hopefully, accurately
represent the user base). Please note that in some cases this may mean
collecting _less_ data, or thinking more about the privacy of the users.

For me, as a developer, even a list of packages sorted by popularity
(aka Debian/Ubuntu popcon) would be very useful.
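
A popcon-style ranking can be sketched in a few lines, assuming each host
submits a plain list of installed package names (the input shape and the
function name are assumptions made for illustration):

```python
from collections import Counter


def package_popularity(submissions):
    """Count how many hosts report each package, most popular first.

    `submissions` maps a host identifier to its list of installed packages.
    """
    counts = Counter()
    for installed in submissions.values():
        # Use a set so a package is counted once per host.
        counts.update(set(installed))
    return counts.most_common()
```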

Ah, and maybe files in /etc/portage: package.keywords and so on. It
could be useful to see what people are masking/unmasking; that may be an
indication of stale stabilizations or brokenness hitting the tree.
Anyway, I'd call it an enhancement.

 Also, I'm starting work on the webUI, and would like some
 recommendations for stats pages, such as:
 
 * Packages installed sorted by users

Cool!

 * Top arches, keywords, profiles

And percentage of ~arch vs arch users?

 * Most enabled, disabled useflags per package/globally

Also great, especially the per-package variant. It would also be useful to
have per-profile data, to better tune the profile defaults.
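
A per-package tally could look roughly like this, assuming each report maps
a package to its flags and their state ("plus", "minus" or "unset", as in
the collected data listed above); the report shape is an assumption:

```python
from collections import Counter, defaultdict


def useflag_tally(reports):
    """Tally (flag, state) pairs per package across host reports.

    Each report maps a package name to a dict of {flag: state}.
    """
    tally = defaultdict(Counter)
    for report in reports:
        for package, flags in report.items():
            for flag, state in flags.items():
                tally[package][(flag, state)] += 1
    return tally
```

Grouping the same reports by profile instead of by package would give the
per-profile view.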

 [0]
 http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02

I took a quick look at the code. Some random comments:

- it uses the Portage Python API a lot. But that API is not stable, or at
least not guaranteed to be stable. Have you considered using helpers like
portageq (or eventually enhancing those helpers)?

- make the licensing super-clear (a LICENSE file, possibly some header
in every source file, and so on)

- how about submitting the data over HTTPS instead of HTTP to better
protect privacy?

- don't leave exception handling as a TODO; it should be a part of your
design, not an afterthought

- instead of or in addition to the setup.txt file, how about just
writing the real setup.py file for distutils?
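
To illustrate the portageq suggestion, here is a minimal sketch that reads
a single variable through `portageq envvar` rather than the Python API. The
wrapper name and the injectable `run` parameter are assumptions added so
the function can be exercised without Portage installed:

```python
import subprocess


def portageq_envvar(name, run=subprocess.check_output):
    """Read one Portage variable via the portageq command-line helper.

    `run` defaults to subprocess.check_output and can be replaced
    by a stub for testing.
    """
    output = run(["portageq", "envvar", name])
    return output.decode().strip()
```

On a Gentoo host, `portageq_envvar("ARCH")` would shell out once per
variable; that is slower than the in-process API but does not depend on
Portage's internal Python interfaces.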





Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-08 Thread Gilles Dartiguelongue
Wasn't there a project like this a couple of years ago that tried to
use a cross-distro tool?

-- 
Gilles Dartiguelongue e...@gentoo.org
Gentoo




Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-08 Thread Hans de Graaff
On Wed, 2011-06-08 at 17:19 +0200, Paweł Hajdan, Jr. wrote:

 In my opinion it's *not* about collecting as much data as possible. I
 think it's most important to get the core functionality working really
 well, and convincing as large percentage of users as possible to enable
 reporting the statistics (to make the results - hopefully - accurately
 represent the user base). Please note that in some cases it may mean
 collecting _less_ data, or thinking more about the privacy of the users.

+1 on this. Taking it to the extreme, I'd rather see a properly implemented
architecture that is installed on 50% of Gentoo systems and just reports
the arch than something that collects a lot more data and is
installed on 50 machines. Once the framework is in place and there is
user uptake, it is easy to slowly extend the statistics collection
and gather more useful data.

 For me, as a developer, even a list of packages sorted by popularity
 (aka Debian/Ubuntu popcon) would be very useful.

That would be useful.

 Ah, and maybe files in /etc/portage: package.keywords and so on. It
 could be useful to see what people are masking/unmasking, that may be an
 indication of stale stabilizations or brokenness hitting the tree.
 Anyway, I'd call it an enhancement.

I'd rather not see this in the initial gsoc project if that means we'll
sacrifice a big rollout.

Kind regards,

Hans




Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-08 Thread Николай Антонов
On 08.06.2011 18:36, Vikraman wrote:
 Hi everyone,
 
 I'm working on the `Package statistics` project this year. Till now, I
 have managed to write a client and server[0] to collect the following
 information from hosts:
 
 * Uname, portage profile, timestamp of portage tree
 * ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
 * ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
 * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size,
   and Build time for each installed package
 

Maybe collect hardware info and kernel configs too?
For example cpuinfo, lspci and lsusb(?).

I think that 1-3 months after installing Gentoo, the user could (or should)
receive a news item about participating in the `Package statistics` project.
This news item could contain short instructions on how to install and
configure the tool. It could also be mentioned in other Gentoo projects (for
example, as a short wiki page).

And where can I find ebuilds for the `Package statistics` project?

Sorry for my English... and good luck!



Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-08 Thread Vikraman
On Wed, Jun 08, 2011 at 05:19:33PM +0200, Paweł Hajdan, Jr. wrote:
 On 6/8/11 4:36 PM, Vikraman wrote:
  I'm working on the `Package statistics` project this year. Till now, I
  have managed to write a client and server[0] to collect the following
  information from hosts:
 
 Excellent, good luck with the idea! I think that better information
 about how Gentoo is actually used will greatly help improving it.
 

Well, that information cannot be collected automatically, can it?

  Is there a need to collect files installed by a package ? Doesn't PFL[1]
  already provide that ?
 
 Well, PFL is not an official Gentoo project. It might be useful, but I
 wouldn't say it's a priority.
 
  Please provide some feedback on what other data should be collected, etc.
 
 In my opinion it's *not* about collecting as much data as possible. I
 think it's most important to get the core functionality working really
 well, and convincing as large percentage of users as possible to enable
 reporting the statistics (to make the results - hopefully - accurately
 represent the user base). Please note that in some cases it may mean
 collecting _less_ data, or thinking more about the privacy of the users.
 
 For me, as a developer, even a list of packages sorted by popularity
 (aka Debian/Ubuntu popcon) would be very useful.
 
 Ah, and maybe files in /etc/portage: package.keywords and so on. It
 could be useful to see what people are masking/unmasking, that may be an
 indication of stale stabilizations or brokenness hitting the tree.
 Anyway, I'd call it an enhancement.
 
  Also, I'm starting work on the webUI, and would like some
  recommendations for stats pages, such as:
  
  * Packages installed sorted by users
 
 Cool!
 
  * Top arches, keywords, profiles
 
 And percentage of ~arch vs arch users?
 
  * Most enabled, disabled useflags per package/globally
 
 Also great, especially the per-package variant. It'd be also useful to
 have per-profile data, to better tune the profile defaults.
 
  [0]
  http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02
 
 I took a quick look at the code. Some random comments:
 
 - it uses portage Python API a lot. But it's not stable, or at least not
 guaranteed to be stable. Have you considered using helpers like portageq
 (or eventually enhancing those helpers)?
 
 - make the licensing super-clear (a LICENSE file, possibly some header
 in every source file, and so on)
 
 - how about submitting the data over HTTPS and not HTTP to better help
 privacy?

Fair points, thanks!

 
 - don't leave exception handling as a TODO; it should be a part of your
 design, not an afterthought
 
 - instead of or in addition to the setup.txt file, how about just
 writing the real setup.py file for distutils?
 

Yes, these are part of my sub-goals for next week.

-- 
Vikraman




Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-08 Thread Vikraman
On Wed, Jun 08, 2011 at 09:35:26PM +0400, Николай Антонов wrote:
 On 08.06.2011 18:36, Vikraman wrote:
  Hi everyone,
  
  I'm working on the `Package statistics` project this year. Till now, I
  have managed to write a client and server[0] to collect the following
  information from hosts:
  
  * Uname, portage profile, timestamp of portage tree
  * ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
  * ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
  * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size,
and Build time for each installed package
  
 
 Maybe collect hardware info and kernel configs too?
 For example cpuinfo, lspci and lsusb(?).

That's not part of package statistics. There's the smolt project for
hardware statistics.

 
 I think, that after 1-3 month after installing gentoo, user can(should)
 receive newsitem about participating in `Package statistics` project.
 This newsitem can contains short instruction how-to install and
 configure this tool. And even in other gentoo projects(for example write
 short wiki page)
 
 And, where can I found ebuilds to the `Package statistics` project?

The server hasn't been deployed yet, and ebuilds will be available soon!
 
 Sory for my english... and Good luck!
 

-- 
Vikraman




Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-08 Thread Donnie Berkholz
On 17:19 Wed 08 Jun , Paweł Hajdan, Jr. wrote:
 On 6/8/11 4:36 PM, Vikraman wrote:
  I'm working on the `Package statistics` project this year. Till now, I
  have managed to write a client and server[0] to collect the following
  information from hosts:
 
 Excellent, good luck with the idea! I think that better information
 about how Gentoo is actually used will greatly help improving it.
 
  Is there a need to collect files installed by a package ? Doesn't PFL[1]
  already provide that ?
 
 Well, PFL is not an official Gentoo project. It might be useful, but I
 wouldn't say it's a priority.

I would love to see it happen, but it's more important to roll out a 
minimal working solution now and add on later.

By combining installed files with USE flag settings, this project could 
actually attempt to factor out which USE flags result in which files in 
an automatic fashion. That would address one of the biggest objections 
many people have had to such a package-to-file search engine.
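
One naive way to attempt that factoring, sketched under the assumption that
the server holds, for a single package version, several reports of the form
(enabled USE flags, installed files). The function name and the report
shape are hypothetical, and correlated flags would need more care than
this:

```python
def files_for_flag(reports, flag):
    """Files seen in every report with `flag` enabled and in none without it.

    `reports` is a list of (enabled_flags, installed_files) pairs, all
    for the same package version.
    """
    with_flag = [set(files) for flags, files in reports if flag in flags]
    without_flag = [set(files) for flags, files in reports if flag not in flags]
    if not with_flag:
        return set()
    # Files common to every build that had the flag enabled ...
    common = set.intersection(*with_flag)
    # ... minus anything that also shows up when the flag is off.
    seen_without = set().union(*without_flag)
    return common - seen_without
```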

It would also be pretty useful for some other GSoC projects, like the 
ebuild generator and the auto dependency scanner.

-- 
Thanks,
Donnie

Donnie Berkholz
Sr. Developer, Gentoo Linux
Blog: http://dberkholz.com




Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-08 Thread Hans de Graaff
On Wed, 2011-06-08 at 23:31 +0530, Vikraman wrote:

  Excellent, good luck with the idea! I think that better information
  about how Gentoo is actually used will greatly help improving it.
  
 
 Well, that information cannot be collected automatically, can it ?

You could pop up a window at random times and ask the user. So it can be
done. Whether it's a good idea …

Hans




Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-08 Thread Francisco Blas Izquierdo Riera (klondike)
On 08/06/11 20:07, Vikraman wrote:
 On Wed, Jun 08, 2011 at 09:35:26PM +0400, Николай Антонов wrote:
 On 08.06.2011 18:36, Vikraman wrote:
 Hi everyone,

 I'm working on the `Package statistics` project this year. Till now, I
 have managed to write a client and server[0] to collect the following
 information from hosts:

 * Uname, portage profile, timestamp of portage tree
 * ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
 * ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
 * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size,
   and Build time for each installed package

 Maybe collect hardware info and kernel configs too?
 For example cpuinfo, lspci and lsusb(?).
 That's not part of package statistics. There's the smolt project for
 hardware statistics.
Well, there is another reason why you don't want to log that:
hardened users. Not having access to the kernel .config helps make
the system more resilient to some attacks; as a result, many hardened
users are adamant about not having their .config files published.





Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011

2011-06-08 Thread ross smith
 Maybe collect hardware info and kernel configs too?
 For example cpuinfo, lspci and lsusb(?).
 That's not part of package statistics. There's the smolt project for
 hardware statistics.
 Well there is another reason about why you don't want' to log that:
 Hardened users. Not having access to the kernel .config helps in making
 the system more resilient to some attacks, as a result many hardened
 users are very stubborn in not having the .config files published.

I would really like to see a nice way to choose what information is
sent. Perhaps a config file in /etc? Also, an option to see what
is being sent would be great. :)
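
Something along these lines might work; this is only a sketch, and the
`[send]` section, its option names and the preview helper are all
hypothetical rather than anything the client currently implements:

```python
import configparser
import json

# Hypothetical contents of a config file under /etc.
DEFAULT_CONFIG = """
[send]
packages = yes
useflags = yes
cflags = no
"""


def filter_payload(payload, config_text=DEFAULT_CONFIG):
    """Keep only the payload fields enabled in the [send] section.

    Fields that are not mentioned in the config are dropped.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {
        key: value
        for key, value in payload.items()
        if parser.getboolean("send", key, fallback=False)
    }


def preview(payload):
    """Render exactly what would be sent, for a dry-run style option."""
    return json.dumps(payload, indent=2, sort_keys=True)
```

Dropping unlisted fields by default keeps the client on the conservative
side: nothing is sent unless the user has explicitly enabled it.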

I look forward to contributing my machine's info.

-Ross