Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread hadley wickham
Hi Ian, I've spoken with Stefan Theussl (CRAN maintainer) about this, and he's concerned about the privacy implications of making the Apache access logs public. A compromise that he mentioned was having a script run on the CRAN mirror that processed the log files and output summary statistics.
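The compromise Hadley describes, a script on the mirror that reduces raw logs to summary statistics, could look roughly like the minimal Python sketch below. It assumes the logs are in Apache's common/combined log format and that package files follow a `<name>_<version>` naming pattern. The regular expressions, the `summarize` function and the sample log lines are hypothetical illustrations, not the CRAN mirror's actual setup. The design point is that client IPs are read but never written out; only per-package counts leave the server.

```python
import re
from collections import Counter

# Assumption: Apache common/combined log format, e.g.
# 1.2.3.4 - - [23/Nov/2009:09:48:00 +0000] "GET /src/contrib/zoo_1.5-8.tar.gz HTTP/1.1" 200 123456
# Only the request path and status are used; client IPs are never output.
REQUEST_RE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

# Hypothetical package-file pattern: <name>_<version>.(tar.gz|zip|tgz)
PACKAGE_RE = re.compile(r'/(?P<pkg>[A-Za-z][A-Za-z0-9.]*)_[^/]+\.(?:tar\.gz|zip|tgz)$')


def summarize(log_lines):
    """Return per-package download counts from raw log lines (no IPs kept)."""
    counts = Counter()
    for line in log_lines:
        m = REQUEST_RE.search(line)
        if not m or m.group("status") != "200":
            continue  # skip malformed lines and failed requests
        p = PACKAGE_RE.search(m.group("path"))
        if p:
            counts[p.group("pkg")] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        '1.2.3.4 - - [23/Nov/2009:09:48:00 +0000] "GET /src/contrib/zoo_1.5-8.tar.gz HTTP/1.1" 200 123456',
        '5.6.7.8 - - [23/Nov/2009:09:49:00 +0000] "GET /bin/windows/contrib/2.10/zoo_1.5-8.zip HTTP/1.1" 200 234567',
        '5.6.7.8 - - [23/Nov/2009:09:50:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
    ]
    for pkg, n in summarize(sample).most_common():
        print(pkg, n)  # prints: zoo 2
```

Only the aggregated `Counter` would be published; the raw lines stay on the mirror.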

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
Knowing what percentage of different OSes are being used is of interest to package developers and would be obscured by the proposal to massage the data. I prefer to see the raw figure as is. Also the number of IPs is important and should not be removed in my opinion since (1) it is a measure of
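One way to get the OS percentages Gabor mentions from the same logs is to infer the platform from the CRAN directory a file was requested from. The following is a minimal Python sketch under that assumption; the prefix table and `platform_shares` are hypothetical, and real mirrors may serve CRAN under an extra path prefix. Source tarballs cannot be attributed to any OS this way, so they get their own bucket.

```python
from collections import Counter

# Assumption: platform is inferred from the requested CRAN directory.
PLATFORM_PREFIXES = {
    "/bin/windows/": "Windows",
    "/bin/macosx/": "Mac OS X",
    "/src/contrib/": "source (any OS)",
}


def platform_shares(paths):
    """Return each platform's percentage of the recognized download paths."""
    counts = Counter()
    for path in paths:
        for prefix, platform in PLATFORM_PREFIXES.items():
            if path.startswith(prefix):
                counts[platform] += 1
                break  # each path belongs to at most one platform
    total = sum(counts.values())
    return {p: 100.0 * n / total for p, n in counts.items()} if total else {}
```

For example, one Windows binary and one source tarball give 50.0 for `"Windows"` and 50.0 for `"source (any OS)"`.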

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
On Mon, Nov 23, 2009 at 9:48 AM, hadley wickham h.wick...@gmail.com wrote: Knowing what percentage of different OSes are being used is of interest to package developers and would be obscured by the proposal to massage the data.  I prefer to see the raw figure as is. I agree.  I was arguing

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread hadley wickham
Knowing what percentage of different OSes are being used is of interest to package developers and would be obscured by the proposal to massage the data. I prefer to see the raw figure as is. I agree. I was arguing that sorting by that value wasn't very useful. Also the number of IPs is

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread hadley wickham
A few comments on your current site:
* Are you just including packages downloaded interactively from within R?
* I don't think the continent from which the package was downloaded is of much interest. There's definitely no need to include it on the main page.
* I'd be far more interested

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Jeff Ryan
While I think download statistics are potentially interesting for developers, done incorrectly, it could very likely damage the community. It is a basic data-reporting problem, with all of the caveats attached. This information has also been readily available from the main CRAN mirror for years:

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
On Mon, Nov 23, 2009 at 11:11 AM, Friedrich Leisch friedrich.lei...@stat.uni-muenchen.de wrote: On , Anonymous () wrote: Knowing what percentage of different OSes are being used is of interest to package developers and would be obscured by the proposal to massage the data. I prefer to

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
On Mon, Nov 23, 2009 at 12:09 PM, hadley wickham h.wick...@gmail.com wrote: As Hadley already pointed out we cannot make CRAN logs publicly available for privacy reasons. That would be a violation of national laws. I think that's unlikely.  There is no info given out identifying users.  

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread hadley wickham
As Hadley already pointed out we cannot make CRAN logs publicly available for privacy reasons. That would be a violation of national laws. I think that's unlikely.  There is no info given out identifying users.  There are lots of web stats on the net. Fritz and Stefan are concerned about

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
On Mon, Nov 23, 2009 at 12:15 PM, Friedrich Leisch friedrich.lei...@stat.uni-muenchen.de wrote: IP address plus time will always allow sysadmins to recover identities. For static addresses or in combination with mail headers etc., it is also not exactly rocket science for others. I had not

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
On Mon, Nov 23, 2009 at 12:37 PM, Friedrich Leisch friedrich.lei...@stat.uni-muenchen.de wrote: On Mon, Nov 23, 2009 at 12:15 PM, Friedrich Leisch friedrich.lei...@stat.uni-muenchen.de wrote: IP address plus time will always allow sysadmins to recover identities. For static addresses

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Fellows, Ian
Hi Ian, I've spoken with Stefan Theussl (cran maintainer) about this, and he's concerned about the privacy implications of making the apache access logs public. A compromise that he mentioned was having a script run on the cran mirror that processed

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread Gabor Grothendieck
On Mon, Nov 23, 2009 at 3:51 PM, Fellows, Ian ifell...@ucsd.edu wrote: 6. Regarding package dependencies, I was thinking about also counting the number of top level downloads, as approximated by the number of downloads where a reverse dependency was not downloaded in the next 5 min by the
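Ian's top-level heuristic can be sketched in Python as follows: a download of package P counts as top level unless one of P's reverse dependencies is downloaded within the following five minutes by the same client. The message is cut off before saying how a client is identified, so the `client` field is a hypothetical grouping key, and `reverse_deps` (a map from each package to the set of packages that depend on it) is assumed to be built elsewhere. `Download`, `top_level_counts` and the package name `pkgA` are illustrative only.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)


@dataclass(frozen=True)
class Download:
    client: str  # hypothetical grouping key (the original message is truncated here)
    package: str
    time: datetime


def top_level_counts(downloads, reverse_deps, window=WINDOW):
    """Count downloads not followed, within `window` and by the same client,
    by a download of one of the package's reverse dependencies."""
    by_client = defaultdict(list)
    for d in downloads:
        by_client[d.client].append(d)

    counts = Counter()
    for events in by_client.values():
        for d in events:
            rdeps = reverse_deps.get(d.package, set())
            # Was any reverse dependency fetched in [d.time, d.time + window]?
            followed = any(
                e.package in rdeps and d.time <= e.time <= d.time + window
                for e in events
            )
            if not followed:
                counts[d.package] += 1
    return counts


if __name__ == "__main__":
    t0 = datetime(2009, 11, 23, 10, 0)
    log = [
        Download("a", "zoo", t0),
        Download("a", "pkgA", t0 + timedelta(minutes=1)),  # hypothetical pkgA depends on zoo
        Download("b", "zoo", t0),
    ]
    print(dict(top_level_counts(log, {"zoo": {"pkgA"}})))  # → {'pkgA': 1, 'zoo': 1}
```

Here client `a`'s zoo download is attributed to pkgA, so zoo gets one top-level download out of two total downloads.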

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-23 Thread spencerg
Beyond what Gabor said, I might download a package that uses zoo, then use zoo directly in other contexts without ever downloading it directly. Total downloads would capture that; top level downloads would not. The flip side is that a package that requires zoo may only use it for features

[Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-22 Thread Fellows, Ian
Hi All, It seems that the question of how many people use (or download) R and its packages is one that comes up on a fairly regular basis in a variety of forums (there was also a recent thread on the subject on Stack Overflow). A couple of students at UCLA (including myself) wanted to address

Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

2009-11-22 Thread Detlef Steuer
Hi! Nice work! But keep in mind that, for example, the openSUSE packages are no longer kept up to date on CRAN but in openSUSE's Build Service. So the stats are biased towards Windows and Mac. It seems you only count binary downloads of contributed packages? That introduces some nice bias, too.