Re: [R] popular R packages

2009-03-11 Thread Jim Lemon

Christos Hatzis wrote:

Bioconductor already provides download stats for all packages...

http://bioconductor.org/packages/stats/bioc/affy.html
  

Maybe if we asked the Bioconductor people _really_ nicely

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] popular R packages

2009-03-10 Thread Jim Lemon

Gabor Grothendieck wrote:

R-Forge already has this, but I don't think it's used much.  R-Forge
does allow authors to opt out, which seems sensible lest it deter
potential authors from submitting packages.

I think objective quality metrics are better than ratings, e.g. does the
package have a vignette, has the package had a release within the last year,
does the package have a free software license, etc.  That would have
the advantage that authors might react to increase their package's
quality assessment, resulting in an overall improvement in quality on CRAN.
That would be more of a pro-active cycle, whereas ratings are reactive
and don't really encourage improvement.
  
I beg to offer an alternative assessment of quality. Do users download 
the package and find it useful? If so, they are likely to download it 
again when it is updated. Much as I appreciate the convenience of 
vignettes, regular updates and the absolute latest GPL license, a 
perfectly dud package can have all of these things. If a package is 
downloaded upon first release and not much thereafter, the maintainer 
might be motivated to attend to its shortcomings of utility rather than 
incrementing the version number every month or so. Downloads, as many 
have pointed out, are not a direct assessment of quality, but if I saw a 
package that just kept getting downloaded, version after version, I 
would be much more likely to check it out myself and perhaps even write 
a review for Hadley's neat site. Which I will try to do tonight.


Jim



Re: [R] popular R packages

2009-03-10 Thread Gabor Grothendieck
On Tue, Mar 10, 2009 at 6:14 AM, Jim Lemon j...@bitwrit.com.au wrote:
 Gabor Grothendieck wrote:

 R-Forge already has this, but I don't think it's used much.  R-Forge
 does allow authors to opt out, which seems sensible lest it deter
 potential authors from submitting packages.

 I think objective quality metrics are better than ratings, e.g. does the
 package have a vignette, has the package had a release within the last year,
 does the package have a free software license, etc.  That would have
 the advantage that authors might react to increase their package's
 quality assessment, resulting in an overall improvement in quality on CRAN.
 That would be more of a pro-active cycle, whereas ratings are reactive
 and don't really encourage improvement.


 I beg to offer an alternative assessment of quality. Do users download the
 package and find it useful? If so, they are likely to download it again when
 it is updated.

I was referring to motivating authors, not users, so that CRAN improves.

 Much as I appreciate the convenience of vignettes, regular
 updates and the absolute latest GPL license, a perfectly dud package can
 have all of these things. If a package is downloaded upon first release and

These are nothing but the usual FUD against quality improvement, i.e. that the
quality metrics are not measuring what you want, but the fact is that
quality metrics can work and have had huge successes.  Also I think
objective measures would be more accepted by authors than ratings.
No one is going to be put off that their package has no vignette when
obviously it doesn't and the authors are free to add one and instantly
improve their package's rating.

 not much thereafter, the maintainer might be motivated to attend to its
 shortcomings of utility rather than incrementing the version number every
 month or so. Downloads, as many have pointed out, are not a direct
 assessment of quality, but if I saw a package that just kept getting
 downloaded, version after version, I would be much more likely to check it
 out myself and perhaps even write a review for Hadley's neat site. Which I
 will try to do tonight.

I was arguing for objective metrics rather than ratings. Downloading is not
a rating but is objective although there are measurement problems as has
been pointed out.  Also, the worst feature is that it does not react to changes
in quality very quickly making it anti-motivating.



Re: [R] popular R packages

2009-03-10 Thread Frank E Harrell Jr


Gabor I think your approach will have more payoff in the long run.  I 
would suggest one other metric: the number of lines of code in the 
'examples' section of all the package's help files.
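
A minimal sketch of how such a count might be computed, assuming the example code of each help file is already available as text (for an installed package it could plausibly be extracted with tools::Rd_db() and tools::Rd2ex(), which is not shown here). The helper name count_code_lines and the sample strings are hypothetical.

```r
# Count lines of example code, ignoring blank lines and pure comment lines.
count_code_lines <- function(example_text) {
  lines <- unlist(strsplit(example_text, "\n", fixed = TRUE))
  lines <- trimws(lines)
  sum(nzchar(lines) & !startsWith(lines, "#"))
}

# Hypothetical 'examples' sections from two help files of one package.
examples <- c(
  "x <- rnorm(10)\n# summarise\nmean(x)\n",
  "\nfit <- lm(dist ~ speed, data = cars)\nsummary(fit)\n"
)

# Package-level metric: total example lines over all help files.
total_example_lines <- sum(vapply(examples, count_code_lines, integer(1)))
print(total_example_lines)
```

Excluding comments and blank lines is a design choice; counting them would reward documentation inside examples but would also be easier to inflate.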


Frank
--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] popular R packages

2009-03-10 Thread Max Kuhn
If it is easy to get the download numbers, we should do it and deal with
the interpretation issues. I'd like to know the numbers so I can
understand which (of my) packages have the most usage.

One other complication about # downloads: I suspect that a package
being on the depends/suggests/imports list of another package might be
a big driver of how many times it was downloaded.
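
One rough way to look at this effect, sketched under invented data: for each package, add up the downloads of the packages that list it as a dependency, and treat that sum as an upper bound on dependency-driven downloads. The package names, the dependency map and the counts below are all hypothetical.

```r
# Hypothetical dependency map: each package and the packages it pulls in.
deps <- list(
  pkgA    = c("pkgCore"),
  pkgB    = c("pkgCore", "pkgUtil"),
  pkgCore = character(0),
  pkgUtil = character(0)
)

# Hypothetical raw download counts.
downloads <- c(pkgA = 300, pkgB = 200, pkgCore = 550, pkgUtil = 220)

# Upper bound on downloads that could have been triggered by installing
# one of the package's reverse dependencies.
induced <- sapply(names(downloads), function(p) {
  rev_deps <- names(deps)[vapply(deps, function(d) p %in% d, logical(1))]
  sum(downloads[rev_deps])
})

# Crude lower bound on "direct" interest: raw count minus induced count.
direct_floor <- pmax(downloads - induced, 0)
print(rbind(downloads, induced, direct_floor))
```

In this toy data pkgCore has the highest raw count, yet most of it could be explained by pkgA and pkgB pulling it in.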

If I remember correctly, about 5 years ago Bioconductor asked for
volunteers to review packages to get detailed, specific feedback by
people who use the package (and should be fairly R proficient). I
think that this is pretty important and something like Crantastic is a
good interface. I personally got a lot out of the comments that a JSS
reviewer had for a package.

-- 

Max



Re: [R] popular R packages

2009-03-10 Thread Dylan Beaudette
On Tuesday 10 March 2009, Frank E Harrell Jr wrote:
 Gabor I think your approach will have more payoff in the long run.  I
 would suggest one other metric: the number of lines of code in the
 'examples' section of all the package's help files.

 Frank

Absolutely. From the perspective of a user, not an expert, packages with a 
good vignette and lots of examples are by far my favorite and most used.

Dylan

-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341



Re: [R] popular R packages

2009-03-10 Thread Ajay ohri
Pricing each download at 99 cents (the same as a song from iTunes) can
measure users more accurately.
That's my 2 cents anyways.



Re: [R] popular R packages

2009-03-10 Thread Christos Hatzis
Bioconductor already provides download stats for all packages...

http://bioconductor.org/packages/stats/bioc/affy.html

-Christos 



Re: [R] popular R packages

2009-03-10 Thread Tom Backer Johnsen

s...@xlsolutions-corp.com wrote:

 Hi Spencer,

 XLSolutions is currently analyzing r-help archived questions to rank
packages for the upcoming R-PLUS 3.3 Professional version and we will be
happy to share the outcome with interested parties. Please email
d...@xlsolutions-corp.com


I would expect the correlation between popularity on the one hand 
and usefulness and quality on the other to be relatively low.  If it were 
possible to rate the downloaders with respect to seriousness and whether 
they actually use the package for some sensible purpose, I would be more 
interested.  Consider a highly specialized, good-quality package used 
by a relatively small group of distinguished researchers.  Would that 
have a high rank?  No.  But important?  Possibly yes.


Tom



 Regards -
 Sue Turner
 Senior Account Manager
 XLSolutions Corporation
 North American Division
 1700 7th Ave
 Suite 2100
 Seattle, WA 98101
 Phone: 206-686-1578
 Email: s...@xlsolutions-corp.com
 web: www.xlsolutions-corp.com



--- On Sat, 3/7/09, Spencer Graves spencer.gra...@prodsyse.com wrote:


From: Spencer Graves spencer.gra...@prodsyse.com
Subject: Re: [R] popular R packages
To: Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no
Cc: r-help@r-project.org, Jeroen Ooms j.c.l.o...@uu.nl, Thomas Adams 
thomas.ad...@noaa.gov
Date: Saturday, March 7, 2009, 5:22 PM
I just did RSiteSearch("library(xxx)") with xxx =
the names of 6 packages familiar to me, with the following
numbers of hits:

hits  package
 169  lme4
 165  nlme
   6  fda
   4  maps
   2  FinTS
   2  DierckxSpline

Software could be written to (1) extract the names of
current packages from CRAN then (2) perform queries similar
to this on all such packages and summarize the results.  I
don't have the time now to write code for this, but
I've written similar code before for step (1);  it can
be found in scripts/TsayFiles.R in the
FinTS package on CRAN.  For step (2), Sundar
Dorai-Raj wrote code that is included in the preliminary
RSiteSearch package available from R-Forge via
install.packages("RSiteSearch", repos="http://r-forge.r-project.org").

Code to do this could probably be written (a) in a
matter of seconds by many of those in the R Core team or (b)
in a matter of hours by virtually any reader of this list
using the examples I just cited.  And it could provide
numbers without a need to convince others to keep download
statistics and make them available later.

Hope this helps.  Spencer Graves

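
The two steps described above can be sketched offline as follows. This is an illustration under stated assumptions, not the code from FinTS or the RSiteSearch package: the package list is a fixed vector (a live version might use rownames(available.packages()) instead), the corpus is a handful of invented messages standing in for the list archive, and count_hits is a hypothetical helper.

```r
# Step (1): package names. Fixed here so the sketch runs without a network.
packages <- c("lme4", "nlme", "maps")

# Step (2): a stand-in corpus; real input would be archived list messages.
messages <- c(
  "library(lme4); fit <- lmer(y ~ x + (1 | g), data = d)",
  "I used library(nlme) and then library(lme4) to compare fits",
  "require(maps) works too, but is not matched by this pattern",
  "no package loaded here"
)

# Number of messages that contain the literal text library(<pkg>).
count_hits <- function(pkg, corpus) {
  pattern <- paste0("library(", pkg, ")")
  sum(grepl(pattern, corpus, fixed = TRUE))
}

hits <- vapply(packages, count_hits, integer(1), corpus = messages)
hits_sorted <- sort(hits, decreasing = TRUE)
print(hits_sorted)
```

Matching only library(pkg) misses require(pkg) and pkg::fun usage, so counts of this kind are at best a lower bound on mentions.
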
Wacek Kusnierczyk wrote:

i have kept r installed on more than ten computers during the past few
years, some of them running win + more than one linux distro, all of
them having r, most often installed from a separate download.

i know of many cases where students download r for the purpose of a
course in statistics -- often an introductory course for students who
otherwise have little to do with stats. some of them do it more than
once during the semester, and many of them never use r again.

taking into account that basic statistics courses are taught to most
university students and that r is surely the most popular free
statistical computing environment, download-based usage estimates may be
a bit optimistic, unless 'usage' is taken to include 'learn-pass-forget'.

vQ



Tal Galili wrote:

I agree with Thomas, over the years I have installed R on at least 5
computers.

BTW: does anyone know how the website statistics of r-project are
being analyzed?
Since I can't see any google analytics or other tracking code in the main
website, I am guessing someone might be running some log-file analyzer - but
I'd rather hear that than assume.






On Sun, Mar 8, 2009 at 12:45 AM, Thomas Adams thomas.ad...@noaa.gov wrote:

I don't think "At least one of the participants in the 2004 thread
suggested that it would be a good thing to track the numbers of downloads
by package." is reasonable because I download R packages for 2 home
computers (laptop & desktop) and 2 at work (1 Linux & 1 Mac). There must be
many such cases…

Tom

  




--
| Tom Backer Johnsen

Re: [R] popular R packages

2009-03-09 Thread David Duffy
Given we are talking about statistical software, one bibliometric measure 
of relative package popularity is scientific citations.  Web of Science is 
not too useful where the citation has been to a website or computer 
package, but Google Scholar for "lme4: Linear mixed-effects models using 
S4 classes" gives us 108 journal citations; "mgcv: GAMs and generalized 
ridge regression for R", 80; etc.



Cheers, David Duffy.
--
| David Duffy (MBBS PhD) ,-_|\
| email: dav...@qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  / *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v



Re: [R] popular R packages

2009-03-09 Thread Ted Harding
On 10-Mar-09 01:07:54, David Duffy wrote:
 Given we are talking about statistical software, one bibliometric
 measure of relative package popularity is scientific citations.
 Web of Science is not too useful where the citation has been to a
 website or computer package, but Google Scholar for "lme4: Linear
 mixed-effects models using S4 classes" gives us 108 journal
 citations; "mgcv: GAMs and generalized ridge regression for R", 80; etc.
 
 Cheers, David Duffy.

A good point. But such numbers must be considered in the context
of the prevalence of the kind of study for which the respective
methods would be used.

A great number of epidemiological studies would be suitable for
application of glm(). Fewer would involve GAMs. Popularity of
a package by citation frequency would (other things being equal)
be proportional to the frequency of the kind of study for which
it could be used.

So one should either evaluate, among the studies in which an R package
*could* be used, the proportion in which it *was* used; or compare
the number of citations of an R package with the number of citations
of an equivalent package/module/proc in other software.
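
A small numerical sketch of the first option. The citation counts are the two quoted above; the numbers of eligible studies are invented purely to show how normalisation can reverse a ranking, and should not be read as estimates.

```r
# Citation counts quoted earlier in this thread (Google Scholar, March 2009).
citations <- c(lme4 = 108, mgcv = 80)

# HYPOTHETICAL numbers of published studies in which each kind of method
# could have been used. These figures are made up for illustration.
eligible <- c(lme4 = 5400, mgcv = 1600)

# Proportion of eligible studies that actually cite the package.
share <- citations / eligible

ranking_raw   <- names(sort(citations, decreasing = TRUE))
ranking_share <- names(sort(share, decreasing = TRUE))

print(share)
print(ranking_raw)
print(ranking_share)
```

With these made-up denominators the package with fewer raw citations has the larger share, which is the point of adjusting for prevalence.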

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 10-Mar-09   Time: 02:03:22
-- XFMail --



Re: [R] popular R packages

2009-03-09 Thread Matthew Keller
Hi all,

Put me in the camp that says more information is better than less
information - even if imperfect. Interpretation can be left to those
using the data.

Also, "popular" can mean many things. An alternative to the number of
times a package is downloaded would be a ratings system, where R users
can supply starred ratings or something (much as they do on Netflix or
Amazon). Combining this with # of downloads would give users some idea
about the users' perception, impact, and possibly the quality of a
package. Obviously it would be imperfect, but it seems to me this
would be better than the even more scant and more imperfect
information currently available. I wouldn't advocate that such
information be used in the same way a citation index is, but it might
prove helpful to users who are confused (even paralyzed) by the ever
burgeoning number of R packages.

There was a discussion on this a while back in which Bill Venables
said: "To me a much more urgent initiative [than rating responders on
R listserves] is some kind of user online review system for packages,
even something as simple as that used by Amazon.com has for customer
review of books. I think the need for this is rather urgent, in fact.
Most packages are very good, but I regret to say some are pretty
inefficient and others downright dangerous.  You don't want to
discourage people from submitting their work to CRAN, but at the same
time you do want some mechanism that allows users to relate their
experience with it, good or bad."

Find the whole thread here:
https://stat.ethz.ch/pipermail/r-help/2007-December/147323.html.

Matt


-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com



Re: [R] popular R packages

2009-03-09 Thread hadley wickham
 There was a discussion on this a while back in which Bill Venables
 said: To me a much more urgent initiative [than rating responders on
 R listserves] is some kind of user online review system for packages,
 even something as simple as that used by Amazon.com has for customer
 review of books. I think the need for this is rather urgent, in fact.
 Most packages are very good, but I regret to say some are pretty
 inefficient and others downright dangerous.  You don't want to
 discourage people from submitting their work to CRAN, but at the same
 time you do want some mechanism that allows users to relate their
 experience with it, good or bad.

And you can see my initial attempts at this at http://crantastic.org.
Unfortunately I haven't had much time to work on it, and haven't had
much luck recruiting helpers.

Hadley

-- 
http://had.co.nz/



Re: [R] popular R packages

2009-03-09 Thread Gabor Grothendieck
R-Forge already has this, but I don't think it's used much.  R-Forge
does allow authors to opt out, which seems sensible lest it deter
potential authors from submitting packages.

I think objective quality metrics are better than ratings, e.g. does the
package have a vignette, has the package had a release within the last year,
does the package have a free software license, etc.  That would have
the advantage that authors might react to increase their package's
quality assessment, resulting in an overall improvement in quality on CRAN.
That would be more of a pro-active cycle, whereas ratings are reactive
and don't really encourage improvement.
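
As a rough illustration of what such a checklist could look like, here is a sketch over invented metadata. The data frame, its column names and the license pattern are assumptions; a real version would have to read each package's DESCRIPTION file and check for vignettes.

```r
# Hypothetical package metadata.
pkgs <- data.frame(
  package      = c("pkgA", "pkgB", "pkgC"),
  has_vignette = c(TRUE, FALSE, TRUE),
  last_release = as.Date(c("2009-02-01", "2007-06-15", "2008-11-20")),
  license      = c("GPL-2", "file LICENSE", "GPL (>= 2)"),
  stringsAsFactors = FALSE
)

# Reference date fixed so the result is reproducible.
as_of <- as.Date("2009-03-09")

# Crude pattern match on the license field, not a legal judgment.
free_license   <- grepl("GPL|LGPL|BSD|MIT|Artistic", pkgs$license)
recent_release <- as.numeric(as_of - pkgs$last_release) <= 365

# One point per criterion met; 3 is the maximum.
pkgs$score <- pkgs$has_vignette + recent_release + free_license
print(pkgs[, c("package", "score")])
```

Each criterion is something an author can change directly, so the score responds as soon as a vignette or a new release appears.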



Re: [R] popular R packages

2009-03-08 Thread Emmanuel Charpentier
On Sat, 07 Mar 2009 18:04:24 -0500, David Winsemius wrote :

[ Snip ... ]
 Nonetheless, I do think the relative numbers of package downloads might
 be interpretable, or at the very least, the basis for discussions over
 beer.

*Anything* might be the basis for discussions over beer (obvious 
corollary to Thermogoddamics' second principle).

More seriously : I don't think relative numbers of package downloads can 
be interpreted in any reasonable way, because reasons for package 
download have a very wide range from curiosity (what's this ?), fun 
(think fortunes...), to vital need (think lme4 if/when a consensus on 
denominator DFs can be reached :-)...). What can you infer in good faith 
from such a mess ?

Emmanuel Charpentier



Re: [R] popular R packages

2009-03-08 Thread hadley wickham
 More seriously : I don't think relative numbers of package downloads can
 be interpreted in any reasonable way, because reasons for package
 download have a very wide range from curiosity (what's this ?), fun
 (think fortunes...), to vital need (think lme4 if/when a consensus on
 denominator DFs can be reached :-)...). What can you infer in good faith
 from such a mess ?

So when we have messy data with measurement error, we should just give
up?  Doesn't sound very statistical! ;)

Hadley


-- 
http://had.co.nz/



Re: [R] popular R packages

2009-03-08 Thread Gabor Grothendieck
On Sun, Mar 8, 2009 at 10:49 AM, hadley wickham h.wick...@gmail.com wrote:
 More seriously : I don't think relative numbers of package downloads can
 be interpreted in any reasonable way, because reasons for package
 download have a very wide range from curiosity (what's this ?), fun
 (think fortunes...), to vital need (think lme4 if/when a consensus on
 denominator DFs can be reached :-)...). What can you infer in good faith
 from such a mess ?

 So when we have messy data with measurement error, we should just give
 up?  Doesn't sound very statistical! ;)


Also I would think that the rankings would be meaningful since
the factors that cause the absolute numbers to be off would affect
all packages equally.
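
A quick check of this argument on invented numbers: if every count is inflated by the same factor, the ranking is unchanged, whereas a package-specific factor can reorder it. The user counts and inflation factors below are hypothetical.

```r
# Hypothetical "true" user counts for four packages.
true_users <- c(pkgA = 120, pkgB = 45, pkgC = 300, pkgD = 80)

# Assume each user downloads every package on the same number of machines,
# so all counts are inflated by one common factor (here 2.5).
observed <- true_users * 2.5
same_ranking <- identical(rank(true_users), rank(observed))

# If the inflation differs by package, the ranking is no longer guaranteed.
uneven <- true_users * c(1, 4, 1, 1)
ranking_changed <- !identical(rank(true_users), rank(uneven))

print(same_ranking)
print(ranking_changed)
```

Whether the common-factor assumption holds is an empirical question; dependency-driven downloads, for example, would inflate some packages more than others.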



Re: [R] popular R packages

2009-03-08 Thread Duncan Murdoch

On 08/03/2009 10:49 AM, hadley wickham wrote:

More seriously : I don't think relative numbers of package downloads can
be interpreted in any reasonable way, because reasons for package
download have a very wide range from curiosity (what's this ?), fun
(think fortunes...), to vital need (think lme4 if/when a consensus on
denominator DFs can be reached :-)...). What can you infer in good faith
from such a mess ?


So when we have messy data with measurement error, we should just give
up?  Doesn't sound very statistical! ;)


I think the situation is worse than messy.  If a client comes in with 
data that doesn't address the question they're interested in, I think 
they are better served to be told that, than to be given an answer that 
is not actually valid.  They should also be told how to design a study 
that actually does address their question.


You (and others) have mentioned Google Analytics as a possible way to 
address the quality of data; that's helpful.  But analyzing bad data 
will just give bad conclusions.


Duncan Murdoch



Re: [R] popular R packages

2009-03-08 Thread Barry Rowlingson
 I think the situation is worse than messy.  If a client comes in with data
 that doesn't address the question they're interested in, I think they are
 better served to be told that, than to be given an answer that is not
 actually valid.  They should also be told how to design a study that
 actually does address their question.

 You (and others) have mentioned Google Analytics as a possible way to
 address the quality of data; that's helpful.  But analyzing bad data will
 just give bad conclusions.

 As long as we say 'package Foo is the most downloaded package on
CRAN', and not 'package Foo is the most used package for R', we can
leave it to the user to decide if the latter conclusion follows from
the former. In the absence of actual usage data I would think it a
good approximation. Not that I would risk my life on it.

 Pop music charts are now based on download counts, but I wouldn't
believe they represent the songs that are listened to the most times.
Nor would I go so far as to believe they represent the quality of the
songs...

 Should R have a 'Would you like to tell CRAN every time you do
library(foo) so we can do usage counts (no personal data is
transmitted blah blah) ?'? I don't think so

Barry



Re: [R] popular R packages

2009-03-08 Thread Ted Harding
On 08-Mar-09 15:14:03, Duncan Murdoch wrote:
 On 08/03/2009 10:49 AM, hadley wickham wrote:
 More seriously : I don't think relative numbers of package downloads
 can be interpreted in any reasonable way, because reasons for
 package download have a very wide range from curiosity (what's
 this ?), fun (think fortunes...), to vital need (think lme4
 if/when a consensus on denominator DFs can be reached :-)...).
 What can you infer in good faith from such a mess ?
 
 So when we have messy data with measurement error, we should just
 give up?  Doesn't sound very statistical! ;)
 
 I think the situation is worse than messy.  If a client comes in with 
 data that doesn't address the question they're interested in, I think 
 they are better served to be told that, than to be given an answer that
 is not actually valid.  They should also be told how to design a study 
 that actually does address their question.
 
 You (and others) have mentioned Google Analytics as a possible way to 
 address the quality of data; that's helpful.  But analyzing bad data 
 will just give bad conclusions.
 Duncan Murdoch

The population of R users (which we would need to sample in order
to obtain good data) is probably more elusive than a fish population
in the ocean -- only partially visible at best, and with an unknown
proportion invisible.

At least in Fisheries research, there are long established capture
techniques (from trawling to netting to electro-fishing to ... )
which can be deployed, for research purposes, in such a way as to
potentially reach all members of a target population, with at least
a moderately good approximation to random sampling. What have we
for R?

Come to think of it, electro-fishing, ...

Suppose R were released with 2 types of cookie embedded in base R.
Each type is randomly configured, when R is first run, to be Active
or Inactive (probability of activation to be decided at the design
stage ... ). Type 1, if active, on a certain date generates an
event which brings it to the notice of R-Core (e.g. by clandestine
email or by inducing a bug report). Type 2 acts similarly on a later
date. If Type 2 acts, it carries with it information as to whether
there was a Type 1 action along with whether, apparently, the Type 1
action succeeded.

We then have, in effect, an analogue of the Mark-Recapture technique
of population estimation (along with the usual questions about
equal catchability and so forth).
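
For readers unfamiliar with the technique, here is a minimal sketch of the simplest mark-recapture estimators (Lincoln-Petersen and Chapman's variant). The mapping of the two report types onto n1, n2 and m is an assumption made for illustration, and the numbers are invented; it also takes equal catchability for granted, which is exactly the doubtful part.

```r
# Lincoln-Petersen estimate of population size from two "capture" occasions.
# n1: installations that reported on the first date (the "marked" animals)
# n2: installations that reported on the second date
# m:  second-date reports that also carried evidence of a first-date report
lincoln_petersen <- function(n1, n2, m) {
  stopifnot(m > 0, m <= min(n1, n2))
  n1 * n2 / m
}

# Chapman's variant, which is less biased when m is small
chapman <- function(n1, n2, m) {
  (n1 + 1) * (n2 + 1) / (m + 1) - 1
}

# Made-up numbers, purely for illustration
n_hat <- lincoln_petersen(n1 = 500, n2 = 400, m = 20)
n_hat_chapman <- chapman(n1 = 500, n2 = 400, m = 20)
```

The fewer second-date reports that show a first-date mark, the larger the estimated population.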

However, since this sort of thing (which I am not proposing seriously,
only for the sake of argument) is undoubtedly unethical (and would
do R's reputation no good if it came to light), I tentatively conclude
that the population of R users is likely to remain as elusive as ever.

Best wishes to all,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 08-Mar-09   Time: 16:11:44
-- XFMail --



Re: [R] popular R packages

2009-03-08 Thread Tal Galili
Hi Ted,

Thinking along the lines you suggest, another idea came to mind:
The next time a major release is made (there is one scheduled quite soon
actually), the core team could add a survey to the download page of the
R base package, asking just one question:
"Please click here if this is the first computer you are downloading this
package for."
This, combined with the fact that when serving a user we can obtain their IP
address (which gives geographic information), could give a pretty nice rough
estimate of how many major-release downloaders the R community has.



Tal










Re: [R] popular R packages

2009-03-08 Thread Spencer Graves
 Is this another discussion of what data might be collected and 
analyzed, and what could and could not be said if we only had such data? 

 Has anyone but me produced any actual data?  If so, I missed it.  
Hadley mentioned the 'fortunes' package.  My earlier methodology, 
RSiteSearch('library(fortunes)'), produced 40 hits for 'fortunes', 
compared to 169 for 'lme4' and 2 for 'DierckxSpline'. 
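
To put the three hit counts quoted above side by side, here is a small sketch. Only the numbers come from the searches reported in this message; the variable names are illustrative. The searches themselves are not re-run here, since RSiteSearch() opens a browser and its results change over time.

```r
# Hit counts reported above for RSiteSearch('library(<pkg>)')
hits <- c(fortunes = 40, lme4 = 169, DierckxSpline = 2)

# Order from most to least mentioned
sorted_hits <- sort(hits, decreasing = TRUE)

# Express each count relative to the most-mentioned package
relative_to_lme4 <- round(hits / hits[["lme4"]], 3)

print(sorted_hits)
print(relative_to_lme4)
```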

 With anything like this, it would be wise to approach the problem 
from many different perspectives, recognizing that the strengths of one 
approach can help improve our understanding of what other analyses say 
about the question at hand. 

 Happy Sunday. 
 Spencer Graves




Re: [R] popular R packages

2009-03-08 Thread Duncan Murdoch

On 08/03/2009 12:08 PM, Barry Rowlingson wrote:

I think the situation is worse than messy.  If a client comes in with data
that doesn't address the question they're interested in, I think they are
better served to be told that, than to be given an answer that is not
actually valid.  They should also be told how to design a study that
actually does address their question.

You (and others) have mentioned Google Analytics as a possible way to
address the quality of data; that's helpful.  But analyzing bad data will
just give bad conclusions.


 As long as we say 'package Foo is the most downloaded package on
CRAN', and not 'package Foo is the most used package for R', we can
leave it to the user to decide if the latter conclusion follows from
the former.


But we don't even have that data, since CRAN is distributed across lots 
of mirrors.


Duncan Murdoch

 In the absence of actual usage data I would think it a
good approximation. Not that I would risk my life on it.

 Pop music charts are now based on download counts, but I wouldn't
believe they represent the songs that are listened to the most times.
Nor would I go so far as to believe they represent the quality of the
songs...

 Should R have a 'Would you like to tell CRAN every time you do
library(foo) so we can do usage counts (no personal data is
transmitted blah blah) ?'? I don't think so

Barry




Re: [R] popular R packages

2009-03-08 Thread Emmanuel Charpentier
Dear Barry,

As far as I understand, you're telling us that a bit of data mining
does no harm, whatever the data. Your example of pop music charts
might support your point (although my ears disagree ...) but I think it
is bad policy to indulge in white-noise analysis without a well-reasoned
motive to do so. It might give bad ideas to potential statistics
patrons (think a bit about the sorry state of financial markets :-().

More generally, I tend to be extremely wary of over-interpreting
belly grumbles as the Voice of the Spirit ... which is a very powerful
urge among many statisticians and statisticians' clients. Data mining can
be fine for exploratory musings, but a serious study needs a model, i.e.
a set of ideas and a way to reality-stress them.

As far as I can see (but I might be nearsighted), I see no model linking
package download to package use(s). Data may or may not become available
with more or less of an effort, but I can't see the point.

Emmanuel Charpentier

On Sunday 08 March 2009 at 16:08, Barry Rowlingson wrote:
  I think the situation is worse than messy.  If a client comes in with data
  that doesn't address the question they're interested in, I think they are
  better served to be told that, than to be given an answer that is not
  actually valid.  They should also be told how to design a study that
  actually does address their question.
 
  You (and others) have mentioned Google Analytics as a possible way to
  address the quality of data; that's helpful.  But analyzing bad data will
  just give bad conclusions.
 
  As long as we say 'package Foo is the most downloaded package on
 CRAN', and not 'package Foo is the most used package for R', we can
 leave it to the user to decide if the latter conclusion follows from
 the former. In the absence of actual usage data I would think it a
 good approximation. Not that I would risk my life on it.
 
  Pop music charts are now based on download counts, but I wouldn't
 believe they represent the songs that are listened to the most times.
 Nor would I go so far as to believe they represent the quality of the
 songs...
 
  Should R have a 'Would you like to tell CRAN every time you do
 library(foo) so we can do usage counts (no personal data is
 transmitted blah blah) ?'? I don't think so
 
 Barry



Re: [R] popular R packages

2009-03-08 Thread Dirk Eddelbuettel

On 8 March 2009 at 13:27, Duncan Murdoch wrote:
| But we don't even have that data, since CRAN is distributed across lots 
| of mirrors.

On 8 March 2009 at 19:01, Emmanuel Charpentier wrote:
| As far as I can see (but I might be nearsighted), I see no model linking
| package download to package use(s). Data may or may not become available

Which is why Debian (and Ubuntu) use the _opt-in package_ popularity-contest
that collects data on packages used and submits that to a host collecting the
data.  This drives the so-called 'popcon' statistics.

Yes, and there are many ways in which one can criticise this data collection
process.   But I fail to see how __not having any data__ leads to more
informed decisions.

Once you have data, you have an option of using or discarding it. But if you
have no data, you have no option.  How is that better?

Dirk

-- 
Three out of two people have difficulties with fractions.



Re: [R] popular R packages

2009-03-08 Thread Jeffrey Horner

Dirk Eddelbuettel wrote:

On 8 March 2009 at 13:27, Duncan Murdoch wrote:
| But we don't even have that data, since CRAN is distributed across lots 
| of mirrors.


On 8 March 2009 at 19:01, Emmanuel Charpentier wrote:
| As far as I can see (but I might be nearsighted), I see no model linking
| package download to package use(s). Data may or may not become available

Which is why Debian (and Ubuntu) use the _opt-in package_ popularity-contest
that collects data on packages used and submits that to a host collecting the
data.  This drives the so-called 'popcon' statistics.

Yes, and there are many ways in which one can criticise this data collection
process.   But I fail to see how __not having any data__ leads to more
informed decisions.

Once you have data, you have an option of using or discarding it. But if you
have no data, you have no option.  How is that better?


I've also created a package named PopCon here:

http://biostat.mc.vanderbilt.edu/twiki/pub/Main/JeffreyHorner/PopCon_0.1.tar.gz

I provided it to the list many months ago and got no response on its 
implementation or use. I encourage anyone to download it and understand 
how it can be used to implement a popularity contest for both packages 
and even functions and such.


Maybe R can sponsor a Popularity Contest day where everyone is 
encouraged to download the package and push some data to r-project.org 
or even crantastic.org that notes what useRs currently have loaded on 
their search path...
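
As a rough idea of what such a report could contain, here is a minimal sketch that lists the packages attached on the current search path together with their versions. This is not the PopCon package's implementation; the helper name attached_packages() is made up for this example, and nothing is sent anywhere.

```r
# List the packages currently attached on the search path.
# (Hypothetical helper for illustration; not part of PopCon.)
attached_packages <- function() {
  entries <- search()
  pkgs <- entries[startsWith(entries, "package:")]
  sub("^package:", "", pkgs)
}

# Build a small report: one row per attached package, with its version
report <- data.frame(package = attached_packages(), stringsAsFactors = FALSE)
report$version <- vapply(
  report$package,
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)

print(report)
```

Packages that are loaded via a namespace but not attached would need loadedNamespaces() instead of search().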


Best,


Jeff
--
http://biostat.mc.vanderbilt.edu/JeffreyHorner



Re: [R] popular R packages

2009-03-08 Thread Rolf Turner


On 9/03/2009, at 4:14 AM, Duncan Murdoch wrote:


... analyzing bad data will just give bad conclusions.


Fortune?

cheers,

Rolf Turner



Re: [R] popular R packages

2009-03-08 Thread Wacek Kusnierczyk
Rolf Turner wrote:

 On 9/03/2009, at 4:14 AM, Duncan Murdoch wrote:

 ... analyzing bad data will just give bad conclusions.

 Fortune?


looking for fortunes?  got one for you:

"A key reason that R is a good thing is because it is a language"

who/where is left as an (easy) exercise.

vQ



Re: [R] popular R packages

2009-03-08 Thread Rolf Turner


On 9/03/2009, at 10:23 AM, John Fox wrote:


Dear Rolf,

Tukey put it nicely: "The combination of some data and an aching desire for
an answer does not ensure that a reasonable answer can be extracted from a
given body of data." Inasmuch as there are no current fortunes from Tukey, I
nominate this one.


Indeed.  That is one of my favourites.  I second the nomination.

cheers,

Rolf



Re: [R] popular R packages

2009-03-08 Thread Emmanuel Charpentier
On Sunday 08 March 2009 at 13:22 -0500, Dirk Eddelbuettel wrote:
 On 8 March 2009 at 13:27, Duncan Murdoch wrote:
 | But we don't even have that data, since CRAN is distributed across lots 
 | of mirrors.
 
 On 8 March 2009 at 19:01, Emmanuel Charpentier wrote:
 | As far as I can see (but I might be nearsighted), I see no model linking
 | package download to package use(s). Data may or may not become available
 
 Which is why Debian (and Ubuntu) use the _opt-in package_ popularity-contest
 that collects data on packages used and submits that to a host collecting the
 data.  This drives the so-called 'popcon' statistics.
 
 Yes, and there are many ways in which one can criticise this data collection
 process.   But I fail to see how __not having any data__ leads to more
 informed decisions.
 
 Once you have data, you have an option of using or discarding it. But if you
 have no data, you have no option.  How is that better?

I question 1) the usefulness of the effort necessary to get the data ;
and 2) the very concept of data mining, which seems to be the rationale
for this proposed effort.

Furthermore (but this is seriously off-topic), I seriously despise the
very idea of popularity in scientific debates... "Everybody does it"
is *not* a valid argument. Nor "Everyone knows"

Emmanuel Charpentier



Re: [R] popular R packages

2009-03-08 Thread Dirk Eddelbuettel

On 8 March 2009 at 23:45, Emmanuel Charpentier wrote:
| On Sunday 08 March 2009 at 13:22 -0500, Dirk Eddelbuettel wrote:
|  Once you have data, you have an option of using or discarding it. But if you
|  have no data, you have no option.  How is that better?
| 
| I question 1) the usefulness of the effort necessary to get the data ;
| and 2) the very concept of data mining, which seems to be the rationale
| for this proposed effort.
 
Re 1), Popcon is used for a few actual tasks, for example guiding the
knapsack problem of which of the 20,000+ packages should be placed on the
first DVD, which on the second and so on, simply to minimise disk swapping
when installing.  That's useful in my book, and solves a real problem.

Also, and back to R, consider the relevant page for 'r-base' on Debian (and
forgive them the ugly gnuplot chart)

http://qa.debian.org/popcon.php?package=r-base

This clearly shows a couple of things:

 - about 3% of all machines participating have r-base-core [ the main R
   package ] installed

 - 89% of those also install r-recommended (which pulls in VR, lattice, ...)

 - 63% of those have the all-in package r-base installed (which pulls in
   r-recommended and documentation packages)

 - r-mathlib is not very well used

 - the debug package r-base-core-dbg is possibly underused [ it allows you to
   run gdb by installing this package containing matching debug symbols
   without having to rebuild; these dbg packages are very useful but eat up
   lots of mirror space, and whether they could or should be removed was a
   recent internal question ]

Likewise, you can look at other CRAN packages. Here is lme4:

http://qa.debian.org/popcon.php?package=lme4

which is installed on only about 0.3% of all machines.

| Furthermore (but this is seriously off-topic), I seriously despise the
| very idea of popularity in scientific debates... "Everybody does it"
| is *not* a valid argument. Nor "Everyone knows"

TTBOMK nobody suggested this. 

Dirk

-- 
Three out of two people have difficulties with fractions.



Re: [R] popular R packages

2009-03-08 Thread Barry Rowlingson
2009/3/8 Emmanuel Charpentier charp...@bacbuc.dyndns.org:

 I question 1) the usefulness of the effort necessary to get the data ;
 and 2) the very concept of data mining, which seems to be the rationale
 for this proposed effort.

 Furthermore (but this is seriously off-topic), I seriously despise the
 very idea of popularity in scientific debates... "Everybody does it"
 is *not* a valid argument. Nor "Everyone knows"

 As long as we agree that package downloads != popularity then we have
useful data.

 Usefulness of the data? Let's think...

 Suppose we discover that spatstat is downloaded 100 times more than
splancs is. Both packages compute K-functions of spatial data. Pretend
there's an enhancement to K-function computation that could be
implemented in spatstat and/or splancs. Why bother doing it in
splancs?

 Currently the only usage stats we have are even worse measures such
as number of mentions in R-help or number of bug reports. Or maybe
citation counts, but who would make important decisions based on
those?

 I'd love to go 'Hmmm how many people are using my package?' and get
an exact answer. Given the impossibility of that information, I'd love
to go 'Hmmm how many people downloaded my package?', a good
approximation to which is not beyond the bounds of our technology. Web
pages have had annoying 'this piece of software has been downloaded
443535 times' banners (often enclosed in blink tags) since 1996. Yes,
it would require some effort at each CRAN site, but maybe the CRAN
mirror site maintainers might be interested in doing this. If they
don't want to, then fine.

Barry



Re: [R] popular R packages

2009-03-08 Thread hadley wickham
 I question 1) the usefulness of the effort necessary to get the data ;
 and 2) the very concept of data mining, which seems to be the rationale
 for this proposed effort.

 Furthermore (but this is seriously off-topic), I seriously despise the
 very idea of popularity in scientific debates... "Everybody does it"
 is *not* a valid argument. Nor "Everyone knows"

  As long as we agree that package downloads != popularity then we have
 useful data.

  Usefulness of the data? Let's think...

  Suppose we discover that spatstat is downloaded 100 times more than
 splancs is. Both packages compute K-functions of spatial data. Pretend
 there's an enhancement to K-function computation that could be
 implemented in spatstat and/or splancs. Why bother doing it in
 splancs?

  Currently the only usage stats we have are even worse measures such
 as number of mentions in R-help or number of bug reports. Or maybe
 citation counts, but who would make important decisions based on
 those?

  I'd love to go 'Hmmm how many people are using my package?' and get
 an exact answer. Given the impossibility of that information, I'd love
 to go 'Hmmm how many people downloaded my package?', a good
 approximation to which is not beyond the bounds of our technology. Web
 pages have had annoying 'this piece of software has been downloaded
 443535 times' banners (often enclosed in blink tags) since 1996. Yes,
 it would require some effort at each CRAN site, but maybe the CRAN
 mirror site maintainers might be interested in doing this. If they
 don't want to, then fine.

Here are a few other uses that I would put the data to:

 * In my tenure case, grant applications etc, I can say how many
people have downloaded my packages.

 * If relatively few people are using a package, I'd know that I
either need to promote the package more, or improve it so that it is
useful to more people.

 * At a higher level, it would be interesting to see what types of
packages are most frequently downloaded.  Modelling packages? Graphics
packages? Packages for particular applications? ...

Hadley

-- 
http://had.co.nz/



Re: [R] popular R packages

2009-03-07 Thread Gabor Grothendieck
This function will show which other packages depend on a particular
package:

dep <- function(pkg, AP = available.packages()) {
    # match the package name as a whole word
    pkg <- paste("\\b", pkg, "\\b", sep = "")
    cat("Depends:", rownames(AP)[grep(pkg, AP[, "Depends"])], "\n")
    cat("Suggests:", rownames(AP)[grep(pkg, AP[, "Suggests"])], "\n")
}
dep("zoo")
Depends: AER BootPR FinTS PerformanceAnalytics RBloomberg
StreamMetabolism TSfame TShistQuote VhayuR dyn dynlm fda fxregime
lmtest meboot party quantmod sandwich sde strucchange tripEstimation
tseries xts
Suggests: TSMySQL TSPostgreSQL TSSQLite TSdbi TSodbc UsingR Zelig
gsubfn playwith pscl tframePlus
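
As a possible extension of the same idea, the sketch below counts, for every package at once, how many other packages list it in a given field. It is only an illustration: `count_dependents` is a hypothetical helper, and the toy matrix (with invented package names `pkgA`, `pkgB`, `pkgC`, `base1`) stands in for the matrix returned by `available.packages()` so that the example runs without network access.

```r
# Toy stand-in for available.packages(); names and fields are invented.
AP <- matrix(
  c("base1 (>= 1.5)", NA,
    "base1, pkgA",    "pkgC",
    NA,               "base1",
    NA,               NA),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("pkgA", "pkgB", "pkgC", "base1"), c("Depends", "Suggests"))
)

count_dependents <- function(AP, field = "Depends") {
  pkgs <- rownames(AP)
  # Split each comma-separated field and drop version constraints like "(>= 1.5)".
  deps <- lapply(strsplit(AP[, field], ","), function(x) {
    x <- trimws(sub("\\(.*\\)", "", x))
    x[!is.na(x) & nzchar(x)]
  })
  # Names that are not themselves rows of AP (for example "R") are ignored.
  counts <- table(factor(unlist(deps), levels = pkgs))
  sort(counts, decreasing = TRUE)
}

dependents <- count_dependents(AP)
suggesters <- count_dependents(AP, field = "Suggests")
print(dependents)
```

With a live session one would pass `available.packages()` instead of the toy matrix. Recent versions of R also provide `tools::package_dependencies()`, which accepts `reverse = TRUE` for the same purpose.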


On Sat, Mar 7, 2009 at 2:57 PM, Jeroen Ooms j.c.l.o...@uu.nl wrote:

 I would like to get some idea of which R-packages are popular, and what R is
 used for in general. Are there any statistics available on which R packages
 are downloaded often, or is there something like a package-survey? Something
 similar to http://popcon.debian.org/ maybe? Any tips are welcome!

 -
 Jeroen Ooms * Dept. of Methodology and Statistics * Utrecht University

 Visit http://www.jeroenooms.com to explore some of my
 current projects.


Re: [R] popular R packages

2009-03-07 Thread David Winsemius
When the question "How many R users are there?" arises, the consensus
seems to be that there is no valid method to address the question. The
thread "R-business case" from 2004 can be found here:

https://stat.ethz.ch/pipermail/r-help/2004-March/047606.html

I did not see any material revision to that conclusion during the  
recent discussion of the New York Times article on the r-challenge to  
SAS.


Gmane tracks the volume of r-help activity (I realize this is not what
you asked for):

http://www.gmane.org/info.php?group=gmane.comp.lang.r.general

The distribution of r-packages is, well  ... distributed:
http://cran.r-project.org/mirrors.html

At least one of the participants in the 2004 thread suggested that it  
would be a good thing to track the numbers of downloads by package.  
I have not heard of any such system being installed in the mirror  
software and I see nothing that suggests data gathering in the CRAN  
Mirror How-to:

http://cran.r-project.org/mirror-howto.html

On the other hand, I am not part of R-core, so you must await more
authoritative opinion, since a 5-year-old thread and amateur
speculation are not much of a leg to stand on.


There are lexicographic packages for R. One approach to a de novo
analysis would be to do some sort of natural language analysis of the
r-help archives, counting up either packages with non-English
names or close proximity of the words "library" or "package" to
package names that overlap the 30,000 common English words. That would
have the danger of inflating counts of the packages with the least  
adequate documentation or a paucity of good worked examples, but there  
are many readers of this list who suspect that new users don't look at  
the documentation, so who knows?
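
A much cruder version of that idea can be sketched in a few lines: instead of full natural language analysis, count the messages that contain an explicit `library(pkg)` or `require(pkg)` call. The messages below are invented, and `count_mentions` is a hypothetical helper; a real analysis would first have to download and parse the r-help archives.

```r
# Invented example messages standing in for parsed r-help posts.
messages <- c(
  "I ran library(lme4) and then library(nlme).",
  "Why does require(\"lme4\") fail?",
  "The maps are nice but I never loaded the package."
)

count_mentions <- function(messages, pkgs) {
  calls  <- c("library(", "require(")
  quotes <- c("", "\"", "'")
  vapply(pkgs, function(p) {
    # All literal spellings: library(p), library("p"), library('p'), require(...).
    forms <- as.vector(outer(calls, quotes,
                             function(f, q) paste0(f, q, p, q, ")")))
    # A message counts once if it contains any of the spellings.
    hit <- Reduce("|", lapply(forms, grepl, x = messages, fixed = TRUE))
    sum(hit)
  }, integer(1))
}

mentions <- count_mentions(messages, c("lme4", "nlme", "maps"))
print(mentions)
```

Matching only explicit calls sidesteps the problem of package names that are also English words (the third message mentions "maps" but is not counted), at the cost of missing posts that discuss a package without loading it.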


--
David Winsemius



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] popular R packages

2009-03-07 Thread Thomas Adams
I don't think "At least one of the participants in the 2004 thread
suggested that it would be a good thing to track the numbers of
downloads by package." is reasonable, because I download R packages for 2
home computers (laptop & desktop) and 2 at work (1 Linux & 1 Mac). There
must be many such cases…


Tom


--
Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177

EMAIL:  thomas.ad...@noaa.gov

VOICE:  937-383-0528
FAX:    937-383-0033



Re: [R] popular R packages

2009-03-07 Thread Tal Galili
I agree with Thomas; over the years I have installed R on at least 5
computers.

BTW: does anyone know how the website statistics of r-project are
being analyzed?
Since I can't see any Google Analytics or other tracking code in the main
website, I am guessing someone might be running some log-file analyzer, but
I'd rather hear that than assume.


-- 
--


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il



Re: [R] popular R packages

2009-03-07 Thread David Winsemius
Quite so. It certainly is the case that Dirk Eddelbuettel suggested it
would be very desirable, and I think Dirk's track record speaks for
itself. I never said (and I am sure Dirk never intended) that one
could take the raw numbers as a basis for blandly asserting that
"N copies of ttt package are currently installed".


When I update packages, the automated process takes hold and I go for  
a cup of coffee. I only have at the moment two computers with R  
installed and have not updated any binary packages on Windoze in over  
a year.  Nonetheless, I do think the relative numbers of package  
downloads might be interpretable, or at the very least, the basis for  
discussions over beer.


--
David Winsemius





David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] popular R packages

2009-03-07 Thread Jeroen Ooms

Tal Galili wrote:
 I agree with Thomas, over the years I have installed R on at least 5
 computers.


I don't see why per-machine statistics would not be useful. When you
installed a package on five machines, you probably use it a lot, and it is
more important to you than packages that you only installed once.

Furthermore I don't think the distribution of packages has to be
problematic. I guess downloads are only slightly related to the specific
mirror, so download statistics from one of the popular mirrors would do for
me.

Of course these statistics are never perfect, but they could be
informative...
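
To make the two views concrete, here is a small sketch that reports both raw download counts and distinct-client counts per package from one mirror's records. Everything in it is assumed for illustration: the `records` data frame, the client labels, and the package names `pkgX` and `pkgY` are invented.

```r
# Invented download records from a single hypothetical mirror.
records <- data.frame(
  client  = c("a", "a", "a", "b", "c", "c"),
  package = c("pkgX", "pkgX", "pkgX", "pkgX", "pkgY", "pkgX"),
  stringsAsFactors = FALSE
)

# Raw downloads: every request counts, so multi-machine users weigh more.
raw <- table(records$package)

# Distinct clients: each (client, package) pair counts once.
distinct <- table(unique(records)$package)

summary_df <- data.frame(
  package   = names(raw),
  downloads = as.integer(raw),
  clients   = as.integer(distinct[names(raw)]),
  stringsAsFactors = FALSE
)
print(summary_df)
```

Either column gives the same ordering here; where they disagree, the gap itself says something about how heavily a package's users reinstall it.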



Re: [R] popular R packages

2009-03-07 Thread Wacek Kusnierczyk
i have kept r installed on more than ten computers during the past few
years, some of them running win + more than one linux distro, all of
them having r, most often installed from a separate download.

i know of many cases where students download r for the purpose of a
course in statistics -- often an introductory course for students who
otherwise have little to do with stats. some of them do it more than
once during the semester, and many of them never use r again.

taking into account that basic statistics courses are taught to most
university students and that r is surely the most popular free
statistical computing environment, download-based usage estimates may be
a bit optimistic, unless 'usage' is taken to include 'learn-pass-forget'.

vQ





Re: [R] popular R packages

2009-03-07 Thread Spencer Graves
      I just did RSiteSearch("library(xxx)") with xxx = the names of 6
packages familiar to me, with the following numbers of hits:

hits package
 169 lme4
 165 nlme
   6 fda
   4 maps
   2 FinTS
   2 DierckxSpline

      Software could be written to (1) extract the names of current
packages from CRAN and then (2) perform queries similar to this on all such
packages and summarize the results.  I don't have the time now to write
code for this, but I've written similar code before for step (1);  it
can be found in "scripts/TsayFiles.R" in the FinTS package on CRAN.
For step (2), Sundar Dorai-Raj wrote code that is included in the
preliminary RSiteSearch package available from R-Forge via
install.packages("RSiteSearch", repos = "http://r-forge.r-project.org").

 Code to do this could probably be written (a) in a matter of 
seconds by many of those in the R Core team or (b) in a matter of hours 
by virtually any reader of this list using the examples I just cited.  
And it could provide numbers without a need to convince others to keep 
download statistics and make them available later. 
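
The two steps might be wired together roughly as follows. This is a sketch under assumptions, not the code referred to above: `rank_packages` and `stub_count` are hypothetical names, and the stub simply looks up the six hit counts reported in this message instead of querying a search engine.

```r
rank_packages <- function(pkgs, count_hits) {
  # Step (2): apply a user-supplied hit counter to every package name.
  hits <- vapply(pkgs, count_hits, numeric(1))
  out <- data.frame(hits = hits, package = pkgs, stringsAsFactors = FALSE)
  # Most hits first; ties broken alphabetically by package name.
  out <- out[order(-out$hits, out$package), ]
  rownames(out) <- NULL
  out
}

# Stub standing in for a real search query (assumption: counts as reported above).
reported <- c(lme4 = 169, nlme = 165, fda = 6, maps = 4,
              FinTS = 2, DierckxSpline = 2)
stub_count <- function(pkg) reported[[pkg]]

# Step (1) in a live session would be: pkgs <- rownames(available.packages())
ranking <- rank_packages(names(reported), stub_count)
print(ranking)
```

Passing the counter in as a function keeps the ranking logic independent of where the numbers come from, so a search-based counter could later be swapped for one based on download logs.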

 Hope this helps. 
 Spencer Graves




Re: [R] popular R packages

2009-03-07 Thread s...@xlsolutions-corp.com
 Hi Spencer,

 XLSolutions is currently analyzing r-help archived questions to rank
packages for the upcoming R-PLUS 3.3 Professional version and we will be
happy to share the outcome with interested parties. Please email
d...@xlsolutions-corp.com


 Regards -
 Sue Turner
 Senior Account Manager
 XLSolutions Corporation
 North American Division
 1700 7th Ave
 Suite 2100
 Seattle, WA 98101
 Phone: 206-686-1578
 Email: s...@xlsolutions-corp.com
 web: www.xlsolutions-corp.com





Re: [R] popular R packages

2009-03-07 Thread Jim Lemon

Hi all,
I'm kind of amazed at the answers suggested for the relatively simple
question, "How many times has each R package been downloaded?" Some
have veered off in another direction, like working out how many packages 
a package depends upon, or whether someone downloads more than one copy. 
The response about ranking packages by the number of questions asked 
about them may be interesting, but may not relate very well at all to 
popularity in terms of downloads. If people were constantly asking 
questions about one of the packages I maintain, I would be working on 
the help pages to improve them, not basking in the inferred glory of 
having a popular package. There is one way that the download count would 
be very useful for package maintainers, if no one else. Take as an 
example the package concord, which has not been maintained for a year or
more since the content was merged into the irr package. If I knew that 
no one downloaded concord any more, I would surely petition those in 
charge of the archive to remove it or at least transfer it to the 
package museum. No point in having ever more packages on CRAN if they 
are never downloaded.


Jim
