Re: [R] popular R packages
Christos Hatzis wrote:
> Bioconductor already provides download stats for all packages...
> http://bioconductor.org/packages/stats/bioc/affy.html

Maybe if we asked the Bioconductor people _really_ nicely.

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] popular R packages
Gabor Grothendieck wrote:
> R-Forge already has this, but I don't think it's used much. R-Forge does allow authors to opt out, which seems sensible lest it deter potential authors from submitting packages.
>
> I think objective quality metrics are better than ratings, e.g. does the package have a vignette, has the package had a release within the last year, does the package have a free software license, etc. That would have the advantage that authors might react to increase their package's quality assessment, resulting in an overall improvement in quality on CRAN. That would be more of a pro-active cycle, whereas ratings are reactive and don't really encourage improvement.

I beg to offer an alternative assessment of quality. Do users download the package and find it useful? If so, they are likely to download it again when it is updated.

Much as I appreciate the convenience of vignettes, regular updates and the absolute latest GPL license, a perfectly dud package can have all of these things. If a package is downloaded upon first release and not much thereafter, the maintainer might be motivated to attend to its shortcomings of utility rather than incrementing the version number every month or so.

Downloads, as many have pointed out, are not a direct assessment of quality, but if I saw a package that just kept getting downloaded, version after version, I would be much more likely to check it out myself and perhaps even write a review for Hadley's neat site. Which I will try to do tonight.

Jim
Re: [R] popular R packages
On Tue, Mar 10, 2009 at 6:14 AM, Jim Lemon j...@bitwrit.com.au wrote:
> Gabor Grothendieck wrote:
>> I think objective quality metrics are better than ratings, e.g. does the package have a vignette, has the package had a release within the last year, does the package have a free software license, etc.
>
> I beg to offer an alternative assessment of quality. Do users download the package and find it useful? If so, they are likely to download it again when it is updated.

I was referring to motivating authors, not users, so that CRAN improves.

> Much as I appreciate the convenience of vignettes, regular updates and the absolute latest GPL license, a perfectly dud package can have all of these things. If a package is downloaded upon first release and

These are nothing but the usual FUD against quality improvement, i.e. that the quality metrics are not measuring what you want; but the fact is that quality metrics can work and have had huge successes. Also, I think objective measures would be more accepted by authors than ratings. No one is going to be put off that their package is marked as having no vignette when it obviously doesn't have one, and authors are free to add one and instantly improve their package's rating.

> not much thereafter, the maintainer might be motivated to attend to its shortcomings of utility rather than incrementing the version number every month or so. Downloads, as many have pointed out, are not a direct assessment of quality, but if I saw a package that just kept getting downloaded, version after version, I would be much more likely to check it out myself and perhaps even write a review for Hadley's neat site. Which I will try to do tonight.

I was arguing for objective metrics rather than ratings. Downloading is not a rating but is objective, although there are measurement problems, as has been pointed out. Also, its worst feature is that it does not react to changes in quality very quickly, making it anti-motivating.
Re: [R] popular R packages
Gabor Grothendieck wrote:
> I was arguing for objective metrics rather than ratings. Downloading is not a rating but is objective, although there are measurement problems, as has been pointed out. Also, its worst feature is that it does not react to changes in quality very quickly, making it anti-motivating.

Gabor, I think your approach will have more payoff in the long run. I would suggest one other metric: the number of lines of code in the 'examples' section of all the package's help files.

Frank

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
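Frank's metric can be approximated from a package's Rd help sources, since examples live in the \examples{} section of each Rd file. A minimal sketch in R, assuming each help file is available as a character vector of lines; count_example_lines() is a hypothetical helper, and its naive brace counting ignores escaped braces and braces inside strings or comments, so treat the result as a rough count.

```r
# Hypothetical helper: count non-blank lines inside the \examples{} section
# of one Rd help file, given its text as a character vector of lines.
# Rough heuristic: braces are counted naively (no escapes, strings, comments).
count_example_lines <- function(rd_lines) {
  start <- grep("^\\\\examples\\{", rd_lines)
  if (length(start) == 0) return(0L)        # no examples section at all
  depth <- 0L
  n <- 0L
  for (i in seq(start[1], length(rd_lines))) {
    line <- rd_lines[i]
    chars <- strsplit(line, "")[[1]]
    depth <- depth + sum(chars == "{") - sum(chars == "}")
    if (depth <= 0) break                   # closing brace of \examples{}
    if (i > start[1] && nzchar(trimws(line))) n <- n + 1L
  }
  n
}

rd <- c("\\name{foo}", "\\title{Foo}", "\\examples{",
        "x <- 1:10", "", "if (TRUE) {", "  print(x)", "}", "}")
count_example_lines(rd)   # 4

# For a whole package source tree (assumes a local "man" directory):
#   files <- list.files("man", pattern = "\\.Rd$", full.names = TRUE)
#   sum(vapply(files, function(f) count_example_lines(readLines(f)), integer(1)))
```

Counting lines rather than examples rewards verbosity as much as coverage, so this is at best one input among several.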
Re: [R] popular R packages
If it is easy to get the download numbers, we should do it and deal with the interpretation issues. I'd like to know the numbers so I can understand which (of my) packages have the most usage.

One other complication about the number of downloads: I suspect that a package being on the Depends/Suggests/Imports list of another package might be a big driver of how many times it is downloaded.

If I remember correctly, about 5 years ago Bioconductor asked for volunteers to review packages, to get detailed, specific feedback from people who use the package (and should be fairly R proficient). I think that this is pretty important, and something like Crantastic is a good interface. I personally got a lot out of the comments that a JSS reviewer had for a package.

--
Max
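Max's suspicion about dependencies could be checked against package metadata. A minimal sketch in R that counts, for each package, how many other packages name it in their Depends, Imports or Suggests fields; the four package names are hypothetical, and version requirements such as "(>= 1.2)" are stripped with a simple regular expression.

```r
# Count reverse dependencies: for each package, how many other packages
# list it in Depends, Imports or Suggests (DESCRIPTION-style field values).
reverse_dep_counts <- function(meta, fields = c("Depends", "Imports", "Suggests")) {
  deps <- unlist(lapply(seq_len(nrow(meta)), function(i) {
    vals <- unlist(meta[i, fields], use.names = FALSE)
    vals <- vals[!is.na(vals)]
    pkgs <- trimws(unlist(strsplit(vals, ",")))
    pkgs <- trimws(sub("\\(.*\\)", "", pkgs))   # drop version requirements
    unique(setdiff(pkgs, c("", "R")))            # count each dependent once
  }))
  counts <- table(factor(deps, levels = meta[, "Package"]))
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

# Hypothetical metadata for four packages
meta <- data.frame(
  Package  = c("pkgA", "pkgB", "pkgC", "pkgD"),
  Depends  = c("R (>= 2.8.0)", "pkgA", "R, pkgA, pkgB", NA),
  Imports  = c(NA, NA, NA, "pkgA (>= 1.2)"),
  Suggests = c(NA, "pkgC", NA, "pkgB"),
  stringsAsFactors = FALSE
)
reverse_dep_counts(meta)
```

The function indexes columns by name, so it should also accept the character matrix returned by available.packages(), which includes Package, Depends, Imports and Suggests columns; that case has not been checked against a live repository here.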
Re: [R] popular R packages
On Tuesday 10 March 2009, Frank E Harrell Jr wrote:
> Gabor, I think your approach will have more payoff in the long run. I would suggest one other metric: the number of lines of code in the 'examples' section of all the package's help files.
>
> Frank

Absolutely. From the perspective of a user, not an expert, packages with a good vignette and lots of examples are by far my favorite and most used.

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
Re: [R] popular R packages
Pricing each download at 99 cents (the same as a song from iTunes) can measure users more accurately. That's my 2 cents anyway.

On Tue, Mar 10, 2009 at 9:54 PM, Max Kuhn mxk...@gmail.com wrote:
> If it is easy to get the download numbers, we should do it and deal with the interpretation issues. I'd like to know the numbers so I can understand which (of my) packages have the most usage.
Re: [R] popular R packages
Bioconductor already provides download stats for all packages...
http://bioconductor.org/packages/stats/bioc/affy.html

-Christos

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Max Kuhn
Sent: Tuesday, March 10, 2009 12:25 PM
To: r-help@r-project.org
Subject: Re: [R] popular R packages

If it is easy to get the download numbers, we should do it and deal with the interpretation issues. I'd like to know the numbers so I can understand which (of my) packages have the most usage.
Re: [R] popular R packages
s...@xlsolutions-corp.com wrote:
> Hi Spencer,
>
> XLSolutions is currently analyzing r-help archived questions to rank packages for the upcoming R-PLUS 3.3 Professional version and we will be happy to share the outcome with interested parties. Please email d...@xlsolutions-corp.com

I would expect the correlation between popularity on the one hand, and usefulness as well as quality on the other, to be relatively low. If it were possible to rate the downloaders with respect to seriousness, and whether they actually use the package for some sensible purpose, I would be more interested. Consider a highly specialized, good-quality package used by a relatively small group of distinguished researchers. Would that have a high rank? No. But important? Possibly yes.

Tom

> Regards - Sue Turner
> Senior Account Manager
> XLSolutions Corporation
> North American Division
> 1700 7th Ave Suite 2100
> Seattle, WA 98101
> Phone: 206-686-1578
> Email: s...@xlsolutions-corp.com
> web: www.xlsolutions-corp.com
>
> --- On Sat, 3/7/09, Spencer Graves spencer.gra...@prodsyse.com wrote:
>
>> From: Spencer Graves spencer.gra...@prodsyse.com
>> Subject: Re: [R] popular R packages
>> To: Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no
>> Cc: r-help@r-project.org, Jeroen Ooms j.c.l.o...@uu.nl, Thomas Adams thomas.ad...@noaa.gov
>> Date: Saturday, March 7, 2009, 5:22 PM
>>
>> I just did RSiteSearch("library(xxx)") with xxx = the names of 6 packages familiar to me, with the following numbers of hits:
>>
>>   hits  package
>>    169  lme4
>>    165  nlme
>>      6  fda
>>      4  maps
>>      2  FinTS
>>      2  DierckxSpline
>>
>> Software could be written to (1) extract the names of current packages from CRAN, then (2) perform queries similar to this on all such packages and summarize the results. I don't have the time now to write code for this, but I've written similar code before for step (1); it can be found in scripts/TsayFiles.R in the FinTS package on CRAN. For step (2), Sundar Dorai-Raj wrote code that is included in the preliminary RSiteSearch package, available from R-Forge via install.packages("RSiteSearch", repos = "http://r-forge.r-project.org").
>>
>> Code to do this could probably be written (a) in a matter of seconds by many of those in the R Core team or (b) in a matter of hours by virtually any reader of this list, using the examples I just cited. And it could provide numbers without a need to convince others to keep download statistics and make them available later.
>>
>> Hope this helps.
>> Spencer Graves
>>
>> Wacek Kusnierczyk wrote:
>>> i have kept r installed on more than ten computers during the past few years, some of them running win + more than one linux distro, all of them having r, most often installed from a separate download.
>>>
>>> i know of many cases where students download r for the purpose of a course in statistics -- often an introductory course for students who otherwise have little to do with stats. some of them do it more than once during the semester, and many of them never use r again. taking into account that basic statistics courses are taught to most university students and that r is surely the most popular free statistical computing environment, download-based usage estimates may be a bit optimistic, unless 'usage' is taken to include 'learn-pass-forget'.
>>>
>>> vQ
>>>
>>> Tal Galili wrote:
>>>> I agree with Thomas; over the years I have installed R on at least 5 computers.
>>>>
>>>> BTW: does anyone know how the website statistics of r-project are being analyzed? Since I can't see any Google Analytics or other tracking code in the main website, I am guessing someone might be running some log-file analyzer, but I'd rather hear that than assume.
>>>>
>>>> On Sun, Mar 8, 2009 at 12:45 AM, Thomas Adams thomas.ad...@noaa.gov wrote:
>>>>> I don't think
>>>>>> At least one of the participants in the 2004 thread suggested that it would be a good thing to track the numbers of downloads by package.
>>>>> is reasonable because I download R packages for 2 home computers (laptop & desktop) and 2 at work (1 Linux & 1 Mac). There must be many such cases…
>>>>>
>>>>> Tom

--
Tom Backer Johnsen
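Step (2) of Spencer's plan, querying once per package and summarizing, can be sketched in R with the search itself left pluggable. The count_hits argument is a hypothetical function returning a hit count for one package name; as far as I know RSiteSearch() opens its results in a browser rather than returning a count, so it cannot be dropped in directly. Spencer's six published counts stand in for a real search here.

```r
# Query a search once per package and tabulate the results, most hits first.
# `count_hits` is a hypothetical function: package name -> number of hits.
rank_by_hits <- function(pkgs, count_hits) {
  hits <- vapply(pkgs, count_hits, numeric(1))
  out <- data.frame(hits = unname(hits), package = pkgs,
                    stringsAsFactors = FALSE)
  out <- out[order(-out$hits, out$package), ]   # most hits first, ties by name
  rownames(out) <- NULL
  out
}

# Step (1) would need network access, e.g.
#   pkgs <- rownames(available.packages(repos = "https://cloud.r-project.org"))
# Here the counts from Spencer's message stand in for a real search:
spencer_hits <- c(lme4 = 169, nlme = 165, fda = 6, maps = 4,
                  FinTS = 2, DierckxSpline = 2)
rank_by_hits(names(spencer_hits), function(p) spencer_hits[[p]])
```

Keeping the search function as an argument means the tabulation can be tested offline and the actual query swapped in later.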
Re: [R] popular R packages
Given we are talking about statistical software, one bibliometric measure of relative package popularity is scientific citations. Web of Science is not too useful where the citation has been to a website or computer package, but Google Scholar for "lme4: Linear mixed-effects models using S4 classes" gives us 108 journal citations; "mgcv: GAMs and generalized ridge regression for R", 80; etc.

Cheers, David Duffy.

--
David Duffy (MBBS PhD)
email: dav...@qimr.edu.au  ph: INT+61+7+3362-0217  fax: -0101
Epidemiology Unit, Queensland Institute of Medical Research
300 Herston Rd, Brisbane, Queensland 4029, Australia
GPG 4D0B994A
Re: [R] popular R packages
On 10-Mar-09 01:07:54, David Duffy wrote:
> Given we are talking about statistical software, one bibliometric measure of relative package popularity is scientific citations. Web of Science is not too useful where the citation has been to a website or computer package, but Google Scholar for "lme4: Linear mixed-effects models using S4 classes" gives us 108 journal citations; "mgcv: GAMs and generalized ridge regression for R", 80; etc.
>
> Cheers, David Duffy.

A good point. But such numbers must be considered in the context of the prevalence of the kind of study for which the respective methods would be used. A great number of epidemiological studies would be suitable for the application of glm(). Fewer would involve GAMs. Popularity of a package by citation frequency would (other things being equal) be proportional to the frequency of the kind of study for which it could be used.

So one should either evaluate, among the studies in which an R package *could* be used, the proportion in which it *was* used; or compare the number of citations of an R package with the number of citations of an equivalent package/module/proc in other software.

Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 10-Mar-09   Time: 02:03:22
-- XFMail --
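Ted's second option amounts to a simple normalization: divide an R package's citations by all citations for the same method across software. A minimal sketch in R; citation_share() is a hypothetical helper and every count below is invented for illustration, chosen to show how raw counts and shares can point in opposite directions.

```r
# Share of a method's citations that go to the R package, relative to
# equivalent procedures in other software. All counts are hypothetical.
citation_share <- function(r_cites, other_cites) {
  stopifnot(length(r_cites) == length(other_cites))
  r_cites / (r_cites + other_cites)
}

r_cites     <- c(common_method = 100, specialised_method = 20)
other_cites <- c(common_method = 900, specialised_method = 5)

r_cites                                # raw counts favour the common method
citation_share(r_cites, other_cites)   # shares: 0.1 versus 0.8
```

Ted's first option (the proportion of eligible studies that used the package) would need a count of eligible studies, which is much harder to obtain, so it is not sketched here.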
Re: [R] popular R packages
Hi all,

Put me in the camp that says more information is better than less information, even if imperfect. Interpretation can be left to those using the data. Also, "popular" can mean many things. An alternative to the number of times a package is downloaded would be a ratings system, where R users can supply starred ratings or something (much as they do on Netflix or Amazon). Combining this with the number of downloads would give users some idea of other users' perceptions of a package, its impact, and possibly its quality. Obviously it would be imperfect, but it seems to me this would be better than the even more scant and more imperfect information currently available. I wouldn't advocate that such information be used in the same way a citation index is, but it might prove helpful to users who are confused (even paralyzed) by the ever-burgeoning number of R packages.

There was a discussion on this a while back in which Bill Venables said:

"To me a much more urgent initiative [than rating responders on R listserves] is some kind of user online review system for packages, even something as simple as that used by Amazon.com has for customer review of books. I think the need for this is rather urgent, in fact. Most packages are very good, but I regret to say some are pretty inefficient and others downright dangerous. You don't want to discourage people from submitting their work to CRAN, but at the same time you do want some mechanism that allows users to relate their experience with it, good or bad."

Find the whole thread here: https://stat.ethz.ch/pipermail/r-help/2007-December/147323.html

Matt

--
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com
Re: [R] popular R packages
> There was a discussion on this a while back in which Bill Venables said:
>
> "To me a much more urgent initiative [than rating responders on R listserves] is some kind of user online review system for packages, even something as simple as that used by Amazon.com has for customer review of books. I think the need for this is rather urgent, in fact. Most packages are very good, but I regret to say some are pretty inefficient and others downright dangerous. You don't want to discourage people from submitting their work to CRAN, but at the same time you do want some mechanism that allows users to relate their experience with it, good or bad."

And you can see my initial attempts at this at http://crantastic.org. Unfortunately I haven't had much time to work on it, and haven't had much luck recruiting helpers.

Hadley

--
http://had.co.nz/
Re: [R] popular R packages
R-Forge already has this, but I don't think it's used much. R-Forge does allow authors to opt out, which seems sensible lest it deter potential authors from submitting packages.

I think objective quality metrics are better than ratings, e.g. does the package have a vignette, has the package had a release within the last year, does the package have a free software license, etc. That would have the advantage that authors might react to increase their package's quality assessment, resulting in an overall improvement in quality on CRAN. That would be more of a pro-active cycle, whereas ratings are reactive and don't really encourage improvement.

On Mon, Mar 9, 2009 at 10:20 PM, Matthew Keller mckellerc...@gmail.com wrote:
> An alternative to the number of times a package is downloaded would be a ratings system, where R users can supply starred ratings or something (much as they do on Netflix or Amazon).
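The objective criteria Gabor lists are easy to score mechanically once the metadata is at hand. A minimal sketch in R, giving one point each for a vignette, a release within the last year, and a free software license; the column names, the equal weights and the crude license regex are all assumptions (it would, for instance, miss a free license declared only as "file LICENSE"), and the three packages are hypothetical.

```r
# Hypothetical quality score: one point each for having a vignette, a release
# within the last 365 days, and a license string matching a crude regex.
quality_score <- function(meta, today = Sys.Date()) {
  has_vignette <- meta$n_vignettes > 0
  days_since   <- as.numeric(today - as.Date(meta$last_release))
  recent       <- days_since <= 365
  free         <- grepl("GPL|MIT|BSD", meta$license)   # crude license check
  setNames(as.integer(has_vignette) + as.integer(recent) + as.integer(free),
           meta$package)
}

pkg_info <- data.frame(
  package      = c("pkgA", "pkgB", "pkgC"),
  n_vignettes  = c(2, 0, 1),
  last_release = c("2009-01-15", "2007-06-01", "2008-11-30"),
  license      = c("GPL (>= 2)", "GPL-2", "file LICENSE"),
  stringsAsFactors = FALSE
)
quality_score(pkg_info, today = as.Date("2009-03-10"))   # pkgA 3, pkgB 1, pkgC 2
```

Because each criterion is a yes/no check, an author can see exactly which point is missing and fix it, which is the motivating effect Gabor describes.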
Re: [R] popular R packages
On Sat, 07 Mar 2009 18:04:24 -0500, David Winsemius wrote:
> [ Snip ... ]
> Nonetheless, I do think the relative numbers of package downloads might be interpretable, or at the very least, the basis for discussions over beer.

*Anything* might be the basis for discussions over beer (obvious corollary to Thermogoddamics' second principle).

More seriously: I don't think relative numbers of package downloads can be interpreted in any reasonable way, because reasons for package download have a very wide range, from curiosity ("what's this?"), fun (think fortunes...), to vital need (think lme4 if/when a consensus on denominator DFs can be reached :-)...). What can you infer in good faith from such a mess?

Emmanuel Charpentier
Re: [R] popular R packages
> More seriously: I don't think relative numbers of package downloads can be interpreted in any reasonable way, because reasons for package download have a very wide range, from curiosity ("what's this?"), fun (think fortunes...), to vital need (think lme4 if/when a consensus on denominator DFs can be reached :-)...). What can you infer in good faith from such a mess?

So when we have messy data with measurement error, we should just give up? Doesn't sound very statistical! ;)

Hadley

--
http://had.co.nz/
Re: [R] popular R packages
On Sun, Mar 8, 2009 at 10:49 AM, hadley wickham h.wick...@gmail.com wrote:
>> What can you infer in good faith from such a mess?
>
> So when we have messy data with measurement error, we should just give up? Doesn't sound very statistical! ;)

Also, I would think that the rankings would be meaningful, since the factors that cause the absolute numbers to be off would affect all packages equally.
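Gabor's argument holds exactly when the distortion is one common multiplicative factor: multiplying every count by the same positive number cannot change their order. A minimal sketch in R contrasting that case with package-specific inflation, the situation raised elsewhere in the thread (students downloading for a course, installs on several machines). All numbers are made up for illustration.

```r
# Common inflation preserves the ranking; package-specific inflation need not.
# All counts are invented for illustration.
true_users <- c(pkgA = 500, pkgB = 200, pkgC = 50)

same_bias  <- true_users * 3               # every package counted 3x
mixed_bias <- true_users * c(1, 4, 12)     # inflation differs by package

rank(-true_users)   # pkgA 1, pkgB 2, pkgC 3
rank(-same_bias)    # identical ranking
rank(-mixed_bias)   # pkgA 3, pkgB 1, pkgC 2
```

So whether download rankings are meaningful depends on how similar the inflation factors are across packages, which the download logs alone cannot reveal.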
Re: [R] popular R packages
On 08/03/2009 10:49 AM, hadley wickham wrote:
>> What can you infer in good faith from such a mess?
>
> So when we have messy data with measurement error, we should just give up? Doesn't sound very statistical! ;)

I think the situation is worse than messy. If a client comes in with data that doesn't address the question they're interested in, I think they are better served by being told that than by being given an answer that is not actually valid. They should also be told how to design a study that actually does address their question.

You (and others) have mentioned Google Analytics as a possible way to address the quality of data; that's helpful. But analyzing bad data will just give bad conclusions.

Duncan Murdoch
Re: [R] popular R packages
Duncan Murdoch wrote: [...] But analyzing bad data will just give bad conclusions. As long as we say 'package Foo is the most downloaded package on CRAN', and not 'package Foo is the most used package for R', we can leave it to the user to decide if the latter conclusion follows from the former. In the absence of actual usage data I would think it a good approximation. Not that I would risk my life on it. Pop music charts are now based on download counts, but I wouldn't believe they represent the songs that are listened to the most times. Nor would I go so far as to believe they represent the quality of the songs... Should R have a 'Would you like to tell CRAN every time you do library(foo) so we can do usage counts (no personal data is transmitted, blah blah)?' prompt? I don't think so. Barry
Re: [R] popular R packages
On 08-Mar-09 15:14:03, Duncan Murdoch wrote: [...] But analyzing bad data will just give bad conclusions. Duncan Murdoch The population of R users (which we would need to sample in order to obtain good data) is probably more elusive than a fish population in the ocean -- only partially visible at best, and with an unknown proportion invisible. At least in fisheries research, there are long-established capture techniques (from trawling to netting to electro-fishing to ...) which can be deployed, for research purposes, in such a way as to potentially reach all members of a target population, with at least a moderately good approximation to random sampling. What have we for R? Come to think of it, electro-fishing ... Suppose R were released with 2 types of cookie embedded in base R. Each type is randomly configured, when R is first run, to be Active or Inactive (probability of activation to be decided at the design stage ...).
Type 1, if active, on a certain date generates an event which brings it to the notice of R-Core (e.g. by clandestine email or by inducing a bug report). Type 2 acts similarly on a later date. If Type 2 acts, it carries with it information as to whether there was a Type 1 action, along with whether, apparently, the Type 1 action succeeded. We then have, in effect, an analogue of the Mark-Recapture technique of population estimation (along with the usual questions about equal catchability and so forth). However, since this sort of thing (which I am not proposing seriously, only for the sake of argument) is undoubtedly unethical (and would do R's reputation no good if it came to light), I tentatively conclude that the population of R users is likely to remain as elusive as ever. Best wishes to all, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Mar-09 Time: 16:11:44 -- XFMail --
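The arithmetic behind the Mark-Recapture analogy is the two-sample (Lincoln-Petersen) estimate N = n1 * n2 / m2, where n1 installations report on the first date, n2 on the second, and m2 of the second batch also show a successful first report. This is a minimal sketch, assuming a closed population (no new installations between the two dates), independent activation of the two cookie types and equal catchability; the function name and the numbers are hypothetical, and Chapman's bias-corrected variant is included because it stays finite when m2 = 0. Like the scheme itself, this is for the sake of argument only.

```r
# Two-sample mark-recapture estimate of a closed population size.
#   n1: installations whose Type 1 cookie reported (the "marked" sample)
#   n2: installations whose Type 2 cookie reported (the "recapture" sample)
#   m2: Type 2 reports that show a successful earlier Type 1 report
estimate_population <- function(n1, n2, m2) {
  stopifnot(n1 >= 0, n2 >= 0, m2 >= 0, m2 <= min(n1, n2))
  c(
    # Lincoln-Petersen: undefined when no "recaptures" are seen
    lincoln_petersen = if (m2 > 0) n1 * n2 / m2 else NA_real_,
    # Chapman's bias-corrected variant, finite even when m2 == 0
    chapman = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
  )
}

# Illustrative numbers only, not data from any real survey
est <- estimate_population(n1 = 400, n2 = 300, m2 = 60)
print(round(est, 1))  # Lincoln-Petersen gives 2000; Chapman about 1977.7
```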
Re: [R] popular R packages
Hi Ted, Following your line of thought, another idea came to mind: the next time a major release is made (there is one scheduled quite soon, actually), the core team could add a survey to the download page of the R base package, asking just one question: "Please click here if this is the first computer you are downloading this package for." This, combined with the fact that when serving a user we can obtain his IP address (which gives geographic information), could give a pretty nice rough estimate of how many major-release downloaders the R community has. Tal On Sun, Mar 8, 2009 at 6:11 PM, Ted Harding ted.hard...@manchester.ac.uk wrote: [...]
-- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: www.talgalili.com www.biostatistics.co.il
Re: [R] popular R packages
Is this another discussion of what data might be collected and analyzed, and what could and could not be said if we only had such data? Has anyone but me produced any actual data? If so, I missed it. Hadley mentioned the 'fortunes' package. My earlier methodology, RSiteSearch('library(fortunes)'), produced 40 hits for 'fortunes', compared to 169 for 'lme4' and 2 for 'DierckxSpline'. With anything like this, it would be wise to approach the problem from many different perspectives, recognizing that the strengths of one approach can help improve our understanding of what other analyses say about the question at hand. Happy Sunday. Spencer Graves (Ted Harding) wrote: [...]
Re: [R] popular R packages
On 08/03/2009 12:08 PM, Barry Rowlingson wrote: [...] As long as we say 'package Foo is the most downloaded package on CRAN', and not 'package Foo is the most used package for R', we can leave it to the user to decide if the latter conclusion follows from the former. But we don't even have that data, since CRAN is distributed across lots of mirrors. Duncan Murdoch
Re: [R] popular R packages
Dear Barry, As far as I understand, you're telling us that a bit of data mining does no harm, whatever the data. Your example of pop music charts might support your point (although my ears disagree ...), but I think it is bad policy to indulge in white-noise analysis without a well-reasoned motive to do so. It might give bad ideas to potential statistics patrons (think a bit about the sorry state of financial markets :-(). More generally, I tend to be extremely wary of over-interpreting belly grumbles as the Voice of the Spirit ... which is a very powerful urge of many statisticians and statisticians' clients. Data mining can be fine for exploratory musings, but a serious study needs a model, i.e. a set of ideas and a way to reality-stress them. As far as I can see (but I might be nearsighted), I see no model linking package download to package use(s). Data may or may not become available with more or less of an effort, but I can't see the point. Emmanuel Charpentier On Sunday 8 March 2009 at 16:08, Barry Rowlingson wrote: [...]
Re: [R] popular R packages
On 8 March 2009 at 13:27, Duncan Murdoch wrote: | But we don't even have that data, since CRAN is distributed across lots of mirrors. On 8 March 2009 at 19:01, Emmanuel Charpentier wrote: | As far as I can see (but I might be nearsighted), I see no model linking package download to package use(s). Data may or may not become available [...] Which is why Debian (and Ubuntu) use the _opt-in package_ popularity-contest, which collects data on the packages used and submits it to a host collecting the data. This drives the so-called 'popcon' statistics. Yes, and there are many ways in which one can criticise this data collection process. But I fail to see how __not having any data__ leads to more informed decisions. Once you have data, you have an option of using or discarding it. But if you have no data, you have no option. How is that better? Dirk -- Three out of two people have difficulties with fractions.
Re: [R] popular R packages
Dirk Eddelbuettel wrote: [...] Once you have data, you have an option of using or discarding it. But if you have no data, you have no option. How is that better? I've also created a package named PopCon here: http://biostat.mc.vanderbilt.edu/twiki/pub/Main/JeffreyHorner/PopCon_0.1.tar.gz I provided it to the list many months ago and got no response on its implementation or use. I encourage anyone to download it and see how it can be used to implement a popularity contest for packages, and even for functions and such. Maybe R can sponsor a Popularity Contest day, where everyone is encouraged to download the package and push some data to r-project.org, or even crantastic.org, noting which packages useRs currently have loaded on their search path... Best, Jeff -- http://biostat.mc.vanderbilt.edu/JeffreyHorner
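To illustrate what such a report might contain, base R can already list what is on the search path. This is not code from Jeff's PopCon package (its internals are not described here); `popcon_snapshot` is a hypothetical name, and the sketch only collects the information locally, without sending anything anywhere.

```r
# Hypothetical helper: gather what a popularity-contest report might include
popcon_snapshot <- function() {
  on_path <- search()
  # Attached packages appear on the search path as "package:<name>"
  attached <- sub("^package:", "", grep("^package:", on_path, value = TRUE))
  list(
    r_version = paste(R.version$major, R.version$minor, sep = "."),
    attached  = sort(attached),
    # Namespaces may be loaded without being attached (e.g. via imports)
    loaded    = sort(loadedNamespaces())
  )
}

snap <- popcon_snapshot()
str(snap)
```

Distinguishing attached packages from merely loaded namespaces matters for usage counts: a package pulled in as a dependency is loaded without the user ever calling library() on it.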
Re: [R] popular R packages
On 9/03/2009, at 4:14 AM, Duncan Murdoch wrote: ... analyzing bad data will just give bad conclusions. Fortune? cheers, Rolf Turner
Re: [R] popular R packages
Rolf Turner wrote: On 9/03/2009, at 4:14 AM, Duncan Murdoch wrote: ... analyzing bad data will just give bad conclusions. Fortune? looking for fortunes? got one for you: A key reason that R is a good thing is because it is a language who/where is left as an (easy) exercise. vQ
Re: [R] popular R packages
On 9/03/2009, at 10:23 AM, John Fox wrote: Dear Rolf, Tukey put it nicely: "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." Inasmuch as there are no current fortunes from Tukey, I nominate this one. Indeed. That is one of my favourites. I second the nomination. cheers, Rolf
Re: [R] popular R packages
On Sunday 8 March 2009 at 13:22 -0500, Dirk Eddelbuettel wrote: [...] Once you have data, you have an option of using or discarding it. But if you have no data, you have no option. How is that better? I question 1) the usefulness of the effort necessary to get the data; and 2) the very concept of data mining, which seems to be the rationale for this proposed effort. Furthermore (but this is seriously off-topic), I seriously despise the very idea of popularity in scientific debates... "Everybody does it" is *not* a valid argument. Nor is "Everyone knows". Emmanuel Charpentier
Re: [R] popular R packages
On 8 March 2009 at 23:45, Emmanuel Charpentier wrote: | On Sunday 8 March 2009 at 13:22 -0500, Dirk Eddelbuettel wrote: | Once you have data, you have an option of using or discarding it. But if you have no data, you have no option. How is that better? | I question 1) the usefulness of the effort necessary to get the data; and 2) the very concept of data mining, which seems to be the rationale for this proposed effort. Re 1), Popcon is used for a few actual tasks, for example guiding the knapsack problem of which of the 20,000+ packages should be placed on the first DVD, which on the second, and so on, simply to minimise disk swapping when installing. That's useful in my book, and solves a real problem. Also, and back to R, consider the relevant page for 'r-base' on Debian (and forgive them the ugly gnuplot chart) http://qa.debian.org/popcon.php?package=r-base This clearly shows a couple of things: - about 3% of all participating machines have r-base-core [the main R package] installed - 89% of those also install r-recommended (which pulls in VR, lattice, ...) - 63% of those have the all-in package r-base installed (which pulls in r-recommended and the documentation package) - r-mathlib is not very well used - the debug package r-base-core-dbg is possibly underused [it allows you to run gdb, by installing this package containing the matching debug symbols, without having to rebuild; these -dbg packages are very useful but eat up lots of mirror space, and whether they could or should be removed was a recent internal question] Likewise, you can look at other CRAN packages. Here is http://qa.debian.org/popcon.php?package=lme4 which shows lme4 on only about 0.3% of all machines. | Furthermore (but this is seriously off-topic), I seriously despise the very idea of popularity in scientific debates... "Everybody does it" is *not* a valid argument. Nor is "Everyone knows". TTBOMK nobody suggested this. Dirk -- Three out of two people have difficulties with fractions.
Re: [R] popular R packages
2009/3/8 Emmanuel Charpentier charp...@bacbuc.dyndns.org: I question 1) the usefulness of the effort necessary to get the data; and 2) the very concept of data mining, which seems to be the rationale for this proposed effort. Furthermore (but this is seriously off-topic), I seriously despise the very idea of popularity in scientific debates... "Everybody does it" is *not* a valid argument. Nor is "Everyone knows". As long as we agree that package downloads != popularity, then we have useful data. Usefulness of the data? Let's think... Suppose we discover that spatstat is downloaded 100 times more often than splancs. Both packages compute K-functions of spatial data. Pretend there's an enhancement to K-function computation that could be implemented in spatstat and/or splancs. Why bother doing it in splancs? Currently the only usage stats we have are even worse measures, such as the number of mentions in R-help or the number of bug reports. Or maybe citation counts, but who would make important decisions based on those? I'd love to go 'Hmmm, how many people are using my package?' and get an exact answer. Given the impossibility of that information, I'd love to go 'Hmmm, how many people downloaded my package?', a good approximation to which is not beyond the bounds of our technology. Web pages have had annoying 'this piece of software has been downloaded 443535 times' banners (often enclosed in blink tags) since 1996. Yes, it would require some effort at each CRAN site, but maybe the CRAN mirror site maintainers might be interested in doing this. If they don't want to, then fine. Barry
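Barry's "good approximation" could in principle come from mirror access logs. This is a minimal sketch, assuming Apache-style log lines and source tarballs requested under /src/contrib/ as name_version.tar.gz; the log lines, the helper name `count_downloads` and the pooling across two mirrors are all hypothetical. The optional distinct-host count is a crude way to damp repeat downloads, although client addresses conflate users behind shared proxies and split users who have several machines.

```r
# Made-up access-log lines from two hypothetical mirrors (Apache-style)
mirror_a <- c(
  '1.2.3.4 - - [08/Mar/2009:10:00:00 +0000] "GET /src/contrib/zoo_1.5-5.tar.gz HTTP/1.1" 200 1000',
  '5.6.7.8 - - [08/Mar/2009:10:05:00 +0000] "GET /src/contrib/spatstat_1.14-9.tar.gz HTTP/1.1" 200 2000',
  '1.2.3.4 - - [08/Mar/2009:11:00:00 +0000] "GET /src/contrib/zoo_1.5-5.tar.gz HTTP/1.1" 200 1000'
)
mirror_b <- c(
  '9.9.9.9 - - [08/Mar/2009:12:00:00 +0000] "GET /src/contrib/splancs_2.01-24.tar.gz HTTP/1.1" 200 1500',
  '9.9.9.9 - - [08/Mar/2009:12:01:00 +0000] "GET /web/packages/index.html HTTP/1.1" 200 500'
)

count_downloads <- function(log_lines, unique_hosts = FALSE) {
  # Capture the package name from a request for name_version.tar.gz
  pattern <- "GET /src/contrib/([A-Za-z][A-Za-z0-9.]*)_[^ /]+\\.tar\\.gz"
  m <- regmatches(log_lines, regexec(pattern, log_lines))
  pkg <- vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_,
                character(1))
  host <- sub(" .*$", "", log_lines)  # first field: client address
  hits <- data.frame(pkg = pkg, host = host, stringsAsFactors = FALSE)
  hits <- hits[!is.na(hits$pkg), ]    # drop non-tarball requests
  # Optionally count each client address at most once per package
  if (unique_hosts) hits <- unique(hits)
  sort(table(hits$pkg), decreasing = TRUE)
}

# Pool the logs of both mirrors before counting
print(count_downloads(c(mirror_a, mirror_b)))
print(count_downloads(c(mirror_a, mirror_b), unique_hosts = TRUE))
```

Pooling before counting is the point of the design: a per-mirror count alone says little when CRAN is spread across many mirrors.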
Re: [R] popular R packages
Barry Rowlingson wrote: [...] Web pages have had annoying 'this piece of software has been downloaded 443535 times' banners (often enclosed in blink tags) since 1996. Yes, it would require some effort at each CRAN site, but maybe the CRAN mirror site maintainers might be interested in doing this. If they don't want to, then fine. Here are a few other uses that I would put the data to: * In my tenure case, grant applications, etc., I can say how many people have downloaded my packages. * If relatively few people are using a package, I'd know that I either need to promote the package more, or improve it so that it is useful to more people. * At a higher level, it would be interesting to see what types of packages are most frequently downloaded. Modelling packages? Graphics packages? Packages for particular applications? ... Hadley -- http://had.co.nz/
Re: [R] popular R packages
This function will show which other packages depend on a particular package:

dep <- function(pkg, AP = available.packages()) {
    # Match pkg as a whole word, so that e.g. "zoo" does not match "zoology"
    pkg <- paste("\\b", pkg, "\\b", sep = "")
    # Packages whose Depends field mentions pkg
    cat("Depends:", rownames(AP)[grep(pkg, AP[, "Depends"])], "\n")
    # Packages whose Suggests field mentions pkg
    cat("Suggests:", rownames(AP)[grep(pkg, AP[, "Suggests"])], "\n")
}
dep("zoo")
Depends: AER BootPR FinTS PerformanceAnalytics RBloomberg StreamMetabolism TSfame TShistQuote VhayuR dyn dynlm fda fxregime lmtest meboot party quantmod sandwich sde strucchange tripEstimation tseries xts
Suggests: TSMySQL TSPostgreSQL TSSQLite TSdbi TSodbc UsingR Zelig gsubfn playwith pscl tframePlus

On Sat, Mar 7, 2009 at 2:57 PM, Jeroen Ooms j.c.l.o...@uu.nl wrote: I would like to get some idea of which R packages are popular, and what R is used for in general. Are there any statistics available on which R packages are downloaded often, or is there something like a package survey? Something similar to http://popcon.debian.org/ maybe? Any tips are welcome! - Jeroen Ooms * Dept. of Methodology and Statistics * Utrecht University Visit http://www.jeroenooms.com to explore some of my current projects.
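Counting reverse dependencies for every package, rather than listing them for one, gives another crude popularity proxy. This is a minimal sketch on a made-up matrix shaped like the output of available.packages() (columns "Depends" and "Suggests", package names as row names); `reverse_counts` is a hypothetical helper, not part of any package. Unlike the regular-expression search above, it splits each field on commas and strips version requirements, so a package is only counted when its exact name appears. With real data one would pass available.packages(), which needs access to a CRAN mirror; other dependency fields could be treated the same way.

```r
# Made-up stand-in for available.packages(): a character matrix with
# "Depends" and "Suggests" columns and package names as row names
AP <- cbind(
  Depends  = c("R (>= 2.4.0)", "zoo", "xts, zoo", "R (>= 2.0.0), zoo", NA),
  Suggests = c(NA, NA, NA, NA, "zoo, lattice")
)
rownames(AP) <- c("zoo", "xts", "quantmod", "lmtest", "UsingR")

# Hypothetical helper: how many packages list each package in `field`
reverse_counts <- function(AP, field = "Depends") {
  entries <- AP[, field]
  entries[is.na(entries)] <- ""
  deps <- unlist(strsplit(entries, ","))
  # Drop version requirements such as "(>= 2.4.0)" and surrounding whitespace
  deps <- gsub("^[[:space:]]+|[[:space:]]+$", "", sub("\\(.*\\)", "", deps))
  # Keep only names of packages in AP (this drops "R" itself)
  deps <- deps[deps %in% rownames(AP)]
  sort(table(factor(deps, levels = rownames(AP))), decreasing = TRUE)
}

print(reverse_counts(AP))
print(reverse_counts(AP, "Suggests"))
```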
Re: [R] popular R packages
When the question "How many R users are there?" arises, the consensus seems to be that there is no valid method to answer it. The thread "R-business case" from 2004 can be found here: https://stat.ethz.ch/pipermail/r-help/2004-March/047606.html I did not see any material revision to that conclusion during the recent discussion of the New York Times article on R's challenge to SAS. Gmane tracks r-help activity (I realize that is not what you asked for): http://www.gmane.org/info.php?group=gmane.comp.lang.r.general The distribution of R packages is, well ... distributed: http://cran.r-project.org/mirrors.html At least one of the participants in the 2004 thread suggested that it would be a good thing to track the numbers of downloads by package. I have not heard of any such system being installed in the mirror software, and I see nothing that suggests data gathering in the CRAN Mirror How-to: http://cran.r-project.org/mirror-howto.html On the other hand, I am not part of R-core, so you must await a more authoritative opinion, since a 5-year-old thread and amateur speculation are not much of a leg to stand on.

There are lexicographic packages for R. One approach to a de novo analysis would be some sort of natural language analysis of the r-help archives, counting either mentions of packages with non-English names, or occurrences of the words "library" or "package" in close proximity to package names that overlap the 30,000 most common English words. That would risk inflating the counts of the packages with the least adequate documentation or a paucity of good worked examples, but many readers of this list suspect that new users don't look at the documentation, so who knows?

-- David Winsemius

On Mar 7, 2009, at 2:57 PM, Jeroen Ooms wrote: I would like to get some idea of which R packages are popular, and what R is used for in general. Are there any statistics available on which R packages are downloaded often, or is there something like a package survey? Something similar to http://popcon.debian.org/ maybe? Any tips are welcome! - Jeroen Ooms * Dept. of Methodology and Statistics * Utrecht University Visit www.jeroenooms.com to explore some of my current projects.

David Winsemius, MD Heritage Laboratories West Hartford, CT
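David's archive-mining idea can be sketched in a few lines of R. This is a simplified, hypothetical variant rather than the proximity analysis he describes: it only counts explicit library() or require() calls that name a package, which sidesteps the problem of package names that are also common English words. The function name count_package_mentions, and the assumption that the archive messages are already loaded into a character vector, are illustrative only.

```r
# Hypothetical helper (not from any package): count how often each package
# appears inside a library(...) or require(...) call in a set of messages.
count_package_mentions <- function(messages, packages) {
  # Matches library(pkg), require(pkg), library("pkg") and require('pkg');
  # the second capture group is the package name.
  pattern <- "(library|require)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?\\s*\\)"
  calls <- regmatches(messages, gregexpr(pattern, messages, perl = TRUE))
  # Reduce each matched call to just the package name it mentions.
  found <- unlist(lapply(calls, function(m) sub(pattern, "\\2", m, perl = TRUE)))
  # One count per requested package, including zero for unmentioned ones.
  counts <- vapply(packages, function(p) sum(found == p), integer(1))
  sort(counts, decreasing = TRUE)
}

# Made-up example messages, not real r-help posts.
msgs <- c("I ran library(lme4) and then library(nlme)",
          "Did you try require('lme4') first?",
          "The maps package has good documentation.")
count_package_mentions(msgs, c("lme4", "nlme", "maps"))
```

Note that the third message mentions maps in prose only, so this stricter pattern does not count it; that is the trade-off of matching calls instead of word proximity.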
Re: [R] popular R packages
I don't think the suggestion that "it would be a good thing to track the numbers of downloads by package" is reasonable, because I download R packages for 2 home computers (laptop and desktop) and 2 at work (1 Linux, 1 Mac). There must be many such cases… Tom

-- Thomas E Adams National Weather Service Ohio River Forecast Center 1901 South State Route 134 Wilmington, OH 45177 EMAIL: thomas.ad...@noaa.gov VOICE: 937-383-0528 FAX: 937-383-0033
Re: [R] popular R packages
I agree with Thomas; over the years I have installed R on at least 5 computers. BTW: does anyone know how the website statistics of r-project.org are analyzed? Since I can't see any Google Analytics or other tracking code on the main website, I am guessing someone might be running a log-file analyzer, but I'd rather hear that than assume.

On Sun, Mar 8, 2009 at 12:45 AM, Thomas Adams thomas.ad...@noaa.gov wrote: [...]

-- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: www.talgalili.com www.biostatistics.co.il
Re: [R] popular R packages
Quite so. It certainly is the case that Dirk Eddelbuettel suggested it would be very desirable, and I think Dirk's track record speaks for itself. I never said (and I am sure Dirk never intended) that one could take the raw numbers as a basis for blandly asserting that copies of ttt package are currently installed. When I update packages, the automated process takes hold and I go for a cup of coffee. At the moment I have only two computers with R installed, and I have not updated any binary packages on Windoze in over a year. Nonetheless, I do think the relative numbers of package downloads might be interpretable, or at the very least, the basis for discussions over beer.

-- David Winsemius

On Mar 7, 2009, at 5:45 PM, Thomas Adams wrote: [...]

David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] popular R packages
"I agree with Thomas, over the years I have installed R on at least 5 computers." I don't see why per-machine statistics would not be useful. If you have installed a package on five machines, you probably use it a lot, and it is more important to you than packages that you installed only once. Furthermore, I don't think the distribution of packages across mirrors has to be a problem. I would guess downloads are only weakly related to the specific mirror, so download statistics from one of the popular mirrors would do for me. Of course these statistics are never perfect, but they could be informative...
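The single-mirror idea, and the question of how such statistics would be computed from log files, can be illustrated with a rough sketch. It assumes, hypothetically, that a mirror keeps an Apache-style access log and that package files are requested under a .../contrib/ path as <pkg>_<version>.tar.gz, .zip or .tgz. It reports raw downloads next to distinct client addresses; the latter is only a crude proxy for "machines", since shared (NAT) and dynamic addresses blur it in both directions. The function name and the sample log lines are invented for illustration.

```r
# Hypothetical sketch: tally package downloads from access-log lines of one
# mirror, assuming Apache "common log format" lines.
tally_downloads <- function(log_lines) {
  # Client address is the first whitespace-delimited field.
  ip <- sub(" .*$", "", log_lines)
  # Requested path is the token after GET; lines without a GET request are
  # left unchanged here and rejected by the package-file filter below.
  path <- sub('^.*"GET ([^ ]+) .*$', "\\1", log_lines)
  # Keep only package archives under a contrib directory.
  is_pkg <- grepl("/contrib/.*_[^/]+\\.(tar\\.gz|zip|tgz)$", path)
  # File name is <package>_<version>.<ext>; package names contain no "_".
  pkg <- sub("^.*/([^/_]+)_[^/]+$", "\\1", path[is_pkg])
  ip <- ip[is_pkg]
  if (length(pkg) == 0) {
    return(data.frame(package = character(0), downloads = integer(0),
                      distinct_ips = integer(0)))
  }
  downloads <- tapply(pkg, pkg, length)                           # all requests
  distinct_ips <- tapply(ip, pkg, function(x) length(unique(x)))  # unique clients
  data.frame(package = names(downloads),
             downloads = as.vector(downloads),
             distinct_ips = as.vector(distinct_ips[names(downloads)]),
             stringsAsFactors = FALSE)
}

# Invented log lines (documentation-style addresses, made-up versions).
demo_log <- c(
  '1.2.3.4 - - [07/Mar/2009:10:00:00 -0500] "GET /src/contrib/lme4_1.0.tar.gz HTTP/1.1" 200 100',
  '1.2.3.4 - - [07/Mar/2009:10:05:00 -0500] "GET /src/contrib/lme4_1.0.tar.gz HTTP/1.1" 200 100',
  '5.6.7.8 - - [07/Mar/2009:11:00:00 -0500] "GET /bin/windows/contrib/2.8/lme4_1.0.zip HTTP/1.1" 200 100',
  '5.6.7.8 - - [07/Mar/2009:11:01:00 -0500] "GET /src/contrib/PACKAGES.gz HTTP/1.1" 200 100',
  '9.9.9.9 - - [07/Mar/2009:12:00:00 -0500] "GET /src/contrib/nlme_2.0.tar.gz HTTP/1.1" 200 100'
)
tally_downloads(demo_log)
```

The PACKAGES.gz index request is ignored, and the repeated request from one address shows up as an extra download but not as an extra distinct client.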
Re: [R] popular R packages
i have kept r installed on more than ten computers during the past few years, some of them running win + more than one linux distro, all of them having r, most often installed from a separate download. i know of many cases where students download r for the purpose of a course in statistics -- often an introductory course for students who otherwise have little to do with stats. some of them do it more than once during the semester, and many of them never use r again. taking into account that basic statistics courses are taught to most university students and that r is surely the most popular free statistical computing environment, download-based usage estimates may be a bit optimistic, unless 'usage' is taken to include 'learn-pass-forget'. vQ
Re: [R] popular R packages
I just did RSiteSearch("library(xxx)") with xxx = the names of 6 packages familiar to me, with the following numbers of hits:

 hits  package
  169  lme4
  165  nlme
    6  fda
    4  maps
    2  FinTS
    2  DierckxSpline

Software could be written to (1) extract the names of current packages from CRAN, then (2) perform queries similar to this on all such packages and summarize the results. I don't have the time now to write code for this, but I've written similar code before for step (1); it can be found in scripts/TsayFiles.R in the FinTS package on CRAN. For step (2), Sundar Dorai-Raj wrote code that is included in the preliminary RSiteSearch package, available from R-Forge via install.packages("RSiteSearch", repos = "http://r-forge.r-project.org"). Code to do this could probably be written (a) in a matter of seconds by many of those in the R Core team or (b) in a matter of hours by virtually any reader of this list using the examples I just cited. And it could provide numbers without a need to convince others to keep download statistics and make them available later. Hope this helps. Spencer Graves
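Spencer's two steps might be organized roughly as follows. This is only a sketch under stated assumptions: step (1) uses utils::available.packages(), which needs network access to a CRAN mirror (the cloud.r-project.org URL is an assumed default; any mirror should do). Step (2) takes the hit-counting function as an argument, because RSiteSearch() opens a browser and returns the query URL rather than a hit count. count_hits is therefore a hypothetical placeholder for whatever archive search one wires in; the demo substitutes a lookup table holding the six counts Spencer reported.

```r
# Step (1): names of the packages currently on a CRAN mirror.
# Requires network access; the default mirror URL is an assumption.
cran_package_names <- function(repos = "https://cloud.r-project.org") {
  rownames(utils::available.packages(repos = repos))
}

# Step (2): run one query per package and rank the results.
# `count_hits` is a hypothetical placeholder: any function that takes a
# query string such as "library(lme4)" and returns a single number.
rank_by_hits <- function(packages, count_hits) {
  queries <- sprintf("library(%s)", packages)
  hits <- vapply(queries, function(q) as.numeric(count_hits(q)), numeric(1),
                 USE.NAMES = FALSE)
  result <- data.frame(hits = hits, package = packages,
                       stringsAsFactors = FALSE)
  # Most-mentioned packages first; ties broken alphabetically.
  result <- result[order(-result$hits, result$package), , drop = FALSE]
  rownames(result) <- NULL
  result
}

# Demo with a fixed lookup table instead of a live archive search.
reported <- c("library(lme4)" = 169, "library(nlme)" = 165,
              "library(fda)" = 6, "library(maps)" = 4,
              "library(FinTS)" = 2, "library(DierckxSpline)" = 2)
rank_by_hits(c("fda", "maps", "lme4", "FinTS", "nlme", "DierckxSpline"),
             function(q) reported[[q]])
# A live run would look like rank_by_hits(cran_package_names(), my_counter),
# where my_counter is whatever counting function you supply.
```

Passing the counter in as a function keeps the ranking logic testable offline and lets the search backend be swapped without touching step (1).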
Re: [R] popular R packages
Hi Spencer, XLSolutions is currently analyzing r-help archived questions to rank packages for the upcoming R-PLUS 3.3 Professional version and we will be happy to share the outcome with interested parties. Please email d...@xlsolutions-corp.com Regards - Sue Turner Senior Account Manager XLSolutions Corporation North American Division 1700 7th Ave Suite 2100 Seattle, WA 98101 Phone: 206-686-1578 Email: s...@xlsolutions-corp.com web: www.xlsolutions-corp.com --- On Sat, 3/7/09, Spencer Graves spencer.gra...@prodsyse.com wrote: From: Spencer Graves spencer.gra...@prodsyse.com Subject: Re: [R] popular R packages To: Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no Cc: r-help@r-project.org, Jeroen Ooms j.c.l.o...@uu.nl, Thomas Adams thomas.ad...@noaa.gov Date: Saturday, March 7, 2009, 5:22 PM [...]
Re: [R] popular R packages
Hi all, I'm kind of amazed at the answers suggested for the relatively simple question "How many times has each R package been downloaded?" Some have veered off in another direction, like working out how many packages a package depends upon, or whether someone downloads more than one copy. The response about ranking packages by the number of questions asked about them may be interesting, but may not relate very well at all to popularity in terms of downloads. If people were constantly asking questions about one of the packages I maintain, I would be working on the help pages to improve them, not basking in the inferred glory of having a popular package. There is one way that the download count would be very useful for package maintainers, if no one else. Take as an example the package concord, which has not been maintained for a year or more since its content was merged into the irr package. If I knew that no one downloaded concord anymore, I would surely petition those in charge of the archive to remove it, or at least transfer it to the package museum. There is no point in having ever more packages on CRAN if they are never downloaded. Jim
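Once per-package download counts exist, Jim's maintainer use case amounts to a simple filter. A minimal sketch, assuming a hypothetical table with one row per package per period (columns package, date, n); the function name, the cutoff date and the toy figures are illustrative only and say nothing about any real package.

```r
# Hypothetical sketch: list packages with no downloads since a given date.
# `downloads` is assumed to have columns package, date (Date) and n (count).
stale_packages <- function(downloads, since) {
  pkgs <- as.character(downloads$package)
  recent <- downloads$date >= since
  # Sum recent downloads per package; using all packages as factor levels
  # gives every package a slot, even one with no recent rows at all.
  totals <- tapply(downloads$n[recent],
                   factor(pkgs[recent], levels = unique(pkgs)), sum)
  totals[is.na(totals)] <- 0   # no recent rows means zero recent downloads
  sort(names(totals)[totals == 0])
}

# Toy data with invented package names and counts.
toy <- data.frame(
  package = c("oldpkg", "oldpkg", "newpkg", "newpkg"),
  date = as.Date(c("2007-01-15", "2007-06-01", "2008-12-01", "2009-02-01")),
  n = c(10, 5, 20, 30),
  stringsAsFactors = FALSE
)
stale_packages(toy, since = as.Date("2008-03-01"))
```

A real archiving decision would of course also weigh reverse dependencies and whether a successor package exists, as irr does for concord.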