While reading all of these comments I started thinking about how you might collect the data. As has been pointed out in many ways, there are immediate problems with the validity of what's collected. That is, what is the concept that we are trying to measure?
So, that was always going to make it hard, but assuming that you got past that point and you could ask a question, who would you ask it of? Here in Western Australia there's a small and mostly unconnected fraternity of people who use R. This is a place where even strangers seem somehow kind of familiar (you've been sitting on the same bus for the last 20 years and still don't know who they are), so my gut feeling is that it's not a major penetration. Trying to do a random sample would be problematic to say the least. If you started to try and stratify the sample, where would you start? If you go to places where you know R is being used you have problems with bias. That might rule out universities as a stratified sample because we're not really clear about what we are asking.

So what other sources are there? Well, there's been some comment about the mailing list and all the problems that might be involved in that, and in counting downloads. Guess that's out. Then I thought about the vibrancy of the R email list. For a topic such as a statistical programming language it's unusual to see such activity (at least in my experience).

So my mind turned to trying to measure some smaller but representative process, in much the same way that criminologists use the homicide rate as the pointed end of the violence spectrum (let's not go there; find another list to have that debate).

So I started thinking about what questions you might ask the typical SPSS or SAS or ... user that would help. That's when it struck me. Lots of us, R users that is, also have the luck (some might say misfortune) to work with other languages. Why do we pick R? The one thing that I have noticed that separates R users and S-PLUS users (and probably users of some of the other products that I don't use or know about) from the mainstream users of SPSS and SAS (at least here in WA) is that the mainstream users are not pushing the envelope.
So the question is not "How many users are there?", it's "What are people doing with it?" I put the formal citation in each publication I produce, and while they are mainly in-house productions (a problem with applied research), maybe they'll eventually start to get into the citation charts. I hear some academics love browsing these. (That is aimed at no-one who's on this list.)

Tom Mulholland
Tom Mulholland Associates

Footnote: When 5 out of 6 paragraphs start with the word "So" it's time to get a life. So I'm off to get a life.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Kevin S. Van Horn
Sent: Tuesday, April 20, 2004 2:13 AM
To: [EMAIL PROTECTED]
Subject: [R] Size of R user base

I have been trying to determine the size of the R user base, and was asked to share my findings with this mailing list. Although I still don't have any definite estimate of this number, I do have some interesting and indicative information:

1. It appears that there are about 100,000 S-PLUS users.

Rationale: According to Insightful's 2002 Annual Report, over 100,000 people use Insightful software; since license revenues from S-PLUS and add-on modules accounted for nearly all of their license revenues in 2002, and their other products are much more costly than S-PLUS, it seems that the great majority of users of Insightful software are S-PLUS users.

Conclusion: S-PLUS costs $3500 (Windows) or $4500 (Linux/Unix) for an individual copy; R is free. This suggests that there may be more R users than S-PLUS users, which suggests > 100,000 R users. Does anyone have any other information that would give some notion as to the RELATIVE numbers of R and S-PLUS users?

2. At least one R book has achieved sales of just over 5,000 copies. (I could not find sales figures for other R books, as it appears that publishers are closed-mouthed about such figures. And no, I can't reveal which particular book this was, so don't ask.)
Conclusion: Very few books sell to more than 12% of the population of potential buyers, and most books have a far lower penetration -- 1% or less is not uncommon. A 12% penetration for the book in question implies 42,000 R users; a more reasonable 5% penetration implies 100,000 users. A low 1% penetration implies 500,000 users.

3. There are a total of 3225 unique subscribers to the three R mailing lists.

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
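
For what it's worth, the penetration arithmetic in point 2 of the quoted message is easy to check: the implied user base is just copies sold divided by the assumed penetration rate. Below is a minimal Python sketch of that calculation. The names `implied_user_base` and `COPIES_SOLD` are made up for illustration, and the sales figure is rounded to 5,000 (the message says "just over 5,000"), so treat the results as rough.

```python
def implied_user_base(copies_sold: int, penetration: float) -> float:
    """Estimate the population size if `copies_sold` is `penetration` of it.

    `penetration` is a fraction in (0, 1], e.g. 0.05 for 5%.
    """
    if not 0 < penetration <= 1:
        raise ValueError("penetration must be a fraction in (0, 1]")
    return copies_sold / penetration


# Assumption: "just over 5,000 copies" is rounded to exactly 5,000 here.
COPIES_SOLD = 5_000

# The three penetration rates discussed in the quoted message.
for rate in (0.12, 0.05, 0.01):
    estimate = implied_user_base(COPIES_SOLD, rate)
    print(f"{rate:.0%} penetration -> about {estimate:,.0f} users")
```

At 12% this comes to a little under 42,000, which matches the rounded figure in the message; the 5% and 1% cases give 100,000 and 500,000. The estimate is obviously very sensitive to the assumed rate, which is the weak point of the whole approach.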