Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated

2011-03-25 Thread Allan Engelhardt
Not R, but just to get the data (format is month year,week,count) to 
compare with your students' output:


perl -MLWP::UserAgent -e 'my $ua = LWP::UserAgent->new(); my $l = 
$ua->request(HTTP::Request->new(GET => 
qq{http://www.listserv.uga.edu/archives/sas-l.html}))->content(); while 
( $l =~ m{href="(/cgi-bin/wa\?A1=.*?)">(.*?), \s week \s 
(\d)</a>}igxms ) { my ($m,$w) = ($2,$3); my $u = 
qq{http://www.listserv.uga.edu$1}; my $i = 
$ua->request(HTTP::Request->new(GET => qq{$u}))->content(); my $c = () = 
$i =~ m{href="/cgi-bin/wa\?A2}igxms; print qq{$m,$w,$c\n}; sleep(1); }'

March 2011,4,122
March 2011,3,255
March 2011,2,312
March 2011,1,318
February 2011,4,243
February 2011,3,230
February 2011,2,289
February 2011,1,354
January 2011,5,93
January 2011,4,345
January 2011,3,329
January 2011,2,385
January 2011,1,297
December 2010,5,110
December 2010,4,205
December 2010,3,290
December 2010,2,359
December 2010,1,311
November 2010,5,91
November 2010,4,246
November 2010,3,227
November 2010,2,262
November 2010,1,228
October 2010,5,51
October 2010,4,242
October 2010,3,212
October 2010,2,250
October 2010,1,384
September 2010,5,101
September 2010,4,345
September 2010,3,214
September 2010,2,287
September 2010,1,133
August 2010,5,90
August 2010,4,314
August 2010,3,242
August 2010,2,202
August 2010,1,234
July 2010,5,95
July 2010,4,231
July 2010,3,354
July 2010,2,306
July 2010,1,176
June 2010,5,98
June 2010,4,244
June 2010,3,147
June 2010,2,229
June 2010,1,216
May 2010,5,25
May 2010,4,237
May 2010,3,328
May 2010,2,261
May 2010,1,341
April 2010,5,167
April 2010,4,291
April 2010,3,324
April 2010,2,273
April 2010,1,288
March 2010,5,217
March 2010,4,351
March 2010,3,437
March 2010,2,524
March 2010,1,456
February 2010,4,473
February 2010,3,348
February 2010,2,379
February 2010,1,347
January 2010,5,108
January 2010,4,482
January 2010,3,387
January 2010,2,424
January 2010,1,398
December 2009,5,88
December 2009,4,252
December 2009,3,443
December 2009,2,373
December 2009,1,514
November 2009,5,158
November 2009,4,318
November 2009,3,461
November 2009,2,383
November 2009,1,494
October 2009,5,186
October 2009,4,515
October 2009,3,532
October 2009,2,547
October 2009,1,410
September 2009,5,209
September 2009,4,457
September 2009,3,435
September 2009,2,371
September 2009,1,355
August 2009,5,87
August 2009,4,528
August 2009,3,436
August 2009,2,526
August 2009,1,412
July 2009,5,270
July 2009,4,423
July 2009,3,449
July 2009,2,380
July 2009,1,346
June 2009,5,156
June 2009,4,390
June 2009,3,510
June 2009,2,473
June 2009,1,450
May 2009,5,107
May 2009,4,476
May 2009,3,487
May 2009,2,583
May 2009,1,494
April 2009,5,219
April 2009,4,592
April 2009,3,574
April 2009,2,516
April 2009,1,547
March 2009,5,284
March 2009,4,571
March 2009,3,553
March 2009,2,584
March 2009,1,691
February 2009,4,646
February 2009,3,508
February 2009,2,688
February 2009,1,489
[...]
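
If monthly rather than weekly figures are wanted, the month year,week,count lines above can be summed per month. A minimal sketch in Python (again not R), assuming the output has been saved as plain text; the function name monthly_totals is made up for illustration, and the sample is just the first eight lines of the listing:

```python
import csv
from collections import defaultdict
from io import StringIO


def monthly_totals(text):
    """Sum the weekly counts per month from 'month year,week,count' lines."""
    totals = defaultdict(int)
    for row in csv.reader(StringIO(text)):
        if len(row) != 3:
            continue  # skip blank lines and the "[...]" truncation marker
        month, _week, count = row
        totals[month] += int(count)
    return dict(totals)


# First eight lines of the listing above.
sample = """March 2011,4,122
March 2011,3,255
March 2011,2,312
March 2011,1,318
February 2011,4,243
February 2011,3,230
February 2011,2,289
February 2011,1,354
"""

totals = monthly_totals(sample)
for month, total in totals.items():
    print(f"{month},{total}")
```

Rows that do not have exactly three fields are skipped, so the trailing [...] marker does no harm.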

Hope this helps a little.

Allan
(Who thinks it is very sad that he can remember that $c=()=$a=~$b 
construct...)
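
For anyone who does not read Perl: as far as I understand it, the empty-list assignment forces the global match into list context, and the outer scalar assignment then yields the number of matches. Roughly the same count in Python is sketched below; the HTML fragment is made up, and the quoting of the href attribute is an assumption about the archive pages.

```python
import re

# Made-up fragment standing in for one weekly index page (an assumption,
# not copied from the real archive).
page = '''
<a href="/cgi-bin/wa?A2=ind1103d&L=sas-l&P=100">Re: proc sql</a>
<a href="/cgi-bin/wa?A2=ind1103d&L=sas-l&P=200">Re: macro question</a>
<a href="/cgi-bin/wa?A1=ind1103c&L=sas-l">March 2011, week 3</a>
'''

# findall returns every non-overlapping match, so its length is the count
# of message links (A2), ignoring the week-index link (A1).
count = len(re.findall(r'href="/cgi-bin/wa\?A2', page, flags=re.IGNORECASE))
print(count)
```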



On 22/03/11 23:26, Bob Muenchen wrote:
> On 3/22/2011 5:15 PM, Hadley Wickham wrote:
>>> I don't doubt that R may be the most popular in terms of
>>> discussion group
>>> traffic, but you should be aware that the traffic for SAS comprises two
>>> separate lists that used to be mirrored, but are no longer linked
>>> Usenet --  news://comp.soft-sys.sas  (what you counted)
>>> listserve -- SAS-L http://www.listserv.uga.edu/archives/sas-l.html
>>
>> R programming challenge: create a script that parses those html pages
>> to compute the total number of messages per week!  (Maybe I'll use
>> this in class)
>>
>> Hadley
>
> That would be nice! I'd love to have all the sources, which includes
> various company forums. Sounds like students could be kept busy for
> quite a few projects. If any pull it off, please send it my way!
>
> Cheers,
> Bob



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated

2011-03-25 Thread Mike Marchywka

> Date: Fri, 25 Mar 2011 09:40:39 +
> From: all...@cybaea.com
> To: muenchen@gmail.com
> CC: frien...@yorku.ca; had...@rice.edu; r-h...@stat.math.ethz.ch
> Subject: Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated
>
> Not R, but just to get the data (format is month year,week,count) to
> compare with your students' output:
>
> [...]
> Hope this helps a little.
>
> Allan
> (Who thinks it is very sad that he can remember that $c=()=$a=~$b
> construct...)
>
>
> On 22/03/11 23:26, Bob Muenchen wrote:
>> On 3/22/2011 5:15 PM, Hadley Wickham wrote:
>> I don't doubt that R may be the most popular in terms of
>> discussion group
>> traffic, but you should be aware that the traffic for SAS comprises two
>> separate lists that used to be mirrored, but are no longer linked


I think this discussion highlights the need for more structured document
formats on the internet, so that you can separate out mirrored or copied
text (see, for example, all the ad-supported sites that simply copy Wikipedia
content).
I've sometimes done things like this with PubMed
citations, but they provide something called the eutils API


http://eutils.ncbi.nlm.nih.gov/

so you don't need to scrape HTML or other human-readable content.
It is then easy to plot paper or author count as a function of year
for some keyword criteria, and it can be interesting to see how
fads come and go.
You see a lot of questions here on how to use R to scrape HTML,
and it is a big problem in doing many kinds of analysis. Yahoo
is doing a nice service by making downloads of historical data available,
but this is not common; even places like the Census often only offer
Excel-format downloads (while this is fine for R users, CSV files would be just
as good and reach a wider audience),
and some places do require you to make complicated POST or other request types.
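
To make the eutils point concrete, here is a minimal sketch in Python of asking for a per-year citation count instead of scraping. It only builds the request URL and parses a reply; nothing is fetched. The esearch.fcgi endpoint and the db, term, rettype, datetype, mindate and maxdate parameters are as I understand the E-utilities documentation, so check them before relying on this. The helper names, the search term and the sample reply are made up for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# ESearch endpoint (as I understand the E-utilities documentation).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_count_url(term, year):
    """Build a URL asking PubMed how many records match `term` in `year`."""
    params = {
        "db": "pubmed",
        "term": term,
        "rettype": "count",   # ask for the hit count only
        "datetype": "pdat",   # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
    }
    return ESEARCH + "?" + urlencode(params)


def parse_count(xml_text):
    """Extract the integer inside <Count> from an ESearch XML reply."""
    return int(ET.fromstring(xml_text).findtext("Count"))


url = esearch_count_url("bootstrap", 2010)

# Made-up reply, for illustration only; a real one would come from `url`.
sample_response = "<eSearchResult><Count>42</Count></eSearchResult>"
count = parse_count(sample_response)
print(url)
print(count)
```

Looping esearch_count_url over a range of years (with a polite pause between requests) would give the count-by-year series to plot.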




>> Usenet -- news://comp.soft-sys.sas (what you counted)
>> listserve -- SAS-L http://www.listserv.uga.edu/archives/sas-l.html
>> R programming challenge: create a script that parses those html pages
>> to compute the total number of messages per week! (Maybe I'll use
>> this in class)
>>
>> Hadley
>>
>>
>> That would be nice! I'd love to have all the sources, which includes
>> various company forums. Sounds like students could be kept busy for
>> quite a few projects. If any pull it off, please send it my way!
>>
>> Cheers,
>> Bob
>



[R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated

2011-03-22 Thread Muenchen, Robert A (Bob)
Greetings,

I've just put out the latest version of The Popularity of Data Analysis 
Software at http://r4stats.com/popularity. This update includes complete data 
for 2010, the addition of number of blogs for each software, more coverage of 
Statistica, and, where possible, measures regarding the implementations of the 
SAS Language: Carolina and the World Programming System (WPS).

Cheers,
Bob

=
  Bob Muenchen (pronounced Min'-chen), Manager  
  Research Computing Support
  Voice: (865) 974-5230  
  Email: muenc...@utk.edu
  Web:   http://oit.utk.edu/research, 
  News:  http://oit.utk.edu/research/news.php




Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated

2011-03-22 Thread Michael Friendly

On 3/22/2011 6:37 AM, Muenchen, Robert A (Bob) wrote:
> Greetings,
>
> I've just put out the latest version of The Popularity of Data Analysis
> Software at http://r4stats.com/popularity. This update includes complete data
> for 2010, the addition of number of blogs for each software, more coverage of
> Statistica, and, where possible, measures regarding the implementations of the
> SAS Language: Carolina and the World Programming System (WPS).



I don't doubt that R may be the most popular in terms of discussion 
group traffic, but you should be aware that the traffic for SAS 
comprises two separate lists that used to be mirrored, but are no 
longer linked:
Usenet --  news://comp.soft-sys.sas  (what you counted)
listserve -- SAS-L http://www.listserv.uga.edu/archives/sas-l.html

They were split sometime around the point where your graph shows a 
downturn, and now comp.soft-sys.sas is considered the dark-side of 
SAS-L.  You know of the existence of SAS-L, but you haven't counted 
that traffic, AFAICS.

-Michael

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated

2011-03-22 Thread Hadley Wickham
> I don't doubt that R may be the most popular in terms of discussion group
> traffic, but you should be aware that the traffic for SAS comprises two
> separate lists that used to be mirrored, but are no longer linked
> Usenet --  news://comp.soft-sys.sas  (what you counted)
> listserve -- SAS-L http://www.listserv.uga.edu/archives/sas-l.html

R programming challenge: create a script that parses those html pages
to compute the total number of messages per week!  (Maybe I'll use
this in class)

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated

2011-03-22 Thread Bob Muenchen

On 3/22/2011 5:15 PM, Hadley Wickham wrote:
>> I don't doubt that R may be the most popular in terms of discussion group
>> traffic, but you should be aware that the traffic for SAS comprises two
>> separate lists that used to be mirrored, but are no longer linked
>> Usenet --  news://comp.soft-sys.sas  (what you counted)
>> listserve -- SAS-L http://www.listserv.uga.edu/archives/sas-l.html
>
> R programming challenge: create a script that parses those html pages
> to compute the total number of messages per week!  (Maybe I'll use
> this in class)
>
> Hadley



That would be nice! I'd love to have all the sources, which includes 
various company forums. Sounds like students could be kept busy for 
quite a few projects. If any pull it off, please send it my way!


Cheers,
Bob

--
=
  Bob Muenchen (pronounced Min'-chen), Manager
  Research Computing Support
  University of Tennessee
  Voice: (865) 974-5230
  Email: muenchen@gmail.com
  Web:   http://r4stats.com
