Re: [R] Reasons to Use R (no memory limitations :-))

2007-04-15 Thread charles loboz
This thread discussed R memory limitations and compared memory handling with S
and SAS. Since I routinely use R to process multi-gigabyte data sets on
computers with sometimes as little as 256 MB of memory - here are some comments
on that.

Most memory limitations vanish if R is used with any relational database. [My
personal preference is SQLite (the RSQLite package) because of its speed and
because, used in embedded mode, it needs no administration]. The comments below
apply to any relational database, unless otherwise stated.

Most people appear to think of database tables as data frames - that is, to
store and load the _whole_ table in one go - probably because the relevant
function names suggest this approach. It is also a natural mapping. This is
convenient if the data set fits fully in memory - but it limits the size of the
data set in exactly the same way as not using a database at all.
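As a minimal sketch of this whole-table style (assuming a hypothetical SQLite
file 'patients.db' containing the 'patients' table used in the examples below):

  library(RSQLite)                 # DBI plus the embedded SQLite engine
  con <- dbConnect(dbDriver("SQLite"), dbname = "patients.db")

  ## Read the _whole_ table into one data frame - convenient, but it
  ## must fit into memory, so nothing is gained over a flat file.
  patients <- dbReadTable(con, "patients")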

However, by using SQL directly one can expand the size of the data set R
is capable of operating on - we just have to stop treating database tables as
'atomic'. For example, assume we have a table of several million patients and
want to analyze some specific subset - the following SQL statement
  SELECT * FROM patients WHERE gender='M' AND age BETWEEN 30 AND 35
will bring into R a much smaller data frame than selecting the whole
table. [Such a subset selection may even take _less_ time than subsetting
the whole data frame in R - assuming the table is properly indexed.]
Also, direct SQL statements can be used to pre-compute some characteristics
internally in the database and bring only the summaries into R:
  SELECT gender, AVG(age) FROM patients GROUP BY gender
will bring back a data frame of only two rows.
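In RSQLite each of these is a single call - a sketch, continuing with the
hypothetical connection opened above:

  ## Let the database do the filtering - only the subset crosses into R.
  young_men <- dbGetQuery(con,
      "SELECT * FROM patients WHERE gender='M' AND age BETWEEN 30 AND 35")

  ## Aggregate inside the database - only two rows come back.
  mean_ages <- dbGetQuery(con,
      "SELECT gender, AVG(age) AS mean_age FROM patients GROUP BY gender")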

Admittedly, if the data set is really large and we cannot operate on its
subsets, the above does not help. But I do not believe that this describes
the majority of situations.

Naturally, going to a 64-bit system with enough memory will solve some problems
without using a database - but not all of them. Relational databases can be
very efficient at selecting subsets, because when the tables are indexed they
do not have to do linear scans - while R has to do a linear scan every time (???
I did not look up the source code of R - please correct me if I am wrong). Two
other areas where a database is better than R, especially for large data sets
(a sketch follows the list):
 - verification of data correctness for individual points [a frequent problem
with large data sets]
 - combining data from several different tables into one data frame
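Both look roughly like this (a sketch: the index statement mirrors the queries
above, while the 'visits' table and its columns are invented for the
illustration):

  ## An index lets the subset queries above avoid linear scans.
  dbGetQuery(con, "CREATE INDEX idx_gender_age ON patients (gender, age)")

  ## A JOIN combines two tables into one data frame in a single step.
  combined <- dbGetQuery(con,
      "SELECT p.id, p.age, v.visit_date
         FROM patients p JOIN visits v ON v.patient_id = p.id
        WHERE p.age BETWEEN 30 AND 35")

  dbDisconnect(con)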

In summary: using SQL from R allows one to process extremely large data sets
in limited memory, sometimes even faster than if we had a large memory and
kept our data set fully in it. A relational database perfectly complements
R's capabilities.



Re: [R] Reasons to Use R

2007-04-13 Thread Jim Lemon
(Ted Harding) wrote:
> On 12-Apr-07 10:14:21, Jim Lemon wrote:
> 
>>Charilaos Skiadas wrote:
>>
>>>A new fortune candidate perhaps?
>>>
>>>On Apr 10, 2007, at 6:27 PM, Greg Snow wrote:
>>>
>>>
>>>
>>>>Remember, everything is better than everything else given the
>>>>right comparison.
>>>>
>>
>>Only if we remove the grammatical blip that turns it into an infinite 
>>regress, i.e.
>>
>>"Remember, anything is better than everything else given the right 
>>comparison"
>>
>>Jim
> 
> 
> Oh dear, I would be disappointed with that, Jim.
> 
> I was rather enjoying the vision of a "topological sort tree"
> (ordered by "better according to some comparison") in which every
> single thing had everything else hanging off it, and in turn was
> hanging off everything else!
> 
Sorry, Ted, I think Benoit Mandelbrot beat you to it.

Jim



Re: [R] Reasons to Use R

2007-04-12 Thread Ted Harding
On 12-Apr-07 10:14:21, Jim Lemon wrote:
> Charilaos Skiadas wrote:
>> A new fortune candidate perhaps?
>> 
>> On Apr 10, 2007, at 6:27 PM, Greg Snow wrote:
>> 
>> 
>>>Remember, everything is better than everything else given the
>>>right comparison.
>>>
> Only if we remove the grammatical blip that turns it into an infinite 
> regress, i.e.
> 
> "Remember, anything is better than everything else given the right 
> comparison"
> 
> Jim

Oh dear, I would be disappointed with that, Jim.

I was rather enjoying the vision of a "topological sort tree"
(ordered by "better according to some comparison") in which every
single thing had everything else hanging off it, and in turn was
hanging off everything else!

Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 12-Apr-07   Time: 11:45:05
-- XFMail --



Re: [R] Reasons to Use R

2007-04-12 Thread Joel J. Adamson
Lucke, Joseph F writes:
 > A re-interpretation of Zorn's lemma? 
 > 
 > -Original Message-
 > From: [EMAIL PROTECTED]
 > [mailto:[EMAIL PROTECTED] On Behalf Of Jim Lemon
 > Sent: Thursday, April 12, 2007 5:14 AM
 > To: [EMAIL PROTECTED]
 > Subject: Re: [R] Reasons to Use R
 > 
 > Charilaos Skiadas wrote:
 > > A new fortune candidate perhaps?
 > > 
 > > On Apr 10, 2007, at 6:27 PM, Greg Snow wrote:
 > > 
 > > 
 > >>Remember, everything is better than everything else given the right 
 > >>comparison.
 > >>
 > Only if we remove the grammatical blip that turns it into an infinite
 > regress, i.e.
 > 
 > "Remember, anything is better than everything else given the right
 > comparison"
 > 
 > Jim

Anything is potentially better than any other thing given the right
comparison.

Joel
-- 
Joel J. Adamson
Biostatistician
Pediatric Psychopharmacology Research Unit
Massachusetts General Hospital
Boston, MA  02114
(617) 643-1432
(303) 880-3109








Re: [R] Reasons to Use R

2007-04-12 Thread Lucke, Joseph F
A re-interpretation of Zorn's lemma? 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jim Lemon
Sent: Thursday, April 12, 2007 5:14 AM
To: [EMAIL PROTECTED]
Subject: Re: [R] Reasons to Use R

Charilaos Skiadas wrote:
> A new fortune candidate perhaps?
> 
> On Apr 10, 2007, at 6:27 PM, Greg Snow wrote:
> 
> 
>>Remember, everything is better than everything else given the right 
>>comparison.
>>
Only if we remove the grammatical blip that turns it into an infinite
regress, i.e.

"Remember, anything is better than everything else given the right
comparison"

Jim



Re: [R] Reasons to Use R

2007-04-12 Thread Joel J. Adamson
Douglas Bates writes:
 > One
 > can do data analysis by using the computer as a blunt instrument with
 > which to bludgeon the problem to death but one can't do elegant data
 > analysis like that.

One nice thing about a "blunt instrument" like Stata is the ability to
hold an entire dataset in memory and interactively play with the model
and generate new variables all in one session.  I figure out what I
want interactively and then separate the data management and analysis in
.do-files, then run them in batch mode.

However, when I first read of the approach of using Perl, sed or awk
to manage data and then only doing the analysis in R, I immediately
thought "Wow, that is a really great idea, I never thought of it like
that before."  It would really get me to think about the modelling and
the data management clearly.  A little voice said "Dude, you're not
using a PDP-11...(oh wait, that might be kinda cool)" but the logic of
it immediately made sense.  I consider it a big part of my
re-Unix-ization.

Joel

-- 
Joel J. Adamson
Biostatistician
Pediatric Psychopharmacology Research Unit
Massachusetts General Hospital
Boston, MA  02114
(617) 643-1432
(303) 880-3109








Re: [R] Reasons to Use R

2007-04-12 Thread Jim Lemon
Charilaos Skiadas wrote:
> A new fortune candidate perhaps?
> 
> On Apr 10, 2007, at 6:27 PM, Greg Snow wrote:
> 
> 
>>Remember, everything is better than everything else given the right
>>comparison.
>>
Only if we remove the grammatical blip that turns it into an infinite 
regress, i.e.

"Remember, anything is better than everything else given the right 
comparison"

Jim



Re: [R] Reasons to Use R

2007-04-11 Thread Thomas Lumley
On Wed, 11 Apr 2007, Alan Zaslavsky wrote:
> I have thought for a long time that a facility for efficient rowwise
> calculations might be a valuable enhancement to S/R.  The storage of the
> object would be handled by a database and there would have to be an
> efficient interface for pulling a row (or small chunk of rows) out of the
> database repeatedly; alternatively the operations could be conducted inside
> the database.  Basic operations of rowwise calculation and cumulation
> (such as forming a column sum or a sum of outer-products) would be
> written in an R-like syntax and translated into an efficient set of
> operations that work through the database.  (Would be happy to share
> some jejune notes on this.)  However the main answer to this problem
> in the R world seems to have been Moore's Law.  Perhaps somebody could
> tell us more about the S-Plus large objects library, or the work that
> Doug Bates is doing on efficient calculations with large datasets.
>


I have been surprised to find how much you can get done in SQL, only 
transferring summaries of the data into R.  There is soon going to be an 
experimental "surveyNG" package that works with survey data stored in a SQLite 
database without transferring the whole thing into R for most operations (and I 
could get further if SQLite had the log() and exp() functions that most other 
SQL implementations for large databases provide). I'll be submitting a paper on 
this to useR2007.

The approach of transferring blocks of data into R and using a database just as 
backing store will allow more general computation but will be less efficient 
than performing the computation in the database, so a mixture of both is likely 
to be helpful.  Moore's Law will settle some issues, but there are problems 
where it is working to increase the size of datasets just as fast as it 
increases computational power.


 -thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] Reasons to Use R

2007-04-11 Thread Douglas Bates
On 4/11/07, Robert Duval <[EMAIL PROTECTED]> wrote:
> So I guess my question is...
>
> Is there any hope of R being modified on its core in order to handle
> more graciously large datasets? (You've mentioned SAS and SPSS, I'd
> add Stata to the list).
>
> Or should we (the users of large datasets) expect to keep on working
> with the present tools for the time to come?

We're certainly aware of the desire of many users to be able to handle
large data sets.  I have just spent a couple of days working with a
student from another department who wanted to work with a very large
data set that was poorly structured.  Most of my time was spent trying
to convince her about the limitations in the structure of her data and
what could realistically be expected to be computed with it.

If your purpose is to perform data manipulation and extraction on
large data sets then I think that it is not unreasonable to be
expected to learn to use SQL. I find it convenient to use R to do data
manipulation because I know the language and the support tools well
but I don't expect to do data cleaning on millions of records with it.
 I am probably too conservative in what I will ask R to handle for me
because I started using S on a Vax-11/750 that had 2 megabytes of
memory and it's hard to break old habits.

I think the trend in working with large data sets in R will be toward
a hybrid approach of using a database for data storage and retrieval
plus R for the model definition and computation.  Miguel Manese's
SQLiteDF package and some of the work in Bioconductor are steps in
this direction.

However, as was mentioned earlier in this thread, there is an
underlying assumption with R that the user is thinking about the
analysis as he/she is doing it.  We sometimes see questions of the form "I
have a data set with (some large number) of records on several hundred
or thousands of variables and I want to fit a generalized linear model
to it."

I would be hard pressed to think of a situation where I wanted
hundreds of variables in a statistical model unless they are generated
from one or more factors that have many levels.  And, in that case, I
would want to use random effects rather than fixed effects in a model.
 So just saying that the big challenge is to fit some kind of model
with lots of coefficients to a very large number of observations may
be missing the point.  Defining the model better may be the point.

Let me conclude by saying that these are general observations and not
directed to you personally, Robert.  I don't know what you want R to
do graciously to large data sets so my response is more to the general
point that there should always be a balance between thinking about the
structure of the data and the model and brute force computation.  One
can do data analysis by using the computer as a blunt instrument with
which to bludgeon the problem to death but one can't do elegant data
analysis like that.




>
> robert
>
> On 4/11/07, Marc Schwartz <[EMAIL PROTECTED]> wrote:
> > On Wed, 2007-04-11 at 11:26 -0500, Marc Schwartz wrote:
> > > On Wed, 2007-04-11 at 17:56 +0200, Bi-Info
> > > (http://members.home.nl/bi-info) wrote:
> > > > I certainly have that idea too. SPSS functions in much the same way,
> > > > although it specialises in PC applications. Adding memory to a PC is
> > > > not a very expensive thing these days. On my first AT some extra memory
> > > > cost 300 dollars or more. These days you get extra memory with a package
> > > > of marshmallows or chocolate bars if you need it.
> > > > All computations on a computer are discrete steps in a way, but I've
> > > > heard that SAS computations are split up into strictly divided steps. That
> > > > also makes procedures "attachable", I've been told, and interchangeable.
> > > > Different procedures can use the same code, which in turn is
> > > > cheaper in memory usage or disk usage (the old days...). That, by the
> > > > way, makes SAS a complicated machine to build, because procedures that are
> > > > split up into numerous fragments make for complicated bookkeeping. If
> > > > you do it that way, I've been told, you can do a lot of computations
> > > > with very little memory. One guy actually computed quite complicated
> > > > models with "only 32MB or less", which wasn't very much for "his type of
> > > > calculations". Which means that SAS is efficient in memory handling, I
> > > > think. It's not very efficient in dollar handling... I estimate.
> > > >
> > > > Wilfred
> > >
> > > 
> > >
> > > Oh... SAS is quite efficient in dollar handling, at least when it comes
> > > to the annual commercial licenses...along the same lines as the
> > > purported efficiency of the U.S. income tax system:
> > >
> > >   "How much money do you have?  Send it in..."
> > >
> > > There is a reason why SAS is the largest privately held software company
> > > in the world and it is not due to the academic licensing structure,
> > > which constitutes only about 12% of their revenue, based upon th

Re: [R] Reasons to Use R

2007-04-11 Thread Wensui Liu
I think the reason that Stata is fast is that it only keeps one working
table in RAM. If you just keep one data frame in R, it will run fast
too. But ...

On 4/11/07, Robert Duval <[EMAIL PROTECTED]> wrote:
> So I guess my question is...
>
> Is there any hope of R being modified on its core in order to handle
> more graciously large datasets? (You've mentioned SAS and SPSS, I'd
> add Stata to the list).
>
> Or should we (the users of large datasets) expect to keep on working
> with the present tools for the time to come?
>
> robert
>
> On 4/11/07, Marc Schwartz <[EMAIL PROTECTED]> wrote:
> > On Wed, 2007-04-11 at 11:26 -0500, Marc Schwartz wrote:
> > > On Wed, 2007-04-11 at 17:56 +0200, Bi-Info
> > > (http://members.home.nl/bi-info) wrote:
> > > > I certainly have that idea too. SPSS functions in much the same way,
> > > > although it specialises in PC applications. Adding memory to a PC is
> > > > not a very expensive thing these days. On my first AT some extra memory
> > > > cost 300 dollars or more. These days you get extra memory with a package
> > > > of marshmallows or chocolate bars if you need it.
> > > > All computations on a computer are discrete steps in a way, but I've
> > > > heard that SAS computations are split up into strictly divided steps. That
> > > > also makes procedures "attachable", I've been told, and interchangeable.
> > > > Different procedures can use the same code, which in turn is
> > > > cheaper in memory usage or disk usage (the old days...). That, by the
> > > > way, makes SAS a complicated machine to build, because procedures that are
> > > > split up into numerous fragments make for complicated bookkeeping. If
> > > > you do it that way, I've been told, you can do a lot of computations
> > > > with very little memory. One guy actually computed quite complicated
> > > > models with "only 32MB or less", which wasn't very much for "his type of
> > > > calculations". Which means that SAS is efficient in memory handling, I
> > > > think. It's not very efficient in dollar handling... I estimate.
> > > >
> > > > Wilfred
> > >
> > > 
> > >
> > > Oh... SAS is quite efficient in dollar handling, at least when it comes
> > > to the annual commercial licenses...along the same lines as the
> > > purported efficiency of the U.S. income tax system:
> > >
> > >   "How much money do you have?  Send it in..."
> > >
> > > There is a reason why SAS is the largest privately held software company
> > > in the world and it is not due to the academic licensing structure,
> > > which constitutes only about 12% of their revenue, based upon their
> > > public figures.
> >
> > Hmmm..here is a classic example of the problems of reading pie
> > charts.
> >
> > The figure I quoted above, which is from reading the 2005 SAS Annual
> > Report on their web site (such as it is for a private company) comes
> > from a 3D exploded pie chart (ick...).
> >
> > The pie chart uses 3 shades of grey and 5 shades of blue to
> > differentiate 8 market segments and their percentages of total worldwide
> > revenue.
> >
> > I mis-read the 'shade of grey' allocated to Education as being 12%
> > (actually 11.7%).
> >
> > A re-read of the chart, zooming in close on the pie in a PDF reader,
> > appears to actually show that Education is but 1.8% of their annual
> > worldwide revenue.
> >
> > Government based installations, which are presumably the other notable
> > market segment in which substantially discounted licenses are provided,
> > is 14.6%.
> >
> > The report is available here for anyone else curious:
> >
> >   http://www.sas.com/corporate/report05/annualreport05.pdf
> >
> > Somebody needs to send SAS a copy of Tufte or Cleveland.
> >
> > I have to go and rest my eyes now...  ;-)
> >
> > Regards,
> >
> > Marc
> >


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)



Re: [R] Reasons to Use R

2007-04-11 Thread Robert Duval
So I guess my question is...

Is there any hope of R being modified on its core in order to handle
more graciously large datasets? (You've mentioned SAS and SPSS, I'd
add Stata to the list).

Or should we (the users of large datasets) expect to keep on working
with the present tools for the time to come?

robert

On 4/11/07, Marc Schwartz <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-04-11 at 11:26 -0500, Marc Schwartz wrote:
> > On Wed, 2007-04-11 at 17:56 +0200, Bi-Info
> > (http://members.home.nl/bi-info) wrote:
> > > I certainly have that idea too. SPSS functions in much the same way,
> > > although it specialises in PC applications. Adding memory to a PC is
> > > not a very expensive thing these days. On my first AT some extra memory
> > > cost 300 dollars or more. These days you get extra memory with a package
> > > of marshmallows or chocolate bars if you need it.
> > > All computations on a computer are discrete steps in a way, but I've
> > > heard that SAS computations are split up into strictly divided steps. That
> > > also makes procedures "attachable", I've been told, and interchangeable.
> > > Different procedures can use the same code, which in turn is
> > > cheaper in memory usage or disk usage (the old days...). That, by the
> > > way, makes SAS a complicated machine to build, because procedures that are
> > > split up into numerous fragments make for complicated bookkeeping. If
> > > you do it that way, I've been told, you can do a lot of computations
> > > with very little memory. One guy actually computed quite complicated
> > > models with "only 32MB or less", which wasn't very much for "his type of
> > > calculations". Which means that SAS is efficient in memory handling, I
> > > think. It's not very efficient in dollar handling... I estimate.
> > >
> > > Wilfred
> >
> > 
> >
> > Oh... SAS is quite efficient in dollar handling, at least when it comes
> > to the annual commercial licenses...along the same lines as the
> > purported efficiency of the U.S. income tax system:
> >
> >   "How much money do you have?  Send it in..."
> >
> > There is a reason why SAS is the largest privately held software company
> > in the world and it is not due to the academic licensing structure,
> > which constitutes only about 12% of their revenue, based upon their
> > public figures.
>
> Hmmm..here is a classic example of the problems of reading pie
> charts.
>
> The figure I quoted above, which is from reading the 2005 SAS Annual
> Report on their web site (such as it is for a private company) comes
> from a 3D exploded pie chart (ick...).
>
> The pie chart uses 3 shades of grey and 5 shades of blue to
> differentiate 8 market segments and their percentages of total worldwide
> revenue.
>
> I mis-read the 'shade of grey' allocated to Education as being 12%
> (actually 11.7%).
>
> A re-read of the chart, zooming in close on the pie in a PDF reader,
> appears to actually show that Education is but 1.8% of their annual
> worldwide revenue.
>
> Government based installations, which are presumably the other notable
> market segment in which substantially discounted licenses are provided,
> is 14.6%.
>
> The report is available here for anyone else curious:
>
>   http://www.sas.com/corporate/report05/annualreport05.pdf
>
> Somebody needs to send SAS a copy of Tufte or Cleveland.
>
> I have to go and rest my eyes now...  ;-)
>
> Regards,
>
> Marc
>



Re: [R] Reasons to Use R

2007-04-11 Thread Marc Schwartz
On Wed, 2007-04-11 at 11:26 -0500, Marc Schwartz wrote:
> On Wed, 2007-04-11 at 17:56 +0200, Bi-Info
> (http://members.home.nl/bi-info) wrote:
> > I certainly have that idea too. SPSS functions in much the same way,
> > although it specialises in PC applications. Adding memory to a PC is
> > not a very expensive thing these days. On my first AT some extra memory
> > cost 300 dollars or more. These days you get extra memory with a package
> > of marshmallows or chocolate bars if you need it.
> > All computations on a computer are discrete steps in a way, but I've
> > heard that SAS computations are split up into strictly divided steps. That
> > also makes procedures "attachable", I've been told, and interchangeable.
> > Different procedures can use the same code, which in turn is
> > cheaper in memory usage or disk usage (the old days...). That, by the
> > way, makes SAS a complicated machine to build, because procedures that are
> > split up into numerous fragments make for complicated bookkeeping. If
> > you do it that way, I've been told, you can do a lot of computations
> > with very little memory. One guy actually computed quite complicated
> > models with "only 32MB or less", which wasn't very much for "his type of
> > calculations". Which means that SAS is efficient in memory handling, I
> > think. It's not very efficient in dollar handling... I estimate.
> > 
> > Wilfred
> 
> 
> 
> Oh... SAS is quite efficient in dollar handling, at least when it comes
> to the annual commercial licenses...along the same lines as the
> purported efficiency of the U.S. income tax system:
> 
>   "How much money do you have?  Send it in..."
> 
> There is a reason why SAS is the largest privately held software company
> in the world and it is not due to the academic licensing structure,
> which constitutes only about 12% of their revenue, based upon their
> public figures.

Hmmm..here is a classic example of the problems of reading pie
charts. 

The figure I quoted above, which is from reading the 2005 SAS Annual
Report on their web site (such as it is for a private company) comes
from a 3D exploded pie chart (ick...). 

The pie chart uses 3 shades of grey and 5 shades of blue to
differentiate 8 market segments and their percentages of total worldwide
revenue. 

I mis-read the 'shade of grey' allocated to Education as being 12%
(actually 11.7%).

A re-read of the chart, zooming in close on the pie in a PDF reader,
appears to actually show that Education is but 1.8% of their annual
worldwide revenue.

Government based installations, which are presumably the other notable
market segment in which substantially discounted licenses are provided,
is 14.6%.

The report is available here for anyone else curious:

  http://www.sas.com/corporate/report05/annualreport05.pdf

Somebody needs to send SAS a copy of Tufte or Cleveland.

I have to go and rest my eyes now...  ;-)

Regards,

Marc



Re: [R] Reasons to Use R [Broadcast]

2007-04-11 Thread Liaw, Andy
From: Douglas Bates
> 
> On 4/10/07, Wensui Liu <[EMAIL PROTECTED]> wrote:
> > Greg,
> > As far as I understand, SAS is probably more efficient at handling large
> > data than S+/R. Do you have any idea why?
> 
> SAS originated at a time when large data sets were stored on 
> magnetic tape and the only reasonable way to process them was 
> sequentially.
> Thus most statistics procedures in SAS act as filters, 
> processing one record at a time and accumulating summary 
> information.  In the past SAS performed a least squares fit 
> by accumulating the crossproduct of [X:y] and then using the 
> sweep operator to reduce that matrix. For such an 
> approach the number of observations does not affect the 
> amount of storage required.  Adding observations just 
> requires more time.
> 
> This works fine (although there are numerical disadvantages 
> to this approach - try mentioning the sweep operator to an 
> expert in numerical linear algebra - you get a blank stare) 

For those who stared blankly at the above:  The sweep operator is 
just a fancier version of the good old Gaussian elimination...

Andy

> as long as the operations that you wish to perform fit into 
> this model.  Making the desired operations fit into the model 
> is the primary reason for the awkwardness in many SAS analyses.
> 
> The emphasis in R is on flexibility and the use of good 
> numerical techniques - not on processing large data sets 
> sequentially.  The algorithms used in R for most least 
> squares fits generate and analyze the complete model matrix 
> instead of summary quantities.  (The algorithms in the biglm 
> package are a compromise that work on horizontal sections of 
> the model matrix.)
> 
> If your only criterion for comparison is the ability to work 
> with very large data sets performing operations that can fit 
> into the filter model used by SAS then SAS will be a better 
> choice.  However you do lock yourself into a certain set of 
> operations and you are doing it to save memory, which is a 
> commodity that decreases in price very rapidly.
> 
> As mentioned in other replies, for many years the majority of 
> SAS use has been for data manipulation rather than for 
> statistical analysis so the filter model has been modified in 
> later versions.
> 
> 
> 
> 
> 
> > On 4/10/07, Greg Snow <[EMAIL PROTECTED]> wrote:
> > > > -Original Message-
> > > > From: [EMAIL PROTECTED] 
> > > > [mailto:[EMAIL PROTECTED] On Behalf Of Bi-Info 
> > > > (http://members.home.nl/bi-info)
> > > > Sent: Monday, April 09, 2007 4:23 PM
> > > > To: Gabor Grothendieck
> > > > Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
> > > > Subject: Re: [R] Reasons to Use R
> > >
> > > [snip]
> > >
> > > > So what's the big deal about S using files instead of 
> memory like 
> > > > R. I don't get the point. Isn't there enough swap space for S? 
> > > > (Who cares
> > > > anyway: it works, isn't it?) Or are there any problems 
> with S and 
> > > > large datasets? I don't get it. You use them, Greg. So 
> you might 
> > > > discuss that issue.
> > > >
> > > > Wilfred
> > > >
> > > >
> > >
> > > This is my understanding of the issue (not anything official).
> > >
> > > If you use up all the memory while in R, then the OS will start 
> > > swapping memory to disk, but the OS does not know what parts of 
> > > memory correspond to which objects, so it is entirely 
> possible that 
> > > the chunk swapped to disk contains parts of different 
> data objects, 
> > > so when you need one of those objects again, everything 
> needs to be 
> > > swapped back in.  This is very inefficient.
> > >
> > > S-PLUS occasionally runs into the same problem, but since it does 
> > > some of its own swapping to disk it can be more efficient by 
> > > swapping single data objects (data frames, etc.).  Also, since 
> > > S-PLUS is already saving everything to disk, it does not actually 
> > > need to do a full swap, it can just look and see that a 
> particular 
> > > data frame has not been used for a while, know that it is already 
> > > saved on the disk, and unload it from memory without 
> having to write it to disk first.
> > >
> > > The g.data package for R has some of this functionality 
> of keeping 
> > > data on the disk until needed.
> > >
> > > The better approach for large data sets is to o

Re: [R] Reasons to Use R

2007-04-11 Thread Alan Zaslavsky
Thanks, I will take a look.



Re: [R] Reasons to Use R

2007-04-11 Thread Marc Schwartz
On Wed, 2007-04-11 at 17:56 +0200, Bi-Info
(http://members.home.nl/bi-info) wrote:
> I certainly have that idea too. SPSS functions in much the same way,
> although it specialises in PC applications. Adding memory to a PC is
> not a very expensive thing these days. On my first AT some extra memory
> cost 300 dollars or more. These days you get extra memory with a package
> of marshmallows or chocolate bars if you need it.
> All computations on a computer are discrete steps in a way, but I've
> heard that SAS computations are split up into strictly divided steps. That
> also makes procedures "attachable", I've been told, and interchangeable.
> Different procedures can use the same code, which in turn is
> cheaper in memory usage or disk usage (the old days...). That, by the
> way, makes SAS a complicated machine to build, because procedures that are
> split up into numerous fragments make for complicated bookkeeping. If
> you do it that way, I've been told, you can do a lot of computations
> with very little memory. One guy actually computed quite complicated
> models with "only 32MB or less", which wasn't very much for "his type of
> calculations". Which means that SAS is efficient in memory handling, I
> think. It's not very efficient in dollar handling... I estimate.
> 
> Wilfred



Oh... SAS is quite efficient in dollar handling, at least when it comes
to the annual commercial licenses...along the same lines as the
purported efficiency of the U.S. income tax system:

  "How much money do you have?  Send it in..."

There is a reason why SAS is the largest privately held software company
in the world and it is not due to the academic licensing structure,
which constitutes only about 12% of their revenue, based upon their
public figures.

Since SPSS is mentioned, it also functions using similar economic
models...

:-)

Regards,

Marc Schwartz



Re: [R] Reasons to Use R

2007-04-11 Thread Greg Snow
> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Alan Zaslavsky
> Sent: Wednesday, April 11, 2007 9:07 AM
> To: R-help@stat.math.ethz.ch
> Subject: [R] Reasons to Use R

[snip]
 
> I have thought for a long time that a facility for efficient 
> rowwise calculations might be a valuable enhancement to S/R.  
> The storage of the object would be handled by a database and 
> there would have to be an efficient interface for pulling a 
> row (or small chunk of rows) out of the database repeatedly; 
> alternatively the operations could be conducted inside the 
> database.  Basic operations of rowwise calculation and 
> cumulation (such as forming a column sum or a sum of 
> outer-products) would be written in an R-like syntax and 
> translated into an efficient set of operations that work 
> through the database.  (Would be happy to share some jejune 
> notes on this.)

The biglm and SQLiteDF packages have made a start in this direction
(unless I am misunderstanding you); adding functionality to either of
those seems the best use of effort.

>  However the main answer to this problem in 
> the R world seems to have been Moore's Law.  Perhaps somebody 
> could tell us more about the S-Plus large objects library, or 
> the work that Doug Bates is doing on efficient calculations 
> with large datasets.

This link gives an overview and some detail of the S-PLUS big data
library
http://www.insightful.com/support/splus70win/eduguide.pdf


>   Alan Zaslavsky
>   [EMAIL PROTECTED]



-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111



Re: [R] Reasons to Use R

2007-04-11 Thread Duncan Temple Lang
Rajarshi Guha wrote:
> On Wed, 2007-04-11 at 11:06 -0400, Alan Zaslavsky wrote:
> 
> > I have thought for a long time that a facility for efficient rowwise 
> > calculations might be a valuable enhancement to S/R.  The storage of the 
> > object would be handled by a database and there would have to be an 
> > efficient interface for pulling a row (or small chunk of rows) out of the 
> > database repeatedly; alternatively the operations could be conducted inside
> > the database. 
> 
> You can embed R inside postgres, though I don't know how efficient this
> would be. But it does allow one to operator on a per row basis.
> 
> http://www.omegahat.org/RSPostgres/

I still like this idea a lot and a more recent implementation of it was created 
by
Joe Conway  and can be found at

   http://www.joeconway.com/plr/

 D.


> 
> ---
> Rajarshi Guha <[EMAIL PROTECTED]>
> GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
> ---
> Finally I am becoming stupider no more
> - Paul Erdos' epitaph
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Duncan Temple Lang[EMAIL PROTECTED]
Department of Statistics  work:  (530) 752-4782
4210 Mathematical Sciences Bldg.  fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA







Re: [R] Reasons to Use R

2007-04-11 Thread Rajarshi Guha
On Wed, 2007-04-11 at 11:06 -0400, Alan Zaslavsky wrote:

> I have thought for a long time that a facility for efficient rowwise 
> calculations might be a valuable enhancement to S/R.  The storage of the 
> object would be handled by a database and there would have to be an 
> efficient interface for pulling a row (or small chunk of rows) out of the 
> database repeatedly; alternatively the operations could be conducted inside
> the database. 

You can embed R inside Postgres, though I don't know how efficient this
would be. But it does allow one to operate on a per-row basis.

http://www.omegahat.org/RSPostgres/

---
Rajarshi Guha <[EMAIL PROTECTED]>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
---
Finally I am becoming stupider no more
- Paul Erdos' epitaph



Re: [R] Reasons to Use R

2007-04-11 Thread Bi-Info (http://members.home.nl/bi-info)
I certainly have that idea too. SPSS functions in much the same way,
although it specialises in PC applications. Adding memory to a PC is
not a very expensive thing these days. On my first AT some extra memory
cost 300 dollars or more. These days you get extra memory with a package
of marshmallows or chocolate bars if you need it.
All computations on a computer are discrete steps in a way, but I've
heard that SAS computations are split up into strictly divided steps. That
also makes procedures "attachable", I've been told, and interchangeable.
Different procedures can use the same code, which in turn is
cheaper in memory usage or disk usage (the old days...). That, by the
way, makes SAS a complicated machine to build, because procedures that are
split up into numerous fragments make for complicated bookkeeping. If
you do it that way, I've been told, you can do a lot of computations
with very little memory. One guy actually computed quite complicated
models with "only 32MB or less", which wasn't very much for "his type of
calculations". Which means that SAS is efficient in memory handling, I
think. It's not very efficient in dollar handling... I estimate.

Wilfred


--




Certainly true.  In particular, SAS was designed from the start to
store data items on disk, and to read into core memory the minimum
needed for a particular calculation.

The kind of data SAS handles is (for the most part) limited to
rectangular arrays, similar to R data frames. In many procedures
they can be read from disk sequentially (row by row), which
undoubtedly simplifies memory handling.  It seems logical to
suppose that in developing SAS, algorithms were chosen to
support that style of memory management. Finally, the style of
writing programs in SAS consists of discrete steps of
computation, between which nothing but the program need be held
in core memory.


"Gabor Grothendieck" <[EMAIL PROTECTED]> wrote:

> I think SAS was developed at a time when computer memory was
> much smaller than it is now and the legacy of that is its better
> usage of computer resources.
> 
> On 4/10/07, Wensui Liu <[EMAIL PROTECTED]> wrote:
> > Greg,
> > As far as I understand, SAS is more efficient handling large data
> > probably than S+/R. Do you have any idea why?

-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.






[R] Reasons to Use R

2007-04-11 Thread Alan Zaslavsky
Right: SAS objects (at least in the base and statistics components of the 
system -- there are dozens of add-ons for particular markets) are simple 
databases.  The predominant model for data manipulation and statistical 
calculation is a row by row operation that creates modified rows and/or 
accumulates totals.  This was pretty much the only way things could be 
done in the days when real (and typically virtual) memory was much smaller 
than it now is.  It can be a pretty efficient model for calculations that 
fit that pattern.  One downside of course is that a line of R code can 
easily turn into 30 lines of SAS with data steps, sort steps, steps to 
accumulate totals, etc.

As noted by a couple of previous writers, S-Plus might be regarded as 
somewhat intermediate in its model in that objects constitute files but 
rows do not correspond to chunks of adjacent bytes in memory or filespace.

I have thought for a long time that a facility for efficient rowwise 
calculations might be a valuable enhancement to S/R.  The storage of the 
object would be handled by a database and there would have to be an 
efficient interface for pulling a row (or small chunk of rows) out of the 
database repeatedly; alternatively the operations could be conducted inside
the database.  Basic operations of rowwise calculation and cumulation
(such as forming a column sum or a sum of outer-products) would be
written in an R-like syntax and translated into an efficient set of
operations that work through the database.  (Would be happy to share
some jejune notes on this.)  However the main answer to this problem
in the R world seems to have been Moore's Law.  Perhaps somebody could
tell us more about the S-Plus large objects library, or the work that
Doug Bates is doing on efficient calculations with large datasets.
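Something close to the cumulation part can already be improvised with a DBI
fetch loop - a rough sketch (database, table and column names invented) that
forms column sums and a sum of outer-products chunk by chunk:

  library(RSQLite)
  con <- dbConnect(dbDriver("SQLite"), dbname = "big.db")
  res <- dbSendQuery(con, "SELECT x1, x2, x3 FROM big_table")

  sums <- 0; xtx <- 0                    # running cumulants
  repeat {
      chunk <- fetch(res, n = 10000)     # a small chunk of rows at a time
      if (nrow(chunk) == 0) break
      m <- as.matrix(chunk)
      sums <- sums + colSums(m)          # column sums
      xtx  <- xtx  + crossprod(m)        # sum of outer-products
  }
  dbClearResult(res); dbDisconnect(con)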

Alan Zaslavsky
[EMAIL PROTECTED]

> Date: Tue, 10 Apr 2007 16:27:50 -0600
> From: "Greg Snow" <[EMAIL PROTECTED]>
> Subject: Re: [R] Reasons to Use R
> To: "Wensui Liu" <[EMAIL PROTECTED]>
>
> I think SAS has the database part built into it.  I have heard 2nd hand
> of new statisticians going to work for a company and asking if they have
> SAS, the reply is "Yes we use SAS for our database, does it do
> statistics also?"  Also I heard something about SAS is no longer
> considered an acronym, they like having it be just a name and don't want
> the fact that one of the S's used to stand for statistics to scare away
> companies that use it as a database.
>
> Maybe someone more up on SAS can confirm or deny this.



Re: [R] Reasons to Use R

2007-04-11 Thread Mike Prager
Certainly true.  In particular, SAS was designed from the start to
store data items on disk, and to read into core memory the minimum
needed for a particular calculation.

The kind of data SAS handles is (for the most part) limited to
rectangular arrays, similar to R data frames. In many procedures
they can be read from disk sequentially (row by row), which
undoubtedly simplifies memory handling.  It seems logical to
suppose that in developing SAS, algorithms were chosen to
support that style of memory management. Finally, the style of
writing programs in SAS consists of discrete steps of
computation, between which nothing but the program need be held
in core memory.


"Gabor Grothendieck" <[EMAIL PROTECTED]> wrote:

> I think SAS was developed at a time when computer memory was
> much smaller than it is now and the legacy of that is its better
> usage of computer resources.
> 
> On 4/10/07, Wensui Liu <[EMAIL PROTECTED]> wrote:
> > Greg,
> > As far as I understand, SAS is probably more efficient at handling large
> > data than S+/R. Do you have any idea why?

-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.



Re: [R] Reasons to Use R

2007-04-11 Thread Douglas Bates
On 4/10/07, Wensui Liu <[EMAIL PROTECTED]> wrote:
> Greg,
> As far as I understand, SAS is probably more efficient at handling large
> data than S+/R. Do you have any idea why?

SAS originated at a time when large data sets were stored on magnetic
tape and the only reasonable way to process them was sequentially.
Thus most statistics procedures in SAS act as filters, processing one
record at a time and accumulating summary information.  In the past
SAS performed a least squares fit by accumulating the crossproduct of
[X:y] and then using the sweep operator to reduce that
matrix. For such an approach the number of observations does not
affect the amount of storage required.  Adding observations just
requires more time.

This works fine (although there are numerical disadvantages to this
approach - try mentioning the sweep operator to an expert in numerical
linear algebra - you get a blank stare) as long as the operations that
you wish to perform fit into this model.  Making the desired
operations fit into the model is the primary reason for the
awkwardness in many SAS analyses.
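A toy illustration of that filter strategy (not SAS's actual code, just the
idea): only the (p+1) x (p+1) crossproduct is kept, however many rows go by,
and the fit drops out of the normal equations at the end - with exactly the
numerical caveats mentioned above:

  set.seed(1)
  p <- 3                                 # intercept plus two covariates
  XtX <- matrix(0, p + 1, p + 1)         # crossproduct of [X : y]

  for (i in 1:50) {                      # 50 chunks standing in for a tape
      X <- cbind(1, matrix(rnorm(1000 * 2), 1000, 2))
      y <- X %*% c(2, -1, 0.5) + rnorm(1000)
      XtX <- XtX + crossprod(cbind(X, y))
  }

  ## Solve the normal equations from the accumulated summary (this is
  ## where the numerical trouble hides for ill-conditioned X).
  beta <- solve(XtX[1:p, 1:p], XtX[1:p, p + 1])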

The emphasis in R is on flexibility and the use of good numerical
techniques - not on processing large data sets sequentially.  The
algorithms used in R for most least squares fits generate and analyze
the complete model matrix instead of summary quantities.  (The
algorithms in the biglm package are a compromise that work on
horizontal sections of the model matrix.)
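In use, the biglm compromise looks something like this (a sketch, with
simulated chunks standing in for pieces of a data set too big to load):

  library(biglm)
  set.seed(2)
  dat <- data.frame(x1 = rnorm(10000), x2 = rnorm(10000))
  dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(10000)
  chunks <- split(dat, rep(1:10, each = 1000))   # ten horizontal sections

  fit <- biglm(y ~ x1 + x2, data = chunks[[1]])
  for (ch in chunks[-1]) fit <- update(fit, ch)  # fold in the next section
  coef(fit)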

If your only criterion for comparison is the ability to work with very
large data sets performing operations that can fit into the filter
model used by SAS then SAS will be a better choice.  However you do
lock yourself into a certain set of operations and you are doing it to
save memory, which is a commodity that decreases in price very
rapidly.

As mentioned in other replies, for many years the majority of SAS use
has been for data manipulation rather than for statistical analysis, so
the filter model has been modified in later versions.





> On 4/10/07, Greg Snow <[EMAIL PROTECTED]> wrote:
> > > -Original Message-
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of
> > > Bi-Info (http://members.home.nl/bi-info)
> > > Sent: Monday, April 09, 2007 4:23 PM
> > > To: Gabor Grothendieck
> > > Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
> > > Subject: Re: [R] Reasons to Use R
> >
> > [snip]
> >
> > > So what's the big deal about S using files instead of memory
> > > like R. I don't get the point. Isn't there enough swap space
> > > for S? (Who cares
> > > anyway: it works, isn't it?) Or are there any problems with S
> > > and large datasets? I don't get it. You use them, Greg. So
> > > you might discuss that issue.
> > >
> > > Wilfred
> > >
> > >
> >
> > This is my understanding of the issue (not anything official).
> >
> > If you use up all the memory while in R, then the OS will start swapping
> > memory to disk, but the OS does not know what parts of memory correspond
> > to which objects, so it is entirely possible that the chunk swapped to
> > disk contains parts of different data objects, so when you need one of
> > those objects again, everything needs to be swapped back in.  This is
> > very inefficient.
> >
> > S-PLUS occasionally runs into the same problem, but since it does some
> > of its own swapping to disk it can be more efficient by swapping single
> > data objects (data frames, etc.).  Also, since S-PLUS is already saving
> > everything to disk, it does not actually need to do a full swap, it can
> > just look and see that a particular data frame has not been used for a
> > while, know that it is already saved on the disk, and unload it from
> > memory without having to write it to disk first.
> >
> > The g.data package for R has some of this functionality of keeping data
> > on the disk until needed.
> >
> > The better approach for large data sets is to only have some of the data
> > in memory at a time and to automatically read just the parts that you
> > need.  So for big datasets it is recommended to have the actual data
> > stored in a database and use one of the database connection packages to
> > only read in the subset that you need.  The SQLiteDF package for R is
> > working on automating this process for R.  There are also the bigdata
> > module for S-PLUS and the biglm package for R have ways of doing some of
> > the common analyses using chunks of data at a time.  This idea is not
> > new.  There was a program in the late 1970s and 80s called Rummage by
> > Del Scott (I guess technically it st

Re: [R] Reasons to Use R

2007-04-11 Thread Charilaos Skiadas
A new fortune candidate perhaps?

On Apr 10, 2007, at 6:27 PM, Greg Snow wrote:

> Remember, everything is better than everything else given the right
> comparison.
>
> -- 
> Gregory (Greg) L. Snow Ph.D.

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College



Re: [R] Reasons to Use R

2007-04-10 Thread Greg Snow
I think SAS has the database part built into it.  I have heard 2nd hand
of new statisticians going to work for a company and asking if they have
SAS; the reply is "Yes, we use SAS for our database -- does it do
statistics also?"  Also, I have heard that SAS is no longer
considered an acronym; they like having it be just a name and don't want
the fact that one of the S's used to stand for statistics to scare away
companies that use it as a database.

Maybe someone more up on SAS can confirm or deny this.

Also one issue to always look at is central control versus ease of
extendability.  If you have a program that is completely under your
control and does one set of things, then extending it to a new model
(big data) is fairly straightforward.  R is the opposite end of the
spectrum with many contributors and many techniques.  Extending some
basic pieces to be very efficient with big data could be done easily,
but would break many other pieces.  Getting all the different packages
to conform to a single standard in a short amount of time would be near
impossible.

With R's flexibility, there are probably some problems that can be done
quicker with a proper use of biglm than with SAS and I expect that with
some more work and maturity the SQLiteDF package may start to rival SAS
as well on certain problems.  While SAS is a useful program and great at
certain things, there are some techniques that I would not even attempt
using SAS that are fairly straightforward in R (I remember seeing some
SAS code to do a bootstrap that included a datastep to read in and
extract information from a SAS output file, <>  SAS/ODS has
improved this, but I would much rather bootstrap in R/S-PLUS than
anything else).
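For comparison, the kind of bootstrap that takes all that machinery in SAS is
a few lines of plain R (a sketch with simulated data):

  set.seed(42)
  x <- rnorm(100); y <- 2 * x + rnorm(100)

  ## Resample rows, refit, keep the slope - 999 times.
  boot_slopes <- replicate(999, {
      i <- sample(100, replace = TRUE)
      coef(lm(y[i] ~ x[i]))[2]
  })
  quantile(boot_slopes, c(0.025, 0.975))   # simple percentile interval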

Remember, everything is better than everything else given the right
comparison.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

> -Original Message-
> From: Wensui Liu [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, April 10, 2007 3:26 PM
> To: Greg Snow
> Cc: Bi-Info (http://members.home.nl/bi-info); Gabor 
> Grothendieck; Lorenzo Isella; r-help@stat.math.ethz.ch
> Subject: Re: [R] Reasons to Use R
> 
> Greg,
> As far as I understand, SAS is probably more efficient at handling 
> large data than S+/R. Do you have any idea why?
> 
> On 4/10/07, Greg Snow <[EMAIL PROTECTED]> wrote:
> > > -Original Message-
> > > From: [EMAIL PROTECTED] 
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Bi-Info 
> > > (http://members.home.nl/bi-info)
> > > Sent: Monday, April 09, 2007 4:23 PM
> > > To: Gabor Grothendieck
> > > Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
> > > Subject: Re: [R] Reasons to Use R
> >
> > [snip]
> >
> > > So what's the big deal about S using files instead of 
> memory like R. 
> > > I don't get the point. Isn't there enough swap space for S? (Who 
> > > cares
> > > anyway: it works, isn't it?) Or are there any problems with S and 
> > > large datasets? I don't get it. You use them, Greg. So you might 
> > > discuss that issue.
> > >
> > > Wilfred
> > >
> > >
> >
> > This is my understanding of the issue (not anything official).
> >
> > If you use up all the memory while in R, then the OS will start 
> > swapping memory to disk, but the OS does not know what 
> parts of memory 
> > correspond to which objects, so it is entirely possible 
> that the chunk 
> > swapped to disk contains parts of different data objects, 
> so when you 
> > need one of those objects again, everything needs to be 
> swapped back 
> > in.  This is very inefficient.
> >
> > S-PLUS occasionally runs into the same problem, but since 
> it does some 
> > of its own swapping to disk it can be more efficient by swapping 
> > single data objects (data frames, etc.).  Also, since S-PLUS is 
> > already saving everything to disk, it does not actually 
> need to do a 
> > full swap, it can just look and see that a particular data 
> frame has 
> > not been used for a while, know that it is already saved on 
> the disk, 
> > and unload it from memory without having to write it to disk first.
> >
> > The g.data package for R has some of this functionality of keeping 
> > data on the disk until needed.
> >
> > The better approach for large data sets is to only have some of the 
> > data in memory at a time and to automatically read just the 
> parts that 
> > you need.  So for big datasets it is recommended to have the actual 
> > data stored in a database and use one of the database connection 
> > 

Re: [R] Reasons to Use R

2007-04-10 Thread Wensui Liu
Greg,
As far as I understand, SAS is probably more efficient at handling
large data than S+/R. Do you have any idea why?

On 4/10/07, Greg Snow <[EMAIL PROTECTED]> wrote:
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of
> > Bi-Info (http://members.home.nl/bi-info)
> > Sent: Monday, April 09, 2007 4:23 PM
> > To: Gabor Grothendieck
> > Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
> > Subject: Re: [R] Reasons to Use R
>
> [snip]
>
> > So what's the big deal about S using files instead of memory
> > like R. I don't get the point. Isn't there enough swap space
> > for S? (Who cares
> > anyway: it works, doesn't it?) Or are there any problems with S
> > and large datasets? I don't get it. You use them, Greg. So
> > you might discuss that issue.
> >
> > Wilfred
> >
> >
>
> This is my understanding of the issue (not anything official).
>
> If you use up all the memory while in R, then the OS will start swapping
> memory to disk, but the OS does not know what parts of memory correspond
> to which objects, so it is entirely possible that the chunk swapped to
> disk contains parts of different data objects, so when you need one of
> those objects again, everything needs to be swapped back in.  This is
> very inefficient.
>
> S-PLUS occasionally runs into the same problem, but since it does some
> of its own swapping to disk it can be more efficient by swapping single
> data objects (data frames, etc.).  Also, since S-PLUS is already saving
> everything to disk, it does not actually need to do a full swap, it can
> just look and see that a particular data frame has not been used for a
> while, know that it is already saved on the disk, and unload it from
> memory without having to write it to disk first.
>
> The g.data package for R has some of this functionality of keeping data
> on the disk until needed.
>
> The better approach for large data sets is to only have some of the data
> in memory at a time and to automatically read just the parts that you
> need.  So for big datasets it is recommended to have the actual data
> stored in a database and use one of the database connection packages to
> only read in the subset that you need.  The SQLiteDF package for R is
> working on automating this process for R.  There are also the bigdata
> module for S-PLUS and the biglm package for R, which have ways of doing
> some of the common analyses using chunks of data at a time.  This idea is not
> new.  There was a program in the late 1970s and 80s called Rummage by
> Del Scott (I guess technically it still exists, I have a copy on a 5.25"
> floppy somewhere) that used the approach of specifying the model you want
> to fit first and then specifying the data file.  Rummage would then figure out
> which sufficient statistics were needed and read the data in chunks,
> compute the sufficient statistics on the fly, and not keep more than a
> couple of lines of the data in memory at once.  Unfortunately it did not
> have much of a user interface, so when memory was cheap and datasets
> only medium sized it did not compete well; I guess it was just a bit too
> far ahead of its time.
>
> Hope this helps,
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> [EMAIL PROTECTED]
> (801) 408-8111
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-10 Thread Frank E Harrell Jr
Taylor, Z Todd wrote:
> On Monday, April 09, 2007 3:23 PM, someone named Wilfred wrote:
> 
>> So what's the big deal about S using files instead of memory
>> like R. I don't get the point. Isn't there enough swap space
>> for S? (Who cares anyway: it works, doesn't it?) Or are there
>> any problems with S and large datasets? I don't get it. You
>> use them, Greg. So you might discuss that issue.
> 
> S's one-to-one correspondence between S objects and filesystem
> objects is the single remaining reason I haven't completely
> converted over to R.  With S I can manage my objects via
> makefiles.  Corrections to raw data or changes to analysis
> scripts get applied to all objects in the project (and there
> are often thousands of them) by simply typing 'make'.  That
> includes everything right down to the graphics that will go
> in the report.
> 
> How do people live without that?

Personally I'd rather have R's save() and load().

Frank

> 
> --Todd


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-10 Thread Andrew Robinson
Hi Todd,

I guess I don't see the difference between that strategy and using
make to look after scripts, raw data, Sweave files, and (if necessary)
images.  I find that I can get pretty fine-grained control over what
parts of a project need to be rerun by breaking the analysis into
chapters.  I suppose it depends on whether one takes a script-centric
or an object-centric view of a data analysis project.  A script-centric
view is nicer for version control.  I think that make is
centric-neutral :).

Cheers,

Andrew

On Tue, Apr 10, 2007 at 04:23:54PM -0700, Taylor, Z Todd wrote:
> On Monday, April 09, 2007 3:23 PM, someone named Wilfred wrote:
> 
> > So what's the big deal about S using files instead of memory
> > like R. I don't get the point. Isn't there enough swap space
> > for S? (Who cares anyway: it works, doesn't it?) Or are there
> > any problems with S and large datasets? I don't get it. You
> > use them, Greg. So you might discuss that issue.
> 
> S's one-to-one correspondence between S objects and filesystem
> objects is the single remaining reason I haven't completely
> converted over to R.  With S I can manage my objects via
> makefiles.  Corrections to raw data or changes to analysis
> scripts get applied to all objects in the project (and there
> are often thousands of them) by simply typing 'make'.  That
> includes everything right down to the graphics that will go
> in the report.
> 
> How do people live without that?
> 
> --Todd
> -- 
> Why is 'abbreviation' such a long word?
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Andrew Robinson  
Department of Mathematics and StatisticsTel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-10 Thread Taylor, Z Todd
On Monday, April 09, 2007 3:23 PM, someone named Wilfred wrote:

> So what's the big deal about S using files instead of memory
> like R. I don't get the point. Isn't there enough swap space
> for S? (Who cares anyway: it works, doesn't it?) Or are there
> any problems with S and large datasets? I don't get it. You
> use them, Greg. So you might discuss that issue.

S's one-to-one correspondence between S objects and filesystem
objects is the single remaining reason I haven't completely
converted over to R.  With S I can manage my objects via
makefiles.  Corrections to raw data or changes to analysis
scripts get applied to all objects in the project (and there
are often thousands of them) by simply typing 'make'.  That
includes everything right down to the graphics that will go
in the report.

How do people live without that?

--Todd
-- 
Why is 'abbreviation' such a long word?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-10 Thread Gabor Grothendieck
I think SAS was developed at a time when computer memory was
much smaller than it is now and the legacy of that is its better
usage of computer resources.

On 4/10/07, Wensui Liu <[EMAIL PROTECTED]> wrote:
> Greg,
> As far as I understand, SAS is probably more efficient at handling
> large data than S+/R. Do you have any idea why?
>
> On 4/10/07, Greg Snow <[EMAIL PROTECTED]> wrote:
> > > -Original Message-
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of
> > > Bi-Info (http://members.home.nl/bi-info)
> > > Sent: Monday, April 09, 2007 4:23 PM
> > > To: Gabor Grothendieck
> > > Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
> > > Subject: Re: [R] Reasons to Use R
> >
> > [snip]
> >
> > > So what's the big deal about S using files instead of memory
> > > like R. I don't get the point. Isn't there enough swap space
> > > for S? (Who cares
> > > anyway: it works, doesn't it?) Or are there any problems with S
> > > and large datasets? I don't get it. You use them, Greg. So
> > > you might discuss that issue.
> > >
> > > Wilfred
> > >
> > >
> >
> > This is my understanding of the issue (not anything official).
> >
> > If you use up all the memory while in R, then the OS will start swapping
> > memory to disk, but the OS does not know what parts of memory correspond
> > to which objects, so it is entirely possible that the chunk swapped to
> > disk contains parts of different data objects, so when you need one of
> > those objects again, everything needs to be swapped back in.  This is
> > very inefficient.
> >
> > S-PLUS occasionally runs into the same problem, but since it does some
> > of its own swapping to disk it can be more efficient by swapping single
> > data objects (data frames, etc.).  Also, since S-PLUS is already saving
> > everything to disk, it does not actually need to do a full swap, it can
> > just look and see that a particular data frame has not been used for a
> > while, know that it is already saved on the disk, and unload it from
> > memory without having to write it to disk first.
> >
> > The g.data package for R has some of this functionality of keeping data
> > on the disk until needed.
> >
> > The better approach for large data sets is to only have some of the data
> > in memory at a time and to automatically read just the parts that you
> > need.  So for big datasets it is recommended to have the actual data
> > stored in a database and use one of the database connection packages to
> > only read in the subset that you need.  The SQLiteDF package for R is
> > working on automating this process for R.  There are also the bigdata
> > module for S-PLUS and the biglm package for R, which have ways of doing
> > some of the common analyses using chunks of data at a time.  This idea is not
> > new.  There was a program in the late 1970s and 80s called Rummage by
> > Del Scott (I guess technically it still exists, I have a copy on a 5.25"
> > floppy somewhere) that used the approach of specifying the model you want
> > to fit first and then specifying the data file.  Rummage would then figure out
> > which sufficient statistics were needed and read the data in chunks,
> > compute the sufficient statistics on the fly, and not keep more than a
> > couple of lines of the data in memory at once.  Unfortunately it did not
> > have much of a user interface, so when memory was cheap and datasets
> > only medium sized it did not compete well; I guess it was just a bit too
> > far ahead of its time.
> >
> > Hope this helps,
> >
> >
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > [EMAIL PROTECTED]
> > (801) 408-8111
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> WenSui Liu
> A lousy statistician who happens to know a little programming
> (http://spaces.msn.com/statcompute/blog)
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-10 Thread Greg Snow
> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Bi-Info (http://members.home.nl/bi-info)
> Sent: Monday, April 09, 2007 4:23 PM
> To: Gabor Grothendieck
> Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
> Subject: Re: [R] Reasons to Use R

[snip] 

> So what's the big deal about S using files instead of memory 
> like R. I don't get the point. Isn't there enough swap space 
> for S? (Who cares
> anyway: it works, doesn't it?) Or are there any problems with S 
> and large datasets? I don't get it. You use them, Greg. So 
> you might discuss that issue.
> 
> Wilfred
> 
> 

This is my understanding of the issue (not anything official).

If you use up all the memory while in R, then the OS will start swapping
memory to disk, but the OS does not know what parts of memory correspond
to which objects, so it is entirely possible that the chunk swapped to
disk contains parts of different data objects, so when you need one of
those objects again, everything needs to be swapped back in.  This is
very inefficient.

S-PLUS occasionally runs into the same problem, but since it does some
of its own swapping to disk it can be more efficient by swapping single
data objects (data frames, etc.).  Also, since S-PLUS is already saving
everything to disk, it does not actually need to do a full swap, it can
just look and see that a particular data frame has not been used for a
while, know that it is already saved on the disk, and unload it from
memory without having to write it to disk first.

The g.data package for R has some of this functionality of keeping data
on the disk until needed.

The better approach for large data sets is to only have some of the data
in memory at a time and to automatically read just the parts that you
need.  So for big datasets it is recommended to have the actual data
stored in a database and use one of the database connection packages to
only read in the subset that you need.  The SQLiteDF package for R is
working on automating this process for R.  There are also the bigdata
module for S-PLUS and the biglm package for R, which have ways of doing
some of the common analyses using chunks of data at a time.  This idea is not
new.  There was a program in the late 1970s and 80s called Rummage by
Del Scott (I guess technically it still exists, I have a copy on a 5.25"
floppy somewhere) that used the approach of specifying the model you want
to fit first and then specifying the data file.  Rummage would then figure out
which sufficient statistics were needed and read the data in chunks,
compute the sufficient statistics on the fly, and not keep more than a
couple of lines of the data in memory at once.  Unfortunately it did not
have much of a user interface, so when memory was cheap and datasets
only medium sized it did not compete well; I guess it was just a bit too
far ahead of its time.
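
(A minimal sketch of that database-plus-chunks pattern, assuming a
hypothetical SQLite file survey.db with a table measurements holding
columns y, x1 and x2 -- not anyone's actual setup:)

library(RSQLite)   # DBI interface to an embedded SQLite database
library(biglm)     # incremental linear models

con <- dbConnect(dbDriver("SQLite"), dbname = "survey.db")

## stream the query result in chunks and fold each chunk into the fit;
## only the sufficient statistics stay in memory
res   <- dbSendQuery(con, "SELECT y, x1, x2 FROM measurements")
chunk <- fetch(res, n = 10000)
fit   <- biglm(y ~ x1 + x2, data = chunk)
while (!dbHasCompleted(res)) {
    chunk <- fetch(res, n = 10000)
    if (nrow(chunk) > 0) fit <- update(fit, chunk)
}
dbClearResult(res)
dbDisconnect(con)
summary(fit)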

Hope this helps, 



-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-10 Thread Greg Snow
For a previous version of SAS we had parts installed on each computer
where it was used, but there were key pieces located on a network drive
(not internet, but local network) such that if you tried to start SAS
while someone else was using it you would get an error message.

We had troubles with the network, so now we have a full version
installed on each computer, but the person in the company that is the
contact between us and SAS (my group has 1 licence, but the company as a
whole has several) checks up on us from time to time to make sure that
we stick within the 1 at a time guidelines (not hard, we mostly use
other things) or pay for additional licences.

S-PLUS has also had similar types of licences.  I was teaching in a
computer lab where all the computers could run S-PLUS, but once 5 people
had started S-PLUS, no one else could until someone quit out of it
(so we used R for that class).  For S-PLUS 7, when I upgraded my computer
and installed my licenced copy on the new computer, it disabled the copy
on my old computer.  This may have changed somewhat, because I remember
there being some complaints from people who legitimately installed it on
their laptop, but it would not work when the laptop was not connected to
the internet.

There are a lot of different ways to try to enforce licence conditions
on software (and doing so is important for companies that want to make a
profit these days); unfortunately, the current pendulum swing is making
things more inconvenient for the common user (at home I have some
software that we use to program my wife's sewing machines that can be
installed on any computer, but only works if a hardware key is plugged
into a USB port).

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

> -Original Message-
> From: Charilaos Skiadas [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 09, 2007 3:24 PM
> To: Greg Snow
> Cc: Gabor Grothendieck; Lorenzo Isella; R-Help list
> Subject: Re: [R] Reasons to Use R
> 
> On Apr 9, 2007, at 1:45 PM, Greg Snow wrote:
> 
> > The licences keep changing: some allowed home use in the past but
> > don't now; for some you can get an additional licence for home at a
> > discounted price.  Some depend on the type of licence you have at
> > work (currently our SAS licence is such that the 3 people in my group
> > can all have it installed, but at most 1 can be using it at any 1
> > time; how does that affect installing/using it at home?).
> 
> Hm, this intrigues me, it would seem to me that the only way 
> for SAS to check that only one of your colleagues uses it at 
> any given time would be to contact some sort of online 
> server. Does that mean that SAS can only be run when you have 
> internet access?
> 
> Or is it simply a clause on the license, without any "runtime checks"?
> 
> Haris Skiadas
> Department of Mathematics and Computer Science Hanover College
> 
> 
> 
> 
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-10 Thread Jeffrey J. Hallman
"halldor bjornsson" <[EMAIL PROTECTED]> writes:
> ...
> Now, R does not have everything we want. One thing missing is a decent
> R-DB2 connection: for Windows the excellent RODBC works fine, but ODBC
> support on Linux is a hassle.
> 

A hassle?  I use RODBC on Linux to read data from a mainframe DB2 database.  I
had to create the file .odbc.ini in my home directory with lines like this:

[m1db2p]
Driver = DB2
Servername = NameOfOurMainframe
Database   = fdrp
UserName   = "NachoBizness"
TraceFile  = /home/NachoBizness/.odbc.log

and then to connect I do this:

library(RODBC)                       # the ODBC interface must be loaded first
Sys.putenv(DB2INSTANCE = "db2inst")  # environment variable the DB2 client expects
myConnection <- odbcConnect(dsn = "m1db2p", uid = uid, pwd = pwd,
                            case = "toupper")

with 'uid' and 'pwd' set to my mainframe uid and password.

Now, I am not the sysadmin for our Linux machines, but I don't think they had
to do much beyond the standard RPM installation to get this working.

-- 
Jeff

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R [Broadcast]

2007-04-09 Thread Wensui Liu
Andy,
I totally agree with you. Money should be spent on the people working
hard instead of on fancy software. But in real life, it is the
opposite. ^_^.

On 4/9/07, Liaw, Andy <[EMAIL PROTECTED]> wrote:
> I've probably been away from SAS for too long... we've recently tried to
> get SAS on our 64-bit Linux boxes (because SAS on PC is not sufficient
> for some of my colleagues who need it).  I was shocked by the quote for
> our 28-core Scyld cluster--- the annual fee was a few times the total
> cost of our hardware.  We ended up buying a new quad 3GHz Opteron box
> with 32GB of RAM just so that the fee for SAS on such a box would be more
> tolerable.  It just boggles my mind that the right to use SAS for a year
> is about the price of a nice four-bedroom house (near SAS Institute!).
> I don't understand people who would rather pay that kind of price for the
> software, instead of spending the money on state-of-the-art hardware and
> saving more than a bundle.
>
> Just my $0.02...
> Andy
>
> From: Jorge Cornejo-Donoso
> >
> > I have a Dell with 2 Intel XEON 3.0 processors and 2GB of RAM.
> > The problem is the DB size.
> >
> > -Mensaje original-
> > De: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
> > Enviado el: Lunes, 09 de Abril de 2007 11:28
> > Para: Jorge Cornejo-Donoso
> > CC: r-help@stat.math.ethz.ch
> > Asunto: Re: [R] Reasons to Use R
> >
> > Have you tried 64 bit machines with larger memory or do you
> > mean that you can't use R on your current machines?
> >
> > Also have you tried S-Plus?  Will that work for you? The
> > transition from that to R would be less than from SAS to R.
> >
> > On 4/9/07, Jorge Cornejo-Donoso <[EMAIL PROTECTED]> wrote:
> > > the size of the DB is an issue with R. We are still using SAS because R
> > > can't handle our DB, and of course we don't want to sacrifice
> > > resolution, because the data collection is expensive (at least in
> > > fisheries and oceanography), so.. I think that R needs to improve its
> > > handling of big DBs. Now I can only use R for graph preparation and
> > > some data analysis, but we can't do the main work in R, and that is
> > > really sad.
> > >
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
>
>
> --
> Notice:  This e-mail message, together with any attachments,...{{dropped}}
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R [Broadcast]

2007-04-09 Thread Liaw, Andy
I've probably been away from SAS for too long... we've recently tried to
get SAS on our 64-bit Linux boxes (because SAS on PC is not sufficient
for some of my colleagues who need it).  I was shocked by the quote for
our 28-core Scyld cluster--- the annual fee was a few times the total
cost of our hardware.  We ended up buying a new quad 3GHz Opteron box
with 32GB of RAM just so that the fee for SAS on such a box would be more
tolerable.  It just boggles my mind that the right to use SAS for a year
is about the price of a nice four-bedroom house (near SAS Institute!).
I don't understand people who would rather pay that kind of price for the
software, instead of spending the money on state-of-the-art hardware and
saving more than a bundle.

Just my $0.02...
Andy

From: Jorge Cornejo-Donoso
> 
> I have a Dell with 2 Intel XEON 3.0 processors and 2GB of RAM.
> The problem is the DB size.
> 
> -Mensaje original-
> De: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
> Enviado el: Lunes, 09 de Abril de 2007 11:28
> Para: Jorge Cornejo-Donoso
> CC: r-help@stat.math.ethz.ch
> Asunto: Re: [R] Reasons to Use R
> 
> Have you tried 64 bit machines with larger memory or do you 
> mean that you can't use R on your current machines?
> 
> Also have you tried S-Plus?  Will that work for you? The 
> transition from that to R would be less than from SAS to R.
> 
> On 4/9/07, Jorge Cornejo-Donoso <[EMAIL PROTECTED]> wrote:
> > the size of the DB is an issue with R. We are still using SAS because R
> > can't handle our DB, and of course we don't want to sacrifice
> > resolution, because the data collection is expensive (at least in
> > fisheries and oceanography), so.. I think that R needs to improve its
> > handling of big DBs. Now I can only use R for graph preparation and
> > some data analysis, but we can't do the main work in R, and that is
> > really sad.
> >
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 


--
Notice:  This e-mail message, together with any attachments,...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-09 Thread Bi-Info (http://members.home.nl/bi-info)
Licensing is a big issue in software. The way I prefer it is an easy
license, a license which makes it possible for me to work on another
PC without paying a lot of money. R produces quite good results and is
widely used. That makes it a statistical package that I want.
The other thing is that working with large datasets requires "some"
effort by software makers to get it working. I doubt if R has the
capability of working consistently with large datasets; that is an issue,
I think. I have done some comparisons between SPSS and R, and R seems to
be performing all right, so I can do computations with it. Nonetheless,
the data handling is, I think, not quite as good as in SAS.

When I started doing statistics there were about three packages: SPSS, 
SAS and BMDP (at least: these were available). On a PC you were required 
to use SPSS.
Nowadays there are hundreds, some with excellent database facilities, or 
you can compute the newest statistical tests, or an exotic one. I 
haven't got a clue how to work with new database facilities. dBase was
my only database education and everything has changed. So I cannot
answer whether R is capable of working with large datasets in relation to
databases; I really don't know. The only thing I know is that if I compute
a ChiSq, it works on a relatively large dataset (not Fisher tests, by the
way). The same goes for a likelihood procedure, or tabulations including
non-parametrics, or factor analysis.  But databases are an issue, I've
been told by a guy who works with R; SAS was a better option, he told me.
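
(A small illustration of that last point, with a hypothetical 2x2 table
of large counts: chisq.test works from the four cell counts alone, so
the number of underlying observations is no obstacle, whereas exact
tests like fisher.test get expensive as the counts grow:)

## hypothetical 2x2 contingency table from ~1.8 million observations
tab <- matrix(c(500000, 400000, 300000, 600000), nrow = 2)
chisq.test(tab)   # instant: only the cell counts matter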

So what's the big deal about S using files instead of memory like R. I 
don't get the point. Isn't there enough swap space for S? (Who cares 
anyway: it works, doesn't it?) Or are there any problems with S and large 
datasets? I don't get it. You use them, Greg. So you might discuss that 
issue.

Wilfred










The licences keep changing: some allowed home use in the past but don't
now; for some you can get an additional licence for home at a discounted
price.  Some depend on the type of licence you have at work (currently
our SAS licence is such that the 3 people in my group can all have it
installed, but at most 1 can be using it at any 1 time; how does that
affect installing/using it at home?).  I may be able to install some of the
software at home also, but for most of them I have given up trying to
figure out the legality of it and so I have not installed them at home
to be on the safe side.

Some of the doctors I work with who are also affiliated with the local
university have mentioned that they can get a discounted academic
version of SAS and could use that, but my interpretation of the academic
licence that one showed me (probably not the most recent) said (in my
interpretation, I am not a lawyer) that if they published the results
without paying a licence upgrade fee, they would be violating the
licence (the academic version was intended for teaching only).

The R licence on the other hand is pretty clear that I can install it
and use it pretty much anywhere I want.

You are right in correcting me, R is not the only package that can be
used on multiple computers.  I do think it is the most straightforward
of the good ones.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111



> -Original Message-
> From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 09, 2007 10:44 AM
> To: Greg Snow
> Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
> Subject: Re: [R] Reasons to Use R
> 
> I might be wrong about this but I thought that the licenses 
> for at least some of the commercial packages do let you make 
> a copy of the one you have at work for home use.
> 
> On 4/9/07, Greg Snow <[EMAIL PROTECTED]> wrote:
> > Here are a couple more thougts to add to what you have 
> already received:
> >
> > You mentioned that price is not at issue, but there are other costs 
> > than money that you may want to look at.  On my work 
> machine I have R, 
> > S-PLUS, SAS, SPSS, and a couple of other stats programs; on 
> my laptop 
> > and home computers I have R installed.  So, if a deadline 
> is looming 
> > and I am working on a project mainly in R, it is easy to 
> work on it on 
> > the bus or at home (or in a boring meeting), the same does not work 
> > for a SAS or SPSS project (Hmm, thinking about this now, 
> maybe I need 
> > to do less in R :-).
> >
> > R and S-PLUS are very flexible/customizable, if you have a certain 
> > plot that you make often you can write your own 
> function/script to do 
> > it automatically, most other programs will give you their standard, 
> > then you have to modify it to meet your specifications.  
> With sweave 
> > (and the odf and html e

Re: [R] Reasons to Use R

2007-04-09 Thread Charilaos Skiadas
On Apr 9, 2007, at 1:45 PM, Greg Snow wrote:

> The licences keep changing: some allowed home use in the past but
> don't now; for some you can get an additional licence for home at a
> discounted price.  Some depend on the type of licence you have at work
> (currently our SAS licence is such that the 3 people in my group can
> all have it installed, but at most 1 can be using it at any 1 time;
> how does that affect installing/using it at home?).

Hm, this intrigues me, it would seem to me that the only way for SAS  
to check that only one of your colleagues uses it at any given time  
would be to contact some sort of online server. Does that mean that  
SAS can only be run when you have internet access?

Or is it simply a clause on the license, without any "runtime checks"?

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-09 Thread halldor bjornsson
Dear Lorenzo,

Thanks for starting a great thread here. Like others, I would like to
hear a summary
if you make one.

My institute uses R for internal data processing and analysis.  Below
are some of our reasons, and yes, cost (or lack thereof) is not the
only one.

First, prior to the rise of R we already had a number of people using
Splus, and our main compute server had licenses for Splus.  As the
institution moved from Sun Unix servers to Linux workstations and
servers, the licensing issue became important.  Having to service many
licenses (one per workstation, and several on the servers) is time
consuming for overworked IT staff.  Furthermore, our Splus programs that
ran routinely on the servers could all easily be made to run on R.
Hence, this was really a no-brainer.

Second, R runs on both Windows and Linux (and Solaris and Macs,
although the last one is not really an issue for us).  We have made
some user programs that are tailor-made for the work we do; these we
bundle into R packages, which can then be used on both Windows and
Linux.  This was a very important consideration for us.

Third, the user community.  Even with commercial solutions (such as
Matlab) the quality of the user community is very important; if we had
felt that R did not have an active and responsive community we probably
would have been more hesitant.  Needless to say, R has an incredibly
active community, which makes it an attractive environment.
Furthermore, other institutions in our field are also adopting R, at
least in the research departments.

Fourth, R is a good choice for many of the things that we do (data
analysis of varying complexity, good graphics, maptools [working with
shapefiles], etc.).  It was therefore an obvious candidate for us from
the start.

Now, R does not have everything we want.  One thing missing is a decent
R-DB2 connection: for Windows the excellent RODBC works fine, but ODBC
support on Linux is a hassle.  The big-file issue is there, but many of
our files are GRIB, a format that is generally not supported by anyone.
Furthermore, object graphics, a la Python's matplotlib (and of course
Matlab), are not there, but would be very handy.  However, that being
said, it is easy to make publication-quality (print and web) graphics
with R.  And of course, as always with Open Source, if you miss
something badly enough, why not do it (or have it done) yourself and
add it as a package.

We have not used R much for large NetCDF datasets; there are other
tools (such as the CDO package, which also supports GRIB) that are
better suited for this.

We have used R on Solaris, Linux (several different flavours) and
Windows (since W98).  We currently use it on our primary production
servers (RedHat Enterprise Edition), but we have not used it in a
parallel setting.  We have not used R for making on-the-fly
calculations and graphics for the web, although this is clearly
possible.

I hope this helps; I have found this thread to be a good one.

Sincerely,
Halldór

On 4/5/07, Lorenzo Isella <[EMAIL PROTECTED]> wrote:
> Dear All,
> The institute I work for is organizing an internal workshop for High
> Performance Computing (HPC).
> I am planning to attend it and talk a bit about fluid dynamics, but
> there is also quite a lot of interest devoted to data post-processing
> and management of huge data sets.
> A lot of people are interested in image processing/pattern recognition
> and statistic applied to geography/ecology, but I would like not to
> post this on too many lists.
> The final aim of the workshop is  understanding hardware requirements
> and drafting a list of the equipment we would like to buy. I think
> this could be the venue to talk about R as well.
> Therefore, even if it is not exactly a typical mailing list question,
> I would like to have suggestions about where to collect info about:
> (1)Institutions (not only academia) using R
> (2)Hardware requirements, possibly benchmarks
> (3)R & clusters, R & multiple CPU machines, R performance on different 
> hardware.
> (4)finally, a list of the advantages for using R over commercial
> statistical packages. The money-saving in itself is not a reason good
> enough and some people are scared by the lack of professional support,
> though this mailing list is simply wonderful.
>
> Kind Regards
>
> Lorenzo Isella
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Halldór Björnsson
Deildarstj. Ranns. & Þróun
Veðursvið Veðurstofu Íslands

Halldór Bjornsson
Weatherservice R & D
Icelandic Met. Office

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reasons to Use R

2007-04-09 Thread Greg Snow
The licences keep changing: some allowed home use in the past but don't
now; for some you can get an additional licence for home at a discounted
price.  Some depend on the type of licence you have at work (currently
our SAS licence is such that the 3 people in my group can all have it
installed, but at most 1 can be using it at any 1 time; how does that
affect installing/using it at home?).  I may be able to install some of the
software at home also, but for most of them I have given up trying to
figure out the legality of it and so I have not installed them at home
to be on the safe side.

Some of the doctors I work with who are also affiliated with the local
university have mentioned that they can get a discounted academic
version of SAS and could use that, but my interpretation of the academic
licence that one showed me (probably not the most recent) said (in my
interpretation, I am not a lawyer) that if they published the results
without paying a licence upgrade fee, they would be violating the
licence (the academic version was intended for teaching only).

The R licence on the other hand is pretty clear that I can install it
and use it pretty much anywhere I want.

You are right in correcting me, R is not the only package that can be
used on multiple computers.  I do think it is the most straightforward
of the good ones.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

> -Original Message-
> From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 09, 2007 10:44 AM
> To: Greg Snow
> Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
> Subject: Re: [R] Reasons to Use R
> 
> I might be wrong about this but I thought that the licenses 
> for at least some of the commercial packages do let you make 
> a copy of the one you have at work for home use.
> 
> On 4/9/07, Greg Snow <[EMAIL PROTECTED]> wrote:
> > Here are a couple more thoughts to add to what you have 
> already received:
> >
> > You mentioned that price is not at issue, but there are other costs 
> > than money that you may want to look at.  On my work 
> machine I have R, 
> > S-PLUS, SAS, SPSS, and a couple of other stats programs; on 
> my laptop 
> > and home computers I have R installed.  So, if a deadline 
> is looming 
> > and I am working on a project mainly in R, it is easy to 
> work on it on 
> > the bus or at home (or in a boring meeting), the same does not work 
> > for a SAS or SPSS project (Hmm, thinking about this now, 
> maybe I need 
> > to do less in R :-).
> >
> > R and S-PLUS are very flexible/customizable: if you have a certain
> > plot that you make often, you can write your own function/script to
> > do it automatically; most other programs will give you their
> > standard, then you have to modify it to meet your specifications.
> > With Sweave (and the odf and html extensions) you can automate whole
> > reports, very useful for things that you do month after month.
> >
> > And what I think is the biggest advantage of R and S-PLUS 
> is that they 
> > strongly encourage you to think about your data.  Other 
> programs (at 
> > least that I am familiar with) tend to have 1 specific way 
> of treating 
> > your data, and expect you to modify your data to fit that program's 
> > model.  These models can be overrestrictive (force you to 
> restructure 
> > your data to fit their model) or underrestrictive (allow 
> things that 
> > should really be separate data objects to be combined into a single
> > "dataset") and sometimes both.  S on the other hand allows many 
> > different ways to store and work with your data, and as you analyze 
> > the data, different branches of new analysis open up depending on 
> > early results rather than just getting stock output for a 
> procedure.  
> > If all you want is a black box where data goes in one end and a 
> > specific answer comes out the other, then most programs 
> will work; but 
> > if you want to really understand what your data has to tell 
> you, then 
> > R/S-PLUS makes this easy and natural.
> >
> > Hope this helps,
> >
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > [EMAIL PROTECTED]
> > (801) 408-8111
> >
> >
> >
> > > -Original Message-
> > > From: [EMAIL PROTECTED] 
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Lorenzo 
> > > Isella
> > > Sent: Thursday, April 05, 2007 9:02 AM
> > > To: r-help@stat.math.ethz.ch
> > > Subject: [R] Reasons to Use R
> > >
> > &

Re: [R] Reasons to Use R

2007-04-09 Thread Gabor Grothendieck
I might be wrong about this but I thought that the licenses for at least
some of the commercial packages do let you make a copy of the one
you have at work for home use.

On 4/9/07, Greg Snow <[EMAIL PROTECTED]> wrote:
> Here are a couple more thoughts to add to what you have already received:
>
> You mentioned that price is not at issue, but there are other costs than
> money that you may want to look at.  On my work machine I have R,
> S-PLUS, SAS, SPSS, and a couple of other stats programs; on my laptop
> and home computers I have R installed.  So, if a deadline is looming and
> I am working on a project mainly in R, it is easy to work on it on the
> bus or at home (or in a boring meeting), the same does not work for a
> SAS or SPSS project (Hmm, thinking about this now, maybe I need to do
> less in R :-).
>
> R and S-PLUS are very flexible/customizable: if you have a certain plot
> that you make often, you can write your own function/script to do it
> automatically; most other programs will give you their standard, then
> you have to modify it to meet your specifications.  With Sweave (and the
> odf and html extensions) you can automate whole reports, very useful for
> things that you do month after month.
>
> And what I think is the biggest advantage of R and S-PLUS is that they
> strongly encourage you to think about your data.  Other programs (at
> least that I am familiar with) tend to have 1 specific way of treating
> your data, and expect you to modify your data to fit that program's
> model.  These models can be overrestrictive (force you to restructure
> your data to fit their model) or underrestrictive (allow things that
> should really be separate data objects to be combined into a single
> "dataset") and sometimes both.  S on the other hand allows many
> different ways to store and work with your data, and as you analyze the
> data, different branches of new analysis open up depending on early
> results rather than just getting stock output for a procedure.  If all
> you want is a black box where data goes in one end and a specific answer
> comes out the other, then most programs will work; but if you want to
> really understand what your data has to tell you, then R/S-PLUS makes
> this easy and natural.
>
> Hope this helps,
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> [EMAIL PROTECTED]
> (801) 408-8111
>
>
>
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Lorenzo Isella
> > Sent: Thursday, April 05, 2007 9:02 AM
> > To: r-help@stat.math.ethz.ch
> > Subject: [R] Reasons to Use R
> >
> > Dear All,
> > The institute I work for is organizing an internal workshop
> > for High Performance Computing (HPC).
> > I am planning to attend it and talk a bit about fluid
> > dynamics, but there is also quite a lot of interest devoted
> > to data post-processing and management of huge data sets.
> > A lot of people are interested in image processing/pattern
> > recognition and statistic applied to geography/ecology, but I
> > would like not to post this on too many lists.
> > The final aim of the workshop is  understanding hardware
> > requirements and drafting a list of the equipment we would
> > like to buy. I think this could be the venue to talk about R as well.
> > Therefore, even if it is not exactly a typical mailing list
> > question, I would like to have suggestions about where to
> > collect info about:
> > (1)Institutions (not only academia) using R (2)Hardware
> > requirements, possibly benchmarks (3)R & clusters, R &
> > multiple CPU machines, R performance on different hardware.
> > (4)finally, a list of the advantages for using R over
> > commercial statistical packages. The money-saving in itself
> > is not a reason good enough and some people are scared by the
> > lack of professional support, though this mailing list is
> > simply wonderful.
> >
> > Kind Regards
> >
> > Lorenzo Isella
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-09 Thread Greg Snow
Here are a couple more thoughts to add to what you have already received:

You mentioned that price is not at issue, but there are other costs than
money that you may want to look at.  On my work machine I have R,
S-PLUS, SAS, SPSS, and a couple of other stats programs; on my laptop
and home computers I have R installed.  So, if a deadline is looming and
I am working on a project mainly in R, it is easy to work on it on the
bus or at home (or in a boring meeting), the same does not work for a
SAS or SPSS project (Hmm, thinking about this now, maybe I need to do
less in R :-).

R and S-PLUS are very flexible/customizable: if you have a certain plot
that you make often, you can write your own function/script to do it
automatically; most other programs will give you their standard, then
you have to modify it to meet your specifications.  With Sweave (and the
odf and html extensions) you can automate whole reports, very useful for
things that you do month after month.
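
(A minimal Sweave sketch, assuming a hypothetical data frame dat in the
workspace; running Sweave("report.Rnw") regenerates the LaTeX report
with the current numbers and figure filled in:)

% report.Rnw
\documentclass{article}
\begin{document}
The mean response in this month's data was \Sexpr{round(mean(dat$y), 2)}.
<<scatter, fig=TRUE, echo=FALSE>>=
plot(dat$x, dat$y)   # re-drawn automatically on every run
@
\end{document}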

And what I think is the biggest advantage of R and S-PLUS is that they
strongly encourage you to think about your data.  Other programs (at
least that I am familiar with) tend to have 1 specific way of treating
your data, and expect you to modify your data to fit that program's
model.  These models can be overrestrictive (force you to restructure
your data to fit their model) or underrestrictive (allow things that
should really be separate data objects to be combined into a single
"dataset") and sometimes both.  S on the other hand allows many
different ways to store and work with your data, and as you analyze the
data, different branches of new analysis open up depending on early
results rather than just getting stock output for a procedure.  If all
you want is a black box where data goes in one end and a specific answer
comes out the other, then most programs will work; but if you want to
really understand what your data has to tell you, then R/S-PLUS makes
this easy and natural.

Hope this helps,


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Lorenzo Isella
> Sent: Thursday, April 05, 2007 9:02 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Reasons to Use R
> 
> Dear All,
> The institute I work for is organizing an internal workshop 
> for High Performance Computing (HPC).
> I am planning to attend it and talk a bit about fluid 
> dynamics, but there is also quite a lot of interest devoted 
> to data post-processing and management of huge data sets.
> A lot of people are interested in image processing/pattern 
> recognition and statistic applied to geography/ecology, but I 
> would like not to post this on too many lists.
> The final aim of the workshop is  understanding hardware 
> requirements and drafting a list of the equipment we would 
> like to buy. I think this could be the venue to talk about R as well.
> Therefore, even if it is not exactly a typical mailing list 
> question, I would like to have suggestions about where to 
> collect info about:
> (1)Institutions (not only academia) using R (2)Hardware 
> requirements, possibly benchmarks (3)R & clusters, R & 
> multiple CPU machines, R performance on different hardware.
> (4)finally, a list of the advantages for using R over 
> commercial statistical packages. The money-saving in itself 
> is not a reason good enough and some people are scared by the 
> lack of professional support, though this mailing list is 
> simply wonderful.
> 
> Kind Regards
> 
> Lorenzo Isella
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-09 Thread Gabor Grothendieck
What about the S-Plus question?  S-Plus stores objects in files
whereas R stores them in memory.

On 4/9/07, Jorge Cornejo-Donoso <[EMAIL PROTECTED]> wrote:
> I have a Dell with 2 Intel XEON 3.0 processors and 2GB of RAM.
> The problem is the DB size.
>
> -Mensaje original-
> De: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
> Enviado el: Lunes, 09 de Abril de 2007 11:28
> Para: Jorge Cornejo-Donoso
> CC: r-help@stat.math.ethz.ch
> Asunto: Re: [R] Reasons to Use R
>
> Have you tried 64 bit machines with larger memory or do you mean that you
> can't use R on your current machines?
>
> Also have you tried S-Plus?  Will that work for you? The transition from
> that to R would be less than from SAS to R.
>
> On 4/9/07, Jorge Cornejo-Donoso <[EMAIL PROTECTED]> wrote:
> > the size of the DB is an issue with R. We are still using SAS because R
> > can't handle our DB, and of course we don't want to sacrifice
> > resolution, because the data collection is expensive (at least in
> > fisheries and oceanography), so.. I think that R needs to improve its
> > handling of big DBs. Now I can only use R for graph preparation and
> > some data analysis, but we can't do the main work in R, and that is
> > really sad.
> >
>
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-09 Thread Jorge Cornejo-Donoso
I have a Dell with 2 Intel XEON 3.0 processors and 2GB of RAM.
The problem is the DB size.

-Mensaje original-
De: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
Enviado el: Lunes, 09 de Abril de 2007 11:28
Para: Jorge Cornejo-Donoso
CC: r-help@stat.math.ethz.ch
Asunto: Re: [R] Reasons to Use R

Have you tried 64 bit machines with larger memory or do you mean that you
can't use R on your current machines?

Also have you tried S-Plus?  Will that work for you? The transition from
that to R would be less than from SAS to R.

On 4/9/07, Jorge Cornejo-Donoso <[EMAIL PROTECTED]> wrote:
> the size of the DB is an issue with R. We are still using SAS because R
> can't handle our DB, and of course we don't want to sacrifice
> resolution, because the data collection is expensive (at least in
> fisheries and oceanography), so.. I think that R needs to improve its
> handling of big DBs. Now I can only use R for graph preparation and
> some data analysis, but we can't do the main work in R, and that is
> really sad.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-09 Thread Gabor Grothendieck
Have you tried 64 bit machines with larger memory or do you mean
that you can't use R on your current machines?

Also have you tried S-Plus?  Will that work for you? The transition from
that to R would be less than from SAS to R.

On 4/9/07, Jorge Cornejo-Donoso <[EMAIL PROTECTED]> wrote:
> the size of the DB is an issue with R. We are still using SAS because R
> can't handle our DB, and of course we don't want to sacrifice
> resolution, because the data collection is expensive (at least in
> fisheries and oceanography), so.. I think that R needs to improve its
> handling of big DBs. Now I can only use R for graph preparation and
> some data analysis, but we can't do the main work in R, and that is
> really sad.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-08 Thread Wilfred Zegwaard
Dear Johann and Gabor,

It's a question of what amounts to a large dataset.  There are hundreds
of datasets R can't handle, probably thousands or more.  I noticed on my
computer (which is nothing more than an average PC) that R breaks down
after 250 MB of memory.  I also note that SPSS breaks down, Matlab, etc.

I'm not a SAS user, but I have worked with SAS in the past.  It's very
good, as I remember, but that was ten years ago.  And it's a "dollar
machine", I've been told: you add dollars to SAS as you add dollars to a
Porsche.  I haven't got it, and for most statistical applications it
isn't necessary, I've been told; R is sufficient for that.  The datasets
I use are often not that big (the way I like it).
About three years ago I spoke to somebody who had worked with it and
said "its database system is excellent and statistically profound".
Someone with a PhD, so probably he is right.

Monte Carlo simulations are computationally time-consuming, but they can
probably be done in R. I haven't seen any libraries for it (they might
be there). It has been done with S (the commercial counterpart of R), so
probably with R too. If you tie Monte Carlo simulation to large
datasets, you probably run into problems with a conventional R system.
What I've been told in those instances is "buy a new computer" / "add
memory and buy a new processor"... and don't smoke hashish.
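
For what it's worth, a plain Monte Carlo estimate needs no extra library
at all. A minimal base-R sketch, estimating E[exp(X)] for X ~ N(0,1),
whose exact value is exp(0.5):

  set.seed(42)                # for reproducibility
  x <- rnorm(100000)          # draws from the target distribution
  mean(exp(x))                # estimate; exact value exp(0.5) = 1.6487...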

That wasn't good advice, because the guy who told me smoked hashish
like hell and drank Pastis (blue liquor) like water. I kicked him out.
But that's another story.

Cheers,

Wilfred

(I drink wine and tailor-made beer, and only on occasion. That's why.
His simulations were good, I've been told.)



Re: [R] Reasons to Use R

2007-04-08 Thread Gabor Grothendieck
On 4/8/07, Johann Hibschman <[EMAIL PROTECTED]> wrote:
> R's pass-by-value semantics also make it harder than it should be to
> deal with situations where it's crucial that you not make a copy of
> the data frame, for fear of running out of memory.  Pass-by-reference
> would make implementing data transformations so much easier that I
> don't really understand how pass-by-value became the standard.  (If
> there's a trick to doing in-place transformations, I've not found it.)

Because R processes objects in memory, I also would not rate it as
strong as some other packages on very large data sets. But you can
use databases, which may make this less important in some cases, and
you can get a certain amount of mileage out of R environments. As
64-bit computers become commonplace and memory sizes grow, larger
and larger data sets will become easy to handle.
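
With a database, only the subset you ask for ever reaches R. An
untested sketch, assuming the RSQLite package and a hypothetical table
'obs' in the file big.db:

  library(RSQLite)
  con <- dbConnect(dbDriver("SQLite"), dbname = "big.db")
  sub <- dbGetQuery(con, "SELECT * FROM obs WHERE year = 2006")
  dbDisconnect(con)        # 'sub' holds just the subset, not the table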

Regarding environments, also available are proto objects from the
proto package, which are environments with slightly different semantics.
Even if you don't intend to use the proto package, it has quite a bit
of documentation and supporting information that might be
helpful:

- home page:
  http://code.google.com/p/r-proto/
- overview (click on Wiki tab at home page) which includes article links
  that discuss OO and environments
- tutorial, reference card, reference manual, vignette (see Links box)
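
To illustrate the environment idiom with a minimal, untested sketch: an
environment is passed by reference, so a function can update the data it
holds without a call-time copy.

  e <- new.env()
  e$x <- rnorm(1e6)                 # the data live in the environment
  standardize <- function(env) {
    env$x <- (env$x - mean(env$x)) / sd(env$x)  # rebinds inside env
    invisible(env)
  }
  standardize(e)                    # no copy made when passing 'e'

  # roughly the proto equivalent (assuming the proto package):
  # library(proto)
  # p <- proto(x = 1:10, total = function(.) sum(.$x))
  # p$total()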



Re: [R] Reasons to Use R

2007-04-08 Thread Johann Hibschman
On 4/6/07, Wilfred Zegwaard <[EMAIL PROTECTED]> wrote:

> I'm not a programmer, but I have the experience that R is good for
> processing large datasets, especially in combination with specialised
> statistics.

This I find a little surprising, but maybe it's just a sign that I'm
not experienced enough with R yet.

I can't use R for big datasets.  At all.  Big datasets take forever to
load with read.table, R frequently runs out of memory,  and nlm or
gnlm never seem to actually converge to answers.  By comparison, I can
point SAS and NLIN at this data without problem.  (Of course, SAS is
running on a pretty powerful dedicated machine with a big ram disk, so
that may be part of the problem.)
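
For the read.table part, much of the load time is type-guessing, which
can be switched off. A hedged sketch, with made-up file name and column
types:

  # pre-declaring types and row count avoids re-guessing and re-growing
  big <- read.table("big.dat", header = TRUE, nrows = 2e6,
                    colClasses = c("integer", "numeric", "factor"),
                    comment.char = "")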

R's pass-by-value semantics also make it harder than it should be to
deal with where it's crucial that you not make a copy of the data
frame, for fear of running out of memory.  Pass-by-reference would
make implementing data transformations so much easier that I don't
really understand how pass-by-value became the standard.  (If there's
a trick to doing in-place transformations, I've not found it.)

Right now, I'm considering starting on a project involving some big
Monte Carlo integrations over the complicated posterior parameter
distributions of a nonlinear regression model, and I have the strong
feeling that R will just choke.

R's great for small projects, but as soon as you have even a few
hundred megs of data, it seems to break down.

If I'm doing things wrong, please tell me.  :-)  SAS is a beast to work with.



Re: [R] Reasons to Use R

2007-04-06 Thread Stephen Tucker
Regarding (2),

I wonder if this information is too outdated or not relevant when scaled up
to larger problems...

http://www.sciviews.org/benchmark/index.html




--- Ramon Diaz-Uriarte <[EMAIL PROTECTED]> wrote:

> Dear Lorenzo,
> 
> I'll try not to repeat what others have answered before.
> 
> On 4/5/07, Lorenzo Isella <[EMAIL PROTECTED]> wrote:
> > The institute I work for is organizing an internal workshop for High
> > Performance Computing (HPC).
> (...)
> 
> > (1)Institutions (not only academia) using R
> 
> You can count my institution too. Several groups. (I can provide more
> details off-list if you want).
> 
> > (2)Hardware requirements, possibly benchmarks
> > (3)R & clusters, R & multiple CPU machines, R performance on different
> hardware.
> 
> We do use R in commodity off-the-shelf clusters; our two clusters are
> running Debian GNU/Linux; both 32-bit machines ---Xeons--- and 64-bit
> machines ---dual-core AMD Opterons. We use parallelization quite a
> bit, with MPI (via the Rmpi and papply packages mainly). One convenient
> feature is that (once the lam universe is up and running) whether we
> are using the 4 cores in a single box, or the max available 120, is
> completely transparent. Using R and MPI is, really, a piece of cake.
> That said, there are things that I miss; in particular, oftentimes I
> wish R were Erlang or Oz because of the straightforward fault-tolerant
> distributed computing and the built-in abstractions for distribution
> and concurrency. The issue of multithreading has come up several times
> in this list and is something that some people miss.
> 
> I am not sure how much R is used in the usual HPC realms. It is my
> understanding that the "traditional HPC" is still dominated by things
> such as HPF, and C with MPI, OpenMP, or UPC or Cilk. The usual answer
> to "but R is too slow" is "but you can write Fortran or C code for the
> bottlenecks and call it from R". I guess you could use, say, UPC in
> that C that is linked to R, but I have no experience. And I think this
> code can become a pain to write and maintain (specially if you want to
> play around with what you try to parallelize, etc). My feeling (based
> on no information or documentation whatsoever) is that how far R can
> be stretched or extended into HPC is still an open question.
> 
> 
> > (4)finally, a list of the advantages for using R over commercial
> > statistical packages. The money-saving in itself is not a reason good
> > enough and some people are scared by the lack of professional support,
> > though this mailing list is simply wonderful.
> >
> 
> (In addition to all the already mentioned answers)
> Complete source code availability. Being able to look at the C source
> code for a few things has been invaluable for me.
> And, of course, an extremely active, responsive, and vibrant
> community that, among other things, has contributed packages and code
> for an incredible range of problems.
> 
> 
> Best,
> 
> R.
> 
> P.S. I'd be interested in hearing about the responses you get to your
> presentation.
> 
> 
> > Kind Regards
> >
> > Lorenzo Isella
> >
> 
> 
> -- 
> Ramon Diaz-Uriarte
> Statistical Computing Team
> Structural Biology and Biocomputing Programme
> Spanish National Cancer Centre (CNIO)
> http://ligarto.org/rdiaz
> 



 




Re: [R] Reasons to Use R

2007-04-06 Thread Wilfred Zegwaard
Dear Lorenzo and Steven,

I'm not a programmer, but my experience is that R is good for
processing large datasets, especially in combination with specialised
statistics. There are some limits to that, but R handles large datasets
and complicated computation a lot better than SPSS, for example. I
cannot speak for Fortran, but I do have experience with Pascal. I
prefer R, because in Pascal you easily get lost in an endless
programming effort which has nothing to do with the problem. I do like
Pascal, it's the only programming language I actually learned, but it
isn't an adequate replacement for R.
My experience is that the SPSS language, and menu-driven
package, is far easier to handle than R, but when it comes to specific
computations, SPSS loses, by far. Non-parametrics is good in R, for
example. Dataset handling is adequate (my SPSS exports can be read), and
I noticed that R has good numerical routines like optimisation (even
mixed integer programming) and good procedures for regression (GLM,
which is not an SPSS standard). Try to compute a Kendall's W statistic
in SPSS; it's relatively easy in R.
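
A minimal base-R sketch of Kendall's W (coefficient of concordance) for
m raters each ranking the same n objects, without ties; the ratings here
are made up:

  set.seed(1)
  ratings <- t(replicate(4, sample(1:6)))   # 4 raters x 6 objects
  m <- nrow(ratings); n <- ncol(ratings)
  R <- colSums(ratings)                     # rank sum per object
  S <- sum((R - m * (n + 1) / 2)^2)         # deviations from mean rank sum
  W <- 12 * S / (m^2 * (n^3 - n))           # 0 = no agreement, 1 = perfect
  W
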
The only thing that I DON'T like about R is dataset computations and
its syntax. When I have a dataset with only non-parametric content
which is also "dirty" (incomplete or with wrong values), I almost have
to call in a technician to deal with it. To be honest, I use a
spreadsheet for these dataset computations and then export to R. But I
have noted that R offers several solutions for this. With SciViews I
could get a basic feeling for them.
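
The cleaning step can also be done in R itself. A hedged sketch; the
file name, missing-value codes and the 'score' column are hypothetical:

  d <- read.table("survey.txt", header = TRUE,
                  na.strings = c("NA", "", "-99"))  # map bad codes to NA
  d <- d[complete.cases(d), ]                       # drop incomplete rows
  d <- subset(d, score >= 0 & score <= 100)         # drop impossible values
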
Pascal is basically the only programming language that I understood
syntactically. It has a kind of logical mathematical structure to it.
The logic of Fortran (and to some extent R) I completely miss.

Statistically, R is my choice, and luckily most procedures in R are
easily accessible. And my experience with computations in R is... good.

I have done simulations in the past, especially with time series, but I
cannot recommend R for them (arima.sim is not sufficient for these types
of simulations). I would still prefer Pascal for that. There is also an
excellent open-source Pascal compiler, Free Pascal, but I hardly use
it. I do have some good experience with computations in C, though only
a little. Instead of C I would prefer R, I believe.

Cheers,

Wilfred



Re: [R] Reasons to Use R

2007-04-06 Thread Ramon Diaz-Uriarte
Dear Lorenzo,

I'll try not to repeat what others have answered before.

On 4/5/07, Lorenzo Isella <[EMAIL PROTECTED]> wrote:
> The institute I work for is organizing an internal workshop for High
> Performance Computing (HPC).
(...)

> (1)Institutions (not only academia) using R

You can count my institution too. Several groups. (I can provide more
details off-list if you want).

> (2)Hardware requirements, possibly benchmarks
> (3)R & clusters, R & multiple CPU machines, R performance on different 
> hardware.

We do use R in commodity off-the-shelf clusters; our two clusters are
running Debian GNU/Linux; both 32-bit machines ---Xeons--- and 64-bit
machines ---dual-core AMD Opterons. We use parallelization quite a
bit, with MPI (via the Rmpi and papply packages mainly). One convenient
feature is that (once the lam universe is up and running) whether we
are using the 4 cores in a single box, or the max available 120, is
completely transparent. Using R and MPI is, really, a piece of cake.
That said, there are things that I miss; in particular, oftentimes I
wish R were Erlang or Oz because of the straightforward fault-tolerant
distributed computing and the built-in abstractions for distribution
and concurrency. The issue of multithreading has come up several times
in this list and is something that some people miss.
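
For concreteness, the day-to-day Rmpi pattern looks roughly like this.
An untested sketch: it assumes a running LAM/MPI universe and the Rmpi
package, and the per-task function is just a stand-in:

  library(Rmpi)
  mpi.spawn.Rslaves(nslaves = 4)            # or 120; the code is the same
  fit_one <- function(i) mean(rnorm(1e5))   # placeholder for a real task
  res <- mpi.parSapply(1:100, fit_one)      # scatter tasks to the slaves
  mpi.close.Rslaves()
  mpi.quit()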

I am not sure how much R is used in the usual HPC realms. It is my
understanding that the "traditional HPC" is still dominated by things
such as HPF, and C with MPI, OpenMP, or UPC or Cilk. The usual answer
to "but R is too slow" is "but you can write Fortran or C code for the
bottlenecks and call it from R". I guess you could use, say, UPC in
that C that is linked to R, but I have no experience. And I think this
code can become a pain to write and maintain (especially if you want to
play around with what you try to parallelize, etc). My feeling (based
on no information or documentation whatsoever) is that how far R can
be stretched or extended into HPC is still an open question.
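
The "C for the bottleneck" route itself is simple enough. A hedged
sketch; the file and symbol names are made up:

  # -- conv.c ---------------------------------------------------
  #   void runmean2(double *x, int *n, double *out) {
  #       int i; out[0] = x[0];
  #       for (i = 1; i < *n; i++) out[i] = 0.5 * (x[i] + x[i-1]);
  #   }
  # compile with:  R CMD SHLIB conv.c
  dyn.load("conv.so")                        # conv.dll on Windows
  x <- rnorm(10)
  res <- .C("runmean2", as.double(x), as.integer(length(x)),
            out = double(length(x)))$out     # two-point running mean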


> (4)finally, a list of the advantages for using R over commercial
> statistical packages. The money-saving in itself is not a reason good
> enough and some people are scared by the lack of professional support,
> though this mailing list is simply wonderful.
>

(In addition to all the already mentioned answers)
Complete source code availability. Being able to look at the C source
code for a few things has been invaluable for me.
And, of course, an extremely active, responsive, and vibrant
community that, among other things, has contributed packages and code
for an incredible range of problems.


Best,

R.

P.S. I'd be interested in hearing about the responses you get to your
presentation.


> Kind Regards
>
> Lorenzo Isella
>
>


-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz



Re: [R] Reasons to Use R

2007-04-06 Thread Roland Rau
Hi Lorenzo,

On 4/5/07, Lorenzo Isella <[EMAIL PROTECTED]> wrote:
>
> I would like to have suggestions about where to collect info about:
> (1)Institutions (not only academia) using R


A starting point might be the R project homepage, with its members and
donors lists. These are, of course, not comprehensive; but at least they
give an overview of the diverse backgrounds of the people using R ---
even if it is only the tip of the iceberg.

> (2)Hardware requirements, possibly benchmarks


Maybe you should also mention that you can run R straight from a USB
stick if you want (see R for Windows FAQ 2.6).


> (3)R & clusters, R & multiple CPU machines, R performance on different
> hardware.


Have a look at the 'R Installation and Administration' manual; it gives
a nice overview of how many platforms R runs on.

Best,
Roland




Re: [R] Reasons to Use R

2007-04-06 Thread bogdan romocea
> (1)Institutions (not only academia) using R

http://www.r-project.org/useR-2006/participants.html

> (2)Hardware requirements, possibly benchmarks

Since you mention huge data sets, GNU/Linux running on 64-bit machines
with as much RAM as your budget allows.

> (3)R & clusters, R & multiple CPU machines,
> R performance on different hardware.

OpenMosix and Quantian for clusters; the list archives for multiple
CPUs (this has been asked quite a few times). It may be best to measure
R performance on different hardware yourself, using your own data and
code.

> (4)finally, a list of the advantages for using R over
> commercial statistical packages.

I'd say it's not R vs. commercial packages, but S vs. the rest of the
world. Check http://www.insightful.com/ ; much of what they say is
applicable to R. Make the case that S is vastly superior directly, not
just through a list of reasons: take a few data sets and show how they
can be analyzed with S compared to other choices. Both R and S-Plus
are likely to significantly outperform most other software, depending
on the kind of work that needs to be done.


> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Lorenzo Isella
> Sent: Thursday, April 05, 2007 11:02 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Reasons to Use R
>
> Dear All,
> The institute I work for is organizing an internal workshop for High
> Performance Computing (HPC).
> I am planning to attend it and talk a bit about fluid dynamics, but
> there is also quite a lot of interest devoted to data post-processing
> and management of huge data sets.
> A lot of people are interested in image processing/pattern recognition
> and statistics applied to geography/ecology, but I would like not to
> post this on too many lists.
> The final aim of the workshop is  understanding hardware requirements
> and drafting a list of the equipment we would like to buy. I think
> this could be the venue to talk about R as well.
> Therefore, even if it is not exactly a typical mailing list question,
> I would like to have suggestions about where to collect info about:
> (1)Institutions (not only academia) using R
> (2)Hardware requirements, possibly benchmarks
> (3)R & clusters, R & multiple CPU machines, R performance on
> different hardware.
> (4)finally, a list of the advantages for using R over commercial
> statistical packages. The money-saving in itself is not a reason good
> enough and some people are scared by the lack of professional support,
> though this mailing list is simply wonderful.
>
> Kind Regards
>
> Lorenzo Isella
>
>



Re: [R] Reasons to Use R

2007-04-06 Thread Stephen Tucker
Hi Lorenzo,

I don't think I'm qualified to provide solid information on the first
three questions, but I'd like to drop a few thoughts on (4). While
there are no shortage of language advocates out there, I'd like to
join in for this once. My background is in chemical engineering and
atmospheric science; I've done simulation on a smaller scale but spend
much of my time analyzing large sets of experimental data. I am
comfortable programming in Matlab, R, Python, C, Fortran, Igor Pro,
and I also know a little IDL but have not programmed in it
extensively.

As you are probably aware, I would count among these, Matlab, R,
Python, and IDL as good candidates for processing large data sets, as
they are high-level languages and can communicate with netCDF files
(which I imagine will be used to transfer data).

Each language boasts an impressive array of libraries, but what I
think gives R the advantage for analyzing data is the level of
abstraction in the language. I am extremely impressed with the objects
available to represent data sets, and the functions support them very
well - it requires that I carry around a fewer number of objects to
hold information about my data (and I don't have to "unpack" them to
feed them into functions). The language is also very "expressive" in
that it lets you write a procedure in many different ways, some
shorter, some more readable, depending on what your situation
requires. System commands and text processing are integrated into the
language, and the input/output facilities are excellent, in terms of
data and graphics. Once I have my data object I am only a few
keystrokes to split, sort, and visualize multivariate data; even after
several years I keep discovering new functions for basic things like
manipulation of data objects and descriptive statistics, and plotting
- truly, an analyst's needs have been well anticipated.
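
As a tiny illustration of that "few keystrokes" workflow, a minimal
sketch on a built-in dataset:

  d <- iris
  sapply(split(d$Sepal.Length, d$Species), mean)   # split + summarize
  boxplot(Sepal.Length ~ Species, data = d)        # visualize the groups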

This is a recent obsession of mine, which I was introduced to
through Python: the functional programming support in R is
amazing. By using higher-order functions like lapply(), I rarely
rely on for-loops, which have often caused me trouble in the past
because I had forgotten to re-initialize a variable, or incremented
the wrong variable, etc. Though I'm definitely not militant about
functional programming, in general I try to write functions and then
apply them to the data (if the functions don't exist in R already),
often through higher-order functions such as lapply(). This approach
keeps most variables out of the global namespace and so I am less
likely to reassign a value to a variable that I had intended to
keep. It also makes my code more modular so that I can re-use bits of
my code as my analysis inevitably grows much larger than I had
originally intended.
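
A minimal sketch of that loop-free style; the file names are made up:

  files  <- c("jan.csv", "feb.csv", "mar.csv")
  tables <- lapply(files, read.csv)    # one data frame per file, no loop
  rows   <- sapply(tables, nrow)       # a summary over the whole list
  merged <- do.call(rbind, tables)     # combine without an accumulator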

Furthermore, my code in R ends up being much, much shorter than code I
imagine writing in other languages to accomplish the same task; I
believe this leads to fewer places for errors to occur, and the nature
of the code is immediately comprehensible (though a series of nested
functions can get pretty hard to read at times), not to mention it
takes less effort to write. This also makes it easier to interact with
the data, I think, because after making a plot I can set up for the
next plot with only a few function calls instead of setting out to
write a block of code with loops, etc.

I have actually recommended R to colleagues who needed to analyze the
information from large-scale air quality/ global climate simulations,
and they are extremely pleased. I think the capability for statistics
and graphics is well-established enough that I don't need to do a
hard-sell on that so much, but R's language is something I get very
excited about. I do appreciate all the contributors who have made this
available.

Best regards,
ST


--- Lorenzo Isella <[EMAIL PROTECTED]> wrote:

> Dear All,
> The institute I work for is organizing an internal workshop for High
> Performance Computing (HPC).
> I am planning to attend it and talk a bit about fluid dynamics, but
> there is also quite a lot of interest devoted to data post-processing
> and management of huge data sets.
> A lot of people are interested in image processing/pattern recognition
> and statistics applied to geography/ecology, but I would like not to
> post this on too many lists.
> The final aim of the workshop is  understanding hardware requirements
> and drafting a list of the equipment we would like to buy. I think
> this could be the venue to talk about R as well.
> Therefore, even if it is not exactly a typical mailing list question,
> I would like to have suggestions about where to collect info about:
> (1)Institutions (not only academia) using R
> (2)Hardware requirements, possibly benchmarks
> (3)R & clusters, R & multiple CPU machines, R performance on different
> hardware.
> (4)finally, a list of the advantages for using R over commercial
> statistical packages. The money-saving in itself is not a reason good
> enough and some people are scared by the lack of professional support,
> though this mailing list is simply wonderful.

Re: [R] Reasons to Use R

2007-04-06 Thread Wilfred Zegwaard
To my knowledge, the core of R is considered "adequate" and "good" by
statisticians. That's sufficient, isn't it?
Last year I read some documentation about R; most routines were
considered "good", but "some very bad". That is a benchmark of sorts.

There must be benchmarks of the kind you want out there. R is widely
used, and there must be people around who can provide you with the
right material. CRAN is one way to find it; so is the project page.

The core is free, by the way, and you can participate in the
development. People there can provide you with the information you
want. R is quite well documented (not everybody thinks it's well
documented, but... you know... opinions do vary).

There is one simple reason to use R: it's free, for one. If you have
the money, commercial software is sufficient. That doesn't mean that R
is the poor man's software. It actually works quite well (but... you
know... opinions vary, especially about statistical software). I think
that's the usual reason to use it: it works quite well, and its
documentation is widely available. A LOT of statistical procedures are
available. R crashed about twice last year on my computer, which is
better than SPSS, and there are a lot of user interfaces available
which make working with R easier.
Personally I don't like SPSS, but I do know that the R core is used in
commercial applications. So at least one person has done some
benchmarks.

Wilfred



Re: [R] Reasons to Use R

2007-04-06 Thread Lorenzo Isella
John Kane wrote:
> --- Lorenzo Isella <[EMAIL PROTECTED]> wrote:
>
>   
>> (4)finally, a list of the advantages for using R
>> over commercial
>> statistical packages. The money-saving in itself is
>> not a reason good
>> enough and some people are scared by the lack of
>> professional support,
>> though this mailing list is simply wonderful.
>>
>> 
> Given that I can do as much if not more with R (in
> most cases) than with commercial software, as an
> independent consultant,  'cost' is a very significant
> factor. 
>
> A very major advantage of R is the money-saving.  Have
> a look at
> http://www.spss.com/stores/1/Software_Full_Version_C2.cfm
>
>  and convince me that cost ( for an independent
> contractor) is not a good reason. 
>
>
>   
Hello,
No doubt that for an independent contractor money is a significant
issue, but we are talking about the case of a large organization for
which spending a few thousand euros on software is routine.
To avoid misunderstandings: I am an R user myself and I have no
intention of paying a cent for statistical software, but in order to
speak up for R vs. any commercial software for data analysis and
postprocessing, I need technical details (benchmarks, etc.) rather
than the fact that it saves money.
Kind Regards

Lorenzo



Re: [R] Reasons to Use R

2007-04-05 Thread John Kane

--- Lorenzo Isella <[EMAIL PROTECTED]> wrote:

>
> (4)finally, a list of the advantages for using R
> over commercial
> statistical packages. The money-saving in itself is
> not a reason good
> enough and some people are scared by the lack of
> professional support,
> though this mailing list is simply wonderful.
>
Given that I can do as much if not more with R (in
most cases) than with commercial software, as an
independent consultant,  'cost' is a very significant
factor. 

A very major advantage of R is the money-saving.  Have
a look at
http://www.spss.com/stores/1/Software_Full_Version_C2.cfm

 and convince me that cost (for an independent
contractor) is not a good reason.



[R] Reasons to Use R

2007-04-05 Thread Joel J. Adamson
Lorenzo Isella writes:

 > (4)finally, a list of the advantages for using R over commercial
 > statistical packages.

Here's my entry for the list, as this was a topic of conversation over
lunch: it's better than the proprietary statistical software I use
most of the time.  By better I mean that the language is consistent,
the features are all well documented, and none of it appears to have
been rushed out onto the market.  The proprietary software that I use
most of the time at work seems hurriedly cobbled together; R (like
LaTeX, Emacs, Linux, ...) doesn't give me that feeling.

 > The money-saving in itself is not a reason good
 > enough

Interesting ;)  I know what you mean -- it may even make them
suspicious.

Joel
-- 
Joel J. Adamson
Biostatistician
Pediatric Psychopharmacology Research Unit
Massachusetts General Hospital
Boston, MA  02114
(617) 643-1432
(303) 880-3109








Re: [R] Reasons to Use R

2007-04-05 Thread Schmitt, Corinna
Dear Mr. Isella,

I just started my PhD thesis, and I need to work with R. A good
resource is Bioconductor (www.bioconductor.org), a collection of
R-based software for bioinformatics. Another institute which has good
experience with R is the HKI in Jena, Germany. Perhaps you can contact
Mrs. Radke there for more information or speakers for your workshop.
Both parties work mainly on bioinformatics methods but can perhaps
help you.

A good reason to use R is that computations are much quicker and you
can import files from, and export them to, many other programs and
languages.
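
For instance, a hedged sketch of such a round trip, assuming the
'foreign' package; the file names are hypothetical:

  library(foreign)
  d <- read.spss("study.sav", to.data.frame = TRUE)  # import from SPSS
  write.csv(d, "study.csv", row.names = FALSE)       # export for other tools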

Happy Easter,
C.Schmitt

**
Corinna Schmitt, Dipl.Inf.(Bioinformatik)
Fraunhofer Institut für Grenzflächen- & Bioverfahrenstechnik
Nobelstrasse 12, B 3.24
70569 Stuttgart
Germany

phone: +49 711 9704044 
fax: +49 711 9704200
e-mail: [EMAIL PROTECTED]
http://www.igb.fraunhofer.de

 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lorenzo Isella
Sent: Thursday, 5 April 2007 17:02
To: r-help@stat.math.ethz.ch
Subject: [R] Reasons to Use R

Dear All,
The institute I work for is organizing an internal workshop for High
Performance Computing (HPC).
I am planning to attend it and talk a bit about fluid dynamics, but
there is also quite a lot of interest devoted to data post-processing
and management of huge data sets.
A lot of people are interested in image processing/pattern recognition
and statistics applied to geography/ecology, but I would like not to
post this on too many lists.
The final aim of the workshop is  understanding hardware requirements
and drafting a list of the equipment we would like to buy. I think
this could be the venue to talk about R as well.
Therefore, even if it is not exactly a typical mailing list question,
I would like to have suggestions about where to collect info about:
(1)Institutions (not only academia) using R
(2)Hardware requirements, possibly benchmarks
(3)R & clusters, R & multiple CPU machines, R performance on different hardware.
(4)finally, a list of the advantages for using R over commercial
statistical packages. The money-saving in itself is not a reason good
enough and some people are scared by the lack of professional support,
though this mailing list is simply wonderful.

Kind Regards

Lorenzo Isella




[R] Reasons to Use R

2007-04-05 Thread Lorenzo Isella
Dear All,
The institute I work for is organizing an internal workshop for High
Performance Computing (HPC).
I am planning to attend it and talk a bit about fluid dynamics, but
there is also quite a lot of interest devoted to data post-processing
and management of huge data sets.
A lot of people are interested in image processing/pattern recognition
and statistics applied to geography/ecology, but I would like not to
post this on too many lists.
The final aim of the workshop is  understanding hardware requirements
and drafting a list of the equipment we would like to buy. I think
this could be the venue to talk about R as well.
Therefore, even if it is not exactly a typical mailing list question,
I would like to have suggestions about where to collect info about:
(1)Institutions (not only academia) using R
(2)Hardware requirements, possibly benchmarks
(3)R & clusters, R & multiple CPU machines, R performance on different hardware.
(4)finally, a list of the advantages of using R over commercial
statistical packages. The money-saving in itself is not a reason good
enough and some people are scared by the lack of professional support,
though this mailing list is simply wonderful.

Kind Regards

Lorenzo Isella
