On 17/05/2007 10:11 PM, Andrew Yee wrote:
This is a dumb question, but I'm having trouble finding the answer to it.
I'd like to do the following:
x <- asdf
and then have
the object x.y become automatically converted/represented as asdf.y (sort of
akin to macro variables in SAS, where you would do:
%let x=asdf and then &x..y)
What is the R equivalent?
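One common R idiom for this kind of indirection (the object names below are hypothetical) is to keep the target's name in a character string, build the full name with paste(), and fetch it with get():

    asdf.y <- 1:10                   # hypothetical object named "asdf.y"
    x <- "asdf"                      # x holds the *name*, much as %let does
    get(paste(x, "y", sep = "."))    # builds the name "asdf.y" and fetches it

assign() works the same way in the other direction, creating an object whose name is computed at run time.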
Sent: Thursday, July 01, 2004 5:22 PM
To: [EMAIL PROTECTED]
Subject: RE: [R] naive question
As part of a continuing thread on the cost of loading large
amounts of data into R,
Vadim Ogranovich [EMAIL PROTECTED] wrote:
R's IO is indeed 20 - 50 times slower than that of
equivalent C code
Douglas Bates bates at stat.wisc.edu writes:
: If you are routinely working with very large data sets it would be
: worthwhile learning to use a relational database (PostgreSQL, MySQL,
: even Access) to store the data and then access it from R with RODBC or
: one of the specialized database interface packages.
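For concreteness, a minimal sketch of that database route via RODBC, assuming an ODBC data source "mydata" and a table big_table have already been set up (both names hypothetical):

    library(RODBC)
    ch <- odbcConnect("mydata")                    # hypothetical DSN
    dat <- sqlQuery(ch, "SELECT * FROM big_table") # returns a data.frame
    odbcClose(ch)

This way only the rows and columns a given analysis needs ever cross into R.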
As far as I know, read.table() in S-plus performs similarly to read.table()
in R with respect to speed, so I wouldn't hold out much hope of finding
satisfaction there.
I do frequently read large tables in S-plus, and with a considerable amount
of work was able to speed things up
Tony Plate tplate at blackmesacapital.com writes:
I get the best read performance out of S-plus by using
a homegrown binary file format with each column stored in a contiguous
block of memory and metadata (i.e., column types and dimensions) stored at
the start of the file. The S-plus
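A rough sketch of that kind of column-wise binary format in R; the layout below (dimensions first, then each numeric column as one contiguous block) is invented for illustration and is not Plate's actual format:

    write.colbin <- function(df, file) {               # hypothetical helper
      con <- file(file, "wb"); on.exit(close(con))
      writeBin(as.integer(c(ncol(df), nrow(df))), con) # metadata: dimensions
      for (col in df) writeBin(as.double(col), con)    # each column contiguous
    }
    read.colbin <- function(file) {
      con <- file(file, "rb"); on.exit(close(con))
      dims <- readBin(con, "integer", n = 2)
      cols <- replicate(dims[1], readBin(con, "double", n = dims[2]),
                        simplify = FALSE)
      names(cols) <- paste("V", seq_len(dims[1]), sep = "")
      as.data.frame(cols)
    }

Reading back is then a handful of readBin() calls rather than a line-by-line parse, which is where the speed comes from.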
Thank you! It's interesting about S-Plus, since they apparently try to support
work with much larger data sets by writing everything out to disk (thus getting
around, e.g., address-space limitations, I guess), so it is a little surprising
that they did not tweak the I/O more...
Thanks
To be careful, there's lots more to I/O than the functions read.table() and
scan() -- I was only commenting on those, and no inference should be made
about other aspects of S-plus I/O based on those comments!
I suspect that what has happened is that memory, CPU speed, and I/O
speed have evolved at different rates, so what used to be acceptable
code in read.table() (in both R and S-plus) is now showing its
limitations and has reached the point where it can take half an hour to
read in, on a
[EMAIL PROTECTED] writes:
I did not use R ten years ago, but reasonable RAM amounts have
multiplied by roughly a factor of 10 (from 128 MB to 1 GB), CPU speeds
have gone up by a factor of 30 (from 90 MHz to 3 GHz), and disk space
availability has gone up probably by a factor of 10. So, unless the
On Tue, 29 Jun 2004, Igor Rivin wrote:
I have a 100Mb comma-separated file, and R takes several minutes to read it
(via read.table()). This is R 1.9.0 on a Linux box with a couple of gigabytes
of RAM. I am conjecturing that R is gc-ing, so maybe there is some command-line
arg I can give it to convince it that I have a lot of space, or?!
There are hints in the R Data Import/Export Manual. Just checking: you
_have_ read it?
I did read the Import/Export document. It is true that replacing
read.table() with read.csv() and setting comment.char="" speeds
things up some (a factor of two?) -- this is very far from acceptable performance,
being some two orders of magnitude worse than SAS (the I/O of which is, in turn, much
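For concreteness, the two calls being compared (file name hypothetical):

    # slower: read.table guesses column types and scans for comment characters
    dat <- read.table("big.csv", header = TRUE, sep = ",")
    # faster: read.csv with comment scanning explicitly disabled
    dat <- read.csv("big.csv", comment.char = "")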
Vadim Ogranovich replied:
R's IO is indeed 20 - 50 times slower than that of equivalent C code no
matter what you do, which has been a pain for some of us. It does,
however, help to read the Import/Export tips, as without them the ratio
gets much worse. As Gabor G. suggested in another mail, if you use the file
repeatedly you can save it once in R's binary format and reload that instead.
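A sketch of that reuse pattern (file names hypothetical): pay the parsing cost once, snapshot the result, and reload the binary snapshot in later sessions.

    dat <- read.csv("big.csv", comment.char = "")  # slow, done once
    save(dat, file = "big.RData")                  # binary snapshot
    # in later sessions:
    load("big.RData")                              # fast; restores `dat'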
I was not particularly annoyed, just disappointed, since R seems like
a much better thing than SAS in general, and doing everything with a combination
of hand-rolled tools is too much work. However, I do need to work with very large data
sets, and if it takes 20 minutes to read them in, I have
I am working with data sets that have 2 matrices of 300 columns by 19,000
rows, and I manage to get the data loaded in a reasonable amount of time.
Once it's in, I save the workspace and load from there. Once I start doing
some work on the data, I am taking up about 600 MB of RAM out of the 1
On Tue, 29 Jun 2004 16:59:58 -0700, Vadim Ogranovich
[EMAIL PROTECTED] wrote:
R's IO is indeed 20 - 50 times slower than that of equivalent C code no
matter what you do, which has been a pain for some of us.
Things like this shouldn't be a pain for long. If C code works well,
why not use C?
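Short of dropping to C, scan() is the lower-level routine that the read.table() family is built on; calling it directly with an explicit `what' list skips some of the overhead (file name and column layout below are hypothetical):

    dat <- scan("big.csv", sep = ",", skip = 1,        # skip the header row
                what = list(x = 0, n = 0L, id = ""))   # numeric, integer, character
    dat <- as.data.frame(dat, stringsAsFactors = FALSE)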
We need more details about your problem to provide any useful
help. Are all the variables numeric? Are they all completely
different? Is it possible to use `colClasses'?
It is possible, but very inconvenient. There are mostly numeric columns,
but some integer categories, and some string columns.
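A hedged sketch of the `colClasses' route for a mixed table like that (the column counts and file name are made up): spelling the types out lets read.table skip its type-guessing pass.

    cc <- c(rep("numeric", 280), rep("integer", 15), rep("character", 5))
    dat <- read.csv("big.csv", colClasses = cc,
                    nrows = 19000)   # an nrows upper bound also helps allocation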
Also, having a couple of gigabytes of RAM is not necessarily useful if
you're on a 32-bit OS, since the total process size is usually limited to
less than ~3 GB.
Well, 2^32 gives you more like 4 GB; how much of that can be given to a
process? My highest workspace reached 1.2 GB. I will
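The arithmetic behind both figures: 32-bit pointers address 2^32 bytes = 4 GiB, but the kernel typically reserves 1-2 GiB of that address space, leaving roughly 2-3 GiB for a user process such as R.

    2^32 / 2^30           # 4 GiB of raw 32-bit address space
    (2^32 - 2^30) / 2^30  # ~3 GiB left if the kernel keeps 1 GiB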