Re: [Rd] Importing csv files

Prof Brian Ripley Thu, 23 Dec 2004 09:32:53 -0800

On Thu, 23 Dec 2004, Frank E Harrell Jr wrote:

Prof Brian Ripley wrote:
I think we need to know what you mean by `large' and why read.table is not fast enough (and hence if some of the planned improvements might be all that is needed).
I was referring to the e-mail exchanges on r-help about read.table a few weeks ago; then there was a new discussion the other day concerning RAM usage and read.table not knowing the number of rows up front. I believe that the posters provided some timings and examples.

I have yet to see any which used read.table competently which were slow (although the RAM usage could be higher than some people expected). Unless people have followed _all_ the hints in the Data manual, I don't think there is anything to discuss.
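For readers unsure what following the hints looks like in practice, here is a minimal sketch. It assumes the relevant hints are the ones given in the read.table documentation: declare the column types with colClasses, give the row count with nrows, and turn off comment scanning with comment.char = "". The file, column names and sizes below are hypothetical, made up for illustration only.

```r
# Hypothetical example data: write a small CSV to a temporary file.
set.seed(1)
n <- 1000
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:n,
                     x  = rnorm(n),
                     g  = sample(c("a", "b"), n, TRUE)),
          csv_path, row.names = FALSE)

# Read it back, telling read.table as much as possible up front.
dat <- read.table(csv_path,
                  header = TRUE,
                  sep = ",",
                  quote = "\"",
                  colClasses = c("integer", "numeric", "character"),  # no type guessing
                  nrows = n,            # number of data rows known in advance
                  comment.char = "")    # do not scan for comment characters

str(dat)
```

Wrapping the read.table call in system.time() with and without these arguments is one way to see whether they matter for a particular file.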

There is an issue with reading factors with just a few unique values, but that is one of the things being worked on.

Could you make some examples available for profiling?


Anyone who actually has a problem, then?

It seems to me that there are some delicate licensing issues in distributing a product that writes .rda format except under GPL. See, for example, the GPL FAQ.
My understanding is that David is not distributing dataload any more, though I would not like to discourage commercial vendors (such as providers of Stat/Transfer and DBMSCOPY) from providing .rda output as an option. I assume that new code written under GPL would not be a problem. -Frank

I said `except under GPL'. I am not trying to discourage anyone, just pointing out that GPL has far-ranging implications that are often overlooked.

On Thu, 23 Dec 2004, Frank E Harrell Jr wrote:
There is a recurring need for importing large csv files quickly. David Baird's dataload is a standalone program that will directly create .rda files from .csv (it also handles many other conversions). Unfortunately dataload is no longer publicly available because of some kind of relationship with Stat/Transfer. The idea is a good one, though. I wonder if anyone would volunteer to replicate the csv->rda standalone functionality or to provide some Perl or Python tools for making creation of .rda files somewhat easy outside of R.

As an aside, I routinely see 30-fold reductions in file sizes for .rda files (made with save(..., compress=TRUE)) compared with the size of SAS binary datasets. And load() times are fast.
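A minimal sketch of that save/load round trip, done from within R rather than by a standalone converter. The data frame and file path are hypothetical, and the size printed at the end will depend entirely on the data, so no particular compression ratio is implied.

```r
# Hypothetical data frame standing in for an imported dataset.
set.seed(1)
dat <- data.frame(id = 1:1000,
                  g  = factor(sample(c("a", "b"), 1000, TRUE)))

# Write a compressed .rda file.
rda_path <- tempfile(fileext = ".rda")
save(dat, file = rda_path, compress = TRUE)

# Remove the object, then restore it from the file.
rm(dat)
loaded <- load(rda_path)   # load() returns the names of the restored objects

file.info(rda_path)$size   # size of the compressed file in bytes
```

Note that load() restores objects under the names they were saved with, so the data frame reappears as dat.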

It's been a great year for R. Let me take this opportunity to thank the R leaders for a fantastic job that gives immeasurable benefits to the community.

It's certainly been a great year for people to complain about R, R-help .... We say

        R is a collaborative project with many contributors.

but it seems to me much less than it used to be.

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
