On Jan 9, 2008 2:01 PM, Derek Stephen Elmerick <[EMAIL PROTECTED]> wrote:
> Hello -
>
> I am trying to write code that will read in multiple datasets;
> however, I would like to skip any dataset where the read-in process
> takes longer than some fixed cutoff. A generic version of the function
> is the following:
>
> for(k in 1:number.of.datasets)
> {
>   X[k]=read.table(...)
> }
>
> The issue is that I cannot find a way to embed logic that will abort
> the read-in process of a specific dataset without manual intervention.
> I scanned the help manual and other postings, but no luck based on my
> search. Any thoughts?

A simple solution is to use nrows=1000000 or so (whatever makes sense); any dataset larger than that will be truncated. If you read from a connection, you can even check after read.table() completes whether more rows are available -- if so, the dataset was not read in full.

A slightly more complicated solution is to read 1000 lines or so at a time (the right chunk size depends a bit on the data) and rbind() the results of the multiple read.table() calls at the end. If you capture the colClasses from the first chunk, this can potentially be even faster than a single read.table() call on the whole dataset. Reading from a connection means the file does not need to be reopened and the connection need not be reset, and you can check the elapsed time after each chunk to see whether you have exceeded your threshold.

There may, of course, be more clever solutions that I haven't thought of.

Sean
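The first suggestion (a fixed nrows cutoff, with a follow-up check on the connection) might be sketched as follows; the filename and the cutoff value are illustrative, not from the thread:

```r
## Read at most max.rows rows, then check whether the file had more.
max.rows <- 1000000
con <- file("data1.txt", open = "r")
X <- read.table(con, header = TRUE, nrows = max.rows)
## If another line can still be read from the connection, the dataset
## was larger than the cutoff and has been truncated.
truncated <- length(readLines(con, n = 1)) > 0
close(con)
```

A dataset flagged as truncated could then be skipped or re-read later with a larger cutoff.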
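The chunked variant with a time threshold might look like the sketch below; the function name, chunk size, and timeout are all illustrative. colClasses captured from the first chunk is reused for the later ones, and the tryCatch() covers the error read.table() raises when the connection is exhausted:

```r
## Read a file in chunks, giving up (returning NULL) if the elapsed
## time exceeds a threshold; otherwise rbind the chunks together.
read.with.timeout <- function(file, chunk = 1000, timeout = 30) {
  con <- file(file, open = "r")
  on.exit(close(con))
  start <- proc.time()["elapsed"]
  ## First chunk: also supplies column names and classes.
  first <- read.table(con, header = TRUE, nrows = chunk)
  classes <- sapply(first, class)
  pieces <- list(first)
  repeat {
    if (proc.time()["elapsed"] - start > timeout)
      return(NULL)                       # threshold exceeded: skip dataset
    piece <- tryCatch(
      read.table(con, header = FALSE, nrows = chunk, colClasses = classes),
      error = function(e) NULL)          # no lines left on the connection
    if (is.null(piece) || nrow(piece) == 0)
      break
    names(piece) <- names(first)
    pieces[[length(pieces) + 1]] <- piece
  }
  do.call(rbind, pieces)
}
```

Checking the clock between chunks keeps the check cheap while still bounding how long any single dataset can tie up the loop.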
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel