Ben Bolker <bbolker <at> gmail.com> writes:

> On 02/11/2011 03:37 PM, Laurent Gatto wrote:
> > On 11 February 2011 19:39, Ben Bolker <bbolker <at> gmail.com> wrote:
> >>
> > [snip]
> >>
Bump. Is there any opinion about this from R-core?? Will I be scolded if I
submit this as a bug ... ??

> >> What is dangerous/confusing is that R silently **wraps** longer lines if
> >> fill=TRUE (which is the default for read.csv). I encountered this when
> >> working with a colleague on a long, messy CSV file that had some phantom
> >> extra fields in some rows, which then turned into empty lines in the
> >> data frame.
> >> [snip snip]
> >> Here is an example and a workaround that runs count.fields on the
> >> whole file to find the maximum number of fields per line and set
> >> col.names accordingly. (It assumes you don't already have a file named
> >> "test.csv" in your working directory ...)
> >>
> >> I haven't dug in to try to write a patch for this -- I wanted to test
> >> the waters and see what people thought first, and I realize that
> >> read.table() is a very complicated piece of code that embodies a lot of
> >> tradeoffs, so there could be lots of different approaches to trying to
> >> mitigate this problem. I appreciate very much how hard it is to write a
> >> robust and general function to read data files, but I also think it's
> >> really important to minimize the number of traps in read.table(), which
> >> will often be the first part of R that new users encounter ...
> >>
> >> A quick fix for this might be to allow the number of lines analyzed
> >> for length to be settable by the user, or to allow a settable 'maxcols'
> >> parameter, although those would only help in the case where the user
> >> already knows there is a problem.
> >>
> >> cheers
> >> Ben Bolker
> >>
> >> ===============
> >> writeLines(c("A,B,C,D",
> >>              "1,a,b,c",
> >>              "2,f,g,c",
> >>              "3,a,i,j",
> >>              "4,a,b,c",
> >>              "5,d,e,f",
> >>              "6,g,h,i,j,k,l,m,n"),
> >>            con="test.csv")
> >>
> >> read.csv("test.csv")
> >> try(read.csv("test.csv",fill=FALSE))
> >>
> >> ## assumes header=TRUE, fill=TRUE; should be a little more careful
> >> ## with comment, quote arguments (possibly explicit)
> >> ## ... contains information about quote, comment.char, sep
> >> Read.csv <- function(fn,sep=",",...) {
> >>   ## column names from the header line
> >>   colnames <- scan(fn,nlines=1,what="character",sep=sep,...)
> >>   ncolnames <- length(colnames)
> >>   ## longest line in the whole file, not just the first few
> >>   maxcols <- max(count.fields(fn,sep=sep,...))
> >>   if (maxcols>ncolnames) {
> >>     colnames <- c(colnames,paste("V",(ncolnames+1):maxcols,sep=""))
> >>   }
> >>   ## assumes you don't have any other columns labeled "V[large number]"
> >>   ## (sep is passed on explicitly so a non-default separator is honoured)
> >>   read.csv(fn,sep=sep,...,col.names=colnames)
> >> }
> >>
> >> Read.csv("test.csv")

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
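For context, ?read.table documents that the number of data columns is determined by looking at the first five lines of input (or from col.names, if that is longer), which is why a long row further down gets wrapped instead of being flagged. Below is a minimal sketch of a pre-flight check in the same spirit as the count.fields() workaround, assuming a single header line and read.csv()'s default quote and comment.char settings. find_ragged_rows() is a hypothetical helper, not an existing R function.

```r
## Hypothetical helper (not part of base R): report the lines of a CSV
## file whose field count differs from that of the header line.
find_ragged_rows <- function(file, sep = ",") {
  ## Match read.csv()'s defaults: double quotes only, no comment character.
  ## Blank lines are skipped, so indices count non-blank lines only.
  nf <- count.fields(file, sep = sep, quote = "\"", comment.char = "")
  nhead <- nf[1]                # number of column names in the header
  bad <- which(nf != nhead)     # which() drops NA counts (unterminated quotes)
  data.frame(line = bad, fields = nf[bad])
}

## Usage: write the example data to a temporary file (so an existing
## "test.csv" is not overwritten) and check it before calling read.csv().
tmp <- tempfile(fileext = ".csv")
writeLines(c("A,B,C,D", "1,a,b,c", "2,f,g,c", "3,a,i,j",
             "4,a,b,c", "5,d,e,f", "6,g,h,i,j,k,l,m,n"), con = tmp)
ragged <- find_ragged_rows(tmp)
print(ragged)
unlink(tmp)
```

Like a settable 'maxcols' argument, a check of this kind only helps when the user already suspects a problem and thinks to run it.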