On Oct 4, 2013, at 17:10 , Henrik Bengtsson wrote:

> On Fri, Oct 4, 2013 at 4:55 AM, Duncan Murdoch <murdoch.dun...@gmail.com>
> wrote:
>> On 13-10-04 7:31 AM, Joshua Ulrich wrote:
>>>
>>> On Tue, Oct 1, 2013 at 11:29 AM, David Winsemius <dwinsem...@comcast.net>
>>> wrote:
>>>>
>>>>
>>>> On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:
>>>>
>>>>> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimi...@club.fr>
>>>>> wrote:
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>>
>>>>>> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
>>>>>> quoted integers as an acceptable value for columns for which
>>>>>> colClasses="integer". But when colClasses is omitted, these columns are
>>>>>> read as integer anyway.
>>>>>>
>>>>>> For example, let's consider a file named file.dat, containing:
>>>>>> "1"
>>>>>> "2"
>>>>>>
>>>>>> > read.table("file.dat", colClasses="integer")
>>>>>>
>>>>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
>>>>>> na.strings, :
>>>>>> scan() expected 'an integer' and got '"1"'
>>>>>>
>>>>>> But:
>>>>>>
>>>>>> > str(read.table("file.dat"))
>>>>>>
>>>>>> 'data.frame': 2 obs. of 1 variable:
>>>>>> $ V1: int 1 2
>>>>>>
>>>>>> The latter result is indeed documented in ?read.table:
>>>>>> Unless ‘colClasses’ is specified, all columns are read as
>>>>>> character columns and then converted using ‘type.convert’ to
>>>>>> logical, integer, numeric, complex or (depending on ‘as.is’)
>>>>>> factor as appropriate. Quotes are (by default) interpreted in all
>>>>>> fields, so a column of values like ‘"42"’ will result in an
>>>>>> integer column.
>>>>>>
>>>>>>
>>>>>> Should the former behavior be considered a bug?
>>>>>>
>>>>> No. If you tell read.table the column is integer and it's actually
>>>>> character on disk, it should be an error.
>>>>
>>>>
>>>> My reading of the `read.table` help page is that one should expect that
>>>> when there is an 'integer'-class and an `as.integer` function and
>>>> "integer" is the argument to colClasses, that `as.integer` will be
>>>> applied to the values in the column. Should I be reading elsewhere?
>>>>
>>> I assume you're referring to the paragraph below.
>>>
>>> Possible values are ‘NA’ (the default, when ‘type.convert’ is
>>> used), ‘"NULL"’ (when the column is skipped), one of the
>>> atomic vector classes (logical, integer, numeric, complex,
>>> character, raw), or ‘"factor"’, ‘"Date"’ or ‘"POSIXct"’.
>>> Otherwise there needs to be an ‘as’ method (from package
>>> ‘methods’) for conversion from ‘"character"’ to the specified
>>> formal class.
>>>
>>> I read that as meaning that an "as" method is required for classes not
>>> already listed in the prior sentence. It doesn't say an "as" method
>>> will be applied if colClasses is one of the atomic, factor, Date, or
>>> POSIXct classes; but I can see how you might assume that, since all
>>> the atomic, factor, Date, and POSIXct classes already have "as"
>>> methods...
>>
>>
>> And this does suggest a workaround for ffdf: instead of declaring the class
>> to be "integer", declare a class "ffdf_integer", and write a conversion
>> method. Or simply read everything as character and call as.integer()
>> explicitly.
>
> Just a note of concert since several proposed it:

concerN?

> colClasses="character") followed by as.integer() or strtoi() misses
> the validation, e.g. "foo" will be turned into NA_integer_. Using
> read.table() or scan() gives an error.

The obvious fix for that would seem to be to use scan() on the character vector:

> y <- c("1","2",3,4,5)
> y
[1] "1" "2" "3" "4" "5"
> scan(text=y)
Read 5 items
[1] 1 2 3 4 5
> y <- c("1","2",3,4,"NA")
> scan(text=y)
Read 5 items
[1] 1 2 3 4 NA
> y <- c("1","2",3,4,"foo")
> scan(text=y)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'a real', got 'foo'

>
> /Henrik
>
>>
>> Duncan Murdoch
>>

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk Priv: pda...@gmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
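
The scan() check suggested in this message can be combined with the "read everything as character" workaround from earlier in the thread. What follows is only a sketch under stated assumptions, not something proposed by any of the posters: the helper name as_integer_strict is hypothetical, and passing what = integer() is an addition here (the session above uses the default, which reads reals). The file contents from the original report are supplied inline through the text argument of read.table().

```r
# Hypothetical helper (not part of base R): convert a character vector to
# integer, raising an error for anything scan() does not accept as an integer.
as_integer_strict <- function(x) {
  # what = integer() makes scan() validate each item as an integer;
  # "NA" is still read as a missing value (the default na.strings).
  out <- scan(text = x, what = integer(), quiet = TRUE)
  # scan() splits on whitespace and skips empty lines, so make sure that
  # every input element produced exactly one value.
  if (length(out) != length(x)) {
    stop("input did not yield exactly one integer per element")
  }
  out
}

# The quoted-integer file from the original report, supplied inline.
# Reading as character strips the quotes without attempting conversion.
dat <- read.table(text = '"1"\n"2"', colClasses = "character")

# Convert with validation instead of a bare as.integer().
dat$V1 <- as_integer_strict(dat$V1)
str(dat)

# A non-numeric entry is reported as an error rather than silently
# becoming NA_integer_, which is the concern raised about as.integer().
res <- tryCatch(as_integer_strict(c("1", "foo")), error = function(e) e)
inherits(res, "error")
```

The length check is a design choice of this sketch: because scan() tokenises on whitespace, an element such as "1 2" or an empty string would otherwise change the number of values without any error.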