On Nov 16, 2010, at 02:59 , Ben Bolker wrote:

> Ben Bolker <bbolker <at> gmail.com> writes:
>
>>
>> Ben Bolker <bbolker <at> gmail.com> writes:
>>
>>>
>>>
>>
>> Can simplify this still farther:
>>
>> a b'c
>> d e'f
>> g h'i
>
> This example file leads to duplicate lines.
> Arguably it should have behavior analogous to:
>
> > scan(what="")
> 1: a b'c
> 3: d e'f
> 5: g h'i
> 7:
> Read 6 items
> [1] "a" "b'c" "d" "e'f" "g" "h'i"
>
>
>>
>>> One of the first things that happens in read.table is that
>>> the first few lines are read with readTableHead:
>>>
>>> lines <- .Internal(readTableHead(file, nlines, comment.char,
>>>                                  blank.lines.skip, quote, sep))
>>>
>> in this case, this reads the first two lines as one line;
>> the single quote at pos. 4 of the first line closes on pos.
>> 4 of the second line, preventing the first new line from
>> ending a line.
>>
>> R then pushes back two copies of the lines that have
>> been read (this is normal behavior; I don't quite follow the
>> logic).
>>
>> The rest of the file is read with scan(), 1 line at a time.
>> However, there is the discrepancy between the way
>> that readTableHead interprets new lines in the middle of
>> quoted strings (it ignores them) and the way that scan()
>> interprets them (it takes them as the end of the quoted string).
>
>
> Ping?
> I think this counts as a small, but real, bug. Should I go ahead
> and report it as such, or would someone explain why it's not a bug?
>
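
For anyone wanting to try the quoted example without creating a file, here is a minimal sketch. It assumes a version of R recent enough that scan() and read.table() accept a `text=` argument; the names `ex`, `scanned`, `default_quotes` and `no_quotes` are made up for illustration. What the first two calls return may well differ between R versions, so no output is shown for them. The last call only shows that turning quote processing off (`quote = ""`) sidesteps the question by treating the apostrophes as ordinary characters.

```r
# The three-line example file from the quoted message, as a character vector.
ex <- c("a b'c", "d e'f", "g h'i")

# scan() with its default quote handling; errors are caught so the script keeps going.
scanned <- tryCatch(
  scan(text = ex, what = "", quiet = TRUE),
  error = function(e) conditionMessage(e)
)
print(scanned)

# read.table() with its default quote set ("\"'"); the result may depend on the R version.
default_quotes <- tryCatch(
  read.table(text = ex, stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
print(default_quotes)

# Workaround sketch: disable quote processing so "'" is just another character.
no_quotes <- read.table(text = ex, quote = "", stringsAsFactors = FALSE)
print(no_quotes)
```

With `quote = ""` the table should come back with three rows and two columns, the second column holding b'c, e'f and h'i. That does not settle what the default behaviour ought to be, of course.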
I think filing it as a bug can be defended, but it is tricky to pinpoint exactly what the issue is. E.g., notice that adding a few spaces changes the behaviour of scan() considerably:

> scan(what="")
1: a b 'c
1: d e' f
5: g h' i
8:
Read 7 items
[1] "a" "b" "c\nd e" "f" "g" "h'" "i"

(I'm confused... What is it that we really want here?)

Also, as you noted originally, beware the "Well don't do that then" aspect...

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk Priv: pda...@gmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel