If the records are always of the form:
number,"...","...",number
where ... may contain commas but not double quotes, then here is a
kludgy solution. Perhaps it's sufficient?
# scan in data using " as the delimiter and keep first and last fields
s <- scan("clipboard",skip=1,what=list("",NULL,NULL,NULL,""),sep="\"")
# remove commas from fields, convert to numeric and reshape into matrix
matrix(as.numeric(sub(",","",unlist(s))),ncol=2)
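For anyone without the data on the clipboard, here is a minimal self-contained sketch of the same idea, run against the two sample records from the question below. It assumes the records are fed in through textConnection() rather than "clipboard", and it sets quote="" so that the double quote acts purely as a separator. The names `lines`, `con` and `m` are made up for illustration.

```r
# The sample file from the question, header line included.
lines <- c('col.A,col.B,col.C,col.D',
           '1,"quoted string","again, again again",123',
           '2,"nice quotes, isnt it","you got it",456')

# Splitting each record on the double quote gives five fields per line:
#   1,  |  quoted string  |  ,  |  again, again again  |  ,123
# The NULL entries in what= discard the three middle fields.
con <- textConnection(lines)
s <- scan(con, skip = 1, what = list("", NULL, NULL, NULL, ""),
          sep = "\"", quote = "", quiet = TRUE)
close(con)

# Drop the stray comma from each kept field, convert to numeric and
# reshape into a two-column matrix (first field, last field).
m <- matrix(as.numeric(sub(",", "", unlist(s), fixed = TRUE)), ncol = 2)
m
```

This should give a 2 x 2 matrix with 1, 2 in the first column and 123, 456 in the second. It only holds up while every record has exactly two quoted fields in the middle; a record with a different number of double quotes would shift the fields.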
--- Paul Bayer [EMAIL PROTECTED] wrote:
Dear R-helpers,
I have to read some large csv-files into R (30 - 100MB).
Since reading them with read.csv exhausts memory, I tried scan()
instead, skipping the columns I don't need by giving NULL elements
in what.
When these skipped elements are quoted strings with commas inside,
R interprets each such quoted comma as an element separator,
leading to wrong records in the rest of the line.
A little test will show what I mean. I have the following test.csv:
col.A,col.B,col.C,col.D
1,"quoted string","again, again again",123
2,"nice quotes, isnt it","you got it",456
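To reproduce this without typing the file by hand, something along these lines should write it out. This is only a sketch: writing into tempdir() and the name `path` are assumptions for illustration and not part of the original post, which reads "test.csv" from the working directory.

```r
# Write the three-line sample file into the session's temporary directory.
path <- file.path(tempdir(), "test.csv")
writeLines(c('col.A,col.B,col.C,col.D',
             '1,"quoted string","again, again again",123',
             '2,"nice quotes, isnt it","you got it",456'),
           path)

# Check what ended up on disk: one header line plus two records.
readLines(path)
```

Pass `path` to scan() wherever the calls in this message use "test.csv".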
First I read all elements:
tst <- scan("test.csv", what=list(a=0,b="",c="",d=0), sep=",", skip=1)
Read 2 records
tst
$a
[1] 1 2
$b
[1] "quoted string"        "nice quotes, isnt it"
$c
[1] "again, again again" "you got it"
$d
[1] 123 456
Everything is fine. Then I try to skip the 2nd column by giving b=NULL:
tst <- scan("test.csv", what=list(a=0,b=NULL,c="",d=0), sep=",",
skip=1)
Read 2 records
Warning message:
number of items read is not a multiple of the number of columns
tst
$a
[1] 1 2
$b
NULL
$c
[1] "again, again again"           "isnt it,you got it,456\n\n\n"
$d
[1] 123 NA
I got garbage.
Isn't this a bug?
Or did I do something wrong?
Is there a workaround?
Thank you all,
Paul Bayer,
Feldafing, Germany
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help