[R] scan() problem

2003-09-10 Thread Paul Bayer
Dear R-helpers,

I have to read some large CSV files into R (30 - 100 MB).
Since reading them with read.csv exhausts memory, I tried
scan(), skipping the columns I don't need with NULL elements in
what.
When these skipped elements are quoted strings with commas inside,
R interprets each such quoted comma as a field separator,
leading to wrong records in the rest of the line.
A little test will show what I mean. I have the following test.csv:

col.A,col.B,col.C,col.D
1,"quoted string","again, again again",123
2,"nice quotes, isnt it","you got it",456
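
For anyone who wants to reproduce this, here is a minimal sketch that writes the test file and checks how many fields each line has when quotes are respected. It assumes the string fields were double-quoted in the original file; `csv_path` and `n_fields` are illustrative names, not part of the original post.

```r
# Write the two-record test file to a temporary location.
# `csv_path` is a hypothetical name chosen for this sketch.
csv_path <- file.path(tempdir(), "test.csv")
writeLines(c(
  'col.A,col.B,col.C,col.D',
  '1,"quoted string","again, again again",123',
  '2,"nice quotes, isnt it","you got it",456'
), csv_path)

# Count the comma-separated fields on each line, treating text inside
# double quotes as a single field. Every line should report 4 fields.
n_fields <- count.fields(csv_path, sep = ",", quote = "\"")
print(n_fields)
```

If a line reports more than 4 fields, the commas inside the quotes are being treated as separators, which is the symptom described here.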
First I read all elements:

> tst <- scan("test.csv", what=list(a=0,b="",c="",d=0), sep=",", skip=1)
Read 2 records
> tst
$a
[1] 1 2
$b
[1] "quoted string"        "nice quotes, isnt it"
$c
[1] "again, again again" "you got it"
$d
[1] 123 456
Everything is fine. Then I try to skip the 2nd column by giving b=NULL:

> tst <- scan("test.csv", what=list(a=0,b=NULL,c="",d=0), sep=",", skip=1)
Read 2 records
Warning message:
number of items read is not a multiple of the number of columns
> tst
$a
[1] 1 2

$b
NULL
$c
[1] "again, again again"             "isnt it,you got it,456\n\n\n"
$d
[1] 123  NA


I got garbage.
Isn't this a bug?
Or did I do something wrong?
Is there a workaround?
Thank you all,

Paul Bayer,
Feldafing, Germany
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] scan() problem

2003-09-10 Thread Gabor Grothendieck

If the records are always of the form:

number,...,...,number 

where ... may contain commas but not double quotes, then here is a
kludgy solution.  Perhaps it's sufficient?

# scan in data using " as the delimiter and keep first and last fields
s <- scan("clipboard",skip=1,what=list("",NULL,NULL,NULL,""),sep="\"")

# remove commas from fields, convert to numeric and reshape into matrix
matrix(as.numeric(sub(",","",unlist(s))),nc=2)
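
The same idea can be tried without the clipboard. This is only a sketch under assumptions: the test data is inlined, the string fields are assumed to be double-quoted, and the splitting is done with strsplit() rather than scan(). The names `csv_lines`, `pieces`, `m` and `df` are made up for illustration. The second variant is a different technique: it asks read.csv to drop columns by giving "NULL" in colClasses, which may or may not help with the memory problem on large files.

```r
# Test data inlined so the sketch runs without a file or the clipboard.
csv_lines <- c(
  'col.A,col.B,col.C,col.D',
  '1,"quoted string","again, again again",123',
  '2,"nice quotes, isnt it","you got it",456'
)

# Variant 1: split each record on the double quote (here with strsplit
# instead of scan), keep the first and last piece, strip the leftover
# comma, and convert to numeric.
pieces <- strsplit(csv_lines[-1], '"', fixed = TRUE)
first  <- vapply(pieces, function(p) p[1], character(1))
last   <- vapply(pieces, function(p) p[length(p)], character(1))
m <- cbind(a = as.numeric(sub(",", "", first, fixed = TRUE)),
           d = as.numeric(sub(",", "", last,  fixed = TRUE)))
print(m)

# Variant 2: let read.csv skip the unwanted columns; a colClasses entry
# of "NULL" means the column is not read into the result.
df <- read.csv(text = csv_lines,
               colClasses = c("numeric", "NULL", "NULL", "numeric"))
print(df)
```

Like the scan() version, variant 1 only works when the first and last fields are the unquoted numbers and the quoted fields contain no double quotes themselves.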


