Re: [R] reading csv files

2010-02-06 Thread analys...@hotmail.com


On Feb 5, 7:16 pm, Jim Lemon j...@bitwrit.com.au wrote:

> Okay, you have five CR-LF pairs with two being EORs. It looks like the
> <br />CR-LF is the EOR sequence, so it should be possible to preserve
> those while changing the others to something like ~ or deleting them.
> As I said previously, the regexperts can work out a way to distinguish
> the CR-LF pairs that are _not_ in an EOR sequence.
>
> You might want to think about dumping the control characters as well.
>
> Jim


I am sure other sequences cause a false EOR also. The false EORs are
CRLF sequences that fall between commas, i.e. inside a field. I don't
know if R can read a fixed number of bytes regardless of EOR markers.
If it can, it should be possible to assemble the true database rows
from the bytes read in.
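
On the question of reading bytes regardless of EOR markers: base R's readBin() reads raw bytes and does not interpret line endings. The following is a minimal sketch, not a full solution. The sample file contents and all variable names are made up for illustration.

```r
# Write a small sample file (hypothetical contents) so the sketch is
# self-contained: one header line and one record whose quoted field
# contains an embedded CR-LF.
path <- tempfile(fileext = ".csv")
writeBin(charToRaw("id,note\r\n1,\"line one\r\nline two\"\r\n"), path)

# Read every byte of the file, ignoring line structure entirely.
n_bytes <- file.info(path)$size
bytes <- readBin(path, what = "raw", n = n_bytes)

# Locate CR-LF pairs: a CR (0x0d) immediately followed by an LF (0x0a).
cr <- which(bytes == as.raw(0x0d))
cr <- cr[cr < length(bytes)]
crlf <- cr[bytes[cr + 1] == as.raw(0x0a)]

length(crlf)   # number of CR-LF pairs, true and false EORs together
```

From the byte positions in crlf one could then decide which pairs are real record ends before converting the bytes to text with rawToChar().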

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] reading csv files

2010-02-05 Thread analys...@hotmail.com
the csv files are downloaded from a database and it looks like some
character fields contain the CR-LF sequence within them.

This causes R to see a new record/row and the number of rows it sees
is different (usually higher) from the number of rows actually
extracted.

Any suggestions?

Thanks.



Re: [R] reading csv files

2010-02-05 Thread Barry Rowlingson
On Fri, Feb 5, 2010 at 10:23 AM, analys...@hotmail.com
analys...@hotmail.com wrote:
> the csv files are downloaded from a database and it looks like some
> character fields contain the CR-LF sequence within them.
>
> This causes R to see a new record/row and the number of rows it sees
> is different (usually higher) from the number of rows actually
> extracted.

Hard to tell without an example, but I just tried this in a file:

1,2,"this
is a test",99
2,3,oneliner,45

and:

> read.table("test.csv",sep=",")
  V1 V2              V3 V4
1  1  2 this\nis a test 99
2  2  3        oneliner 45

seemed to work. But if your strings aren't quoted (hard to tell
without an example) then you might have to find another way. Hard to
tell without an example.

Barry



Re: [R] reading csv files

2010-02-05 Thread Jim Lemon

On 02/06/2010 12:57 AM, Barry Rowlingson wrote:
> On Fri, Feb 5, 2010 at 10:23 AM, analys...@hotmail.com
> analys...@hotmail.com wrote:
>> the csv files are downloaded from a database and it looks like some
>> character fields contain the CR-LF sequence within them.
>>
>> This causes R to see a new record/row and the number of rows it sees
>> is different (usually higher) from the number of rows actually
>> extracted.
>
> Hard to tell without an example, but I just tried this in a file:
>
> 1,2,"this
> is a test",99
> 2,3,oneliner,45
>
> and:
>
> > read.table("test.csv",sep=",")
>   V1 V2              V3 V4
> 1  1  2 this\nis a test 99
> 2  2  3        oneliner 45
>
> seemed to work. But if your strings aren't quoted (hard to tell
> without an example) then you might have to find another way. Hard to
> tell without an example.


Maybe the database output looks like this:

1,2,this
is a test,99
2,3,oneliner,45

in which case the same read.table call fails:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 4 elements

However, if we try:

> read.csv("test.csv",header=FALSE)
         V1 V2       V3 V4
1         1  2     this NA
2 is a test 99          NA
3         2  3 oneliner 45

If you can determine whether the embedded EOLs are different from those 
at the end of a record, you could do a global replace on the input file 
for the embedded EOLs to some character that isn't used (e.g. ~ or |) in 
the input file. I'll leave the syntax to the regexperts.
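
A minimal sketch of that global replace, under one loud assumption: the fields containing embedded line breaks are enclosed in double quotes (as the hex dump posted elsewhere in this thread suggests). If the fields are not quoted, this approach does not apply. The sample text, the ~ placeholder and the variable names are illustrative only.

```r
# Sample input (an assumption): the multi-line field is double-quoted.
txt <- "1,2,\"this\r\nis a test\",99\r\n2,3,\"oneliner\",45\r\n"

# Split into single characters and flag those inside a quoted field:
# a character is inside quotes when the number of quote marks seen so
# far (including itself) is odd.
chars <- strsplit(txt, "")[[1]]
in_quotes <- cumsum(chars == "\"") %% 2 == 1

# Compute both masks before changing anything.
embedded_cr <- in_quotes & chars == "\r"
embedded_lf <- in_quotes & chars == "\n"

chars[embedded_lf] <- "~"       # placeholder for an embedded line break
chars <- chars[!embedded_cr]    # drop the CR half of embedded CR-LF pairs

cleaned <- paste(chars, collapse = "")
# Normalise the remaining (real) record ends to plain LF.
cleaned <- gsub("\r\n", "\n", cleaned, fixed = TRUE)

df <- read.csv(text = cleaned, header = FALSE, stringsAsFactors = FALSE)
df
```

Counting quote marks rather than using a regular expression keeps the logic easy to check. Doubled quotes ("") inside a field toggle the count twice, so they do not upset it.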


Jim



Re: [R] reading csv files

2010-02-05 Thread analys...@hotmail.com


On Feb 5, 8:57 am, Barry Rowlingson b.rowling...@lancaster.ac.uk
wrote:
> On Fri, Feb 5, 2010 at 10:23 AM, analys...@hotmail.com
> analys...@hotmail.com wrote:
>> the csv files are downloaded from a database and it looks like some
>> character fields contain the CR-LF sequence within them.
>>
>> This causes R to see a new record/row and the number of rows it sees
>> is different (usually higher) from the number of rows actually
>> extracted.
>
> Hard to tell without an example, but I just tried this in a file:
>
> 1,2,"this
> is a test",99
> 2,3,oneliner,45
>
> and:
>
> > read.table("test.csv",sep=",")
>   V1 V2              V3 V4
> 1  1  2 this\nis a test 99
> 2  2  3        oneliner 45
>
> seemed to work. But if your strings aren't quoted (hard to tell
> without an example) then you might have to find another way. Hard to
> tell without an example.
>
> Barry


Here is a Hex dump (please ignore the '' at the start of each line) -
of the file that results from extracting two rows.

EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A   ...description..
22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E   "<strong>Unknown
20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65    Anytime, Anywhe
72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F   re Learning<br /
3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65   >..</strong> The
20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F    answer is Unkno
77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75   wn. <strong> you
20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66    can start and f
69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68   inish in less th
65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73   en 17 months.</s
74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C   trong> <br />..<
62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61   br />..Unknown a
62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F   bout ensuring yo
75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A   u learn ."....&.



R, Fortran and Excel see five lines, but the database has only two
lines.
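
The mismatch between physical lines and records can be checked directly in R. The sketch below is a hedged illustration: it rebuilds the text of the dump by hand, leaving out the leading UTF-8 byte-order mark (EF BB BF) and the four trailing bytes, and it assumes the field is delimited by the double quotes (0x22) visible at its start and end. The variable names are illustrative.

```r
# Text of the dump, minus the BOM and the trailing bytes (03 D8 26 8A).
sample_text <- paste0(
  "description\r\n",
  "\"<strong>Unknown Anytime, Anywhere Learning<br />\r\n",
  "</strong> The answer is Unknown. <strong> you can start and finish ",
  "in less then 17 months.</strong> <br />\r\n",
  "<br />\r\n",
  "Unknown about ensuring you learn .\"\r\n"
)
path <- tempfile(fileext = ".csv")
writeBin(charToRaw(sample_text), path)

# Physical lines: every CR-LF ends a line, whether or not it is in quotes.
n_lines <- length(readLines(path))

# Records: read.csv treats a quoted field as one value even when it
# spans several physical lines.
df <- read.csv(path, stringsAsFactors = FALSE)

n_lines    # count of physical lines
nrow(df)   # count of data records (the header is not counted)
```

For the real file, fileEncoding = "UTF-8-BOM" in read.csv() is one way to deal with the byte-order mark. Whether the trailing bytes cause trouble would need to be tested on the actual data.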



Re: [R] reading csv files

2010-02-05 Thread Jim Lemon

On 02/06/2010 09:05 AM, analys...@hotmail.com wrote:



> R, Fortran and Excel see five lines, but the database has only two
> lines.

Okay, you have five CR-LF pairs with two being EORs. It looks like the
<br />CR-LF is the EOR sequence, so it should be possible to preserve
those while changing the others to something like ~ or deleting them.
As I said previously, the regexperts can work out a way to distinguish
the CR-LF pairs that are _not_ in an EOR sequence.


You might want to think about dumping the control characters as well.
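
One possible way to dump the control characters, sketched under the assumption that tab, LF and CR should be kept and every other ASCII control character removed. The sample string is made up; it ends with the 0x03 byte seen in the dump.

```r
# Sample text (hypothetical): a record end followed by a stray 0x03 byte.
x <- "u learn .\"\r\n\003"

# Remove ASCII control characters except tab (0x09), LF (0x0A) and
# CR (0x0D). perl = TRUE is used so that the \xHH escapes are
# interpreted by the PCRE engine.
cleaned <- gsub("[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x7F]", "", x, perl = TRUE)

cleaned
```

Bytes above 0x7F, such as the D8 and 8A in the dump, are not control characters in this sense and would need separate handling.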

Jim



[R] reading csv files - SYLK : file format not valid

2007-11-10 Thread Bob Green

I want to read Excel files into R. In the past I have saved
Excel files as csv files without difficulty. Recently, when I have
saved the files in this format, I am then unable to open them again in
Excel or through Windows (XP). I receive a message stating: SYLK:
file format not valid.

R will read the file, but I cannot open it in Windows. Csv files
created and saved in the past (on the same computer) can still be opened.

I am not saving as csv (Macintosh), and saving as a SYLK file results
in the same error.

Any suggestions are appreciated,

Bob
