Re: [R] reading csv files
On Feb 5, 7:16 pm, Jim Lemon j...@bitwrit.com.au wrote:
> It looks like the <br />CR-LF is the EOR sequence, so it should be possible to preserve those while changing the others to something like ~ or deleting them.

I am sure other sequences cause a false EOR as well. The false EORs are CR-LF sequences that fall between commas, that is, inside a field. I don't know whether R can read a fixed number of bytes regardless of EOR markers. If it can, it should be possible to assemble the true database rows from the bytes read in.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
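On the question of reading bytes regardless of EOR markers: base R's readBin() returns the raw bytes of a file without interpreting line endings. The following is only a minimal sketch, assuming a file shaped like the one in the hex dump in this thread (UTF-8 BOM, a header line, one quoted field with an embedded CR-LF). The temp file, its shortened contents and the variable names are made up for illustration.

```r
# Build a small stand-in file: UTF-8 BOM, a header line, and one quoted
# field containing an embedded CR-LF (contents are illustrative only).
path <- tempfile(fileext = ".csv")
bytes_out <- c(as.raw(c(0xEF, 0xBB, 0xBF)),
               charToRaw("description\r\n\"first part<br />\r\nsecond part\"\r\n"))
writeBin(bytes_out, path)

# readBin() returns the bytes exactly as stored; CR-LF is not treated specially.
bytes <- readBin(path, what = "raw", n = file.size(path))

# Drop a leading UTF-8 byte order mark if one is present.
if (length(bytes) >= 3 && all(bytes[1:3] == as.raw(c(0xEF, 0xBB, 0xBF)))) {
  bytes <- bytes[-(1:3)]
}

# Count CR-LF pairs: a 0x0D byte immediately followed by 0x0A.
n <- length(bytes)
n_crlf <- sum(bytes[-n] == as.raw(0x0D) & bytes[-1] == as.raw(0x0A))

# Convert back to a single string so that records can be reassembled later.
txt <- rawToChar(bytes)
```

Here n_crlf counts every CR-LF pair, true record ends and embedded ones alike, so it says how many lines a line-based reader would see rather than how many records the file holds.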
[R] reading csv files
The csv files are downloaded from a database, and it looks like some character fields contain the CR-LF sequence within them. This causes R to see a new record/row, so the number of rows it sees differs from (and is usually higher than) the number of rows actually extracted.

Any suggestions? Thanks.
Re: [R] reading csv files
On Fri, Feb 5, 2010 at 10:23 AM, analys...@hotmail.com wrote:
> the csv files are downloaded from a database and it looks like some character fields contain the CR-LF sequence within them. This causes R to see a new record/row and the number of rows it sees is different (usually higher) from the number of rows actually extracted.

Hard to tell without an example, but I just tried this in a file:

    1,2,"this
    is a test",99
    2,3,"oneliner",45

and:

    read.table("test.csv", sep=",")
      V1 V2              V3 V4
    1  1  2 this\nis a test 99
    2  2  3        oneliner 45

seemed to work. But if your strings aren't quoted (hard to tell without an example) then you might have to find another way.

Hard to tell without an example.

Barry
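Barry's test can be written as a self-contained sketch. It assumes the strings are wrapped in double quotes, as in his file; the temp file and variable names are illustrative only.

```r
# Write a three-line file whose first record has a line break inside a quoted field.
path <- tempfile(fileext = ".csv")
writeLines(c('1,2,"this', 'is a test",99', '2,3,"oneliner",45'), path)

# read.table() keeps reading a quoted field across the line break,
# so three physical lines become two records.
dat <- read.table(path, sep = ",", stringsAsFactors = FALSE)
n_records <- nrow(dat)
third_field <- dat$V3[1]   # contains the embedded "\n"
```

The point of the sketch is only that the quoting, not the line break itself, decides where a record ends.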
Re: [R] reading csv files
On 02/06/2010 12:57 AM, Barry Rowlingson wrote:
> seemed to work. But if your strings aren't quoted (hard to tell without an example) then you might have to find another way.

Maybe the database output looks like this:

    1,2,this
    is a test,99
    2,3,oneliner,45

in which case:

    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
      line 1 did not have 4 elements

However, if we try:

    read.csv("test.csv", header=FALSE)
             V1 V2       V3 V4
    1         1  2     this NA
    2 is a test 99          NA
    3         2  3 oneliner 45

If you can determine whether the embedded EOLs are different from those at the end of a record, you could do a global replace on the input file for the embedded EOLs to some character that isn't used (e.g. ~ or |) in the input file. I'll leave the syntax to the regexperts.

Jim
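For the unquoted case, one possible way to do the replacement is to glue physical lines together until a record has the expected number of separators. This is a rough sketch under two assumptions that are not from the thread: every record has exactly four fields, and no field contains a comma. The names n_fields, records and buffer are made up for illustration.

```r
# Physical lines as a line-based reader would see them (unquoted example).
lines <- c("1,2,this", "is a test,99", "2,3,oneliner,45")
n_fields <- 4   # assumed to be known in advance

# Number of commas in a single string.
count_sep <- function(x) lengths(regmatches(x, gregexpr(",", x, fixed = TRUE)))

records <- character(0)
buffer <- NULL
for (ln in lines) {
  # Join a continuation line to the pending record with "~" in place of the EOL.
  buffer <- if (is.null(buffer)) ln else paste(buffer, ln, sep = "~")
  if (count_sep(buffer) >= n_fields - 1) {
    records <- c(records, buffer)   # record is complete
    buffer <- NULL
  }
}

dat <- read.csv(text = records, header = FALSE, stringsAsFactors = FALSE)
```

A line break inside the last field of a record would not be caught this way, because the separator count is already complete at that point.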
Re: [R] reading csv files
On Feb 5, 8:57 am, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:
> Hard to tell without an example.

Here is a hex dump (please ignore the '' at the start of each line) of the file that results from extracting two rows.

    EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A   ...description..
    22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E   "<strong>Unknown
    20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65    Anytime, Anywhe
    72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F   re Learning<br /
    3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65   >..</strong> The
    20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F    answer is Unkno
    77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75   wn. <strong> you
    20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66    can start and f
    69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68   inish in less th
    65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73   en 17 months.</s
    74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C   trong> <br />..<
    62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61   br />..Unknown a
    62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F   bout ensuring yo
    75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A   u learn ."....&.

R, Fortran and Excel see five lines, but the database has only two lines.
Re: [R] reading csv files
On 02/06/2010 09:05 AM, analys...@hotmail.com wrote:
> R, Fortran and Excel see five lines, but the database has only two lines.

Okay, you have five CR-LF pairs with two being EORs. It looks like the <br />CR-LF is the EOR sequence, so it should be possible to preserve those while changing the others to something like ~ or deleting them. As I said previously, the regexperts can work out a way to distinguish the CR-LF pairs that are _not_ in an EOR sequence. You might want to think about dumping the control characters as well.

Jim
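One way the replacement might be sketched without a regular expression is shown here. It rests on an assumption that differs from the guess above: since the dump shows the text field opening and closing with a double quote (byte 22), the sketch treats any CR-LF inside double quotes as embedded and any CR-LF outside them as a record end. The sample string is a shortened, made-up stand-in for the real file, and all variable names are illustrative.

```r
# Shortened stand-in: header, one quoted field with an embedded CR-LF,
# and a trailing control character (0x03).
txt <- "description\r\n\"first part<br />\r\nsecond part\"\r\n\003"

codes <- utf8ToInt(txt)                       # one integer code point per character
inside <- cumsum(codes == 34L) %% 2L == 1L    # 34 is the double quote; TRUE inside a quoted field
nxt <- c(codes[-1], -1L)                      # code point of the following character

cr_in_field <- codes == 13L & nxt == 10L & inside   # CR of an embedded CR-LF
lf_in_field <- c(FALSE, head(cr_in_field, -1))      # the LF that follows it

codes[cr_in_field] <- utf8ToInt("~")          # embedded CR-LF becomes "~"
# Drop the embedded LF and any control character other than tab, LF and CR.
keep <- !lf_in_field & (codes >= 32L | codes %in% c(9L, 10L, 13L))
clean <- intToUtf8(codes[keep])

dat <- read.csv(text = gsub("\r\n", "\n", clean, fixed = TRUE),
                stringsAsFactors = FALSE)
```

Counting quote characters also copes with doubled quotes inside a field, since two quotes in a row leave the inside/outside state unchanged.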
[R] reading csv files - SYLK : file format not valid
I want to read Excel files into R. In the past I have saved Excel files as csv files without difficulty. Recently, when I save files in this format, I am unable to open them again in Excel or through Windows (XP); I receive the message "SYLK: file format not valid". R will read the file, but I cannot open it in Windows. Csv files created and saved in the past (on the same computer) can still be opened. I am not saving as csv (Macintosh), and saving as a SYLK file results in the same error.

Any suggestions are appreciated,
Bob
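One documented cause of this Excel message is a text file whose first two characters are the uppercase letters ID, which Excel takes as the signature of a SYLK file. Whether that applies to the files described here cannot be told from the post, so the following is only a sketch of how one might check from R. The helper name starts_with_ID and the two sample files are hypothetical.

```r
# Hypothetical helper: TRUE when a file starts with the two bytes "ID".
starts_with_ID <- function(path) {
  first <- readBin(path, what = "raw", n = 2L)
  length(first) == 2L && rawToChar(first) == "ID"
}

# Illustration with two made-up files that differ only in the header case.
bad <- tempfile(fileext = ".csv")
good <- tempfile(fileext = ".csv")
writeLines(c("ID,value", "1,10"), bad)
writeLines(c("id,value", "1,10"), good)

flag_bad <- starts_with_ID(bad)     # header starts with uppercase "ID"
flag_good <- starts_with_ID(good)   # lowercase "id" does not match
```

If the check comes back TRUE for a problem file, renaming the first column header (for example to lowercase) before saving would be one thing to try.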