Re: [R] laf_open_fwf
Jan,

Many thanks for your suggestion! The code runs perfectly fine on the test set. Applying it to the complete data set, however, results in the following error:

    > while (TRUE) {
    +   lines <- readLines(con, encoding='LATIN1')
    +   if (length(lines) == 0) break
    +   lines <- sprintf("%-238s", lines)
    +   writeLines(lines, out, useBytes=TRUE)
    + }
    Error: cannot allocate vector of size 23.2 Mb

Best regards,
Christian Kamenik
Project Manager
Federal Department of the Environment, Transport, Energy and Communications DETEC
Federal Roads Office FEDRO
Division Road Traffic
Road Accident Statistics
Mailing Address: 3003 Bern
Location: Weltpoststrasse 5, 3015 Bern
Tel +41 31 323 14 89
Fax +41 31 323 43 21
christian.kame...@astra.admin.ch
www.astra.admin.ch

-----Original Message-----
From: Jan van der Laan [mailto:rh...@eoos.dds.nl]
Sent: Friday, 9 August 2013 10:01
To: Kamenik Christian ASTRA
Subject: Re: AW: AW: [R] laf_open_fwf

Christian,

It seems some of the lines in your file have additional characters at the end, causing the line lengths to vary. The only way I could think of is to first add whitespace to the shorter lines to make all line lengths equal:

    # Add whitespace to the end of the lines to make all lines the same length
    con <- file("testdata.txt", "rt")
    out <- file("testdata_2.txt", "wt")
    while (TRUE) {
      lines <- readLines(con, n=1E5)
      if (length(lines) == 0) break
      lines <- sprintf("%-238s", lines)
      writeLines(lines, out, useBytes=TRUE)
    }
    close(con)
    close(out)

I am then able to read your test file using LaF:

    library(LaF)
    column_widths <- c(3, 28, 4, 30, 28, 6, 3, 30, 10, 26, 25, 30, 2, 5, 5)
    column_types <- rep("string", length(column_widths))
    column_types[c(1, 3, 7)] <- "integer"
    laf <- laf_open_fwf("testdata_2.txt", column_types = column_types,
                        column_widths = column_widths)

HTH,
Jan

christian.kame...@astra.admin.ch wrote:

Hello Jan

I attached an example. Any help is highly appreciated!

Kind regards,
Christian Kamenik

-----Original Message-----
From: Jan van der Laan [mailto:rh...@eoos.dds.nl]
Sent: Thursday, 8 August 2013 13:58
To: r-help@r-project.org
Cc: Kamenik Christian ASTRA
Subject: Re: AW: [R] laf_open_fwf

Without example data it is difficult to give suggestions on how you might read this file.

Are you sure your file is fixed width? Sometimes columns are neatly aligned using whitespace (tabs/spaces). In that case you could use read.table with the default settings.

Another possibility might be that the file is encoded in UTF-8. I expect that reading it in assuming another encoding (such as latin1) would lead to varying line sizes, although I would then expect the lengths to be larger than the sum of your column widths (as one symbol can be larger than one byte).

Jan
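The padding step quoted above can be tried on a small in-memory sample before running it on a large file. This is only a minimal sketch: the sample strings and the width of 5 are made up for illustration (the thread uses 238), and `target_width` is a hypothetical name.

```r
# Hypothetical sample records of unequal length (the thread reports 235 to 237).
lines <- c("abc", "abcd", "abcde")
target_width <- 5L  # assumed record width for this toy example; 238 in the thread

# Build the format "%-5s": "-" left-justifies, so spaces are appended on the right.
fmt <- paste0("%-", target_width, "s")
padded <- sprintf(fmt, lines)

nchar(padded)  # 5 5 5
```

After padding, every record has the same length, which is what a fixed-width reader expects.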
Re: [R] laf_open_fwf
Christian,

In my original example I had an n=1E5 argument in readLines:

    lines <- readLines(con, n=1E5)

This ensures that each iteration of the loop reads at most 100,000 lines (which should usually fit into memory). Without this argument, readLines tries to read in the complete file.

Jan

On 08/09/2013 04:43 PM, christian.kame...@astra.admin.ch wrote:

Jan,

Many thanks for your suggestion! The code runs perfectly fine on the test set. Applying it to the complete data set, however, results in the following error:

    > while (TRUE) {
    +   lines <- readLines(con, encoding='LATIN1')
    +   if (length(lines) == 0) break
    +   lines <- sprintf("%-238s", lines)
    +   writeLines(lines, out, useBytes=TRUE)
    + }
    Error: cannot allocate vector of size 23.2 Mb

Best regards,
Christian Kamenik
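The effect of the n argument can be seen on a small temporary file. The sketch below is an illustration under assumptions, not the code that was run in the thread: the ten sample lines, the chunk size of 4, and the padded width of 8 are all invented so that the chunking is visible.

```r
# Create a small input file with ten 5-character lines ("row01" ... "row10").
src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".txt")
writeLines(sprintf("row%02d", 1:10), src)

con <- file(src, "rt")
out <- file(dst, "wt")
chunk_sizes <- integer(0)
while (TRUE) {
  # An open connection remembers its position, so each call continues
  # where the previous one stopped and returns at most n lines.
  lines <- readLines(con, n = 4)
  if (length(lines) == 0) break          # end of file reached
  chunk_sizes <- c(chunk_sizes, length(lines))
  writeLines(sprintf("%-8s", lines), out)  # pad each line to 8 characters
}
close(con)
close(out)

chunk_sizes                    # 4 4 2
unique(nchar(readLines(dst)))  # 8
```

Only one chunk is held in memory at a time, so peak memory use depends on n rather than on the size of the file.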
Re: [R] laf_open_fwf
Dear Jan

Many thanks for your help. In fact, all lines are shorter than the sum of my column widths...

    my.column.widths:    238
    range(nchar(lines)): 235 237

So, it seems I have an inconsistent file structure... I guess there is no way to handle this in an automated way?

Best regards,
Christian Kamenik

-----Original Message-----
From: Jan van der Laan [mailto:rh...@eoos.dds.nl]
Sent: Wednesday, 7 August 2013 20:57
To: r-help@r-project.org
Cc: Kamenik Christian ASTRA
Subject: Re: [R] laf_open_fwf

Dear Christian,

Well... it shouldn't normally do that. The only way I can currently think of that might cause this problem is that the file has \r\n\r\n, which would mean that every line is followed by an empty line. Another cause might be (although I would not really expect the results you see) that the sum of your column widths is larger than the actual width of the line. You can check your line lengths using:

    lines <- readLines(my.filename)
    nchar(lines)

Each line should have the same length and be equal to (or at least larger than) sum(my.column.widths).

If this is not the problem: would it be possible that you send me a small part of your file so that I could try to reproduce the problem? Or if you cannot share your data: replace the actual values with nonsense values.

Regards,
Jan

PS I read your mail by chance as I am not a regular r-help reader. When you have specific LaF problems it is better to also cc me directly.
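The length check discussed in this message can be extended to show how many lines have each length and which ones fall short. A minimal sketch with made-up records follows; `expected_width` is a hypothetical stand-in for sum(my.column.widths).

```r
# Hypothetical records; in practice these would come from readLines().
lines <- c("aaaaa", "bbbb", "ccccc", "ddd")
expected_width <- 5L  # assumed; stands in for sum(my.column.widths)

len <- nchar(lines)
range(len)                               # 3 5
table(len)                               # how many lines have each length
short <- which(len < expected_width)     # positions of lines that are too short
short                                    # 2 4
```

Looking at the short lines themselves (here `lines[short]`) may help to decide whether trailing whitespace was stripped or a field is genuinely missing.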
Re: [R] laf_open_fwf
Without example data it is difficult to give suggestions on how you might read this file.

Are you sure your file is fixed width? Sometimes columns are neatly aligned using whitespace (tabs/spaces). In that case you could use read.table with the default settings.

Another possibility might be that the file is encoded in UTF-8. I expect that reading it in assuming another encoding (such as latin1) would lead to varying line sizes, although I would then expect the lengths to be larger than the sum of your column widths (as one symbol can be larger than one byte).

Jan
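The remark that one symbol can be larger than one byte can be checked directly with nchar. This is a small sketch using an invented example string, not data from the file in question.

```r
# "Zürich" written with a Unicode escape, so R stores it as UTF-8.
x <- "Z\u00fcrich"

nchar(x, type = "chars")  # 6 characters
nchar(x, type = "bytes")  # 7 bytes: the u-umlaut takes two bytes in UTF-8

# The same text converted to latin1 uses one byte per character.
y <- iconv(x, from = "UTF-8", to = "latin1")
nchar(y, type = "bytes")  # 6
```

If a fixed-width file is stored as UTF-8 but treated as a single-byte encoding, lines containing such characters would therefore appear longer than the others, not shorter.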
Re: [R] laf_open_fwf
Dear Christian,

Well... it shouldn't normally do that. The only way I can currently think of that might cause this problem is that the file has \r\n\r\n, which would mean that every line is followed by an empty line. Another cause might be (although I would not really expect the results you see) that the sum of your column widths is larger than the actual width of the line. You can check your line lengths using:

    lines <- readLines(my.filename)
    nchar(lines)

Each line should have the same length and be equal to (or at least larger than) sum(my.column.widths).

If this is not the problem: would it be possible that you send me a small part of your file so that I could try to reproduce the problem? Or if you cannot share your data: replace the actual values with nonsense values.

Regards,
Jan

PS I read your mail by chance as I am not a regular r-help reader. When you have specific LaF problems it is better to also cc me directly.

On 08/06/2013 12:35 PM, christian.kame...@astra.admin.ch wrote:

Dear all

I was trying the (fairly new) LaF package, and came across the following problem:

I opened a connection to a fixed width ASCII file using

    laf_open_fwf(my.filename, my.column_types, my.column_widths, my.column_names)

When looking at the data, it turned out that \n (newline) and \r (carriage return) were considered as characters, thus destroying the structure in my data (the second column does not include any numbers):

    my.data[1565:1575,1:3]
       MF_FARZ1 Fahrzeugarttext MF_MARKE
    1  \n043 Landwirt. Traktor2140
    2  \n043 Landwirt. Traktor6206
    3  \n001 Personenwagen2026
    4  \n001 Personenwagen2026
    5  \r\n00 1Personenwagen404
    6  \r\n02 0Gesellschaftswagen 710
    7  \r\n00 1Personenwagen505
    8  \r\n00 1Personenwagen505
    9  \r\n00 1Personenwagen301
    10 \r\n00 1Personenwagen553
    11 \r\n04 3Landwirt. Traktor257

I am working on Windows 7 32-bit.

Any help would be highly appreciated.

Best regards,
Christian Kamenik
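Whether a file uses \r\n or \n line endings can be checked by counting the raw bytes, since readLines removes the line terminators before nchar sees them. The following is a sketch on a temporary file with two invented records; it assumes the file is small enough to read into memory in one go.

```r
# Write two records with Windows-style CRLF endings; binary mode ("wb")
# prevents any automatic translation of line endings.
path <- tempfile(fileext = ".txt")
con <- file(path, "wb")
writeBin(charToRaw("001 Personenwagen\r\n043 Traktor\r\n"), con)
close(con)

# Read the file back as raw bytes and count the two terminator bytes.
bytes <- readBin(path, what = "raw", n = file.info(path)$size)
n_cr <- sum(bytes == as.raw(0x0d))  # carriage returns (\r)
n_lf <- sum(bytes == as.raw(0x0a))  # line feeds (\n)
c(CR = n_cr, LF = n_lf)             # CR = 2, LF = 2
```

Equal counts suggest consistent \r\n endings, a CR count of zero suggests plain \n endings, and unequal non-zero counts point to mixed endings, which would shift fixed-width records in the way shown in the original post.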
[R] laf_open_fwf
Dear all

I was trying the (fairly new) LaF package, and came across the following problem:

I opened a connection to a fixed width ASCII file using

    laf_open_fwf(my.filename, my.column_types, my.column_widths, my.column_names)

When looking at the data, it turned out that \n (newline) and \r (carriage return) were considered as characters, thus destroying the structure in my data (the second column does not include any numbers):

    my.data[1565:1575,1:3]
       MF_FARZ1 Fahrzeugarttext MF_MARKE
    1  \n043 Landwirt. Traktor2140
    2  \n043 Landwirt. Traktor6206
    3  \n001 Personenwagen2026
    4  \n001 Personenwagen2026
    5  \r\n00 1Personenwagen404
    6  \r\n02 0Gesellschaftswagen 710
    7  \r\n00 1Personenwagen505
    8  \r\n00 1Personenwagen505
    9  \r\n00 1Personenwagen301
    10 \r\n00 1Personenwagen553
    11 \r\n04 3Landwirt. Traktor257

I am working on Windows 7 32-bit.

Any help would be highly appreciated.

Best regards,
Christian Kamenik

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.