Re: [R] laf_open_fwf

2013-08-09 Thread christian.kamenik
Jan,

Many thanks for your suggestion! The code runs perfectly fine on the test set. 
Applying it to the complete data set, however, results in the following error:

> while (TRUE) {
+   lines <- readLines(con, encoding='LATIN1')
+   if (length(lines) == 0) break
+   lines <- sprintf("%-238s", lines)
+   writeLines(lines, out, useBytes=TRUE) }
Error: cannot allocate vector of size 23.2 Mb


Best regards,

Christian Kamenik
Project Manager

Federal Department of the Environment, Transport, Energy and Communications 
DETEC  
Federal Roads Office FEDRO
Division Road Traffic
Road Accident Statistics

Mailing Address: 3003 Bern
Location: Weltpoststrasse 5, 3015 Bern

Tel +41 31 323 14 89 
Fax +41 31 323 43 21

christian.kame...@astra.admin.ch
www.astra.admin.ch


-----Original Message-----
From: Jan van der Laan [mailto:rh...@eoos.dds.nl]
Sent: Friday, 9 August 2013 10:01
To: Kamenik Christian ASTRA
Subject: Re: AW: AW: [R] laf_open_fwf

Christian,

It seems some of the lines in your file have additional characters at the end 
causing the line lengths to vary. The only way I could think of is to first add 
whitespace to the shorter lines to make all line lengths equal:

# Add whitespace to the end of the lines to make all lines the same length
con <- file("testdata.txt", "rt")
out <- file("testdata_2.txt", "wt")
while (TRUE) {
   lines <- readLines(con, n=1E5)
   if (length(lines) == 0) break
   lines <- sprintf("%-238s", lines)
   writeLines(lines, out, useBytes=TRUE)
}
close(con)
close(out)


I am then able to read your test file using LaF:

library(LaF)

column_widths <- c(3, 28, 4, 30, 28, 6, 3, 30, 10, 26, 25, 30, 2, 5, 5)
column_types <- rep("string", length(column_widths))
column_types[c(1, 3, 7)] <- "integer"

laf <- laf_open_fwf("testdata_2.txt", column_types = column_types,
                    column_widths = column_widths)


HTH,
Jan







christian.kame...@astra.admin.ch wrote:

 Hello Jan

 I attached an example. Any help is highly appreciated!

 Kind regards,

 Christian Kamenik

Re: [R] laf_open_fwf

2013-08-09 Thread Jan van der Laan

Christian,

In my original example I had an n=1E5 argument in readLines:

lines <- readLines(con, n=1E5)

This ensures that every iteration of the loop reads at most 100,000 lines
(which should usually fit into memory). Without this argument, readLines
tries to read in the complete file.
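
As a minimal sketch of the chunked approach, the loop can be wrapped in a small function. The name pad_lines_chunked, its arguments, and the tiny demonstration file are assumptions made for illustration; they are not part of LaF or of the original code. The sample data is plain ASCII.

```r
# Hypothetical helper: pad every line of `infile` to `width` characters,
# reading at most `chunk_size` lines per iteration so memory use stays bounded.
pad_lines_chunked <- function(infile, outfile, width, chunk_size = 1e5) {
  con <- file(infile, "rt")
  out <- file(outfile, "wt")
  on.exit({ close(con); close(out) })
  fmt <- paste0("%-", as.integer(width), "s")   # e.g. "%-238s"
  repeat {
    lines <- readLines(con, n = chunk_size)     # next chunk only
    if (length(lines) == 0) break               # end of file reached
    writeLines(sprintf(fmt, lines), out)
  }
  invisible(outfile)
}

# Tiny demonstration on a temporary file with uneven line lengths.
infile <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c("abc", "abcde", "a", "abcd", "ab"), infile)
pad_lines_chunked(infile, outfile, width = 8, chunk_size = 2)
padded <- readLines(outfile)
```

With chunk_size = 2 the five sample lines are processed in three passes. For a real file a much larger chunk size such as 1E5 keeps the number of iterations small.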


Jan



Re: [R] laf_open_fwf

2013-08-08 Thread christian.kamenik
Dear Jan

Many thanks for your help. In fact, all lines are shorter than the sum of
my column widths...

my.column.widths:     238
range(nchar(lines)):  235 237

So, it seems I have an inconsistent file structure...
I guess there is no way to handle this in an automated way?

Best regards,

Christian Kamenik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] laf_open_fwf

2013-08-08 Thread Jan van der Laan


Without example data it is difficult to give suggestions on how you  
might read this file.


Are you sure your file is fixed width? Sometimes columns are neatly  
aligned using whitespace (tabs/spaces). In that case you could use  
read.table with the default settings.
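
A minimal sketch of that case, using made-up data rather than the file discussed here. The header = TRUE argument is an addition for this demonstration, and the approach only works when no field contains spaces itself:

```r
# Hypothetical whitespace-aligned data (not taken from the original file).
txt <- c("id  name     value",
         "1   alpha      10",
         "22  beta      200",
         "333 gamma    3000")

# The default sep = "" splits on any run of spaces or tabs.
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
```

A text column such as "Landwirt. Traktor" would be split into two fields by this method, so it is not a substitute for a true fixed-width reader.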


Another possibility might be that the file is encoded in UTF-8. I
expect that reading it in assuming another encoding (such as Latin-1)
would lead to varying line sizes, although I would then expect the
lengths to be larger than the sum of your column widths (as one symbol
can take more than one byte).
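
The difference between characters and bytes can be sketched as follows. The sample string is an assumption chosen for illustration; it is not taken from the data file.

```r
# "Zürich" with the u-umlaut written as a Unicode escape (stored as UTF-8).
x_utf8 <- "Z\u00fcrich"

n_chars      <- nchar(x_utf8, type = "chars")   # number of symbols
n_bytes_utf8 <- nchar(x_utf8, type = "bytes")   # the umlaut takes two bytes

# The same text converted to Latin-1, where every symbol is one byte.
x_latin1       <- iconv(x_utf8, from = "UTF-8", to = "latin1")
n_bytes_latin1 <- nchar(x_latin1, type = "bytes")
```

If nchar(lines, type = "bytes") and nchar(lines, type = "chars") differ for some lines of a file, those lines contain multi-byte symbols, which a byte-oriented fixed-width reader would see as extra width.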


Jan





Re: [R] laf_open_fwf

2013-08-07 Thread Jan van der Laan

Dear Christian,

Well... it shouldn't normally do that. The only way I can currently 
think of that might cause this problem is that the file has \r\n\r\n, 
which would mean that every line is followed by an empty line.
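
One way to see which line terminators a file actually contains is to look at its raw bytes. The following is a minimal sketch. The helper name count_line_endings is hypothetical, and it reads the whole file into memory, so it is best tried on a small sample of a large file.

```r
# Hypothetical helper: count "\r\n" versus bare "\n" terminators in a file.
count_line_endings <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  lf    <- which(bytes == as.raw(0x0a))            # positions of every "\n"
  prev  <- bytes[pmax(lf - 1L, 1L)]                # byte just before each "\n"
  crlf  <- sum(lf > 1L & prev == as.raw(0x0d))     # "\n" preceded by "\r"
  c(crlf = crlf, lf_only = length(lf) - crlf)
}

# Demonstration on a temporary file with mixed terminators.
tmp <- tempfile()
writeBin(charToRaw("abc\r\ndef\nghi\r\n"), tmp)
endings <- count_line_endings(tmp)
```

Non-zero counts for both kinds would indicate mixed line endings within one file.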


Another cause might be (although I would not really expect the results
you see) that the sum of your column widths is larger than the actual
width of the line.


You can check your line lengths using:

lines <- readLines(my.filename)
nchar(lines)

Each line should have the same length, and that length should be equal to
(or at least larger than) sum(my.column.widths).
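
For a file with many lines, a tabulation is easier to read than the full vector of lengths. This is a minimal sketch on a made-up sample; the file contents and the column widths are assumptions for illustration only.

```r
# Hypothetical sample: three records, one of them two characters short.
tmp <- tempfile()
writeLines(c("0123456789", "01234567", "0123456789"), tmp)
column_widths <- c(3, 5, 2)              # assumed widths, summing to 10

lines <- readLines(tmp)
len   <- nchar(lines)

length_counts <- table(len)              # how many lines have each length
too_short     <- which(len < sum(column_widths))   # offending line numbers
```

Here too_short points at the second line, which is shorter than the declared record width.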


If this is not the problem: would it be possible that you send me a 
small part of your file so that I could try to reproduce the problem? Or 
if you cannot share your data: replace the actual values with nonsense 
values.


Regards,
Jan

PS I read your mail by chance as I am not a regular r-help reader. When 
you have specific LaF problems it is better to also cc me directly.




[R] laf_open_fwf

2013-08-06 Thread christian.kamenik
Dear all

I was trying the (fairly new) LaF package, and came across the following 
problem:

I opened a connection to a fixed width ASCII file using
laf_open_fwf(my.filename, my.column_types, my.column_widths, my.column_names)

When looking at the data, it turned out that \n (newline) and \r (carriage 
return) were considered as characters, thus destroying the structure in my data 
(the second column does not include any numbers):

> my.data[1565:1575,1:3]

   MF_FARZ1  Fahrzeugarttext MF_MARKE
1 \n043 Landwirt. Traktor2140
2 \n043 Landwirt. Traktor6206
3 \n001 Personenwagen2026
4 \n001 Personenwagen2026
5\r\n00 1Personenwagen404
6\r\n02 0Gesellschaftswagen   710
7\r\n00 1Personenwagen505
8\r\n00 1Personenwagen505
9\r\n00 1Personenwagen301
10   \r\n00 1Personenwagen553
11   \r\n04 3Landwirt. Traktor257

I am working on Windows 7 32-bit.

Any help would be highly appreciated.

Best regards,

Christian Kamenik