Re: [R] read.fwf and header

2006-11-04 Thread Martin Maechler

 FrPi == François Pinard [EMAIL PROTECTED]
 on Wed, 1 Nov 2006 20:21:11 -0500 writes:

FrPi [Martin Maechler]
 In my (and probably R-core's) view, read.fwf() should only have
 to be used for ``legacy data files'' (those times when people used *no*
 separators in order to save disk space), since nowadays, such
 data files should automatically have correct separators. 

FrPi In my day-to-day experience, the main virtue for fixed width format 
FrPi files is basic, humble legibility, much more than disk space savings. 
 
Good point.  For this reason, I often prefer tab-delimited data files,
which are human-readable too (and typically don't need quoting of
strings).  But the whitespace-separated files that read.table() reads by
default are also very readable if the column starts are aligned.
You do need to quote ("..") strings with embedded white space then,
but that remains quite readable if you have a smart editor (such as
Emacs ;-) which automatically colorizes strings differently from the
rest of the file entries.

However, I think this (human readability) only
applies to relatively small files.
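
As a small illustration of the aligned, whitespace-separated layout described above, here is a minimal sketch. The column names and values are made up for the example and do not come from any file in this thread.

```r
# Hypothetical aligned, whitespace-separated data; strings containing
# embedded white space are quoted so read.table() keeps them as one field.
txt <- c('name          city         score',
         '"Ann Lee"     Zurich         1.5',
         '"Bob"         "New York"     2.0')

DF <- read.table(text = txt, header = TRUE)
str(DF)
```

This layout only works when no field is empty: a blank field is indistinguishable from the surrounding white space, so such files need either explicit NA entries or a fixed-width reader such as read.fwf().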


FrPi The FWF files I see have delimiters between fields,
FrPi but also embedded space within fields, or at end of
FrPi fields, without extraneous quotes.  XML markup, CSVs,
FrPi quoted fields, etc. are devices meant for helping
FrPi machines much more than for helping humans.  They
FrPi significantly decrease legibility.  Humans not only
FrPi know better, they decipher fixed width format easily
FrPi enough for not really needing hairier devices in
FrPi general.

FrPi FWF files may be archaic, they are not obsolescent.
FrPi They will resist the fashion of the day for
FrPi complexity, and survive in the long run.

I cannot really oppose this statement, 
but am not as sure as you seem  ;-)

Thanks anyway for the thought provoking reply.
With regards,
Martin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.fwf and header

2006-11-01 Thread Greg Snow
How about using a connection and reading the header separate from the
data, like this:

tmp1 <- file('c:/temp/tmp.dat')
open(tmp1)

# read the header line only
my.names <- scan(tmp1, nlines=1, what='')

# the connection is now positioned at the first data line
new.data <- read.fwf(file=tmp1, widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 19),
header=FALSE)

names(new.data) <- my.names

close(tmp1)
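
For anyone wanting to try the connection approach without the original file, the following is a minimal self-contained sketch. The three-column file, its names and its widths are invented for illustration, and readLines() is used here in place of scan() to pick up the header line.

```r
# Build a small hypothetical fixed-width file: widths 2, 2 and 7,
# with a space-separated header line on top.
path <- tempfile(fileext = ".txt")
writeLines(c("id grp value",
             sprintf("%2s%2s%7s", "1", "a", "10.5"),
             sprintf("%2s%2s%7s", "2", "",  "3.0")),   # blank 'grp' field
           path)

con <- file(path, open = "r")

# Consume the header line; the connection then points at the first data line.
hdr <- strsplit(readLines(con, n = 1), " +")[[1]]

# read.fwf() continues reading from the current position of the open connection.
new.data <- read.fwf(con, widths = c(2, 2, 7), header = FALSE)
close(con)

names(new.data) <- hdr
```

Because the connection is already open when it is handed to read.fwf(), it stays open afterwards and has to be closed explicitly.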



-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gregor Gorjanc
Sent: Monday, October 30, 2006 3:33 PM
To: Daniel Nordlund
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] read.fwf and header

Daniel Nordlund wrote:
 Gregor,
 
 According to the help for read.fwf, sep needs to be set to a value
that occurs only in the header record.  I changed the spaces to commas
in the header record of your example and used the following syntax and
was able to read the file just fine.
 
 new.data <- read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2,
11, 19),
   header=TRUE, sep=',')
 
 Hope this is helpful,
 
 Dan

Thanks Dan! But I have to modify the file first. Not that much work, but
still.

Regards, Gregor



Re: [R] read.fwf and header

2006-11-01 Thread Gregor Gorjanc
Greg Snow wrote:
 How about using a connection and reading the header separate from the
 data, like this:
 
 tmp1 <- file('c:/temp/tmp.dat')
 open(tmp1)
 
 my.names <- scan(tmp1, nlines=1, what='')
 
 
 new.data <- read.fwf(file=tmp1, widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 19),
 header=FALSE)
 
 names(new.data) <- my.names
 
 close(tmp1)

Yes, also possible as has been shown in previous posts.

-- 
Lep pozdrav / With regards,
Gregor Gorjanc
--
University of Ljubljana PhD student
Biotechnical Faculty
Zootechnical Department URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3   mail: gregor.gorjanc at bfro.uni-lj.si

SI-1230 Domzale tel: +386 (0)1 72 17 861
Slovenia, Europefax: +386 (0)1 72 17 888

--
One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try. Sophocles ~ 450 B.C.



Re: [R] read.fwf and header

2006-11-01 Thread François Pinard
[Martin Maechler]

In my (and probably R-core's) view, read.fwf() should only have
to be used for ``legacy data files'' (those times when people used *no*
separators in order to save disk space), since nowadays, such
data files should automatically have correct separators. 

In my day-to-day experience, the main virtue for fixed width format 
files is basic, humble legibility, much more than disk space savings.  
The FWF files I see have delimiters between fields, but also embedded 
space within fields, or at end of fields, without extraneous quotes.  
XML markup, CSVs, quoted fields, etc. are devices meant for helping 
machines much more than for helping humans.  They significantly decrease 
legibility.  Humans not only know better, they decipher fixed width 
format easily enough for not really needing hairier devices in general.

FWF files may be archaic, they are not obsolescent.  They will resist 
the fashion of the day for complexity, and survive in the long run.

-- 
François Pinard   http://pinard.progiciels-bpi.ca



Re: [R] read.fwf and header

2006-10-31 Thread Gregor Gorjanc
Martin Maechler wrote:
 Gregor == Gregor Gorjanc [EMAIL PROTECTED]
 on Mon, 30 Oct 2006 23:33:21 +0100 writes:
 
 Gregor Daniel Nordlund wrote:
  Gregor,
  
  According to the help for read.fwf, sep needs to be set
  to a value that occurs only in the header record.  I
  changed the spaces to commas in the header record of your
  example and used the following syntax and was able to
  read the file just fine.
  
  new.data <- read.fwf(file="test.txt", widths=c(3, 4, 10, 3,
  2, 2, 2, 2, 11, 19), header=TRUE, sep=',')
  
  Hope this is helpful,
  
  Dan
 
 Gregor Thanks Dan! But I have to modify the file first. Not that
 Gregor much work, but still.
 
 Yes, but I think it shows read.fwf() should not be extended for
 even more special cases:
 
 In my (and probably R-core's) view, read.fwf() should only have
 to be used for ``legacy data files'' (those times when people used *no*
 separators in order to save disk space), since nowadays, such
 data files should automatically have correct separators. 
 
 -- Fix the file producing process rather than make read.fwf()
 unnecessarily more complicated.

Thank you for this explanation of your (and probably R-core's) view! I
really appreciate such feedback. I do agree that read.fwf is a somewhat
archaic way to import data, but sometimes you cannot fix the
file-producing process.

Perhaps the above explanation and the code examples from this thread could
be added to the read.fwf help page. I can provide a patch if my proposal is sane.

Regards, Gregor



Re: [R] read.fwf and header

2006-10-31 Thread davidr
Archaic it may be, but I still have to deal with fixed format data
files on a daily basis.

David L. Reiner
Rho Trading Securities, LLC
Chicago  IL  60605
312-362-4963

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Martin Maechler
Sent: Tuesday, October 31, 2006 1:52 AM
To: [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] read.fwf and header

snip

In my (and probably R-core's) view, read.fwf() should only have
to be used for ``legacy data files'' (those times when people used *no*
separators in order to save disk space), since nowadays, such
data files should automatically have correct separators. 

-- Fix the file producing process rather than make read.fwf()
unnecessarily more complicated.

Martin Maechler, ETH Zurich



Re: [R] read.fwf and header

2006-10-31 Thread Gabor Grothendieck
I also have to deal with fixed format files from time to time.
Generally I have no control over the format in those cases.

On 10/31/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Archaic it may be, but I still have to deal with fixed format data
 files on a daily basis.

 David L. Reiner
 Rho Trading Securities, LLC
 Chicago  IL  60605
 312-362-4963

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Martin Maechler
 Sent: Tuesday, October 31, 2006 1:52 AM
 To: [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] read.fwf and header

 snip

 In my (and probably R-core's) view, read.fwf() should only have
 to be used for ``legacy data files'' (those times when people used *no*
 separators in order to save disk space), since nowadays, such
 data files should automatically have correct separators.

 -- Fix the file producing process rather than make read.fwf()
 unnecessarily more complicated.

 Martin Maechler, ETH Zurich



[R] read.fwf and header

2006-10-30 Thread Gregor Gorjanc
Hi!

I have data (also in attached file) in the following form:

num1 num2 num3 int1 fac1 fac2 cha1 cha2 Date POSIXt
 11   f q   1900-01-01 1900-01-01 01:01:01
 2 1.0 131.5  2 a g r z1900-01-01 01:01:01
 3 1.5 1188830.5  3 b h s y 1900-01-01 1900-01-01 01:01:01
 4 2.0 1271846.3  4 c i t x 1900-01-01 1900-01-01 01:01:01
 5 2.5  829737.4d j u w 1900-01-01
 6 3.0 1240967.3  5 e k v v 1900-01-01 1900-01-01 01:01:01
 7 3.5  919684.4  6 f l w u 1900-01-01 1900-01-01 01:01:01
 8 4.0  968214.6  7 g m x t 1900-01-01 1900-01-01 01:01:01
 9 4.5 1232076.4  8 h n y s 1900-01-01 1900-01-01 01:01:01
10 5.0 1141273.4  9 i o z r 1900-01-01 1900-01-01 01:01:01
   5.5  988481.4 10 j q 1900-01-01 1900-01-01 01:01:01

This is a FWF (fixed width format) file. I can not use read.table here,
because of missing values. I have tried with the following

 read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
header=TRUE)

Error in read.table(file = FILE, header = header, sep = sep, as.is =
as.is,  :
more columns than column names

I could use:

 read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
header=FALSE, skip=1)
   V1  V2V3 V4 V5 V6 V7 V8  V9 V10
1   1  NANA  1f  q 1900-01-01  1900-01-01 01:01:01
2   2 1.0 131.5  2 a  g  r  z  1900-01-01 01:01:01
3   3 1.5 1188830.5  3 b  h  s  y  1900-01-01  1900-01-01 01:01:01
4   4 2.0 1271846.3  4 c  i  t  x  1900-01-01  1900-01-01 01:01:01
5   5 2.5  829737.4 NA d  j  u  w  1900-01-01
6   6 3.0 1240967.3  5 e  k  v  v  1900-01-01  1900-01-01 01:01:01
7   7 3.5  919684.4  6 f  l  w  u  1900-01-01  1900-01-01 01:01:01
8   8 4.0  968214.6  7 g  m  x  t  1900-01-01  1900-01-01 01:01:01
9   9 4.5 1232076.4  8 h  n  y  s  1900-01-01  1900-01-01 01:01:01
10 10 5.0 1141273.4  9 i  o  z  r  1900-01-01  1900-01-01 01:01:01
11 NA 5.5  988481.4 10 jq  1900-01-01  1900-01-01 01:01:01

Does anyone have a clue, how to get above result with header?

Thanks!
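
The error above can be reproduced on a small made-up file (the names and widths below are hypothetical). A likely explanation, going by the read.fwf help page, is that with header=TRUE the header line is expected to be delimited by sep, which defaults to a tab, so a space-separated header is taken as a single column name.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("id grp value",
             sprintf("%2s%2s%7s", "1", "a", "10.5"),
             sprintf("%2s%2s%7s", "2", "",  "3.0")),
           path)

# The header is space-separated, but read.fwf() splits it on sep ("\t" by
# default), so it yields one name for three data columns and fails.
res <- tryCatch(read.fwf(path, widths = c(2, 2, 7), header = TRUE),
                error = function(e) conditionMessage(e))
print(res)
```

Here res holds the error message rather than a data frame, which is the same failure mode as in the ten-column file.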

-- 
Lep pozdrav / With regards,
Gregor Gorjanc
--
University of Ljubljana PhD student
Biotechnical Faculty
Zootechnical Department URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3   mail: gregor.gorjanc at bfro.uni-lj.si

SI-1230 Domzale tel: +386 (0)1 72 17 861
Slovenia, Europefax: +386 (0)1 72 17 888

--
One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try. Sophocles ~ 450 B.C.



Re: [R] read.fwf and header

2006-10-30 Thread Marc Schwartz
On Mon, 2006-10-30 at 19:51 +0100, Gregor Gorjanc wrote:
 Hi!
 
 I have data (also in attached file) in the following form:
 
 num1 num2 num3 int1 fac1 fac2 cha1 cha2 Date POSIXt
  11   f q   1900-01-01 1900-01-01 01:01:01
  2 1.0 131.5  2 a g r z1900-01-01 01:01:01
  3 1.5 1188830.5  3 b h s y 1900-01-01 1900-01-01 01:01:01
  4 2.0 1271846.3  4 c i t x 1900-01-01 1900-01-01 01:01:01
  5 2.5  829737.4d j u w 1900-01-01
  6 3.0 1240967.3  5 e k v v 1900-01-01 1900-01-01 01:01:01
  7 3.5  919684.4  6 f l w u 1900-01-01 1900-01-01 01:01:01
  8 4.0  968214.6  7 g m x t 1900-01-01 1900-01-01 01:01:01
  9 4.5 1232076.4  8 h n y s 1900-01-01 1900-01-01 01:01:01
 10 5.0 1141273.4  9 i o z r 1900-01-01 1900-01-01 01:01:01
5.5  988481.4 10 j q 1900-01-01 1900-01-01 01:01:01
 
 This is a FWF (fixed width format) file. I can not use read.table here,
 because of missing values. I have tried with the following
 
  read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
 header=TRUE)
 
 Error in read.table(file = FILE, header = header, sep = sep, as.is =
 as.is,  :
   more columns than column names
 
 I could use:
 
  read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
 header=FALSE, skip=1)
V1  V2V3 V4 V5 V6 V7 V8  V9 V10
 1   1  NANA  1f  q 1900-01-01  1900-01-01 01:01:01
 2   2 1.0 131.5  2 a  g  r  z  1900-01-01 01:01:01
 3   3 1.5 1188830.5  3 b  h  s  y  1900-01-01  1900-01-01 01:01:01
 4   4 2.0 1271846.3  4 c  i  t  x  1900-01-01  1900-01-01 01:01:01
 5   5 2.5  829737.4 NA d  j  u  w  1900-01-01
 6   6 3.0 1240967.3  5 e  k  v  v  1900-01-01  1900-01-01 01:01:01
 7   7 3.5  919684.4  6 f  l  w  u  1900-01-01  1900-01-01 01:01:01
 8   8 4.0  968214.6  7 g  m  x  t  1900-01-01  1900-01-01 01:01:01
 9   9 4.5 1232076.4  8 h  n  y  s  1900-01-01  1900-01-01 01:01:01
 10 10 5.0 1141273.4  9 i  o  z  r  1900-01-01  1900-01-01 01:01:01
 11 NA 5.5  988481.4 10 jq  1900-01-01  1900-01-01 01:01:01
 
 Does anyone have a clue, how to get above result with header?
 
 Thanks!

The attachment did not come through. Perhaps it was too large?

Not sure if this is the most efficient way, but how about this:

DF <- read.fwf("test.txt", 
               widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
               skip = 1, strip.white = TRUE,
               col.names = read.table("test.txt", 
                                      nrow = 1, as.is = TRUE)[1, ])


 DF
   num1 num2  num3 int1 fac1 fac2 cha1 cha2   Date
1 1   NANA1 fq  1900-01-01
2 2  1.0 131.52agrz   
3 3  1.5 1188830.53bhsy 1900-01-01
4 4  2.0 1271846.34citx 1900-01-01
5 5  2.5  829737.4   NAdjuw 1900-01-01
6 6  3.0 1240967.35ekvv 1900-01-01
7 7  3.5  919684.46flwu 1900-01-01
8 8  4.0  968214.67gmxt 1900-01-01
9 9  4.5 1232076.48hnys 1900-01-01
10   10  5.0 1141273.49iozr 1900-01-01
11   NA  5.5  988481.4   10j  q 1900-01-01
POSIXt
1  1900-01-01 01:01:01
2  1900-01-01 01:01:01
3  1900-01-01 01:01:01
4  1900-01-01 01:01:01
5 NA
6  1900-01-01 01:01:01
7  1900-01-01 01:01:01
8  1900-01-01 01:01:01
9  1900-01-01 01:01:01
10 1900-01-01 01:01:01
11 1900-01-01 01:01:01


Of course, with the limited number of columns, you can always just set 

colnames(DF) <- c("num1", "num2", "num3", "int1", "fac1", 
                  "fac2", "cha1", "cha2", "Date", "POSIXt")

as a post-import step.
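
The two-pass idea above can be tried on a small made-up file. This is only a sketch under assumptions: the file, its three column names and its widths are hypothetical, and header = FALSE is passed explicitly in the first pass so that read.table() does not try to guess whether the first line is a header.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("id grp value",
             sprintf("%2s%2s%7s", "1", "a", "10.5"),
             sprintf("%2s%2s%7s", "2", "",  "3.0"),   # blank 'grp' field
             sprintf("%2s%2s%7s", "3", "c", "")),     # blank 'value' field
           path)

# First pass: read only the header line as ordinary whitespace-separated text.
hdr <- unname(unlist(read.table(path, nrows = 1, header = FALSE, as.is = TRUE)))

# Second pass: skip the header and read the fixed-width body.
DF <- read.fwf(path, widths = c(2, 2, 7), skip = 1,
               strip.white = TRUE, col.names = hdr)
DF
```

The file is opened twice, which is harmless for small files; for large ones, reading the header from an open connection avoids the second pass over the first line.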

HTH,

Marc Schwartz



Re: [R] read.fwf and header

2006-10-30 Thread Daniel Nordlund
Gregor,

According to the help for read.fwf, sep needs to be set to a value that occurs 
only in the header record.  I changed the spaces to commas in the header record 
of your example and used the following syntax and was able to read the file 
just fine.

new.data <- read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 19),
  header=TRUE, sep=',')

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA  USA
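
A minimal sketch of the same idea on a made-up file follows; the names, widths and values are hypothetical. The only requirement illustrated is that the header line is delimited by the character given as sep, and that this character does not occur in the data lines.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("id,grp,value",                            # header delimited by sep
             sprintf("%2s%2s%7s", "1", "a", "10.5"),    # fixed-width data lines
             sprintf("%2s%2s%7s", "2", "",  "3.0")),
           path)

DF <- read.fwf(path, widths = c(2, 2, 7), header = TRUE, sep = ",")
names(DF)
```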



Re: [R] read.fwf and header

2006-10-30 Thread Gregor Gorjanc
Marc Schwartz wrote:
 On Mon, 2006-10-30 at 19:51 +0100, Gregor Gorjanc wrote:
 Hi!

 I have data (also in attached file) in the following form:

 num1 num2 num3 int1 fac1 fac2 cha1 cha2 Date POSIXt
  11   f q   1900-01-01 1900-01-01 01:01:01
  2 1.0 131.5  2 a g r z1900-01-01 01:01:01
  3 1.5 1188830.5  3 b h s y 1900-01-01 1900-01-01 01:01:01
  4 2.0 1271846.3  4 c i t x 1900-01-01 1900-01-01 01:01:01
  5 2.5  829737.4d j u w 1900-01-01
  6 3.0 1240967.3  5 e k v v 1900-01-01 1900-01-01 01:01:01
  7 3.5  919684.4  6 f l w u 1900-01-01 1900-01-01 01:01:01
  8 4.0  968214.6  7 g m x t 1900-01-01 1900-01-01 01:01:01
  9 4.5 1232076.4  8 h n y s 1900-01-01 1900-01-01 01:01:01
 10 5.0 1141273.4  9 i o z r 1900-01-01 1900-01-01 01:01:01
5.5  988481.4 10 j q 1900-01-01 1900-01-01 01:01:01

 This is a FWF (fixed width format) file. I can not use read.table here,
 because of missing values. I have tried with the following

 read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
 header=TRUE)

 Error in read.table(file = FILE, header = header, sep = sep, as.is =
 as.is,  :
  more columns than column names

 I could use:

 read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
 header=FALSE, skip=1)
V1  V2V3 V4 V5 V6 V7 V8  V9 V10
 1   1  NANA  1f  q 1900-01-01  1900-01-01 01:01:01
 2   2 1.0 131.5  2 a  g  r  z  1900-01-01 01:01:01
 3   3 1.5 1188830.5  3 b  h  s  y  1900-01-01  1900-01-01 01:01:01
 4   4 2.0 1271846.3  4 c  i  t  x  1900-01-01  1900-01-01 01:01:01
 5   5 2.5  829737.4 NA d  j  u  w  1900-01-01
 6   6 3.0 1240967.3  5 e  k  v  v  1900-01-01  1900-01-01 01:01:01
 7   7 3.5  919684.4  6 f  l  w  u  1900-01-01  1900-01-01 01:01:01
 8   8 4.0  968214.6  7 g  m  x  t  1900-01-01  1900-01-01 01:01:01
 9   9 4.5 1232076.4  8 h  n  y  s  1900-01-01  1900-01-01 01:01:01
 10 10 5.0 1141273.4  9 i  o  z  r  1900-01-01  1900-01-01 01:01:01
 11 NA 5.5  988481.4 10 jq  1900-01-01  1900-01-01 01:01:01

 Does anyone have a clue, how to get above result with header?

 Thanks!
 
 The attachment did not come through. Perhaps it was too large?
 

Argh, my fault as I forgot to attach it :(

 Not sure if this is the most efficient way, but how about this:

 DF <- read.fwf("test.txt",
                widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
                skip = 1, strip.white = TRUE,
                col.names = read.table("test.txt",
                                       nrow = 1, as.is = TRUE)[1, ])


That is a very nice compromise! No need for [1, ], due to nrow=1.

 Of course, with the limited number of columns, you can always just set

 colnames(DF) <- c("num1", "num2", "num3", "int1", "fac1",
                   "fac2", "cha1", "cha2", "Date", "POSIXt")


I fully agree here, but I rather miss having this directly in read.fwf. I hope
that someone from R-core is also listening to this ;)

Thank you!

Gregor
num1 num2 num3 int1 fac1 fac2 cha1 cha2 Date POSIXt
 11   f q   1900-01-01 1900-01-01 01:01:01
 2 1.0 131.5  2 a g r z1900-01-01 01:01:01
 3 1.5 1188830.5  3 b h s y 1900-01-01 1900-01-01 01:01:01
 4 2.0 1271846.3  4 c i t x 1900-01-01 1900-01-01 01:01:01
 5 2.5  829737.4d j u w 1900-01-01
 6 3.0 1240967.3  5 e k v v 1900-01-01 1900-01-01 01:01:01
 7 3.5  919684.4  6 f l w u 1900-01-01 1900-01-01 01:01:01
 8 4.0  968214.6  7 g m x t 1900-01-01 1900-01-01 01:01:01
 9 4.5 1232076.4  8 h n y s 1900-01-01 1900-01-01 01:01:01
10 5.0 1141273.4  9 i o z r 1900-01-01 1900-01-01 01:01:01
   5.5  988481.4 10 j q 1900-01-01 1900-01-01 01:01:01


Re: [R] read.fwf and header

2006-10-30 Thread Gregor Gorjanc
Daniel Nordlund wrote:
 Gregor,
 
 According to the help for read.fwf, sep needs to be set to a value that 
 occurs only in the header record.  I changed the spaces to commas in the 
 header record of your example and used the following syntax and was able to 
 read the file just fine.
 
 new.data <- read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 19),
   header=TRUE, sep=',')
 
 Hope this is helpful,
 
 Dan

Thanks Dan! But I have to modify the file first. Not that much work, but still.

Regards, Gregor



Re: [R] read.fwf and header

2006-10-30 Thread Martin Maechler
 Gregor == Gregor Gorjanc [EMAIL PROTECTED]
 on Mon, 30 Oct 2006 23:33:21 +0100 writes:

Gregor Daniel Nordlund wrote:
 Gregor,
 
 According to the help for read.fwf, sep needs to be set
 to a value that occurs only in the header record.  I
 changed the spaces to commas in the header record of your
 example and used the following syntax and was able to
 read the file just fine.
 
 new.data <- read.fwf(file="test.txt", widths=c(3, 4, 10, 3,
 2, 2, 2, 2, 11, 19), header=TRUE, sep=',')
 
 Hope this is helpful,
 
 Dan

Gregor Thanks Dan! But I have to modify the file first. Not that
Gregor much work, but still.

Yes, but I think it shows read.fwf() should not be extended for
even more special cases:

In my (and probably R-core's) view, read.fwf() should only have
to be used for ``legacy data files'' (those times when people used *no*
separators in order to save disk space), since nowadays, such
data files should automatically have correct separators. 

-- Fix the file producing process rather than make read.fwf()
unnecessarily more complicated.

Martin Maechler, ETH Zurich
