Marc Schwartz wrote:
> On Mon, 2006-10-30 at 19:51 +0100, Gregor Gorjanc wrote:
>> Hi!
>>
>> I have data (also in attached file) in the following form:
>>
>> num1 num2 num3 int1 fac1 fac2 cha1 cha2 Date POSIXt
>> 1 1 f q 1900-01-01 1900-01-01 01:01:01
>> 2 1.0 1316666.5 2 a g r z 1900-01-01 01:01:01
>> 3 1.5 1188830.5 3 b h s y 1900-01-01 1900-01-01 01:01:01
>> 4 2.0 1271846.3 4 c i t x 1900-01-01 1900-01-01 01:01:01
>> 5 2.5 829737.4 d j u w 1900-01-01
>> 6 3.0 1240967.3 5 e k v v 1900-01-01 1900-01-01 01:01:01
>> 7 3.5 919684.4 6 f l w u 1900-01-01 1900-01-01 01:01:01
>> 8 4.0 968214.6 7 g m x t 1900-01-01 1900-01-01 01:01:01
>> 9 4.5 1232076.4 8 h n y s 1900-01-01 1900-01-01 01:01:01
>> 10 5.0 1141273.4 9 i o z r 1900-01-01 1900-01-01 01:01:01
>> 5.5 988481.4 10 j q 1900-01-01 1900-01-01 01:01:01
>>
>> This is a FWF (fixed width format) file. I can not use read.table here,
>> because of missing values. I have tried with the following
>>
>> > read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
>>   header=TRUE)
>>
>> Error in read.table(file = FILE, header = header, sep = sep, as.is = as.is, :
>>   more columns than column names
>>
>> I could use:
>>
>> > read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
>>   header=FALSE, skip=1)
>> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
>> 1 1 NA NA 1 f q 1900-01-01 1900-01-01 01:01:01
>> 2 2 1.0 1316666.5 2 a g r z 1900-01-01 01:01:01
>> 3 3 1.5 1188830.5 3 b h s y 1900-01-01 1900-01-01 01:01:01
>> 4 4 2.0 1271846.3 4 c i t x 1900-01-01 1900-01-01 01:01:01
>> 5 5 2.5 829737.4 NA d j u w 1900-01-01
>> 6 6 3.0 1240967.3 5 e k v v 1900-01-01 1900-01-01 01:01:01
>> 7 7 3.5 919684.4 6 f l w u 1900-01-01 1900-01-01 01:01:01
>> 8 8 4.0 968214.6 7 g m x t 1900-01-01 1900-01-01 01:01:01
>> 9 9 4.5 1232076.4 8 h n y s 1900-01-01 1900-01-01 01:01:01
>> 10 10 5.0 1141273.4 9 i o z r 1900-01-01 1900-01-01 01:01:01
>> 11 NA 5.5 988481.4 10 j q 1900-01-01 1900-01-01 01:01:01
>>
>> Does anyone have a clue, how to get above result with header?
>>
>> Thanks!
>
> The attachment did not come through. Perhaps it was too large?
>
> Not sure if this is the most efficient way, but how about this:
>
> DF <- read.fwf("test.txt",
>                widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
>                skip = 1, strip.white = TRUE,
>                col.names = read.table("test.txt",
>                                       nrow = 1, as.is = TRUE)[1, ])
>
Argh, my fault, I forgot to attach it :(

> Not sure if this is the most efficient way, but how about this:
>
> DF <- read.fwf("test.txt",
>                widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
>                skip = 1, strip.white = TRUE,
>                col.names = read.table("test.txt",
>                                       nrow = 1, as.is = TRUE)[1, ])
>

That is a very nice compromise! There is no need for [1, ], since nrow=1 already limits the result to a single row.

> Of course, with the limited number of columns, you can always just set
>
> colnames(DF) <- c("num1", "num2", "num3", "int1", "fac1",
>                   "fac2", "cha1", "cha2", "Date", "POSIXt")
>

I fully agree here, but I miss having this available directly in read.fwf. I hope that someone from R-core is also listening to this ;)

Thank you!

Gregor
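Until read.fwf handles this itself, the suggestion above could be wrapped in a small helper. What follows is only a minimal sketch, assuming the header line is whitespace-separated and holds exactly one name per width. The name read_fwf_header and the three-column sample file are made up for illustration and are not part of R.

```r
# Hypothetical helper (not part of base R): read a fixed-width file whose
# first line holds whitespace-separated column names.
read_fwf_header <- function(file, widths, ...) {
  # Take the column names from the first line, split on whitespace.
  col_names <- scan(file, what = character(), nlines = 1, quiet = TRUE)
  # Assumption: one name per fixed-width field.
  stopifnot(length(col_names) == length(widths))
  # Parse the remaining lines by position, skipping the header line.
  utils::read.fwf(file, widths = widths, header = FALSE, skip = 1,
                  col.names = col_names, strip.white = TRUE, ...)
}

# Made-up sample file; the fields are 3, 4 and 2 characters wide,
# with one blank numeric field in each of the last two data lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id val grp",
             "  1 1.5 a",
             "  2      b",
             "    2.5 c"), tmp)
DF <- read_fwf_header(tmp, widths = c(3, 4, 3))
str(DF)
```

Blank numeric fields come back as NA, as in the header=FALSE, skip=1 output quoted earlier. Further read.table arguments such as as.is or colClasses can be passed through the dots.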
num1 num2 num3 int1 fac1 fac2 cha1 cha2 Date POSIXt
1 1 f q 1900-01-01 1900-01-01 01:01:01
2 1.0 1316666.5 2 a g r z 1900-01-01 01:01:01
3 1.5 1188830.5 3 b h s y 1900-01-01 1900-01-01 01:01:01
4 2.0 1271846.3 4 c i t x 1900-01-01 1900-01-01 01:01:01
5 2.5 829737.4 d j u w 1900-01-01
6 3.0 1240967.3 5 e k v v 1900-01-01 1900-01-01 01:01:01
7 3.5 919684.4 6 f l w u 1900-01-01 1900-01-01 01:01:01
8 4.0 968214.6 7 g m x t 1900-01-01 1900-01-01 01:01:01
9 4.5 1232076.4 8 h n y s 1900-01-01 1900-01-01 01:01:01
10 5.0 1141273.4 9 i o z r 1900-01-01 1900-01-01 01:01:01
5.5 988481.4 10 j q 1900-01-01 1900-01-01 01:01:01
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel