Thank you, Gabor.

Jean

On Tue, 16 Aug 2005, Gabor Grothendieck wrote:

> On 8/16/05, Jean Eid <[EMAIL PROTECTED]> wrote:
> > Dear all,
> >
> > My question concerns the line
> >
> > "This is adequate for small files, but for anything more complicated we
> > recommend using the facilities of a language like perl to pre-process
> > the file."
> >
> > in the import/export manual.
> >
> > I have a large fixed-width file that I would like to preprocess in Perl or
> > awk. The problem is that I do not know where to start. Does anyone have a
> > simple example of how to turn a fixed-width file into a CSV or
> > tab-delimited file with either of these tools? I guess I am looking for
> > something like a "Perl for dummies" or "awk for dummies" that does this.
> > Any pointers to websites will be greatly appreciated.
> >
>
> Try to do it in R first. I have found that I rarely need to go to
> an outside language to massage my data.
>
> # fixed-width fields of 10 and 5
> Lines <- readLines("mydata.dat")
> data.frame( field1 = as.numeric(substring(Lines, 1, 10)),
>             field2 = as.numeric(substring(Lines, 11, 15)) )
>
> If you do find that you have speed or memory problems that
> require that you go outside of R to preprocess your data,
> then the gawk version of awk has a FIELDWIDTHS variable that
> makes handling fixed fields very easy. The gawk program below
> assumes two fields of widths 10 and 5, respectively, which
> are set in the first line. Then it repeatedly executes the
> second line for each input line, forcing field splitting by a
> dummy manipulation (since field splitting is lazy) and then
> printing each line, the default being to print out the
> entire line with a space between successive fields:
>
> BEGIN { FIELDWIDTHS = "10 5" }
> { $1 = $1; print }
>
> In R, do the following, assuming the above two lines are in
> split.awk:
>
> read.table(pipe("gawk -f split.awk mydata.dat"))
>
> or else run gawk outside of R, then read in the output file
> created:
>
> gawk -f split.awk mydata.dat > mydata2.dat
>
> For more information, google for
>
> FIELDWIDTHS gawk
>
> for that portion of the manual on FIELDWIDTHS -- it includes
> an example and, of course, the whole manual is there too. The
> book by Kernighan et al. is also good.
>
> I have used both awk and perl and I think it's unlikely you
> would need perl, given that you have R at your disposal for
> the hard parts and awk is easier to learn, better designed
> and more focused on this sort of task.
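
For the archives: the original question asked for CSV output, and FIELDWIDTHS is specific to gawk. The following is only a minimal sketch of a portable alternative, assuming the same two fields of widths 10 and 5 as in the example above. The function name fwf_to_csv is made up for illustration, and the sketch does not quote fields, so it assumes no field contains a comma or a double quote.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): convert a fixed-width file
# with two fields of widths 10 and 5 to CSV, using only POSIX awk.
# Assumption: no field contains a comma or a double quote.
fwf_to_csv() {
    awk '
    # Strip leading and trailing blanks from a field.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
        f1 = trim(substr($0, 1, 10))   # columns 1-10
        f2 = trim(substr($0, 11, 5))   # columns 11-15
        print f1 "," f2
    }' "$1"
}
```

It could be run as fwf_to_csv mydata.dat > mydata.csv, and the result read in R with read.csv("mydata.csv", header = FALSE). Using substr() rather than FIELDWIDTHS trades a little convenience for not depending on gawk.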