[R] preprocessing data

2005-08-16 Thread Jean Eid
Dear all,

My question concerns the following line in the R Data Import/Export manual:

  "This is adequate for small files, but for anything more complicated we
  recommend using the facilities of a language like perl to pre-process
  the file."

I have a large fixed-width file that I would like to preprocess in Perl or
awk. The problem is that I do not know where to start. Does anyone have a
simple example of how to turn a fixed-width file into a CSV or
tab-delimited file using either of these tools? I guess I am looking for
something like a "Perl for dummies" or "awk for dummies" that does this.
Any pointers to websites would be greatly appreciated.

Thank you


Jean Eid

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] preprocessing data

2005-08-16 Thread Kevin E. Thorpe
Some time ago, Doug Bates wrote a useful paper called "Data
manipulation in Perl". It is a very concise introduction, and it
covers the unpack function, which is one way to deal with
fixed-format data. Just google for

   data manipulation in perl bates

and you should be able to find a copy.



-- 
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: [EMAIL PROTECTED]  Tel: 416.946.8081  Fax: 416.971.2462



Re: [R] preprocessing data

2005-08-16 Thread Jean Eid
Thank you, that is exactly what I was looking for.

Just a minor suggestion for the Import/Export manual: maybe a reference to
the paper right underneath the line I quoted would be helpful for people
like me who have never used Perl and would like to take the suggestion to
preprocess the data.


Jean



Re: [R] preprocessing data

2005-08-16 Thread Gabor Grothendieck

Try to do it in R first.  I have found that I rarely need to go to 
an outside language to massage my data.

  # fixed width fields of 10 and 5
  Lines <- readLines("mydata.dat")
  data.frame(field1 = as.numeric(substring(Lines, 1, 10)),
             field2 = as.numeric(substring(Lines, 11, 15)))

If you do find that you have speed or memory problems that
require you to go outside of R to preprocess your data,
then the gawk version of awk has a FIELDWIDTHS variable that
makes handling fixed fields very easy. The gawk program below
assumes two fields of widths 10 and 5, respectively, which
are set in the first line. It then executes the second line
for each input line: the dummy assignment $1 = $1 makes gawk
rebuild the line from its fields (a bare print would output the
line exactly as read), and print then writes out the entire
line with a space, the default output separator, between
successive fields:

  BEGIN { FIELDWIDTHS = "10 5" }
  { $1 = $1; print }

In R, do the following assuming the above two lines are in 
split.awk:

  read.table(pipe("gawk -f split.awk mydata.dat"))

or else run gawk outside of R then read in the output file
created:

  gawk -f split.awk mydata.dat > mydata2.dat
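The role of the dummy assignment can be checked with any POSIX awk. This sketch uses made-up input and plain whitespace splitting rather than FIELDWIDTHS, so it does not need gawk; show_rebuild is a hypothetical name. It prints the same record three ways: unchanged, rebuilt with the default single-space separator, and rebuilt with OFS set to a tab.

```shell
# Hypothetical demo of awk record rebuilding (any POSIX awk).
# print with no arguments writes $0 exactly as read; assigning to
# any field makes awk rebuild $0 from the fields joined by OFS.
show_rebuild() {
    printf 'a    b   c\n' | awk '{ print }'                              # unchanged
    printf 'a    b   c\n' | awk '{ $1 = $1; print }'                     # single spaces
    printf 'a    b   c\n' | awk 'BEGIN { OFS = "\t" } { $1 = $1; print }' # tab-delimited
}
```

By the same logic, changing the first line of split.awk to BEGIN { FIELDWIDTHS = "10 5"; OFS = "\t" } should give tab-delimited output, although with FIELDWIDTHS each field keeps its padding spaces.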

For more information, google for 

FIELDWIDTHS gawk 

for that portion of the manual on FIELDWIDTHS -- it includes
an example and, of course, the whole manual is there too. The
book by Aho, Kernighan and Weinberger, "The AWK Programming
Language", is also good.

I have used both awk and perl, and I think it's unlikely you
would need perl given that you have R at your disposal for
the hard parts, and awk is easier to learn, better designed
and more focused on this sort of task.
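If gawk (and hence FIELDWIDTHS) is not available, the same job can be done with substr() in any POSIX awk. A minimal sketch, assuming the same hypothetical widths of 10 and 5 and that padding blanks should be trimmed; fw_to_csv_awk and trim are illustrative names. Note that awk's substr(s, start, length) takes a length, whereas R's substring(text, first, last) takes an end position. Fields are not quoted, so this assumes no field contains a comma.

```shell
# Hypothetical helper: fixed-width to CSV with POSIX awk substr().
# Assumed layout: field 1 = columns 1-10, field 2 = columns 11-15.
fw_to_csv_awk() {
    awk '
    # trim(): remove leading and trailing blanks left by padding
    function trim(s) {
        sub(/^ +/, "", s)
        sub(/ +$/, "", s)
        return s
    }
    {
        f1 = trim(substr($0, 1, 10))   # columns 1-10
        f2 = trim(substr($0, 11, 5))   # columns 11-15
        print f1 "," f2
    }' "$@"
}
```

For example, fw_to_csv_awk mydata.dat > mydata.csv, followed by read.csv("mydata.csv", header = FALSE) in R.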



Re: [R] preprocessing data

2005-08-16 Thread Jean Eid
Thank you, Gabor.

Jean



Re: [R] preprocessing data

2005-08-16 Thread David Smith
 My question is concerning the line
 "This is adequate for small files, but for anything more complicated
 we recommend using the facilities of a language like perl to
 pre-process the file."

An alternative to Perl is to use the big data library of S-PLUS 7 Enterprise,
which would allow you to read in the entire fixed-format file and pre-process
it using S commands. You could then export the processed data to a file from
S-PLUS and import it into R. If your university has S-PLUS, S-PLUS 7 Enterprise
should be available (all academic institutions were upgraded to S-PLUS 7
Enterprise, which has the big data library).

You can read more information about the big data library at:

http://www.insightful.com/insightful_doclib/document.asp?id=167

# David Smith

-- 
David M Smith [EMAIL PROTECTED]
Senior Product Manager, Insightful Corp, Seattle WA
Tel: +1 (206) 802 2360
Fax: +1 (206) 283 6310

New S-PLUS 7! Create advanced statistical applications with large data sets.
www.insightful.com/splus

 -Original Message-
 From: Jean Eid [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, August 16, 2005 5:39 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] preprocessing data
 
 

