Hello David,

I had the same problem with log files containing many fields separated by the 
"|" character.

My task was to extract parts of some fields with regular expressions and
normalize the results to compact them (using the R functions factor and table).
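As a minimal sketch of that extract-and-compact step (using made-up sample
values rather than my actual log fields):

```r
## Hypothetical field values; the real data came from "|"-separated logs
field <- c("host=alpha", "host=beta", "host=alpha", "host=alpha")

## Extract the part after "=" with a regular expression
extracted <- sub("^host=", "", field)

## Compact the result: factor stores each distinct value only once,
## table gives the count per distinct value
f <- factor(extracted)
counts <- table(f)
```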

To reduce the data size, I first split the logfile into "subfiles", each
containing only one field from the original data.
That way I could process one field at a time instead of loading the complete
log file.

Under Linux:

        cutfile <- function(index, afile, tmpdir, wd){
            ## index:  vector of field numbers to keep
            ## afile:  gzipped logfile
            ## tmpdir: directory for the per-field output files
            ## wd:     working directory containing afile
            setwd(wd)
            ## collapse the field numbers into one space-separated string,
            ## so system() receives a single shell command
            fields <- paste(index, collapse = ' ')
            system(paste('for n in ', fields, '; \n',
                         'do sudo gzip -dc ', afile,
                         ' | cut -f$n -d"|" > ', tmpdir, '/', afile, '.$n \n',
                         'done;', sep = ''))
            return(1)
        }

example: cutfile(c(1, 5, 8), 'mylog', outputdir, sourcedir)

=> files mylog.1, mylog.5, mylog.8
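The shell part of that function can be illustrated on its own with a small
inline sample file (an uncompressed stand-in for the real gzipped log):

```shell
# Two records with three "|"-separated fields each
printf 'a|b|c\nd|e|f\n' > mylog

# One output file per selected field, as in the loop built by cutfile()
for n in 1 3; do
    cut -f$n -d"|" mylog > mylog.$n
done

cat mylog.1    # a, d
cat mylog.3    # c, f
```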

HTH,

Marc Mamin


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of David Mitchell
Sent: Friday, November 19, 2004 4:54 AM
To: [EMAIL PROTECTED]
Subject: [R] Tools for data preparation?


Hello list,

I'm regularly in the position where I have to do a lot of data
manipulation, in order to get the data I have into a format R is happy
with.  This manipulation would generally be in one of two forms:
- getting data from e.g. text log files into a tabular format
- extracting sensible sample data from a very large data set (i.e. too
large for R to handle)

In general, I use Perl or Python to do the task; I'm curious as to
what others use when they hit the same problem.

Regards

Dave Mitchell

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
