strapply in the gsubfn package can parse any input format that can be described using a regular expression and generally requires only one line of code to do the parsing. See: http://gsubfn.googlecode.com
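As a rough illustration of the regular-expression approach, the sketch below parses the sample data line from the thread with a single pattern. It uses base R's `regexec()`/`regmatches()` in place of `strapply()`, so it runs without gsubfn installed and does not show strapply's own interface. The split of the 20-character ID into widths 2, 1, 2, 5, 2, 6, 2 is an assumption, and the second data line (with a footnote code `P`) is made up to exercise the optional last field.

```r
# Sketch only: regexec()/regmatches() from base R stand in for gsubfn::strapply().
# Assumed ID layout (widths 2,1,2,5,2,6,2); not taken from BLS documentation.

lines <- c(
  "SMS11000000000000001\t1990\tM01\t688.0",    # sample line from the thread
  "SMS11000000000000001\t1990\tM02\t690.0\tP"  # hypothetical line with a footnote
)

pattern <- paste0(
  "^(.{2})(.)(.{2})(.{5})(.{2})(.{6})(.{2})",  # fixed-width pieces of the ID
  "\\s+(\\d{4})\\s+(\\S+)\\s+(\\S+)",          # year, period, value
  "\\s*(\\S*)\\s*$"                            # footnote (empty when absent)
)

field_names <- c("survey", "seasonal", "state", "area", "supersector",
                 "industry", "datatype", "year", "period", "value", "footnote")

parse_lines <- function(x) {
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  stopifnot(all(lengths(m) > 0))                 # every line must match
  fields <- do.call(rbind, lapply(m, `[`, -1))  # drop the whole-match column
  colnames(fields) <- field_names
  df <- as.data.frame(fields, stringsAsFactors = TRUE)  # character -> factor
  df$year  <- as.integer(as.character(df$year))
  df$value <- as.numeric(as.character(df$value))
  df
}

dat <- parse_lines(lines)
str(dat)
```

Character fields come back as factors, `year` as an integer and `value` as numeric, roughly as the thread asks; a line that does not match the pattern stops with an error instead of being silently dropped.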
On Mon, Dec 7, 2009 at 12:37 PM, Marshall Feldman <ma...@uri.edu> wrote:
> I totally agree with Barry, although it's sometimes convenient to
> include data with analysis code for debugging and/or documentation purposes.
>
> However, the example actually applies equally to separate data files. In
> fact, the example is from the U.S. Bureau of Labor Statistics at
> ftp://ftp.bls.gov/pub/time.series/sm/, which contains nothing but data
> and documentation files. At issue is not where the data come from, but
> rather how to parse relatively complex data organized inconsistently.
> SAS has the built-in ability to parse five different organizations of
> data: list (delimited), modified list, column, formatted, and mixed (see
> http://www.masil.org/sas/input.html). It seems R can parse such data,
> but only with considerable work by the user. It would be great to have a
> function/package that implements something as easy (hah!) and
> flexible as SAS.
>
> Marsh
>
> Barry Rowlingson wrote:
>> On Mon, Dec 7, 2009 at 3:53 PM, Marshall Feldman <ma...@uri.edu> wrote:
>>
>>> Regarding the various methods people have suggested, what if a typical
>>> tab-delimited data line looks like:
>>>
>>> SMS11000000000000001 1990 M01 688.0
>>>
>>> and the SAS INPUT statement is
>>>
>>> INPUT survey $ 1-2 seasonal $ 3 state $ 4-5 area $ 6-10 supersector $
>>> 11-12 @13 industry $8. datatype $ 21-22 year period $ value footnote $ ;
>>>
>>> Note that most data lines have no footnote item, as in the sample.
>>>
>>> Here (I think) we'd want all the character variables to be read as factors,
>>> possibly "year" as a date, and "value" as numeric.
>>>
>>
>> Actually I'm surprised that nobody has yet said what a clearly
>> bonkers thing it is to mix up your data and your analysis code in a
>> single file. Now suppose you have another set of data you want to
>> analyse with the same code? Are you going to create a new file and
>> paste the new data in?
>> You've now got two copies of your analysis code
>> - good luck keeping corrections to that code synchronised.
>>
>> This just seems like horrendously bad practice, which is one reason
>> it's kludgy in R. If it was good practice, someone would surely have
>> written a way to do it neatly.
>>
>> Keep your data in data files, and your functions in .R function
>> files. You'll thank me later.
>>
>> Barry
>>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
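For comparison with the SAS INPUT statement quoted above, here is a minimal base-R sketch of SAS-style "mixed" input: `read.table()` handles the tab-delimited (list) part and `substring()` handles column input for the ID. The column positions are an assumption chosen so that the 20-character ID in the sample line splits cleanly; they do not follow `@13 industry $8.` and `datatype $ 21-22` literally, because with a 20-character ID, columns 21-22 would fall outside it. The second data line is hypothetical.

```r
# Sketch of SAS-style "mixed" input in base R: column input for the ID,
# list (tab-delimited) input for the remaining fields.
# Assumed ID layout (widths 2,1,2,5,2,6,2); second data line is hypothetical.

lines <- c("SMS11000000000000001\t1990\tM01\t688.0",
           "SMS11000000000000001\t1990\tM02\t690.0\tP")

# List input: split on tabs; fill = TRUE pads lines that lack a footnote.
raw <- read.table(text = lines, sep = "\t", fill = TRUE,
                  col.names = c("series_id", "year", "period", "value", "footnote"),
                  colClasses = "character")

# Column input: assumed start/end positions within series_id.
layout <- data.frame(
  name  = c("survey", "seasonal", "state", "area", "supersector",
            "industry", "datatype"),
  start = c(1, 3, 4, 6, 11, 13, 19),
  end   = c(2, 3, 5, 10, 12, 18, 20),
  stringsAsFactors = FALSE
)

id_fields <- lapply(seq_len(nrow(layout)), function(i)
  factor(substring(raw$series_id, layout$start[i], layout$end[i])))
names(id_fields) <- layout$name

# Combine, converting types roughly as the thread suggests.
dat <- data.frame(id_fields,
                  year     = as.integer(raw$year),
                  period   = factor(raw$period),
                  value    = as.numeric(raw$value),
                  footnote = factor(raw$footnote))
str(dat)
```

Keeping the positions in a small layout table means only that table needs to change once the real record layout is confirmed.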