Ah, I should have mentioned this. I work on Macs (Leopard) and PCs
(XP Pro and XP Pro x64). Even though the PCs have Cygwin, I'm trying
to make this code portable, so I want to avoid external tools such as
sed, perl, etc.
I want to do this in R, even if processing is a bit slower. Eventually
I'll hide the code in a class, so the code can be a bit complex.
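To make the goal concrete, here is a rough base-R sketch of the kind of thing I have in mind: read the tab-delimited fields as character, then cut series_id apart with substr() using the subfield widths from my original message (2, 1, 2, 6, 2). The function name read_la is made up for illustration, treating "S" as the seasonally adjusted code is my assumption, header = TRUE is a guess about the files, and I've left year as an integer rather than a Date for now.

```r
# Hypothetical helper (name and defaults are assumptions, not a tested
# solution). Reads tab-delimited LA records and splits series_id into
# its fixed-width subfields using base R only.
read_la <- function(file, header = TRUE) {
  raw <- read.delim(file, header = header, colClasses = "character",
                    strip.white = TRUE,
                    col.names = c("series_id", "year", "period",
                                  "value", "footnote_codes"))
  id <- raw$series_id
  data.frame(
    survey         = substr(id, 1, 2),            # width 2
    seasonal       = substr(id, 3, 3) == "S",     # width 1; "S" assumed to mean seasonal
    area_type_code = factor(substr(id, 4, 5)),    # width 2
    area_code      = factor(substr(id, 6, 11)),   # width 6
    measure_code   = factor(substr(id, 12, 13)),  # width 2
    year           = as.integer(raw$year),        # integer for now, not Date
    period         = factor(raw$period),
    value          = as.numeric(raw$value),
    footnote_codes = raw$footnote_codes,
    stringsAsFactors = FALSE
  )
}

# Try it on the example record from my original message.
rec <- "LASPS36040003\t1990\tM01\t8.8\tL"
con <- textConnection(rec)
df  <- read_la(con, header = FALSE)
close(con)
str(df)
```

This is two steps inside one function rather than a true single pass, but it stays within base R and so should behave the same on the Macs and the PCs.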
Marsh Feldman
On 3/2/2010 12:29 PM, Chidambaram Annamalai wrote:
> I tried to shoehorn the read.* functions and match both the fixed
> width and the variable width fields
> in the data but it doesn't seem evident to me. (read.fwf reads fixed
> width data properly but the rest
> of the fields must be processed separately -- maybe insert NULL stubs
> in the remaining fields and
> fill them in later?)
>
> One way is to sidestep the entire issue and convert the structured
> data you have into a csv
> file using sed (usually available on most *nix systems) with
> something like so:
>
> cat data | sed -r 's/^(..)(.)(..)(.{6})(..)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)/\1,\2,\3,\4,\5,\6,\7,\8,\9/' | less
>
> and see if the output is alright and use the resulting .csv file
> directly in R using read.csv
>
> If that does not satisfy you maybe the R Wizards on the list might be
> able to point you to a
> native R way of doing this possibly using scan? I'm not sure though.
>
> Hope this helps,
> Chillu
>
> On Tue, Mar 2, 2010 at 9:42 PM, Marshall Feldman <[email protected]> wrote:
>
> Hello R wizards,
>
> What is the best way to read a data file containing both fixed-width and
> tab-delimited fields? (More detail follows.)
>
> Details:
> The U.S. Bureau of Labor Statistics provides local area unemployment
> statistics at ftp://ftp.bls.gov/pub/time.series/la/, and the data are
> documented in the file la.txt
> <ftp://ftp.bls.gov/pub/time.series/la/la.txt>. Each data file has five
> tab-delimited fields:
>
> * series_id
> * year
> * period (codes for things like quarter or month of year)
> * value
> * footnote_codes
>
> The series_id consists of five fixed-width subfields (length in
> parentheses):
>
> * survey abbreviation (2)
> * seasonal code (1)
> * area type code (2)
> * area code (6)
> * measure code (2)
>
> So an example record might be:
>
> LASPS36040003 1990 M01 8.8 L
>
> I want to read in the data in one pass and convert them to a data
> frame with the following columns (actual name, class in parentheses):
>
> Survey abbreviation (survey, character)
> Seasonal (seasonal, logical seasonal=T)
> Area type (area_type_code, factor)
> Area (area_code, factor)
> Measure (measure_code, factor)
> Year (year, Date)
> Period (period, factor)
> Value (value, numeric)
> Footnote (footnote_codes, character but see note)
>
> (Regarding the Footnote, I have to look at the data more. If there's
> just one code per record, this will be a factor; if there are multiple,
> it will either be character or a list. For now I'm making it only
> character.)
>
> Currently I can read the data just fine using read.table, but this
> makes
> series_id the first variable. I want to break out the subfields as
> separate columns.
>
> Any suggestions?
>
> Thanks.
> Marsh Feldman
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [email protected] <mailto:[email protected]> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
Dr. Marshall Feldman, PhD
Director of Research and Academic Affairs
Center for Urban Studies and Research
The University of Rhode Island
email: marsh @ uri .edu (remove spaces)
Contact Information:
Kingston:
202 Hart House
Charles T. Schmidt Labor Research Center
The University of Rhode Island
36 Upper College Road
Kingston, RI 02881-0815
tel. (401) 874-5953
fax: (401) 874-5511
Providence:
206E Shepard Building
URI Feinstein Providence Campus
80 Washington Street
Providence, RI 02903-1819
tel. (401) 277-5218
fax: (401) 277-5464