[R] how to drop fields by name when reading in data?

2010-03-19 Thread Peter Keller

I have a number of space separated files of weather data, with some
equivalent column names, and differing number of fields in each file.  Some
of the files have 40 or more vars, but I only want a subset of the fields. 
I can use colClasses with read.table to drop some of the fields, but only if
I know where those columns are in the first place, and they're not always in
the same place.   So I would like to be able to drop all unwanted columns on
import, by name.

In addition, most fields have a Q (quality) field next to them, and I need
to read those as well, each Q next to its relevant field, such as
Temp, and rename it to, e.g., Temp.Q.

Some example data: 
Date HrMn I Type Dir Q I Spd Q Visby Q I Q Temp Q Dewpt Q Slp Q Pr Amt I Q
19450101 0900 4 SAO 315 1 N 1.0 1 024000 1 N 1 -37.0 1 -45.9 1 1031.8 1 99 999.9 9 9
19450101 1000 4 SAO 315 1 N 1.0 1 024000 1 N 1 -35.9 1 -43.1 1 1032.2 1 99 999.9 9 9
19450101 1100 4 SAO 360 1 N 1.0 1 024000 1 N 1 -35.9 1 -43.1 1 1032.5 1 99 999.9 9 9
19450101 1200 4 SAO 315 1 N 1.0 1 024000 1 N 1 -36.4 1 -50.9 1 1032.9 1 99 999.9 9 9
19450101 1300 4 SAO 360 1 N 1.0 1 024000 1 N 1 -36.4 1 -43.1 1 1032.9 1 99 999.9 9 9
19450101 1400 4 SAO 315 1 N 1.0 1 016000 1 N 1 -36.4 1 -42.0 1 1032.5 1 99 999.9 9 9
19450101 1500 4 SAO 180 1 N 1.0 1 016000 1 N 1 -36.4 1 -45.3 1 1032.5 1 99 999.9 9 9
19450101 1600 4 SAO 360 1 N 1.0 1 024000 1 N 1 -37.5 1 -45.9 1 1032.9 1 99 999.9 9 9

So if I want to extract Date, HrMn, Temp, and the Q following Temp: 
tmp1 <- read.table("ex.dat", sep=" ", strip.white=TRUE,
    colClasses=c("character", "character",
    rep("NULL", 11), "numeric", "factor", rep("NULL", 8)),
    na.strings="999.9", header=TRUE)

But having to alter colClasses for every file, the fields of which may
change when next year's data is retrieved, is no fun.  And is there a way to
specify na.strings per column?

-- 
View this message in context: 
http://n4.nabble.com/how-to-drop-fields-by-name-when-reading-in-data-tp1601166p1601166.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to drop fields by name when reading in data?

2010-03-19 Thread David Winsemius


On Mar 19, 2010, at 3:03 PM, Peter Keller wrote:



> I have a number of space separated files of weather data, with some
> equivalent column names, and differing number of fields in each file.
> Some of the files have 40 or more vars, but I only want a subset of the
> fields. I can use colClasses with read.table to drop some of the fields,
> but only if I know where those columns are in the first place, and
> they're not always in the same place. So I would like to be able to
> drop all unwanted columns on import, by name.
>
> In addition, most fields have a Q (quality) field next to them, and I
> need to read those as well, each Q next to its relevant field, such as
> Temp, and rename it to, e.g., Temp.Q.


Those will probably get changed to Q.1, Q.2, etc. by check.names=TRUE,
the read.table default (the first one stays Q).
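
If the automatic Q.1, Q.2 names are not wanted, one possible way around
both the renaming and the positional colClasses is to read the header
line first and build everything from the names. The following is only a
sketch under the assumption, taken from the example data, that each Q
immediately follows the field it qualifies. The helper names
qualify_names and read_by_name are made up for illustration, and the
temporary file stands in for ex.dat.

```r
# Hypothetical helper: rename each "Q" to "<previous field>.Q", then make
# any remaining duplicates (e.g. the repeated "I" columns) unique.
qualify_names <- function(nms) {
  is_q <- nms == "Q"
  prev <- c(NA, nms[-length(nms)])
  nms[is_q] <- paste0(prev[is_q], ".Q")
  make.unique(nms)
}

# Hypothetical helper: 'keep' is a named character vector mapping the
# wanted column names to their classes; every other column is dropped.
read_by_name <- function(file, keep, ...) {
  # Read only the header line and split it on whitespace.
  hdr <- scan(file, what = "", nlines = 1, quiet = TRUE)
  nms <- qualify_names(hdr)
  missing <- setdiff(names(keep), nms)
  if (length(missing) > 0)
    stop("columns not found: ", paste(missing, collapse = ", "))
  # "NULL" tells read.table to skip a column entirely.
  classes <- rep("NULL", length(nms))
  classes[match(names(keep), nms)] <- keep
  dat <- read.table(file, header = FALSE, skip = 1,
                    colClasses = classes, ...)
  # Columns come back in file order, so name them in file order.
  names(dat) <- nms[classes != "NULL"]
  dat
}

# Two rows of the example data, written to a temporary file.
ex <- tempfile(fileext = ".dat")
writeLines(c(
  "Date HrMn I Type Dir Q I Spd Q Visby Q I Q Temp Q Dewpt Q Slp Q Pr Amt I Q",
  "19450101 0900 4 SAO 315 1 N 1.0 1 024000 1 N 1 -37.0 1 -45.9 1 1031.8 1 99 999.9 9 9",
  "19450101 1000 4 SAO 315 1 N 1.0 1 024000 1 N 1 -35.9 1 -43.1 1 1032.2 1 99 999.9 9 9"
), ex)

tmp1 <- read_by_name(ex,
  keep = c(Date = "character", HrMn = "character",
           Temp = "numeric", Temp.Q = "factor"),
  na.strings = "999.9")
str(tmp1)
```

Because the classes are looked up by name, the same call should keep
working when the columns move around between files, and it stops with an
error if a requested column is absent.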



> Some example data:
> Date HrMn I Type Dir Q I Spd Q Visby Q I Q Temp Q Dewpt Q Slp Q Pr Amt I Q
> 19450101 0900 4 SAO 315 1 N 1.0 1 024000 1 N 1 -37.0 1 -45.9 1 1031.8 1 99 999.9 9 9
> 19450101 1000 4 SAO 315 1 N 1.0 1 024000 1 N 1 -35.9 1 -43.1 1 1032.2 1 99 999.9 9 9
> 19450101 1100 4 SAO 360 1 N 1.0 1 024000 1 N 1 -35.9 1 -43.1 1 1032.5 1 99 999.9 9 9
> 19450101 1200 4 SAO 315 1 N 1.0 1 024000 1 N 1 -36.4 1 -50.9 1 1032.9 1 99 999.9 9 9
> 19450101 1300 4 SAO 360 1 N 1.0 1 024000 1 N 1 -36.4 1 -43.1 1 1032.9 1 99 999.9 9 9
> 19450101 1400 4 SAO 315 1 N 1.0 1 016000 1 N 1 -36.4 1 -42.0 1 1032.5 1 99 999.9 9 9
> 19450101 1500 4 SAO 180 1 N 1.0 1 016000 1 N 1 -36.4 1 -45.3 1 1032.5 1 99 999.9 9 9
> 19450101 1600 4 SAO 360 1 N 1.0 1 024000 1 N 1 -37.5 1 -45.9 1 1032.9 1 99 999.9 9 9

> So if I want to extract Date, HrMn, Temp, and the Q following Temp:
> tmp1 <- read.table("ex.dat", sep=" ", strip.white=TRUE,
>     colClasses=c("character", "character",
>     rep("NULL", 11), "numeric", "factor", rep("NULL", 8)),
>     na.strings="999.9", header=TRUE)
>
> But having to alter colClasses for every file, the fields of which may
> change when next year's data is retrieved, is no fun.  And is there a
> way to specify na.strings per column?


There might be, if you were willing to write an "as" method (see ?setAs)
for a new class and then name that class in colClasses. There was a
recent answer to an r-help currency conversion question that illustrated
this approach.
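
A minimal sketch of that idea, not necessarily what the currency thread
did: define a class whose conversion from character treats one sentinel
as missing, and use that class only for the columns that need it. The
class name num999 and the tiny two-column data set are made up for
illustration; the sentinel 999.9 is the one from the original post.

```r
library(methods)

# Hypothetical class: a numeric column in which "999.9" means missing.
setClass("num999")
setAs("character", "num999", function(from) {
  # Compare as strings, the same way na.strings would.
  from[from %in% "999.9"] <- NA
  as.numeric(from)
})

# Made-up data: 999.9 should stay a real value in Slp but become NA in Amt.
dat <- read.table(
  text = c("Slp Amt",
           "999.9 999.9",
           "1031.8 0.5"),
  header = TRUE,
  colClasses = c(Slp = "numeric", Amt = "num999"))
str(dat)
```

read.table falls back to methods::as() for any colClasses entry it does
not handle itself, which is what lets the sentinel be applied to one
column while na.strings stays at its default for the rest.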






David Winsemius, MD
West Hartford, CT
