On Sat, 2005-10-15 at 23:54 +0800, ronggui wrote:
> It seems my last post was not sent successfully, so I am posting again.
> 
> -------------
> the data file has such structure:
> 
>      1992       6245         49          .          .         20          1
>         0          0   8.739536          0          .          .          .
>         .          .          .          .          .            "alabama"
>         .          0          .
>      1993       7677         58          .          .         15          1
>         0          0   8.945984          1          .          0   .2064476
>        -5          0          .          0   8.739536            "alabama"
>         9          0          0
>      1992      13327         57         36         58         16          0
>         0          0   9.497547          0         47          .          .
>         .          .          .          0          .            "arizona"
>         .          0          .
>      1993      19860         57         36         58         16          1
>         1          0   9.896463          1         47          0   .3989162
>         0          1          0          1   9.497547            "arizona"
>         0          1          1
>      1992      10422         37         28         58         20          0
>         0          0   9.251675          0         43          .          .
>         .          .          .         -1          .      "arizona state"
>         .          0          .
> 
> ------snip-----
> 
> the data description is:
> 
> variable names:
> 
> year      apps      top25     ver500    mth500    stufac    bowl      btitle  
>  
> finfour   lapps     d93       avg500    cfinfour  clapps    cstufac   cbowl   
>  
> cavg500   cbtitle   lapps_1   school    ctop25    bball     cbball    
> 
>   Obs:   118
> 
>   1. year                     1992 or 1993
>   2. apps                     # applics for admission
>   3. top25                    perc frosh class in 25th high sch percen
>   4. ver500                   perc frosh >= 500 on verbal SAT
>   5. mth500                   perc frosh >= 500 on math SAT
>   6. stufac                   student-faculty ratio
>   7. bowl                     = 1 if bowl game in prev year
>   8. btitle                   = 1 if men's cnf chmps prev year
>   9. finfour                  = 1 if men's final 4 prev year
>  10. lapps                    log(apps)
>  11. d93                      =1 if year = 1993
>  12. avg500                   (ver500+mth500)/2
>  13. cfinfour                 change in finfour
>  14. clapps                   change in lapps
>  15. cstufac                  change in stufac
>  16. cbowl                    change in bowl
>  17. cavg500                  change in avg500
>  18. cbtitle                  change in btitle
>  19. lapps_1                  lapps lagged
>  20. school                   university name
>  21. ctop25                   change in top25
>  22. bball                    =1 if btitle or finfour
>  23. cbball                   change in bball
> 
> 
> so each four lines represent one case, and some variables are numeric and
> some are character.
> I thought scan() could read it in, but it seems somewhat tricky because of the
> mixed types of variables. Any suggestions?

There may be an easier way, but here is one possible approach:

First, use scan() to read in the data. Set the 'what' argument to a list
of atomic data types, based upon your specs above. Also, set the
'na.strings' argument to '.', since that is how missing values are coded.

Because 'what' is a list of 23 elements (that is, 'length(what)' is 23),
scan() treats every 23 consecutive fields as one record, so the four lines
of each case are read into a single record. This relies on the
'multi.line' argument in scan(), which defaults to TRUE.

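# 19 numeric fields, then 'school' (character), then 3 more numeric fields
data <- scan("data.txt",
             what = c(rep(list(numeric(0)), 19),
                      list(character(0)),
                      rep(list(numeric(0)), 3)),
             na.strings = ".")   # "." marks a missing value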
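
If you want to try the call without the file at hand, here is a minimal
sketch that feeds the first two records from your sample to scan() through
its 'text' argument. The names 'txt', 'what' and 'rec' are only
illustrative, and I am assuming the real file looks like the excerpt you
posted.

```r
# Two records from the sample above, inlined so the sketch runs without a file.
txt <- '
     1992       6245         49          .          .         20          1
        0          0   8.739536          0          .          .          .
        .          .          .          .          .            "alabama"
        .          0          .
     1993       7677         58          .          .         15          1
        0          0   8.945984          1          .          0   .2064476
       -5          0          .          0   8.739536            "alabama"
        9          0          0
'

# 19 numeric fields, 1 character field, then 3 more numeric fields.
what <- c(rep(list(numeric(0)), 19),
          list(character(0)),
          rep(list(numeric(0)), 3))

# Every 23 consecutive fields form one record, whatever the line breaks are.
rec <- scan(text = txt, what = what, na.strings = ".", quiet = TRUE)

length(rec)   # number of fields per record
rec[[20]]     # the school names, with the surrounding quotes stripped
```

The quoting matters for names such as "arizona state": with the default
'quote' setting, scan() keeps a quoted string together as one field even
though it contains a space.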


'data' is now a list in which each element is one column of your
original data file. Now use as.data.frame(), which turns each list
element into a column of a data frame. Numeric columns stay numeric; in
R versions before 4.0.0 the character column becomes a factor unless you
pass stringsAsFactors = FALSE.

data <- as.data.frame(data)
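
To see what happens to the column types, here is a small sketch. The list
'fields' is a made-up stand-in for the result of scan(), not your actual
data.

```r
# A small hypothetical list standing in for the result of scan().
fields <- list(year   = c(1992, 1993),
               school = c("alabama", "alabama"))

# stringsAsFactors = FALSE keeps 'school' as character in every R version;
# before R 4.0.0 the default converted character vectors to factors.
df <- as.data.frame(fields, stringsAsFactors = FALSE)

sapply(df, class)   # class of each column
```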


Next, read the column names from a text file containing the field names
listed above (separated by whitespace), and assign them to the data
frame.

Names <- scan("names.txt", what = character(0))
names(data) <- Names
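
As a possible alternative, scan() carries the names of the 'what' list over
to the list it returns, so the columns can be named in the same step. The
sketch below uses a hypothetical three-column subset ('year', 'school',
'bball') purely to keep it short.

```r
# Hypothetical three-column subset, inlined so the sketch runs without a file.
txt <- '1992 "alabama" 0
1993 "arizona state" 1'

# Names on the 'what' list carry over to the list that scan() returns.
what <- list(year = numeric(0), school = character(0), bball = numeric(0))

dat <- as.data.frame(scan(text = txt, what = what, quiet = TRUE),
                     stringsAsFactors = FALSE)

names(dat)
```

For the full 23 columns, typing the names into the list by hand is tedious,
so reading them from a file as above is probably still the more practical
route.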


Now review the contents of 'data':

> data
  year  apps top25 ver500 mth500 stufac bowl btitle finfour    lapps
1 1992  6245    49     NA     NA     20    1      0       0 8.739536
2 1993  7677    58     NA     NA     15    1      0       0 8.945984
3 1992 13327    57     36     58     16    0      0       0 9.497547
4 1993 19860    57     36     58     16    1      1       0 9.896463
5 1992 10422    37     28     58     20    0      0       0 9.251675
  d93 avg500 cfinfour    clapps cstufac cbowl cavg500 cbtitle  lapps_1
1   0     NA       NA        NA      NA    NA      NA      NA       NA
2   1     NA        0 0.2064476      -5     0      NA       0 8.739536
3   0     47       NA        NA      NA    NA      NA       0       NA
4   1     47        0 0.3989162       0     1       0       1 9.497547
5   0     43       NA        NA      NA    NA      NA      -1       NA
         school ctop25 bball cbball
1       alabama     NA     0     NA
2       alabama      9     0      0
3       arizona     NA     0     NA
4       arizona      0     1      1
5 arizona state     NA     0     NA
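
One way to convince yourself that the fields landed in the right columns is
to check them against the variable descriptions. This is only a sketch: the
data frame 'chk' holds a few values copied from rows 3 to 5 of the output
above, and the tolerance is my own choice to allow for the rounding in the
file.

```r
# A few columns copied from the printed output above (rows 3 to 5).
chk <- data.frame(apps   = c(13327, 19860, 10422),
                  lapps  = c(9.497547, 9.896463, 9.251675),
                  ver500 = c(36, 36, 28),
                  mth500 = c(58, 58, 58),
                  avg500 = c(47, 47, 43))

# lapps is documented as log(apps); allow for rounding in the stored values.
ok_log <- all(abs(log(chk$apps) - chk$lapps) < 1e-4)

# avg500 is documented as (ver500 + mth500) / 2.
ok_avg <- all(chk$avg500 == (chk$ver500 + chk$mth500) / 2)

c(lapps = ok_log, avg500 = ok_avg)
```

If either check came back FALSE on the full data, the first thing I would
suspect is a record with more or fewer than 23 fields, which would shift
every later value into the wrong column.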


HTH,

Marc Schwartz

