[R] how to import such data to R?

2005-10-17 Thread ronggui
The data file has the following structure:

 1992   6245 49  .  . 20  1
0  0   8.739536  0  .  .  .
.  .  .  .  .alabama
.  0  .
 1993   7677 58  .  . 15  1
0  0   8.945984  1  .  0   .2064476
   -5  0  .  0   8.739536alabama
9  0  0
 1992  13327 57 36 58 16  0
0  0   9.497547  0 47  .  .
.  .  .  0  .arizona
.  0  .
 1993  19860 57 36 58 16  1
1  0   9.896463  1 47  0   .3989162
0  1  0  1   9.497547arizona
0  1  1
 1992  10422 37 28 58 20  0
0  0   9.251675  0 43  .  .
.  .  . -1  .  arizona state
.  0  .

--snip-

The data description is:

variable names:

year      apps      top25     ver500    mth500    stufac    bowl      btitle
finfour   lapps     d93       avg500    cfinfour  clapps    cstufac   cbowl
cavg500   cbtitle   lapps_1   school    ctop25    bball     cbball

  Obs:   118

  1. year     1992 or 1993
  2. apps     # applics for admission
  3. top25    perc frosh class in 25th high sch percen
  4. ver500   perc frosh >= 500 on verbal SAT
  5. mth500   perc frosh >= 500 on math SAT
  6. stufac   student-faculty ratio
  7. bowl     = 1 if bowl game in prev year
  8. btitle   = 1 if men's cnf chmps prev year
  9. finfour  = 1 if men's final 4 prev year
 10. lapps    log(apps)
 11. d93      = 1 if year = 1993
 12. avg500   (ver500+mth500)/2
 13. cfinfour change in finfour
 14. clapps   change in lapps
 15. cstufac  change in stufac
 16. cbowl    change in bowl
 17. cavg500  change in avg500
 18. cbtitle  change in btitle
 19. lapps_1  lapps lagged
 20. school   university name
 21. ctop25   change in top25
 22. bball    = 1 if btitle or finfour
 23. cbball   change in bball


So each group of four lines represents one case, and some variables are
numeric while others are character.
I thought scan() could read it in, but that seems somewhat tricky because of
the mixed variable types. Any suggestions?

The attachment contains the raw data and the description of the data.



2005-10-15

--
Department of Sociology
Fudan University

My new mail address is [EMAIL PROTECTED]
Blog:http://sociology.yculblog.com
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] how to import such data to R?

2005-10-17 Thread John Fox
Dear ronggui,

I didn't find any attachments, but using the data lines in your
message, and assuming that . represents missing data, the following
appears to do what you want:

as.data.frame(scan("c:/temp/ronggui.txt",
    list(year=1, apps=1, top25=1, ver500=1,
         mth500=1, stufac=1, bowl=1, btitle=1, finfour=1, lapps=1, d93=1,
         avg500=1, cfinfour=1, clapps=1, cstufac=1, cbowl=1, cavg500=1,
         cbtitle=1, lapps_1=1, school="", ctop25=1, bball=1, cbball=1),
    na.strings="."))

See ?scan for details.
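
A minimal, self-contained sketch of the same idea, reading from an in-memory character vector instead of a file. The object names (raw, what, dat) are illustrative only, and the sketch assumes that school names are wrapped in double quotes in the raw file, so that a multi-word name such as "arizona state" is read as a single field; that quoting is an assumption, not something shown in the message.

```r
# Two four-line records taken from the posted data.
# Assumption: school names are quoted in the raw file.
raw <- c(
  ' 1992   6245 49  .  . 20  1',
  '0  0   8.739536  0  .  .  .',
  '.  .  .  .  . "alabama"',
  '.  0  .',
  ' 1992  10422 37 28 58 20  0',
  '0  0   9.251675  0 43  .  .',
  '.  .  . -1  . "arizona state"',
  '.  0  .'
)

num_before <- c("year", "apps", "top25", "ver500", "mth500", "stufac",
                "bowl", "btitle", "finfour", "lapps", "d93", "avg500",
                "cfinfour", "clapps", "cstufac", "cbowl", "cavg500",
                "cbtitle", "lapps_1")
num_after <- c("ctop25", "bball", "cbball")

# One template element per field: numeric everywhere except 'school'.
what <- c(setNames(rep(list(numeric(0)), length(num_before)), num_before),
          list(school = character(0)),
          setNames(rep(list(numeric(0)), length(num_after)), num_after))

# "." is treated as missing; each record is filled from 23 consecutive fields.
dat <- as.data.frame(scan(text = raw, what = what, na.strings = ".",
                          quiet = TRUE),
                     stringsAsFactors = FALSE)
str(dat)
```

If the school names in the real file are not quoted, a name containing a space would be split into two fields, and the call would need a different strategy (for example, fixed-width reading).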

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/



[R] how to import such data to R?

2005-10-15 Thread ronggui
It seems my last post was not sent successfully, so I am posting again.


Re: [R] how to import such data to R?

2005-10-15 Thread Marc Schwartz
There may be an easier way, but here is one possible approach:

First, use scan to read in the data. Set the 'what' argument to a list
of atomic data types, based upon your specs above. Also, set the
'na.names' argument to '.'.

This will read in the multiple lines for each record, into a single
record based upon there being 23 elements per record. That is based upon
'length(what)'.  Note also the 'multi.line' argument in scan().

data <- scan("data.txt",
             what = c(rep(list(numeric(0)), 19),
                      list(character(0)),
                      rep(list(numeric(0)), 3)),
             na.strings = ".")
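
To illustrate how scan() assembles records across line breaks, here is a small sketch with a hypothetical three-field record (the names txt, rec, x, y and label are made up for the illustration). With the default multi.line = TRUE, fields are consumed in order until length(what) of them have been read, regardless of where the lines end; with multi.line = FALSE, a record that is split across lines is expected to raise an error.

```r
# Three fields per record; the first record spans two lines.
txt <- c("1 2", "a", "3 4 b")

rec <- scan(text = txt,
            what = list(x = 0, y = 0, label = ""),
            quiet = TRUE)
str(rec)

# With multi.line = FALSE every record must sit on a single line,
# so the same input should fail; try() captures the error.
strict <- try(scan(text = txt,
                   what = list(x = 0, y = 0, label = ""),
                   multi.line = FALSE, quiet = TRUE),
              silent = TRUE)
failed <- inherits(strict, "try-error")
```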


'data' is now a list of values, where each list element is a proper
column from your original data file. Now use as.data.frame(), which will
take each list element and turn it into a column in a data frame,
preserving the data types.

data <- as.data.frame(data)


Now, read in the column names for the data frame from a text file,
containing your field names above, and set the data frame column names
to these.

Names <- scan("names.txt", what = character(0))
names(data) <- Names
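
As a hedged sketch of this naming step, the following uses a tiny stand-in data frame and supplies the names as text rather than from a file; dat, names_text and col_types are hypothetical names used only here. Checking that the number of names matches the number of columns before assigning them can catch a names file that is out of step with the data.

```r
# Stand-in data frame with default column names (illustrative values only).
dat <- data.frame(V1 = c(1992, 1993),
                  V2 = c(6245, 7677),
                  V3 = c("alabama", "alabama"),
                  stringsAsFactors = FALSE)

# The names may be spread over several lines, as in the posted description.
names_text <- c("year  apps", "school")
Names <- scan(text = names_text, what = character(0), quiet = TRUE)

# Guard against a mismatch before assigning.
stopifnot(length(Names) == ncol(dat))
names(dat) <- Names

# Inspect the resulting column types.
col_types <- sapply(dat, class)
print(col_types)
```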


Now review the structure of 'data':

> data
  year  apps top25 ver500 mth500 stufac bowl btitle finfour    lapps
1 1992  6245    49     NA     NA     20    1      0       0 8.739536
2 1993  7677    58     NA     NA     15    1      0       0 8.945984
3 1992 13327    57     36     58     16    0      0       0 9.497547
4 1993 19860    57     36     58     16    1      1       0 9.896463
5 1992 10422    37     28     58     20    0      0       0 9.251675
  d93 avg500 cfinfour    clapps cstufac cbowl cavg500 cbtitle  lapps_1
1   0     NA       NA        NA      NA    NA      NA      NA       NA
2   1     NA        0 0.2064476      -5     0      NA       0 8.739536

Re: [R] how to import such data to R?

2005-10-15 Thread Marc Schwartz
On Sat, 2005-10-15 at 11:43 -0500, Marc Schwartz wrote:

> There may be an easier way, but here is one possible approach:
>
> First, use scan to read in the data. Set the 'what' argument to a list
> of atomic data types, based upon your specs above. Also, set the
> 'na.names' argument to '.'.

Ack, that should of course read 'na.strings', not 'na.names'...

Marc
