[R] how to import such data to R?
The data file has this structure:

1992 6245 49 . . 20 1 0 0 8.739536 0 . . . . . . . . alabama . 0 . 1993 7677 58 . . 15 1 0 0 8.945984 1 . 0 .2064476 -5 0 . 0 8.739536 alabama 9 0 0 1992 13327 57 36 58 16 0 0 0 9.497547 0 47 . . . . . 0 . arizona . 0 . 1993 19860 57 36 58 16 1 1 0 9.896463 1 47 0 .3989162 0 1 0 1 9.497547 arizona 0 1 1 1992 10422 37 28 58 20 0 0 0 9.251675 0 43 . . . . . -1 . arizona state . 0 .
--snip--

The data description is:

variable names:

year apps top25 ver500 mth500 stufac bowl btitle finfour lapps d93 avg500 cfinfour clapps cstufac cbowl cavg500 cbtitle lapps_1 school ctop25 bball cbball

Obs: 118

 1. year      1992 or 1993
 2. apps      # applics for admission
 3. top25     perc frosh class in 25th high sch percen
 4. ver500    perc frosh = 500 on verbal SAT
 5. mth500    perc frosh = 500 on math SAT
 6. stufac    student-faculty ratio
 7. bowl      = 1 if bowl game in prev year
 8. btitle    = 1 if men's cnf chmps prev year
 9. finfour   = 1 if men's final 4 prev year
10. lapps     log(apps)
11. d93       =1 if year = 1993
12. avg500    (ver500+mth500)/2
13. cfinfour  change in finfour
14. clapps    change in lapps
15. cstufac   change in stufac
16. cbowl     change in bowl
17. cavg500   change in avg500
18. cbtitle   change in btitle
19. lapps_1   lapps lagged
20. school    university name
21. ctop25    change in top25
22. bball     =1 if btitle or finfour
23. cbball    change in bball

So every four lines represent one case, and some variables are numeric and some are character. I thought scan could read it in, but it seems somewhat tricky because of the mixed variable types. Any suggestions?

The attachment is the raw data and the description of the data.

2005-10-15
--
Department of Sociology
Fudan University
My new mail address is [EMAIL PROTECTED]
Blog: http://sociology.yculblog.com

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] how to import such data to R?
Dear ronggui,

I didn't find any attachments, but using the data lines in your message, and assuming that . represents missing data, the following appears to do what you want:

    as.data.frame(scan("c:/temp/ronggui.txt",
        list(year=1, apps=1, top25=1, ver500=1, mth500=1, stufac=1,
             bowl=1, btitle=1, finfour=1, lapps=1, d93=1, avg500=1,
             cfinfour=1, clapps=1, cstufac=1, cbowl=1, cavg500=1,
             cbtitle=1, lapps_1=1, school="", ctop25=1, bball=1,
             cbball=1),
        na.strings="."))

See ?scan for details.

I hope this helps,
John

On Sat, 15 Oct 2005 15:57:42 +0800 ronggui [EMAIL PROTECTED] wrote:
> the data file has such structure:
> [...]

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
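As a rough sketch of how the scan() call above behaves, the following writes four of the posted records to a temporary file and reads them back. The points where each record wraps onto a new line are an assumption made for illustration (the real file layout is not shown in the thread), and the names sample_lines, path, fields and admissions are hypothetical.

```r
# Four records from the post, each wrapped over three lines.
# The wrap points are assumed; scan() ignores them when 'what' is a list.
sample_lines <- c(
  "1992 6245 49 . . 20 1 0 0 8.739536",
  "0 . . . . . . . .",
  "alabama . 0 .",
  "1993 7677 58 . . 15 1 0 0 8.945984",
  "1 . 0 .2064476 -5 0 . 0 8.739536",
  "alabama 9 0 0",
  "1992 13327 57 36 58 16 0 0 0 9.497547",
  "0 47 . . . . . 0 .",
  "arizona . 0 .",
  "1993 19860 57 36 58 16 1 1 0 9.896463",
  "1 47 0 .3989162 0 1 0 1 9.497547",
  "arizona 0 1 1"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# One template element per field: 1 means numeric, "" means character.
fields <- list(year = 1, apps = 1, top25 = 1, ver500 = 1, mth500 = 1,
               stufac = 1, bowl = 1, btitle = 1, finfour = 1, lapps = 1,
               d93 = 1, avg500 = 1, cfinfour = 1, clapps = 1, cstufac = 1,
               cbowl = 1, cavg500 = 1, cbtitle = 1, lapps_1 = 1,
               school = "", ctop25 = 1, bball = 1, cbball = 1)

# scan() fills the 23 fields in order, crossing line breaks as needed,
# and turns a lone "." into NA.
admissions <- as.data.frame(
  scan(path, what = fields, na.strings = ".", quiet = TRUE),
  stringsAsFactors = FALSE
)
str(admissions)
```

Because the template is a named list, the resulting data frame picks up the column names directly, so no separate renaming step is needed in this variant.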
[R] how to import such data to R?
It seems my last post was not sent successfully, so I am posting again.

2005-10-15
--
Department of Sociology
Fudan University
My new mail address is [EMAIL PROTECTED]
Blog: http://sociology.yculblog.com
Re: [R] how to import such data to R?
On Sat, 2005-10-15 at 23:54 +0800, ronggui wrote:
> It seems my last post not sent successfully, so I post again.
> the data file has such structure:
> [...]
> so the each four lines represent one case, can some variables are
> numeric and some are character. I though the scan can read it in, but
> it seems somewhat tricky as the mixed type of variables. any
> suggestions?

There may be an easier way, but here is one possible approach:

First, use scan() to read in the data. Set the 'what' argument to a list of atomic data types, based upon your specs above. Also, set the 'na.names' argument to '.'.

This will read the multiple lines for each record into a single record, based upon there being 23 elements per record. That is based upon 'length(what)'. Note also the 'multi.line' argument in scan().

    data <- scan("data.txt",
                 what = c(rep(list(numeric(0)), 19),
                          list(character(0)),
                          rep(list(numeric(0)), 3)),
                 na.strings = ".")

'data' is now a list of values, where each list element is a proper column from your original data file.

Now use as.data.frame(), which will take each list element and turn it into a column in a data frame, preserving the data types.

    data <- as.data.frame(data)

Now, read in the column names for the data frame from a text file containing your field names above, and set the data frame column names to these.

    Names <- scan("names.txt", what = character(0))
    names(data) <- Names

Now review the structure of 'data':

    > data
      year  apps top25 ver500 mth500 stufac bowl btitle finfour    lapps
    1 1992  6245    49     NA     NA     20    1      0       0 8.739536
    2 1993  7677    58     NA     NA     15    1      0       0 8.945984
    3 1992 13327    57     36     58     16    0      0       0 9.497547
    4 1993 19860    57     36     58     16    1      1       0 9.896463
    5 1992 10422    37     28     58     20    0      0       0 9.251675
      d93 avg500 cfinfour    clapps cstufac cbowl cavg500 cbtitle  lapps_1
    1   0     NA       NA        NA      NA    NA      NA      NA       NA
    2   1     NA        0 0.2064476      -5     0      NA       0 8.739536
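One thing that may be worth checking in the full file: scan() splits fields on whitespace by default, so an unquoted name such as "arizona state" counts as two fields and would shift everything after it. The sketch below is one possible workaround, not part of the approach described above. It joins the words of a name with "_" before scanning and restores the space afterwards. It assumes school names consist of lowercase letters only, as in the posted sample, and the names raw_lines, txt, col_types, col_names and dat are hypothetical.

```r
# Two records from the post, one with a multi-word school name.
# Putting one record per line here is an assumption for readability.
raw_lines <- c(
  "1993 19860 57 36 58 16 1 1 0 9.896463 1 47 0 .3989162 0 1 0 1 9.497547 arizona 0 1 1",
  "1992 10422 37 28 58 20 0 0 0 9.251675 0 43 . . . . . -1 . arizona state . 0 ."
)

# Replace whitespace that sits between two lowercase letters with "_",
# so "arizona state" becomes the single token "arizona_state".
txt <- paste(raw_lines, collapse = " ")
txt <- gsub("(?<=[a-z]) +(?=[a-z])", "_", txt, perl = TRUE)

# 19 numeric fields, one character field, then 3 numeric fields.
col_types <- c(rep(list(numeric(0)), 19),
               list(character(0)),
               rep(list(numeric(0)), 3))
col_names <- c("year", "apps", "top25", "ver500", "mth500", "stufac",
               "bowl", "btitle", "finfour", "lapps", "d93", "avg500",
               "cfinfour", "clapps", "cstufac", "cbowl", "cavg500",
               "cbtitle", "lapps_1", "school", "ctop25", "bball", "cbball")
# Naming the template up front gives the result its column names directly.
names(col_types) <- col_names

dat <- as.data.frame(
  scan(text = txt, what = col_types, na.strings = ".", quiet = TRUE),
  stringsAsFactors = FALSE
)

# Put the spaces back into the school names.
dat$school <- gsub("_", " ", dat$school, fixed = TRUE)
dat[, c("year", "apps", "school", "cbtitle")]
```

If the real names contain capitals, digits or punctuation, the character classes in the pattern would need widening, and a name that legitimately contains "_" would need a different placeholder.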
Re: [R] how to import such data to R?
On Sat, 2005-10-15 at 11:43 -0500, Marc Schwartz wrote:
> There may be an easier way, but here is one possible approach:
>
> First, use scan to read in the data. Set the 'what' argument to a list
> of atomic data types, based upon your specs above. Also, set the
> 'na.names' argument to '.'.

Ack... that should of course read 'na.strings', not 'na.names'...

Marc
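To illustrate why the argument name matters, here is a small sketch that reads the first six values of the first posted record three ways. The variable names one_record, ok, no_na and bad are hypothetical, and errors are caught with tryCatch() only so that the example runs to the end.

```r
one_record <- "1992 6245 49 . . 20"

# With na.strings = ".", each lone dot is read as NA.
ok <- scan(text = one_record, what = numeric(0), na.strings = ".",
           quiet = TRUE)

# With the default na.strings ("NA"), "." is not a valid number,
# so scan() stops with an error.
no_na <- tryCatch(
  scan(text = one_record, what = numeric(0), quiet = TRUE),
  error = function(e) e
)

# 'na.names' is not an argument of scan(), so R rejects the call.
bad <- tryCatch(
  scan(text = one_record, what = numeric(0), na.names = ".", quiet = TRUE),
  error = function(e) e
)

ok
inherits(no_na, "error")
inherits(bad, "error")
```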