Re: [R] Stacking several vectors from the list
Monday, June 28, 2010, 4:40:11 PM, you wrote:

> On Mon, Jun 28, 2010 at 7:30 PM, astar...@uci.edu wrote:
>> Hi everybody, I'm working with some very messy data. I have tried to clean it up in SAS and SAS/IML, but there is not enough info on how to handle certain things in SAS, so I have turned to R. The thing itself should be rather simple, so I was wondering if someone could help me out.
>>
>> The original .csv has dimensions ([1] 7138 6338), with funds and the corresponding dates and observations for each date, for around 10 years and 4000+ funds, meaning COL5 has the next fund's name and so on.
>>
>> COL1             COL2       COL3        COL4
>> HBNNF US Equity  Date       EQY_SH_OUT  PX_VOLUME
>>                  #NAME?     #N/A N/A    135000
>>                  7/7/2008   #N/A N/A    105000
>>                  7/17/2008  #N/A N/A    59
>>                  7/22/2008  #N/A N/A    4
>>
>> so in R this .csv is somehow read as a list (using typeof) and not as a data frame, and a lot of stuff like regexpr searches in the
>
> The typeof of a data.frame is "list", so you do have a data frame -- not a list. Perhaps the problem is that you do not want factor columns but want character columns instead. Use read.csv(..., as.is = TRUE)

Thanks!! This as.is trick solved the list issue and the whole indexing problem. Now the table is a true data frame, searchable and indexable. I'm still reading up on the differences between the list and data frame types.

Arsenio

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
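For anyone reading the archive later, here is a minimal sketch of why the whole-table search came back empty before as.is = TRUE and works afterwards. The tiny CSV and the names csv_text, small_fac and small_chr are made up for illustration; stringsAsFactors = TRUE is set explicitly to mimic the read.csv() default of 2010.

```r
# Hypothetical miniature of the file; the real one has thousands of columns.
csv_text <- "COL1,COL2\nHBNNF US Equity,Date\n,7/7/2008\n"

small_fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)
small_chr <- read.csv(text = csv_text, as.is = TRUE)

# as.character() on a whole data frame deparses each column; a factor column
# is deparsed as its integer codes, so the labels never reach grep().
hits_fac <- grep("Equity", as.character(small_fac))

# With character columns the deparsed text contains the strings themselves.
hits_chr <- grep("Equity", as.character(small_chr))

# Searching column by column sidesteps the deparsing altogether.
cols_with_fund <- which(sapply(small_chr, function(col) any(grepl("Equity", col))))
```

Here hits_fac is empty while hits_chr and cols_with_fund both point at the first column. The column-by-column form is probably the safer habit, since it does not depend on how a column happens to be deparsed.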
[R] Stacking several vectors from the list
Hi everybody,

I'm working with some very messy data. I have tried to clean it up in SAS and SAS/IML, but there is not enough info on how to handle certain things in SAS, so I have turned to R. The thing itself should be rather simple, so I was wondering if someone could help me out.

The original .csv has dimensions ([1] 7138 6338), with funds and the corresponding dates and observations for each date, for around 10 years and 4000+ funds, meaning COL5 has the next fund's name and so on.

COL1             COL2       COL3        COL4
HBNNF US Equity  Date       EQY_SH_OUT  PX_VOLUME
                 #NAME?     #N/A N/A    135000
                 7/7/2008   #N/A N/A    105000
                 7/17/2008  #N/A N/A    59
                 7/22/2008  #N/A N/A    4

So in R this .csv is somehow read as a list (using typeof) and not as a data frame, and a lot of stuff like regexpr searches in the whole file do not work or behave strangely.

I want to stack the fund data and create a long dataset with a fund name, date, eqy_sh_out and px_volume, with the fund name present for each date. That should look like this:

Fund_name        Date       EQY_SH_OUT  PX_VOLUME
HBNNF US Equity  7/7/2008   #N/A N/A    105000
HBNNF US Equity  7/17/2008  #N/A N/A    59
HBNNF US Equity  7/22/2008  #N/A N/A    4
HBNNF US Equity  7/24/2008  #N/A N/A    3000
HBNNF US Equity  7/31/2008  #N/A N/A    1000
HBNNF US Equity  8/20/2008  #N/A N/A    1000
HBNNF US Equity  8/26/2008  #N/A N/A    2000
HBNNF US Equity  8/27/2008  #N/A N/A    2000
HBNNF US Equity  9/2/2008   #N/A N/A    5000
HND CN Equity    1/17/2008  #N/A N/A    28000
HND CN Equity    1/18/2008  #N/A N/A    25000
HND CN Equity    1/21/2008  #N/A N/A    5000
HND CN Equity    1/22/2008  #N/A N/A    101000
HND CN Equity    1/23/2008  #N/A N/A    122000

Is there any way to accomplish this? There should be an easy way, but I have never worked with lists, and somehow the file isn't read as a data frame, which gives strange results.
> small_raw[1,1]
[1] HBNNF US Equity
Levels:  0.26 0.46 COL1 HBNNF US Equity
> grep("Equity",as.character(small_raw))
integer(0)
> small_raw[[1]]
  [1] HBNNF US Equity
  [5]
  [9]
 [13]
 [17]
 [21]
 [25]
 [29]
 [33]
 [37]
 [41]
 [45]
 [49]
 [53]
 [57]
 [61]
 [65]
 [69]
 [73]
 [77]
 [81]
 [85]
 [89]
 [93]
 [97] 0.46 0.46
[101] 0.46 0.26
[105] 0.26 0.26
[109] 0.26 0.26
[113] 0.26 0.26
[117] 0.26 0.26
[121] 0.26 0.26
[125] 0.26 0.26
[129] 0.26 0.26
[133] 0.26 0.26
[137] 0.26 0.26
[141] 0.26 0.26
[145] 0.26 0.26
[149] 0.26
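The stacking asked for above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested recipe for the real file: it assumes every fund occupies exactly four adjacent columns, that the first data row holds the fund name and the field labels, and that blanks and "#N/A N/A" should become NA. The miniature CSV and the names csv_text, raw, block_width, stack_block and long are all made up here.

```r
# Hypothetical two-fund miniature laid out like the file described above:
# each fund occupies four columns; row 1 holds the fund name and field labels.
csv_text <- paste(
  "COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8",
  "HBNNF US Equity,Date,EQY_SH_OUT,PX_VOLUME,HND CN Equity,Date,EQY_SH_OUT,PX_VOLUME",
  ",7/7/2008,#N/A N/A,105000,,1/17/2008,#N/A N/A,28000",
  ",7/17/2008,#N/A N/A,59,,1/18/2008,#N/A N/A,25000",
  ",7/22/2008,#N/A N/A,4,,,,",
  sep = "\n"
)

# Character columns throughout; blanks and "#N/A N/A" are read as NA.
raw <- read.csv(text = csv_text, as.is = TRUE, na.strings = c("", "#N/A N/A"))

block_width <- 4                               # assumption: four columns per fund
starts <- seq(1, ncol(raw), by = block_width)  # first column of every fund block

stack_block <- function(first_col) {
  # Take the fund's four columns and drop the label row.
  block <- raw[-1, first_col:(first_col + block_width - 1)]
  names(block) <- c("Fund_name", "Date", "EQY_SH_OUT", "PX_VOLUME")
  block$Fund_name <- raw[1, first_col]         # repeat the fund name on every row
  block[!is.na(block$Date), ]                  # discard padding below short funds
}

# Stack all fund blocks on top of each other.
long <- do.call(rbind, lapply(starts, stack_block))
rownames(long) <- NULL

long$Date <- as.Date(long$Date, format = "%m/%d/%Y")
long$PX_VOLUME <- as.numeric(long$PX_VOLUME)
```

With the miniature input, long has one row per fund and date, with the fund name repeated on each row. Entries such as "#NAME?" in the date column would come out as NA from as.Date() and would need separate handling.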
Re: [R] Stacking several vectors from the list
On Mon, Jun 28, 2010 at 7:30 PM, astar...@uci.edu wrote:
> Hi everybody, I'm working with some very messy data. I have tried to clean it up in SAS and SAS/IML, but there is not enough info on how to handle certain things in SAS, so I have turned to R. The thing itself should be rather simple, so I was wondering if someone could help me out.
>
> The original .csv has dimensions ([1] 7138 6338), with funds and the corresponding dates and observations for each date, for around 10 years and 4000+ funds, meaning COL5 has the next fund's name and so on.
>
> COL1             COL2       COL3        COL4
> HBNNF US Equity  Date       EQY_SH_OUT  PX_VOLUME
>                  #NAME?     #N/A N/A    135000
>                  7/7/2008   #N/A N/A    105000
>                  7/17/2008  #N/A N/A    59
>                  7/22/2008  #N/A N/A    4
>
> so in R this .csv is somehow read as a list (using typeof) and not as a data frame, and a lot of stuff like regexpr searches in the

The typeof of a data.frame is "list", so you do have a data frame -- not a list. Perhaps the problem is that you do not want factor columns but want character columns instead. Use read.csv(..., as.is = TRUE)
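A small sketch of the two points in this reply, using a made-up two-row CSV (csv_text, as_factor and as_char are illustrative names only). stringsAsFactors = TRUE is passed explicitly to mimic the 2010 default of read.csv(), since current R versions no longer convert strings to factors by default.

```r
# Hypothetical two-row CSV standing in for the real file.
csv_text <- "COL1,COL2\nHBNNF US Equity,Date\n,7/7/2008\n"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_char   <- read.csv(text = csv_text, as.is = TRUE)

typeof(as_factor)         # "list": the storage type of every data frame
is.data.frame(as_factor)  # TRUE
is.list(as_factor)        # TRUE: a data frame is also a list

class(as_factor$COL1)     # "factor"
class(as_char$COL1)       # "character"
```

So typeof() alone cannot tell a data frame from a plain list; is.data.frame() or class() is the check to use, and as.is = TRUE only changes how the string columns are stored.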
Re: [R] Stacking several vectors from the list
On Mon, Jun 28, 2010 at 7:40 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> On Mon, Jun 28, 2010 at 7:30 PM, astar...@uci.edu wrote:
>> Hi everybody, I'm working with some very messy data. I have tried to clean it up in SAS and SAS/IML, but there is not enough info on how to handle certain things in SAS, so I have turned to R. The thing itself should be rather simple, so I was wondering if someone could help me out.
>>
>> The original .csv has dimensions ([1] 7138 6338), with funds and the corresponding dates and observations for each date, for around 10 years and 4000+ funds, meaning COL5 has the next fund's name and so on.
>>
>> COL1             COL2       COL3        COL4
>> HBNNF US Equity  Date       EQY_SH_OUT  PX_VOLUME
>>                  #NAME?     #N/A N/A    135000
>>                  7/7/2008   #N/A N/A    105000
>>                  7/17/2008  #N/A N/A    59
>>                  7/22/2008  #N/A N/A    4
>>
>> so in R this .csv is somehow read as a list (using typeof) and not as a data frame, and a lot of stuff like regexpr searches in the
>
> The typeof of a data.frame is "list", so you do have a data frame -- not a list. Perhaps the problem is that you do not want factor columns but want character columns instead. Use read.csv(..., as.is = TRUE)

Just to be clear, a data frame is a list, so "not a list" means "not just a list" -- it's also a data frame.