Re: [R] Reshape question
On Mar 6, 2012, at 3:00 PM, David Perlman wrote:

> I have a data frame in wide format. There are six variables that represent two factors in a 3x2 design, Valence and Temperature:
>
> head(dpts)
>               File Subj Time Group PainNeg.hot PainNeg.warm SociNeg.hot SociNeg.warm Positiv.hot Positiv.warm Errors
> 1 WB101_1_1_dp.txt  101    1   MNP       30.70     13.75000   16.319048        35.17    30.18333        14.38      1
> 2 WB101_2_1_dp.txt  101    2   MNP        5.27        -79.6  -24.738095        -5.50    23.95000       -14.70      0
> 3 WB102_1_1_dp.txt  102    1   MNP       50.75    -13.43214    5.185714        19.08         4.2        -8.03      1
> 4 WB102_2_1_dp.txt  102    2   MNP      -41.38      9.32500   -9.845238        22.95   -14.35000        40.93      0
> 5 WB103_1_1_dp.txt  103    1   MNP       25.27    -48.27500   48.726190     8.141667    10.98333       -31.97      2
> 6 WB103_2_1_dp.txt  103    2   MNP       26.75    -13.28929    3.447619    -8.641667       -10.9   -27.416667      1
>
> The following command does part of what I want:
>
> dptsr <- reshape(dpts, varying = c('PainNeg.hot','PainNeg.warm','SociNeg.hot',
>                  'SociNeg.warm','Positiv.hot','Positiv.warm'),
>                  v.names = c('Bias'), direction = 'long',
>                  timevar = c('Valence','Temperature'),
>                  times = c('PainNeg.hot','PainNeg.warm','SociNeg.hot',
>                  'SociNeg.warm','Positiv.hot','Positiv.warm'),
>                  idvar = c('Subj','Time'))
>
> But it doesn't break out the two factors:
>
> head(dptsr)
>                               File Subj Time Group Errors     Valence Temperature   Bias
> 101.1.PainNeg.hot WB101_1_1_dp.txt  101    1   MNP      1 PainNeg.hot PainNeg.hot  30.70
> 101.2.PainNeg.hot WB101_2_1_dp.txt  101    2   MNP      0 PainNeg.hot PainNeg.hot   5.27
> 102.1.PainNeg.hot WB102_1_1_dp.txt  102    1   MNP      1 PainNeg.hot PainNeg.hot  50.75
> 102.2.PainNeg.hot WB102_2_1_dp.txt  102    2   MNP      0 PainNeg.hot PainNeg.hot -41.38
> 103.1.PainNeg.hot WB103_1_1_dp.txt  103    1   MNP      2 PainNeg.hot PainNeg.hot  25.27
> 103.2.PainNeg.hot WB103_2_1_dp.txt  103    2   MNP      1 PainNeg.hot PainNeg.hot  26.75
>
> So I did successfully create two factor variables, but they both contain the same values. Instead I would want Valence to be (for example) PainNeg and Temperature to be hot. Can anyone help me figure out how to get reshape to do this? I have never been able to make much sense out of the reshape documentation...

I am very sympathetic to that lament.

Try instead:

dptsr <- reshape(dpts, varying = c('PainNeg.hot','PainNeg.warm', 'SociNeg.hot',
                 'SociNeg.warm', 'Positiv.hot', 'Positiv.warm'),
                 v.names = c('Bias'), direction = 'long',
                 timevar = c('Valence.Temp'),
                 times = c('PainNeg.hot','PainNeg.warm', 'SociNeg.hot',
                 'SociNeg.warm', 'Positiv.hot', 'Positiv.warm'),
                 idvar = c('Subj','Time'))
dptsr$Valence <- sub("\\..+$", "", dptsr$Valence.Temp)
dptsr$Temp <- sub("^.+\\.", "", dptsr$Valence.Temp)

I admit that I haven't figured out how to do it in one step within reshape. There should be a way, but I have tried a bunch of (failed) methods.

-- 
David.

David Winsemius, MD
West Hartford, CT

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
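A self-contained sketch of the two-step approach (reshape to one Bias column, then split the label with sub()), using only the Subj 101 and 102, Time 1 rows copied from the head(dpts) printout. File, Group and Errors are left out for brevity, and the factor() wrappers are an addition so that Valence and Temperature come out as factors:

```r
# Two rows copied from the head(dpts) printout (Subj 101 and 102, Time 1);
# File, Group and Errors are omitted to keep the sketch short
dpts <- data.frame(
  Subj = c(101, 102), Time = c(1, 1),
  PainNeg.hot = c(30.70, 50.75),         PainNeg.warm = c(13.75, -13.43214),
  SociNeg.hot = c(16.319048, 5.185714),  SociNeg.warm = c(35.17, 19.08),
  Positiv.hot = c(30.18333, 4.2),        Positiv.warm = c(14.38, -8.03)
)

cells <- c("PainNeg.hot", "PainNeg.warm", "SociNeg.hot",
           "SociNeg.warm", "Positiv.hot", "Positiv.warm")

# Step 1: one Bias column; the original column name becomes the "time" label
dptsr <- reshape(dpts, direction = "long", varying = cells, v.names = "Bias",
                 timevar = "Valence.Temp", times = cells,
                 idvar = c("Subj", "Time"))

# Step 2: "PainNeg.hot" -> "PainNeg" (text before the dot) and "hot" (after it)
dptsr$Valence     <- factor(sub("\\..+$", "", dptsr$Valence.Temp))
dptsr$Temperature <- factor(sub("^.+\\.", "", dptsr$Valence.Temp))

head(dptsr)
```

The regular expressions depend on each name having exactly one dot. A name with more dots would need a different split rule.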
Re: [R] reshape question
Thanks for your help on this Hadley and David! Dennis Murphy also had a good solution (changing list(data.out2[-1])... etc. to names(data.out2[-1])...):

data.out3 <- reshape(data.out2, direction = 'long', varying = names(data.out2[-1]),
                     idvar = 'id')
data.out4 <- split(data.out3, data.out3$time)
names(data.out4) <- paste('time', 1:4, sep = '')
data.out4

AC

On Tue, Nov 24, 2009 at 9:05 PM, hadley wickham <h.wick...@gmail.com> wrote:

>> I don't really understand what you want and the example solution throws away quite a lot of data, so consider this alternative:
>>
>> data.out2 <- read.table(textConnection("id rater.1 n.1 rater.2 n.2 rater.3 n.3 rater.4 n.4
>> 11 11 0.118 79 NA NA NA NA NA NA
>> 114 114 0.2478709 113 NA NA NA NA NA NA
>> 12 12 0.3130655 54 0.3668242 54 NA NA NA NA
>> 121 121 0.240 331 NA NA NA NA NA NA
>> 122 122 0.3004164 25 0.1046278 25 0.2424871 25 0.2796937 25
>> 125 125 0.1634865 190 NA NA NA NA NA NA"), header=T, stringsAsFactors=F)
>
> Or
>
> library(reshape)
> df <- melt(data.out2, na.rm = T, id = "id")
> df <- cbind(df, colsplit(df$variable, "\\.", c("var", "time")))
> cast(df, id + time ~ var)
>
> See http://had.co.nz/reshape for more details.
>
> Hadley
>
> -- 
> http://had.co.nz/
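A runnable version of the reshape() + split() approach above, on the thread's data. With a plain character vector for varying and no v.names, reshape() guesses the stubs (rater, n) and the times (1 to 4) by splitting each name at the default sep = ".". The paste0()/seq_along() naming is a small variation on the original paste() call, not part of the posted solution:

```r
# Wide data from the thread; each data row has one more field than the
# header, so read.table() uses the first column as row names
data.out2 <- read.table(text = "id rater.1 n.1 rater.2 n.2 rater.3 n.3 rater.4 n.4
11 11 0.118 79 NA NA NA NA NA NA
114 114 0.2478709 113 NA NA NA NA NA NA
12 12 0.3130655 54 0.3668242 54 NA NA NA NA
121 121 0.240 331 NA NA NA NA NA NA
122 122 0.3004164 25 0.1046278 25 0.2424871 25 0.2796937 25
125 125 0.1634865 190 NA NA NA NA NA NA", header = TRUE)

# Stubs and times are guessed from names like "rater.1" and "n.1"
data.out3 <- reshape(data.out2, direction = "long",
                     varying = names(data.out2)[-1], idvar = "id")

# One data frame per measurement occasion: time1 ... time4
data.out4 <- split(data.out3, data.out3$time)
names(data.out4) <- paste0("time", seq_along(data.out4))
data.out4$time2
```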
Re: [R] reshape question
On Nov 24, 2009, at 8:33 PM, AC Del Re wrote:

> Hi All,
>
> I am wanting to convert a data.frame from a wide format to a long format (with 1 variable) and am having difficulties. Any help is appreciated!
>
> #current wide format
> head(data.out2)
>      id   rater.1 n.1   rater.2 n.2   rater.3 n.3   rater.4 n.4
> 11   11     0.118  79        NA  NA        NA  NA        NA  NA
> 114 114 0.2478709 113        NA  NA        NA  NA        NA  NA
> 12   12 0.3130655  54 0.3668242  54        NA  NA        NA  NA
> 121 121     0.240 331        NA  NA        NA  NA        NA  NA
> 122 122 0.3004164  25 0.1046278  25 0.2424871  25 0.2796937  25
> 125 125 0.1634865 190        NA  NA        NA  NA        NA  NA
>
> #This is close but I would like the 'n' column to remain and for the '.1' to drop off
> data.out3 <- reshape(data.out2, varying = list(names(data.out2)[-1]),
>                      idvar = 'id', direction = 'long')
> head(data.out3)
>        id time   rater.1
> 11.1   11    1     0.118
> 114.1 114    1 0.2478709
> 12.1   12    1 0.3130655
> 121.1 121    1     0.240
> 122.1 122    1 0.3004164
> 125.1 125    1 0.1634865
>
> Ideally I would like the columns to be set up in this manner:
>
> id  time  rater  n

What is time?

> Thanks,

I don't really understand what you want and the example solution throws away quite a lot of data, so consider this alternative:

data.out2 <- read.table(textConnection("id rater.1 n.1 rater.2 n.2 rater.3 n.3 rater.4 n.4
11 11 0.118 79 NA NA NA NA NA NA
114 114 0.2478709 113 NA NA NA NA NA NA
12 12 0.3130655 54 0.3668242 54 NA NA NA NA
121 121 0.240 331 NA NA NA NA NA NA
122 122 0.3004164 25 0.1046278 25 0.2424871 25 0.2796937 25
125 125 0.1634865 190 NA NA NA NA NA NA"), header=T, stringsAsFactors=F)

data.frame(id = data.out2$id,
           rater = stack(data.out2[, grep("rater", names(data.out2))]),
           n = stack(data.out2[, grep("n", names(data.out2))]))

   data.out2.id rater.values rater.ind n.values n.ind
1            11        0.118   rater.1       79   n.1
2           114    0.2478709   rater.1      113   n.1
3            12    0.3130655   rater.1       54   n.1
4           121        0.240   rater.1      331   n.1
5           122    0.3004164   rater.1       25   n.1
6           125    0.1634865   rater.1      190   n.1
7            11           NA   rater.2       NA   n.2
8           114           NA   rater.2       NA   n.2
9            12    0.3668242   rater.2       54   n.2
10          121           NA   rater.2       NA   n.2
11          122    0.1046278   rater.2       25   n.2
12          125           NA   rater.2       NA   n.2
13           11           NA   rater.3       NA   n.3
14          114           NA   rater.3       NA   n.3
15           12           NA   rater.3       NA   n.3
16          121           NA   rater.3       NA   n.3
17          122    0.2424871   rater.3       25   n.3
18          125           NA   rater.3       NA   n.3
19           11           NA   rater.4       NA   n.4
20          114           NA   rater.4       NA   n.4
21           12           NA   rater.4       NA   n.4
22          121           NA   rater.4       NA   n.4
23          122    0.2796937   rater.4       25   n.4
24          125           NA   rater.4       NA   n.4

You can take what you like from what I would consider a version that has no loss of the original information.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
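The stack() idea above can be carried through to the id / time / rater / n layout AC asked for. This is a minimal sketch on the same data. The anchored patterns "^rater\\." and "^n\\." tighten the original grep() calls so that only the intended columns match. The result name long and the final NA filter are illustrative additions, and the filter can be dropped if the empty occasions should be kept:

```r
data.out2 <- read.table(text = "id rater.1 n.1 rater.2 n.2 rater.3 n.3 rater.4 n.4
11 11 0.118 79 NA NA NA NA NA NA
114 114 0.2478709 113 NA NA NA NA NA NA
12 12 0.3130655 54 0.3668242 54 NA NA NA NA
121 121 0.240 331 NA NA NA NA NA NA
122 122 0.3004164 25 0.1046278 25 0.2424871 25 0.2796937 25
125 125 0.1634865 190 NA NA NA NA NA NA", header = TRUE)

# stack() returns 'values' plus 'ind' (the source column name), column by column
rater <- stack(data.out2[, grep("^rater\\.", names(data.out2))])
n     <- stack(data.out2[, grep("^n\\.", names(data.out2))])

long <- data.frame(id    = data.out2$id,  # recycled once per stacked column
                   time  = as.integer(sub("^rater\\.", "", rater$ind)),
                   rater = rater$values,
                   n     = n$values)

# Optional: keep only occasions where a rating was recorded
long <- long[!is.na(long$rater), ]
long
```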
Re: [R] reshape question
What about the melt function in the reshape package? Example:

library(reshape)
x <- sample(1:100, 20, replace = TRUE)
x
 [1] 48 94 32 96 81 99 10 64 64 94 57 60 16 64 32 76 63  1 64  8
y <- sample(1:100, 20, replace = TRUE)
y
 [1] 73 78 82 43 58 85 74 64 73 41 45 38 63 36 44 74  7 88 91  1
xy <- cbind(x, y)
melt(xy)
   X1 X2 value
1   1  x    48
2   2  x    94
3   3  x    32
4   4  x    96
5   5  x    81
6   6  x    99
7   7  x    10
8   8  x    64
9   9  x    64
10 10  x    94
11 11  x    57
12 12  x    60
13 13  x    16
14 14  x    64
15 15  x    32
16 16  x    76
17 17  x    63
18 18  x     1
19 19  x    64
20 20  x     8
21  1  y    73
22  2  y    78
23  3  y    82
24  4  y    43
25  5  y    58
26  6  y    85
27  7  y    74
28  8  y    64
29  9  y    73
30 10  y    41
31 11  y    45
32 12  y    38
33 13  y    63
34 14  y    36
35 15  y    44
36 16  y    74
37 17  y     7
38 18  y    88
39 19  y    91
40 20  y     1

Joe King
206-913-2912
j...@joepking.com
"Never throughout history has a man who lived a life of ease left a name worth remembering." --Theodore Roosevelt

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Tuesday, November 24, 2009 6:43 PM
To: AC Del Re
Cc: r-help@r-project.org
Subject: Re: [R] reshape question
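For readers without the reshape package, the long layout that melt() produces for a matrix can be sketched in base R with row() and col(). This is an illustrative equivalent, not how melt() is implemented. The column names X1, X2 and value just mirror the output above, X2 comes out as character rather than factor, and set.seed() is added only to make the toy data reproducible:

```r
set.seed(1)  # reproducible toy data
x <- sample(1:100, 20, replace = TRUE)
y <- sample(1:100, 20, replace = TRUE)
xy <- cbind(x, y)

# One row per matrix cell, in column-major order like melt():
# row index, column name, cell value
long <- data.frame(X1    = as.vector(row(xy)),
                   X2    = colnames(xy)[as.vector(col(xy))],
                   value = as.vector(xy))
head(long)
```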
Re: [R] Reshape question.
Thank you for your reply. I will try this. The initial few rows in the .dat file look like:

Year,DayOfYear,Sku,Quantity,CatId,Category,SubCategory
2009,1,100051,1,10113,MEN,Historical men's
2009,1,100130,1,10638,ACCESSORIES MAKEUP,ALL Kids Accessories
2009,1,100916,1,10222,WOMEN,TV Movies Women
2009,1,101241,1,10897,HOLIDAY,Colonial (Presidents)
2009,1,101252,1,10640,ACCESSORIES MAKEUP,Finishing Touches
2009,1,101298,1,10865,HOLIDAY,Easter
2009,1,101613,1,10410,GIRLS,Classic Girls
2009,1,101645,1,10320,BOYS,Superheroes Boys
2009,1,101648,1,10320,BOYS,Superheroes Boys
2009,1,101718,1,10897,HOLIDAY,Colonial (Presidents)
2009,1,101719,1,10897,HOLIDAY,Colonial (Presidents)
2009,1,101751,1,10420,GIRLS,Superheroes Girls
2009,1,102125,1,10638,ACCESSORIES MAKEUP,ALL Kids Accessories
2009,1,102174,1,10897,HOLIDAY,Colonial (Presidents)
2009,1,102558,1,10636,ACCESSORIES MAKEUP,Armor/Weapons/Guns
2009,1,102582,1,10636,ACCESSORIES MAKEUP,Armor/Weapons/Guns
2009,1,102717,1,10862,HOLIDAY,Christmas
2009,1,104705,1,10518,PLUS,Plus Women
2009,1,104745,6,10748,HATS, WIGS MASKS,Wigs - Men's
2009,1,104745,1,10748,HATS, WIGS MASKS,Wigs - Men's
2009,1,104751,1,10310,BOYS,Classic Boys
2009,1,105238,1,10742,HATS, WIGS MASKS,Hats-Miscellaneous
2009,1,105352,10,10742,HATS, WIGS MASKS,Hats-Miscellaneous
2009,1,107420,10,10744,HATS, WIGS MASKS,Masks - Miscellaneous
2009,1,107420,1,10744,HATS, WIGS MASKS,Masks - Miscellaneous
2009,1,107479,1,10743,HATS, WIGS MASKS,Masks - Famous
2009,1,107479,1,10743,HATS, WIGS MASKS,Masks - Famous

If your proposed solution works, I am confused as to why in the original I was able to specify:

c2009 <- cast(m2009, DayOfYear ~ variable | Category, sum)
t2009 <- cast(m2009, DayOfYear ~ variable, sum)

By combining this into one 'cast', is it better (as in faster)?

Thanks again.

Kevin

Tal Galili <tal.gal...@gmail.com> wrote:

> how about:
>
> c2009 <- cast(m2009, Category + SubCategory + DayOfYear ~ variable, sum)
>
> ?
>
> p.s: toy data would be nice to have :)
>
> On Wed, Mar 11, 2009 at 9:47 PM, <rkevinbur...@charter.net> wrote:
>
>> This hopefully is trivial. I am trying to reshape the data using the reshape package.
>>
>> First I read in the data:
>>
>> a2009 <- read.csv("Total2009.dat", header = TRUE)
>>
>> Then I trim it so that it only contains the columns that I am interested in:
>>
>> m2009 <- melt(a2009, id.var = c("DayOfYear", "Category", "SubCategory", "Sku"),
>>               measure.var = c("Quantity"), na.rm = TRUE)
>>
>> Then I start to formulate the data that I will process:
>>
>> c2009 <- cast(m2009, DayOfYear ~ variable | Category, sum)
>>
>> Finally I aggregate the data:
>>
>> t2009 <- cast(m2009, DayOfYear ~ variable, sum)
>>
>> My question is on the third step above (repeated here):
>>
>> c2009 <- cast(m2009, DayOfYear ~ variable | Category, sum)
>>
>> This gets the data associated with a unique 'Category' name. I want to get the data grouped by 'Category' and 'SubCategory'. The 'SubCategory' is not unique, but the combination of 'Category' and 'SubCategory' forms a unique pair. What would be the formula that would give me the data grouped by Category AND SubCategory? Would it be as simple as:
>>
>> c2009 <- cast(m2009, DayOfYear ~ variable | Category SubCategory, sum)
>>
>> ?
>>
>> Thank you for your suggestions.
>>
>> Kevin
>
> -- 
> My contact information:
> Tal Galili
> Phone number: 972-50-3373767
> FaceBook: Tal Galili
> My Blogs: www.talgalili.com www.biostatistics.co.il
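As a base-R cross-check of Tal's suggestion, aggregate() with a formula can sum Quantity for every Category x SubCategory x DayOfYear combination in one call. This swaps in a different technique and is not the reshape package's cast(). The sketch uses five of the rows Kevin posted, chosen so that the unquoted comma in the "HATS, WIGS MASKS" category does not shift the CSV fields:

```r
# Five rows from the file excerpt above (Category values without embedded commas)
a2009 <- read.csv(text = "Year,DayOfYear,Sku,Quantity,CatId,Category,SubCategory
2009,1,101613,1,10410,GIRLS,Classic Girls
2009,1,101645,1,10320,BOYS,Superheroes Boys
2009,1,101648,1,10320,BOYS,Superheroes Boys
2009,1,101751,1,10420,GIRLS,Superheroes Girls
2009,1,104751,1,10310,BOYS,Classic Boys")

# One output row per observed Category x SubCategory x DayOfYear cell
totals <- aggregate(Quantity ~ Category + SubCategory + DayOfYear,
                    data = a2009, FUN = sum)
totals
```

Grouping on all three variables in one call also gives the per-day totals for each Category/SubCategory pair. The conditioning form with | instead returns a separate piece for each Category.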
Re: [R] reshape question
solution: reshape package, melt function.

On Tue, 2008-11-18 at 02:07 +, Alexandre Swarowsky wrote:

> Hi,
>
> It's probably a simple issue but I'm struggling with it. I'll use the example shown in the help page.
>
> head(Indometh)
> wide <- reshape(Indometh, v.names = "conc", idvar = "Subject",
>                 timevar = "time", direction = "wide")
> head(wide)
> reshape(wide, idvar = "Subject", varying = list(2:12),
>         v.names = "conc", direction = "long")
>
> But in my case I'd like the time column to be filled with the column names (with the exception of Subject, of course) instead of 1, 2, 3, and so on. I want something like this:
>
>     Subject      time conc
> 1.1       1 conc.0.25 1.50
> 2.1       2 conc.0.25 2.03
> 3.1       3 conc.0.25 2.72
>
> Thanks in advance,
>
> Alex
>
> -- 
> Alexandre Swarowsky
> Soils and Biogeochemistry Graduate Group
> University of California at Davis
> One Shields Avenue
> Davis CA 95618
> Office: (530)752-4131 cell: (530)574-3028
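An alternative to melt() that stays within base reshape(): pass the wide column names as the times argument, and those labels go straight into the time column. This sketch uses the Indometh help-page example quoted above. The object name long is illustrative:

```r
# 'wide' exactly as in the ?reshape example
wide <- reshape(Indometh, v.names = "conc", idvar = "Subject",
                timevar = "time", direction = "wide")

# Use the varying columns' own names ("conc.0.25", ..., "conc.8") as time labels
long <- reshape(wide, idvar = "Subject", varying = list(2:12),
                v.names = "conc", times = names(wide)[2:12],
                direction = "long")
head(long, 3)
```

If numeric times are wanted later, as.numeric(sub("^conc\\.", "", long$time)) recovers them from these labels.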
Re: [R] reshape question
On 2/8/2008 9:15 AM, Ista Zahn wrote:

> I know there are a lot of reshape questions on the mailing list, but I haven't been able to find an answer to this particular issue. I am trying to get a data frame structured like this:
>
> sub <- rep(1:5)
> ta1 <- rep(1,5)
> ta2 <- rep(2,5)
> tb1 <- rep(3,5)
> tb2 <- rep(4,5)
> DF <- data.frame(sub,ta1,ta2,tb1,tb2)
> DF
>   sub ta1 ta2 tb1 tb2
> 1   1   1   2   3   4
> 2   2   1   2   3   4
> 3   3   1   2   3   4
> 4   4   1   2   3   4
> 5   5   1   2   3   4
>
> into a form like this:
>
>     sub time x1 x2
> 1.1   1    1  1  3
> 1.2   1    2  2  4
> 2.1   2    1  1  3
> 2.2   2    2  2  4
> 3.1   3    1  1  3
> 3.2   3    2  2  4
> 4.1   4    1  1  3
> 4.2   4    2  2  4
> 5.1   5    1  1  3
> 5.2   5    2  2  4
>
> using the reshape command. But when I try reshaping I don't get the desired structure:
>
> DF.L <- reshape(DF, varying = 2:5, idvar = "sub", v.names = c("x1", "x2"),
>                 times = c(1,2), direction = "long")
> library(doBy)
> orderBy(~sub, data = DF.L)
>     sub time x1 x2
> 1.1   1    1  1  2
> 1.2   1    2  3  4
> 2.1   2    1  1  2
> 2.2   2    2  3  4
> 3.1   3    1  1  2
> 3.2   3    2  3  4
> 4.1   4    1  1  2
> 4.2   4    2  3  4
> 5.1   5    1  1  2
> 5.2   5    2  3  4
>
> I can get the desired result by rearranging the original data frame, like
>
> DF2 <- data.frame(sub,ta1,tb1,ta2,tb2)
>
> before running the reshape command, but I'm hoping someone knows a way to do the desired reshaping without this step, as it becomes very time-consuming with large numbers of repeated measurements.
>
> Thanks,
>
> Ista

The varying argument to reshape() can be a list. For example:

DF.long <- reshape(DF, varying = list(c("ta1","ta2"), c("tb1","tb2")),
                   idvar = "sub", v.names = c("x1","x2"), times = c(1,2),
                   direction = "long")
DF.long[order(DF.long$sub),]
    sub time x1 x2
1.1   1    1  1  3
1.2   1    2  2  4
2.1   2    1  1  3
2.2   2    2  2  4
3.1   3    1  1  3
3.2   3    2  2  4
4.1   4    1  1  3
4.2   4    2  2  4
5.1   5    1  1  3
5.2   5    2  2  4

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
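Why the first attempt mixed up the columns: when varying is a plain vector and v.names is supplied, reshape() expects the names ordered like x1 at time 1, x2 at time 1, x1 at time 2, and so on. That is why reordering to ta1, tb1, ta2, tb2 worked. For many repeated measurements, the list form Chuck shows can be built from name patterns instead of being typed out. The sketch below assumes each measure's columns share a stub (ta, tb) followed by a number, and that both measures have the same number of occasions in the same column order. The object names a_cols and b_cols are illustrative:

```r
DF <- data.frame(sub = 1:5, ta1 = 1, ta2 = 2, tb1 = 3, tb2 = 4)

# Collect each measure's columns by name pattern, in column order
a_cols <- grep("^ta[0-9]+$", names(DF), value = TRUE)
b_cols <- grep("^tb[0-9]+$", names(DF), value = TRUE)

DF.long <- reshape(DF, direction = "long", idvar = "sub",
                   varying = list(a_cols, b_cols), v.names = c("x1", "x2"),
                   times = seq_along(a_cols))
DF.long <- DF.long[order(DF.long$sub, DF.long$time), ]
DF.long
```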