Re: [R] reshaping a dataset
Thank you Gabor,

I'll need to explore the reshape package a bit to see what benefits I get compared with the basic reshape function, but I'm glad you made me aware of it. And your solution for fixing NAs only in the columns I choose is just what I wanted.

Many thanks,

Denis

On 06-09-13 at 00:55, Gabor Grothendieck wrote:
> I missed your second question which was how to set the NAs to zero for some of the columns. [...]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] reshaping a dataset
Hi,

I'm trying to move to R the last few data handling routines I was performing in SAS.

I'm working on stomach content data. In the simplified example I provide below, there are variables describing the origin of each prey item (nbpc is a ship number, each ship may have been used on different trips, each trip has stations, and individual fish (tagno) can be caught at each station). For each stomach the number of lines corresponds to the number of prey items. Thus a variable identifies prey type, and others (here only one, mass) provide information on prey abundance or size or digestion level. Finally, there can be accompanying variables that are not used but that I need to keep for later analyses (e.g. depth in the example below).

At some point I need to transform such a dataset into another format where each stomach occupies a single line, and there are columns for each prey item.

The reshape function works really well; my program is in fact simpler than the SAS equivalent (not shown, don't want to bore you, but available on request), except that I need zeros when prey types are absent from a stomach instead of NAs, a problem for which I only have a shaky solution at the moment:

1) creation of a dummy dataset:

###
nbpc <- rep(c(20,34), c(110,90))
trip <- c(rep(1:3, c(40, 40, 30)), rep(1:2, c(60,30)))
set <- c(rep(1:4, c(10, 8, 7, 15)), rep(c(10,12), c(25,15)),
         rep(1:3, rep(10,3)), rep(10:12, c(20, 10, 30)),
         rep(7:8, rep(15,2)))
depth <- c(rep(c(100, 150, 200, 250), c(10, 8, 7, 15)),
           rep(c(100,120), c(25,15)), rep(c(75, 50, 200), rep(10,3)),
           rep(c(200, 150, 50), c(20, 10, 30)),
           rep(c(100, 250), rep(15,2)))
tagno <- rep(round(runif(42,1,200)),
             c(7,3, 4,4, 2,2,3, 5,5,5, 4,6,4,3,5,3, 7,8, 4,6, 5,5, 7,3,
               6,6,4,4, 4,6, 3,3,4,5,5,6,4, 5,5,5, 8,7))
prey.codes <- c(187, 438, 792, 811)
prey <- sample(prey.codes, 200, replace=TRUE)
mass <- runif(200, 0, 10)
test <- data.frame(nbpc, trip, set, depth, tagno, prey, mass)

Because there are often multiple occurrences of the same prey in a single stomach, I need to sum them for each stomach before using reshape. Here I use summaryBy (from the doBy package) because my understanding of the many variants of apply is not very good:

library(doBy)
test2 <- summaryBy(mass~nbpc+trip+set+tagno+prey, data=test, FUN=sum,
                   keep.names=TRUE, id=~depth)
# this messes up sorting order, I fix it
k <- order(test2$nbpc, test2$trip, test2$set, test2$tagno)
test3 <- test2[k,]
result <- reshape(test3, v.names="mass",
                  idvar=c("nbpc", "trip", "set", "tagno"),
                  timevar="prey", direction="wide")
#

I'm quite happy with this, although you may know of better ways of doing it. But my problem is with prey types that are absent from a stomach. In later analyses, I need them to have zero abundance instead of NA. My shaky solution is:

#
empties <- is.na(result)
result[empties] <- 0
#

which did the job in this example, but it won't always. For instance there could have been NAs for depth, which I do not want to become zero.

Is there a way to transform NAs into zeros for multiple columns of a dataframe in one step, while ignoring some columns?

Or maybe there is another way to achieve this that would have put zeros where I need them (i.e. something other than reshape)?

Thanking you in advance,

Denis Chabot
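For readers without doBy, the sum-then-widen step described in this post might be sketched with base R's aggregate() in place of summaryBy(). This is only a sketch on an invented five-row table (the `toy` values are made up for illustration, and a single `tagno` stands in for the full set of id variables); like the original, it leaves NA where a prey type is absent from a stomach.

```r
# Invented toy data: two stomachs, with prey 187 occurring twice in the first
toy <- data.frame(
  tagno = c(1, 1, 1, 2, 2),
  depth = c(100, 100, 100, 150, 150),
  prey  = c(187, 187, 438, 438, 811),
  mass  = c(1, 2, 3, 4, 5)
)

# Sum mass per stomach and prey type; depth is carried along as a grouping
# variable, which works here because it is constant within a stomach
summed <- aggregate(mass ~ tagno + depth + prey, data = toy, FUN = sum)

# One row per stomach, one mass.<prey> column per prey type
wide <- reshape(summed, v.names = "mass", idvar = "tagno",
                timevar = "prey", direction = "wide")
print(wide)
```

Carrying depth in the formula only works while it has no missing values, since the formula interface of aggregate() drops rows with NA by default.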
Re: [R] reshaping a dataset
If I understand this correctly, we want to sum the mass over each combination of the first 6 variables and display the result with the 6th, prey, along the top and the others along the side.

library(reshape)
testm <- melt(test, id = 1:6)
cast(testm, nbpc + trip + set + tagno + depth ~ prey)

Now fix up the NAs.

On 9/12/06, Denis Chabot wrote:
> Hi, I'm trying to move to R the last few data handling routines I was performing in SAS. [...]
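On the question of whether something other than reshape would put zeros where they are needed: base R's xtabs() sums a variable over the cells of a cross-classification and reports 0 for empty cells, so it may be worth a look. A minimal sketch, assuming each stomach can be reduced to a single key (the `stomach` column below is a hypothetical helper built with interaction(), and the `toy` values are invented):

```r
# Invented toy data: two stomachs identified by ship (nbpc) and fish (tagno)
toy <- data.frame(
  nbpc  = c(20, 20, 20, 34, 34),
  tagno = c(11, 11, 11, 57, 57),
  prey  = c(187, 187, 438, 438, 811),
  mass  = c(1, 2, 3, 4, 5)
)

# One key per stomach; drop = TRUE keeps only the combinations that occur
toy$stomach <- interaction(toy$nbpc, toy$tagno, drop = TRUE)

# Sum mass by stomach and prey type; absent combinations come out as 0
wide <- xtabs(mass ~ stomach + prey, data = toy)

# Convert the 2-d table to an ordinary data frame (rows = stomachs)
wide_df <- as.data.frame.matrix(wide)
print(wide_df)
```

Accompanying variables such as depth are not carried through by this approach and would have to be merged back on the stomach key afterwards.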
Re: [R] reshaping a dataset
I missed your second question, which was how to set the NAs to zero for some of the columns. Suppose we want to replace the NAs in columns ic, and for the sake of example suppose ic specifies columns 1 to 8:

library(reshape)
testm <- melt(test, id = 1:6)
out <- cast(testm, nbpc + trip + set + tagno + depth ~ prey, sum)

# fix up NAs
ic <- 1:8
out2 <- out[,ic]
out2[is.na(out2)] <- 0
out[,ic] <- out2

On 9/13/06, Gabor Grothendieck wrote:
> If I understand this correctly we want to sum the mass over each combination of the first 6 variables [...]
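The column-restricted replacement can also be written without a temporary copy, with the columns picked by name rather than by position. A sketch on an invented table; the `mass.` prefix assumes the naming that base reshape() gives to wide columns, and `prey_cols` is a name introduced here for illustration:

```r
# Invented wide table: depth has a genuine NA that must be preserved
result <- data.frame(
  tagno    = c(11, 12, 13),
  depth    = c(100, NA, 200),
  mass.187 = c(1.5, NA, 0.2),
  mass.438 = c(NA, 2.0, NA)
)

# Positions of the prey columns only, selected by their name prefix
prey_cols <- grep("^mass\\.", names(result))

# is.na() on a data frame gives a logical matrix, which can index the same
# subset of columns for assignment; other columns are never touched
result[prey_cols][is.na(result[prey_cols])] <- 0
print(result)
```

Selecting by name keeps the replacement correct if the column order changes, which positional indices such as 1:8 would not.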