Re: [R] reshaping a dataset

2006-09-13 Thread Denis Chabot
Thank you Gabor,

I'll need to explore the reshape package a bit to see what benefits I
get compared with the basic reshape function, but I'm glad you made
me aware of it.

And your solution for fixing NAs only in the columns I choose is just
what I wanted.

Many thanks,

Denis
On 06-09-13 at 00:55, Gabor Grothendieck wrote:

> I missed your second question, which was how to set the NAs to zero
> for some of the columns.  Suppose we want to replace the NAs
> in the columns indexed by ic and, for the sake of example, suppose
> ic specifies columns 1 to 8:
>
> library(reshape)
> testm <- melt(test, id = 1:6)
> out <- cast(testm, nbpc + trip + set + tagno + depth ~ prey, sum)
>
> # fix up NAs
> ic <- 1:8
> out2 <- out[,ic]
> out2[is.na(out2)] <- 0
> out[,ic] <- out2




[R] reshaping a dataset

2006-09-12 Thread Denis Chabot
Hi,

I'm trying to move the last few data-handling routines I was
performing in SAS over to R.

I'm working on stomach content data. In the simplified example I
provide below, there are variables describing the origin of each prey
item: nbpc is a ship number, each ship may have been used on
different trips, each trip has stations, and individual fish (tagno)
can be caught at each station.

For each stomach the number of lines corresponds to the number of  
prey items. Thus a variable identifies prey type, and others (here  
only one, mass) provide information on prey abundance or size or  
digestion level.

Finally, there can be accompanying variables that are not used but  
that I need to keep for later analyses (e.g. depth in the example  
below).

At some point I need to transform such a dataset into another format  
where each stomach occupies a single line, and there are columns for  
each prey item.

The reshape function works really well; my program is in fact
simpler than the SAS equivalent (not shown, don't want to bore you,
but available on request), except that I need zeros instead of NAs
when prey types are absent from a stomach, a problem for which I only
have a shaky solution at the moment:

1) creation of a dummy dataset:
###
nbpc <- rep(c(20,34), c(110,90))
trip <- c(rep(1:3, c(40, 40, 30)), rep(1:2, c(60,30)))
set <- c(rep(1:4, c(10, 8, 7, 15)), rep(c(10,12), c(25,15)),
  rep(1:3, rep(10,3)),
  rep(10:12, c(20, 10, 30)), rep(7:8, rep(15,2)))
depth <- c(rep(c(100, 150, 200, 250), c(10, 8, 7, 15)),
  rep(c(100,120), c(25,15)), rep(c(75, 50, 200), rep(10,3)),
  rep(c(200, 150, 50), c(20, 10, 30)), rep(c(100, 250), rep(15,2)))
tagno <- rep(round(runif(42,1,200)),
  c(7,3, 4,4, 2,2,3, 5,5,5,  4,6,4,3,5,3, 7,8, 4,6, 5,5, 7,3,
    6,6,4,4, 4,6, 3,3,4,5,5,6,4, 5,5,5, 8,7))
prey.codes <- c(187, 438, 792, 811)
prey <- sample(prey.codes, 200, replace=TRUE)
mass <- runif(200, 0, 10)

test <- data.frame(nbpc, trip, set, depth, tagno, prey, mass)


Because there are often multiple occurrences of the same prey in a
single stomach, I need to sum them for each stomach before using
reshape. Here I use summaryBy (from the doBy package) because my
understanding of the many variants of apply is not very good:


library(doBy)
test2 <- summaryBy(mass~nbpc+trip+set+tagno+prey, data=test, FUN=sum,
  keep.names=TRUE, id=~depth)

# this messes up the sort order, so I fix it
k <- order(test2$nbpc, test2$trip, test2$set, test2$tagno)
test3 <- test2[k,]
result <- reshape(test3, v.names="mass",
  idvar=c("nbpc", "trip", "set", "tagno"),
  timevar="prey", direction="wide")
#
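
The two steps above (sum per stomach and prey, then widen) can also be sketched in base R alone, without doBy. This is only an illustration on a small made-up data frame: the name `toy` and its values are hypothetical, and a single `tagno` column stands in for the full nbpc/trip/set/tagno key.

```r
# Hypothetical toy data: one row per prey item found in a stomach.
toy <- data.frame(
  tagno = c(1, 1, 1, 2, 2, 3),
  depth = c(100, 100, 100, 150, 150, 50),
  prey  = c(187, 187, 438, 438, 811, 792),
  mass  = c(1.5, 0.5, 2.0, 3.0, 1.0, 4.0)
)

# Sum mass within each stomach x prey combination. depth is constant
# within a stomach, so listing it as a grouping variable simply carries
# it along. Note: the formula interface drops rows with NA in any
# listed variable, so this assumes depth has no missing values.
summed <- aggregate(mass ~ tagno + depth + prey, data = toy, FUN = sum)
summed <- summed[order(summed$tagno, summed$prey), ]

# One row per stomach, one mass.<prey> column per prey code.
# Prey types absent from a stomach come out as NA at this stage.
wide <- reshape(summed, v.names = "mass", idvar = "tagno",
                timevar = "prey", direction = "wide")
print(wide)
```

As with the summaryBy version, the absent prey types are still NA after this step, so the zero-filling question remains.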

I'm quite happy with this, although you may know of better ways of  
doing it.
But my problem is with prey types that are absent from a stomach. In
later analyses, I need them to have zero abundance instead of NA.
My shaky solution is:
#
empties <- is.na(result)
result[empties] <- 0
#

which did the job in this example, but it won't always. For instance  
there could have been NAs for depth, which I do not want to become  
zero.

Is there a way to transform NAs into zeros for multiple columns of a  
dataframe in one step, while ignoring some columns?

Or maybe there is another way to achieve this that would have put
zeros where I need them (i.e. something other than reshape)?

Thanking you in advance,

Denis Chabot

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reshaping a dataset

2006-09-12 Thread Gabor Grothendieck
If I understand this correctly, we want to sum the mass over each combination
of the first 6 variables and display the result with the 6th, prey,
along the top and the others along the side.

library(reshape)
testm <- melt(test, id = 1:6)
cast(testm, nbpc + trip + set + tagno + depth ~ prey)

Now fix up the NAs.
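
One possible way to avoid the NA fix-up altogether is to let base R's xtabs() do the summing, since it fills empty cells with 0 rather than NA. The sketch below is an illustration only: the data frame `toy` and its values are made up, and a single `tagno` column stands in for the full stomach key.

```r
# Hypothetical toy data; depth is deliberately NA for one fish.
toy <- data.frame(
  tagno = c(1, 1, 1, 2, 2, 3),
  depth = c(100, 100, 100, NA, NA, 50),
  prey  = c(187, 187, 438, 438, 811, 792),
  mass  = c(1.5, 0.5, 2.0, 3.0, 1.0, 4.0)
)

# xtabs() sums mass within each tagno x prey cell; cells with no
# observations are 0, not NA.
tab <- xtabs(mass ~ tagno + prey, data = toy)

# Convert the table to a data frame with one column per prey code,
# named like the output of reshape(direction = "wide").
wide <- as.data.frame.matrix(tab)
colnames(wide) <- paste0("mass.", colnames(wide))
wide$tagno <- as.numeric(rownames(wide))

# Re-attach the accompanying variable; its NA is left untouched.
ids <- unique(toy[c("tagno", "depth")])
result <- merge(ids, wide, by = "tagno")
print(result)
```

Because depth never enters the xtabs() formula, a missing depth stays NA while the prey columns get true zeros. With a composite key, one would first need to build a single identifier (for example with interaction() or paste()).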



Re: [R] reshaping a dataset

2006-09-12 Thread Gabor Grothendieck
I missed your second question, which was how to set the NAs to zero
for some of the columns.  Suppose we want to replace the NAs
in the columns indexed by ic and, for the sake of example, suppose
ic specifies columns 1 to 8:

library(reshape)
testm <- melt(test, id = 1:6)
out <- cast(testm, nbpc + trip + set + tagno + depth ~ prey, sum)

# fix up NAs
ic <- 1:8
out2 <- out[,ic]
out2[is.na(out2)] <- 0
out[,ic] <- out2
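
The same idea can be written so that columns are chosen by name rather than by position, which may be safer when the column order changes. This is a minimal base R sketch on a made-up data frame; the `mass.` column prefix and the name `prey_cols` are assumptions for illustration.

```r
# Hypothetical wide result: depth has a genuine NA that must be kept,
# while NAs in the mass.* columns mean "prey absent".
out <- data.frame(
  tagno    = c(1, 2, 3),
  depth    = c(100, NA, 50),
  mass.187 = c(2.0, NA, NA),
  mass.438 = c(2.0, 3.0, NA)
)

# Pick the columns to fix by name pattern.
prey_cols <- grep("^mass\\.", names(out), value = TRUE)

# Replace NAs in those columns only, in one step.
out[prey_cols][is.na(out[prey_cols])] <- 0
print(out)
```

Only the selected columns are touched, so the NA in depth survives.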
