[R] help with handling replicates before reshaping data

Tom Cohen Fri, 13 Jul 2007 11:06:35 -0700

Dear list,
  I have a dataset that contains duplicated sequences within each day for each patient 
(see the data below), and I want to reshape it with patient as the time variable. 
However, reshape() only keeps the first sequence of each set of replicates 
and ignores the second. How can I 1) average the duplicates, and 2) give the 
duplicated sequences unique names before reshaping the data? 
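For question 2), one possible approach is to number each occurrence of a sequence within patient and day with base R's `ave()`, then paste that counter onto the sequence name so that `reshape()` sees a distinct id for every replicate. This is a sketch, not the only way to do it. The small data frame holds the day-1 rows for patients 10 and 12 from the data in this post. The column names `rep` and `seq.rep` are hypothetical additions.

```r
# Day-1 rows for patients 10 and 12, copied from the data in the post
data <- data.frame(
  patient = rep(c(10, 12), each = 5),
  day     = 1,
  seq     = rep(c("acdf", "cdsv", "dlfg", "acdf", "cdsv"), times = 2),
  y       = c(-0.52416066, 0.62551539, -1.54668047, 0.82404978, -1.17459914,
              0.43575105, 1.01675547, -1.54601413, 1.03384654, 0.32197671),
  stringsAsFactors = FALSE
)

# Hypothetical column: count occurrences of each sequence within patient/day (1, 2, ...)
data$rep <- ave(seq_along(data$seq), data$patient, data$day, data$seq,
                FUN = seq_along)

# Hypothetical column: a unique name per replicate, e.g. "acdf.1" and "acdf.2"
data$seq.rep <- paste(data$seq, data$rep, sep = ".")

# Reshape on the new name; seq and rep are dropped so only y is spread wide
wide <- reshape(data[, c("patient", "day", "seq.rep", "y")],
                idvar = c("day", "seq.rep"), timevar = "patient",
                direction = "wide")
wide
```

With this approach, each replicate gets its own row (`acdf.1`, `acdf.2`, ...) with columns `y.10` and `y.12`, so no value is discarded.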
   
  > data
     patient day  seq           y
  1       10   1 acdf -0.52416066
  2       10   1 cdsv  0.62551539
  3       10   1 dlfg -1.54668047
  4       10   1 acdf  0.82404978
  5       10   1 cdsv -1.17459914
  6       10   2 acdf  0.47238216
  7       10   2 cdsv -0.92364896
  8       10   2 dlfg  1.19273992
  9       10   2 acdf  0.03759663
  10      10   2 cdsv  1.05106783
  11      12   1 acdf  0.43575105
  12      12   1 cdsv  1.01675547
  13      12   1 dlfg -1.54601413
  14      12   1 acdf  1.03384654
  15      12   1 cdsv  0.32197671
  16      12   2 acdf  0.37355285
  17      12   2 cdsv -0.39780850
  18      12   2 dlfg -0.37693499
  19      12   2 acdf -1.28989165
  20      12   2 cdsv -0.06938098
  21      23   1 acdf -0.68486972
  22      23   1 cdsv -1.08035660
  23      23   1 dlfg  0.93124685
  24      23   1 acdf -0.78737514
  25      23   1 cdsv -1.56315904
  26      23   2 acdf -2.30913270
  27      23   2 cdsv -1.64583577
  28      23   2 dlfg  1.87435485
  29      23   2 acdf -1.99671825
  30      23   2 cdsv  0.62995993
   
   
  > redata <- reshape(data, idvar = c("day", "seq"), timevar = "patient", direction = "wide")
   
   
   
  The reshaped data has only three sequences per day and ignores the value of 
the second replicate. 
   
  > 
  > redata
    day  seq       y.10       y.12       y.23
  1   1 acdf -0.5241607  0.4357510 -0.6848697
  2   1 cdsv  0.6255154  1.0167555 -1.0803566
  3   1 dlfg -1.5466805 -1.5460141  0.9312469
  6   2 acdf  0.4723822  0.3735529 -2.3091327
  7   2 cdsv -0.9236490 -0.3978085 -1.6458358
  8   2 dlfg  1.1927399 -0.3769350  1.8743548
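When `reshape()` is used with `direction = "wide"`, it fills each cell from the first row matching that combination of `idvar` and `timevar`. That is why the second replicate disappears above. For question 1), a minimal sketch is to collapse the replicates with `aggregate()` first and then reshape the averaged data. This assumes the arithmetic mean is an appropriate summary for the replicates. The sketch uses the day-1 rows for patients 10 and 12 from the data above.

```r
# Day-1 rows for patients 10 and 12, copied from the data in the post
data <- data.frame(
  patient = rep(c(10, 12), each = 5),
  day     = 1,
  seq     = rep(c("acdf", "cdsv", "dlfg", "acdf", "cdsv"), times = 2),
  y       = c(-0.52416066, 0.62551539, -1.54668047, 0.82404978, -1.17459914,
              0.43575105, 1.01675547, -1.54601413, 1.03384654, 0.32197671),
  stringsAsFactors = FALSE
)

# Average y over replicates: one row per patient/day/sequence
avg <- aggregate(y ~ patient + day + seq, data = data, FUN = mean)

# Now each day/seq/patient combination is unique, so nothing is dropped
wide <- reshape(avg, idvar = c("day", "seq"), timevar = "patient",
                direction = "wide")
wide
```

If the mean is not the right summary, another function such as `median` can be passed as `FUN` instead.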
   
  A second problem: I want to check the dataset for duplicates and, if there are 
any, print out the duplicated sequences. I tried the code below, but the output 
is not very clean. How can I make it look nicer, or is there a better way to do 
this?
   
  pat <- subset(data, patient == 10 & day == 1)
  if (!any(duplicated(pat$seq))) {
    cat("No duplicates", "\n", sep = "")
  } else {
    cat("duplicates", "\n", sep = "") & print(pat$seq[duplicated(pat$seq)])
  }
   
  I got this output:
   
  duplicates
  [1] acdf cdsv
  Levels: acdf cdsv dlfg
  [1] NA NA
  Warning message:
  & not meaningful for factors in: Ops.factor(cat("duplicates", "\n", sep = ""), print(pat$seq[duplicated(pat$seq)]))
   
  But I would like the output to look something like:
   
  duplicates
  [1] acdf cdsv
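The warning and the extra `[1] NA NA` come from the `&` between `cat()` and `print()`. `&` tries to combine the two return values (`NULL` and a factor) instead of running the calls one after another. A sketch that produces the requested output makes three changes: it keeps the calls as separate statements, converts the factor to character (which drops the `Levels:` line), and prints without quotes. `report_duplicates()` is a hypothetical helper name, not an existing function.

```r
# Rows for patient 10, day 1, copied from the data in the post
pat <- data.frame(
  patient = 10, day = 1,
  seq = factor(c("acdf", "cdsv", "dlfg", "acdf", "cdsv")),
  y = c(-0.52416066, 0.62551539, -1.54668047, 0.82404978, -1.17459914)
)

# Hypothetical helper: print the sequences that occur more than once
report_duplicates <- function(seqs) {
  # as.character() drops factor levels, so no "Levels:" line is printed
  dups <- unique(as.character(seqs[duplicated(seqs)]))
  if (length(dups) == 0) {
    cat("No duplicates\n")
  } else {
    cat("duplicates\n")
    print(dups, quote = FALSE)  # prints [1] acdf cdsv without quotes
  }
  invisible(dups)
}

report_duplicates(pat$seq)
```

To check every patient and day at once, the same idea can be applied to the rows flagged by `duplicated(data[, c("patient", "day", "seq")])`.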
   
  Thanks a lot for any help,
  Have a nice weekend!
  Tom



______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
