Dear list,
I have a dataset that consists of duplicated sequences within each day for
each patient (see the data below), and I want to reshape the data with patient
as the time variable. However, the reshape function only takes the first
sequence of the replicates and ignores the second. How can I 1) average the
duplicates and 2) give the duplicated sequences unique names before reshaping
the data?
> data
patient day seq y
1 10 1 acdf -0.52416066
2 10 1 cdsv 0.62551539
3 10 1 dlfg -1.54668047
4 10 1 acdf 0.82404978
5 10 1 cdsv -1.17459914
6 10 2 acdf 0.47238216
7 10 2 cdsv -0.92364896
8 10 2 dlfg 1.19273992
9 10 2 acdf 0.03759663
10 10 2 cdsv 1.05106783
11 12 1 acdf 0.43575105
12 12 1 cdsv 1.01675547
13 12 1 dlfg -1.54601413
14 12 1 acdf 1.03384654
15 12 1 cdsv 0.32197671
16 12 2 acdf 0.37355285
17 12 2 cdsv -0.39780850
18 12 2 dlfg -0.37693499
19 12 2 acdf -1.28989165
20 12 2 cdsv -0.06938098
21 23 1 acdf -0.68486972
22 23 1 cdsv -1.08035660
23 23 1 dlfg 0.93124685
24 23 1 acdf -0.78737514
25 23 1 cdsv -1.56315904
26 23 2 acdf -2.30913270
27 23 2 cdsv -1.64583577
28 23 2 dlfg 1.87435485
29 23 2 acdf -1.99671825
30 23 2 cdsv 0.62995993
> redata <- reshape(data, idvar=c("day","seq"), timevar="patient", direction="wide")
The reshaped data has only three sequences for each day and does not take
into account the value of the second replicate.
> redata
day seq y.10 y.12 y.23
1 1 acdf -0.5241607 0.4357510 -0.6848697
2 1 cdsv 0.6255154 1.0167555 -1.0803566
3 1 dlfg -1.5466805 -1.5460141 0.9312469
6 2 acdf 0.4723822 0.3735529 -2.3091327
7 2 cdsv -0.9236490 -0.3978085 -1.6458358
8 2 dlfg 1.1927399 -0.3769350 1.8743548
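For reference, here is a minimal sketch of both requests, not a tested solution for the real dataset. It assumes the data frame looks exactly like the listing above; the names avg, wide_avg, wide_rep, repl and seq_id are made up for illustration. Averaging is done with aggregate(), and the replicate counter is built with ave() before pasting it onto the sequence name.

```r
# Rebuild the example data from the listing above
data <- data.frame(
  patient = rep(c(10, 12, 23), each = 10),
  day     = rep(rep(1:2, each = 5), times = 3),
  seq     = rep(c("acdf", "cdsv", "dlfg", "acdf", "cdsv"), times = 6),
  y       = c(-0.52416066,  0.62551539, -1.54668047,  0.82404978, -1.17459914,
               0.47238216, -0.92364896,  1.19273992,  0.03759663,  1.05106783,
               0.43575105,  1.01675547, -1.54601413,  1.03384654,  0.32197671,
               0.37355285, -0.39780850, -0.37693499, -1.28989165, -0.06938098,
              -0.68486972, -1.08035660,  0.93124685, -0.78737514, -1.56315904,
              -2.30913270, -1.64583577,  1.87435485, -1.99671825,  0.62995993)
)

# 1) Average the replicates within patient/day/seq, then reshape to wide
avg <- aggregate(y ~ patient + day + seq, data = data, FUN = mean)
wide_avg <- reshape(avg, idvar = c("day", "seq"),
                    timevar = "patient", direction = "wide")

# 2) Number the replicates within patient/day/seq and build unique names
data$repl <- ave(seq_along(data$seq),
                 data$patient, data$day, data$seq,
                 FUN = seq_along)
data$seq_id <- paste(data$seq, data$repl, sep = ".")

# Reshape using the unique name instead of the raw sequence
wide_rep <- reshape(data[, c("patient", "day", "seq_id", "y")],
                    idvar = c("day", "seq_id"),
                    timevar = "patient", direction = "wide")
```

With the unique names, each replicate keeps its own row (acdf.1, acdf.2, and so on), so reshape() no longer has to drop anything.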
Another problem I have is that I want to check for duplicates in the dataset
and, if there are any, print out the duplicated sequences. I tried the code
below, but the output is not very nice. How can I make the output look nicer,
or is there a better way to do this?
pat <- subset(data, data$patient==10 & data$day==1)
if (any(duplicated(pat$seq, MARGIN=1)) == FALSE)
cat("No duplicates", "\n", sep="") else {cat("duplicates", "\n", sep="") &
print(pat$seq[duplicated(pat$seq)]) }
I got this output:
duplicates
[1] acdf cdsv
Levels: acdf cdsv dlfg
[1] NA NA
Warning message:
& not meaningful for factors in: Ops.factor(cat("duplicates", "\n", sep =
""), print(pat$seq[duplicated(pat$seq)]))
But would like the output to be something like:
duplicates
[1] acdf cdsv
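A likely cause of the extra output is the & between cat() and print(): & is a logical operator, so R tries to combine the two return values, which gives the warning and the line of NAs. The "Levels:" line appears because a factor is being printed. Below is a minimal sketch of one way to get the desired output. It assumes pat holds the rows for one patient and day as in the listing; the name dups is made up for illustration.

```r
# Stand-in for subset(data, patient == 10 & day == 1): only seq is needed here
pat <- data.frame(seq = factor(c("acdf", "cdsv", "dlfg", "acdf", "cdsv")))

# Duplicated sequences as plain character strings (no factor levels)
dups <- unique(as.character(pat$seq[duplicated(pat$seq)]))

if (length(dups) == 0) {
  cat("No duplicates\n")
} else {
  # Two separate statements inside the braces, not joined with &
  cat("duplicates\n")
  print(dups, quote = FALSE)
}
```

Converting to character before printing drops the "Levels:" line, and quote = FALSE prints the strings without quotation marks.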
Thanks a lot for any help,
Have a nice weekend!
Tom