[R] nested subset for a dataframe

2010-03-10 Thread arnaud chozo
Hi,

I have a beginner question. I'm trying to extract rows from my data frame
according to some nested rules.

I have something like the dataframe test.df:

test.df = data.frame(V1=c(rep("A",10), rep("B",10), rep("C",5)),
                     V2=c(rep(1,5), rep(2,5), rep(1,5), rep(2,5), rep(1,5)))

   V1 V2
1   A  1
2   A  1
3   A  1
4   A  1
5   A  1
6   A  2
7   A  2
8   A  2
9   A  2
10  A  2
11  B  1
12  B  1
13  B  1
14  B  1
15  B  1
16  B  2
17  B  2
18  B  2
19  B  2
20  B  2
21  C  1
22  C  1
23  C  1
24  C  1
25  C  1

For each value of the variable V1 (group A, B or C), I want to extract rows
for which V2 is the max for the group in V1, in order to get:

   V1 V2
1   A  2
2   A  2
3   A  2
4   A  2
5   A  2
6   B  2
7   B  2
8   B  2
9   B  2
10  B  2
11  C  1
12  C  1
13  C  1
14  C  1
15  C  1

I wrote this function:

mytest = function(df) {
  myS = unique(df$V1)
  df.tmp = subset(df, df$V1==myS[[1]])
  df.sub = subset(df.tmp, df.tmp$V2==max(df.tmp$V2))
  for (i in 2:length(myS)) {
    df.tmp = subset(df, df$V1==myS[[i]])
    df.sub = merge(df.sub, subset(df.tmp, df.tmp$V2==max(df.tmp$V2)),
                   all=TRUE)
  }
  df.sub
}

but I need something more efficient and more general. Any ideas?

Thanks in advance,
Arnaud

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nested subset for a dataframe

2010-03-10 Thread David Winsemius


On Mar 10, 2010, at 10:30 AM, arnaud chozo wrote:


For each value of the variable V1 (group A, B or C), I want to extract rows
for which V2 is the max for the group in V1, in order to get:

   V1 V2
1   A  2
2   A  2
3   A  2
4   A  2
5   A  2
6   B  2
7   B  2
8   B  2
9   B  2
10  B  2
11  C  1
12  C  1
13  C  1
14  C  1
15  C  1



 test.df[test.df$V2 == ave(test.df$V2, test.df$V1, FUN=max), ]
   V1 V2
6   A  2
7   A  2
8   A  2
9   A  2
10  A  2
16  B  2
17  B  2
18  B  2
19  B  2
20  B  2
21  C  1
22  C  1
23  C  1
24  C  1
25  C  1
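
To see why the one-liner above works, it may help to split it into steps. The following is only an illustrative sketch: the intermediate names group.max, keep and result are not from the thread, and the group labels are assumed to be the strings "A", "B" and "C".

```r
# Rebuild the example data (group labels quoted as strings).
test.df <- data.frame(
  V1 = c(rep("A", 10), rep("B", 10), rep("C", 5)),
  V2 = c(rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5), rep(1, 5))
)

# ave() returns a vector as long as V2 in which each element is replaced
# by FUN applied to that element's own group (here: the group maximum).
group.max <- ave(test.df$V2, test.df$V1, FUN = max)

# Logical mask: TRUE where a row's V2 equals the maximum of its group.
keep <- test.df$V2 == group.max

# Index the data frame by the mask to keep only the matching rows.
result <- test.df[keep, ]
```

Because ave() returns its result in the original row order, the comparison lines up row by row and no loop or merge is needed.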

The row names in the output show which rows of the original data frame
were extracted. If you do not want that information, it is easy to drop.
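
One possible way to drop the row names, and to make the approach reusable for other column names, is sketched below. The helper name rows.at.group.max and its arguments are hypothetical and do not appear in the thread; the sketch also assumes the value column contains no NA values.

```r
# Hypothetical helper (not from the thread): keep the rows where the
# column named by `value` reaches its maximum within each level of the
# column named by `group`.
rows.at.group.max <- function(df, group, value) {
  is.max <- df[[value]] == ave(df[[value]], df[[group]], FUN = max)
  out <- df[is.max, , drop = FALSE]
  rownames(out) <- NULL  # renumber rows 1..n, discarding the original row names
  out
}

# Example data, with the group labels assumed to be strings.
test.df <- data.frame(
  V1 = c(rep("A", 10), rep("B", 10), rep("C", 5)),
  V2 = c(rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5), rep(1, 5))
)

res <- rows.at.group.max(test.df, "V1", "V2")
```

Passing the column names as strings avoids hard-coding V1 and V2, which is probably what "more general" would require in practice.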


--
David.




David Winsemius, MD
West Hartford, CT
