On Wed, 2004-03-03 at 21:19, Greg Blevins wrote:
> Hello R experts,
>
> The following problem outstrips my current programming knowledge.
>
> I have a dataframe with two fields that looks like the following:
>
> ID Contract
> 01 1
> 01 1
> 02 2
> 02 3
> 02 1
> 03 2
> 03 2
> 03 2
> 03 1
> 03 1
> 03 1
> etc...
>
> I would like to end up with a dataframe with one row per ID where the
> value in the contract field would be the highest value recorded for a
> single ID. As you can see above, the number of IDs varies irregularly.
> Given the above, the new file would look like the following:
>
> ID Contract
> 01 1
> 02 3
> 03 2
>
> Thanks in advance for your suggestions.
# Create the data frame
df <- data.frame(ID = I(c(rep("01", 2), rep("02", 3), rep("03", 6))),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1))
> df
   ID Contract
1  01        1
2  01        1
3  02        2
4  02        3
5  02        1
6  03        2
7  03        2
8  03        2
9  03        1
10 03        1
11 03        1
# Now use aggregate() to condense df by ID, using the max
# value of Contract
> aggregate(df$Contract, list(ID = df$ID), max)
  ID x
1 01 1
2 02 3
3 03 2
See ?aggregate for more information. By default, aggregate() names the
column holding the function's result 'x'. You can of course rename it
as you need.
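As a minimal sketch of the renaming step, assuming the aggregated result is stored in an object (called res here purely for illustration), something along these lines should restore the original column name:

```r
# Rebuild the example data (same values as in the example above)
df <- data.frame(ID = I(c(rep("01", 2), rep("02", 3), rep("03", 6))),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1))

# Condense df by ID, keeping the max value of Contract per ID.
# 'res' is just an illustrative name for the result.
res <- aggregate(df$Contract, list(ID = df$ID), max)

# Rename the default 'x' column back to 'Contract'
names(res)[names(res) == "x"] <- "Contract"
res
```

The result should then have one row per ID, with columns named ID and Contract.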
HTH,
Marc Schwartz