Hi all,

I've found a lot of helpful info on identifying and deleting duplicates, but I'd like to do something a little different: identify the duplicate values and, instead of deleting them, label them with a value.

I am working with historical data regarding school courses:

  Student Number  Course  Final Mark  Completed Date
1       12345678  Soc101          34      02-04-2003
2       12345678  Soc101          62      31-11-2004
3       12345678  Psy104          63      03-05-2003
4       23456789  Soc101          73      02-04-2003
5       23456789  Psy104          76      25-02-2004

In this data frame, records 1 and 2 contain data for the same student taking the same course. In record 1, the student failed (Final Mark), took the course again (Completed Date) and finally passed (Final Mark) in record 2.

I'd like to be able to work with the data so that I could summarize the achievement distribution for the first attempt records and then compare it to the achievement distribution for the second attempt records. In Excel I'd use something like COUNTIF($A$2:A2,A2) in a new column and then summarize the "1" values and "2" values.

  Order  Student Number  Course  Final Mark  Completed Date
1     1        12345678  Soc101          34      02-04-2003
2     2        12345678  Soc101          62      31-11-2004
3     1        12345678  Psy104          63      03-05-2003
4     1        23456789  Soc101          73      02-04-2003
5     1        23456789  Psy104          76      25-02-2004

I suspect the answer is in the list discussions on "deleting duplicate records" but I'm still familiarizing myself with R and I'm not at a point to be able to see how it could be modified.

Any thoughts?

Cheers,
Chris

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
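
For reference, the running count described in the post (the Excel COUNTIF idea, applied per student and course) could be sketched in base R roughly as follows. This is only one possible approach, not a definitive answer. The data frame name `courses` and the column names `StudentNumber`, `Course`, `FinalMark`, `CompletedDate` and `Order` are assumptions made for illustration. The sketch also assumes the rows are already in chronological order within each student/course pair, so the dates are kept as plain character strings exactly as posted rather than parsed.

```r
# Example data typed in from the post. Column names are illustrative assumptions.
# Dates are kept as character strings exactly as posted (not parsed).
courses <- data.frame(
  StudentNumber = c(12345678, 12345678, 12345678, 23456789, 23456789),
  Course        = c("Soc101", "Soc101", "Psy104", "Soc101", "Psy104"),
  FinalMark     = c(34, 62, 63, 73, 76),
  CompletedDate = c("02-04-2003", "31-11-2004", "03-05-2003",
                    "02-04-2003", "25-02-2004"),
  stringsAsFactors = FALSE
)

# Running count within each student/course pair: 1 for the first record of a
# pair, 2 for the second, and so on. Assumes rows are already sorted so that
# earlier attempts come before later ones within each pair.
courses$Order <- ave(
  seq_len(nrow(courses)),
  courses$StudentNumber, courses$Course,
  FUN = seq_along
)

# Split the marks by attempt number so the two distributions can be compared.
first_attempt  <- courses$FinalMark[courses$Order == 1]
second_attempt <- courses$FinalMark[courses$Order == 2]

print(courses)
print(summary(first_attempt))
print(summary(second_attempt))
```

Here `ave()` applies `seq_along()` separately to each student/course group and puts the results back in the original row order, which plays the same role as the expanding COUNTIF range. If the records were not already in date order, they would need sorting on a properly parsed date column first.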