Hi all,

I've found a lot of helpful info on identifying and deleting duplicates, 
but I'd like to do something a little different: I'd like to identify the 
duplicate records and, instead of deleting them, label each one with a value.

I am working with historical data regarding school courses:

  Student Number  Course  Final Mark  Completed Date
1       12345678  Soc101          34      02-04-2003
2       12345678  Soc101          62      31-11-2004
3       12345678  Psy104          63      03-05-2003
4       23456789  Soc101          73      02-04-2003
5       23456789  Psy104          76      25-02-2004

In this data frame, records 1 and 2 contain data for the same student taking 
the same course. In record 1 the student failed (Final Mark 34), took the 
course again (later Completed Date) and finally passed in record 2 (Final 
Mark 62).
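A minimal sketch of the "identify the duplicates" step, written in Python 
purely to illustrate the logic (the target is R). It assumes the records are 
held as (student, course, final_mark, completed_date) tuples copied from the 
table above, and treats a record as a repeat when another record has the same 
student AND the same course.

```python
from collections import Counter

# Records copied from the table above: (student, course, final_mark, completed_date).
# Dates are kept as the original dd-mm-yyyy strings.
records = [
    ("12345678", "Soc101", 34, "02-04-2003"),
    ("12345678", "Soc101", 62, "31-11-2004"),
    ("12345678", "Psy104", 63, "03-05-2003"),
    ("23456789", "Soc101", 73, "02-04-2003"),
    ("23456789", "Psy104", 76, "25-02-2004"),
]

# Count how many records share each (student, course) key.
key_counts = Counter((student, course) for student, course, _, _ in records)

# Flag records whose key occurs more than once (instead of deleting them).
is_repeated = [key_counts[(s, c)] > 1 for s, c, _, _ in records]

for rec, flag in zip(records, is_repeated):
    print(rec, "repeated" if flag else "single")
```

Here only records 1 and 2 are flagged, since they are the only pair sharing 
both student number and course.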

I'd like to be able to summarize the achievement (Final Mark) distribution for 
the first-attempt records and then compare it with the distribution for the 
second-attempt records. In Excel I'd add a new column containing something 
like COUNTIF($A$2:A2,A2), which, with the rows in date order, gives a running 
count of how many times each value has appeared so far, and then summarize 
the "1" values and the "2" values:

  Order  Student Number  Course  Final Mark  Completed Date
1     1        12345678  Soc101          34      02-04-2003
2     2        12345678  Soc101          62      31-11-2004
3     1        12345678  Psy104          63      03-05-2003
4     1        23456789  Soc101          73      02-04-2003
5     1        23456789  Psy104          76      25-02-2004
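A minimal sketch of the COUNTIF-style numbering, again in Python only to show 
the logic and using the same assumed tuple layout. Because 
COUNTIF($A$2:A2,A2) depends on row order, the sketch visits the records in 
Completed Date order, keeps a running count per (student, course) key to 
reproduce the Order column above, then groups the final marks by attempt 
number for a simple summary. The date key splits the dd-mm-yyyy string into 
integers rather than calling datetime.strptime, which would raise an error on 
31-11-2004 (November has only 30 days).

```python
from collections import defaultdict
from statistics import mean

# Records copied from the table above: (student, course, final_mark, completed_date).
records = [
    ("12345678", "Soc101", 34, "02-04-2003"),
    ("12345678", "Soc101", 62, "31-11-2004"),
    ("12345678", "Psy104", 63, "03-05-2003"),
    ("23456789", "Soc101", 73, "02-04-2003"),
    ("23456789", "Psy104", 76, "25-02-2004"),
]


def date_key(ddmmyyyy):
    """Turn 'dd-mm-yyyy' into a (year, month, day) tuple that sorts chronologically."""
    day, month, year = (int(part) for part in ddmmyyyy.split("-"))
    return (year, month, day)


# Visit records in chronological order (sorted() is stable, so ties keep their
# original order) and count how often each (student, course) key has been seen
# so far, like COUNTIF($A$2:A2, A2) on date-sorted rows.
order = [0] * len(records)
seen = defaultdict(int)
for i in sorted(range(len(records)), key=lambda j: date_key(records[j][3])):
    student, course, _, _ = records[i]
    seen[(student, course)] += 1
    order[i] = seen[(student, course)]

# Group final marks by attempt number (1 = first attempt, 2 = second, ...).
marks_by_attempt = defaultdict(list)
for (_, _, mark, _), attempt in zip(records, order):
    marks_by_attempt[attempt].append(mark)

for attempt in sorted(marks_by_attempt):
    marks = marks_by_attempt[attempt]
    print(f"attempt {attempt}: n={len(marks)}, mean={mean(marks):.1f}, marks={marks}")
```

Run as is, this reports four first-attempt marks and one second-attempt mark. 
In R itself, base ave() with FUN = seq_along, grouping by student number and 
course after ordering the rows by date, is one common way to build the same 
counter.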


I suspect the answer lies in the list discussions on "deleting duplicate 
records", but I'm still familiarizing myself with R and can't yet see how 
those solutions could be modified for this. Any thoughts?

Cheers,
Chris

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
