[R] finding both rows that are duplicated in a data frame

2013-09-07 Thread Robert Lynch
I have a data frame that looks like

id1<-c(1,1,2,2,3,3,4,5,5,6,6,7,8,9,9,10)
id2<-c(22,22,34,34,15,15,76,45,45,84,84,37,52,66,66,91)
GENDER<-sample(c("G-UNK","G-M","G-F"),16, replace = TRUE)
ETH<-sample(c("E-AF","E-UNK","E-VT"),16, replace = TRUE)
example<-cbind(id1,id2,GENDER,ETH)

where there are two id's and
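
For readers skimming the thread: the subject line asks for both rows of each duplicated pair, not just the second occurrence. A minimal sketch of one common way to do that in base R follows. It assumes "duplicated" means sharing the same id1 and id2, which is a guess from the visible part of the post, and it uses only the two id columns so the result does not depend on sample().

```r
# Illustrative data: the id columns from the post, without the sampled columns.
example <- data.frame(
  id1 = c(1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 9, 9, 10),
  id2 = c(22, 22, 34, 34, 15, 15, 76, 45, 45, 84, 84, 37, 52, 66, 66, 91)
)

# Assumption: a row counts as duplicated when both ids match another row.
keys <- example[, c("id1", "id2")]

# duplicated() alone flags only the second and later occurrences;
# OR-ing with fromLast = TRUE also flags the first occurrence of each pair.
in_pair <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

both_rows <- example[in_pair, ]   # every row that has a twin
singles   <- example[!in_pair, ]  # rows whose id pair occurs once
```

Here both_rows keeps every member of a duplicated pair, while singles holds the ids that occur only once.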

Re: [R] finding both rows that are duplicated in a data frame

2013-09-07 Thread arun
Hi,

example<- data.frame(id1,id2,GENDER,ETH,stringsAsFactors=FALSE)
res<-unique(example[!(grepl("UNK",example$GENDER)|grepl("UNK",example$ETH)),])
res
#   id1 id2 GENDER  ETH
#1    1  22    G-M E-VT
#3    2  34    G-M E-AF
#5    3  15    G-M E-AF
#7    4  76    G-F E-VT
#8    5  45    G-F E-VT
#12

Re: [R] finding both rows that are duplicated in a data frame

2013-09-07 Thread arun
Hi,

Suppose you have a situation like this, where both rows of a duplicated pair are UNKNOWN and you want to remove them:

example1<-rbind(example,data.frame(id1=c(11,12,12),id2=c(93,95,95),GENDER=rep("G-UNK",3),ETH=rep("E-UNK",3)))
spl<- as.character(interaction(example1$id1,example1$id2))

Re: [R] finding both rows that are duplicated in a data frame

2013-09-07 Thread jim holtman
Try this. It splits the data frame on the two IDs and then chooses the first row in cases where the condition is not met.

id1<-c(1,1,2,2,3,3,4,5,5,6,6,7,8,9,9,10)
id2<-c(22,22,34,34,15,15,76,45,45,84,84,37,52,66,66,91)
GENDER<-sample(c("G-UNK","G-M","G-F"),16, replace = TRUE)
ETH
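
The preview above is cut off before the splitting code, so what follows is only a rough sketch of the split-and-choose idea, not the code from the message. It assumes the "condition" is that a row has no UNK value in either GENDER or ETH. The helper name pick_row and the small fixed data set are hypothetical, chosen so the result does not depend on sample().

```r
# Small fixed data set (hypothetical values, same column layout as the thread).
example <- data.frame(
  id1    = c(1, 1, 2, 3, 3),
  id2    = c(22, 22, 34, 15, 15),
  GENDER = c("G-UNK", "G-M", "G-F", "G-UNK", "G-UNK"),
  ETH    = c("E-AF", "E-AF", "E-UNK", "E-VT", "E-UNK"),
  stringsAsFactors = FALSE
)

# Split into one data frame per (id1, id2) combination; drop = TRUE
# discards id combinations that never occur in the data.
groups <- split(example, list(example$id1, example$id2), drop = TRUE)

# Assumed condition: prefer the first row with no UNK in GENDER or ETH;
# if no row in the group qualifies, fall back to the group's first row.
pick_row <- function(g) {
  known <- !grepl("UNK", g$GENDER) & !grepl("UNK", g$ETH)
  if (any(known)) g[which(known)[1], ] else g[1, ]
}

# One chosen row per id pair, bound back into a single data frame.
res <- do.call(rbind, lapply(groups, pick_row))
```

Each id pair contributes exactly one row to res, so the number of rows in res equals the number of distinct (id1, id2) combinations.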