
[R] finding both rows that are duplicated in a data frame

2013-09-07 Thread Robert Lynch

I have a data frame that looks like
id1 <- c(1,1,2,2,3,3,4,5,5,6,6,7,8,9,9,10)
id2 <- c(22,22,34,34,15,15,76,45,45,84,84,37,52,66,66,91)
GENDER <- sample(c("G-UNK","G-M","G-F"), 16, replace = TRUE)
ETH <- sample(c("E-AF","E-UNK","E-VT"), 16, replace = TRUE)
example <- cbind(id1, id2, GENDER, ETH)
where there are two id's and
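As a minimal sketch of what the subject line asks for (flagging both rows of a duplicated pair, not only the second one), base R's `duplicated()` can be run in both directions and the results combined. The small data frame below is a deterministic stand-in invented for illustration, since the original uses `sample()`; the names `ids` and `in_dup_group` are likewise assumptions, not part of the thread.

```r
# Hypothetical fixed data; the thread's own GENDER/ETH values are random.
example <- data.frame(
  id1 = c(1, 1, 2, 2, 3, 4),
  id2 = c(22, 22, 34, 34, 15, 76),
  GENDER = c("G-UNK", "G-M", "G-F", "G-F", "G-M", "G-UNK"),
  stringsAsFactors = FALSE
)

# Compare rows on the two id columns only.
ids <- example[, c("id1", "id2")]

# duplicated() marks the second and later occurrences; with fromLast = TRUE
# it marks the first and earlier ones. OR-ing both marks every row that
# belongs to a repeated (id1, id2) pair.
in_dup_group <- duplicated(ids) | duplicated(ids, fromLast = TRUE)

example[in_dup_group, ]
```

Rows whose id pair occurs only once come out as `FALSE` in `in_dup_group`, so they can be kept or dropped separately from the repeated pairs.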

Re: [R] finding both rows that are duplicated in a data frame

2013-09-07 Thread arun

Hi,
example <- data.frame(id1, id2, GENDER, ETH, stringsAsFactors = FALSE)
res <- unique(example[!(grepl("UNK", example$GENDER) | grepl("UNK", example$ETH)), ])
res
#   id1 id2 GENDER  ETH
#1    1  22    G-M E-VT
#3    2  34    G-M E-AF
#5    3  15    G-M E-AF
#7    4  76    G-F E-VT
#8    5  45    G-F E-VT
#12
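To see what this filter-then-`unique()` step does on data that does not change between runs, here is a small sketch. The five rows are made up for illustration, and `has_unk` is a helper name that does not appear in the thread.

```r
# Hypothetical fixed data standing in for the randomly sampled original.
example <- data.frame(
  id1 = c(1, 1, 2, 2, 3),
  id2 = c(22, 22, 34, 34, 15),
  GENDER = c("G-UNK", "G-M", "G-F", "G-F", "G-M"),
  ETH = c("E-VT", "E-VT", "E-AF", "E-AF", "E-UNK"),
  stringsAsFactors = FALSE
)

# TRUE for any row where either column contains "UNK".
has_unk <- grepl("UNK", example$GENDER) | grepl("UNK", example$ETH)

# Drop the rows with unknowns, then collapse exact duplicate rows.
res <- unique(example[!has_unk, ])
res
```

One consequence worth noting: an id pair whose every row contains `UNK` (here the pair 3/15) disappears from `res` entirely, which may or may not be what is wanted.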

Re: [R] finding both rows that are duplicated in a data frame

2013-09-07 Thread arun

Hi, Suppose you have situations like this: (duplicates are both UNKNOWN and want to remove those)
example1 <- rbind(example, data.frame(id1 = c(11,12,12), id2 = c(93,95,95), GENDER = rep("G-UNK", 3), ETH = rep("E-UNK", 3)))
spl <- as.character(interaction(example1$id1, example1$id2))
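The archived message is cut off after `spl` is built, so how the key was used is not shown. As one possible use, and only as a sketch under that assumption, a per-pair key like `spl` can be passed to `ave()` to flag id pairs in which every row contains an unknown. The data and the names `has_unk` and `all_unk` are invented for illustration.

```r
# Hypothetical fixed data: pair 1/22 has one known row, pair 12/95 has none.
example1 <- data.frame(
  id1 = c(1, 1, 12, 12),
  id2 = c(22, 22, 95, 95),
  GENDER = c("G-UNK", "G-M", "G-UNK", "G-UNK"),
  ETH = c("E-VT", "E-VT", "E-UNK", "E-UNK"),
  stringsAsFactors = FALSE
)

# One key per (id1, id2) pair, e.g. "1.22" and "12.95".
spl <- as.character(interaction(example1$id1, example1$id2))

# Row-level flag: does this row contain an unknown value?
has_unk <- grepl("UNK", example1$GENDER) | grepl("UNK", example1$ETH)

# Group-level flag, repeated for each row: are all rows of the pair unknown?
all_unk <- as.logical(ave(has_unk, spl, FUN = all))

# Keep only the pairs that have at least one fully known row.
example1[!all_unk, ]
```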

Re: [R] finding both rows that are duplicated in a data frame

2013-09-07 Thread jim holtman

Try this. It splits the data frame based on the two IDs and then chooses the first row in cases where the condition is not met.
id1 <- c(1,1,2,2,3,3,4,5,5,6,6,7,8,9,9,10)
id2 <- c(22,22,34,34,15,15,76,45,45,84,84,37,52,66,66,91)
GENDER <- sample(c("G-UNK","G-M","G-F"), 16, replace = TRUE)
ETH
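The rest of this message is truncated in the archive, so the following is not the poster's code. It is a rough sketch of the split-and-choose idea described above, assuming the condition is "the row has no UNK in GENDER or ETH". The data and the helper `pick_row` are hypothetical.

```r
# Hypothetical fixed data: pair 3/15 has no fully known row.
example <- data.frame(
  id1 = c(1, 1, 2, 3, 3),
  id2 = c(22, 22, 34, 15, 15),
  GENDER = c("G-UNK", "G-M", "G-F", "G-UNK", "G-UNK"),
  ETH = c("E-VT", "E-VT", "E-AF", "E-UNK", "E-AF"),
  stringsAsFactors = FALSE
)

# One sub-data-frame per (id1, id2) pair; drop = TRUE skips empty combinations.
groups <- split(example, list(example$id1, example$id2), drop = TRUE)

# Pick the first row with no unknowns; fall back to the first row otherwise.
pick_row <- function(g) {
  known <- !(grepl("UNK", g$GENDER) | grepl("UNK", g$ETH))
  if (any(known)) g[which(known)[1], ] else g[1, ]
}

result <- do.call(rbind, lapply(groups, pick_row))
result
```

Unlike filtering out every `UNK` row first, this keeps exactly one row per id pair, even when none of that pair's rows is fully known.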