[R] which duplicated rows to delete
Hi

Say I have this vector with several duplicates:

    x <- c(1,2,3,4,2,6,2,8,2,3)
    which(duplicated(x))
    [1]  5  7  9 10

But what I really want is something like:

    List({2,5,7,9}, {3,10}, ...)

Then from each sublist I can specify which of the duplicate items to drop:

    res <- NULL
    for(vec in myDuplicateList)
        res <- rbind(res, subset(data[vec,], myCrit))

I get some of the way by sorting my original data appropriately, as it is the second and following rows that are 'marked' as duplicates, but that's not quite enough.

Hope for some hints.

Kind regards
Søren

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] which duplicated rows to delete
Hi

you can use

    apply(outer((1:10)[1:10 %in% x], x, "=="), 1, which)

to get a list of the positions at which each value occurs, duplicates included. But then you will need to specify which duplicates you want to discard, which can be problematic.

HTH
Petr

On 30 Oct 2006 at 11:11, Søren Merser wrote:

From: Søren Merser
To: R - help r-help@stat.math.ethz.ch
Date sent: Mon, 30 Oct 2006 11:11:01 +0100
Subject: [R] which duplicated rows to delete

Petr Pikal
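The outer() call above hard-codes 1:10 as the set of candidate values. A minimal sketch of the same idea, assuming the candidates are taken from unique(x) instead and that singletons should be dropped; the names vals and groups are illustrative and do not come from the thread:

```r
x <- c(1, 2, 3, 4, 2, 6, 2, 8, 2, 3)

# Candidate values: each distinct value of x, in order of first appearance
vals <- unique(x)

# For each candidate value, the positions in x where it occurs
groups <- lapply(vals, function(v) which(x == v))
names(groups) <- vals

# Keep only the values that occur more than once
groups <- groups[lengths(groups) > 1]
groups
```

lapply() is used here rather than apply() because apply() simplifies its result to a matrix whenever every group happens to have the same length, whereas lapply() always returns a list.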
Re: [R] which duplicated rows to delete
Try this. The first line groups the positions of x by value, and the second line drops any group whose length is not greater than 1:

    out <- tapply(seq(x), x, function(x) x)
    out[sapply(out, length) > 1]
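Once the positions are grouped, the remaining step from the original question is choosing which row of each group to keep. A rough sketch of one way to do that, assuming a data frame whose id column holds the duplicated key and whose score column is the criterion; both columns and their values are invented for illustration, as is the rule "keep the largest score":

```r
# Hypothetical data: 'id' is the duplicated key, 'score' is an invented
# column standing in for whatever criterion decides which row to keep
data <- data.frame(
  id    = c(1, 2, 3, 4, 2, 6, 2, 8, 2, 3),
  score = c(5, 1, 7, 2, 9, 4, 3, 8, 6, 2)
)

# Row indices grouped by id (same grouping as the tapply() call above)
groups <- split(seq_len(nrow(data)), data$id)

# From each group keep the row with the largest score (first one on ties)
keep <- vapply(groups,
               function(idx) idx[which.max(data$score[idx])],
               integer(1))

# Rows that survive, in their original order
res <- data[sort(unname(keep)), ]
res
```

Collecting the surviving row indices first and subsetting once avoids growing res with rbind() inside a loop. A different criterion only requires changing the function passed to vapply().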