[R] which duplicated rows to delete

2006-10-30 Thread Søren Merser
Hi
Say I have this vector with several duplicates:
x <- c(1,2,3,4,2,6,2,8,2,3)

which(duplicated(x))
[1]  5  7  9 10

But what I really want is something like:
List({2,5,7,9}, {3,10}, ...)

Then from each sublist I can specify which of the duplicate items to drop:

res <- NULL
for(vec in myDuplicateList)
  res <- rbind(res, subset(data[vec,], myCrit))

I can get part of the way by sorting my original data appropriately, since
it is the second and following rows that are 'marked' as duplicates, but
that's not quite enough.
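
The sorting idea can be sketched as follows, assuming a hypothetical data
frame `dat` with a key column `id` and a column `score` that decides which
duplicate to keep (neither name comes from the post itself). Once the
preferred row is first within each key, duplicated() marks every later row:

```r
# Hypothetical example data; 'id' and 'score' are illustrative names only
dat <- data.frame(id    = c(1, 2, 3, 4, 2, 6, 2, 8, 2, 3),
                  score = c(5, 1, 7, 2, 9, 4, 3, 8, 6, 2))

# Sort so that, within each id, the row with the highest score comes first
sorted <- dat[order(dat$id, -dat$score), ]

# duplicated() flags the second and later rows of each id, so negating it
# keeps exactly the first (highest-score) row per id
kept <- sorted[!duplicated(sorted$id), ]
kept
```

This only helps when the rule for choosing a row can be written as a sort
order; anything more involved needs the grouped indices asked for above.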

Hope for some hints
Kind regards Søren

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] which duplicated rows to delete

2006-10-30 Thread Petr Pikal
Hi

you can use

apply(outer((1:10)[1:10 %in% x], x, "=="), 1, which)

to get a list of the positions at which each value occurs. But then you
will need to specify which duplicates you want to discard, which can be
problematic.
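
A short sketch of what this returns for the example vector. The final
filtering step and the names `vals`, `pos` and `dups` are additions made
here for illustration, not part of the suggestion above: values that occur
only once also get an entry, so they are dropped to leave just the
duplicate groups.

```r
x <- c(1, 2, 3, 4, 2, 6, 2, 8, 2, 3)

# Candidate values: those of 1:10 that actually occur in x
vals <- (1:10)[1:10 %in% x]

# One row per candidate value, TRUE where x equals that value;
# which() applied to each row gives the positions of that value in x.
# apply() returns a list here because the rows differ in number of matches.
pos <- apply(outer(vals, x, "=="), 1, which)
names(pos) <- vals

# Keep only the values that occur more than once
dups <- pos[lengths(pos) > 1]
dups
```

For this `x` the remaining groups are positions 2, 5, 7, 9 for the value 2
and positions 3, 10 for the value 3. Note that outer() builds a full
values-by-elements matrix, so this is best suited to short vectors.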

HTH
Petr

On 30 Oct 2006 at 11:11, Søren Merser wrote:

From:       Søren Merser [EMAIL PROTECTED]
To:         R - help r-help@stat.math.ethz.ch
Date sent:  Mon, 30 Oct 2006 11:11:01 +0100
Subject:    [R] which duplicated rows to delete

Petr Pikal
[EMAIL PROTECTED]



Re: [R] which duplicated rows to delete

2006-10-30 Thread Gabor Grothendieck
Try this.  The first line splits the indices of x into a list with one
element per distinct value, and the second line drops every element
whose length is not greater than 1:

out <- tapply(seq_along(x), x, function(i) i)
out[sapply(out, length) > 1]
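
To connect this back to the loop in the original question, here is a
minimal sketch using a hypothetical data frame `dat` and a hypothetical
rule (keep the row with the largest `score` in each group). Both are
assumptions made for illustration, and split() is swapped in for tapply()
as another way to group the row indices.

```r
# Hypothetical data; column names are illustrative only
dat <- data.frame(id    = c(1, 2, 3, 4, 2, 6, 2, 8, 2, 3),
                  score = c(5, 1, 7, 2, 9, 4, 3, 8, 6, 2))

# Row indices grouped by id: a named list, one element per distinct id
groups <- split(seq_len(nrow(dat)), dat$id)

# From each group pick the single row index to keep (assumed rule: max score)
keep <- vapply(groups, function(idx) idx[which.max(dat$score[idx])],
               integer(1))

# Subset once instead of growing a result with rbind() inside a loop
res <- dat[sort(unname(keep)), ]
res
```

Choosing indices first and subsetting once avoids repeated rbind() calls;
any other per-group rule can replace the which.max() line.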


