[R] Dealing with Duplicates - How to count instances?

2007-02-02 Thread Johannes Graumann
Hi there,

given a data.frame 'data' I managed to filter out entries (rows) that are
identical with respect to one column like so:

duplicity <- duplicated(data[column])      # TRUE for repeats of an earlier value
data_unique <- subset(data, !duplicity)    # keep first occurrence only
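For reference, a minimal self-contained version of the filtering above. The toy data frame, the column name `id` and its values are hypothetical, made up only so the snippet runs on its own:

```r
# Hypothetical toy data: 'id' is the column we deduplicate on
data <- data.frame(id    = c("a", "b", "a", "c", "a"),
                   value = 1:5,
                   stringsAsFactors = FALSE)
column <- "id"

# TRUE for every row whose 'id' already appeared in an earlier row
duplicity <- duplicated(data[column])

# Keep only the first occurrence of each 'id'
data_unique <- subset(data, !duplicity)
print(data_unique)
```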

But I'm trying to extract how many duplicates each of the remaining rows
had.

Can someone please send me down the right path for this?

Joh

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Dealing with Duplicates - How to count instances?

2007-02-02 Thread jim holtman
table(data[column])

will give you the number of items in each subgroup; that would be the
count you are after.
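A small sketch of how the counts from table() might be attached to the deduplicated rows, again on hypothetical toy data. Note that each count includes the row that was kept, so the number of removed duplicates would be `count - 1`:

```r
# Hypothetical toy data; 'column' names the column that defines duplicates
data <- data.frame(id = c("a", "b", "a", "c", "a"),
                   value = 1:5, stringsAsFactors = FALSE)
column <- "id"

# Number of rows sharing each value of the column
counts <- table(data[[column]])

# Deduplicate, then look up each remaining row's count by its value (name)
data_unique <- data[!duplicated(data[[column]]), ]
data_unique$count <- as.vector(counts[data_unique[[column]]])
print(data_unique)
```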

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] Dealing with Duplicates - How to count instances?

2007-02-02 Thread Johannes Graumann
jim holtman wrote:

 table(data[column])
 
 will give you the number of items in each subgroup; that would be the
 count you are after.

Thanks for your help! That rocks! I can do

copynum <- table(data_6plus["Accession.number"])
data_6plus$Repeats <- sapply(as.character(data_6plus[["Accession.number"]]),
                             function(x) copynum[x][[1]])

now!
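The per-element sapply() works, but the same column can be filled in one vectorized step. A sketch, assuming `Accession.number` is a column name; the toy accessions below are hypothetical. Both options give every row the size of its group:

```r
# Hypothetical toy data standing in for data_6plus
data_6plus <- data.frame(Accession.number = c("P1", "P2", "P1", "P3", "P1"),
                         stringsAsFactors = FALSE)
acc <- data_6plus[["Accession.number"]]

# Option 1: index the table by name for all rows at once
copynum <- table(acc)
data_6plus$Repeats <- as.vector(copynum[acc])

# Option 2: ave() applies length() within each group and returns
# a vector aligned with the original rows
data_6plus$Repeats2 <- ave(seq_along(acc), acc, FUN = length)
print(data_6plus)
```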

But how about this:
- do something along the lines of

duplicity <- duplicated(data_6plus["Accession.number"])
data_6plus_unique <- subset(data_6plus, !duplicity)

- BUT: from each deleted row, retain one field, append it to a vector, and
store that vector in a new field of the remaining row of the group sharing
the same data_6plus["Accession.number"] value?

How would you do something like that?
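One possible approach, sketched under assumptions: `Peptide` is a hypothetical stand-in for "the field to retain", and the collected values are pasted into a single string per group. tapply() gathers the field from all rows of each accession, and the result is looked up by name for the row that survives deduplication:

```r
# Hypothetical toy data; 'Peptide' stands in for the field to retain
data_6plus <- data.frame(
  Accession.number = c("P1", "P2", "P1", "P3", "P1"),
  Peptide          = c("AAK", "CCR", "DDK", "EER", "FFK"),
  stringsAsFactors = FALSE)
key <- data_6plus[["Accession.number"]]

# Collect 'Peptide' from every row of each accession group into one string
collected <- tapply(data_6plus$Peptide, key, paste, collapse = ";")

# Keep the first row of each group, then attach that group's collected values
data_6plus_unique <- data_6plus[!duplicated(key), ]
data_6plus_unique$All.peptides <-
  as.vector(collected[data_6plus_unique$Accession.number])
print(data_6plus_unique)
```

If an actual vector per accession is preferred over a pasted string, `split(data_6plus$Peptide, key)` returns a named list of vectors instead.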

Joh
