[R] Dealing with Duplicates - How to count instances?
Hi there,

given a data.frame 'data', I managed to filter out entries (rows) that are identical with respect to one column like so:

    duplicity <- duplicated(data[column])
    data_unique <- subset(data, duplicity != TRUE)

But I'm trying to extract how many duplicates each of the remaining rows had. Can someone please send me down the right path for this?

Joh

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Dealing with Duplicates - How to count instances?
table(data[column]) will give you the number of items in each subgroup; that would be the count you are after.

On 2/2/07, Johannes Graumann wrote:
> But I'm trying to extract how many duplicates each of the remaining
> rows had. Can someone please send me down the right path for this?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
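For illustration, here is a minimal sketch of how the table() suggestion could be combined with the duplicated() filter from the original question. The example data, the column name 'id', and the result column 'n' are assumptions made up for this sketch; they do not come from the thread.

```r
# Hypothetical example data; 'id' stands in for the column of interest.
data <- data.frame(id = c("a", "b", "a", "c", "a", "b"),
                   value = 1:6,
                   stringsAsFactors = FALSE)
column <- "id"

# Count how often each value of the column occurs.
counts <- table(data[[column]])

# Keep only the first row for each value, as in the original question.
data_unique <- data[!duplicated(data[[column]]), ]

# Look up the count for each remaining row by name.
# 'n' is an assumed name for the new column.
data_unique$n <- as.vector(counts[data_unique[[column]]])

print(data_unique)
```

Note that 'n' is the total number of rows per value, including the row that was kept; subtract 1 to get the number of rows that were dropped.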
Re: [R] Dealing with Duplicates - How to count instances?
jim holtman wrote:
> table(data[column]) will give you the number of items in each subgroup;
> that would be the count you are after.

Thanks for your help! That rocks! I can do

    copynum <- table(data_6plus["Accession.number"])
    data_6plus$Repeats <- sapply(data_6plus[["Accession.number"]], function(x) copynum[x][[1]])

now!

But how about this:
- do something along the lines of
      duplicity <- duplicated(data_6plus["Accession.number"])
      data_6plus_unique <- subset(data_6plus, duplicity != TRUE)
- BUT: retain one field from each deleted row, append it to a vector, and fill that vector into a new field of the remaining row of the set sharing data_6plus["Accession.number"]?

How would you do something like that?

Joh
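One possible way to approach this follow-up, sketched under assumptions: split() can group one field by accession number, and the grouped values can then be pasted into a single string per retained row. The example data, the field name 'Peptide', and the new column 'AllPeptides' are hypothetical. This sketch also includes the retained row's own value in the collected string, which may or may not be what is wanted.

```r
# Hypothetical stand-in for data_6plus; 'Peptide' is an assumed field name.
data_6plus <- data.frame(
  Accession.number = c("P1", "P2", "P1", "P3", "P1", "P2"),
  Peptide = c("x1", "x2", "x3", "x4", "x5", "x6"),
  stringsAsFactors = FALSE
)

# Group the field of interest by accession number:
# one character vector per accession number, in original row order.
by_accession <- split(data_6plus$Peptide, data_6plus$Accession.number)

# Collapse each group into a single comma-separated string.
collapsed <- sapply(by_accession, paste, collapse = ", ")

# Keep the first row of each group, as duplicated() does in the thread.
data_6plus_unique <- data_6plus[!duplicated(data_6plus$Accession.number), ]

# Attach the collapsed values, matched by accession number.
# 'AllPeptides' is an assumed name for the new column.
data_6plus_unique$AllPeptides <-
  unname(collapsed[data_6plus_unique$Accession.number])

print(data_6plus_unique)
```

If the values are needed as vectors rather than strings, the list returned by split() can be used directly instead of the pasted strings.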