Hi,

I have a data frame:

(obs <- data.frame(a = c(1, 2, 2, 3, 3, 3), b = c(1, 2, 3, 4, 4, 5), c = 1:2))
attach(obs)

In reality it has about 1 million rows. Some rows have the same contents in columns a and b, like rows 4 and 5. I want to do some calculations on column c within the duplicated rows and merge them afterwards:

layer <- function(x) round((1 - prod(1 - x/100)) * 100, 0)
(covnew <- aggregate(c, list(a = a, b = b), layer))

This works fine, but not with 1 million rows, because of memory limitations. So I thought to split the data frame into the majority of unique rows on the one hand and all duplicated rows on the other. With

subset(obs, a %in% a[duplicated(a)])

and its negation, this works fine for a single-column comparison. It must also be possible for a two-column comparison, but I can't get it to work.

Thanks
Florian

--
Dr. Florian Jansen
Geobotany & Nature Conservation
Institute for Botany and Landscape Ecology
Ernst-Moritz-Arndt-University
Grimmer Str. 88
17487 Greifswald - Germany
+49 (0)3834 86 4147

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
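[Editor's note: one possible sketch of the two-column split asked about above, not from the original post. It relies on the fact that duplicated() also accepts a data frame and compares whole rows; the names dup, dups, and uniques are illustrative.]

```r
## Example data from the post: c = 1:2 is recycled over the six rows.
obs <- data.frame(a = c(1, 2, 2, 3, 3, 3),
                  b = c(1, 2, 3, 4, 4, 5),
                  c = 1:2)

## duplicated() on the two-column data frame flags later repeats of an
## (a, b) pair; combining with fromLast = TRUE also flags the first
## occurrence, so every member of a duplicated group is marked.
dup <- duplicated(obs[c("a", "b")]) |
       duplicated(obs[c("a", "b")], fromLast = TRUE)

dups    <- obs[dup, ]    # rows sharing an (a, b) pair (rows 4 and 5 here)
uniques <- obs[!dup, ]   # all rows whose (a, b) pair occurs only once
```

The aggregate() step from the post would then only need to run on the small dups subset, and the result could be recombined with uniques via rbind().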
