One way is to 'split' the indices of the rows to determine which ones to use. For example, from the data given, I got the following:
> split(seq(nrow(obs)), list(obs$a, obs$b), drop=TRUE)
$`1.1`
[1] 1

$`2.2`
[1] 2

$`2.3`
[1] 3

$`3.4`
[1] 4 5

$`3.5`
[1] 6

You can then take the resulting list and pick out every entry that holds more than one index. Those entries are the rows that share the same values in both a and b, so they are the ones to run your calculations on.

On 10/2/06, Florian Jansen <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a dataframe:
>
> (obs <- data.frame(a=c(1,2,2,3,3,3), b=c(1,2,3,4,4,5), c=1:2))
> attach(obs)
>
> In reality it's about 1 million rows.
>
> Some of the datasets have the same contents in col a and b, like rows 4 and 5.
> I want to do some calculations on col c within the duplicated rows and
> merge them afterwards:
>
> layer <- function(x) round((1-prod(1-x/100))*100,0)
> (covnew <- aggregate(c, list(a=a, b=b), layer))
>
> This works fine, but not with 1 mill. rows because of memory space
> limitations.
> So I thought to split the dataframe into the majority of unique rows on
> one hand and all duplicated rows on the other:
>
> With
> subset(obs, a %in% a[duplicated(a)])
> and !a respectively this works fine for single column comparison.
> This must be also possible for two column comparison, but I can't get it.
>
> Thanks
> Florian
>
> --
> Dr. Florian Jansen
> Geobotany & Nature Conservation
> Institute for Botany and Landscape Ecology
> Ernst-Moritz-Arndt-University
> Grimmer Str. 88
> 17487 Greifswald - Germany
> +49 (0)3834 86 4147
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
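The split-based approach described in the reply can be sketched roughly as follows, using the six-row example data from the quoted post. The names idx, dup_groups, dup_rows, unique_part, dup_part, key and is_dup are made up for this illustration and do not appear in the thread. The duplicated() cross-check at the end is an alternative way to flag the same rows, not something the reply proposed. Whether either variant fits in memory at 1 million rows has not been tested here.

```r
# Example data and layer() function taken from the quoted post
obs <- data.frame(a = c(1, 2, 2, 3, 3, 3), b = c(1, 2, 3, 4, 4, 5), c = 1:2)
layer <- function(x) round((1 - prod(1 - x / 100)) * 100, 0)

# Row indices grouped by each (a, b) combination that occurs in the data
idx <- split(seq_len(nrow(obs)), list(obs$a, obs$b), drop = TRUE)

# Groups holding more than one index are the duplicated (a, b) pairs
dup_groups <- idx[lengths(idx) > 1]
dup_rows <- unlist(dup_groups, use.names = FALSE)

# Rows whose (a, b) pair occurs only once; setdiff() also behaves
# sensibly when there are no duplicates at all
unique_part <- obs[setdiff(seq_len(nrow(obs)), dup_rows), ]

# Apply layer() within each duplicated group, giving one row per group
dup_part <- do.call(rbind, lapply(dup_groups, function(i) {
  data.frame(a = obs$a[i[1]], b = obs$b[i[1]], c = layer(obs$c[i]))
}))

# Alternative (an assumption of this sketch, not from the reply):
# flag every row whose (a, b) pair occurs more than once, in either direction
key <- obs[c("a", "b")]
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
```

With the example data, dup_rows contains rows 4 and 5, and obs[is_dup, ] and obs[!is_dup, ] give the same two-column split that the original subset() call gave for a single column.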
