Hi,
I am trying to aggregate the sum of my test data.frame as follow:
testDF <- data.frame(v1 = c("a", "a", "a", "a", "a", "b", "b", "b", "b",
"b", "c", "c", "c", "c", "c", "d", "d", "d", "d", "d"),
v2 = c("a1", "a1", "a1", "a2", "a3", "b1", "b1", "b2",
"b2", "b2", "c1", "c2", "c2", "c2", "c3", "d1", "d2", "d3", "d4", "d4"),
n1 = 1:20,
n2 = 21:40 )
testDF <- orderBy( ~ v1+v2, data = testDF)
rownames(testDF) <- NULL
> testDF
v1 v2 n1 n2
1 a a1 1 21
2 a a1 2 22
3 a a1 3 23
4 a a2 4 24
5 a a3 5 25
6 b b1 6 26
7 b b1 7 27
8 b b2 8 28
9 b b2 9 29
10 b b2 10 30
11 c c1 11 31
12 c c2 12 32
13 c c2 13 33
14 c c2 14 34
15 c c3 15 35
16 d d1 16 36
17 d d2 17 37
18 d d3 18 38
19 d d4 19 39
20 d d4 20 40
>
testDF.result <- aggregate(list(testDF$n1, testDF$n2), by = list(testDF$v1,
testDF$v2), FUN = sum)
> testDF.result
Group.1 Group.2 X1.20 X21.40
1 a a1 6 66
2 a a2 4 24
3 a a3 5 25
4 b b1 13 53
5 b b2 27 87
6 c c1 11 31
7 c c2 39 99
8 c c3 15 35
9 d d1 16 36
10 d d2 17 37
11 d d3 18 38
12 d d4 39 79
>
However, when I applied it to my real data, it failed. It seems that
aggregate require more memory that I have currently (I am using WinXP,
R2.8.0, 2GB RAM).
Basically I want to perform aggregate sum on my numeric fields (in the above
case, n1 and n2) based on condition of v1 and v2.
Problem is, I have a lot more of than just two numerics and conditioning
fields.
In SQL, I would do:
select v1, v2, sum(n1), sum(n2) from myData
group by v1, v2;
Am I using a wrong function / library (or even wrong approach)? If so, can
you suggest which one?
Any pointer is really appreciated.
Thanks,
Ferry
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.