Dear all,
I have a problem with arrays.
Simplified, I have two arrays:
A =
[,,1]
1 2 3
4 5 6
7 8 9
[,,2]
10 11 12
13 14 15
16 17 18
B=
1 1 2
1 1 2
3 3 2
Basically, B indicates which cluster each value of A belongs to. Now I
want an array C in which every value of A is replaced by the mean of its
cluster, computed separately for each slice of the third dimension (that
is, the mean is taken over the first 2 dimensions):
C=
[,,1]
3 3 6
3 3 6
7.5 7.5 6
[,,2]
12 12 15
12 12 15
16.5 16.5 15
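To make the definition of C precise, it can also be written slice by slice with base R's ave(), which replaces each value by the mean of its group. This is only a sketch on the toy data above, meant as a specification of the target; it still loops over the last dimension and is not claimed to be fast.

```r
# Toy data: A is 3 x 3 x 2, B gives the cluster label of each cell.
A <- aperm(array(1:18, dim = c(3, 3, 2)), c(2, 1, 3))
B <- matrix(c(1, 1, 2, 1, 1, 2, 3, 3, 2), nrow = 3, byrow = TRUE)

# For every slice along the third dimension, replace each value by the
# mean of its cluster. ave() uses mean() as its default summary function.
per_slice <- apply(A, 3, function(s) ave(as.vector(s), as.vector(B)))

# apply() returns one column per slice; restore the original shape.
C <- array(per_slice, dim = dim(A))
```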
The following short solution works:
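A <- aperm(array(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18),dim=c(3,3,2)),c(2,1,3))
B <- matrix(c(1,1,2,1,1,2,3,3,2),nrow=3,byrow=TRUE)
C <- array(NA,dim=c(3,3,2))
for(a in 1:3){
  # B==a is recycled over the third dimension of A, so TS has one row per slice
  TS <- matrix(A[B==a],nrow=2,byrow=TRUE)
  # write each slice's cluster mean back to every cell of cluster a
  C[B==a] <- rep(rowMeans(TS),each=sum(B==a))
}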
But unfortunately, my dataset is a 64x64x64x120 array, and I have
approximately 29000 clusters, and I'd have to repeat the process on 500
simulated datasets.
If I try the code above, it takes about 2 hours...
Strangely enough, it is the first line in the loop that takes so long:
reconstructing a dataset that contains only the data points belonging to
a certain cluster...
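A likely reason is that B==a and the logical indexing of A both scan the full array once per cluster, so the work grows with the number of clusters times the array size. One direction that might avoid the repeated scans is to flatten A into a matrix (one row per position, one column per slice of the last dimension) and compute all cluster sums in a single pass with rowsum(). The sketch below uses only the toy data; the names Amat, grp and idx are illustrative, and it has not been timed on the full-size array.

```r
# Toy data: A is 3 x 3 x 2, B gives the cluster label of each cell.
A <- aperm(array(1:18, dim = c(3, 3, 2)), c(2, 1, 3))
B <- matrix(c(1, 1, 2, 1, 1, 2, 3, 3, 2), nrow = 3, byrow = TRUE)

# Flatten the leading dimensions: one row per position, one column per
# slice of the last dimension (column-major order is preserved).
n_last <- dim(A)[length(dim(A))]
Amat <- matrix(A, ncol = n_last)

# Cluster label of each row of Amat, in the same column-major order.
grp <- as.vector(B)

# Per-cluster sums and sizes, each computed in one pass. Both results
# are ordered by sort(unique(grp)).
sums  <- rowsum(Amat, grp)
sizes <- rowsum(rep(1, length(grp)), grp)
means <- sums / as.vector(sizes)   # row k divided by the size of cluster k

# Look up each position's cluster row and rebuild the original shape.
idx <- match(grp, sort(unique(grp)))
C <- array(means[idx, ], dim = dim(A))
```

For the 4-dimensional case, B would presumably be a 64x64x64 array and Amat would have 120 columns; the same steps apply under that assumption.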
Would anyone know a faster method?
Kind regards,
Joke Durnez
PhD student
Ghent University
Department of Data Analysis
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.