On 16-Oct-09 11:27:06, Gabor Grothendieck wrote: > Using the generalized inner product defined in this post: > > https://www.stat.math.ethz.ch/pipermail/r-help/2006-July/109311.html > > try this: > > cbind(S, d = rowSums(inner(S, obs, identical)))
The function inner() is defined at the above URL as: # generalized crossproduct inner <- function(a,b=a,f=crossprod) apply(b,2,function(x)apply(a,2,function(y)f(x,y))) In order for this approach to work with Robin's 'observation' = 'obs' and 'S' matrices as illustrated by Gabor above, the function inner() should instead be defined as # generalized crossproduct inner <- function(a,b=a,f=crossprod) apply(b,1,function(x)apply(a,1,function(y)f(x,y))) i.e. it crosses all rows (dim=1) of 'obs' with all rows (dim=1) of S; using "2" instead of "1" does it by columns. Indeed, a possibly useful more general definition might be # generalized crossproduct inner <- function(a,b=a,f=crossprod,dim1=1,dim2=1) apply(b,dim2,function(x)apply(a,dim1,function(y)f(x,y))) allowing either dimension of a to be crossed with either dimension of b. Ted. > On Fri, Oct 16, 2009 at 4:29 AM, Robin Hankin <rk...@cam.ac.uk> wrote: >> Hi >> >> I want a generalization of tabulate() which works on rows of a matrix. >> Suppose I have an integer matrix 'observation': >> >>> observation >> >> y1 y2 y3 >> 1 4 0 >> 1 4 0 >> 2 0 3 >> 4 1 0 >> 0 5 0 >> 0 1 4 >> 2 0 3 >> >> Each row corresponds to a (multivariate) observation. _Note that the >> first two rows are identical: this means that data "c(1,4,0)" was >> observed twice. >> >> Now suppose I can list the sample space: >> >>> S >> _ _ _ _ [1,] 5 0 0 >> [2,] 4 1 0 >> [3,] 3 2 0 >> [4,] 2 3 0 >> [5,] 1 4 0 >> [6,] 0 5 0 >> [7,] 4 0 1 >> [8,] 3 1 1 >> [9,] 2 2 1 >> [10,] 1 3 1 >> [11,] 0 4 1 >> [12,] 3 0 2 >> [13,] 2 1 2 >> [14,] 1 2 2 >> [15,] 0 3 2 >> [16,] 2 0 3 >> [17,] 1 1 3 >> [18,] 0 2 3 >> [19,] 1 0 4 >> [20,] 0 1 4 >> [21,] 0 0 5 >> >> (thus each row corresponds to a point in my sample space). >> >> Now what I need to do is to construct a new matrix, which uses the >> 'observation' matrix above, which is a sort of table: >> >>> desired >> >> _ _ y1 y2 y3 d >> [1,] 5 0 0 0 >> [2,] 4 1 0 1 >> [3,] 3 2 0 0 >> [4,] 2 3 0 0 >> [5,] 1 4 0 2 >> [6,] 0 5 0 1 >> [7,] 4 0 1 0 >> [8,] 3 1 1 0 >> [9,] 2 2 1 0 >> [10,] 1 3 1 0 >> [11,] 0 4 1 0 >> [12,] 3 0 2 0 >> [13,] 2 1 2 0 >> [14,] 1 2 2 0 >> [15,] 0 3 2 0 >> [16,] 2 0 3 2 >> [17,] 1 1 3 0 >> [18,] 0 2 3 0 >> [19,] 1 0 4 0 >> [20,] 0 1 4 1 >> [21,] 0 0 5 0 >> >> >> Thus the 'd' column counts the number of times that each row occurs in >> variable 'observation'. _So desired[5,4]=2 because the observation >> corresponding to desired[5,1:3] (viz c(1,4,0)) occurred twice. _And >> desired[1,4]=0 because the observation corresponding to desired[1,1:3] >> (viz c(5,0,0)) did not occur once (it was not observed). >> >> In my application I have dim(S) ~= c(5,4e6). >> >> I've tried merge(), stack(), _reshape(), but the best I can do >> is the (derisory): >> >> require(partitions) >> >> >> obs <- matrix(as.integer(c( >> _ _ _ _ _ _ _ 1, 4, 0, >> _ _ _ _ _ _ _ 1, 4, 0, >> _ _ _ _ _ _ _ 2, 0, 3, >> _ _ _ _ _ _ _ 4, 1, 0, >> _ _ _ _ _ _ _ 0, 5, 0, >> _ _ _ _ _ _ _ 0, 1, 4, >> _ _ _ _ _ _ _ 2, 0, 3 >> _ _ _ _ _ _ _ )),ncol=3,byrow=TRUE) >> >> S <- t(compositions(5,3)) >> d <- rep(0,nrow(S)) >> >> >> for(i in seq_len(nrow(obs))){ >> _for(j in seq_len(nrow(S))){ >> _ if(all(obs[i,,drop=TRUE] == S[j,,drop=TRUE])){ >> _ _ d[j] <- d[j]+1 >> _ } >> _} >> } >> >> S <- cbind(S,d) >> >> >> Anyone got anything better before I try C? >> >> >> -- >> Robin K. S. Hankin >> Uncertainty Analyst >> University of Cambridge >> 19 Silver Street >> Cambridge CB3 9EP >> 01223-764877 >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -------------------------------------------------------------------- E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 17-Oct-09 Time: 17:00:48 ------------------------------ XFMail ------------------------------ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.