Dear 'Born',
There was a thread on this recently, but I cannot seem to find it. The best
suggestion (IMHO) was along these lines:

  aggregate( rep(1,40), as.data.frame(diag(4)[sample(1:4,40,replace=TRUE),]), sum )

See also

  http://thread.gmane.org/gmane.comp.lang.r.general/104798/focus=104841

and if you have a really big problem and access to unix utilities, you might
consider something like this:

  dat <- read.table( pipe('sort file.dat | uniq -c' ) )

HTH,

Chuck

p.s. the 'netiquette' of this list is to identify yourself with an
appropriate email handle or signature block.

On Fri, 28 Mar 2008, [EMAIL PROTECTED] wrote:

> I have a sparse contingency table (most cells are 0):
>
>> xtabs(~.,data[,idx:(idx+4)])
> , , x3 = 1, x4 = 1, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0  31
>   2   0   0 112
>   3   0   0  94
>
> , , x3 = 2, x4 = 1, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 1, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 2, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 2, x4 = 2, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0  18   0
>   3   0  27   0
>
> , , x3 = 3, x4 = 2, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 3, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 2, x4 = 3, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 3, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   1   0   0
>   3   2   0   0
>
> , , x3 = 1, x4 = 1, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0 142
>   2   0   0 340
>   3   0   0   1
>
> , , x3 = 2, x4 = 1, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 1, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 2, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 2, x4 = 2, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   4   0
>   2   0  41   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 2, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 3, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 2, x4 = 3, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 3, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 1, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0 173
>   2   0   0   4
>   3   0   0   0
>
> , , x3 = 2, x4 = 1, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 1, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 2, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 2, x4 = 2, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 2, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 3, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 2, x4 = 3, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 3, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> Now, I can do the following to get the sparse representation 'y' for the
> table above:
>
>> idx<-2
>> y<-as.data.frame.table(xtabs(~.,data[,idx:(idx+4)]))
>> y<-y[y$Freq>0,]
>> z<-sort(y$Freq,decreasing=T,index.return=T)
>> y<-y[z$ix,]
>> y
>     x1 x2 x3 x4 x5 Freq
> 89   2  3  1  1  2  340
> 169  1  3  1  1  3  173
> 88   1  3  1  1  2  142
> 8    2  3  1  1  1  112
> 9    3  3  1  1  1   94
> 122  2  2  2  2  2   41
> 7    1  3  1  1  1   31
> 42   3  2  2  2  1   27
> 41   2  2  2  2  1   18
> 121  1  2  2  2  2    4
> 170  2  3  1  1  3    4
> 75   3  1  3  3  1    2
> 74   2  1  3  3  1    1
> 90   3  3  1  1  2    1
>
> I am wondering if there is an R function, or a simple R routine which would
> help me make the data frame 'y' without using 'xtabs'. I need to study
> contingency tables of 20 (or even more) dimensions. R is unable to store a
> full 3^20 contingency table. But since the tables of interest are highly
> sparse, I figure the problem at hand could be highly simplified if I have
> something that would create a sparse representation.
>
> Any help or suggestions would be greatly appreciated.
>
> Thanks,
> A
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
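
For anyone who wants to try the aggregate() suggestion on something closer to the 20-variable case, here is a minimal sketch. The data frame 'dat', its five row patterns, and the default V1..V20 column names are made up for illustration; they are an assumption, not the original poster's data.

```r
# Hypothetical stand-in for the poster's data: 200 rows, 20 three-level
# variables, drawn from only 5 row patterns so the table is sparse.
set.seed(1)
patterns <- matrix(sample(1:3, 5 * 20, replace = TRUE), nrow = 5)
dat <- as.data.frame(patterns[sample(1:5, 200, replace = TRUE), ])

# Sum a column of ones within each observed combination of V1..V20.
# Only combinations that actually occur in 'dat' get a row, so the
# full 3^20 table is never built.
y <- aggregate(list(Freq = rep(1, nrow(dat))), by = dat, FUN = sum)

# Largest cells first, as in the original post.
y <- y[order(y$Freq, decreasing = TRUE), ]
```

The result should have one row per observed cell, with the 20 variables followed by Freq, which is the same shape as the 'y' built via xtabs in the quoted message. If the data live in a file rather than in memory, the sort | uniq -c pipe mentioned above may be the better route.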

