On 16/10/2012 12:29 PM, Sam Steingold wrote:
> * R. Michael Weylandt <zvpunry.jrlyn...@tznvy.pbz> [2012-10-16 16:19:27 +0100]:
>
> Have you looked at using table() directly? If I understand what you
> want correctly something like:
>
> table(do.call(paste, x))

I wished to avoid paste (I will have to re-split later, so it will be a
performance nightmare).

> Also, if you take a look at the development version of R, changes are
> being put in place to allow much larger data sets.
>>
>> xtabs(), although dog slow, would have footed the bill nicely:
>> --8<---------------cut here---------------start------------->8---
>>> x <- data.frame(a=1:32,b=1:32,c=1:32,d=1:32,e=1:32)
>>> system.time(subset(as.data.frame(xtabs( ~. , x )), Freq != 0 ))
>>    user  system elapsed
>>  12.788   4.288  17.224
>> --8<---------------cut here---------------end--------------->8---

You should not need "much larger data sets" for this.
x is sorted.

The problem is that xtabs() and by() and related functions are designed for the case where all combinations of all factors exist. If you have a dataset where only a few exist, you could use sparseby() from the reshape package.

Syntax would be

sparseby(data=x, INDICES=x, FUN=nrow)

if you wanted a data frame giving counts.

I just tried it, and on your two examples it gives a warning about coercing a list to a logical vector; I guess all(list(TRUE, TRUE)) was allowed when I wrote it, but isn't any more. I'll send a patch to the maintainer.
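
For comparison, here is a minimal base-R sketch of counting only the combinations that actually occur. It uses aggregate() rather than sparseby(), so it is a stand-in for the idea and not how sparseby() is implemented. x is the data frame from the example quoted above, and the column name Freq is only chosen to match the xtabs() output.

```r
# Same shape as the example in the thread: 32 rows and 5 columns,
# i.e. 32 observed combinations out of 32^5 possible ones.
x <- data.frame(a = 1:32, b = 1:32, c = 1:32, d = 1:32, e = 1:32)

# Give every row a counter of 1 and sum the counters within each
# observed combination of the grouping columns (here: all columns of x).
counts <- aggregate(data.frame(Freq = rep(1L, nrow(x))), by = x, FUN = sum)

# One row per combination that is present in x, with its count in Freq.
head(counts)
```

Since the grouping is done over the rows that exist, this should not need to build the full 32^5-cell table that xtabs() constructs.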

Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
