Stavros Macrakis macrakis at alum.mit.edu writes:
data.table certainly has some useful mechanisms, and I've been
experimenting with it as an implementation mechanism, though it's not a
drop-in substitute for factors. Also, though it is efficient for set
operations between small sets and
On Sunday, 6 November 2011 at 19:00 -0500, Stavros Macrakis wrote:
Milan, Jeff, Patrick,
Thank you for your comments and suggestions.
Milan,
This is far from a completely theoretical problem. I am performing
text analytics on a corpus of about 2m documents. There are tens of
Matthew,
Yes, the case I am thinking of is a 1-column key; sorry for the
overgeneralization. I haven't thought much about the multi-column key case.
-s
On Mon, Nov 7, 2011 at 12:48, Matthew Dowle mdo...@mdowle.plus.com wrote:
Stavros Macrakis macrakis at alum.mit.edu writes:
Milan, Jeff, Patrick,
Thank you for your comments and suggestions.
Milan,
This is far from a completely theoretical problem. I am performing text
analytics on a corpus of about 2m documents. There are tens of thousands
of distinct words (lemmata). It seems to me that the natural
Perhaps 'data.table' would be a package
on CRAN that would be acceptable.
On 05/11/2011 16:45, Jeffrey Ryan wrote:
Or better still, extend R via the mechanisms in place. Something akin
to a fast factor package. Any change to R causes downstream issues in
(hundreds of?) millions of lines of
R factors are the natural way to represent factors -- and should be
efficient since they use small integers. But in fact, for many (but
not all) operations, R factors are considerably slower than integers,
or even character strings. This appears to be because whenever a
factor vector is