Matthew,

Yes, the case I am thinking of is a 1-column key; sorry for the
overgeneralization.  I haven't thought much about the multi-column key case.

        -s

On Mon, Nov 7, 2011 at 12:48, Matthew Dowle <mdo...@mdowle.plus.com> wrote:

> Stavros Macrakis <macrakis <at> alum.mit.edu> writes:
> >
> > data.table certainly has some useful mechanisms, and I've been
> > experimenting with it as an implementation mechanism, though it's not a
> > drop-in substitute for factors.  Also, though it is efficient for set
> > operations between small sets and large sets, it is not very efficient for
> > operations between two large sets.
>
> As a general statement that could do with some clarification ;) data.table
> likes keys consisting of multiple ordered columns, e.g. (id,date). It is (I
> believe) efficient for joining two large 2+ column keyed data sets because the
> upper bound of each row's one-sided binary search is localised in that case
> (by group of the previous key column).
>
> As I understand it, Stavros has a different type of 'two large datasets':
> English language website data. Each set is one large vector of uniformly
> distributed unique strings. That appears to be quite a different problem to
> multiple columns of many times duplicated data.
>
> Matthew
>
> > Thanks everyone, and if you do come across a relevant CRAN package, I'd be
> > very interested in hearing about it.
> >
> >           -s
> >
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
