The Cartesian product often makes an honest-to-god join a bad idea
on large data.  The common alternative is co-group,
which does the hard work of the join (bringing together the records that
share a key) but stops just before emitting the Cartesian product.  That
lets you inject whatever cleverness you need at that point.

Common kinds of cleverness include down-sampling of problematically large
sets of candidates.
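
To make that concrete, here is a rough in-memory sketch in Python of
co-group followed by down-sampling.  It is only an illustration, not how
any particular framework implements it: the names cogroup, sampled_join
and max_per_side are made up for this example, and a real job would do
the grouping in the shuffle/reduce phase rather than in a dictionary.

```python
import random
from collections import defaultdict
from itertools import product


def cogroup(left, right):
    """Group two lists of (key, value) pairs by key without pairing them up."""
    # Each key maps to (values from left, values from right).
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return groups


def sampled_join(left, right, max_per_side=100, seed=0):
    """Emit joined rows, down-sampling any group larger than max_per_side.

    max_per_side is a hypothetical knob for this sketch; pick it to suit
    how much output per key you can afford.
    """
    rng = random.Random(seed)
    for key, (lvals, rvals) in cogroup(left, right).items():
        # This is the point just before the Cartesian product, where the
        # "cleverness" goes.  Here it is plain random down-sampling.
        if len(lvals) > max_per_side:
            lvals = rng.sample(lvals, max_per_side)
        if len(rvals) > max_per_side:
            rvals = rng.sample(rvals, max_per_side)
        for lv, rv in product(lvals, rvals):
            yield key, lv, rv
```

With max_per_side=10, a key that has 1000 rows on each side emits at most
100 joined rows instead of 1,000,000, while small keys are joined in full.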

On Tue, May 31, 2011 at 11:56 AM, Michael Segel
<[email protected]> wrote:

> So the underlying problem that the OP was trying to solve was how to join
> two tables from HBase.
> Unfortunately I goofed.
> I gave a quick and dirty solution that is a bit incomplete. The row key in
> the temp table has to be unique and I forgot about the Cartesian
> product. So my solution wouldn't work in the general case.
>
