Has anyone tried the zig-zag merge join algorithm that Google uses to do
something similar with their App Engine datastore (Bigtable)? It's described
here, starting on slide 29:
http://www.scribd.com/doc/16952419/Building-scalable-complex-apps-on-App-Engine
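A rough sketch of the zig-zag idea as I understand it (not Google's code): keep one cursor per table over sorted row keys, take the largest key under any cursor as the target, and seek every cursor straight to the first key >= target instead of stepping one key at a time. Here each table is just a sorted Python list, and `seek` is a hypothetical stand-in for repositioning a scanner at a start row.

```python
from bisect import bisect_left
from typing import List, Sequence


def seek(keys: Sequence[str], target: str, start: int) -> int:
    """Hypothetical scanner seek: index of the first key >= target, searching from start."""
    return bisect_left(keys, target, start)


def zigzag_intersect(tables: List[Sequence[str]]) -> List[str]:
    """Row keys present in every table, each table modeled as a sorted key list."""
    if not tables:
        return []
    pos = [0] * len(tables)
    result: List[str] = []
    while all(p < len(t) for p, t in zip(pos, tables)):
        # The largest key under any cursor is the smallest key that could still match.
        target = max(t[p] for p, t in zip(pos, tables))
        # Zig-zag: jump every cursor straight to the first key >= target.
        pos = [seek(t, target, p) for p, t in zip(pos, tables)]
        if any(p >= len(t) for p, t in zip(pos, tables)):
            break  # one table is exhausted, so no further matches are possible
        if all(t[p] == target for p, t in zip(pos, tables)):
            result.append(target)
            pos = [p + 1 for p in pos]
    return result
```

For example, `zigzag_intersect([["a", "c", "e", "g"], ["b", "c", "d", "g", "h"]])` returns `["c", "g"]`. The seeks are what make this cheap when the matching keys are sparse.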
Well, since you can start iterating from any point, you can just do a
map-reduce over the larger table. In each mapper, on the first call,
initialize a scanner into the smaller table, starting at the key that you
get from the larger table. Each time you get a sequential key from the
master
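The message above is cut off, so the following is only a guess at the rest of the scheme: a sketch assuming each mapper sees its split of the larger table in sorted key order and walks a scanner over the smaller table alongside it, like the merge step of a merge sort. `SmallTableScanner` and `mapper_misses` are hypothetical names, and both tables are modeled as sorted lists of row keys.

```python
from bisect import bisect_left
from typing import Iterable, List, Optional, Sequence


class SmallTableScanner:
    """Hypothetical stand-in for a scanner over the smaller table's sorted row keys."""

    def __init__(self, keys: Sequence[str], start_key: str) -> None:
        self._keys = keys
        self._i = bisect_left(keys, start_key)  # positioned at first key >= start_key

    def peek(self) -> Optional[str]:
        return self._keys[self._i] if self._i < len(self._keys) else None

    def advance(self) -> None:
        self._i += 1


def mapper_misses(split: Iterable[str], small_keys: Sequence[str]) -> List[str]:
    """Keys in one split of the larger table that are absent from the smaller table."""
    scanner: Optional[SmallTableScanner] = None
    misses: List[str] = []
    for key in split:  # keys arrive in sorted order, as a table scan returns them
        if scanner is None:
            # First call: open the smaller-table scanner at this mapper's first key.
            scanner = SmallTableScanner(small_keys, key)
        # Assumption: advance the smaller-table scanner past keys sorting before this one.
        while scanner.peek() is not None and scanner.peek() < key:
            scanner.advance()
        if scanner.peek() != key:
            misses.append(key)
    return misses
```

Because both sides are read in key order, each mapper does one sequential pass over its slice of the smaller table instead of one random lookup per key.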
You can scan through one table and see if the other one has those rowids or
not.
On Thu, Mar 10, 2011 at 8:08 PM, Vishal Kapoor
vishal.kapoor...@gmail.com wrote:
Friends,
How do I best compute the intersection of sets of row ids?
Suppose I have two tables with similar row ids.
How can I get the row
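A minimal sketch of the scan-and-probe approach, with a Python dict standing in for the other table (an assumption for illustration; against HBase each probe would be a separate Get). It returns the row keys found and not found, which covers both the intersection and the "in one but not the other" question.

```python
from typing import Iterable, List, Mapping, Tuple


def split_by_presence(scan_keys: Iterable[str],
                      other_table: Mapping[str, object]) -> Tuple[List[str], List[str]]:
    """Scan one table's row keys and probe the other table for each of them."""
    present: List[str] = []
    missing: List[str] = []
    for row in scan_keys:
        # Each membership test stands in for one point lookup (a Get) on the other table.
        if row in other_table:
            present.append(row)
        else:
            missing.append(row)
    return present, missing
```

Each probe is a random read, which is why walking both tables in key order, as in the map-reduce suggestion earlier in the thread, can be cheaper when both key sets are large.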
Hi,
If you expect a lot of misses with that approach, enable Bloom filters on
the second table so that lookups of missing rows are fast.
Lars
On Mar 11, 2011, at 9:44, Amandeep Khurana ama...@gmail.com wrote:
You can scan through one table and see if the other one has those rowids or
not.
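Why a Bloom filter helps with misses: it never gives a false negative, so a "definitely not present" answer lets you skip the real lookup, while a "maybe" still needs one. This is a toy in-memory filter for illustration, not HBase's implementation; in HBase the filter is stored per store file and consulted by the server once it is enabled on the column family, so client code does not change. `BloomFilter`, `find_misses`, and the `lookup` callback are hypothetical names.

```python
import hashlib
from typing import Callable, Iterable, Iterator, List


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] for p in self._positions(key))


def find_misses(scan_keys: Iterable[str], bloom: BloomFilter,
                lookup: Callable[[str], bool]) -> List[str]:
    """Keys absent from the other table; the real lookup runs only on possible hits."""
    misses: List[str] = []
    for key in scan_keys:
        # Short-circuit: a negative filter answer is definitive, so lookup() is skipped.
        if not bloom.might_contain(key) or not lookup(key):
            misses.append(key)
    return misses
```

The result is correct even when the filter gives a false positive, because that case falls through to the real lookup; the filter only saves work.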
Should the Bloom filter be ROW or ROWCOL?
Vishal
On Fri, Mar 11, 2011 at 11:44 AM, Lars George lars.geo...@gmail.com wrote:
Hi,
If you expect a lot of misses with that approach, enable Bloom filters
on the second table so that lookups of missing rows are fast.
Lars
I'd suggest ROWCOL, because you have many columns (column qualifiers) to
match against in your second table.
-Usman
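A conceptual model of what the two settings index, using exact sets in place of real Bloom filters (so no false positives here): a ROW filter records row keys only, and a ROWCOL filter records row+column pairs. In this model, ROWCOL can only rule out a lookup that names the column, which is the column-qualifier matching described above; a row-only existence check can only be answered by ROW. The function names and the `row/column` encoding are illustrative assumptions, not HBase internals.

```python
from typing import Iterable, Optional, Set, Tuple

Cell = Tuple[str, str]  # (row key, "family:qualifier")


def bloom_entries(cells: Iterable[Cell], bloom_type: str) -> Set[str]:
    """Strings a ROW or ROWCOL filter would be built from (conceptual model only)."""
    if bloom_type == "ROW":
        return {row for row, _ in cells}
    if bloom_type == "ROWCOL":
        return {f"{row}/{col}" for row, col in cells}
    raise ValueError(f"unknown bloom type: {bloom_type}")


def can_rule_out(entries: Set[str], bloom_type: str, row: str, col: Optional[str]) -> bool:
    """True if the filter alone proves the requested row (or row+column) is absent."""
    if bloom_type == "ROW":
        return row not in entries
    if bloom_type != "ROWCOL":
        raise ValueError(f"unknown bloom type: {bloom_type}")
    if col is None:
        # A row-only lookup cannot be checked against row+column entries.
        return False
    return f"{row}/{col}" not in entries
```

So the choice depends on whether the lookups you expect to miss name a specific column qualifier or only a row key.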
You could also do this easily with MapReduce using Pig's HBaseStorage and
either an inner join or an outer join with a filter on null, depending
on whether you want matches or misses, respectively.
On Fri, Mar 11, 2011 at 4:25 PM, Usman Waheed usm...@opera.com wrote:
I suggest it to be ROWCOL because you
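The join semantics behind this suggestion, modeled in Python rather than Pig Latin: dicts keyed by row id stand in for the two relations loaded through HBaseStorage (an assumption for illustration). An inner join keeps the matches; a left outer join pads missing right-hand rows with null (`None` here), and filtering on that null leaves the misses.

```python
from typing import Dict, List, Optional, Tuple


def inner_join(left: Dict[str, str], right: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Rows whose key appears in both relations."""
    return [(k, lv, right[k]) for k, lv in sorted(left.items()) if k in right]


def left_outer_join(left: Dict[str, str],
                    right: Dict[str, str]) -> List[Tuple[str, str, Optional[str]]]:
    """Every left row, with None where the right relation has no matching key."""
    return [(k, lv, right.get(k)) for k, lv in sorted(left.items())]


def matches(left: Dict[str, str], right: Dict[str, str]) -> List[str]:
    return [k for k, _, _ in inner_join(left, right)]


def misses(left: Dict[str, str], right: Dict[str, str]) -> List[str]:
    # Keep only rows whose right-hand side came back null.
    return [k for k, _, rv in left_outer_join(left, right) if rv is None]
```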
Friends,
How do I best compute the intersection of sets of row ids?
Suppose I have two tables with similar row ids.
How can I get the row ids present in one and not in the other?
Do things get better if I store the row ids as values in some qualifier, or as
the qualifier itself?
I hope the question is not too
You mean like write a map-reduce program that joins the key sets and outputs
what you want?
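One way such a program could look, sketched as a reduce-side join with the MapReduce phases written as plain Python functions (hypothetical names, no Hadoop API): the map phase tags each row key with its source table, the shuffle groups tags by key, and the reducer keeps keys based on which tables they came from.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def map_phase(table_name: str, row_keys: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Emit (row key, source tag) for every row of one table."""
    for row in row_keys:
        yield row, table_name


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group source tags by row key, as the MapReduce shuffle would."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for row, tag in pairs:
        grouped[row].add(tag)
    return grouped


def reduce_phase(grouped: Dict[str, Set[str]], keep: Set[str]) -> List[str]:
    """Emit the row keys whose set of source tables is exactly `keep`."""
    return sorted(row for row, tags in grouped.items() if tags == keep)
```

Passing `keep={"A", "B"}` gives the intersection, and `keep={"A"}` gives the keys present only in table A.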