Re: intersection of row ids

2011-03-13 Thread Jesse Daniels
Has anyone tried the zig-zag merge join algorithm that Google uses to do something similar with their AppEngine data store (BigTable)? It's described here starting on slide 29: http://www.scribd.com/doc/16952419/Building-scalable-complex-apps-on-App-Engine
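A minimal sketch of the zig-zag idea, assuming each table's row ids are available as a sorted list of unique keys. The function name `zigzag_intersect` and the in-memory lists are illustrative assumptions, not HBase or App Engine API; `bisect_left` stands in for a scanner seek to the first key at or after a target.

```python
from bisect import bisect_left


def zigzag_intersect(a, b):
    """Intersect two sorted lists of unique row keys by leapfrogging."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Key present on both sides: emit it and advance both cursors.
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Side a is behind: seek it forward to the first key >= b[j].
            i = bisect_left(a, b[j], i)
        else:
            # Side b is behind: seek it forward to the first key >= a[i].
            j = bisect_left(b, a[i], j)
    return out
```

Each side jumps ahead to the other side's current key rather than stepping one row at a time, so long runs of non-matching keys are skipped in a single seek.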

Re: intersection of row ids

2011-03-13 Thread Ted Dunning
Well, since you can start iterating from any point, you can just do a map-reduce over the larger table. In each mapper, on the first call, initialize a scanner into the smaller table to start with the key that you get from the larger table. Each time you get a sequential key from the master

Re: intersection of row ids

2011-03-11 Thread Amandeep Khurana
You can scan through one table and see if the other one has those rowids or not. On Thu, Mar 10, 2011 at 8:08 PM, Vishal Kapoor vishal.kapoor...@gmail.com wrote: Friends, how do I best achieve intersection of sets of row ids suppose I have two tables with similar row ids how can I get the row
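The scan-and-check approach can be sketched roughly as follows. This is an illustration under assumptions: `split_by_membership` is a hypothetical helper, and a Python `set` stands in for a per-row existence check against the second table.

```python
def split_by_membership(scan_keys, lookup_keys):
    """Scan one key set and check each key against the other.

    Returns (hits, misses): keys found in both sets, and keys found
    only in the scanned set.
    """
    # Stand-in for a point lookup against the second table (assumption).
    lookup = set(lookup_keys)
    hits, misses = [], []
    for key in scan_keys:
        if key in lookup:
            hits.append(key)
        else:
            misses.append(key)
    return hits, misses
```

`hits` is the intersection; `misses` is the set of row ids present in the scanned table and not in the other.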

Re: intersection of row ids

2011-03-11 Thread Lars George
Hi, If you expect a lot of misses with that approach then enable bloom filters on the second table for fast lookups of misses. Lars On Mar 11, 2011, at 9:44, Amandeep Khurana ama...@gmail.com wrote: You can scan through one table and see if the other one has those rowids or not. On Thu,
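To illustrate why a bloom filter speeds up misses, here is a toy filter in plain Python. It is a sketch of the general data structure, not HBase's implementation; the class name, bit count, and hash scheme are assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means the key is definitely absent, so the real lookup can be skipped.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )
```

A negative answer from `might_contain` is definitive, so a lookup for a missing row id can return without reading the underlying data; only "maybe present" answers need the real read.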

Re: intersection of row ids

2011-03-11 Thread Vishal Kapoor
Should the Bloom filter be ROW or ROWCOL? Vishal On Fri, Mar 11, 2011 at 11:44 AM, Lars George lars.geo...@gmail.com wrote: Hi, If you expect a lot of misses with that approach then enable bloom filters on the second table for fast lookups of misses. Lars On Mar 11, 2011, at 9:44,

Re: intersection of row ids

2011-03-11 Thread Usman Waheed
I'd suggest ROWCOL, because you have many columns (column qualifiers) to match against in your second table. -Usman Should the Bloom filter be ROW or ROWCOL? Vishal On Fri, Mar 11, 2011 at 11:44 AM, Lars George lars.geo...@gmail.com wrote: Hi, If you expect a lot of misses with

Re: intersection of row ids

2011-03-11 Thread Bill Graham
You could also do this with MR easily using Pig's HBaseStorage and either an inner join or an outer join with a filter on null, depending on whether you want matches or misses, respectively. On Fri, Mar 11, 2011 at 4:25 PM, Usman Waheed usm...@opera.com wrote: I suggest it to be ROWCOL because you

intersection of row ids

2011-03-10 Thread Vishal Kapoor
Friends, how do I best achieve intersection of sets of row ids? Suppose I have two tables with similar row ids: how can I get the row ids present in one and not in the other? Do things get better if I have row ids as values in some qualifier, or as the qualifier itself? I hope the question is not too

Re: intersection of row ids

2011-03-10 Thread Ted Dunning
You mean like write a map-reduce program that joins the key sets and outputs what you want? On Thu, Mar 10, 2011 at 8:08 PM, Vishal Kapoor vishal.kapoor...@gmail.com wrote: Friends, how do I best achieve intersection of sets of row ids suppose I have two tables with similar row ids how can I