You could also do this easily with MapReduce, using Pig's HBaseStorage
and either an inner join or an outer join with a filter on null,
depending on whether you want matches or misses, respectively.
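
Here is a minimal, in-memory Java sketch of those join semantics. It assumes each table has already been loaded as a map from row id to some column value, and the class and method names are hypothetical. It does not call Pig or HBase. An inner join keeps only the row ids found in both tables. A left outer join keeps every row of the first table and leaves the right side null when the second table has no such row id, so filtering on that null yields the misses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the join semantics only: nothing here calls Pig or HBase.
// Each map is a hypothetical stand-in for (row id -> column value) loaded from a table.
public class RowIdJoins {

    // One joined tuple; rightValue is null when the right table has no such row id.
    static final class Joined {
        final String rowId;
        final String leftValue;
        final String rightValue;

        Joined(String rowId, String leftValue, String rightValue) {
            this.rowId = rowId;
            this.leftValue = leftValue;
            this.rightValue = rightValue;
        }
    }

    // Left outer join on row id: every left row is kept. Values are assumed non-null,
    // so a null rightValue can only mean "no matching row on the right".
    static List<Joined> leftOuterJoin(Map<String, String> left, Map<String, String> right) {
        List<Joined> out = new ArrayList<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            out.add(new Joined(e.getKey(), e.getValue(), right.get(e.getKey())));
        }
        return out;
    }

    // Inner join on row id: only row ids present in both tables survive.
    static List<Joined> innerJoin(Map<String, String> left, Map<String, String> right) {
        List<Joined> out = new ArrayList<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            String r = right.get(e.getKey());
            if (r != null) {
                out.add(new Joined(e.getKey(), e.getValue(), r));
            }
        }
        return out;
    }

    // Matches (the intersection): row ids produced by the inner join.
    static List<String> matches(Map<String, String> left, Map<String, String> right) {
        List<String> ids = new ArrayList<>();
        for (Joined j : innerJoin(left, right)) {
            ids.add(j.rowId);
        }
        return ids;
    }

    // Misses (left minus right): outer join, then keep rows whose right side is null.
    static List<String> misses(Map<String, String> left, Map<String, String> right) {
        List<String> ids = new ArrayList<>();
        for (Joined j : leftOuterJoin(left, right)) {
            if (j.rightValue == null) {
                ids.add(j.rowId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // The example from the thread: {1, 2, 3} versus {2, 3, 4}.
        Map<String, String> a = new TreeMap<>();
        a.put("1", "x");
        a.put("2", "x");
        a.put("3", "x");
        Map<String, String> b = new TreeMap<>();
        b.put("2", "y");
        b.put("3", "y");
        b.put("4", "y");
        System.out.println("matches: " + matches(a, b)); // prints "matches: [2, 3]"
        System.out.println("misses: " + misses(a, b));   // prints "misses: [1]"
    }
}
```

In a real job, both sides would come from HBaseStorage loads, and the join would run as MapReduce. The sketch only shows why the two join types give matches and misses, respectively.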


On Fri, Mar 11, 2011 at 4:25 PM, Usman Waheed <usm...@opera.com> wrote:
> I suggest ROWCOL, because you have many columns to match against in
> your second table (the column qualifiers).
>
> -Usman
>
>> Should the Bloom filter be ROW or ROWCOL?
>>
>> Vishal
>>
>> On Fri, Mar 11, 2011 at 11:44 AM, Lars George <lars.geo...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> If you expect a lot of misses with that approach, enable Bloom
>>> filters on the second table so that lookups of misses are fast.
>>>
>>> Lars
>>>
>>> On Mar 11, 2011, at 9:44, Amandeep Khurana <ama...@gmail.com> wrote:
>>>
>>> > You can scan through one table and check whether the other one has
>>> > those row ids or not.
>>> >
>>> > On Thu, Mar 10, 2011 at 8:08 PM, Vishal Kapoor <vishal.kapoor...@gmail.com> wrote:
>>> >
>>> >> Friends,
>>> >> how do I best compute the intersection of sets of row ids?
>>> >> Suppose I have two tables with similar row ids: how can I get the row
>>> >> ids present in one and not in the other?
>>> >> Do things get better if I store the row ids as values in some
>>> >> qualifier, or as the qualifier itself?
>>> >> I hope the question is not too confusing...
>>> >>
>>> >> The intersection of {1, 2, 3} and {2, 3, 4} is {2, 3}.
>>> >> Here {1, 2, 3} are row ids from one table, while {2, 3, 4} may come
>>> >> from the other table as qualifiers in some row.
>>> >>
>>> >> thanks,
>>> >> Vishal
>>> >>
>>>
>
>
>
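
Here is a plain-Java sketch of the scan-and-probe approach with a Bloom filter in front of the lookups. HBase keeps its Bloom filters inside the store files and consults them on the server during a Get once they are enabled on the column family (BLOOMFILTER => 'ROW' or 'ROWCOL'). The hypothetical BloomFilter class here only makes that mechanism visible, and the in-memory sets stand in for the two tables. A Bloom filter has no false negatives, so a "definitely absent" answer settles a miss without touching the table. A "possibly present" answer still needs a real lookup because of false positives. This filter is keyed on the row id alone, which corresponds to a ROW filter. If the ids were stored as qualifiers in one row, the key would be row plus qualifier, which is what ROWCOL indexes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Scan-and-probe sketch: scan the row ids of table A, probe table B for each one.
// The sets stand in for the two tables; BloomFilter is a hypothetical minimal
// filter, not HBase's internal implementation.
public class ScanAndProbe {

    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // 32-bit FNV-1a over the UTF-8 bytes, used as the second base hash.
        private static int fnv1a(String key) {
            int h = 0x811C9DC5;
            for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
                h ^= (b & 0xFF);
                h *= 0x01000193;
            }
            return h;
        }

        // Double hashing: position i = h1 + i * h2, reduced to a non-negative bit index.
        private int position(String key, int i) {
            return Math.floorMod(key.hashCode() + i * fnv1a(key), numBits);
        }

        void add(String key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(position(key, i));
            }
        }

        // false = definitely absent; true = possibly present (false positives are possible).
        boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(position(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    static final class Result {
        final List<String> matches = new ArrayList<>(); // in A and in B
        final List<String> misses = new ArrayList<>();  // in A but not in B
        int lookups = 0; // probes that had to go to table B itself
    }

    static Result probe(Iterable<String> scanOfA, Set<String> tableB, BloomFilter bloomB) {
        Result r = new Result();
        for (String rowId : scanOfA) {
            if (!bloomB.mightContain(rowId)) {
                r.misses.add(rowId); // settled by the filter alone, no lookup
                continue;
            }
            r.lookups++; // possibly present: confirm with a real lookup
            if (tableB.contains(rowId)) {
                r.matches.add(rowId);
            } else {
                r.misses.add(rowId); // the filter gave a false positive
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Set<String> tableA = new TreeSet<>(Arrays.asList("1", "2", "3"));
        Set<String> tableB = new TreeSet<>(Arrays.asList("2", "3", "4"));
        BloomFilter bloomB = new BloomFilter(1024, 3);
        for (String id : tableB) {
            bloomB.add(id);
        }
        Result r = probe(tableA, tableB, bloomB);
        System.out.println("matches: " + r.matches + ", misses: " + r.misses
                + ", lookups: " + r.lookups);
    }
}
```

The matches and misses come out exact whatever the filter answers, because every "possibly present" answer is confirmed against table B. The filter only cuts the number of lookups, which is why it pays off when misses are common.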
