Basic problem described:

A user uploads 1 image and creates some text 10 days ago, then creates 1000
text messages between 9 days ago and today:


row key        | fm:type --> value

00days:uid     | type:text  --> text_id
.
.
09days:uid     | type:text  --> text_id

10days:uid     | type:photo --> URL
               | type:text  --> text_id

We want to skip all the way to the 10days:uid row, without reading the
00days:uid - 09days:uid rows.  Ideally we do not want to read all 1000
entries that have _only_ text.  We want to get to the last entry in the
most efficient way possible.
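To make the row-key layout above concrete, here is a small in-memory model (plain Python, not the HBase API; the `uid` value and the `scan` helper are hypothetical stand-ins). HBase keeps rows sorted lexicographically by row key, so if the day offset of the photo row were known, a scan start row would seek straight past the text-only rows; the question in this thread is how to do that when the offset is not known.

```python
import bisect

# Hypothetical in-memory model of the layout above; real HBase stores
# rows sorted lexicographically by row key.
uid = "uid"
rows = {}
for day in range(10):                      # 00days .. 09days: text only
    rows[f"{day:02d}days:{uid}"] = {"type:text": "text_id"}
rows[f"10days:{uid}"] = {"type:photo": "URL", "type:text": "text_id"}

sorted_keys = sorted(rows)

def scan(start_row, limit=1):
    """Model of Scan.setStartRow: seek to the first key >= start_row."""
    i = bisect.bisect_left(sorted_keys, start_row)
    return sorted_keys[i:i + limit]

# Seeking directly to the 10days row touches none of the 00-09days rows.
print(scan(f"10days:{uid}"))  # ['10days:uid']
```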


-Jack




On Sat, Jan 8, 2011 at 11:43 AM, Stack <[email protected]> wrote:
> Strike that.  This is a Scan, so can't do blooms + filter.  Sorry.
> Sounds like a coprocessor then.  You'd have your query 'lean' on the
> column that you know has the lesser items and then per item, you'd do
> a get inside the coprocessor against the column of many entries.  The
> get would go via blooms.
>
> St.Ack
>
>
> On Sat, Jan 8, 2011 at 11:39 AM, Stack <[email protected]> wrote:
>> On Sat, Jan 8, 2011 at 11:35 AM, Jack Levin <[email protected]> wrote:
>>> Yes, we thought about using filters.  The issue is: if one column
>>> family has 1M values, and the second column family has 10 values at
>>> the bottom, we would end up scanning and filtering 999,990 records
>>> and throwing them away, which seems inefficient.
>>
>> Blooms+filters?
>> St.Ack
>>
>
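Stack's "lean on the lesser column" suggestion can be sketched as a small model (plain Python, not coprocessor code; `photo_index` and `text_store` are hypothetical stand-ins for the two column families): iterate only the side with few entries, and for each row found do a point lookup against the large side, which in HBase a Get would serve via bloom filters rather than a scan.

```python
# Hypothetical model of the coprocessor approach: scan the lean side
# (photos), and per row do a point Get against the side with many
# entries (text).  In HBase the Get would be bloom-filter-assisted.
photo_index = {"10days:uid": "URL"}                               # few entries
text_store = {f"{d:02d}days:uid": "text_id" for d in range(11)}   # many entries

def join_lean_side():
    results = []
    for row_key, url in photo_index.items():   # iterate the lean side only
        text = text_store.get(row_key)         # point Get on the large side
        results.append((row_key, url, text))
    return results

print(join_lean_side())  # [('10days:uid', 'URL', 'text_id')]
```

The cost is proportional to the small column's entry count (here, 1 lookup) instead of the large column's (here, 11 rows scanned and mostly discarded).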
