Yes, we thought about using filters. The issue is that if one column
family has 1 million values and a second column family has only 10 values
at the bottom, we would end up scanning and filtering 999,990 records and
throwing them away, which seems inefficient. The only solution we see is
to break the tables apart and do a pseudo-JOIN by row key in the
application itself. There is no contrib package that allows a merged
index of multiple families or columns, is there?

-Jack
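A minimal in-memory sketch of the trade-off above, assuming plain Python dicts stand in for HBase tables (no HBase client API is used). `filter_scan` models a filtered scan over one combined table, and `pseudo_join` models the split-table approach: walk the small table and do one point lookup, the role a Get would play, in the large one. Both names are hypothetical, and the sizes are scaled down from the 1 million rows in the thread.

```python
# Hypothetical in-memory model: plain dicts stand in for HBase tables.
# filter_scan and pseudo_join are illustrative names, not HBase APIs.
from typing import Dict, List, Set, Tuple


def filter_scan(
    table: Dict[str, Set[str]], fam_a: str, fam_b: str
) -> Tuple[List[str], int]:
    """One combined table, filter style: visit every row (in row-key order,
    as a scan would) and keep rows that hold both families.

    Returns (matching row keys, rows examined)."""
    matches: List[str] = []
    examined = 0
    for key in sorted(table):
        examined += 1
        if fam_a in table[key] and fam_b in table[key]:
            matches.append(key)
    return matches, examined


def pseudo_join(
    sparse: Dict[str, str], dense: Dict[str, str]
) -> Tuple[List[str], int]:
    """Split tables, application-side join: walk the small table and do one
    point lookup in the large table per row key.

    Returns (matching row keys, lookups performed)."""
    matches: List[str] = []
    lookups = 0
    for key in sorted(sparse):
        lookups += 1
        if key in dense:
            matches.append(key)
    return matches, lookups


if __name__ == "__main__":
    # Scaled down from the 1 million rows in the thread to keep the demo fast.
    n_dense, n_sparse = 100_000, 10
    keys = [f"{i:06d}" for i in range(n_dense)]
    sparse_keys = keys[-n_sparse:]  # the sparse family sits "at the bottom"

    # Layout 1: one table, row key -> set of families present in that row.
    combined = {k: {"generic"} for k in keys}
    for k in sparse_keys:
        combined[k].add("photo")
    hits, examined = filter_scan(combined, "generic", "photo")
    print(len(hits), examined)  # 10 100000

    # Layout 2: two tables sharing row keys, joined in the application.
    generic_table = {k: "1" for k in keys}
    photo_table = {k: "1" for k in sparse_keys}
    hits, lookups = pseudo_join(photo_table, generic_table)
    print(len(hits), lookups)  # 10 10
```

The filtered scan touches every row, while the join does one lookup per sparse row; the cost of the split layout is that the application has to keep the two tables' row keys consistent itself.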

On Sat, Jan 8, 2011 at 11:30 AM, Andrey Stepachev <[email protected]> wrote:
> I don't think that is possible at the scanner level with Bloom filters
> (families are stored in separate files, so
> they are scanned independently).
> But you can use filters to filter out unneeded data.
>
> 2011/1/8 Jack Levin <[email protected]>
>
>> Hello all, I have a scanner question. We have this table:
>>
>> hbase(main):002:0> scan 'mattest'
>> ROW                                          COLUMN+CELL
>>  1                                           column=generic:, timestamp=1294454057618, value=1
>>  1                                           column=photo:, timestamp=1294453830339, value=1
>>  1                                           column=type:, timestamp=1294453812716, value=photo
>>  1                                           column=type:photo, timestamp=1294453884174, value=photo
>>  2                                           column=generic:, timestamp=1294454061156, value=1
>>  2                                           column=type:, timestamp=1294453851757, value=video
>>  2                                           column=type:video, timestamp=1294453877719, value=video
>>  2                                           column=video:, timestamp=1294453842722, value=1
>>
>> We need to run this query:
>>
>> hbase(main):004:0> scan 'mattest', {COLUMNS => ['generic', 'photo']}
>> ROW                                          COLUMN+CELL
>>  1                                           column=generic:, timestamp=1294454057618, value=1
>>  1                                           column=photo:, timestamp=1294453830339, value=1
>>  2                                           column=generic:, timestamp=1294454061156, value=1
>>
>> Note that ['generic', 'photo'] uses an 'OR' operator, not 'AND'.
>> Is it possible to create a scanner that will AND rather than OR?
>> In that case, something like this:
>>
>> scan 'mattest', {COLUMNS => ['generic' AND 'photo']}
>> ROW                                          COLUMN+CELL
>>  1                                           column=generic:, timestamp=1294454057618, value=1
>>  1                                           column=photo:, timestamp=1294453830339, value=1
>>
>> Thanks in advance.
>>
>> -Jack
>>
>
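The OR-versus-AND distinction in the original question can be modelled in a few lines. This is a minimal sketch assuming plain Python dicts hold the two `mattest` rows from the scan output; `scan_or` mimics what `COLUMNS => [...]` returns, and `scan_and` keeps only rows that have cells in every requested family. Both are hypothetical helpers, not HBase shell or client calls.

```python
# Hypothetical model of the 'mattest' rows shown above; plain dicts stand in
# for HBase. scan_or / scan_and are illustrative names, not shell commands.
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, str]]  # row key -> {"family:qualifier": value}
Cell = Tuple[str, str, str]        # (row key, column, value)

MATTEST: Table = {
    "1": {"generic:": "1", "photo:": "1", "type:": "photo", "type:photo": "photo"},
    "2": {"generic:": "1", "type:": "video", "type:video": "video", "video:": "1"},
}


def family_of(column: str) -> str:
    """'photo:' -> 'photo', 'type:video' -> 'type'."""
    return column.split(":", 1)[0]


def cells_in(table: Table, row: str, families: List[str]) -> List[Cell]:
    """Cells of one row that belong to any of the requested families."""
    return [
        (row, col, val)
        for col, val in sorted(table[row].items())
        if family_of(col) in families
    ]


def scan_or(table: Table, families: List[str]) -> List[Cell]:
    """What COLUMNS => [...] does: every cell from ANY requested family."""
    return [cell for row in sorted(table) for cell in cells_in(table, row, families)]


def scan_and(table: Table, families: List[str]) -> List[Cell]:
    """The wanted behaviour: only rows that have cells in EVERY requested family."""
    result: List[Cell] = []
    for row in sorted(table):
        present = {family_of(col) for col in table[row]}
        if all(fam in present for fam in families):
            result.extend(cells_in(table, row, families))
    return result


if __name__ == "__main__":
    print(scan_or(MATTEST, ["generic", "photo"]))
    # [('1', 'generic:', '1'), ('1', 'photo:', '1'), ('2', 'generic:', '1')]
    print(scan_and(MATTEST, ["generic", "photo"]))
    # [('1', 'generic:', '1'), ('1', 'photo:', '1')]
```

Done server side with filters, as Andrey suggests, this row-level check still visits every row. In the Java client, one way to express it is a `FilterList` with `FilterList.Operator.MUST_PASS_ALL` over one `SingleColumnValueFilter` per family, each with `setFilterIfMissing(true)`; treat that as a pointer to verify against your HBase version rather than a tested recipe.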
