You can read more details on binary sorting here:
http://brunodumon.wordpress.com/2010/02/17/building-indexes-using-hbase-mapping-strings-numbers-and-dates-onto-bytes/
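
As a rough illustration of the binary sorting idea: HBase compares row keys byte by byte as unsigned values, so a number only sorts correctly inside a key if its byte encoding preserves numeric order. Below is a minimal sketch in plain Python (no HBase client involved). The helper names `encode_long` and `decode_long` are made up for this example, and the offset trick shown is one common approach, not necessarily the exact scheme the linked post uses.

```python
import struct

OFFSET = 1 << 63  # shifts the signed 64-bit range onto the unsigned range


def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as 8 big-endian bytes whose
    unsigned lexicographic order matches numeric order."""
    if not -OFFSET <= n < OFFSET:
        raise ValueError("value does not fit in a signed 64-bit integer")
    # Adding 2**63 is equivalent to flipping the sign bit, so negative
    # numbers end up sorting before positive ones.
    return struct.pack(">Q", n + OFFSET)


def decode_long(b: bytes) -> int:
    """Inverse of encode_long."""
    return struct.unpack(">Q", b)[0] - OFFSET


# Plain decimal strings sort lexicographically, not numerically:
naive = sorted(["9", "10"])  # "10" comes before "9"

# The encoded form keeps numeric order under byte-wise comparison:
values = [5, -1, 0, -300, 2**40]
by_bytes = sorted(values, key=encode_long)
```

Python compares `bytes` objects lexicographically as unsigned values, which is the same ordering rule HBase applies to row keys, so `by_bytes` comes out in numeric order. The zero-padded `00days` ... `10days` prefixes in the quoted keys below serve the same purpose for string keys.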

2011/1/8 Jack Levin <[email protected]>

> Basic problem described:
>
> user uploads 1 image and creates some text -10 days ago, then creates 1000
> text messages on between 9 days ago and today:
>
>
> row key    | fm:type --> value
>
> 00days:uid | type:text --> text_id
> .
> .
> 09days:uid | type:text --> text_id
>
> 10days:uid | type:photo --> URL
>            | type:text --> text_id
>
>
> Skip all the way to 10days:uid row, without reading 00days:id - 09:uid
> rows.
> Ideally we do not want to read all 1000 entries that have _only_ text. We
> want to get to last entry in the most efficient way possible.
>
>
> -Jack
>
>
> On Sat, Jan 8, 2011 at 11:43 AM, Stack <[email protected]> wrote:
> > Strike that. This is a Scan, so can't do blooms + filter. Sorry.
> > Sounds like a coprocessor then. You'd have your query 'lean' on the
> > column that you know has the lesser items and then per item, you'd do
> > a get inside the coprocessor against the column of many entries. The
> > get would go via blooms.
> >
> > St.Ack
> >
> >
> > On Sat, Jan 8, 2011 at 11:39 AM, Stack <[email protected]> wrote:
> >> On Sat, Jan 8, 2011 at 11:35 AM, Jack Levin <[email protected]> wrote:
> >>> Yes, we thought about using filters, the issue is, if one family
> >>> column has 1ml values, and second family column has 10 values at the
> >>> bottom, we would end up scanning and filtering 99990 records and
> >>> throwing them away, which seems inefficient.
> >>
> >> Blooms+filters?
> >> St.Ack
