Ok, so there is no way around the FuzzyRowFilter checking every single row in the table, correct? If so, what is a valid use case for that filter?
Ok, so salt to a low enough prefix that scanning stays reasonable. Our client for accessing these tables is a Rails (not JRuby) application, so we are stuck with either the Thrift or Rails client. Can either of these perform multiple gets/scans?

On Sat, May 3, 2014 at 1:10 AM, Adrien Mogenet <[email protected]> wrote:

> Using 4 random bytes you'll get 2^32 possibilities; thus your data can be
> split enough among all the possible regions, but you won't be able to
> easily benefit from distributed scans to gather what you want.
>
> Let's say you want to split (time+login) with a salted key and you expect to
> be able to retrieve events from 20140429 pretty fast. Then I would split
> input data among 10 "spans", spread over 10 regions and 10 RS (ie: `$random
> % 10'). To retrieve ordered data, I would parallelize Scans over the 10
> span groups (<00>-20140429, <01>-20140429...) and merge-sort everything
> until I've got all the expected results.
>
> So in terms of performance this looks "a little bit" faster than your 2^32
> randomization.
>
>
> On Fri, May 2, 2014 at 10:09 PM, Software Dev
> <[email protected]> wrote:
>
>> I'm planning to work with FuzzyRowFilter to avoid hot spotting of our
>> time series data (20140501, 20140502...). We can prefix all of the
>> keys with 4 random bytes and then just skip these during scanning. Is
>> that correct? This *seems* like it will work, but I'm questioning the
>> performance of this even if it does work.
>>
>> Also, is this available via the rest client, shell and/or thrift client?
>>
>> Also, is there a FuzzyColumn equivalent of this feature?
>>
>
>
>
> --
> Adrien Mogenet
> http://www.borntosegfault.com
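
For reference, the 10-bucket scheme Adrien describes (one scan per salt prefix, then a merge-sort of the results) could be sketched roughly as below. This is only an in-memory illustration, not HBase client code: the sorted Python list stands in for the table, and `salted_key`, `scan_prefix`, and `scan_day` are hypothetical names. It also assumes the salt is derived from a hash of the login instead of `$random % 10`, purely so the example is deterministic.

```python
import bisect
import heapq
import zlib

NUM_BUCKETS = 10  # assumption: matches the 10 "spans" in the quoted reply


def salted_key(day, login):
    """Build '<bucket>-<day>-<login>' with a two-digit salt bucket.

    Assumption: the bucket comes from a CRC32 of the login rather than a
    random number, so the same event always maps to the same bucket.
    """
    bucket = zlib.crc32(login.encode("utf-8")) % NUM_BUCKETS
    return f"{bucket:02d}-{day}-{login}"


def scan_prefix(sorted_keys, prefix):
    """Stand-in for a prefix scan: yield keys starting with `prefix`, in order."""
    start = bisect.bisect_left(sorted_keys, prefix)
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        yield key


def scan_day(sorted_keys, day):
    """Run one scan per salt bucket and merge-sort them on the unsalted key."""
    scans = [
        scan_prefix(sorted_keys, f"{bucket:02d}-{day}-")
        for bucket in range(NUM_BUCKETS)
    ]
    # Each scan is already ordered within its bucket, so a k-way merge on the
    # key with the 3-character salt prefix stripped restores global order.
    return list(heapq.merge(*scans, key=lambda k: k[3:]))


if __name__ == "__main__":
    events = [
        ("20140429", "bob"),
        ("20140428", "alice"),
        ("20140429", "alice"),
        ("20140430", "carol"),
        ("20140429", "dave"),
    ]
    # HBase stores rows sorted by key; a sorted list mimics that here.
    table = sorted(salted_key(day, login) for day, login in events)
    for key in scan_day(table, "20140429"):
        print(key)
```

In a real client the 10 scans would be issued concurrently (one per prefix) and the merge done on the client side; the number of scans grows with the number of buckets, which is why 10 buckets is far cheaper to read back than 2^32 salt values.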
