Re: Questions on FuzzyRowFilter

Software Dev Sat, 03 May 2014 08:41:25 -0700

OK, so there is no way around the FuzzyRowFilter checking every single
row in the table, correct? If so, what is a valid use case for that
filter?


OK, so salt with a small enough set of prefixes that scanning stays
reasonable. Our client for accessing these tables is a Rails (not JRuby)
application, so we are stuck with either the Thrift or REST client. Can
either of these perform multiple gets/scans?



On Sat, May 3, 2014 at 1:10 AM, Adrien Mogenet <[email protected]> wrote:
> Using 4 random bytes you'll get 2^32 possibilities; thus your data can be
> split enough among all the possible regions, but you won't be able to
> easily benefit from distributed scans to gather what you want.
>
> Let's say you want to split (time+login) with a salted key and you expect
> to be able to retrieve events from 20140429 pretty fast. Then I would split
> the input data among 10 "spans", spread over 10 regions and 10 RS (i.e.
> `$random % 10`). To retrieve ordered data, I would parallelize Scans over
> the 10 span groups (<00>-20140429, <01>-20140429...) and merge-sort
> everything until I've got all the expected results.
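
To make the span idea above concrete, here is a minimal sketch in plain Python. It does not talk to HBase: the table is just a set of strings, and `salted_key`, `scan_bucket`, and `scan_day` are hypothetical helper names chosen for illustration. The key layout `<bucket>-<day>-<login>` is an assumption based on the `<00>-20140429` example. A real client would issue the ten scans in parallel; here they run one after another.

```python
import heapq
import random

NUM_BUCKETS = 10  # assumption: the 10 "spans" from the example above


def salted_key(day, login, bucket=None):
    """Build '<bucket>-<day>-<login>' with a two-digit bucket prefix."""
    if bucket is None:
        bucket = random.randrange(NUM_BUCKETS)  # plays the role of $random % 10
    return "%02d-%s-%s" % (bucket, day, login)


def scan_bucket(table, bucket, day):
    """Stand-in for one prefix scan over '<bucket>-<day>' (not a real HBase call)."""
    prefix = "%02d-%s" % (bucket, day)
    # A real scan returns rows in key order; sorted() mimics that here.
    return [k for k in sorted(table) if k.startswith(prefix)]


def scan_day(table, day):
    """Scan every bucket for one day and merge on the unsalted part of the key."""
    per_bucket = [scan_bucket(table, b, day) for b in range(NUM_BUCKETS)]
    # Within a bucket all keys share the same 'NN-' prefix, so each list is
    # already sorted by k[3:] and heapq.merge can combine them lazily.
    return list(heapq.merge(*per_bucket, key=lambda k: k[3:]))
```

With 10 buckets a day's worth of data costs 10 short scans plus a merge, whereas a 4-byte random prefix leaves no small set of prefixes to scan.
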
>
> So in terms of performance this looks "a little bit" faster than your 2^32
> randomization.
>
>
> On Fri, May 2, 2014 at 10:09 PM, Software Dev <[email protected]> wrote:
>
>> I'm planning to work with FuzzyRowFilter to avoid hot spotting of our
>> time series data (20140501, 20140502...).  We can prefix all of the
>> keys with 4 random bytes and then just skip these during scanning. Is
>> that correct? This *seems* like it will work, but I'm questioning the
>> performance of this even if it does work.
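
For reference, the "skip the 4 random bytes" idea can be modelled in a few lines of Python. This is a simplified sketch of the matching rule only, not HBase code: `fuzzy_match`, `PATTERN`, and `MASK` are hypothetical names. It follows the FuzzyRowFilter convention that a mask byte of 0 means "this position must match" and 1 means "any byte is accepted", and it assumes rows longer than the pattern match on their leading bytes.

```python
def fuzzy_match(row_key, pattern, mask):
    """Return True if row_key matches pattern at every position where mask is 0."""
    if len(row_key) < len(pattern):
        return False
    # zip stops at the shortest input, so only the leading len(pattern)
    # bytes of row_key are compared.
    return all(m == 1 or r == p for r, p, m in zip(row_key, pattern, mask))


# 4 placeholder bytes for the random prefix, followed by the fixed date.
PATTERN = b"\x00\x00\x00\x00" + b"20140501"
# 1 = don't care (the random prefix), 0 = must match (the date).
MASK = [1] * 4 + [0] * 8
```

For example, `fuzzy_match(b"\x8a\x01\xff\x10" + b"20140501-bob", PATTERN, MASK)` accepts the row whatever its first four bytes are, while a row for 20140502 is rejected.
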
>>
>> Also, is this available via the REST client, shell and/or Thrift client?
>>
>> Also, is there a FuzzyColumn equivalent of this feature?
>>
>
>
>
> --
> Adrien Mogenet
> http://www.borntosegfault.com
