Edit: I should have mentioned that my access pattern is a bit different. I'll need to scan between dates (20140101 -> 20140501), not fetch an individual date. My table is actually a bunch of increments, so as of right now there is only 1 row key per timeframe.
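
To make sure I understand the salting suggestion for a date-range scan, here is a rough sketch in plain Python. It does not use any HBase client; the "table" is just a dict standing in for sorted row keys. The names (NUM_BUCKETS, salted_key, scan_ranges, merged_scan) are made up for illustration. One assumption that differs from the `$random % 10` idea: the salt is derived from a CRC of the date, so that every increment for a given timeframe lands on the same row.

```python
import heapq
import zlib

NUM_BUCKETS = 10  # assumption: the "10 spans" suggested in the thread


def salted_key(date: str) -> str:
    """Prefix a yyyymmdd key with a two-digit bucket derived from the key.

    Assumption: a deterministic salt (CRC32 of the date) instead of a random
    one, so repeated increments for the same date hit the same row.
    """
    bucket = zlib.crc32(date.encode("ascii")) % NUM_BUCKETS
    return f"{bucket:02d}-{date}"


def scan_ranges(start_date: str, stop_date: str):
    """Return one (start_row, stop_row) pair per bucket.

    The stop row is treated as exclusive, the way an HBase scan treats it.
    """
    return [
        (f"{b:02d}-{start_date}", f"{b:02d}-{stop_date}")
        for b in range(NUM_BUCKETS)
    ]


def merged_scan(table: dict, start_date: str, stop_date: str):
    """Simulate one scan per bucket, then merge the results by date."""
    keys = sorted(table)  # HBase keeps row keys sorted; emulate that here
    per_bucket = []
    for start_row, stop_row in scan_ranges(start_date, stop_date):
        # Strip the "NN-" salt so rows can be ordered by date again.
        rows = [
            (k.split("-", 1)[1], table[k])
            for k in keys
            if start_row <= k < stop_row
        ]
        per_bucket.append(rows)
    # Each per-bucket list is already sorted by date, so a k-way merge suffices.
    return list(heapq.merge(*per_bucket))
```

With a real client, each of the 10 ranges would be its own scan (ideally issued in parallel), and only the final merge would happen in the application. Because the stop row is exclusive, a scan meant to include 20140501 would need a stop date of 20140502.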

On Sat, May 3, 2014 at 8:39 AM, Software Dev <[email protected]> wrote:
> Ok, so there is no way around the FuzzyRowFilter checking every single
> row in the table, correct? If so, what is a valid use case for that
> filter?
>
> Ok, so salt to a low enough prefix that makes scanning reasonable. Our
> client for accessing these tables is a Rails (not JRuby) application
> so we are stuck with either the Thrift or Rails client. Can either of
> these perform multiple gets/scans?
>
> On Sat, May 3, 2014 at 1:10 AM, Adrien Mogenet <[email protected]>
> wrote:
>> Using 4 random bytes you'll get 2^32 possibilities; thus your data can be
>> split enough among all the possible regions, but you won't be able to
>> easily benefit from distributed scans to gather what you want.
>>
>> Let's say you want to split (time+login) with a salted key and you expect to
>> be able to retrieve events from 20140429 pretty fast. Then I would split
>> input data among 10 "spans", spread over 10 regions and 10 RS (i.e. `$random
>> % 10'). To retrieve ordered data, I would parallelize Scans over the 10
>> span groups (<00>-20140429, <01>-20140429...) and merge-sort everything
>> until I've got all the expected results.
>>
>> So in terms of performance this looks "a little bit" faster than your 2^32
>> randomization.
>>
>>
>> On Fri, May 2, 2014 at 10:09 PM, Software Dev
>> <[email protected]> wrote:
>>
>>> I'm planning to work with FuzzyRowFilter to avoid hot spotting of our
>>> time series data (20140501, 20140502...). We can prefix all of the
>>> keys with 4 random bytes and then just skip these during scanning. Is
>>> that correct? This *seems* like it will work but I'm questioning the
>>> performance of this even if it does work.
>>>
>>> Also, is this available via the rest client, shell and/or thrift client?
>>>
>>> Also, is there a FuzzyColumn equivalent of this feature?
>>>
>>
>> --
>> Adrien Mogenet
>> http://www.borntosegfault.com
