Hi,

You don’t want to do it. (Think about what you’re asking for …)
You would be better off with secondary indexing, so that you can hit your index to get your subset of rows and then use map/reduce to process the result set.

> On Apr 23, 2015, at 2:18 PM, ayyajnam nahdravhbuhs <[email protected]> wrote:
>
> Hi,
>
> I have been toying with the idea of a predictive cache for batch HBase jobs.
>
> Traditionally speaking, Hadoop is a batch processing framework. We use
> HBase as a data store for a number of batch jobs that run on Hadoop.
> Depending on the job that is run, and the way the data is laid out, HBase
> might perform great for some of the jobs but might result in performance
> bottlenecks for others. This can specifically be seen in cases where the
> same table is used as an input for different jobs with different access
> patterns.
> HBase currently supports various cache implementations (Bucket, LRU,
> Combined), but none of these mechanisms is job aware. A job-aware cache
> should be able to determine the best data to cache based on data
> requests from previous runs of the job. The learning process can happen in
> the background and will require access information from multiple runs of
> the job. The process should produce a per-job output that can be used by
> a new predictive caching algorithm. When a job is then run with this
> predictive cache, it can query the learning results when it has to decide
> which block to evict or load.
>
> Just wanted to check if anyone knows of any related work in this area.
>
> Thoughts and suggestions welcome.
>
> Thanks,
> Ayya

The opinions expressed here are mine; while they may reflect a cognitive thought, that is purely accidental. Use at your own risk.

Michael Segel
michael_segel (AT) hotmail.com
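The job-aware eviction idea described in the quoted message can be sketched in a few lines. This is a toy illustration only, not HBase code: the `JobAwareCache` class, its `prior_counts` input (a per-job map of block IDs to access counts collected from earlier runs, i.e. the "learning result"), and the `loader` callback are all hypothetical names chosen for this sketch. The eviction policy simply picks the cached block that history says was touched least often.

```python
from collections import OrderedDict

class JobAwareCache:
    """Toy block cache that consults per-job access counts from prior
    runs of the same job when choosing an eviction victim (a sketch of
    the predictive idea above; not an HBase API)."""

    def __init__(self, capacity, prior_counts):
        self.capacity = capacity
        self.prior = prior_counts   # block_id -> access count from earlier runs
        self.blocks = OrderedDict() # block_id -> cached data

    def get(self, block_id, loader):
        # Hit: serve from cache without touching the loader.
        if block_id in self.blocks:
            return self.blocks[block_id]
        data = loader(block_id)
        if len(self.blocks) >= self.capacity:
            # Evict the block that previous runs accessed least often,
            # i.e. the one the "learning result" predicts won't recur.
            victim = min(self.blocks, key=lambda b: self.prior.get(b, 0))
            del self.blocks[victim]
        self.blocks[block_id] = data
        return data
```

For example, with `prior_counts = {"a": 10, "b": 1, "c": 5}` and a capacity of two, loading blocks `a`, `b`, then `c` evicts `b`, whereas a plain LRU would have evicted `a`. A real implementation would of course have to hook into the block cache's eviction path and keep the learned counts fresh across runs.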
