Hi,

You don’t want to do it. (Think about what you’re asking for …)
You would be better off with secondary indexing, so that you can hit your index to get your subset of rows and then use map/reduce to process the result set.

> On Apr 23, 2015, at 2:18 PM, ayyajnam nahdravhbuhs <[email protected]> wrote:
>
> Hi,
>
> I have been toying with the idea of a predictive cache for batch HBase jobs.
>
> Traditionally speaking, Hadoop is a batch processing framework. We use
> HBase as a data store for a number of batch jobs that run on Hadoop.
> Depending on the job that is run, and the way the data is laid out, HBase
> might perform great for some of the jobs but might result in performance
> bottlenecks for others. This can specifically be seen in cases where the
> same table is used as an input for different jobs with different access
> patterns.
> HBase currently supports various cache implementations (Bucket, LRU,
> Combined), but none of these mechanisms is job aware. A job-aware cache
> should be able to determine the best data to cache based on data
> requests from previous runs of the job. The learning process can happen in
> the background and will require access information from multiple runs of
> the job. The process should produce a per-job output that can be used by
> a new predictive caching algorithm. When a job is then run with this
> predictive cache, it can query the learning results when it has to decide
> which block to evict or load.
>
> Just wanted to check if anyone knows of any related work in this area.
>
> Thoughts and suggestions welcome.
>
> Thanks,
> Ayya

The opinions expressed here are mine; while they may reflect a cognitive thought, that is purely accidental. Use at your own risk.

Michael Segel
michael_segel (AT) hotmail.com
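The job-aware eviction idea described in the quoted message can be sketched in a few lines. This is a toy illustration only, not HBase code: the `JobAwareCache` class, its `prior_counts` input (a per-job map of block IDs to access counts collected from earlier runs, i.e. the "learning result"), and the `loader` callback are all hypothetical names chosen for this sketch. The eviction policy simply picks the cached block that history says was touched least often.

```python
from collections import OrderedDict

class JobAwareCache:
    """Toy block cache that consults per-job access counts from prior
    runs of the same job when choosing an eviction victim (a sketch of
    the predictive idea above; not an HBase API)."""

    def __init__(self, capacity, prior_counts):
        self.capacity = capacity
        self.prior = prior_counts   # block_id -> access count from earlier runs
        self.blocks = OrderedDict() # block_id -> cached data

    def get(self, block_id, loader):
        # Hit: serve from cache without touching the loader.
        if block_id in self.blocks:
            return self.blocks[block_id]
        data = loader(block_id)
        if len(self.blocks) >= self.capacity:
            # Evict the block that previous runs accessed least often,
            # i.e. the one the "learning result" predicts won't recur.
            victim = min(self.blocks, key=lambda b: self.prior.get(b, 0))
            del self.blocks[victim]
        self.blocks[block_id] = data
        return data
```

For example, with `prior_counts = {"a": 10, "b": 1, "c": 5}` and a capacity of two, loading blocks `a`, `b`, then `c` evicts `b`, whereas a plain LRU would have evicted `a`. A real implementation would of course have to hook into the block cache's eviction path and keep the learned counts fresh across runs.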
