Predictive Caching

ayyajnam nahdravhbuhs Thu, 23 Apr 2015 12:19:16 -0700

Hi,

I have been toying with the idea of a predictive cache for batch HBase jobs.


Traditionally speaking, Hadoop is a batch processing framework. We use
HBase as a data store for a number of batch jobs that run on Hadoop.
Depending on the job that is run and the way the data is laid out, HBase
might perform well for some jobs but become a performance bottleneck for
others. This shows up in particular when the same table is used as input
for different jobs with different access patterns.

HBase currently supports various cache implementations (Bucket, LRU,
Combined), but none of them is job-aware. A job-aware cache should be able
to determine the best data to cache based on the data requests made by
previous runs of the job. The learning process can happen in the
background and will require access information from multiple runs of the
job. It should produce a per-job output that a new predictive caching
algorithm can use. When a job is then run with this predictive cache, the
cache can query the learning results whenever it has to decide which block
to evict or load.
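
To make the idea a bit more concrete, here is a rough sketch in Python of
how such a cache might behave. It is only an illustration under my own
assumptions: it does not use any HBase API, the names learn_profile and
PredictiveCache are made up, the "learning" is just a count of how often
each block was requested in earlier runs, and the cache is assumed to hold
at least one block.

```python
from collections import Counter, OrderedDict


def learn_profile(access_logs):
    """Combine block-access logs from earlier runs of one job into a
    score per block: the total number of times it was requested.
    (Hypothetical helper; a real learner could be far more elaborate.)"""
    profile = Counter()
    for run in access_logs:
        profile.update(run)
    return profile


class PredictiveCache:
    """Toy block cache (hypothetical, not an HBase class) that evicts the
    block the profile scores lowest; ties fall back to LRU order."""

    def __init__(self, capacity, profile):
        self.capacity = capacity      # assumed to be >= 1
        self.profile = profile        # block_id -> score from earlier runs
        self.blocks = OrderedDict()   # block_id -> data, oldest first

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as recently used
            return self.blocks[block_id]
        return None

    def put(self, block_id, data):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
        elif len(self.blocks) >= self.capacity:
            # min() scans oldest-first, so among equal scores the least
            # recently used block is the one evicted.
            victim = min(self.blocks, key=lambda b: self.profile.get(b, 0))
            del self.blocks[victim]
        self.blocks[block_id] = data
```

For example, with a profile built from learn_profile([["a", "b", "a"],
["a", "c"]]) and a capacity of 2, inserting blocks "a", "b" and then "c"
evicts "b" rather than "a", because "a" was requested more often in the
earlier runs.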

Just wanted to check if anyone knows of any related work in this area.

Thoughts and suggestions welcome.

Thanks,
Ayya
