How to scan only Memstore from end point co-processor

Gautam Borah Sun, 31 May 2015 23:17:37 -0700

Hi all,

Here is our use case:


We have a very write-heavy cluster. We also run periodic endpoint
coprocessor jobs, every 10 minutes, that operate on the data written in
the last 10-15 minutes.

Is there a way to query only the MemStore from the endpoint
coprocessor? The periodic job scans for data using a time range. We would
like to implement the following simple logic:

a. If the query time range is within the MemStore's TimeRangeTracker,
query only the MemStore.
b. If the end time of the query time range is within the MemStore's
TimeRangeTracker, but the start time is outside it (a MemStore flush
happened), query both the MemStore and the store files.
c. If both the start time and the end time of the query are outside the
MemStore's TimeRangeTracker, query only the store files.
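
To make the three rules concrete, here is a minimal sketch of the decision
step only, written against plain long timestamps. Everything in it is
hypothetical and not part of HBase: the class ScanTargetChooser, the Target
enum, and the choose() method are illustration names, and memMin/memMax
stand in for whatever the MemStore's TimeRangeTracker reports. How to
obtain those values inside the coprocessor is not shown. Two assumptions
go beyond the rules above: bounds are treated as inclusive, and "files
only" is chosen only when the query range does not overlap the MemStore
range at all, so any case the rules do not cover falls back to scanning
both.

```java
// Hypothetical helper, not an HBase class. Timestamps are epoch millis.
final class ScanTargetChooser {

    enum Target { MEMSTORE_ONLY, FILES_ONLY, BOTH }

    /**
     * Picks what to scan for the query range [queryStart, queryEnd], given
     * the time range [memMin, memMax] currently covered by the MemStore.
     * All bounds are treated as inclusive (an assumption of this sketch).
     */
    static Target choose(long queryStart, long queryEnd, long memMin, long memMax) {
        if (queryStart > queryEnd || memMin > memMax) {
            throw new IllegalArgumentException("range start must not exceed range end");
        }
        boolean startInMem = queryStart >= memMin && queryStart <= memMax;
        boolean endInMem = queryEnd >= memMin && queryEnd <= memMax;

        // Rule a: the whole query range lies inside the MemStore range.
        if (startInMem && endInMem) {
            return Target.MEMSTORE_ONLY;
        }
        // Rule b: the end is in the MemStore range, the start predates it.
        if (!startInMem && endInMem) {
            return Target.BOTH;
        }
        // Rule c, read conservatively: no overlap with the MemStore range.
        boolean noOverlap = queryEnd < memMin || queryStart > memMax;
        if (noOverlap) {
            return Target.FILES_ONLY;
        }
        // Cases the rules do not cover (e.g. the end is newer than memMax).
        return Target.BOTH;
    }

    public static void main(String[] args) {
        // MemStore covers [50, 300] in all three calls.
        System.out.println(choose(100, 200, 50, 300)); // rule a
        System.out.println(choose(10, 200, 50, 300));  // rule b
        System.out.println(choose(10, 40, 50, 300));   // rule c
    }
}
```

Rule a is only safe under the stated assumption that no out-of-order data
is accepted; otherwise the store files could still hold cells inside the
MemStore's time range.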

The incoming data is time series, and we do not allow old data (out of sync
with the clock) to come into the system (HBase).

Cloudera has a scanner class,
org.apache.hadoop.hbase.regionserver.InternalScan, that has methods like
checkOnlyMemStore() and checkOnlyStoreFiles(). Is this available in trunk?

Also, how do I access the MemStore for a column family in the endpoint
coprocessor from the CoprocessorEnvironment?
