Hi all,

Here is our use case.

We have a very write-heavy cluster. We also run periodic endpoint coprocessor jobs every 10 minutes that operate on the data written in the last 10-15 minutes. Is there a way to query only the MemStore from the endpoint coprocessor?

The periodic job scans for data using a time range. We would like to implement the following simple logic:

a. If the query time range is entirely within the MemStore's TimeRangeTracker, query only the MemStore.
b. If the end time of the query time range is within the MemStore's TimeRangeTracker but the start time is outside it (a MemStore flush happened), query both the MemStore and the store files.
c. If both the start time and the end time of the query are outside the MemStore's TimeRangeTracker, query only the store files.

The incoming data is time series, and we do not allow old data (out of sync with the clock) into the system (HBase).

Cloudera has a scanner class, org.apache.hadoop.hbase.regionserver.InternalScan, that has methods like checkOnlyMemStore() and checkOnlyStoreFiles(). Is this available in trunk?

Also, how do I access the MemStore for a column family in the endpoint coprocessor from the CoprocessorEnvironment?
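
To make cases a-c concrete, here is a minimal sketch of the selection logic in plain Java. It does not call any HBase API: ScanTargetChooser, ScanTarget, and choose() are hypothetical names used only for illustration, and the MemStore's oldest and newest timestamps are passed in as plain longs rather than read from a TimeRangeTracker. The sketch assumes a half-open query range [queryStart, queryEnd) and a non-empty MemStore. It also reads the cases in terms of overlap, which is an interpretation: a query whose end runs past the newest MemStore timestamp is still treated as case a, and a query that spans the whole MemStore range is treated as case b.

```java
// Hypothetical helper, not part of HBase: decides which stores a
// time-range query needs to read, following cases a-c above.
final class ScanTargetChooser {

    // Hypothetical enum naming the three outcomes.
    enum ScanTarget { MEMSTORE_ONLY, MEMSTORE_AND_FILES, FILES_ONLY }

    /**
     * @param memMin     oldest cell timestamp currently in the MemStore (inclusive)
     * @param memMax     newest cell timestamp currently in the MemStore (inclusive)
     * @param queryStart start of the query time range (inclusive)
     * @param queryEnd   end of the query time range (exclusive)
     */
    static ScanTarget choose(long memMin, long memMax, long queryStart, long queryEnd) {
        if (queryStart >= queryEnd) {
            throw new IllegalArgumentException("queryStart must be < queryEnd");
        }
        // Case c: the query does not overlap the MemStore's time range at all.
        boolean noOverlap = queryEnd <= memMin || queryStart > memMax;
        if (noOverlap) {
            return ScanTarget.FILES_ONLY;
        }
        // Case a: the query starts at or after the oldest MemStore timestamp.
        // This is only safe under the stated assumption that data arrives in
        // time order, so flushed files hold nothing newer than memMin.
        if (queryStart >= memMin) {
            return ScanTarget.MEMSTORE_ONLY;
        }
        // Case b: the query starts before the oldest MemStore timestamp
        // (a flush happened) but still overlaps the MemStore.
        return ScanTarget.MEMSTORE_AND_FILES;
    }
}
```

For example, with MemStore timestamps spanning 100 to 200, choose(100, 200, 120, 180) selects the MemStore only, choose(100, 200, 50, 150) selects both, and choose(100, 200, 10, 90) selects the store files only.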
