Re: How to scan only Memstore from end point co-processor

Vladimir Rodionov Mon, 01 Jun 2015 00:23:22 -0700

InternalScan has a constructor that takes a Scan object.

See https://issues.apache.org/jira/browse/HBASE-12720


You can instantiate InternalScan from a Scan, call checkOnlyMemStore() on it,
and then open a RegionScanner, but the best approach is to cache data on
write and run a regular RegionScanner that reads from the memstore and
block cache.

best,
-Vlad




On Sun, May 31, 2015 at 11:45 PM, Anoop John <anoop.hb...@gmail.com> wrote:

> If your scan has a time range specified, HBase will internally check it
> against the time range of the files etc. and will skip those that are
> clearly outside the time range you are interested in. You don't have to do
> anything for this. Just make sure you set the TimeRange for your read.
>
> -Anoop-
>
> On Mon, Jun 1, 2015 at 12:09 PM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > We have a postScannerOpen hook in the CP, but it may not give you direct
> > access to know which internal scanners are on the Memstore and which are
> > on the store files. This is possible, but we may need to add some new
> > hooks at the place where we explicitly add the internal scanners
> > required for a scan.
> >
> > But still, a general question: are you sure that your data will be only in
> > the memstore, and that the latest data will not have been flushed from
> > your memstore to the HFiles by that time? I see that your scenario is
> > write-centric, so how can you guarantee that your data is in the memstore
> > only? Your time range may say it is the latest data (maybe 10 to 15 min),
> > but you should be able to configure your memstore flushing in such a way
> > that no flushes happen for the latest data in that 10 to 15 min window.
> > Just sharing my thoughts here.
> >
> >
> >
> >
> > On Mon, Jun 1, 2015 at 11:46 AM, Gautam Borah <gbo...@appdynamics.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > Here is our use case,
> > >
> > > We have a very write-heavy cluster. We also run periodic end point
> > > co-processor based jobs, every 10 minutes, that operate on the data
> > > written in the last 10-15 mins.
> > >
> > > Is there a way to query only the MemStore from the end point
> > > co-processor? The periodic job scans for data using a time range. We
> > > would like to implement some simple logic:
> > >
> > > a. If the query time range is within the MemStore's TimeRangeTracker,
> > > then query only the MemStore.
> > > b. If the end time of the query time range is within the MemStore's
> > > TimeRangeTracker, but the query start time is outside the MemStore's
> > > TimeRangeTracker (a memstore flush happened), then query both the
> > > MemStore and the files.
> > > c. If the start time and end time of the query are outside the
> > > MemStore's TimeRangeTracker, query only the files.
> > >
> > > The incoming data is time series, and we do not allow old data (out of
> > > sync with the clock) to come into the system (HBase).
> > >
> > > Cloudera has a scanner, org.apache.hadoop.hbase.regionserver.InternalScan,
> > > that has methods like checkOnlyMemStore() and checkOnlyStoreFiles(). Is
> > > this available in trunk?
> > >
> > > Also, how do I access the Memstore for a Column Family in the end point
> > > co-processor from CoprocessorEnvironment?
> > >
> >
>
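
The three-case rule (a, b, c) from the original question can be sketched in plain Java, independent of any HBase classes. This is only an illustration under stated assumptions: the class ScanTargetChooser, the enum ScanTarget and the method choose are hypothetical names, the MemStore's TimeRangeTracker is reduced to two inclusive timestamps (memMin, memMax), and the query range is treated as half-open [queryStart, queryEnd). It also reads "within" as "overlaps", so a query that ends after the newest MemStore timestamp still consults the MemStore. That reading is an interpretation, not something the thread specifies, and it relies on the stated guarantee that old data never enters the system.

```java
public class ScanTargetChooser {

    /** Hypothetical labels for the three outcomes described in the question. */
    public enum ScanTarget { MEMSTORE_ONLY, MEMSTORE_AND_FILES, FILES_ONLY }

    /**
     * Picks where to read from, given a query range [queryStart, queryEnd)
     * and the MemStore's tracked timestamps [memMin, memMax] (inclusive).
     * Assumption: cells arrive in timestamp order, so flushed files hold
     * nothing newer than memMin.
     */
    public static ScanTarget choose(long queryStart, long queryEnd,
                                    long memMin, long memMax) {
        if (queryStart >= queryEnd) {
            throw new IllegalArgumentException("empty query time range");
        }
        // Case c: the query range does not touch the MemStore range at all.
        boolean overlapsMemstore = queryStart <= memMax && queryEnd > memMin;
        if (!overlapsMemstore) {
            return ScanTarget.FILES_ONLY;
        }
        // Case a: the query starts at or after the oldest MemStore cell.
        if (queryStart >= memMin) {
            return ScanTarget.MEMSTORE_ONLY;
        }
        // Case b: the query starts before the oldest MemStore cell
        // (a flush happened), so both sources are needed.
        return ScanTarget.MEMSTORE_AND_FILES;
    }

    public static void main(String[] args) {
        long memMin = 1_000L;
        long memMax = 2_000L;
        System.out.println(choose(1_200L, 1_800L, memMin, memMax));
        System.out.println(choose(500L, 1_800L, memMin, memMax));
        System.out.println(choose(100L, 900L, memMin, memMax));
    }
}
```

The decision is only as good as the timestamps it is given: as noted earlier in the thread, a flush between reading the tracker and opening the scanner can move data out of the MemStore, so a real co-processor would need to guard against that.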
