Re: Hbase Scan/Snapshot Performance...

Ted Yu Tue, 12 Aug 2014 15:26:15 -0700

Gautam:
Please take a look at this:
HBASE-8369 MapReduce over snapshot files


Cheers


On Tue, Aug 12, 2014 at 3:11 PM, Matteo Bertozzi <[email protected]>
wrote:

> There is HBASE-10935, included in 0.94.21, which lets you skip the
> memstore flush. The result is the online version of an "offline
> snapshot":
>
>
> snapshot 'sourceTable', 'snapshotName', {SKIP_FLUSH => true}
>
>
>
> On Tue, Aug 12, 2014 at 10:58 PM, Gautam <[email protected]> wrote:
>
> > Hello,
> >
> > We've been using and loving HBase for a couple of months now. Our primary
> > use case for HBase is writing a stream of events to an online time series
> > HBase table. Every so often we run medium to large batch scan MR jobs on
> > sections (1 hour, 1 day, 1 week) of this same time series table. This
> > online table now shows spikes whenever these large batch read jobs
> > are run. Write throughput goes down while these sequential scans are
> > running on the table.
> >
> > We've been playing around with snapshots and are considering using them
> > to take over these scheduled hourly, daily, and weekly jobs so that the
> > online table isn't affected. From preliminary tests it looks like online
> > snapshots take far too long: the snapshot job times out after 60 seconds.
> > The time was spent flushing the memstores on all region servers (as
> > expected), which seems to take too long. It also appears from the RS logs
> > that the flushes are done serially.
> >
> > Offline snapshots aren't an option since we can't disable this table, which
> > serves the event writing.
> >
> > *We're running HBase 0.94.6. We tried benchmarking snapshotting on a 9 TB
> > table with 240 regions, 1 column family, and 4 region servers.*
> >
> > All in all, I'd like to ask whether things would improve if we upgraded to
> > HBase 0.98+. Are there known benchmark numbers on expected snapshot
> > performance for 0.94.x vs. 0.98.x? In an ideal scenario we'd like these MR
> > jobs to dynamically take a snapshot, run the job, and delete or re-use the
> > snapshot based on freshness. At the least, we need the snapshot to be fresh
> > to within the last hour.
> >
> > Also, from what I understand, scans in HBase are not consistent at the
> > table level but are at the row level. Are there other ways I can query the
> > online table without hurting the write throughput?
> >
> > Cheers,
> > -Gautam.
> >
>
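
The "take a snapshot, run the job, delete/re-use the snapshot based on
freshness" flow described above could be driven by a small decision helper.
What follows is only a sketch under stated assumptions: the function name,
the three action labels, and the one-hour default are hypothetical, and the
actual snapshot creation and deletion (for example through the HBase shell)
is deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "fresh to within the last hour" is read as a maximum age of 1 hour.
MAX_SNAPSHOT_AGE = timedelta(hours=1)


def choose_snapshot_action(
    created_at: Optional[datetime],
    now: datetime,
    max_age: timedelta = MAX_SNAPSHOT_AGE,
) -> str:
    """Decide what a batch job should do before it starts scanning.

    Returns one of three hypothetical labels:
      "create"  - no snapshot exists yet, take a new one
      "reuse"   - the existing snapshot is fresh enough
      "replace" - the existing snapshot is stale: delete it and take a new one
    """
    if created_at is None:
        return "create"
    if now - created_at <= max_age:
        return "reuse"
    return "replace"


if __name__ == "__main__":
    now = datetime(2014, 8, 12, 22, 0, tzinfo=timezone.utc)
    print(choose_snapshot_action(None, now))
    print(choose_snapshot_action(now - timedelta(minutes=20), now))
    print(choose_snapshot_action(now - timedelta(hours=3), now))
```

Keeping the decision separate from the snapshot commands means the freshness
threshold can be tuned per job (hourly, daily, weekly) without touching the
code that talks to the cluster.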
