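HBASE-10935, included in 0.94.21, lets you skip the memstore flush when
taking a snapshot. The result is the online equivalent of an "offline
snapshot": it covers only the data already flushed to HFiles, so edits
still sitting in the memstores are not part of it.

From the HBase shell:

  snapshot 'sourceTable', 'snapshotName', {SKIP_FLUSH => true}
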
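If you want to drive this from a scheduled job, a minimal sketch along the following lines might work. It only builds the shell command shown above and pipes it to `hbase shell` on stdin. The helper names, the one-snapshot-per-hour naming scheme, and the name check are my own assumptions for illustration and are not part of HBase; it also assumes the `hbase` launcher is on the PATH.

```python
import re
import subprocess
from datetime import datetime

# Conservative name check (an assumption of this sketch, not HBase's exact
# rule): letters, digits, '_', '-', '.', and no leading '-' or '.'.
_NAME = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]*$")


def snapshot_command(table, snapshot, skip_flush=True):
    """Build the HBase shell line that snapshots `table` as `snapshot`."""
    for name in (table, snapshot):
        if not _NAME.match(name):
            raise ValueError(f"unsafe name for the HBase shell: {name!r}")
    # SKIP_FLUSH is the option added by HBASE-10935.
    opts = ", {SKIP_FLUSH => true}" if skip_flush else ""
    return f"snapshot '{table}', '{snapshot}'{opts}"


def hourly_snapshot_name(table, when):
    """Hypothetical naming scheme: one snapshot per table per clock hour."""
    return f"{table}-{when:%Y%m%d%H}"


def run_in_hbase_shell(commands):
    """Pipe the given shell lines to `hbase shell` and return its stdout.

    Assumes `hbase` is on PATH. The shell may exit with status 0 even when
    a command fails, so the caller should inspect the returned text.
    """
    script = "\n".join(commands) + "\nexit\n"
    result = subprocess.run(
        ["hbase", "shell"],
        input=script,
        text=True,
        capture_output=True,
        check=True,
    )
    return result.stdout
```

For example, `snapshot_command("sourceTable", hourly_snapshot_name("sourceTable", datetime.now()))` gives a command whose snapshot name changes once per hour, so a job could reuse the snapshot for the current hour if it already exists and take a new one otherwise.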
On Tue, Aug 12, 2014 at 10:58 PM, Gautam <[email protected]> wrote:
> Hello,
>
> We've been using and loving HBase for a couple of months now. Our primary
> use case for HBase is writing a stream of events to an online time series
> HBase table. Every so often we run medium to large batch scan MR jobs on
> sections (1 hour, 1 day, 1 week) of this same time series table. This
> online table is now showing spikes whenever these large batched read jobs
> are run. Write throughput goes down while these sequential scans are
> running on the table.
>
> We've been playing around with snapshots and are considering using them
> to take over the responsibility for running these scheduled hourly, daily,
> weekly jobs so that the online table doesn't get affected. From preliminary
> tests it looks like online snapshots take way too long. The snapshot job
> times out after 60 secs. The time was spent flushing the memstores on all
> region servers (as expected), which seems to take too long. Also, it seems
> from the RS logs that this is done serially.
>
> Offline snapshots aren't an option since we can't disable this table, which
> serves the event writing.
>
> *We're running HBase 0.94.6. Tried benchmarking snapshotting on a 9 TB table
> with 240 regions, 1 column family, 4 region servers.*
>
> All in all, I'd like to ask if things would improve if we upgraded to HBase
> 0.98+. Are there known benchmark numbers on expected snapshot performance
> for 0.94+ vs. 0.98+? In an ideal scenario we'd like these MR jobs to
> dynamically take a snapshot, run the job, and delete or re-use the snapshot
> based on freshness. At the least, we need the snapshot to be fresh to
> within the last hour.
>
> Also, from what I understand, scans in HBase are not consistent at the
> table level but are at the row level. Are there other ways I can query the
> online table without hurting the write throughput?
>
> Cheers,
> -Gautam.
>