Re: Hbase Scan/Snapshot Performance...

Ted Yu Tue, 12 Aug 2014 17:57:34 -0700

Gautam:
See also HBASE-10642 which went into 0.94.18

You can do a rolling upgrade from 0.94.6 to 0.94.21


Cheers


On Tue, Aug 12, 2014 at 5:42 PM, Gautam <[email protected]> wrote:

> Thanks for the replies..
>
> Matteo,
>
>   We've been running 0.94.6 since February, so sadly the prod cluster doesn't
> have this SKIP_FLUSH option right now. It would be great if there are options
> I could use right now until we upgrade to 0.98.
>
> Ted,
>      Thanks for the jira. That is exactly what we intend to use for running
> the MR jobs over snapshots. Just wanted to know how easy/lightweight
> snapshotting can be before we set our eyes on moving the whole thing over.
>
>
> Cheers,
> -Gautam.
>
>
>
> On Tue, Aug 12, 2014 at 3:24 PM, Ted Yu <[email protected]> wrote:
>
> > Gautam:
> > Please take a look at this:
> > HBASE-8369 MapReduce over snapshot files
> >
> > Cheers
> >
> >
> > On Tue, Aug 12, 2014 at 3:11 PM, Matteo Bertozzi <[email protected]>
> > wrote:
> >
> > > There is HBASE-10935, included in 0.94.21, where you can specify to skip
> > > the memstore flush, and the result will be the online version of an
> > > "offline snapshot"
> > >
> > >
> > > snapshot 'sourceTable', 'snapshotName', {SKIP_FLUSH => true}
> > >
> > >
> > >
> > > On Tue, Aug 12, 2014 at 10:58 PM, Gautam <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > We've been using and loving HBase for a couple of months now. Our
> > > > primary use case for HBase is writing events in a stream to an online
> > > > time series HBase table. Every so often we run medium to large batch
> > > > scan MR jobs on sections (1 hour, 1 day, 1 week) of this same time
> > > > series table. This online table is now showing spikes whenever these
> > > > large batched read jobs are run. Write throughput goes down while these
> > > > sequential scans are running on the table.
> > > >
> > > > We've been playing around with snapshots and are considering using
> > > > snapshots to take over the responsibility for running these scheduled
> > > > hourly, daily, and weekly jobs so that the online table doesn't get
> > > > affected. From preliminary tests it looks like online snapshots take
> > > > way too long. The snapshot job times out after 60 secs. The time was
> > > > spent flushing the memstores on all region servers (as expected), which
> > > > seems to take too long. Also, it seems from the RS logs that this is
> > > > done serially.
> > > >
> > > > Offline snapshots aren't an option since we can't disable this table,
> > > > which serves the event writing.
> > > >
> > > > *We're running HBase 0.94.6. Tried benchmarking snapshotting on a 9TB
> > > > table with 240 regions, 1 column family, 4 region servers.*
> > > >
> > > > All in all, I'd like to ask if things would improve if we upgraded to
> > > > HBase 0.98+. Are there known benchmark numbers on expected snapshot
> > > > performance for 0.94+ vs. 0.98+? In an ideal scenario we'd like these
> > > > MR jobs to dynamically take a snapshot, run the job, and delete or
> > > > re-use the snapshot based on freshness. At the least, we need the
> > > > snapshot to be fresh up to the last hour.
> > > >
> > > > Also, from what I understand, scans in HBase are not consistent at the
> > > > table level but are at the row level. Are there other ways I can query
> > > > the online table without hurting the write throughput?
> > > >
> > > > Cheers,
> > > > -Gautam.
> > > >
> > >
> >
>
>
>
> --
> "If you really want something in this life, you have to work for it. Now,
> quiet! They're about to announce the lottery numbers..."
>
