Gautam:
See also HBASE-10642, which went into 0.94.18.
You can do a rolling upgrade from 0.94.6 to 0.94.21.

Cheers

On Tue, Aug 12, 2014 at 5:42 PM, Gautam <[email protected]> wrote:

> Thanks for the replies.
>
> Matteo,
>
> We've been running 0.94.6 since February so, sadly, the prod cluster
> doesn't have this SKIP_FLUSH option right now. It would be great if there
> are options I could use right now until we upgrade to 0.98.
>
> Ted,
> Thanks for the JIRA. That is exactly what we intend to use for running
> the MR jobs over snapshots. I just wanted to know how easy/lightweight
> snapshotting can be before we set our eyes on moving the whole thing over.
>
> Cheers,
> -Gautam.
>
> On Tue, Aug 12, 2014 at 3:24 PM, Ted Yu <[email protected]> wrote:
>
> > Gautam:
> > Please take a look at this:
> > HBASE-8369 MapReduce over snapshot files
> >
> > Cheers
> >
> > On Tue, Aug 12, 2014 at 3:11 PM, Matteo Bertozzi <[email protected]>
> > wrote:
> >
> > > There is HBASE-10935, included in 0.94.21, where you can specify to
> > > skip the memstore flush, and the result will be the online version
> > > of an "offline snapshot":
> > >
> > > snapshot 'sourceTable', 'snapshotName', {SKIP_FLUSH => true}
> > >
> > > On Tue, Aug 12, 2014 at 10:58 PM, Gautam <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > We've been using and loving HBase for a couple of months now. Our
> > > > primary use case for HBase is writing events in a stream to an
> > > > online time series HBase table. Every so often we run medium to
> > > > large batch scan MR jobs on sections (1 hour, 1 day, 1 week) of
> > > > this same time series table. This online table is now showing
> > > > spikes whenever these large batched read jobs are run. Write
> > > > throughput goes down while these sequential scans are running on
> > > > the table.
> > > >
> > > > We've been playing around with snapshots and are considering using
> > > > snapshots to take over the responsibility for running these
> > > > scheduled hourly, daily, and weekly jobs so that the online table
> > > > doesn't get affected. From preliminary tests it looks like online
> > > > snapshots take way too long. The snapshot job times out after 60
> > > > seconds. The time was spent flushing the memstores on all region
> > > > servers (as expected), which seems to take too long. Also, it seems
> > > > from the RS logs that this is done serially.
> > > >
> > > > Offline snapshots aren't an option since we can't disable this
> > > > table, which serves the event writing.
> > > >
> > > > *We're running HBase 0.94.6. We tried benchmarking snapshotting on
> > > > a 9 TB table with 240 regions, 1 column family, and 4 region
> > > > servers.*
> > > >
> > > > All in all, I'd like to ask if things would improve if we upgraded
> > > > to HBase 0.98+. Are there known benchmark numbers on expected
> > > > snapshot performance for 0.94.x vs. 0.98.x? In an ideal scenario
> > > > we'd like these MR jobs to dynamically take a snapshot, run the
> > > > job, and delete/re-use the snapshot based on freshness. At the
> > > > least, we need the snapshot to be fresh to within the last hour.
> > > >
> > > > Also, from what I understand in HBase, scans are not consistent at
> > > > the table level but are at the row level. Are there other ways I
> > > > can query the online table without hurting the write throughput?
> > > >
> > > > Cheers,
> > > > -Gautam.
>
> --
> "If you really want something in this life, you have to work for it. Now,
> quiet! They're about to announce the lottery numbers..."
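
For illustration only: the "take a snapshot, run the job, delete/re-use the snapshot based on freshness" idea from the original question could be sketched roughly as below. This is a minimal sketch of the decision logic alone, not anything from HBase itself. The function name `choose_snapshot_action`, the action strings, and the one-hour `MAX_SNAPSHOT_AGE` default are all hypothetical; actually creating or deleting a snapshot would still go through the HBase shell or client API.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical policy value: the thread asks for snapshots that are
# fresh to within the last hour.
MAX_SNAPSHOT_AGE = timedelta(hours=1)


def choose_snapshot_action(snapshot_created_at: Optional[datetime],
                           now: datetime,
                           max_age: timedelta = MAX_SNAPSHOT_AGE) -> str:
    """Decide what a scheduled job should do before scanning.

    Returns one of three hypothetical action names:
      "create"  - no snapshot exists yet, so take one
      "reuse"   - the existing snapshot is fresh enough
      "replace" - the existing snapshot is stale: delete it and take a new one
    """
    if snapshot_created_at is None:
        return "create"
    age = now - snapshot_created_at
    if age <= max_age:
        return "reuse"
    return "replace"


if __name__ == "__main__":
    now = datetime(2014, 8, 12, 17, 0)
    # A snapshot taken 30 minutes ago is within the one-hour window.
    print(choose_snapshot_action(now - timedelta(minutes=30), now))
```

With a policy like this, hourly jobs would mostly take fresh snapshots, while daily and weekly jobs started close together could share one, which limits how often the memstore flush cost is paid.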
