We tried multiple LOADs because we want to minimize the data loaded by taking advantage of HBaseStorage's pushdown filter support for -gte and -lte. With a salted key schema, though, every salt has a different key prefix, so a single -gte/-lte range can't cover the same time window across all of them. That's how we ended up with 14 LOADs, one for each salted region.
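To make the 14-LOAD approach concrete, here is a rough sketch (in Python, purely as a script generator) of how the per-salt -gte/-lte bounds and the matching Pig statements could be produced. The table name `events`, the column `d:payload`, and the aliases `r00`..`r13` are hypothetical placeholders, not our real schema; only the salt range 00..13, the `salt-timestamp-...` key layout, and the -gte/-lte options come from this thread.

```python
# Sketch only: generates one Pig LOAD per salt prefix for a given time window.
# Assumptions (not from the thread): table name, column name, alias names.

NUM_SALTS = 14  # from the thread: salts have the form 00..13


def salted_ranges(start_ts, end_ts, num_salts=NUM_SALTS):
    """Return one (gte, lte) row-key bound pair per salt prefix.

    Keys look like salt-timestamp-sessionid-eventtype, so a bound of
    "05-<end_ts>" sorts *before* any full key carrying that same timestamp;
    in practice end_ts therefore behaves as an exclusive upper bound.
    Bounds compare as strings, so both timestamps need the same digit count.
    """
    return [
        (f"{salt:02d}-{start_ts}", f"{salt:02d}-{end_ts}")
        for salt in range(num_salts)
    ]


def pig_loads(start_ts, end_ts, table="events", column="d:payload"):
    """Build a Pig script: one range-restricted LOAD per salt, then a UNION."""
    lines = []
    aliases = []
    for i, (gte, lte) in enumerate(salted_ranges(start_ts, end_ts)):
        alias = f"r{i:02d}"
        lines.append(
            f"{alias} = LOAD 'hbase://{table}' USING "
            f"org.apache.pig.backend.hadoop.hbase.HBaseStorage("
            f"'{column}', '-gte {gte} -lte {lte}');"
        )
        aliases.append(alias)
    lines.append(f"all_rows = UNION {', '.join(aliases)};")
    return "\n".join(lines)
```

Calling `print(pig_loads(1315800000, 1315886400))` would emit 14 LOAD lines followed by one UNION, which is roughly the shape of the script that is hitting the scanner timeouts.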
Doing some research, it seems like the Mozilla folks solved the issue in Socorro by writing a custom LoadFunc:

https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/pig/load/HBaseMultiScanLoader.java

The custom LoadFunc seems cleaner, since we can manipulate the 14 HBase scanners directly, but at the cost of writing some Java glue code. Even so, should we expect the 14 Pig LOADs to work as well?

I'll check and see why the scanners are timing out. We do have automatic splitting turned on, but the region size is high enough (1 GB) that they shouldn't be splitting often. The HBase rebalancer is probably turned on - would this be enough to cause the timeouts?

Norbert

On Tue, Sep 13, 2011 at 10:43 AM, Dmitriy Ryaboy <[email protected]> wrote:

> Why not just one load?
>
> Check why the scanners are timing out. Are the regions splitting under you
> while you scan? Do you have the hbase rebalancer turned on?
>
> On Sep 12, 2011, at 7:51 AM, Norbert Burger <[email protected]>
> wrote:
>
> > Folks -- we have a timeseries-based table we recently converted to a
> > salted key schema [1] in order to avoid region hotspotting. The rowkey
> > format is:
> >
> > salt-timestamp-sessionid-eventtype, where:
> >
> > salt has the form 00..13, and the timestamp is a Unix timestamp (epoch
> > based).
> >
> > With the version 0.10.0 HBaseStorage, what's the recommended way to LOAD
> > a salted schema from Pig? Initially, I thought we'd just fire off multiple
> > LOADs, one for each region (in our case, up to 14), but we're hitting
> > frequently ScannerTimeoutExceptions with this approach, even on a sample
> > script that does nothing but LOADs.
> >
> > Is there a better way?
> >
> > Thanks,
> > Norbert
> >
> > [1]
> > http://ofps.oreilly.com/titles/9781449396107/advanced.html#ch09_id2336987
