Maybe we need something based on this? https://issues.apache.org/jira/browse/HBASE-3996
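Here is a minimal, self-contained sketch of the idea behind the question quoted below. The wanted sections of the table are known in advance, so their row-key ranges can be sorted and merged. Each merged range could then become one Scan, with an inclusive start row and an exclusive stop row, and only rows inside those ranges are read instead of filtering the whole table.

This sketch makes some assumptions:
- It does not touch HBase or Crunch.
- `MultiRangeScan`, `RowRange`, `mergeRanges`, and `scanAll` are hypothetical names.
- A `TreeMap` stands in for the sorted HBase table.
- Turning the merged ranges into a real multi-Scan input source (for example, something built on HBASE-3996) is not shown here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class MultiRangeScan {

    /** Hypothetical stand-in for one Scan: a half-open row-key range [start, stop). */
    record RowRange(String start, String stop) {}

    /** Sorts ranges by start key and merges any that overlap or touch. */
    static List<RowRange> mergeRanges(List<RowRange> ranges) {
        List<RowRange> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparing(RowRange::start));
        List<RowRange> merged = new ArrayList<>();
        for (RowRange r : sorted) {
            if (!merged.isEmpty()) {
                RowRange last = merged.get(merged.size() - 1);
                if (r.start().compareTo(last.stop()) <= 0) {
                    // Overlapping or adjacent: widen the previous range instead of adding another scan.
                    String stop = r.stop().compareTo(last.stop()) > 0 ? r.stop() : last.stop();
                    merged.set(merged.size() - 1, new RowRange(last.start(), stop));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }

    /** Collects row keys that fall inside the merged ranges, reading nothing outside them. */
    static List<String> scanAll(NavigableMap<String, String> table, List<RowRange> ranges) {
        List<String> rows = new ArrayList<>();
        for (RowRange r : mergeRanges(ranges)) {
            // Inclusive start, exclusive stop, like a Scan's start row and stop row.
            for (Map.Entry<String, String> e : table.subMap(r.start(), true, r.stop(), false).entrySet()) {
                rows.add(e.getKey());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String k : List.of("a1", "a2", "b1", "c1", "c2", "d1", "e1")) {
            table.put(k, "value-" + k);
        }
        // Non-contiguous sections of the "table"; two of the requests overlap.
        List<RowRange> wanted = List.of(
                new RowRange("c", "d"),
                new RowRange("a", "b"),
                new RowRange("a2", "a9"));
        System.out.println(mergeRanges(wanted));
        System.out.println(scanAll(table, wanted)); // prints [a1, a2, c1, c2]
    }
}
```

Merging first means the overlapping requests `[a, b)` and `[a2, a9)` are read once, as a single range, rather than twice.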
On Mon, Apr 8, 2013 at 1:41 PM, Chad Urso McDaniel wrote:
> This may be a core Hadoop question.
>
> We are using Crunch with HBase.
> We typically set up the input PTable like so:
> ---
> Scan scan = ...
> HBaseSourceTarget source = new HBaseSourceTarget(tableName, scan);
> PTable<ImmutableBytesWritable, Result> data = pipeline.read(source);
> ---
>
> One way we want to speed up processing with Crunch is to read
> multiple Scans into one PTable.
>
> We know which sections of the HBase table we want, and they are not
> contiguous.
>
> We have tried unioning the PTables, but that turns out to be incredibly
> slow.
> Currently we are using a filter that results in many unnecessary reads.
>
> How do others solve this?
>
> I'm tempted to write a TableSource that can do this.
>
> thanks

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
