Re: HBase & Crunch: multiple scans for a single PTable

Josh Wills Mon, 08 Apr 2013 13:47:52 -0700

Maybe we need something based on this?

https://issues.apache.org/jira/browse/HBASE-3996
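
Whatever mechanism ends up carrying several scans, one preparatory step is to reduce the known, non-contiguous sections to a minimal set of key ranges, so that each range would need exactly one Scan. The sketch below is only an illustration under stated assumptions: RowRanges, Range, and coalesce are hypothetical names, not part of Crunch or HBase, and row keys are modeled as ASCII Strings rather than byte[] to keep the example self-contained. Ranges are half-open, matching a Scan's inclusive start row and exclusive stop row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of Crunch or HBase.
final class RowRanges {

    // Half-open key range [start, stop): start is inclusive, stop is exclusive.
    // Assumption: keys are ASCII Strings, so String ordering matches byte ordering.
    static final class Range {
        final String start;
        final String stop;

        Range(String start, String stop) {
            if (start.compareTo(stop) >= 0) {
                throw new IllegalArgumentException("start must sort before stop");
            }
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + stop + ")";
        }
    }

    // Sorts the ranges by start key and merges any that overlap or touch,
    // returning the smallest set of ranges that covers the same keys.
    static List<Range> coalesce(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparing((Range r) -> r.start));

        List<Range> merged = new ArrayList<>();
        for (Range next : sorted) {
            if (merged.isEmpty()) {
                merged.add(next);
                continue;
            }
            Range last = merged.get(merged.size() - 1);
            if (next.start.compareTo(last.stop) <= 0) {
                // Overlapping or adjacent: extend the previous range if needed.
                String stop = last.stop.compareTo(next.stop) >= 0 ? last.stop : next.stop;
                merged.set(merged.size() - 1, new Range(last.start, stop));
            } else {
                // A gap between ranges: this one needs a scan of its own.
                merged.add(next);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> wanted = Arrays.asList(
                new Range("m", "p"),
                new Range("a", "c"),
                new Range("b", "d"));
        // Each printed range would correspond to one Scan's start and stop row.
        for (Range r : coalesce(wanted)) {
            System.out.println(r);
        }
    }
}
```

Merging first keeps the number of scans, and therefore the number of inputs to combine, as small as the data layout allows. With real HBase keys the same logic would compare byte[] values instead of Strings.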



On Mon, Apr 8, 2013 at 1:41 PM, Chad Urso McDaniel <[email protected]> wrote:

> This may be a core Hadoop question.
>
> We are using Crunch with HBase.
> We typically set up the input PTable like so:
> ---
>       Scan scan = ...
>       HBaseSourceTarget source = new HBaseSourceTarget(tableName, scan);
>       PTable<ImmutableBytesWritable, Result> data = pipeline.read(source);
> ---
>
> To speed up processing with Crunch, we would like to read multiple Scans
> into a single PTable.
>
> We know which sections of the HBase table we want and they are not
> contiguous.
>
> We have tried unioning the PTables but that turns out to be incredibly
> slow.
> Currently we are using a filter that results in many unnecessary reads.
>
> How do others solve this?
>
> I'm tempted to write a TableSource that can do this.
>
> thanks
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
