Hi Adam,

What version are you running, and are you using multiple reducers in your HFileOutputFormat job? There was a bug in 0.20.3 which caused this case to produce somewhat broken tables.
-Todd

On Mon, May 24, 2010 at 10:55 PM, Adam Silberstein <[email protected]> wrote:
> Hi,
> A colleague and I are working on testing a few HBase features, notably bulk
> import (mentioned in
> http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html)
> and running M/R jobs using HBase as input.
>
> We're taking the following steps:
> 1a. Load HBase with a M/R job using the normal API.
> OR
> 1b. Load HBase with bulk import.
>
> THEN
>
> 2a. Using the shell, do a "count" over the table.
> OR
> 2b. Run a M/R job that scans the whole HBase table (and nothing else).
>
> Of the 4 combos, 3 are fine: 1a+2a, 1a+2b, 1b+2a. We're having trouble with
> 1b+2b. When we run the M/R job, it doesn't seem to read in any records, but
> there are no explicit errors in either the Hadoop or HBase logs.
>
> This seems odd. It shouldn't matter how we load the table, and the shell's
> count operator seems to work correctly either way, counting all the records.
> The M/R job in 2b is the same no matter how we load the table. Any ideas on
> what might be wrong with the bulk import to cause this problem? We're
> thinking maybe something with the region boundaries, although they look ok
> in the GUI.
>
> Thanks for any suggestions,
> Adam

-- 
Todd Lipcon
Software Engineer, Cloudera
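
[Editor's note: for concreteness, here is a minimal sketch of the kind of full-table-scan M/R job described in step 2b, assuming the 0.20-era org.apache.hadoop.hbase.mapreduce API. The class names and the table name "mytable" are placeholders, not code from the thread.]

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class FullScanCount {

  // Map-only job: count every row the scan hands to the mapper via a counter.
  static class CountMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      context.getCounter("FullScanCount", "ROWS").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new HBaseConfiguration();  // 0.20-era constructor
    Job job = new Job(conf, "full-scan-count");
    job.setJarByClass(FullScanCount.class);

    Scan scan = new Scan();   // no start/stop row, so the whole table is scanned
    scan.setCaching(500);     // fetch rows in larger batches per RPC

    // "mytable" is a placeholder for the bulk-loaded table's name.
    TableMapReduceUtil.initTableMapperJob("mytable", scan, CountMapper.class,
        NullWritable.class, NullWritable.class, job);

    job.setNumReduceTasks(0);                          // map-only
    job.setOutputFormatClass(NullOutputFormat.class);  // no file output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

If a job like this reports zero map input records against a bulk-loaded table but counts correctly against a normally loaded one, the input splits (one per region) are the natural suspect, which points back at the region metadata the bulk import produced.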
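[Editor's note: since the suspicion falls on region boundaries, one quick check (a suggestion, not from the original thread) is to scan the catalog table from the shell and eyeball each region's start and end keys:]

hbase(main):001:0> scan '.META.', {COLUMNS => 'info:regioninfo'}

Consecutive regions should tile the key space, each region's end key matching the next region's start key, with no gaps or overlaps. Re-running the bulk import with a single reducer is one way to rule the 0.20.3 multi-reducer HFileOutputFormat bug Todd mentions in or out.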
