Re: question about bulk loader

Todd Lipcon Tue, 26 Oct 2010 19:07:32 -0700

However, it's worth noting that load performance will be much slower in this
case. The splitting of bulk load outputs to fit the new regions is done on
the client, in the "completebulkload" tool, so it will be very slow if, for
example, all of the regions have split since you ran the MR job.
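
To make the cost concrete, here is a rough sketch of the kind of client-side
splitting described above. It is not the actual LoadIncrementalHFiles code:
the function names and the in-memory representation of a bulk load output file
(a sorted list of row/value pairs) are assumptions made purely for
illustration.

```python
from bisect import bisect_right


def region_index(row_key, region_start_keys):
    """Return the index of the region whose key range contains row_key.

    region_start_keys is a sorted list of region start keys; the first
    region is assumed to start at the empty key b"".
    """
    return bisect_right(region_start_keys, row_key) - 1


def needs_split(rows, region_start_keys):
    """True if the file's first and last rows fall in different regions."""
    first_key, last_key = rows[0][0], rows[-1][0]
    return (region_index(first_key, region_start_keys)
            != region_index(last_key, region_start_keys))


def split_by_region(rows, region_start_keys):
    """Group the sorted (row_key, value) pairs by current region start key.

    Hypothetical helper: every row is read and rewritten on the client,
    which is why the load gets slow when many regions have split.
    """
    pieces = {}
    for row_key, value in rows:
        start = region_start_keys[region_index(row_key, region_start_keys)]
        pieces.setdefault(start, []).append((row_key, value))
    return pieces


# One output file written when the table had a single region.
rows = [(b"apple", 1), (b"mango", 2), (b"zebra", 3)]

# Layout at MR job time: one region, so the file can be loaded as is.
old_layout = [b""]
# Layout at load time: the region has since split at b"m".
new_layout = [b"", b"m"]

unchanged = needs_split(rows, old_layout)
after_split = needs_split(rows, new_layout)
pieces = split_by_region(rows, new_layout)
```

When the region layout has not changed, needs_split is false for the file and
nothing has to be rewritten. After a split, the client has to walk the file
and write one piece per region, and that work grows with the number of files
that straddle the new boundaries.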


In the usual case, the completebulkload tool is run soon after the job
completes so we expect little to no region churn.

-Todd

On Tue, Oct 26, 2010 at 2:49 PM, Stack <[email protected]> wrote:

> On Tue, Oct 26, 2010 at 2:34 PM, Jack Levin <[email protected]> wrote:
> > Hi, suppose we ran the bulk loader yesterday, and today the region names
> > on the same table no longer exist because of region splits, etc.
> > What happens to the data when it's 'loaded' into the HBase region
> > directories?  Will it make the 'older' regions from 24 hours ago? Or
> > cause some sort of issue and an exception?
> >
> >
>
> It does the right thing.  It adjusts to the new lay of the land,
> splitting the bulk-written files to match the new region layout as needed.
>  See
> http://hbase.apache.org/docs/r0.89.20100924/xref/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.html#181
>
> St.Ack
>



-- 
Todd Lipcon
Software Engineer, Cloudera
