Re: HFiles and MapReduce

Jean-Daniel Cryans Mon, 01 Aug 2011 13:35:29 -0700

Inline.

J-D


On Mon, Aug 1, 2011 at 12:50 PM, Leif Wickland <[email protected]> wrote:
> 1. Is there any case where it's a bad idea to use HFileOutputFormat instead
> of TableOutputFormat when writing to HBase from MapReduce?

Can you think of any?

>
> 2. What are the failure modes for LoadIncrementalHFiles.doBulkLoad?  Is it
> possible some regions will be adopted and others fail?

That process isn't atomic, so yes, you could end up with one region
failing for some reason (network issues, a bug, whatever) while others
have already been loaded. My understanding of the code is that it
fails and returns immediately after any IOException.
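To make the non-atomic, fail-fast behaviour concrete, here is a minimal
plain-Java simulation. It uses no HBase classes: RegionFile and bulkLoad
are hypothetical names, not part of the HBase API. It also assumes files
are loaded one region at a time, which simplifies what
LoadIncrementalHFiles actually does.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class NonAtomicBulkLoadSketch {

    // Hypothetical stand-in for one region's HFile; NOT an HBase class.
    static final class RegionFile {
        final String region;
        final boolean willFail; // simulate a network error, bug, etc.

        RegionFile(String region, boolean willFail) {
            this.region = region;
            this.willFail = willFail;
        }
    }

    /**
     * Hypothetical loader: walks the files in order and throws on the first
     * failure. Regions loaded before the failure stay in 'adopted'; nothing
     * is rolled back, so the overall operation is not atomic.
     */
    static void bulkLoad(List<RegionFile> files, List<String> adopted) throws IOException {
        for (RegionFile f : files) {
            if (f.willFail) {
                // Fail fast: stop immediately, leaving later files unattempted.
                throw new IOException("load failed for region " + f.region);
            }
            adopted.add(f.region);
        }
    }

    public static void main(String[] args) {
        List<RegionFile> files = List.of(
                new RegionFile("region-a", false),
                new RegionFile("region-b", true),
                new RegionFile("region-c", false));
        List<String> adopted = new ArrayList<>();
        try {
            bulkLoad(files, adopted);
        } catch (IOException e) {
            System.out.println("Bulk load aborted: " + e.getMessage());
        }
        System.out.println("Adopted before failure: " + adopted);
    }
}
```

Running main prints "Bulk load aborted: load failed for region region-b"
followed by "Adopted before failure: [region-a]". region-a stays loaded
and region-c is never attempted, so in this model a caller has to retry
the remaining files rather than assume all-or-nothing.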

>
> 3. I think I'd like to create HFiles as the output of my MapReduce, then use
> the HFiles as the input to a MapReduce to calculate some new aggregates, and
> then doBulkLoad on the HFiles.  Is there any easy way to use a directory of
> HFiles as the input to a MapReduce?  Is this inadvisable?  It seems like
> this would be a more sensible approach than scanning for columns with
> timestamps in an interval to find the freshly written columns.

You'd need to write an HFileInputFormat; that's pretty much it.
