If you'll forgive the slight topic shift, it seems like the pattern of
writing directly to HFiles rather than using TableOutputFormat would be
better in several cases. For instance, with TableOutputFormat everything
is written to the WAL (and the MemStore), then later flushed and compacted
into HFiles. When practical, why not skip that interim state, produce the
HFiles directly, and then do a bulk load?


Of course, not all jobs that use TableOutputFormat can easily write to
HFiles: those files require the row keys to be output in strict sorted
order, and bulk loads are optimal only if the HFiles align with the
existing region boundaries. But where those requirements are met, it seems
like moving away from TableOutputFormat could help IO-bound jobs
significantly.
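To make those two requirements concrete, here is a minimal, self-contained sketch (not HBase code) of what "sorted and aligned with regions" means: each row key is routed to the region whose start key precedes it, and each region's keys are sorted, as an HFile writer expects. The class name RegionAlignedPartitioner is hypothetical, and it assumes String ordering stands in for HBase's unsigned byte ordering (they agree for ASCII keys). As far as I know, HFileOutputFormat.configureIncrementalLoad sets up something similar in a real job, using a TotalOrderPartitioner over the table's region start keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: routes row keys to regions by start key
// and sorts each region's keys. Assumes String.compareTo approximates
// HBase's unsigned byte comparison (true for ASCII row keys).
public class RegionAlignedPartitioner {

    // Sorted region start keys; the first region must start at "".
    private final String[] startKeys;

    public RegionAlignedPartitioner(String... startKeys) {
        this.startKeys = startKeys.clone();
        Arrays.sort(this.startKeys);
        if (this.startKeys.length == 0 || !this.startKeys[0].isEmpty()) {
            throw new IllegalArgumentException("first region must start at the empty key");
        }
    }

    // Index of the region whose [start, nextStart) range contains rowKey.
    public int regionFor(String rowKey) {
        int idx = Arrays.binarySearch(startKeys, rowKey);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the containing
        // region is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    // One list per region, each sorted so it could be written as an HFile.
    public List<List<String>> partition(List<String> rowKeys) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < startKeys.length; i++) {
            out.add(new ArrayList<>());
        }
        for (String key : rowKeys) {
            out.get(regionFor(key)).add(key);
        }
        for (List<String> group : out) {
            Collections.sort(group);
        }
        return out;
    }

    public static void main(String[] args) {
        RegionAlignedPartitioner p = new RegionAlignedPartitioner("", "g", "p");
        // prints [[apple, banana], [grape, kiwi], [zebra]]
        System.out.println(p.partition(Arrays.asList("zebra", "apple", "kiwi", "grape", "banana")));
    }
}
```

If the HFiles were not split on region boundaries like this, the bulk load would have to split them on the fly, which is where the "optimal only if aligned" caveat comes from.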

Is my reasoning sound?

On 9/12/11 12:40 PM, "Leif Wickland" <[email protected]> wrote:

>Thanks, Bryan.  I'd love to hear any lessons you learn.  I've used that
>technique successfully at a prototype level, but haven't yet moved it to
>production.
>
>Leif
>
>On Mon, Sep 12, 2011 at 10:51 AM, Bryan Keller <[email protected]> wrote:
>
>> Ah, that is a very interesting solution, Leif. This seems optimal to me.
>> I am going to try this and I'll report back.
>>
>> On Sep 12, 2011, at 9:09 AM, Leif Wickland wrote:
>>
>> >
>> > Bryan,
>> >
>> > Have you considered writing your MR output with HFileOutputFormat and
>> > then asking the regions to adopt the result? That would allow you to
>> > avoid committing any changes to HBase until you knew that the MR job
>> > ran successfully.
>> >
>> > Leif
>>
>>
