I'm also quite interested in whether anyone has feedback on Ryan's reasoning.

On Mon, Sep 12, 2011 at 1:18 PM, Brush,Ryan <[email protected]> wrote:
> If you'll forgive the slight topic shift, it seems like the pattern of
> writing directly to HFiles rather than the TableOutputFormat would be
> better for several cases. For instance, TableOutputFormat results in
> everything being written to the WAL, and later compacted into HFiles.
> When practical, why not skip that interim state and produce the HFile
> directly, then do a bulk load?
>
> Of course not all jobs that use the TableOutputFormat can easily write to
> HFiles; those files require a strict ordering of row keys being output,
> and bulk loads are optimal only if the HFiles align with existing regions.
> But if such requirements are met, it seems like moving away from
> TableOutputFormat could help IO-bound jobs significantly.
>
> Is my reasoning sound?
>
> On 9/12/11 12:40 PM, "Leif Wickland" <[email protected]> wrote:
>
> >Thanks, Bryan. I'd love to hear any lessons you learn. I've used that
> >technique successfully at a prototype level, but haven't yet moved it to
> >production.
> >
> >Leif
> >
> >On Mon, Sep 12, 2011 at 10:51 AM, Bryan Keller <[email protected]> wrote:
> >
> >> Ah that is a very interesting solution Leif, this seems optimal to me.
> >> I am going to try this and I'll report back.
> >>
> >> On Sep 12, 2011, at 9:09 AM, Leif Wickland wrote:
> >>
> >> > Bryan,
> >> >
> >> > Have you considered writing your MR output to HFileFormat and then
> >> > asking the regions to adopt the result? That would allow you to avoid
> >> > committing any changes to HBase until you knew that the MR job ran
> >> > successfully.
> >> >
> >> > Leif
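
To make the two requirements Ryan mentions concrete (strictly ordered row keys, and output files that line up with existing regions), here is a rough sketch in plain Python. It does not use any HBase API: `partition_by_region` and the sample keys are made up for illustration, and it only assumes that row keys compare as raw bytes and that each region owns the keys from its start key up to the next region's start key.

```python
from bisect import bisect_right


def partition_by_region(rows, region_start_keys):
    """Group (row_key, value) pairs into one key-sorted batch per region.

    rows: iterable of (bytes, value) pairs in any order.
    region_start_keys: sorted list of region start keys; the first region
    is assumed to start at the empty key b"" (hypothetical layout).
    """
    starts = list(region_start_keys)
    if not starts or starts[0] != b"":
        raise ValueError("first region must start at the empty key")
    if starts != sorted(starts):
        raise ValueError("region start keys must be sorted")

    batches = [[] for _ in starts]
    # Sorting by raw bytes gives the strict row-key ordering each file needs.
    for key, value in sorted(rows, key=lambda kv: kv[0]):
        # A region's start key is inclusive, so pick the last start <= key.
        idx = bisect_right(starts, key) - 1
        batches[idx].append((key, value))
    return batches


rows = [(b"zebra", 1), (b"apple", 2), (b"m", 3), (b"mango", 4)]
batches = partition_by_region(rows, [b"", b"m", b"t"])
for region_index, batch in enumerate(batches):
    print(region_index, batch)
```

Each batch could then back one output file that falls entirely inside a single region, which is the case where a bulk load should not need to split files. A job whose output cannot be sorted and partitioned this way is probably the kind that is better left on TableOutputFormat.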
