Inline.

J-D

On Mon, Aug 1, 2011 at 12:50 PM, Leif Wickland <[email protected]> wrote:
> 1. Is there any case where it's a bad idea to use HFileOutputFormat instead
> of TableOutputFormat when writing to HBase from MapReduce?

Can you think of any?

>
> 2. What are the failure modes for LoadIncrementalHFiles.doBulkLoad? Is it
> possible some regions will be adopted and others fail?

That process isn't atomic, so, to be sure, you could end up with a region
failing for some reason (network issues, bug, whatever), and my
understanding of the code is that it would fail and return immediately
after any IOException.

>
> 3. I think I'd like to create HFiles as the output of my MapReduce, then use
> the HFiles as the input to a MapReduce to calculate some new aggregates, and
> then doBulkLoad on the HFiles. Is there any easy way to use a directory of
> HFiles as the input to a MapReduce? Is this inadvisable? It seems like
> this would be a more sensible approach than scanning for columns with
> timestamps in an interval to find the freshly written columns.

You'd need to write an HFileInputFormat, that's pretty much it.
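
On question 2, since the load can stop at the first IOException with only some regions done, one option is for the caller to wrap the call and try again. Below is a rough sketch of that idea only, not anything taken from the HBase code. BulkLoadRetry, BulkLoadStep and runWithRetries are hypothetical names made up for illustration; the step would be whatever invokes doBulkLoad in your job. It also assumes that re-running the load after a partial failure is safe for your data, which nothing in this thread confirms, so check that first.

```java
import java.io.IOException;

public class BulkLoadRetry {

    /**
     * Hypothetical stand-in for the code that performs the bulk load,
     * for example a lambda that calls LoadIncrementalHFiles.doBulkLoad.
     */
    public interface BulkLoadStep {
        void run() throws IOException;
    }

    /**
     * Runs the step until it succeeds or maxAttempts is reached.
     * Returns the number of attempts used; rethrows the last
     * IOException if every attempt failed.
     */
    public static int runWithRetries(BulkLoadStep step, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                step.run();
                return attempt; // the load completed without an IOException
            } catch (IOException e) {
                // Assumption: a partial load can simply be attempted again.
                last = e;
            }
        }
        throw last;
    }
}
```

You would call it as something like BulkLoadRetry.runWithRetries(() -> loader.doBulkLoad(dir, table), 3), where loader, dir and table are placeholders for your own objects.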
