One big reason is that the MemStore holds recent updates that haven't been flushed to HFiles yet. A job that reads the HFiles directly from HDFS will miss these.
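To make that concrete, here is a toy Java model of a region. Its classes and methods are hypothetical and are not the HBase client or server API. The region buffers writes in memory (standing in for the MemStore) and periodically flushes them to immutable sorted maps (standing in for HFiles). Scanning only the flushed files misses the unflushed edits, while a read through the region merges both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model of a single region; not the real HBase API.
public class ToyRegion {
    // Recent writes stay in memory until a flush (stand-in for the MemStore).
    private final NavigableMap<String, String> memstore = new TreeMap<>();
    // Each flush produces an immutable sorted "file" (stand-in for an HFile).
    private final List<NavigableMap<String, String>> files = new ArrayList<>();

    public void put(String row, String value) {
        memstore.put(row, value);
    }

    // Write the current in-memory edits out as a new file and clear memory.
    public void flush() {
        if (memstore.isEmpty()) return;
        files.add(new TreeMap<>(memstore));
        memstore.clear();
    }

    // What a job reading only the flushed files would see.
    public NavigableMap<String, String> scanFilesOnly() {
        NavigableMap<String, String> view = new TreeMap<>();
        for (NavigableMap<String, String> f : files) {
            view.putAll(f); // files are in flush order, so newer values win
        }
        return view;
    }

    // What a read through the region sees: flushed files plus in-memory edits.
    public NavigableMap<String, String> scanMerged() {
        NavigableMap<String, String> view = scanFilesOnly();
        view.putAll(memstore); // unflushed edits are the newest
        return view;
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.put("row1", "a");
        region.flush();
        region.put("row1", "b"); // update not yet flushed
        region.put("row2", "c"); // insert not yet flushed
        System.out.println("files only: " + region.scanFilesOnly()); // {row1=a}
        System.out.println("merged:     " + region.scanMerged());    // {row1=b, row2=c}
    }
}
```

In this model a files-only scan returns the stale value for row1 and no row2 at all. That gap is exactly the set of edits still sitting in memory at the moment the job runs.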
On Fri, May 6, 2011 at 12:27 PM, Jason Rutherglen <[email protected]> wrote:
> Is there an issue open or any particular reason that an MR job needs to
> access the HBase data directly from the region server? It seems possible to also
> provide functionality such that MR can execute over the HFile(s) stored in
> HDFS, thereby giving similar performance characteristics comparable to
> typical MR jobs that execute against files in HDFS.
>
> Jason
