Something else is going on, since TableInputFormat and TableOutputFormat in the same MapReduce job are not concurrent... the maps run, then the reduces run, and there is no overlap. That's a feature of MapReduce.
So if you were expecting to see the rows you were 'just writing' during your map phase, you won't, alas.

-ryan

On Thu, Sep 30, 2010 at 11:11 AM, Curt Allred <[email protected]> wrote:
> I can't find any documentation which says you shouldn't write to the same
> HBase table you're scanning, but it doesn't seem to work. I have a mapper
> (a subclass of TableMapper) which scans a table and, for each row
> encountered during the scan, updates a column of the row, writing it back
> to the same table immediately (using TableOutputFormat). Occasionally the
> scanner ends without completing the scan. No exception is thrown and there
> is no indication of failure; it just says it's done when it hasn't actually
> returned all the rows. This happens even if the scan has no timestamp
> specification or filters. It seems to happen only when I use a cache size
> greater than 1 (hbase.client.scanner.caching). This behavior is also
> repeatable using an HTable outside of a map-reduce job.
>
> The following blog entry implies that it might be risky, or worse, socially
> unacceptable :)
>
> http://www.larsgeorge.com/2009/01/how-to-use-hbase-with-hadoop.html:
>
> > Again, I have cases where I do not output but save back to
> > HBase. I could easily write the records back into HBase in
> > the reduce step, so why pass them on first? I think this is
> > in some cases just common sense or being a "good citizen".
> > ...
> > Writing back to the same HBase table is OK when doing it in
> > the Reduce phase as all scanning has concluded in the Map
> > phase beforehand, so all rows and columns are saved to an
> > intermediate Hadoop SequenceFile internally and when you
> > process these and write back to HBase you have no problems
> > that there is still a scanner for the same job open reading
> > the table.
> >
> > Otherwise it is OK to write to a different HBase table even
> > during the Map phase.
>
> But I also found a jira issue which indicates it "should" work, but there
> was a bug a while back which was fixed:
>
> > https://issues.apache.org/jira/browse/HBASE-810: Prevent temporary
> > deadlocks when, during a scan with write operations, the region splits
>
> Is anyone else writing while scanning? Or does anyone know of
> documentation that addresses this case?
>
> Thanks,
> -Curt
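For concreteness, here is a minimal sketch of the map-phase write-back pattern Curt describes, against the org.apache.hadoop.hbase.mapreduce API of that era. The table name ("mytable"), column family ("f"), qualifier ("seen"), and class names are all hypothetical; passing a null reducer to TableMapReduceUtil.initTableReducerJob is just the standard way to point TableOutputFormat at a table in a map-only job.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

// Map-only job that scans "mytable" and writes a marker column back to
// the SAME table via TableOutputFormat. All names are hypothetical.
public class WriteWhileScanning {

  static class MarkRowMapper extends TableMapper<ImmutableBytesWritable, Put> {
    private static final byte[] FAMILY = Bytes.toBytes("f");       // hypothetical
    private static final byte[] QUALIFIER = Bytes.toBytes("seen"); // hypothetical

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(rowKey.get());
      put.add(FAMILY, QUALIFIER, Bytes.toBytes(true));
      // TableOutputFormat sends this Put back to the table that is
      // still being scanned by this same job.
      context.write(rowKey, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "write-while-scanning");
    job.setJarByClass(WriteWhileScanning.class);

    Scan scan = new Scan();
    scan.setCaching(1); // per the thread, caching > 1 triggers the early scanner exit

    TableMapReduceUtil.initTableMapperJob("mytable", scan, MarkRowMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    // Null reducer: this just wires up TableOutputFormat against "mytable".
    TableMapReduceUtil.initTableReducerJob("mytable", null, job);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

As Ryan points out, even when this runs cleanly the map-phase scan will not see the Puts it is emitting; and per Curt's report, keeping the scanner cache at 1 (scan.setCaching(1)) is what appeared to avoid the scan ending prematurely.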
