Something else is going on, since TableInputFormat and
TableOutputFormat in the same MapReduce job are not concurrent... the
maps run, then the reduces run, and there is no overlap.  That's a
feature of MapReduce.

So if you were expecting to see the rows you were 'just writing'
during your map phase, you won't, alas.
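
For illustration, here is roughly what the setup looks like when the
write-back goes through the reduce phase instead (table, class, and
job names here are made up; just a sketch, not tested):

  Configuration conf = HBaseConfiguration.create();
  Job job = new Job(conf, "scan-and-update");
  Scan scan = new Scan();
  // Map phase: TableInputFormat feeds the scan to MyMapper, which
  // emits Puts instead of writing them directly.
  TableMapReduceUtil.initTableMapperJob("mytable", scan, MyMapper.class,
      ImmutableBytesWritable.class, Put.class, job);
  // Reduce phase: TableOutputFormat writes the Puts back to the same
  // table.  This only starts after every map task (and hence all
  // scanning) has finished.
  TableMapReduceUtil.initTableReducerJob("mytable",
      IdentityTableReducer.class, job);
  job.waitForCompletion(true);

So by the time anything lands in 'mytable', the scanner is long closed.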

-ryan

On Thu, Sep 30, 2010 at 11:11 AM, Curt Allred <[email protected]> wrote:
> I can't find any documentation which says you shouldn't write to the same
> HBase table you're scanning, but it doesn't seem to work.  I have a mapper
> (a subclass of TableMapper) which scans a table and, for each row encountered
> during the scan, updates a column of that row, writing it back to the same
> table immediately (using TableOutputFormat).  Occasionally the scanner ends
> without completing the scan.  No exception is thrown and there is no
> indication of failure; it just says it's done when it hasn't actually
> returned all the rows.  This happens even if the scan has no timestamp
> specification or filters.  It seems to only happen when I use a cache size
> greater than 1 (hbase.client.scanner.caching).  This behavior is also
> repeatable using an HTable outside of a MapReduce job.
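>
> Roughly, the mapper looks like this (class and column names simplified;
> just a sketch of what I described):
>
>   public class UpdateMapper extends TableMapper<ImmutableBytesWritable, Put> {
>     @Override
>     protected void map(ImmutableBytesWritable row, Result columns,
>         Context context) throws IOException, InterruptedException {
>       Put put = new Put(row.get());
>       // update one column of the row we are currently scanning
>       put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),
>           Bytes.toBytes("newValue"));
>       // TableOutputFormat writes this back to the table being scanned
>       context.write(row, put);
>     }
>   }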
>
> The following blog entry implies that it might be risky, or worse, socially 
> unacceptable :)
>
> http://www.larsgeorge.com/2009/01/how-to-use-hbase-with-hadoop.html:
>
>  > Again, I have cases where I do not output but save back to
>  > HBase. I could easily write the records back into HBase in
>  > the reduce step, so why pass them on first? I think this is
>  > in some cases just common sense or being a "good citizen".
>  > ...
>  > Writing back to the same HBase table is OK when doing it in
>  > the Reduce phase as all scanning has concluded in the Map
>  > phase beforehand, so all rows and columns are saved to an
>  > intermediate Hadoop SequenceFile internally and when you
>  > process these and write back to HBase you have no problems
>  > that there is still a scanner for the same job open reading
>  > the table.
>  >
>  > Otherwise it is OK to write to a different HBase table even
>  > during the Map phase.
>
> But I also found a JIRA issue which indicates it "should" work; there was 
> a bug a while back which has since been fixed:
>
>  > https://issues.apache.org/jira/browse/HBASE-810: Prevent temporary
>  > deadlocks when, during a scan with write operations, the region splits
>
> Anyone else writing while scanning? Or know of documentation that addresses 
> this case?
>
> Thanks,
> -Curt
