0.90.4 fixes two deadlocks (HBASE-4101 and HBASE-4077). Since then, there is HBASE-4367 (which has a posted patch).
Below sounds like slowness. Can you thread dump the particular regionserver
and see what it's up to? Is there other loading on the system at the time?
For example, loading on HDFS? Anything in the HDFS logs for the datanode
running beside the slow regionserver?

St.Ack

On Sat, Sep 10, 2011 at 5:50 PM, Geoff Hendrey <[email protected]> wrote:
> Hi all -
>
> I'm still dealing with the saga of ScannerTimeoutException,
> UnknownScannerException, etc. I rewrote my code in the hope that simply a
> different approach and some different code paths might yield better
> results. No change. I tried many variations (caching 1 row vs. caching
> many rows, changing the regionserver's lease, increasing the number of
> allowed zookeeper connections, etc.). I created a fresh table on the
> thought that maybe there was some problem with the table... no change.
>
> I am dealing with what appears to be some sort of scanner deadlock. I
> have a total-order-partitioned mapreduce job. In the reducer, as long as
> I use just one reducer, the task finishes quickly. But as soon as more
> than one reducer opens a scanner, the tasks proceed in what I would call
> a "jittery" lockstep. They both are able to call ResultScanner.next() a
> few times, but then the call to next() freezes for a long period and a
> ScannerTimeoutException is thrown. I catch the exception, get a new
> ResultScanner, and the pattern repeats: a few successful next() calls,
> then a lock-up.
>
> I have not yet tried upgrading from 0.90.1 to anything higher, nor have
> I tried tsuna's async client. Can anyone think of anything else I can
> try to resolve this? I've sunk quite a few late nights into this, and
> would be very excited to find a solution.
>
> -geoff
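For reference, here is a minimal, untested sketch of the catch-and-reopen
pattern described in the quoted message, written against the 0.90-era client
API. The table name, the caching value, and the zero-byte resume-key trick
are illustrative assumptions, not Geoff's actual code.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.ScannerTimeoutException;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ResumeScanSketch {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "some_table");
        byte[] resumeFrom = HConstants.EMPTY_START_ROW;
        boolean done = false;
        while (!done) {
          Scan scan = new Scan(resumeFrom);
          // Higher caching means fewer RPCs, but more client-side work
          // between RPCs, which is what lets the scanner lease expire.
          scan.setCaching(100);
          ResultScanner scanner = table.getScanner(scan);
          try {
            for (Result r; (r = scanner.next()) != null; ) {
              // ... process r ...
              // Remember where we are so a timeout does not restart the scan;
              // appending a zero byte makes the resume key skip this row.
              resumeFrom = Bytes.add(r.getRow(), new byte[] { 0 });
            }
            done = true; // scanner ran off the end of the table
          } catch (ScannerTimeoutException e) {
            // Lease expired server-side; loop around and reopen from resumeFrom.
          } finally {
            scanner.close();
          }
        }
        table.close();
      }
    }

If the freeze is a genuine deadlock rather than an expired lease, the thread
dump asked for above will be a more useful diagnostic than any client-side
retry.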
