Hi all -

I'm still dealing with the saga of ScannerTimeoutException,
UnknownScannerException, etc. I rewrote my code in the hope that a
different approach and some different code paths might yield better
results. No change. I tried many variations (caching 1 row vs. caching
many rows, changing the regionserver's scanner lease period, increasing
the number of allowed ZooKeeper connections, etc.). I also created a
fresh table on the theory that maybe there was some problem with the
original table...no change.
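
For reference, this is roughly the kind of configuration I've been
tweaking (a condensed sketch rather than my literal code, with
placeholder values):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

// Values are placeholders. Note that the lease and ZooKeeper settings
// are read by the region servers, so on a real cluster they belong in
// hbase-site.xml on those machines; setting them on the client
// Configuration only matters in local/test mode.
Configuration conf = HBaseConfiguration.create();
conf.setLong("hbase.regionserver.lease.period", 120000);      // scanner lease in ms (default 60000)
conf.setInt("hbase.zookeeper.property.maxClientCnxns", 100);   // allowed ZK connections per client

Scan scan = new Scan();
scan.setCaching(1);           // also tried much larger values
scan.setCacheBlocks(false);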

I am dealing with what appears to be some sort of scanner deadlock. I
have a total-order-partitioned MapReduce job whose reducers each open a
scanner. As long as I use just one reducer, the task finishes quickly.
But as soon as more than one reducer opens a scanner, the tasks proceed
in what I would call a "jittery" lock step: each one manages a few
successful ResultScanner.next() calls, then the next() call freezes for
a long period and eventually throws a ScannerTimeoutException. I catch
the exception, open a new ResultScanner, and the pattern repeats: a few
successful next() calls, then another lockup.
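
In case it helps to see it, the retry pattern inside my reducer looks
roughly like the sketch below (condensed, with made-up table and method
names, not my literal code):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.ScannerTimeoutException;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanRetrySketch {
    // Scan [startRow, stopRow), reopening the scanner whenever the
    // lease expires and resuming just past the last processed row.
    static void scanWithRetry(Configuration conf, byte[] startRow,
                              byte[] stopRow) throws IOException {
        HTable table = new HTable(conf, "my_table");   // made-up table name
        byte[] nextStart = startRow;
        boolean done = false;
        while (!done) {
            Scan scan = new Scan(nextStart, stopRow);
            scan.setCaching(1);
            ResultScanner scanner = table.getScanner(scan);
            try {
                Result r;
                while ((r = scanner.next()) != null) {
                    // ... process the row ...
                    // remember where we are so a reopened scanner can resume
                    nextStart = Bytes.add(r.getRow(), new byte[] { 0 });
                }
                done = true;                           // scan finished cleanly
            } catch (ScannerTimeoutException e) {
                // a few next() calls succeed, then this fires; loop around,
                // open a new scanner, and pick up where we left off
            } finally {
                scanner.close();
            }
        }
        table.close();
    }
}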

I have not yet tried upgrading from 0.90.1 to a newer release, nor have
I tried tsuna's async client. Can anyone think of anything else I can
try to resolve this? I've sunk quite a few late nights into this and
would be very excited to find a solution.

-geoff
