Hi all -
I'm still dealing with the saga of ScannerTimeoutException, UnknownScannerException, etc. I rewrote my code in the hope that a different approach and different code paths might yield better results. No change. I tried many variations: scanner caching of 1 row vs. many rows, raising the region server's scanner lease period (hbase.regionserver.lease.period), increasing the number of allowed ZooKeeper connections, and so on. I also created a fresh table on the thought that maybe there was some problem with the original table. Still no change.

What I appear to be dealing with is some sort of scanner deadlock. I have a total-order-partitioned MapReduce job whose reducers each open a scanner. As long as I use just one reducer, the task finishes quickly. But as soon as more than one reducer opens a scanner, the tasks proceed in what I would call a "jittery" lock step: each reducer gets through a few ResultScanner.next() calls, then a call to next() freezes for a long while and eventually throws a ScannerTimeoutException. I catch the exception, get a new ResultScanner, and the pattern repeats: a few successful next() calls, then a lockup.

In case it helps to see the shape of it, the retry pattern boils down to something like the sketch below. This is simplified; the table handle, the per-row work, and the resume logic are illustrative placeholders rather than my actual code.
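
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.ScannerTimeoutException;

public class ScanRetrySketch {

  // Scan [startRow, stopRow), reopening the scanner whenever the lease expires.
  static void scanWithRestart(HTable table, byte[] startRow, byte[] stopRow)
      throws IOException {
    byte[] resumeRow = startRow;
    boolean done = false;
    while (!done) {
      Scan scan = new Scan(resumeRow, stopRow);
      scan.setCaching(1); // have also tried much larger caching values
      ResultScanner scanner = table.getScanner(scan);
      try {
        Result r;
        while ((r = scanner.next()) != null) {
          // a few of these calls succeed, then one blocks until the lease expires
          handleRow(r);
          resumeRow = nextRow(r.getRow()); // remember where to resume from
        }
        done = true; // reached stopRow without a timeout
      } catch (ScannerTimeoutException e) {
        // lease expired while next() was blocked; loop around, reopen, continue
      } finally {
        scanner.close();
      }
    }
  }

  // Placeholder for the real per-row work done in the reducer.
  static void handleRow(Result r) {
  }

  // Smallest row key strictly greater than row (append a zero byte).
  static byte[] nextRow(byte[] row) {
    return Arrays.copyOf(row, row.length + 1);
  }
}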

I have not yet tried upgrading from 0.90.1 to something newer, nor have I tried tsuna's async client. Can anyone think of anything else I can try to resolve this? I've sunk quite a few late nights into this and would be very excited to find a solution.

-geoff