On Wed, Oct 19, 2011 at 12:51 PM, Eran Kutner <[email protected]> wrote:
> Hi J-D,
> Thanks for the detailed explanation.
> So if I understand correctly the lease we're talking about is a scanner
> lease and the timeout is between two scanner calls, correct? I think that
> makes sense because I now realize that jobs that fail (some jobs continued
> to fail even after reducing the number of map tasks as Stack suggested) use
> filters to fetch relatively few rows out of a very large table, so they
> could be spending a lot of time on the region server scanning rows until it
> reached my setCaching value which was 1000. Setting the caching value to 1
> seems to allow these jobs to complete.
> I think it has to be the above, since my rows are small, with just a few
> columns and processing them is very quick.

Excellent!

> However, there are still a couple of things I don't understand:
> 1. What is the difference between setCaching and setBatch?

* setBatch: Set the maximum number of values to return for each call to
  next()
VS
* setCaching: Set the number of rows for caching that will be passed to
  scanners.

The former (setBatch) is useful if you have rows with millions of columns:
you could setBatch to get only 1000 of them at a time. You could call that
intra-row scanning.

> 2. Examining the region server logs more closely than I did yesterday I see
> a lot of ClosedChannelExceptions in addition to the expired leases (but no
> UnknownScannerException), is that expected? You can see an excerpt of the
> log from one of the region servers here: http://pastebin.com/NLcZTzsY

It means that when the server got to process that client request and
started reading from the socket, the client was already gone. Killing a
client does that (as does killing a MapReduce job that scans), and so does
a SocketTimeoutException.

This should probably go in the book. We should also print something nicer :)

J-D
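
To make the setCaching/setBatch distinction concrete, here is a toy model in Python. It is only a sketch and not the HBase client API: the function names (split_by_batch, fetch_in_rpcs, rows_scanned_per_call) and the 1-in-10000 match rate used below are made up for illustration. It assumes that a batch limit splits a row into several results, that caching counts results per server round trip, and that rows rejected by a filter do not count toward the caching value, which is the behaviour described in this thread.

```python
from typing import Iterable, List

Row = List[str]  # toy stand-in for one HBase row: a list of column values


def split_by_batch(rows: Iterable[Row], batch: int) -> List[Row]:
    """Model setBatch: each result holds at most `batch` values of ONE row.

    batch <= 0 means "no limit", i.e. one result per whole row.
    """
    results: List[Row] = []
    for row in rows:
        if batch <= 0:
            results.append(row)
            continue
        # Intra-row scanning: cut a wide row into consecutive chunks.
        for i in range(0, len(row), batch):
            results.append(row[i:i + batch])
    return results


def fetch_in_rpcs(results: Iterable[Row], caching: int) -> List[List[Row]]:
    """Model setCaching: each round trip returns up to `caching` results."""
    rpcs: List[List[Row]] = []
    current: List[Row] = []
    for result in results:
        current.append(result)
        if len(current) == caching:
            rpcs.append(current)
            current = []
    if current:  # last, partially filled response
        rpcs.append(current)
    return rpcs


def rows_scanned_per_call(caching: int, one_match_in: int) -> int:
    """Rough number of rows the server reads before one call can return,
    if only 1 row in `one_match_in` passes the filter (hypothetical rate)."""
    return caching * one_match_in
```

With a hypothetical filter that keeps 1 row in 10000, rows_scanned_per_call(1000, 10000) gives 10,000,000 rows read on the region server before a single call returns, versus 10,000 with a caching value of 1. That is the kind of gap that would let one call outlast the timeout while the other does not.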
