Hi all,

I have exactly the same problem that Eran had, but there is something I don't understand. In my case, I have set the lease time to 240000 ms (4 minutes), yet most of the map tasks that are failing run for only about 2 minutes. How is it possible to get a LeaseException if the task runs for less than the configured lease time?

Regards,
Lucian Iordache

On Fri, Oct 21, 2011 at 12:34 AM, Eran Kutner <[email protected]> wrote:

> Perfect! Thanks.
>
> -eran
>
> On Thu, Oct 20, 2011 at 23:27, Jean-Daniel Cryans <[email protected]> wrote:
>
> > hbase.regionserver.lease.period
> >
> > Set it bigger than 60000.
> >
> > J-D
> >
> > On Thu, Oct 20, 2011 at 2:23 PM, Eran Kutner <[email protected]> wrote:
> >
> > > Thanks J-D!
> > > Since my main table is expected to continue growing, I guess at some
> > > point even setting the cache size to 1 will not be enough. Is there a
> > > way to configure the lease timeout?
> > >
> > > -eran
> > >
> > > On Thu, Oct 20, 2011 at 23:16, Jean-Daniel Cryans <[email protected]> wrote:
> > >
> > > > On Wed, Oct 19, 2011 at 12:51 PM, Eran Kutner <[email protected]> wrote:
> > > >
> > > > > Hi J-D,
> > > > > Thanks for the detailed explanation.
> > > > > So if I understand correctly, the lease we're talking about is a
> > > > > scanner lease and the timeout is between two scanner calls, correct?
> > > > > I think that makes sense, because I now realize that the jobs that
> > > > > fail (some jobs continued to fail even after reducing the number of
> > > > > map tasks as Stack suggested) use filters to fetch relatively few
> > > > > rows out of a very large table, so they could be spending a lot of
> > > > > time on the region server scanning rows until they reached my
> > > > > setCaching value, which was 1000. Setting the caching value to 1
> > > > > seems to allow these jobs to complete.
> > > > > I think it has to be the above, since my rows are small, with just
> > > > > a few columns, and processing them is very quick.
> > > >
> > > > Excellent!
> > > >
> > > > > However, there are still a couple of things I don't understand:
> > > > > 1. What is the difference between setCaching and setBatch?
> > > >
> > > > * Set the maximum number of values to return for each call to next()
> > > >
> > > > VS
> > > >
> > > > * Set the number of rows for caching that will be passed to scanners.
> > > >
> > > > The former (setBatch) is useful if you have rows with millions of
> > > > columns: you could call setBatch to get only 1000 of them at a time.
> > > > You could call that intra-row scanning.
> > > >
> > > > > 2. Examining the region server logs more closely than I did
> > > > > yesterday, I see a lot of ClosedChannelExceptions in addition to the
> > > > > expired leases (but no UnknownScannerException). Is that expected?
> > > > > You can see an excerpt of the log from one of the region servers
> > > > > here: http://pastebin.com/NLcZTzsY
> > > >
> > > > It means that when the server got to process that client request and
> > > > started reading from the socket, the client was already gone. Killing
> > > > a client does that (or killing an MR job that scans), and so does a
> > > > SocketTimeoutException. This should probably go in the book. We
> > > > should also print something nicer :)
> > > >
> > > > J-D
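
For anyone following the thread, here is a rough back-of-the-envelope sketch of Eran's point that a selective filter plus a large setCaching value can keep a single next() call busy on the region server for longer than the lease period. It is plain Java and does not use the HBase API. The class name, method names, and all the numbers (one matching row per 10,000 scanned, 50,000 rows scanned per second) are made up for illustration; only the 60000 ms lease value and the caching values of 1000 and 1 come from the messages above.

```java
public class LeaseEstimate {

    /**
     * Estimated time in milliseconds that one next() call spends scanning
     * before it has collected `caching` matching rows.
     * Assumption: the filter matches one row out of every
     * `rowsScannedPerMatch` rows, and the server scans at a steady rate.
     */
    public static long millisPerNextCall(long caching,
                                         long rowsScannedPerMatch,
                                         long rowsScannedPerSecond) {
        long rowsToScan = caching * rowsScannedPerMatch;
        return rowsToScan * 1000 / rowsScannedPerSecond;
    }

    /** True if a single next() call is estimated to outlast the lease period. */
    public static boolean exceedsLease(long millisPerCall, long leasePeriodMillis) {
        return millisPerCall > leasePeriodMillis;
    }

    public static void main(String[] args) {
        long leasePeriodMillis = 60000;      // the value J-D says to go above
        long rowsScannedPerMatch = 10000;    // hypothetical filter selectivity
        long rowsScannedPerSecond = 50000;   // hypothetical scan speed

        // Compare the caching value Eran started with (1000) to the one that worked (1).
        for (long caching : new long[] {1000, 1}) {
            long ms = millisPerNextCall(caching, rowsScannedPerMatch, rowsScannedPerSecond);
            System.out.println("caching=" + caching
                    + " -> ~" + ms + " ms per next() call, exceeds lease: "
                    + exceedsLease(ms, leasePeriodMillis));
        }
    }
}
```

With these invented numbers, what matters is the time spent inside one next() call, not the total run time of the map task. The estimate is only as good as the assumed selectivity and scan speed, so treat it as a way to reason about the settings rather than a prediction.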
