Did you restart the region servers after changing the config? Are you sure it's the same exception/stack trace?
J-D

On Mon, Oct 24, 2011 at 8:04 AM, Lucian Iordache <[email protected]> wrote:
> Hi all,
>
> I have exactly the same problem that Eran had.
> But there is something I don't understand: in my case, I have set the lease
> time to 240000 (4 minutes). But most of the map tasks that are failing run
> about 2 minutes. How is it possible to get a LeaseException if the task runs
> less than the configured time for a lease?
>
> Regards,
> Lucian Iordache
>
> On Fri, Oct 21, 2011 at 12:34 AM, Eran Kutner <[email protected]> wrote:
>
>> Perfect! Thanks.
>>
>> -eran
>>
>> On Thu, Oct 20, 2011 at 23:27, Jean-Daniel Cryans <[email protected]> wrote:
>>
>> > hbase.regionserver.lease.period
>> >
>> > Set it bigger than 60000.
>> >
>> > J-D
>> >
>> > On Thu, Oct 20, 2011 at 2:23 PM, Eran Kutner <[email protected]> wrote:
>> > >
>> > > Thanks J-D!
>> > > Since my main table is expected to continue growing I guess at some
>> > > point even setting the cache size to 1 will not be enough. Is there
>> > > a way to configure the lease timeout?
>> > >
>> > > -eran
>> > >
>> > > On Thu, Oct 20, 2011 at 23:16, Jean-Daniel Cryans <[email protected]> wrote:
>> > >
>> > > > On Wed, Oct 19, 2011 at 12:51 PM, Eran Kutner <[email protected]> wrote:
>> > > >
>> > > > > Hi J-D,
>> > > > > Thanks for the detailed explanation.
>> > > > > So if I understand correctly the lease we're talking about is a
>> > > > > scanner lease and the timeout is between two scanner calls,
>> > > > > correct? I think that makes sense because I now realize that jobs
>> > > > > that fail (some jobs continued to fail even after reducing the
>> > > > > number of map tasks as Stack suggested) use filters to fetch
>> > > > > relatively few rows out of a very large table, so they could be
>> > > > > spending a lot of time on the region server scanning rows until
>> > > > > it reached my setCaching value which was 1000.
>> > > > > Setting the caching value to 1 seems to allow these jobs to
>> > > > > complete.
>> > > > > I think it has to be the above, since my rows are small, with
>> > > > > just a few columns and processing them is very quick.
>> > > > >
>> > > >
>> > > > Excellent!
>> > > >
>> > > > >
>> > > > > However, there are still a couple of things I don't understand:
>> > > > > 1. What is the difference between setCaching and setBatch?
>> > > > >
>> > > >
>> > > > * Set the maximum number of values to return for each call to next()
>> > > >   (setBatch)
>> > > >
>> > > > VS
>> > > >
>> > > > * Set the number of rows for caching that will be passed to scanners.
>> > > >   (setCaching)
>> > > >
>> > > > The former (setBatch) is useful if you have rows with millions of
>> > > > columns and you could setBatch to get only 1000 of them at a time.
>> > > > You could call that intra-row scanning.
>> > > >
>> > > > > 2. Examining the region server logs more closely than I did
>> > > > > yesterday I see a lot of ClosedChannelExceptions in addition to
>> > > > > the expired leases (but no UnknownScannerException), is that
>> > > > > expected? You can see an excerpt of the log from one of the
>> > > > > region servers here: http://pastebin.com/NLcZTzsY
>> > > >
>> > > > It means that when the server got to process that client request
>> > > > and started reading from the socket, the client was already gone.
>> > > > Killing a client does that (or killing an MR job that scans), so
>> > > > does SocketTimeoutException. This should probably go in the book.
>> > > > We should also print something nicer :)
>> > > >
>> > > > J-D
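
The reasoning in the quoted exchange (a very selective filter makes the region server examine many rows before it has collected setCaching matching rows, so a single next() call can take longer than the scanner lease) can be checked with rough arithmetic. Below is a minimal sketch in plain Java with no HBase dependency. The class name, the "one match per 100,000 rows" selectivity and the 500 rows/ms scan rate are hypothetical illustration values, not measurements from this thread; only the caching values (1000 and 1) and the lease values (60000 and 240000 ms) come from the messages above.

```java
public class ScannerLeaseEstimate {

    // Rows the region server has to examine to fill one next() call,
    // assuming (hypothetically) one matching row per `rowsPerMatch` rows.
    static long rowsExaminedPerNextCall(int caching, long rowsPerMatch) {
        return (long) caching * rowsPerMatch;
    }

    // Estimated server-side time for one next() call, rounded up to whole ms,
    // assuming a constant (hypothetical) scan rate of `rowsPerMilli` rows/ms.
    static long millisPerNextCall(int caching, long rowsPerMatch, long rowsPerMilli) {
        long rows = rowsExaminedPerNextCall(caching, rowsPerMatch);
        return (rows + rowsPerMilli - 1) / rowsPerMilli;
    }

    // True if a single next() call is estimated to outlast the scanner lease.
    static boolean exceedsLease(int caching, long rowsPerMatch, long rowsPerMilli,
                                long leasePeriodMillis) {
        return millisPerNextCall(caching, rowsPerMatch, rowsPerMilli) > leasePeriodMillis;
    }

    public static void main(String[] args) {
        long rowsPerMatch = 100_000L; // hypothetical filter selectivity
        long rowsPerMilli = 500L;     // hypothetical scan rate
        long[] leases = {60_000L, 240_000L}; // lease values mentioned in the thread
        int[] cachings = {1000, 1};          // caching values mentioned in the thread

        for (int caching : cachings) {
            long ms = millisPerNextCall(caching, rowsPerMatch, rowsPerMilli);
            for (long lease : leases) {
                System.out.println("caching=" + caching
                        + " estimated ms per next()=" + ms
                        + " lease=" + lease
                        + " exceeds=" + exceedsLease(caching, rowsPerMatch, rowsPerMilli, lease));
            }
        }
    }
}
```

Under these assumed numbers, lowering the caching value and raising hbase.regionserver.lease.period both shorten the estimate relative to the lease; which one is appropriate depends on the real selectivity and scan rate of the job.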
