Hi,

The servers have been restarted (I have had this configuration for more
than a month, so this is not the problem).
About the stack traces, they show exactly the same thing: a lot of
ClosedChannelExceptions and LeaseExceptions.

But I found something that could be the problem: hbase.rpc.timeout. This
defaults to 60 seconds, and I did not modify it in hbase-site.xml. So it
could happen as follows:
- the mapper makes a scanner.next call to the region server
- the region server needs more than 60 seconds to execute it (I use
multiple filters, and it could take a lot of time)
- the scan client gets the timeout and cuts the connection
- the region server tries to send the results to the client ==>
ClosedChannelException
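
To make the timing concrete, here is a rough sketch of the sequence above as a toy model. It does not call HBase at all: the function name first_limit_hit and the idea of treating both settings as simple thresholds on the duration of one scanner.next call are my own simplifications, so take it only as an illustration of why a call of about 2 minutes could fail even with a 4-minute lease.

```python
# Toy timeline model of the hypothesis above; it does not talk to HBase.
# Assumption: each limit is treated as a plain threshold on how long one
# scanner.next call spends on the region server.
RPC_TIMEOUT_MS = 60_000     # hbase.rpc.timeout default, as stated above
LEASE_PERIOD_MS = 240_000   # lease period configured earlier in this thread


def first_limit_hit(server_time_ms, rpc_timeout_ms=RPC_TIMEOUT_MS,
                    lease_period_ms=LEASE_PERIOD_MS):
    """Return the name of the smallest limit the call exceeds, or None."""
    limits = {"rpc_timeout": rpc_timeout_ms, "lease_period": lease_period_ms}
    exceeded = {name: ms for name, ms in limits.items() if server_time_ms > ms}
    if not exceeded:
        return None
    # The smallest exceeded limit is the one reached first in time.
    return min(exceeded, key=exceeded.get)


if __name__ == "__main__":
    for seconds in (30, 120, 300):
        print(seconds, "s ->", first_limit_hit(seconds * 1000))
```

In this model a 120-second call is already past the 60-second RPC timeout long before the 240-second lease comes into play, which would match failures after about 2 minutes.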

I will take a deeper look into it tomorrow. If you have other suggestions,
please let me know!

Thanks,
Lucian

On Mon, Oct 24, 2011 at 8:00 PM, Jean-Daniel Cryans <[email protected]> wrote:

> Did you restart the region servers after changing the config?
>
> Are you sure it's the same exception/stack trace?
>
> J-D
>
> On Mon, Oct 24, 2011 at 8:04 AM, Lucian Iordache
> <[email protected]> wrote:
> > Hi all,
> >
> > I have exactly the same problem that Eran had.
> > But there is something I don't understand: in my case, I have set the
> > lease time to 240000 (4 minutes). But most of the map tasks that are
> > failing run about 2 minutes. How is it possible to get a LeaseException
> > if the task runs less than the configured time for a lease?
> >
> > Regards,
> > Lucian Iordache
> >
> > On Fri, Oct 21, 2011 at 12:34 AM, Eran Kutner <[email protected]> wrote:
> >
> >> Perfect! Thanks.
> >>
> >> -eran
> >>
> >>
> >>
> >> On Thu, Oct 20, 2011 at 23:27, Jean-Daniel Cryans <[email protected]
> >> >wrote:
> >>
> >> > hbase.regionserver.lease.period
> >> >
> >> > Set it bigger than 60000.
> >> >
> >> > J-D
> >> >
> >> > On Thu, Oct 20, 2011 at 2:23 PM, Eran Kutner <[email protected]> wrote:
> >> > >
> >> > > Thanks J-D!
> >> > > Since my main table is expected to continue growing I guess at some
> >> point
> >> > > even setting the cache size to 1 will not be enough. Is there a way
> to
> >> > > configure the lease timeout?
> >> > >
> >> > > -eran
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Oct 20, 2011 at 23:16, Jean-Daniel Cryans <
> [email protected]
> >> > >wrote:
> >> > >
> >> > > > On Wed, Oct 19, 2011 at 12:51 PM, Eran Kutner <[email protected]>
> >> wrote:
> >> > > >
> >> > > > > Hi J-D,
> >> > > > > Thanks for the detailed explanation.
> >> > > > > So if I understand correctly the lease we're talking about is a
> >> > scanner
> >> > > > > lease and the timeout is between two scanner calls, correct? I
> >> think
> >> > that
> >> > > > > makes sense because I now realize that jobs that fail (some jobs
> >> > continued
> >> > > > > to
> >> > > > > fail even after reducing the number of map tasks as Stack
> >> suggested)
> >> > use
> >> > > > > filters to fetch relatively few rows out of a very large table,
> so
> >> > they
> >> > > > > could be spending a lot of time on the region server scanning
> rows
> >> > until
> >> > > > it
> >> > > > > reached my setCaching value which was 1000. Setting the caching
> >> value
> >> > to
> >> > > > 1
> >> > > > > seems to allow these jobs to complete.
> >> > > > > I think it has to be the above, since my rows are small, with
> just
> >> a
> >> > few
> >> > > > > columns and processing them is very quick.
> >> > > > >
> >> > > >
> >> > > > Excellent!
> >> > > >
> >> > > >
> >> > > > >
> >> > > > > However, there are still a couple of things I don't understand:
> >> > > > > 1. What is the difference between setCaching and setBatch?
> >> > > > >
> >> > > >
> >> > > > * Set the maximum number of values to return for each call to
> next()
> >> > > >
> >> > > > VS
> >> > > >
> >> > > > * Set the number of rows for caching that will be passed to
> scanners.
> >> > > >
> >> > > > The former is useful if you have rows with millions of columns and
> >> you
> >> > > > could
> >> > > > setBatch to get only 1000 of them at a time. You could call that
> >> > intra-row
> >> > > > scanning.
> >> > > >
> >> > > >
> >> > > > > 2. Examining the region server logs more closely than I did
> >> yesterday
> >> > I
> >> > > > see
> >> > > > > a lot of ClosedChannelExceptions in addition to the expired
> leases
> >> > (but
> >> > > > no
> >> > > > > UnknownScannerException), is that expected? You can see an
> excerpt
> >> of
> >> > the
> >> > > > > log from one of the region servers here:
> >> > http://pastebin.com/NLcZTzsY
> >> > > >
> >> > > >
> >> > > > It means that when the server got to process that client request
> and
> >> > > > started
> >> > > > reading from the socket, the client was already gone. Killing a
> >> client
> >> > does
> >> > > > that (or killing a MR that scans), so does SocketTimeoutException.
> >> This
> >> > > > should probably go in the book. We should also print something
> nicer
> >> :)
> >> > > >
> >> > > > J-D
> >> > > >
> >> >
> >>
> >
>
