Hello,

I would suggest logging the exception thrown when hbase.rpc.timeout expires on
the client side at WARN level, not at DEBUG as it is right now.
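A minimal sketch of this suggestion, assuming a hypothetical client-side wrapper `TimeoutLogging.callWithTimeoutWarning` (not part of HBase). It uses java.util.logging only to stay self-contained; the real HBase client logs through its own logging setup, so the actual change would be raising the level of the existing message rather than adding a wrapper.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

class TimeoutLogging {
    // Hypothetical logger name, chosen for this sketch only.
    static final Logger LOG = Logger.getLogger("example.hbase.client.scanner");

    /**
     * Runs one RPC. A client-side timeout is logged at WARNING (instead of a
     * debug-level message) and then rethrown so callers still see it.
     */
    static <T> T callWithTimeoutWarning(String what, long rpcTimeoutMs, Callable<T> rpc)
            throws Exception {
        try {
            return rpc.call();
        } catch (SocketTimeoutException e) {
            LOG.log(Level.WARNING, what + " did not complete within " + rpcTimeoutMs
                    + " ms (hbase.rpc.timeout); the server may still be processing it", e);
            throw e;
        }
    }
}
```

Rethrowing keeps the failure behavior unchanged; only its visibility in the client (mapper) logs improves.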

Regards,
Lucian

On Wed, Oct 26, 2011 at 6:51 PM, Stack <[email protected]> wrote:

> What would you suggest we do to improve the messages we emit around
> here to make it clearer what's going on?
>
> St.Ack
>
> On Tue, Oct 25, 2011 at 1:15 AM, Lucian Iordache
> <[email protected]> wrote:
> > Yes, I will try to see the SocketTimeoutException after setting the log
> > level to DEBUG because, as it says here
> > https://issues.apache.org/jira/browse/HBASE-3154 , this is logged at DEBUG
> > on the client side.
> >
> > Regards,
> > Lucian
> >
> > On Mon, Oct 24, 2011 at 8:22 PM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> >> So you should see the SocketTimeoutException in your *client* logs (in
> >> your case, the mappers), not a LeaseException. At that point, yes, you're
> >> going to time out, but if you spend so much time cycling on the server
> >> side then you shouldn't set a high caching value on your scanner, as IO
> >> isn't your bottleneck.
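A back-of-the-envelope sketch of why a high caching value hurts when a selective filter makes the server scan many rows per match. The model and all numbers are hypothetical: it assumes the server keeps scanning until it has collected `caching` matching rows for one next() call, with a fixed cost per scanned row. In the HBase client the knob itself is the scanner caching (`Scan.setCaching`).

```java
class CachingEstimate {
    /**
     * Rough server-side time for one scanner next() RPC, in seconds.
     * Hypothetical model: each matching row costs rowsScannedPerMatch scanned
     * rows at microsPerRow microseconds each, and one call gathers `caching` matches.
     */
    static double secondsPerNextCall(int caching, long rowsScannedPerMatch, double microsPerRow) {
        return caching * rowsScannedPerMatch * microsPerRow / 1_000_000.0;
    }

    /** True if one next() call would outlast the given timeout (e.g. hbase.rpc.timeout). */
    static boolean exceeds(double seconds, long timeoutMs) {
        return seconds * 1000.0 > timeoutMs;
    }
}
```

With one match per 100,000 rows at 10 µs per row, a caching value of 1000 (the value Eran mentions further down) gives `secondsPerNextCall(1000, 100_000, 10.0)` = 1000 s of server work per call, far past a 60 s timeout, while caching 1 gives about 1 s.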
> >>
> >> J-D
> >>
> >> On Mon, Oct 24, 2011 at 10:15 AM, Lucian Iordache
> >> <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > The servers have been restarted (I have had this configuration for more
> >> > than a month, so this is not the problem).
> >> > As for the stack traces, they show exactly the same thing: a lot of
> >> > ClosedChannelExceptions and LeaseExceptions.
> >> >
> >> > But I found something that could be the problem: hbase.rpc.timeout. This
> >> > defaults to 60 seconds, and I did not modify it in hbase-site.xml. So it
> >> > could happen like this:
> >> > - the mapper makes a scanner.next call to the region server
> >> > - the region server needs more than 60 seconds to execute it (I use
> >> > multiple filters, and it could take a lot of time)
> >> > - the scan client hits the timeout and closes the connection
> >> > - the region server tries to send the results to the client ==>
> >> > ClosedChannelException
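The sequence above can be reproduced in miniature with plain java.net sockets. This is a sketch, not HBase code: a local thread plays the region server doing slow, filter-heavy work, and the client's `setSoTimeout` stands in for hbase.rpc.timeout. The class and method names are hypothetical.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class RpcTimeoutDemo {

    /**
     * Simulates one scanner.next() round trip. Returns true if the client
     * gave up (SocketTimeoutException) before the "server" replied.
     */
    static boolean callTimesOut(long serverWorkMs, int clientTimeoutMs) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Plays the region server: read the request, work slowly, then reply.
            Thread handler = new Thread(() -> {
                try (Socket conn = server.accept()) {
                    conn.getInputStream().read();      // the next() request
                    Thread.sleep(serverWorkMs);        // e.g. filters scanning many rows
                    OutputStream out = conn.getOutputStream();
                    out.write(1);                      // the results; the client may be gone by now
                    out.flush();
                } catch (Exception e) {
                    // Writing to a departed client can fail here.
                }
            });
            handler.start();
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(clientTimeoutMs);  // stands in for hbase.rpc.timeout
                client.getOutputStream().write(0);
                client.getOutputStream().flush();
                InputStream in = client.getInputStream();
                in.read();                             // wait for the results
                return false;
            } catch (SocketTimeoutException e) {
                // try-with-resources has already closed the client socket here
                return true;
            } finally {
                handler.join();
            }
        }
    }
}
```

If Lucian's hypothesis holds, `callTimesOut(400, 100)` is the shape of his failure: a single slow call trips the 60 s client-side RPC timeout well before a 4-minute lease would matter.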
> >> >
> >> > I will take a deeper look into it tomorrow. If you have other
> >> > suggestions, please let me know!
> >> >
> >> > Thanks,
> >> > Lucian
> >> >
> >> > On Mon, Oct 24, 2011 at 8:00 PM, Jean-Daniel Cryans <[email protected]> wrote:
> >> >
> >> >> Did you restart the region servers after changing the config?
> >> >>
> >> >> Are you sure it's the same exception/stack trace?
> >> >>
> >> >> J-D
> >> >>
> >> >> On Mon, Oct 24, 2011 at 8:04 AM, Lucian Iordache
> >> >> <[email protected]> wrote:
> >> >> > Hi all,
> >> >> >
> >> >> > I have exactly the same problem that Eran had.
> >> >> > But there is something I don't understand: in my case, I have set the
> >> >> > lease time to 240000 (4 minutes). But most of the map tasks that are
> >> >> > failing run for about 2 minutes. How is it possible to get a
> >> >> > LeaseException if the task runs for less than the configured lease time?
> >> >> >
> >> >> > Regards,
> >> >> > Lucian Iordache
> >> >> >
> >> >> > On Fri, Oct 21, 2011 at 12:34 AM, Eran Kutner <[email protected]> wrote:
> >> >> >
> >> >> >> Perfect! Thanks.
> >> >> >>
> >> >> >> -eran
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On Thu, Oct 20, 2011 at 23:27, Jean-Daniel Cryans <[email protected]> wrote:
> >> >> >>
> >> >> >> > hbase.regionserver.lease.period
> >> >> >> >
> >> >> >> > Set it bigger than 60000 (the default, in milliseconds).
> >> >> >> >
> >> >> >> > J-D
> >> >> >> >
> >> >> >> > On Thu, Oct 20, 2011 at 2:23 PM, Eran Kutner <[email protected]> wrote:
> >> >> >> > >
> >> >> >> > > Thanks J-D!
> >> >> >> > > Since my main table is expected to keep growing, I guess at some
> >> >> >> > > point even setting the cache size to 1 will not be enough. Is there a
> >> >> >> > > way to configure the lease timeout?
> >> >> >> > >
> >> >> >> > > -eran
> >> >> >> > >
> >> >> >> > >
> >> >> >> > >
> >> >> >> > > On Thu, Oct 20, 2011 at 23:16, Jean-Daniel Cryans <[email protected]> wrote:
> >> >> >> > >
> >> >> >> > > > On Wed, Oct 19, 2011 at 12:51 PM, Eran Kutner <[email protected]> wrote:
> >> >> >> > > >
> >> >> >> > > > > Hi J-D,
> >> >> >> > > > > Thanks for the detailed explanation.
> >> >> >> > > > > So if I understand correctly, the lease we're talking about is a
> >> >> >> > > > > scanner lease and the timeout is between two scanner calls, correct?
> >> >> >> > > > > I think that makes sense, because I now realize that the jobs that
> >> >> >> > > > > fail (some jobs continued to fail even after reducing the number of
> >> >> >> > > > > map tasks as Stack suggested) use filters to fetch relatively few rows
> >> >> >> > > > > out of a very large table, so they could be spending a lot of time on
> >> >> >> > > > > the region server scanning rows until they reach my setCaching value,
> >> >> >> > > > > which was 1000. Setting the caching value to 1 seems to allow these
> >> >> >> > > > > jobs to complete.
> >> >> >> > > > > I think it has to be the above, since my rows are small, with just a
> >> >> >> > > > > few columns, and processing them is very quick.
> >> >> >> > > > >
> >> >> >> > > >
> >> >> >> > > > Excellent!
> >> >> >> > > >
> >> >> >> > > >
> >> >> >> > > > >
> >> >> >> > > > > However, there are still a couple of things I don't understand:
> >> >> >> > > > > 1. What is the difference between setCaching and setBatch?
> >> >> >> > > > >
> >> >> >> > > >
> >> >> >> > > > * setBatch: Set the maximum number of values to return for each call to next()
> >> >> >> > > >
> >> >> >> > > > VS
> >> >> >> > > >
> >> >> >> > > > * setCaching: Set the number of rows for caching that will be passed to scanners.
> >> >> >> > > >
> >> >> >> > > > The former (setBatch) is useful if you have rows with millions of
> >> >> >> > > > columns: you could setBatch to get only 1000 of them at a time. You
> >> >> >> > > > could call that intra-row scanning.
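A simplified model of how the two settings interact, under assumptions not verified against a specific HBase version: batch splits a wide row into partial results of at most that many values, and caching bounds how many results each next() round trip carries. `ScanModel` and its methods are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class ScanModel {
    /**
     * Splits each row (given as its number of columns) into partial results
     * holding at most `batch` values each; batch <= 0 means "no limit".
     */
    static List<Integer> resultSizes(int[] columnsPerRow, int batch) {
        int limit = batch > 0 ? batch : Integer.MAX_VALUE;
        List<Integer> sizes = new ArrayList<>();
        for (int columns : columnsPerRow) {
            for (int remaining = columns; remaining > 0; remaining -= limit) {
                sizes.add(Math.min(limit, remaining));
            }
        }
        return sizes;
    }

    /** Round trips needed when each next() RPC carries up to `caching` results. */
    static int roundTrips(int resultCount, int caching) {
        return (resultCount + caching - 1) / caching;
    }
}
```

For example, a 3000-column row followed by a 2-column row, with batch 1000, comes back as results of 1000, 1000, 1000 and 2 values; with caching 2, those four results take two round trips.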
> >> >> >> > > >
> >> >> >> > > >
> >> >> >> > > > > 2. Examining the region server logs more closely than I did yesterday,
> >> >> >> > > > > I see a lot of ClosedChannelExceptions in addition to the expired
> >> >> >> > > > > leases (but no UnknownScannerException). Is that expected? You can see
> >> >> >> > > > > an excerpt of the log from one of the region servers here:
> >> >> >> > > > > http://pastebin.com/NLcZTzsY
> >> >> >> > > >
> >> >> >> > > >
> >> >> >> > > > It means that when the server got to process that client request and
> >> >> >> > > > started reading from the socket, the client was already gone. Killing a
> >> >> >> > > > client does that (or killing an MR job that scans), and so does a
> >> >> >> > > > SocketTimeoutException. This should probably go in the book. We should
> >> >> >> > > > also print something nicer :)
> >> >> >> > > >
> >> >> >> > > > J-D
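A sketch of that sequence with java.nio channels. It is a simplified model under stated assumptions, not HBase's RPC server: the client sends a request and disappears, the server side sees end-of-stream and closes its end of the connection, and a later attempt to write the response fails with `ClosedChannelException`, the exception Eran saw in the region server logs. `ClosedChannelDemo` is a hypothetical name.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class ClosedChannelDemo {

    /** Returns true if writing the response fails with ClosedChannelException. */
    static boolean responseHitsClosedChannel() throws Exception {
        try (ServerSocketChannel listener = ServerSocketChannel.open()) {
            listener.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            SocketChannel client = SocketChannel.open(listener.getLocalAddress());
            SocketChannel conn = listener.accept();

            // The client sends its request, then goes away (killed task or client timeout).
            client.write(ByteBuffer.wrap(new byte[] {42}));
            client.close();

            // The server drains the request and then sees end-of-stream (-1)...
            ByteBuffer buf = ByteBuffer.allocate(16);
            while (conn.read(buf) >= 0) {
                buf.clear();
            }
            // ...so it closes its side of the connection.
            conn.close();

            // A handler that finishes later tries to send the results.
            try {
                conn.write(ByteBuffer.wrap(new byte[] {1}));
                return false;
            } catch (ClosedChannelException e) {
                return true;
            }
        }
    }
}
```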
> >> >> >> > > >
> >> >> >> >
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
> >
> >
> > --
> > All the best,
> > Lucian
> >
>
