Re: scanner deadlock?

Jean-Daniel Cryans Wed, 14 Sep 2011 10:43:34 -0700

Yeah, like Stack said, the ClosedChannelException is how we figure out the
client is gone. Since you have a 60s timeout on the RPC call, the client
_will_ go away (and possibly come right back in through another
handler) when a call takes longer than that. One of my theories was
that, in your case, if a region server slowed down it would start piling
up calls in the queues and some of them would time out, so the client
would come right back with the same request, making the whole
situation worse.
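
To make that feedback loop concrete, here is a toy sketch of it. This is
not HBase's RPC code: the function name, the parameters, and the "tick"
time unit are all made up for illustration, and it assumes a single FIFO
call queue with each client keeping exactly one request outstanding. A
call that the server dequeues after its client has already timed out and
resent it counts as wasted work (roughly the case where the server would
hit the ClosedChannelException).

```python
from collections import deque


def simulate(num_clients, service_per_tick, timeout_ticks, ticks):
    """Toy model of timeout-driven retries against one FIFO call queue.

    Hypothetical parameters (none of these are HBase settings):
      num_clients      -- clients that each keep one request outstanding
      service_per_tick -- calls the server can finish per tick
      timeout_ticks    -- how long a client waits before giving up and resending
      ticks            -- how long to run the simulation

    Returns (useful, wasted): calls served while the client was still
    waiting, and calls served after the client had already gone away.
    """
    queue = deque()                      # entries are (client_id, attempt_no)
    current_attempt = [0] * num_clients  # latest attempt number per client
    sent_at = [0] * num_clients          # tick at which that attempt was sent
    for client in range(num_clients):
        queue.append((client, 0))

    useful = wasted = 0
    for tick in range(1, ticks + 1):
        # Clients whose latest attempt has waited too long give up and resend.
        # The stale attempt stays in the queue; the server cannot know yet.
        for client in range(num_clients):
            if tick - sent_at[client] >= timeout_ticks:
                current_attempt[client] += 1
                sent_at[client] = tick
                queue.append((client, current_attempt[client]))

        # The server drains up to service_per_tick calls in FIFO order.
        for _ in range(min(service_per_tick, len(queue))):
            client, attempt = queue.popleft()
            if attempt == current_attempt[client]:
                useful += 1
                # The client got its answer and immediately sends the next call.
                current_attempt[client] += 1
                sent_at[client] = tick
                queue.append((client, current_attempt[client]))
            else:
                wasted += 1  # client already timed out and retried this call

    return useful, wasted
```

With, say, simulate(16, 8, 10, 200) every call is answered well inside the
timeout and nothing is wasted. Dropping the server to simulate(16, 1, 10, 200)
pushes the queueing delay past the timeout, so retries start stacking up
behind stale calls and the server spends part of its capacity on requests
nobody is waiting for anymore.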


When you set the config that I told you about, did you still get the
CCEs? Meaning the calls were still slow? If so, then the issue is
elsewhere, and this proves it.

Like Stack, I think we'll have to be fed more data about your system :)

J-D

On Wed, Sep 14, 2011 at 9:32 AM, Geoff Hendrey <[email protected]> wrote:
> I've already been able to replicate the problem using just two reducers,
> on a completely fresh table. So it seemed to me when I did that the
> problem was independent of the number of reducers...
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Stack
> Sent: Wednesday, September 14, 2011 8:47 AM
> To: [email protected]
> Subject: Re: scanner deadlock?
>
> On Wed, Sep 14, 2011 at 8:42 AM, Geoff Hendrey <[email protected]>
> wrote:
>> 17 MR nodes, 8 reducers per machine = 138 concurrent reducers.
>> (machines are 12-core, and I've found 8 reducers with 1GB allocated
>> heap to be a happy medium that doesn't freeze out the data nodes or the
>> region servers - or so I think :-).
>>
>
> Are you swapping at all?
>
> What if you restored your config. to something sane -- 100 handlers
> with queue size of 10, default timeout -- with 1/4 of the reducers?
>
> What does this MR job do?
>
> St.Ack
>
