This is a problem reported a while ago, I believe by Oleg.

The lock issue is inside YARN's AMRMClientAsync.

When a TezSession is shut down (tezClient.stop()), it sets up handlers
within the AM for a future shutdown, and returns.
After this, if the MiniCluster is shut down, there's a possibility that the
AM is still talking to the RM to schedule resources. Once the RM goes down,
this invocation goes into a retry loop while holding a lock that is also
required to unregister from the RM (and once that lock is obtained, the
unregister call would enter another retry loop, since the RM is no longer
around).
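
To illustrate the pattern described above, here is a minimal Java sketch. It
is not the actual AMRMClientAsync code: the class and method names
(LockHeldDuringRetry, allocateWithRetries, tryUnregister) are hypothetical,
and the retry count and timeouts are arbitrary. It only shows how a retry
loop that runs while holding a lock keeps a second caller, here the
unregister path, from making progress.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a client that guards RM calls with one lock.
public class LockHeldDuringRetry {
    private final ReentrantLock clientLock = new ReentrantLock();
    private final CountDownLatch allocateStarted = new CountDownLatch(1);
    // Stays false for the whole run: the "RM" never comes back.
    private volatile boolean rmAvailable = false;

    // Simulates the scheduling call: retries while holding the lock.
    void allocateWithRetries(int maxRetries, long sleepMs)
            throws InterruptedException {
        clientLock.lock();
        try {
            allocateStarted.countDown();
            for (int i = 0; i < maxRetries && !rmAvailable; i++) {
                Thread.sleep(sleepMs); // back off, lock is still held
            }
        } finally {
            clientLock.unlock();
        }
    }

    // Simulates unregister: needs the same lock. A timeout is used here
    // only so the demo terminates instead of hanging.
    boolean tryUnregister(long timeoutMs) throws InterruptedException {
        if (!clientLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // lock still held by the retry loop
        }
        try {
            return true;
        } finally {
            clientLock.unlock();
        }
    }

    // Returns whether unregister got the lock while the retry loop ran.
    public static boolean demo() {
        LockHeldDuringRetry client = new LockHeldDuringRetry();
        Thread heartbeat = new Thread(() -> {
            try {
                client.allocateWithRetries(20, 50); // roughly 1s of retries
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        heartbeat.setDaemon(true);
        heartbeat.start();
        try {
            client.allocateStarted.await();       // lock is now held
            boolean unregistered = client.tryUnregister(100);
            heartbeat.interrupt();                // stop the retry loop
            heartbeat.join();
            return unregistered;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unregistered while RM down: " + demo());
    }
}
```

In this sketch the unregister attempt times out and reports false. With a
plain blocking lock() and an unbounded retry loop, the unregister call would
simply wait, which matches the hang seen in the thread dump.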

Created TEZ-1541 to track this, and see what can be done by Tez to avoid
such situations.


On Wed, Sep 3, 2014 at 8:44 PM, Chris K Wensel <[email protected]> wrote:

>
> this is confirmed on 0.5.0 (from apache release mvn repo)
>
> just caused a hang by running a single test, the TezChild did linger, but
> exited
>
> https://www.dropbox.com/s/86ryr1ka93xaiph/dagapp.threads.txt?dl=0
>
> ckw
>
> On Sep 3, 2014, at 8:26 PM, Siddharth Seth <[email protected]> wrote:
>
> Chris,
> Are you on the latest version of Tez (ideally the 0.5 release, which just
> went out today)? There was an issue with hanging DAGAppMasters, which was
> resolved recently.
> Otherwise, could you please include stack traces for the hung processes?
>
> Thanks
> - Sid
>
>
> On Wed, Sep 3, 2014 at 8:05 PM, Chris K Wensel <[email protected]> wrote:
>
>>
>> After running MiniTezCluster, jps shows a few DAGApp and possibly a
>> TezChild process still hanging around.
>>
>> This is problematic on our CI servers (they start to add up), let alone
>> my dinky laptop.
>>
>> Is there a TezConfiguration setting I'm likely missing to prevent these?
>>
>> ckw
>>
>>     --
>> Chris K Wensel
>> [email protected]
>> http://concurrentinc.com
>>
>>
>
> --
> Chris K Wensel
> [email protected]
> http://concurrentinc.com
>
>
