[
https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645159#comment-16645159
]
Jason Lowe commented on YARN-7086:
----------------------------------
Thanks for developing a perf test case! The huge variations in runtime need to
be investigated. The second test case shows variations of up to 63%, including
multiple samples that are slower than the existing code's average. With this
data, I would argue the results are close to the noise range given the wild
swings in measurements. How could it sometimes be well over 50% faster? Is
the JVM hitting a large GC? System I/O? I see the test is spamming logs to
stdout in a tight loop while measuring timing -- that's not good. I could see
I/O effects dominating the runtimes. Try running this where the test produces
as little output as possible while running: no stdout printing in the tight
loop, a log4j.properties that suppresses the RM logging, etc. We need to
get the runs to be a lot more consistent; otherwise we're probably not
measuring what we think we're measuring.
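
To make the point about quiet, repeatable runs concrete, here is a minimal
sketch of a timing loop that keeps all output outside the timed region and
reports the spread between runs. It is only an illustration, not the attached
perf test: the class name TimingSketch, its methods, and the warm-up and run
counts are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical helper, not part of any YARN patch.
class TimingSketch {

    /** Runs the workload repeatedly and returns elapsed nanoseconds per measured run. */
    static long[] measure(Runnable workload, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            workload.run(); // warm-up runs are not timed (lets the JIT settle)
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            workload.run(); // nothing is printed or logged by this harness here
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Arithmetic mean of the samples, or 0.0 for an empty array. */
    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(0.0);
    }

    /** (max - min) / mean, expressed as a percentage. */
    static double spreadPercent(long[] samples) {
        long min = Arrays.stream(samples).min().orElse(0L);
        long max = Arrays.stream(samples).max().orElse(0L);
        double m = mean(samples);
        return m == 0.0 ? 0.0 : 100.0 * (max - min) / m;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real run would invoke the code under test.
        Runnable workload = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        };
        long[] samples = measure(workload, 5, 20);
        // Report once, after all timing has finished.
        System.out.printf("mean=%.0f ns, spread=%.1f%%%n",
            mean(samples), spreadPercent(samples));
    }
}
```

If the spread stays large even with output suppressed, that would point at
something other than I/O (GC, for instance) and is worth chasing separately.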
> Release all containers asynchronously
> -------------------------------------
>
> Key: YARN-7086
> URL: https://issues.apache.org/jira/browse/YARN-7086
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Arun Suresh
> Assignee: Manikandan R
> Priority: Major
> Attachments: YARN-7086.001.patch, YARN-7086.002.patch,
> YARN-7086.Perf-test-case.patch
>
>
> We have noticed in production two situations that can cause deadlocks and
> cause scheduling of new containers to come to a halt, especially with regard
> to applications that have a lot of live containers:
> # When these applications release these containers in bulk.
> # When these applications terminate abruptly due to some failure, the
> scheduler releases all its live containers in a loop.
> To handle the issues mentioned above, we have a patch in production to make
> sure ALL container releases happen asynchronously - and it has served us well.
> Opening this JIRA to gather feedback on whether this is a good idea generally
> (cc [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd]).
> BTW, in YARN-6251, we already have an asyncReleaseContainer() in the
> AbstractYarnScheduler and a corresponding scheduler event, which is currently
> used specifically for the container-update code paths (where the scheduler
> releases temp containers which it creates for the update).
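
For readers unfamiliar with the pattern the description refers to, the general
idea of releasing containers asynchronously can be sketched as follows: the
caller only enqueues one event per container and returns, and a separate
dispatcher step does the actual release work later. This is a generic
illustration, not the YARN code or the attached patch; AsyncReleaseSketch,
ReleaseEvent, and every method below are hypothetical names invented for the
example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of "enqueue now, release later"; not YARN's scheduler.
class AsyncReleaseSketch {

    /** Hypothetical event carrying the id of the container to release. */
    static final class ReleaseEvent {
        final String containerId;

        ReleaseEvent(String containerId) {
            this.containerId = containerId;
        }
    }

    private final BlockingQueue<ReleaseEvent> events = new LinkedBlockingQueue<>();
    private final List<String> released = new ArrayList<>();

    /** Caller side: queue one event per container and return immediately. */
    void releaseAllAsync(List<String> containerIds) {
        for (String id : containerIds) {
            events.add(new ReleaseEvent(id));
        }
    }

    /** Dispatcher side: handle up to maxEvents queued events; returns how many were handled. */
    int drain(int maxEvents) {
        int handled = 0;
        ReleaseEvent event;
        while (handled < maxEvents && (event = events.poll()) != null) {
            released.add(event.containerId); // stand-in for the real release work
            handled++;
        }
        return handled;
    }

    /** Number of release events still waiting in the queue. */
    int pending() {
        return events.size();
    }

    /** Ids released so far, in the order they were handled. */
    List<String> released() {
        return released;
    }
}
```

In this sketch a bulk release costs the caller only the enqueue; the release
work happens in bounded batches via drain(), which a dispatcher loop would call.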