[
https://issues.apache.org/jira/browse/YARN-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448580#comment-15448580
]
Varun Vasudev commented on YARN-5576:
-------------------------------------
Thanks for the patch [~jianhe]!
1) Can you please look at the javac errors?
2)
{code}
+ if (!container.getContainerState().equals(
+ org.apache.hadoop.yarn.server.nodemanager.
+ containermanager.container.ContainerState.RUNNING)) {
+ throw new YarnException(
+ containerId + " is at " + container.getContainerState()
+ + " state. Not able to localize new resources.");
+ }
{code}
Can we move this logic into the ContainerImpl class? Something line {code}
if(!container.canLocalizeResources()) { {code}
It puts all the logic for this stuff in one place. It looks like similar logic
is required in the ResourceLocalization class?
{code}
+ EnumSet<ContainerState> set =
+ EnumSet.of(ContainerState.LOCALIZING, ContainerState.RUNNING);
+ if (!set.contains(c.getContainerState())) {
+ LOG.warn(c.getContainerId() + " is at " + c.getContainerState()
+ + " state, do not localize resources.");
+ return;
+ }
{code}
3)
{code}
@@ -844,22 +772,21 @@ public ContainerState transition(ContainerImpl container,
ContainerResourceLocalizedEvent rsrcEvent =
(ContainerResourceLocalizedEvent) event;
{code}
{code}
- container.localizedResources.put(location, sys);
{code}
I couldn't figure out where this logic was moved to - can you please explain?
4)
{code}
+ if (new File(linkFile).exists()) {
+ LOG.info("Symlink file already exists: " + linkFile);
+ } else {
{code}
We should throw an error here or at least flag this as a failed localization?
5)
{code}
+ container.diagnostics.append(failedEvent.getDiagnosticMessage());
{code}
We need to check the diagnostics string size - if a AM sends us too many failed
requests, the diagnostics string will just balloon in size.
6)
{code}
+ if (localizer != null && localizer.killContainerLocalizer.get()) {
+ LOG.info("New " + event.getType() + " localize request for "
+ + locId + ", remove old private localizer.");
+ cleanupPrivLocalizers(locId);
+ localizer = null;
+ }
{code}
Can you explain the logic for this? I couldn't figure out why we need this.
7)
{code}
+ System.out.println("==================================");
+ System.out.println(appDir.getAbsolutePath());
+ System.out.println(appSysDir.getAbsolutePath());
+ System.out.println(containerDir.getAbsolutePath());
+ System.out.println(containerSysDir.getAbsolutePath());
+ System.out.println(targetFile.getAbsolutePath());
+ System.out.println("==================================");
{code}
Do we need these lines in the test code? Maybe move them to LOG.debug?
8)
Can you also add a check in testLocalingResourceWhileContainerRunning to make
sure we can’t localize for non-running containers?
> Core change to localize resource while container is running
> -----------------------------------------------------------
>
> Key: YARN-5576
> URL: https://issues.apache.org/jira/browse/YARN-5576
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Jian He
> Assignee: Jian He
> Attachments: YARN-5576.1.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]