[jira] [Comment Edited] (YARN-5576) Core change to localize resource while container is running

2016-09-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457446#comment-15457446
 ] 

Jian He edited comment on YARN-5576 at 9/2/16 4:08 AM:
---

Thanks for the review, Arun !
bq. We need to override the ContainerManagerImpl::localize() method in the 
QueuingContainerManagerImpl. Re-localization should not be allowed if the 
container is currently queued (not yet running)
It is not allowed, the method in ContainerManagerImpl only allows localization 
while running
bq. I only see entries added to ResourceSet::resourcesFailedToBeLocalized set. 
Shouldnt we remove these once the AM is notified of the failure ? Also, 
Shouldn't these be notified back to the AM ? or we are just relying on the 
diagnostic string sent to the AM in the GetContainerStatus response to notify 
the AM ?
The status part is not yet implemented as mentioned in the parent jira. It'll 
will be done once the requirement is clear. Earlier I was thinking these will 
be sent as part of container status.
bq. wondering if we should have another RE_LOCALIZE_CONTAINER_RESOURCE event in 
the ResourceLocalizationService to distinguish from the localization needed for 
container initialization and correspondingly send different events to the 
Container. Or maybe for the timebeing, we should just rename 
INIT_CONTAINER_RESOURCE to LOCALIZE_CONTAINER_RESOURCE.
I don't think adding new events type for doing the same thing is necessary at 
this point. This will also add additional complexity as you need to 
conditionally sends different types of events. The goal is to reuse existing 
code. I can rename it.
bq. : spurious change in the imports of ContainerImpl and BaseAMRMProxyTest
That's done by IDE auto fixing some unused imports, I edited it manually. 



was (Author: jianhe):
bq. We need to override the ContainerManagerImpl::localize() method in the 
QueuingContainerManagerImpl. Re-localization should not be allowed if the 
container is currently queued (not yet running)
It is not allowed, the method in ContainerManagerImpl only allows localization 
while running
bq. I only see entries added to ResourceSet::resourcesFailedToBeLocalized set. 
Shouldnt we remove these once the AM is notified of the failure ? Also, 
Shouldn't these be notified back to the AM ? or we are just relying on the 
diagnostic string sent to the AM in the GetContainerStatus response to notify 
the AM ?
The status part is not yet implemented as mentioned in the parent jira. It'll 
will be done once the requirement is clear. Earlier I was thinking these will 
be sent as part of container status.
bq. wondering if we should have another RE_LOCALIZE_CONTAINER_RESOURCE event in 
the ResourceLocalizationService to distinguish from the localization needed for 
container initialization and correspondingly send different events to the 
Container. Or maybe for the timebeing, we should just rename 
INIT_CONTAINER_RESOURCE to LOCALIZE_CONTAINER_RESOURCE.
I don't think adding new events type for doing the same thing is necessary at 
this point. This will also add additional complexity as you need to 
conditionally sends different types of events. The goal is to reuse existing 
code. I can rename it.
bq. : spurious change in the imports of ContainerImpl and BaseAMRMProxyTest
That's done by IDE auto fixing some unused imports, I edited it manually. 


> Core change to localize resource while container is running
> ---
>
> Key: YARN-5576
> URL: https://issues.apache.org/jira/browse/YARN-5576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5576.1.patch, YARN-5576.2.patch, YARN-5576.3.patch, 
> YARN-5576.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5576) Core change to localize resource while container is running

2016-09-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457446#comment-15457446
 ] 

Jian He edited comment on YARN-5576 at 9/2/16 4:06 AM:
---

bq. We need to override the ContainerManagerImpl::localize() method in the 
QueuingContainerManagerImpl. Re-localization should not be allowed if the 
container is currently queued (not yet running)
It is not allowed, the method in ContainerManagerImpl only allows localization 
while running
bq. I only see entries added to ResourceSet::resourcesFailedToBeLocalized set. 
Shouldnt we remove these once the AM is notified of the failure ? Also, 
Shouldn't these be notified back to the AM ? or we are just relying on the 
diagnostic string sent to the AM in the GetContainerStatus response to notify 
the AM ?
The status part is not yet implemented as mentioned in the parent jira. It'll 
will be done once the requirement is clear. Earlier I was thinking these will 
be sent as part of container status.
bq. wondering if we should have another RE_LOCALIZE_CONTAINER_RESOURCE event in 
the ResourceLocalizationService to distinguish from the localization needed for 
container initialization and correspondingly send different events to the 
Container. Or maybe for the timebeing, we should just rename 
INIT_CONTAINER_RESOURCE to LOCALIZE_CONTAINER_RESOURCE.
I don't think adding new events type for doing the same thing is necessary at 
this point. This will also add additional complexity as you need to 
conditionally sends different types of events. The goal is to reuse existing 
code. I can rename it.
bq. : spurious change in the imports of ContainerImpl and BaseAMRMProxyTest
That's done by IDE auto fixing some unused imports, I edited it manually. 



was (Author: jianhe):
bq. We need to override the ContainerManagerImpl::localize() method in the 
QueuingContainerManagerImpl. Re-localization should not be allowed if the 
container is currently queued (not yet running)
It is not allowed, the method in ContainerManagerImpl only allows localization 
while running
bq. We need to override the ContainerManagerImpl::localize() method in the 
QueuingContainerManagerImpl. Re-localization should not be allowed if the 
container is currently queued (not yet running)
bq. I only see entries added to ResourceSet::resourcesFailedToBeLocalized set. 
Shouldnt we remove these once the AM is notified of the failure ? Also, 
Shouldn't these be notified back to the AM ? or we are just relying on the 
diagnostic string sent to the AM in the GetContainerStatus response to notify 
the AM ?
The status part is not yet implemented as mentioned in the parent jira. It'll 
will be done once the requirement is clear. Earlier I was thinking these will 
be sent as part of container status.
bq. wondering if we should have another RE_LOCALIZE_CONTAINER_RESOURCE event in 
the ResourceLocalizationService to distinguish from the localization needed for 
container initialization and correspondingly send different events to the 
Container. Or maybe for the timebeing, we should just rename 
INIT_CONTAINER_RESOURCE to LOCALIZE_CONTAINER_RESOURCE.
I don't think adding new events type for doing the same thing is necessary at 
this point. This will also add additional complexity as you need to 
conditionally sends different types of events. The goal is to reuse existing 
code. I can rename it.
bq. : spurious change in the imports of ContainerImpl and BaseAMRMProxyTest
That's done by IDE auto fixing some unused imports, I edited it manually. 


> Core change to localize resource while container is running
> ---
>
> Key: YARN-5576
> URL: https://issues.apache.org/jira/browse/YARN-5576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5576.1.patch, YARN-5576.2.patch, YARN-5576.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5576) Core change to localize resource while container is running

2016-08-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449258#comment-15449258
 ] 

Jian He edited comment on YARN-5576 at 8/30/16 3:05 PM:


Thanks for the review, Varun

bq. Can we move this logic into the ContainerImpl class?
bq.  It looks like similar logic is required in the ResourceLocalization class?
The logic is different for these two places. The first one should only allow 
localizing when the container is running, while the second should allow if the 
container is either localizing or running. This logic is tailored to the 
localize API only, I feel adding canLocalizeResources method to the interface 
of Container makes it a bit confusing, as container can also localize while at 
localizing state in the normal scenario too. I prefer keep it outside as it 
should be clear enough to readers

bq. I couldn't figure out where this logic was moved to - can you please 
explain?
It's moved to ResourceSet#resourceLocalized method 
bq. We need to check the diagnostics string size - if a AM sends us too many 
failed requests, the diagnostics string will just balloon in size.
Not quite sure how to check though. doesn't seem appropriate to add a new 
config for the diagnostics size. So just skip appending the diagnostics if it 
is larger than say 2000 in length?
bq. We should throw an error here or at least flag this as a failed 
localization?
I don't think we should throw error. It's mentioned in the doc that we'll skip 
updating the symlink for now. This is a valid use case if we want to replacing 
existing resource while container is running. The changing symlink part, 
instead of being done here, will be done later when container re-launches.
bq. Can you explain the logic for this? I couldn't figure out why we need this.
It's required because, in the normal scenario, when the localization completes 
before launching the container, the private localizer thread will be 
interrupted. And when we re-localize, we need to create a new Thread object as 
the old thread is stopped. Added code comments for this
bq. Can you also add a check in testLocalingResourceWhileContainerRunning to 
make sure we can’t localize for non-running containers?
will do
bq. Do we need these lines in the test code? Maybe move them to LOG.debug?
It's useful for debugging test. Anyway, I just removed it.


was (Author: jianhe):
Thanks for the review, Varun

bq. Can we move this logic into the ContainerImpl class?
bq.  It looks like similar logic is required in the ResourceLocalization class?
The logic is different for these two places. The first one should only allow 
localizing when the container is running, while the second should allow if the 
container is either localizing or running. This logic is tailored to the 
localize API only, I feel adding canLocalizeResources method to the interface 
of Container makes it a bit confusing, as container can also localize while at 
localizing state in the normal scenario too. I prefer keep it outside as it 
should be clear enough to readers

bq. I couldn't figure out where this logic was moved to - can you please 
explain?
It's moved to ResourceSet#resourceLocalized method 
bq. We need to check the diagnostics string size - if a AM sends us too many 
failed requests, the diagnostics string will just balloon in size.
Not quite sure how to check though. doesn't seem appropriate to add a new 
config for the diagnostics size. So just skip appending the diagnostics if it 
is larger than say 2000 in length?
bq. We should throw an error here or at least flag this as a failed 
localization?
I don't think we should throw error. It's mentioned in the doc that we'll skip 
updating the symlink for now. This is a valid use case if we want to replacing 
existing resource while container is running. The changing symlink part, 
instead of being done here, will be done later when container re-launches.
bq. Can you explain the logic for this? I couldn't figure out why we need this.
It's required because, in the normal scenario, when the localization completes 
before launching the container, the private localizer thread will be 
interrupted. And when we re-localize, we need to create a new Thread object as 
the old thread is stopped. Added code comments for this
bq. Do we need these lines in the test code? Maybe move them to LOG.debug?
I just removed it.
bq. Can you also add a check in testLocalingResourceWhileContainerRunning to 
make sure we can’t localize for non-running containers?
will do
bq. Do we need these lines in the test code? Maybe move them to LOG.debug?
It's useful for debugging test. Anyway, I just removed it.

> Core change to localize resource while container is running
> ---
>
> Key: YARN-5576
> URL: https://issues.apache.org/jira/browse/YARN-5576
> Project: Hadoop