[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it

2015-08-13 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696433#comment-14696433 ] sandflee commented on YARN-3987: Thanks [~jianhe]! > am container complete msg ack to NM o

[jira] [Resolved] (YARN-4040) container complete msg should passed to AM,even if the container is released.

2015-08-13 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee resolved YARN-4040. Resolution: Not A Problem If AM release a container, the complete msg(released by AM) is stored by RMAppAtte

[jira] [Updated] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering

2015-08-13 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-4051: --- Attachment: YARN-4051.01.patch > ContainerKillEvent is lost when container is In New State and is recovering >

[jira] [Created] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering

2015-08-13 Thread sandflee (JIRA)
sandflee created YARN-4051: -- Summary: ContainerKillEvent is lost when container is In New State and is recovering Key: YARN-4051 URL: https://issues.apache.org/jira/browse/YARN-4051 Project: Hadoop YARN

[jira] [Created] (YARN-4050) NM event dispatcher may blocked by LogAggregationService if NameNode is slow

2015-08-13 Thread sandflee (JIRA)
sandflee created YARN-4050: -- Summary: NM event dispatcher may blocked by LogAggregationService if NameNode is slow Key: YARN-4050 URL: https://issues.apache.org/jira/browse/YARN-4050 Project: Hadoop YARN

[jira] [Commented] (YARN-2038) Revisit how AMs learn of containers from previous attempts

2015-08-11 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692709#comment-14692709 ] sandflee commented on YARN-2038: I thought it's the same issue to YARN-3519, but it seems n

[jira] [Created] (YARN-4040) container complete msg should passed to AM,even if the container is released.

2015-08-10 Thread sandflee (JIRA)
sandflee created YARN-4040: -- Summary: container complete msg should passed to AM,even if the container is released. Key: YARN-4040 URL: https://issues.apache.org/jira/browse/YARN-4040 Project: Hadoop YARN

[jira] [Created] (YARN-4020) Exception happens while stopContainer in AM

2015-08-05 Thread sandflee (JIRA)
sandflee created YARN-4020: -- Summary: Exception happens while stopContainer in AM Key: YARN-4020 URL: https://issues.apache.org/jira/browse/YARN-4020 Project: Hadoop YARN Issue Type: Bug

[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore

2015-08-02 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650644#comment-14650644 ] sandflee commented on YARN-4005: seems there's no need to add to recentlyStoppedContainers,

[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it

2015-07-28 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645341#comment-14645341 ] sandflee commented on YARN-3987: AM crashes before it register to RM > am container compl

[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it

2015-07-28 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645326#comment-14645326 ] sandflee commented on YARN-3987: Yes the old AM container in NM aren't cleaned up. in our c

[jira] [Updated] (YARN-3987) am container complete msg ack to NM once RM receive it

2015-07-28 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-3987: --- Attachment: YARN-3987.002.patch > am container complete msg ack to NM once RM receive it >

[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it

2015-07-28 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645136#comment-14645136 ] sandflee commented on YARN-3987: yes, we set getKeepContainersAcrossApplicationAttempts tru

[jira] [Updated] (YARN-3987) am container complete msg ack to NM once RM receive it

2015-07-28 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-3987: --- Attachment: YARN-3987.001.patch > am container complete msg ack to NM once RM receive it >

[jira] [Created] (YARN-3987) am container complete msg ack to NM once RM receive it

2015-07-28 Thread sandflee (JIRA)
sandflee created YARN-3987: -- Summary: am container complete msg ack to NM once RM receive it Key: YARN-3987 URL: https://issues.apache.org/jira/browse/YARN-3987 Project: Hadoop YARN Issue Type: Bug

[jira] [Commented] (YARN-3327) if NMClientAsync stopContainer failed because of IOException, there's no chance to stopContainer again

2015-07-14 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626528#comment-14626528 ] sandflee commented on YARN-3327: There is no logs any more, it's a long time and I just fix

[jira] [Updated] (YARN-3518) default rm/am expire interval should not less than default resourcemanager connect wait time

2015-05-28 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-3518: --- Attachment: YARN-3518.004.patch remove checkstyle warning > default rm/am expire interval should not less than

[jira] [Commented] (YARN-3668) Long run service shouldn't be killed even if Yarn crashed

2015-05-27 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560550#comment-14560550 ] sandflee commented on YARN-3668: when the AM restarts its JARs are re-downloaded from HDFS.

[jira] [Commented] (YARN-3668) Long run service shouldn't be killed even if Yarn crashed

2015-05-27 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560549#comment-14560549 ] sandflee commented on YARN-3668: when the AM restarts its JARs are re-downloaded from HDFS.

[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-05-26 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560111#comment-14560111 ] sandflee commented on YARN-3644: Thanks [~vinodkv], what my concerns is long running contai

[jira] [Updated] (YARN-3518) default rm/am expire interval should not less than default resourcemanager connect wait time

2015-05-26 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-3518: --- Attachment: YARN-3518.003.patch > default rm/am expire interval should not less than default resourcemanager >

[jira] [Commented] (YARN-3668) Long run service shouldn't be killed even if Yarn crashed

2015-05-18 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549506#comment-14549506 ] sandflee commented on YARN-3668: yes, I agree it's purely a problem of AM,but it seems a bo

[jira] [Updated] (YARN-3518) default rm/am expire interval should not less than default resourcemanager connect wait time

2015-05-18 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-3518: --- Attachment: YARN-3518.002.patch replace RESOURCEMANAGER_CONNECT_MAX_WAIT_MS with RESOURCETRACKER_RESOURCEMANAG

[jira] [Commented] (YARN-3668) Long run service shouldn't be killed even if Yarn crashed

2015-05-18 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548123#comment-14548123 ] sandflee commented on YARN-3668: thanks [~stevel], we're using our own AM not slider, and s

[jira] [Commented] (YARN-3668) Long run service shouldn't be killed even if Yarn crashed

2015-05-17 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547496#comment-14547496 ] sandflee commented on YARN-3668: I don't want the service to terminated if AM goes down, ya

[jira] [Commented] (YARN-3668) Long run service shouldn't be killed even if Yarn crashed

2015-05-17 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547434#comment-14547434 ] sandflee commented on YARN-3668: seems not enough,if AM crashed on launch because of AM's b

[jira] [Commented] (YARN-3668) Long run service shouldn't be killed even if Yarn crashed

2015-05-17 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547168#comment-14547168 ] sandflee commented on YARN-3668: If am crashed and reaches am max fail times, applications

[jira] [Commented] (YARN-3668) Long run service shouldn't be killed even if Yarn crashed

2015-05-17 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547165#comment-14547165 ] sandflee commented on YARN-3668: If all RM crashed, all running containers will be killed,

[jira] [Created] (YARN-3668) Long run service shouldn't be killed even if Yarn crashed

2015-05-17 Thread sandflee (JIRA)
sandflee created YARN-3668: -- Summary: Long run service shouldn't be killed even if Yarn crashed Key: YARN-3668 URL: https://issues.apache.org/jira/browse/YARN-3668 Project: Hadoop YARN Issue Type: W

[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-05-17 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547159#comment-14547159 ] sandflee commented on YARN-3644: In our cluster we also have to face this problem, I'd like

[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-05-17 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547155#comment-14547155 ] sandflee commented on YARN-3644: [~raju.bairishetti] thanks for your reply, If RM HA is no

[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-05-16 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546971#comment-14546971 ] sandflee commented on YARN-3644: If RM is down, NM's connection will be reset by RM machine

[jira] [Commented] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable

2015-05-07 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532755#comment-14532755 ] sandflee commented on YARN-3480: one benefit in [~hex108]'s work is we wouldn't worry about

[jira] [Commented] (YARN-3518) default rm/am expire interval should not less than default resourcemanager connect wait time

2015-05-04 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527706#comment-14527706 ] sandflee commented on YARN-3518: agree, we should set nm, am, client separately > default

[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-02 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525264#comment-14525264 ] sandflee commented on YARN-3554: Hi [~Naganarasimha] 3 mins seems dangerous, If rm fails o

[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-02 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525143#comment-14525143 ] sandflee commented on YARN-3554: set this to a bigger value maybe based on network partitio

[jira] [Resolved] (YARN-3546) AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it

2015-04-30 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee resolved YARN-3546. Resolution: Not A Problem > AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're > so

[jira] [Commented] (YARN-3546) AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it

2015-04-30 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522456#comment-14522456 ] sandflee commented on YARN-3546: ok, close it now, thanks [~jianhe] > AbstractYarnSchedule

[jira] [Commented] (YARN-3546) AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it

2015-04-29 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520838#comment-14520838 ] sandflee commented on YARN-3546: The implement of AbstractYarnScheduler.getApplicationAttem

[jira] [Commented] (YARN-3546) AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it

2015-04-29 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520819#comment-14520819 ] sandflee commented on YARN-3546: sorry for my explanation. Let's consider below situation,

[jira] [Commented] (YARN-3546) AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it

2015-04-29 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520706#comment-14520706 ] sandflee commented on YARN-3546: [~jianhe], thanks for your explanation, I still have one d

[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-25 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512769#comment-14512769 ] sandflee commented on YARN-3533: getApplicationAttempt seems confusing, I just opened htt

[jira] [Created] (YARN-3546) AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it

2015-04-25 Thread sandflee (JIRA)
sandflee created YARN-3546: -- Summary: AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it Key: YARN-3546 URL: https://issues.apache.org/jira/browse/YARN-3546 Project

[jira] [Updated] (YARN-3518) default rm/am expire interval should not less than default resourcemanager connect wait time

2015-04-25 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-3518: --- Attachment: YARN-3518.001.patch I don't know why DEFAULT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS is 15min, just s
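The constraint named in the YARN-3518 summary (the RM's AM expiry interval should not be less than the RM connect wait time) can be expressed as a small check. This is a hypothetical sketch, not code from the attached patches. It assumes the two properties involved are `yarn.am.liveness-monitor.expiry-interval-ms` and `yarn.resourcemanager.connect.max-wait.ms`, using the Hadoop 2.x defaults of 600000 ms (10 min) and 900000 ms (15 min, matching the "15min" mentioned above). Verify both values against yarn-default.xml for your release. The helper name `expiry_covers_connect_wait` is made up for illustration.

```python
# Hypothetical sanity check (not part of YARN). The property names and the
# Hadoop 2.x default values below are assumptions; check yarn-default.xml.
ASSUMED_DEFAULTS_MS = {
    "yarn.am.liveness-monitor.expiry-interval-ms": 600_000,  # 10 min
    "yarn.resourcemanager.connect.max-wait.ms": 900_000,     # 15 min
}


def expiry_covers_connect_wait(overrides: dict) -> bool:
    """Return True if the RM's AM expiry interval is not shorter than the
    maximum time a client may keep retrying its connection to the RM."""
    conf = {**ASSUMED_DEFAULTS_MS, **overrides}  # user values win over defaults
    expiry = conf["yarn.am.liveness-monitor.expiry-interval-ms"]
    connect_wait = conf["yarn.resourcemanager.connect.max-wait.ms"]
    return expiry >= connect_wait


# With the assumed defaults the constraint is violated (10 min < 15 min).
print(expiry_covers_connect_wait({}))
# Lowering the connect wait to 5 minutes satisfies it.
print(expiry_covers_connect_wait({"yarn.resourcemanager.connect.max-wait.ms": 300_000}))
```

With the assumed defaults the first call prints `False`, which is the mismatch the issue title points at. The second call prints `True`.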

[jira] [Updated] (YARN-3518) default rm/am expire interval should not less than default resourcemanager connect wait time

2015-04-25 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-3518: --- Summary: default rm/am expire interval should not less than default resourcemanager connect wait time (was: de

[jira] [Commented] (YARN-3387) Previous AM's container complete message couldn't pass to current am if am restarted and rm changed

2015-04-24 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512065#comment-14512065 ] sandflee commented on YARN-3387: Thanks He Jian and Anubhav > Previous AM's container comp

[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-22 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508476#comment-14508476 ] sandflee commented on YARN-3533: thanks for your patch, 1, waitForSchedulerAppAttemptAdded

[jira] [Commented] (YARN-2038) Revisit how AMs learn of containers from previous attempts

2015-04-22 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507266#comment-14507266 ] sandflee commented on YARN-2038: If nm register to rm in a short time, we can add a interfa

[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-04-22 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507255#comment-14507255 ] sandflee commented on YARN-3387: It seems a bug in LaunchAM in MockRM.java, in LaunchAM: 1,

[jira] [Commented] (YARN-3519) registerApplicationMaster couldn't get all running containers if rm is rebuilding container info while am is relaunched

2015-04-21 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506269#comment-14506269 ] sandflee commented on YARN-3519: not easy to fix, I'll think more > registerApplicationMas

[jira] [Resolved] (YARN-3519) registerApplicationMaster couldn't get all running containers if rm is rebuilding container info while am is relaunched

2015-04-21 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee resolved YARN-3519. Resolution: Duplicate > registerApplicationMaster couldn't get all running containers if rm is > rebuilding

[jira] [Commented] (YARN-3519) registerApplicationMaster couldn't get all running containers if rm is rebuilding container info while am is relaunched

2015-04-21 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505894#comment-14505894 ] sandflee commented on YARN-3519: yes, the same issue > registerApplicationMaster couldn't

[jira] [Created] (YARN-3519) registerApplicationMaster couldn't get all running containers if rm is rebuilding container info while am is relaunched

2015-04-21 Thread sandflee (JIRA)
sandflee created YARN-3519: -- Summary: registerApplicationMaster couldn't get all running containers if rm is rebuilding container info while am is relaunched Key: YARN-3519 URL: https://issues.apache.org/jira/browse/YAR

[jira] [Created] (YARN-3518) default rm/am expire interval should less than default resourcemanager connect wait time

2015-04-21 Thread sandflee (JIRA)
sandflee created YARN-3518: -- Summary: default rm/am expire interval should less than default resourcemanager connect wait time Key: YARN-3518 URL: https://issues.apache.org/jira/browse/YARN-3518 Project: Had

[jira] [Updated] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-04-20 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-3387: --- Attachment: YARN-3387.002.patch ut added > container complete message couldn't pass to am if am restarted and

[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-04-12 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491469#comment-14491469 ] sandflee commented on YARN-3387: Jian He, thanks for the review. Yes, they're same right now

[jira] [Updated] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-03-25 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-3387: --- Attachment: YARN-3387.001.patch share justFinishedContainers with Current appAttempt while recovering app atte

[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-03-23 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377019#comment-14377019 ] sandflee commented on YARN-3387: yes > container complete message couldn't pass to am if a

[jira] [Created] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-03-23 Thread sandflee (JIRA)
sandflee created YARN-3387: -- Summary: container complete message couldn't pass to am if am restarted and rm changed Key: YARN-3387 URL: https://issues.apache.org/jira/browse/YARN-3387 Project: Hadoop YARN

[jira] [Commented] (YARN-3328) There's no way to rebuild containers Managed by NMClientAsync If AM restart

2015-03-10 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356384#comment-14356384 ] sandflee commented on YARN-3328: Is there any necessary to keep containers info in NMClient

[jira] [Updated] (YARN-3328) There's no way to rebuild containers Managed by NMClientAsync If AM restart

2015-03-10 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-3328: --- Description: If work preserving is enabled and AM restart, AM couldn't stop containers launched by pre-am, bec

[jira] [Commented] (YARN-3329) There's no way to rebuild containers Managed by NMClientAsync If AM restart

2015-03-10 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355064#comment-14355064 ] sandflee commented on YARN-3329: the same to YARN-3328, close it > There's no way to rebui

[jira] [Resolved] (YARN-3329) There's no way to rebuild containers Managed by NMClientAsync If AM restart

2015-03-10 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee resolved YARN-3329. Resolution: Done Release Note: the same to YARN-3328, sorry for creating twice > There's no way to rebu

[jira] [Created] (YARN-3328) There's no way to rebuild containers Managed by NMClientAsync If AM restart

2015-03-10 Thread sandflee (JIRA)
sandflee created YARN-3328: -- Summary: There's no way to rebuild containers Managed by NMClientAsync If AM restart Key: YARN-3328 URL: https://issues.apache.org/jira/browse/YARN-3328 Project: Hadoop YARN

[jira] [Created] (YARN-3329) There's no way to rebuild containers Managed by NMClientAsync If AM restart

2015-03-10 Thread sandflee (JIRA)
sandflee created YARN-3329: -- Summary: There's no way to rebuild containers Managed by NMClientAsync If AM restart Key: YARN-3329 URL: https://issues.apache.org/jira/browse/YARN-3329 Project: Hadoop YARN

[jira] [Created] (YARN-3327) if NMClientAsync stopContainer failed because of IOException, there's no change to stopContainer again

2015-03-10 Thread sandflee (JIRA)
sandflee created YARN-3327: -- Summary: if NMClientAsync stopContainer failed because of IOException, there's no change to stopContainer again Key: YARN-3327 URL: https://issues.apache.org/jira/browse/YARN-3327

[jira] [Updated] (YARN-3327) if NMClientAsync stopContainer failed because of IOException, there's no chance to stopContainer again

2015-03-10 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-3327: --- Summary: if NMClientAsync stopContainer failed because of IOException, there's no chance to stopContainer agai

[jira] [Commented] (YARN-3161) Containers' information are lost in some cases when RM restart

2015-02-09 Thread sandflee (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313383#comment-14313383 ] sandflee commented on YARN-3161: if the NM machine crashes while RM restart, it seems we'll
