[
https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696433#comment-14696433
]
sandflee commented on YARN-3987:
Thanks [~jianhe]!
> am container complete msg ack to NM once RM receive it
[
https://issues.apache.org/jira/browse/YARN-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee resolved YARN-4040.
Resolution: Not A Problem
If the AM releases a container, the complete msg (released by AM) is stored by
RMAppAttempt
[
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-4051:
---
Attachment: YARN-4051.01.patch
> ContainerKillEvent is lost when container is In New State and is recovering
>
sandflee created YARN-4051:
--
Summary: ContainerKillEvent is lost when container is In New
State and is recovering
Key: YARN-4051
URL: https://issues.apache.org/jira/browse/YARN-4051
Project: Hadoop YARN
sandflee created YARN-4050:
--
Summary: NM event dispatcher may blocked by LogAggregationService
if NameNode is slow
Key: YARN-4050
URL: https://issues.apache.org/jira/browse/YARN-4050
Project: Hadoop YARN
[
https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692709#comment-14692709
]
sandflee commented on YARN-2038:
I thought it's the same issue as YARN-3519, but it seems n
sandflee created YARN-4040:
--
Summary: container complete msg should be passed to AM, even if the
container is released.
Key: YARN-4040
URL: https://issues.apache.org/jira/browse/YARN-4040
Project: Hadoop YARN
sandflee created YARN-4020:
--
Summary: Exception happens while stopContainer in AM
Key: YARN-4020
URL: https://issues.apache.org/jira/browse/YARN-4020
Project: Hadoop YARN
Issue Type: Bug
[
https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650644#comment-14650644
]
sandflee commented on YARN-4005:
seems there's no need to add to recentlyStoppedContainers,
[
https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645341#comment-14645341
]
sandflee commented on YARN-3987:
AM crashes before it registers to RM
> am container complete msg ack to NM once RM receive it
[
https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645326#comment-14645326
]
sandflee commented on YARN-3987:
Yes, the old AM containers in NM aren't cleaned up. In our c
[
https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3987:
---
Attachment: YARN-3987.002.patch
> am container complete msg ack to NM once RM receive it
>
[
https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645136#comment-14645136
]
sandflee commented on YARN-3987:
yes, we set getKeepContainersAcrossApplicationAttempts true
[
https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3987:
---
Attachment: YARN-3987.001.patch
> am container complete msg ack to NM once RM receive it
>
sandflee created YARN-3987:
--
Summary: am container complete msg ack to NM once RM receive it
Key: YARN-3987
URL: https://issues.apache.org/jira/browse/YARN-3987
Project: Hadoop YARN
Issue Type: Bug
[
https://issues.apache.org/jira/browse/YARN-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626528#comment-14626528
]
sandflee commented on YARN-3327:
There are no logs any more; it's been a long time and I just fix
[
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3518:
---
Attachment: YARN-3518.004.patch
remove checkstyle warning
> default rm/am expire interval should not less than default resourcemanager
> connect wait time
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560550#comment-14560550
]
sandflee commented on YARN-3668:
when the AM restarts its JARs are re-downloaded from HDFS.
[
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560111#comment-14560111
]
sandflee commented on YARN-3644:
Thanks [~vinodkv], my concern is long running contai
[
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3518:
---
Attachment: YARN-3518.003.patch
> default rm/am expire interval should not less than default resourcemanager
> connect wait time
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549506#comment-14549506
]
sandflee commented on YARN-3668:
yes, I agree it's purely a problem of the AM, but it seems a bo
[
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3518:
---
Attachment: YARN-3518.002.patch
replace RESOURCEMANAGER_CONNECT_MAX_WAIT_MS with
RESOURCETRACKER_RESOURCEMANAG
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548123#comment-14548123
]
sandflee commented on YARN-3668:
thanks [~stevel], we're using our own AM, not Slider, and s
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547496#comment-14547496
]
sandflee commented on YARN-3668:
I don't want the service to terminate if AM goes down, ya
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547434#comment-14547434
]
sandflee commented on YARN-3668:
seems not enough, if AM crashed on launch because of AM's b
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547168#comment-14547168
]
sandflee commented on YARN-3668:
If the AM crashes and reaches the AM max fail times, applications
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547165#comment-14547165
]
sandflee commented on YARN-3668:
If all RMs crash, all running containers will be killed,
sandflee created YARN-3668:
--
Summary: Long run service shouldn't be killed even if Yarn crashed
Key: YARN-3668
URL: https://issues.apache.org/jira/browse/YARN-3668
Project: Hadoop YARN
Issue Type: W
[
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547159#comment-14547159
]
sandflee commented on YARN-3644:
In our cluster we also have to face this problem, I'd like
[
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547155#comment-14547155
]
sandflee commented on YARN-3644:
[~raju.bairishetti] thanks for your reply, If RM HA is no
[
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546971#comment-14546971
]
sandflee commented on YARN-3644:
If RM is down, NM's connection will be reset by RM machine
[
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532755#comment-14532755
]
sandflee commented on YARN-3480:
one benefit of [~hex108]'s work is that we wouldn't worry about
[
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527706#comment-14527706
]
sandflee commented on YARN-3518:
agree, we should set nm, am, client separately
> default rm/am expire interval should not less than default resourcemanager
> connect wait time
[
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525264#comment-14525264
]
sandflee commented on YARN-3554:
Hi [~Naganarasimha], 3 mins seems dangerous. If RM fails o
[
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525143#comment-14525143
]
sandflee commented on YARN-3554:
set this to a bigger value maybe based on network partitio
[
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee resolved YARN-3546.
Resolution: Not A Problem
> AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're
> some misuse of it
[
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522456#comment-14522456
]
sandflee commented on YARN-3546:
ok, close it now, thanks [~jianhe]
> AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're
> some misuse of it
[
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520838#comment-14520838
]
sandflee commented on YARN-3546:
The implementation of AbstractYarnScheduler.getApplicationAttempt
[
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520819#comment-14520819
]
sandflee commented on YARN-3546:
sorry for my explanation. Let's consider the situation below,
[
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520706#comment-14520706
]
sandflee commented on YARN-3546:
[~jianhe], thanks for your explanation, I still have one d
[
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512769#comment-14512769
]
sandflee commented on YARN-3533:
getApplicationAttempt seems confusing, I just opened
htt
sandflee created YARN-3546:
--
Summary: AbstractYarnScheduler.getApplicationAttempt seems
misleading, and there're some misuse of it
Key: YARN-3546
URL: https://issues.apache.org/jira/browse/YARN-3546
Project: Hadoop YARN
[
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3518:
---
Attachment: YARN-3518.001.patch
I don't know why DEFAULT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS is 15min, just
s
[
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3518:
---
Summary: default rm/am expire interval should not less than default
resourcemanager connect wait time (was: default rm/am expire interval should
less than default resourcemanager connect wait time)
[
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512065#comment-14512065
]
sandflee commented on YARN-3387:
Thanks He Jian and Anubhav
> Previous AM's container comp
[
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508476#comment-14508476
]
sandflee commented on YARN-3533:
thanks for your patch,
1, waitForSchedulerAppAttemptAdded
[
https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507266#comment-14507266
]
sandflee commented on YARN-2038:
If NM registers to RM in a short time, we can add an interfa
[
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507255#comment-14507255
]
sandflee commented on YARN-3387:
It seems there's a bug in LaunchAM in MockRM.java; in LaunchAM:
1,
[
https://issues.apache.org/jira/browse/YARN-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506269#comment-14506269
]
sandflee commented on YARN-3519:
not easy to fix, I'll think more
> registerApplicationMaster couldn't get all running containers if rm is
> rebuilding container info while am is relaunched
[
https://issues.apache.org/jira/browse/YARN-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee resolved YARN-3519.
Resolution: Duplicate
> registerApplicationMaster couldn't get all running containers if rm is
> rebuilding container info while am is relaunched
[
https://issues.apache.org/jira/browse/YARN-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505894#comment-14505894
]
sandflee commented on YARN-3519:
yes, the same issue
> registerApplicationMaster couldn't get all running containers if rm is
> rebuilding container info while am is relaunched
sandflee created YARN-3519:
--
Summary: registerApplicationMaster couldn't get all running
containers if rm is rebuilding container info while am is relaunched
Key: YARN-3519
URL: https://issues.apache.org/jira/browse/YAR
sandflee created YARN-3518:
--
Summary: default rm/am expire interval should less than default
resourcemanager connect wait time
Key: YARN-3518
URL: https://issues.apache.org/jira/browse/YARN-3518
Project: Hadoop YARN
[
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3387:
---
Attachment: YARN-3387.002.patch
unit test added
> container complete message couldn't pass to am if am restarted and rm changed
[
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491469#comment-14491469
]
sandflee commented on YARN-3387:
Jian He, thanks for the review.
Yes, they're the same right now
[
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3387:
---
Attachment: YARN-3387.001.patch
share justFinishedContainers with the current appAttempt while recovering app
attempt
[
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377019#comment-14377019
]
sandflee commented on YARN-3387:
yes
> container complete message couldn't pass to am if am restarted and rm changed
sandflee created YARN-3387:
--
Summary: container complete message couldn't pass to am if am
restarted and rm changed
Key: YARN-3387
URL: https://issues.apache.org/jira/browse/YARN-3387
Project: Hadoop YARN
[
https://issues.apache.org/jira/browse/YARN-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356384#comment-14356384
]
sandflee commented on YARN-3328:
Is it necessary to keep container info in NMClient
[
https://issues.apache.org/jira/browse/YARN-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3328:
---
Description:
If work preserving is enabled and the AM restarts, the AM couldn't stop
containers launched by the previous AM, bec
[
https://issues.apache.org/jira/browse/YARN-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355064#comment-14355064
]
sandflee commented on YARN-3329:
the same as YARN-3328, closing it
> There's no way to rebuild containers Managed by NMClientAsync If AM restart
[
https://issues.apache.org/jira/browse/YARN-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee resolved YARN-3329.
Resolution: Done
Release Note: the same as YARN-3328, sorry for creating it twice
> There's no way to rebuild containers Managed by NMClientAsync If AM restart
sandflee created YARN-3328:
--
Summary: There's no way to rebuild containers Managed by
NMClientAsync If AM restart
Key: YARN-3328
URL: https://issues.apache.org/jira/browse/YARN-3328
Project: Hadoop YARN
sandflee created YARN-3329:
--
Summary: There's no way to rebuild containers Managed by
NMClientAsync If AM restart
Key: YARN-3329
URL: https://issues.apache.org/jira/browse/YARN-3329
Project: Hadoop YARN
sandflee created YARN-3327:
--
Summary: if NMClientAsync stopContainer failed because of
IOException, there's no change to stopContainer again
Key: YARN-3327
URL: https://issues.apache.org/jira/browse/YARN-3327
[
https://issues.apache.org/jira/browse/YARN-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3327:
---
Summary: if NMClientAsync stopContainer failed because of IOException,
there's no chance to stopContainer again (was: if NMClientAsync stopContainer
failed because of IOException, there's no change to stopContainer again)
[
https://issues.apache.org/jira/browse/YARN-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313383#comment-14313383
]
sandflee commented on YARN-3161:
if the NM machine crashes while RM restarts, it seems we'll