[
https://issues.apache.org/jira/browse/YARN-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313383#comment-14313383
]
sandflee commented on YARN-3161:
if the NM machine crashes while the RM restarts, it seems we'll
[
https://issues.apache.org/jira/browse/YARN-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3327:
---
Summary: if NMClientAsync stopContainer failed because of IOException,
there's no chance to stopContainer
sandflee created YARN-3327:
--
Summary: if NMClientAsync stopContainer failed because of
IOException, there's no chance to stopContainer again
Key: YARN-3327
URL: https://issues.apache.org/jira/browse/YARN-3327
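The failure mode YARN-3327 describes (a stopContainer that fails on an IOException with no second chance) can be sketched as a bounded retry loop. This is an illustrative stand-in, not the actual NMClientAsync API; the `ContainerStopper` interface and method names are assumptions for the sketch.

```java
import java.io.IOException;

// Hypothetical sketch of the retry behavior YARN-3327 asks for: if a
// stopContainer call fails with an IOException, retry a bounded number
// of times instead of giving up after the first attempt.
class StopContainerRetry {
    // Illustrative interface, not the real NMClientAsync API.
    interface ContainerStopper {
        void stopContainer(String containerId) throws IOException;
    }

    // Returns the attempt number that succeeded, or -1 if all failed.
    static int stopWithRetry(ContainerStopper stopper, String containerId,
                             int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                stopper.stopContainer(containerId);
                return attempt;              // success on this attempt
            } catch (IOException e) {
                // transient failure: fall through and retry
            }
        }
        return -1;                           // all attempts failed
    }
}
```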
[
https://issues.apache.org/jira/browse/YARN-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355064#comment-14355064
]
sandflee commented on YARN-3329:
the same as YARN-3328, close it
There's no way to
[
https://issues.apache.org/jira/browse/YARN-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3328:
---
Description:
If work preserving is enabled and the AM restarts, the AM couldn't stop
containers launched by the previous AM,
[
https://issues.apache.org/jira/browse/YARN-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356384#comment-14356384
]
sandflee commented on YARN-3328:
Is it necessary to keep container info in
sandflee created YARN-3328:
--
Summary: There's no way to rebuild containers managed by
NMClientAsync if AM restarts
Key: YARN-3328
URL: https://issues.apache.org/jira/browse/YARN-3328
Project: Hadoop YARN
sandflee created YARN-3329:
--
Summary: There's no way to rebuild containers managed by
NMClientAsync if AM restarts
Key: YARN-3329
URL: https://issues.apache.org/jira/browse/YARN-3329
Project: Hadoop YARN
[
https://issues.apache.org/jira/browse/YARN-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee resolved YARN-3329.
Resolution: Done
Release Note: the same as YARN-3328, sorry for creating it twice
There's no way to
sandflee created YARN-3387:
--
Summary: container complete message couldn't be passed to AM if AM
restarted and RM changed
Key: YARN-3387
URL: https://issues.apache.org/jira/browse/YARN-3387
Project: Hadoop YARN
[
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377019#comment-14377019
]
sandflee commented on YARN-3387:
yes
container complete message couldn't pass to am if am
[
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3387:
---
Attachment: YARN-3387.002.patch
unit test added
container complete message couldn't pass to am if am restarted and
sandflee created YARN-3519:
--
Summary: registerApplicationMaster couldn't get all running
containers if rm is rebuilding container info while am is relaunched
Key: YARN-3519
URL:
sandflee created YARN-3518:
--
Summary: default rm/am expire interval should less than default
resourcemanager connect wait time
Key: YARN-3518
URL: https://issues.apache.org/jira/browse/YARN-3518
Project:
sandflee created YARN-3546:
--
Summary: AbstractYarnScheduler.getApplicationAttempt seems
misleading, and there are some misuses of it
Key: YARN-3546
URL: https://issues.apache.org/jira/browse/YARN-3546
[
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512769#comment-14512769
]
sandflee commented on YARN-3533:
getApplicationAttempt seems confusing, I just opened
[
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507255#comment-14507255
]
sandflee commented on YARN-3387:
It seems there's a bug in launchAM in MockRM.java; in launchAM:
1,
[
https://issues.apache.org/jira/browse/YARN-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505894#comment-14505894
]
sandflee commented on YARN-3519:
yes, the same issue
registerApplicationMaster couldn't
[
https://issues.apache.org/jira/browse/YARN-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee resolved YARN-3519.
Resolution: Duplicate
registerApplicationMaster couldn't get all running containers if rm is
rebuilding
[
https://issues.apache.org/jira/browse/YARN-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506269#comment-14506269
]
sandflee commented on YARN-3519:
not easy to fix, I'll think more
[
https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507266#comment-14507266
]
sandflee commented on YARN-2038:
If NMs register to the RM within a short time, we can add a
[
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512065#comment-14512065
]
sandflee commented on YARN-3387:
Thanks He Jian and Anubhav
Previous AM's container
[
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3518:
---
Attachment: YARN-3518.001.patch
I don't know why DEFAULT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS is 15min, just
[
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3518:
---
Summary: default rm/am expire interval should not be less than default
resourcemanager connect wait time (was:
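The mismatch this rename describes can be seen by comparing the two defaults in yarn-default.xml. The property names are real YARN configuration keys; the values below are the 2.x-era defaults discussed in this thread (15 minutes vs. 10 minutes), so verify them against your release.

```xml
<!-- yarn-default.xml excerpt (2.x-era defaults; verify against your release) -->
<property>
  <!-- How long clients/AMs keep retrying a down RM: 15 minutes -->
  <name>yarn.resourcemanager.connect.max-wait.ms</name>
  <value>900000</value>
</property>
<property>
  <!-- RM expires an AM that hasn't heartbeated for 10 minutes -->
  <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
  <value>600000</value>
</property>
```

With these defaults an AM can still be retrying the RM connection after the RM has already expired it, which is the inversion the summary change calls out.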
[
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508476#comment-14508476
]
sandflee commented on YARN-3533:
thanks for your patch,
1, waitForSchedulerAppAttemptAdded
[
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525143#comment-14525143
]
sandflee commented on YARN-3554:
set this to a bigger value maybe based on network
[
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525264#comment-14525264
]
sandflee commented on YARN-3554:
Hi [~Naganarasimha], 3 mins seems dangerous. If the RM fails
[
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522456#comment-14522456
]
sandflee commented on YARN-3546:
ok, close it now, thanks [~jianhe]
[
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee resolved YARN-3546.
Resolution: Not A Problem
AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're
[
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520706#comment-14520706
]
sandflee commented on YARN-3546:
[~jianhe], thanks for your explanation, I still have one
[
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520819#comment-14520819
]
sandflee commented on YARN-3546:
sorry for my unclear explanation. Let's consider the situation below:
[
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520838#comment-14520838
]
sandflee commented on YARN-3546:
The implementation of
[
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527706#comment-14527706
]
sandflee commented on YARN-3518:
agree, we should set nm, am, client separately
default
[
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532755#comment-14532755
]
sandflee commented on YARN-3480:
one benefit of [~hex108]'s work is that we wouldn't need to worry about
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548123#comment-14548123
]
sandflee commented on YARN-3668:
thanks [~stevel], we're using our own AM, not Slider, and
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547434#comment-14547434
]
sandflee commented on YARN-3668:
seems not enough; if the AM crashed on launch because of the AM's
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547496#comment-14547496
]
sandflee commented on YARN-3668:
I don't want the service to be terminated if the AM goes down,
[
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3518:
---
Attachment: YARN-3518.002.patch
replace RESOURCEMANAGER_CONNECT_MAX_WAIT_MS with
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549506#comment-14549506
]
sandflee commented on YARN-3668:
yes, I agree it's purely a problem of the AM, but it seems a
[
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14491469#comment-14491469
]
sandflee commented on YARN-3387:
Jian He, thanks for the review.
Yes, they're the same, right
[
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546971#comment-14546971
]
sandflee commented on YARN-3644:
If RM is down, NM's connection will be reset by RM
[
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547155#comment-14547155
]
sandflee commented on YARN-3644:
[~raju.bairishetti] thanks for your reply. If RM HA is
[
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547159#comment-14547159
]
sandflee commented on YARN-3644:
In our cluster we also face this problem; I'd like
sandflee created YARN-3668:
--
Summary: Long-running service shouldn't be killed even if YARN crashed
Key: YARN-3668
URL: https://issues.apache.org/jira/browse/YARN-3668
Project: Hadoop YARN
Issue Type:
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547165#comment-14547165
]
sandflee commented on YARN-3668:
If all RMs crashed, all running containers would be killed,
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547168#comment-14547168
]
sandflee commented on YARN-3668:
If the AM crashed and reaches the AM max-failures limit, applications
[
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3518:
---
Attachment: YARN-3518.004.patch
remove checkstyle warning
default rm/am expire interval should not be less than
[
https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560549#comment-14560549
]
sandflee commented on YARN-3668:
when the AM restarts its JARs are re-downloaded from HDFS.
[
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3518:
---
Attachment: YARN-3518.003.patch
default rm/am expire interval should not be less than default resourcemanager
[
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560111#comment-14560111
]
sandflee commented on YARN-3644:
Thanks [~vinodkv], my concern is long-running
[
https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692709#comment-14692709
]
sandflee commented on YARN-2038:
I thought it was the same issue as YARN-3519, but it seems
[
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-4051:
---
Attachment: YARN-4051.01.patch
ContainerKillEvent is lost when container is In New State and is recovering
sandflee created YARN-4050:
--
Summary: NM event dispatcher may be blocked by LogAggregationService
if NameNode is slow
Key: YARN-4050
URL: https://issues.apache.org/jira/browse/YARN-4050
Project: Hadoop YARN
sandflee created YARN-4051:
--
Summary: ContainerKillEvent is lost when container is In New
State and is recovering
Key: YARN-4051
URL: https://issues.apache.org/jira/browse/YARN-4051
Project: Hadoop YARN
[
https://issues.apache.org/jira/browse/YARN-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee resolved YARN-4040.
Resolution: Not A Problem
If the AM releases a container, the complete msg (released by AM) is stored by
[
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-4051:
---
Attachment: YARN-4051.03.patch
pend the kill event while the container is recovered, and just act like
[
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699745#comment-14699745
]
sandflee commented on YARN-4051:
if recovered as REQUESTED, try to cleanup container
[
https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645136#comment-14645136
]
sandflee commented on YARN-3987:
yes, we set getKeepContainersAcrossApplicationAttempts
[
https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645341#comment-14645341
]
sandflee commented on YARN-3987:
AM crashes before it registers to the RM
am container
[
https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3987:
---
Attachment: YARN-3987.002.patch
am container complete msg ack to NM once RM receives it
[
https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650644#comment-14650644
]
sandflee commented on YARN-4005:
seems there's no need to add to recentlyStoppedContainers,
sandflee created YARN-3987:
--
Summary: am container complete msg ack to NM once RM receives it
Key: YARN-3987
URL: https://issues.apache.org/jira/browse/YARN-3987
Project: Hadoop YARN
Issue Type: Bug
[
https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-3987:
---
Attachment: YARN-3987.001.patch
am container complete msg ack to NM once RM receives it
[
https://issues.apache.org/jira/browse/YARN-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee reassigned YARN-4050:
--
Assignee: sandflee
NM event dispatcher may be blocked by LogAggregationService if NameNode is slow
[
https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696433#comment-14696433
]
sandflee commented on YARN-3987:
Thanks [~jianhe]!
am container complete msg ack to NM
[
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-4051:
---
Attachment: YARN-4051.02.patch
fix check style errors
ContainerKillEvent is lost when container is In New
sandflee created YARN-4040:
--
Summary: container complete msg should be passed to AM, even if the
container is released.
Key: YARN-4040
URL: https://issues.apache.org/jira/browse/YARN-4040
Project: Hadoop YARN
sandflee created YARN-4020:
--
Summary: Exception happens while stopContainer in AM
Key: YARN-4020
URL: https://issues.apache.org/jira/browse/YARN-4020
Project: Hadoop YARN
Issue Type: Bug
[
https://issues.apache.org/jira/browse/YARN-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626528#comment-14626528
]
sandflee commented on YARN-3327:
There are no logs any more; it's been a long time and I just fixed
[
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-4051:
---
Attachment: YARN-4051.04.patch
NM registers to RM after all containers are recovered by default, and the user could
[
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001725#comment-15001725
]
sandflee commented on YARN-4051:
thanks [~jlowe]
Should the value be infinite by default? The concern is
[
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-4051:
---
Attachment: YARN-4051.05.patch
set default timeout to 2min, since default nm expire timeout is 10min
[
https://issues.apache.org/jira/browse/YARN-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001734#comment-15001734
]
sandflee commented on YARN-4050:
There may be 2 problems:
1, the NM dispatcher may be blocked by log aggregation
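The first problem above (the dispatcher thread stalling on slow NameNode I/O) is usually avoided by handing the blocking work to a dedicated executor so the dispatcher keeps draining its queue. A minimal sketch of that pattern follows; the class and method names are illustrative, not the actual LogAggregationService code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the decoupling suggested for YARN-4050: do the potentially
// slow HDFS setup on a worker pool instead of the NM event-dispatcher
// thread, so a slow NameNode cannot block event dispatch.
class NonBlockingDispatch {
    private final ExecutorService logAggregationPool =
        Executors.newFixedThreadPool(4);

    // Called from the dispatcher thread; returns immediately after
    // queueing the blocking work.
    Future<?> handleAppLogInit(String appId, Runnable slowHdfsSetup) {
        return logAggregationPool.submit(slowHdfsSetup);
    }

    void shutdown() {
        logAggregationPool.shutdown();
    }
}
```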
[
https://issues.apache.org/jira/browse/YARN-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-4050:
---
Assignee: (was: sandflee)
> NM event dispatcher may be blocked by LogAggregationService if NameNode is slow
[
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996345#comment-14996345
]
sandflee commented on YARN-4051:
Is it possible for the finish application or complete container requests
[
https://issues.apache.org/jira/browse/YARN-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984590#comment-14984590
]
sandflee commented on YARN-4020:
seems the new master key is synced to the NM but not to the AM. I'll try to fix it.
[
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984580#comment-14984580
]
sandflee commented on YARN-4051:
Thanks Jason, sorry, I just noticed your reply.
It's more reasonable
sandflee created YARN-4277:
--
Summary: containers would be leaked if NM crashed and RM failed over
Key: YARN-4277
URL: https://issues.apache.org/jira/browse/YARN-4277
Project: Hadoop YARN
Issue Type:
[
https://issues.apache.org/jira/browse/YARN-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964388#comment-14964388
]
sandflee commented on YARN-4277:
thanks [~jlowe] , you have explained very clearly.
> containers would be
[
https://issues.apache.org/jira/browse/YARN-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964387#comment-14964387
]
sandflee commented on YARN-4277:
yes, this is a problem in our cluster; our NM hangs for a long time because
[
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710716#comment-14710716
]
sandflee commented on YARN-4051:
could anyone help review it?
ContainerKillEvent is
sandflee created YARN-4426:
--
Summary: unhealthy disk makes NM LOST
Key: YARN-4426
URL: https://issues.apache.org/jira/browse/YARN-4426
Project: Hadoop YARN
Issue Type: Bug
Reporter:
[
https://issues.apache.org/jira/browse/YARN-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee resolved YARN-4426.
Resolution: Duplicate
> unhealthy disk makes NM LOST
[
https://issues.apache.org/jira/browse/YARN-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046203#comment-15046203
]
sandflee commented on YARN-4426:
Thanks [~suda], they are caused by a hung mkdir
> unhealthy disk makes NM
[
https://issues.apache.org/jira/browse/YARN-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046252#comment-15046252
]
sandflee commented on YARN-4301:
it may change the behaviour of NM_MIN_HEALTHY_DISKS_FRACTION; could we
[
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061577#comment-15061577
]
sandflee commented on YARN-1197:
seems complicated for the AM to do this, especially since we added disk and network to
[
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061347#comment-15061347
]
sandflee commented on YARN-1197:
user applications (long-running) are running on our YARN platform; they
[
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057229#comment-15057229
]
sandflee commented on YARN-4138:
got it, thanks for your explanation!
> Roll back container resource
[
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055330#comment-15055330
]
sandflee commented on YARN-4138:
Hi [~mding], consider this situation:
1) AM sends an increase request to
[
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063576#comment-15063576
]
sandflee commented on YARN-4138:
{quote}
We should not update lastConfirmedResource in this scenario. This
[
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052468#comment-15052468
]
sandflee commented on YARN-4138:
if the AM increases container size successfully in the NM, but the resource increase
[
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059332#comment-15059332
]
sandflee commented on YARN-1197:
seems it doesn't support increasing memory and decreasing CPU cores at the same time?
[
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059362#comment-15059362
]
sandflee commented on YARN-1197:
got it, thanks [~leftnoteasy]!
> Support changing resources of an
[
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059777#comment-15059777
]
sandflee commented on YARN-4138:
1, use Resources.fitsIn(targetResource, lastConfirmedResource)?
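The `Resources.fitsIn` helper referenced above lives in YARN's `org.apache.hadoop.yarn.util.resource` package; its semantics are that the first resource fits within the second in every dimension. The standalone sketch below mirrors that behavior without the Hadoop dependencies (the `Resource` class here is a minimal stand-in, not YARN's).

```java
// Dependency-free sketch of Resources.fitsIn(smaller, bigger):
// true iff every dimension of `smaller` is within `bigger`.
class FitsIn {
    // Minimal stand-in for YARN's Resource: memory in MB plus vcores.
    static class Resource {
        final long memoryMb;
        final int vcores;
        Resource(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    static boolean fitsIn(Resource smaller, Resource bigger) {
        return smaller.memoryMb <= bigger.memoryMb
            && smaller.vcores <= bigger.vcores;
    }
}
```

In the YARN-4138 discussion this check would guard the rollback: only roll back to `lastConfirmedResource` when the target actually fits within it.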
[
https://issues.apache.org/jira/browse/YARN-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-4520:
---
Description:
once we restart the nodemanager we see many logs like:
2015-12-28 11:59:18,725 WARN
[
https://issues.apache.org/jira/browse/YARN-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sandflee updated YARN-4528:
---
Attachment: YARN-4528.01.patch
1, pend container decrease msg until next heartbeat.
2, nodemanager#allocate
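The first point of the patch (pend container-decrease messages until the next heartbeat, then deliver them as a batch) can be sketched as a small drain-on-heartbeat queue. Class and method names below are illustrative, not the actual YARN-4528 patch code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the YARN-4528 idea: decrease messages are queued rather
// than pushed immediately, and the heartbeat handler drains the batch.
class PendingDecreaseQueue {
    private final List<String> pending = new ArrayList<>();

    // Called when a decrease is decided; held until the next heartbeat.
    synchronized void pendDecrease(String containerId) {
        pending.add(containerId);
    }

    // Called by the heartbeat handler: returns and clears the batch.
    synchronized List<String> drainOnHeartbeat() {
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}
```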