[jira] [Commented] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart

2015-05-18 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548239#comment-14548239
 ] 

Siqi Li commented on YARN-3468:
---

Agreed. setting yarn.nodemanager.delete.debug-delay-sec to 10 minute is a bad 
idea in a production cluster.
I will mark this jira as won't fix

 NM should not blindly rename usercache/filecache/nmPrivate on restart
 -

 Key: YARN-3468
 URL: https://issues.apache.org/jira/browse/YARN-3468
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-3468.v1.patch, YARN-3468.v2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart

2015-04-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514880#comment-14514880
 ] 

Sangjin Lee commented on YARN-3468:
---

It seems the test failures are related to the patch. [~l201514], could you 
please look into them?

 NM should not blindly rename usercache/filecache/nmPrivate on restart
 -

 Key: YARN-3468
 URL: https://issues.apache.org/jira/browse/YARN-3468
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-3468.v1.patch, YARN-3468.v2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart

2015-04-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514929#comment-14514929
 ] 

Vinod Kumar Vavilapalli commented on YARN-3468:
---

What do you mean by NM flapping?

You shouldn't really be setting yarn.nodemanager.delete.debug-delay-sec to 10 
minute in a production cluster.

Getting this patch in is okay though. Needs a testcase?

 NM should not blindly rename usercache/filecache/nmPrivate on restart
 -

 Key: YARN-3468
 URL: https://issues.apache.org/jira/browse/YARN-3468
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-3468.v1.patch, YARN-3468.v2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart

2015-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505561#comment-14505561
 ] 

Hadoop QA commented on YARN-3468:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726942/YARN-3468.v2.patch
  against trunk revision 424a00d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7430//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7430//console

This message is automatically generated.

 NM should not blindly rename usercache/filecache/nmPrivate on restart
 -

 Key: YARN-3468
 URL: https://issues.apache.org/jira/browse/YARN-3468
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-3468.v1.patch, YARN-3468.v2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart

2015-04-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504093#comment-14504093
 ] 

Sangjin Lee commented on YARN-3468:
---

Sorry this fell off my radar.

{code}
1094  RemoteIteratorFileStatus subDirStatus = 
lfs.listStatus(subDirPath);
1095  if (subDirStatus != null  !subDirStatus.hasNext()) {
1096// Skip the renaming when the given localSubDir is empty
1097return;
1098  }
1099  lfs.rename(subDirPath, new Path(
{code}

Wouldn't it be bit safer to do

{code}
1094  RemoteIteratorFileStatus subDirStatus = 
lfs.listStatus(subDirPath);
1095  if (subDirStatus != null  subDirStatus.hasNext()) {
1096lfs.rename(subDirPath, new Path(
{code}
? Also, I normally try to avoid returning in the middle of a method as it can 
be error-prone (not necessarily here, but as a general rule).


 NM should not blindly rename usercache/filecache/nmPrivate on restart
 -

 Key: YARN-3468
 URL: https://issues.apache.org/jira/browse/YARN-3468
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-3468.v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart

2015-04-20 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504151#comment-14504151
 ] 

Siqi Li commented on YARN-3468:
---

Got it. I will modify the patch as you suggested

 NM should not blindly rename usercache/filecache/nmPrivate on restart
 -

 Key: YARN-3468
 URL: https://issues.apache.org/jira/browse/YARN-3468
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-3468.v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart

2015-04-09 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487979#comment-14487979
 ] 

Siqi Li commented on YARN-3468:
---

Can anyone have some comments on this jira?

 NM should not blindly rename usercache/filecache/nmPrivate on restart
 -

 Key: YARN-3468
 URL: https://issues.apache.org/jira/browse/YARN-3468
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-3468.v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart

2015-04-08 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486285#comment-14486285
 ] 

Siqi Li commented on YARN-3468:
---

The renaming is causing a enormous number of _DEL_timestamp to be created, 
when a NM is flapping and yarn.nodemanager.delete.debug-delay-sec is set to 
several minutes.

This leads to NM full GC problem on starting up.

Here is how things go south, on NM start up it will do 3 renames, and 3 mkdir

usercache -- usercache_DEL_timestamp
filecache   -- filecache_DEL_timestamp
nmPrivate -- nmPrivate_DEL_timestamp

mkdir usercache
mkdir filecache
mkdir nmPrivate

then it will scan all DEL dirs and add them to the deletion service. However, 
with yarn.nodemanager.delete.debug-delay-sec set to 10 minute for example, 
those DEL dirs will not be deleted immediately, but after 10 minutes.

When a NM flaps, it restarts every ~4 sec. There is not enough time(i.e. 10 
minutes) for it to do the deletion. That's why more and more DEL dirs are 
created, but none were removed.

As a result, /data/disk*/yarn/local/ will have hundreds of thousands of DEL 
dirs, and scanning them and adding them to deletion service will blow out the 
memory.

My suggestion is to prevent NMs from renaming those directories if they are 
empty.

 NM should not blindly rename usercache/filecache/nmPrivate on restart
 -

 Key: YARN-3468
 URL: https://issues.apache.org/jira/browse/YARN-3468
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart

2015-04-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486380#comment-14486380
 ] 

Hadoop QA commented on YARN-3468:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12724051/YARN-3468.v1.patch
  against trunk revision 5a540c3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7265//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7265//console

This message is automatically generated.

 NM should not blindly rename usercache/filecache/nmPrivate on restart
 -

 Key: YARN-3468
 URL: https://issues.apache.org/jira/browse/YARN-3468
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-3468.v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)