date:20150610


[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580378#comment-14580378
 ] 

Hadoop QA commented on YARN-3044:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 29s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 56s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 42s | The applied patch generated  1 
new checkstyle issues (total was 236, now 236). |
| {color:green}+1{color} | whitespace |   0m  6s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 42s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 46s | The patch appears to introduce 8 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 22s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   6m 55s | Tests failed in 
hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |  61m 37s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests |   1m 15s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | | 118m  6s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-applications-distributedshell |
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Failed unit tests | 
hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels |
|   | hadoop.yarn.applications.distributedshell.TestDistributedShell |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738768/YARN-3044-YARN-2928.011.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 0a3c147 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8233/artifact/patchprocess/YARN-2928FindbugsWarningshadoop-yarn-server-common.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8233/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8233/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8233/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8233/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8233/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8233/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8233/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8233/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8233/console |


This message was automatically generated.

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components:

[jira] [Assigned] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails in trunk for FS scheduler


 [ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu reassigned YARN-3790:
---

Assignee: zhihai xu

 TestWorkPreservingRMRestart#testSchedulerRecovery fails in trunk for FS 
 scheduler
 -

 Key: YARN-3790
 URL: https://issues.apache.org/jira/browse/YARN-3790
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Rohith
Assignee: zhihai xu

 Failure trace is as follows
 {noformat}
 Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 6.502 sec   FAILURE!
 java.lang.AssertionError: expected:6144 but was:8192
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS


 [ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3044:

Attachment: YARN-3044-YARN-2928.011.patch

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
 YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
 YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
 YARN-3044-YARN-2928.011.patch, YARN-3044.20150325-1.patch, 
 YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk for FS scheduler

2015-06-10 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580255#comment-14580255
 ] 

Rohith commented on YARN-3790:
--

Thanks for looking into this issue,
bq. If UpdateThread call update after recoverContainersOnNode, the test will 
succeed
In the test, I see below code which verify for contaner to recover right?
{code}
// Wait for RM to settle down on recovering containers;
waitForNumContainersToRecover(2, rm2, am1.getApplicationAttemptId());
SetContainerId launchedContainers =
((RMNodeImpl) rm2.getRMContext().getRMNodes().get(nm1.getNodeId()))
  .getLaunchedContainers();
assertTrue(launchedContainers.contains(amContainer.getContainerId()));
assertTrue(launchedContainers.contains(runningContainer.getContainerId()));
{code}

Am I missing anything?

 TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in 
 trunk for FS scheduler
 

 Key: YARN-3790
 URL: https://issues.apache.org/jira/browse/YARN-3790
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Rohith
Assignee: zhihai xu

 Failure trace is as follows
 {noformat}
 Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 6.502 sec   FAILURE!
 java.lang.AssertionError: expected:6144 but was:8192
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails in trunk for FS scheduler


[ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580209#comment-14580209
 ] 

zhihai xu commented on YARN-3790:
-

Hi [~rohithsharma], thanks for reporting this issue. I think this test fails 
intermittently.
The following is stack trace for the test failure:
{code}
java.lang.AssertionError: expected:6144 but was:8192
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:852)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:341)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:240)
{code}
The failure is {{rootMetrics}}'s available resource is not correct for 
FairScheduler.
I know what cause this test failure.
For FairScheduler, {{updateRootQueueMetrics}} is used to update 
{{rootMetrics}}'s available resource.
But {{updateRootQueueMetrics}} is not called in/after 
{{recoverContainersOnNode}}, in this case, we can only depend UpdateThread to 
update {{rootMetrics}}'s available resource. Currently UpdateThread will be 
triggered in {{addNode}}. The timing in UpdateThread will decide whether this 
test will succeed or not. If UpdateThread call {{update}} after 
{{recoverContainersOnNode}}, the test will succeed. If UpdateThread call 
{{update}} before {{recoverContainersOnNode}}, the test will fail.

 TestWorkPreservingRMRestart#testSchedulerRecovery fails in trunk for FS 
 scheduler
 -

 Key: YARN-3790
 URL: https://issues.apache.org/jira/browse/YARN-3790
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Rohith
Assignee: zhihai xu

 Failure trace is as follows
 {noformat}
 Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 6.502 sec   FAILURE!
 java.lang.AssertionError: expected:6144 but was:8192
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2194) Cgroups cease to work in RHEL7

2015-06-10 Thread Wei Yan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2194:
--
Attachment: YARN-2194-4.patch

Uploaded a patch by replacing comma with '%'.

 Cgroups cease to work in RHEL7
 --

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Critical
 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
 YARN-2194-4.patch


 In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the 
 controller name leads to container launch failure. 
 RHEL7 deprecates libcgroup and recommends the user of systemd. However, 
 systemd has certain shortcomings as identified in this JIRA (see comments). 
 This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7


[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580275#comment-14580275
 ] 

Hadoop QA commented on YARN-2194:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 56s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 37s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 42s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 36s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 13s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m  6s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  43m 44s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738765/YARN-2194-4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6785661 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8232/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8232/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8232/console |


This message was automatically generated.

 Cgroups cease to work in RHEL7
 --

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Critical
 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
 YARN-2194-4.patch


 In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the 
 controller name leads to container launch failure. 
 RHEL7 deprecates libcgroup and recommends the user of systemd. However, 
 systemd has certain shortcomings as identified in this JIRA (see comments). 
 This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (YARN-3791) FSDownload

2015-06-10 Thread HuanWang (JIRA)

HuanWang created YARN-3791:
--

 Summary: FSDownload
 Key: YARN-3791
 URL: https://issues.apache.org/jira/browse/YARN-3791
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
 Environment: Linux 2.6.32-279.el6.x86_64 
Reporter: HuanWang


Inadvertently,we set two source ftp path:

 { { ftp://10.27.178.207:21/home/cbt/1213/jxf.sql, 143322551, FILE, null 
},pending,[(container_20150608111420_41540_1213_1503_)],4237640867118938,DOWNLOADING}

ftp://10.27.89.13:21/home/cbt/common/2/sql.jar, 1433225415000, FILE, null 
},pending,[(container_20150608111420_41540_1213_1503_)],4237640866988089,DOWNLOADING}

the first one is a wrong path,only one source was set this;but Follow the log,i 
saw Starting from the first path source download,All next jobs sources were 
downloaded from  ftp://10.27.178.207 by default.


the log is :

code
2015-06-09 11:14:34,653 INFO  [AsyncDispatcher event handler] 
localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:addResource(544)) - Downloading public rsrc:{ 
ftp://10.27.89.13:21/home/cbt/common/2/sql.jar, 1433225415000, FILE, null }
2015-06-09 11:14:34,653 INFO  [AsyncDispatcher event handler] 
localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:addResource(544)) - Downloading public rsrc:{ 
ftp://10.27.178.207:21/home/cbt/1213/jxf.sql, 143322551, FILE, null }
2015-06-09 11:14:37,883 INFO  [Public Localizer] 
localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:run(672)) - Failed to download rsrc { { 
ftp://10.27.178.207:21/home/cbt/1213/jxf.sql, 143322551, FILE, null 
},pending,[(container_20150608111420_41540_1213_1503_)],4237640867118938,DOWNLOADING}
java.io.IOException: Login failed on server - 10.27.178.207, port - 21
at 
org.apache.hadoop.fs.ftp.FTPFileSystem.connect(FTPFileSystem.java:133)
at 
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:390)
at 
com.suning.cybertron.superion.util.FSDownload.copy(FSDownload.java:172)
at 
com.suning.cybertron.superion.util.FSDownload.call(FSDownload.java:279)
at 
com.suning.cybertron.superion.util.FSDownload.call(FSDownload.java:52)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-06-09 11:14:37,885 INFO  [Public Localizer] localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
ftp://10.27.178.207:21/home/cbt/1213/jxf.sql transitioned from DOWNLOADING to 
FAILED
2015-06-09 11:14:37,886 INFO  [Public Localizer] 
localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:run(672)) - Failed to download rsrc { { 
ftp://10.27.89.13:21/home/cbt/common/2/sql.jar, 1433225415000, FILE, null 
},pending,[(container_20150608111420_41540_1213_1503_)],4237640866988089,DOWNLOADING}
java.io.IOException: Login failed on server - 10.27.178.207, port - 21
at 
org.apache.hadoop.fs.ftp.FTPFileSystem.connect(FTPFileSystem.java:133)
at 
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:390)
at 
com.suning.cybertron.superion.util.FSDownload.copy(FSDownload.java:172)
at 
com.suning.cybertron.superion.util.FSDownload.call(FSDownload.java:279)
at 
com.suning.cybertron.superion.util.FSDownload.call(FSDownload.java:52)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-06-09 11:14:37,886 INFO  [AsyncDispatcher event handler] 
container.Container (ContainerImpl.java:handle(853)) - Container 
container_20150608111420_41540_1213_1503_ transitioned from LOCALIZING to 
LOCALIZATION_FAILED
2015-06-09 11:14:37,887 INFO  [Public Localizer] localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
ftp://10.27.89.13:21/home/cbt/common/2/sql.jar transitioned from DOWNLOADING to 
FAILED
2015-06-09 11:14:37,887 INFO  [AsyncDispatcher event handler] 
localizer.LocalResourcesTrackerImpl 
(LocalResourcesTrackerImpl.java:handle(133)) - Container 
container_20150608111420_41540_1213_1503_ sent RELEASE event on a resource 
request {

[jira] [Updated] (YARN-3791) FSDownload

2015-06-10 Thread HuanWang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HuanWang updated YARN-3791:
---
Description: 
Inadvertently,we set two source ftp path:
{code}
 { { ftp://10.27.178.207:21/home/cbt/1213/jxf.sql, 143322551, FILE, null 
},pending,[(container_20150608111420_41540_1213_1503_)],4237640867118938,DOWNLOADING}

ftp://10.27.89.13:21/home/cbt/common/2/sql.jar, 1433225415000, FILE, null 
},pending,[(container_20150608111420_41540_1213_1503_)],4237640866988089,DOWNLOADING}

the first one is a wrong path,only one source was set this;but Follow the log,i 
saw Starting from the first path source download,All next jobs sources were 
downloaded from  ftp://10.27.178.207 by default.


the log is :

code
2015-06-09 11:14:34,653 INFO  [AsyncDispatcher event handler] 
localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:addResource(544)) - Downloading public rsrc:{ 
ftp://10.27.89.13:21/home/cbt/common/2/sql.jar, 1433225415000, FILE, null }
2015-06-09 11:14:34,653 INFO  [AsyncDispatcher event handler] 
localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:addResource(544)) - Downloading public rsrc:{ 
ftp://10.27.178.207:21/home/cbt/1213/jxf.sql, 143322551, FILE, null }
2015-06-09 11:14:37,883 INFO  [Public Localizer] 
localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:run(672)) - Failed to download rsrc { { 
ftp://10.27.178.207:21/home/cbt/1213/jxf.sql, 143322551, FILE, null 
},pending,[(container_20150608111420_41540_1213_1503_)],4237640867118938,DOWNLOADING}
java.io.IOException: Login failed on server - 10.27.178.207, port - 21
at 
org.apache.hadoop.fs.ftp.FTPFileSystem.connect(FTPFileSystem.java:133)
at 
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:390)
at 
com.suning.cybertron.superion.util.FSDownload.copy(FSDownload.java:172)
at 
com.suning.cybertron.superion.util.FSDownload.call(FSDownload.java:279)
at 
com.suning.cybertron.superion.util.FSDownload.call(FSDownload.java:52)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-06-09 11:14:37,885 INFO  [Public Localizer] localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
ftp://10.27.178.207:21/home/cbt/1213/jxf.sql transitioned from DOWNLOADING to 
FAILED
2015-06-09 11:14:37,886 INFO  [Public Localizer] 
localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:run(672)) - Failed to download rsrc { { 
ftp://10.27.89.13:21/home/cbt/common/2/sql.jar, 1433225415000, FILE, null 
},pending,[(container_20150608111420_41540_1213_1503_)],4237640866988089,DOWNLOADING}
java.io.IOException: Login failed on server - 10.27.178.207, port - 21
at 
org.apache.hadoop.fs.ftp.FTPFileSystem.connect(FTPFileSystem.java:133)
at 
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:390)
at 
com.suning.cybertron.superion.util.FSDownload.copy(FSDownload.java:172)
at 
com.suning.cybertron.superion.util.FSDownload.call(FSDownload.java:279)
at 
com.suning.cybertron.superion.util.FSDownload.call(FSDownload.java:52)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-06-09 11:14:37,886 INFO  [AsyncDispatcher event handler] 
container.Container (ContainerImpl.java:handle(853)) - Container 
container_20150608111420_41540_1213_1503_ transitioned from LOCALIZING to 
LOCALIZATION_FAILED
2015-06-09 11:14:37,887 INFO  [Public Localizer] localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
ftp://10.27.89.13:21/home/cbt/common/2/sql.jar transitioned from DOWNLOADING to 
FAILED
2015-06-09 11:14:37,887 INFO  [AsyncDispatcher event handler] 
localizer.LocalResourcesTrackerImpl 
(LocalResourcesTrackerImpl.java:handle(133)) - Container 
container_20150608111420_41540_1213_1503_ sent RELEASE event on a resource 
request { ftp://10.27.89.13:21/home/cbt/common/2/sql.jar, 1433225415000, FILE, 
null } not present
{code}

I debug the code of yarn.I found the piont is 
org.apache.hadoop.fs.FileSystem#cache 

the code source is here:

{code}
private FileSystem

[jira] [Updated] (YARN-3791) FSDownload

2015-06-10 Thread HuanWang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HuanWang updated YARN-3791:
---
Description: 
Inadvertently,we set two source ftp path:
{code}
 { { ftp://10.27.178.207:21/home/cbt/1213/jxf.sql, 143322551, FILE, null 
},pending,[(container_20150608111420_41540_1213_1503_)],4237640867118938,DOWNLOADING}

ftp://10.27.89.13:21/home/cbt/common/2/sql.jar, 1433225415000, FILE, null 
},pending,[(container_20150608111420_41540_1213_1503_)],4237640866988089,DOWNLOADING}

the first one is a wrong path,only one source was set this;but Follow the log,i 
saw Starting from the first path source download,All next jobs sources were 
downloaded from  ftp://10.27.178.207 by default.
{code}

the log is :

{code}
2015-06-09 11:14:34,653 INFO  [AsyncDispatcher event handler] 
localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:addResource(544)) - Downloading public rsrc:{ 
ftp://10.27.89.13:21/home/cbt/common/2/sql.jar, 1433225415000, FILE, null }
2015-06-09 11:14:34,653 INFO  [AsyncDispatcher event handler] 
localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:addResource(544)) - Downloading public rsrc:{ 
ftp://10.27.178.207:21/home/cbt/1213/jxf.sql, 143322551, FILE, null }
2015-06-09 11:14:37,883 INFO  [Public Localizer] 
localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:run(672)) - Failed to download rsrc { { 
ftp://10.27.178.207:21/home/cbt/1213/jxf.sql, 143322551, FILE, null 
},pending,[(container_20150608111420_41540_1213_1503_)],4237640867118938,DOWNLOADING}
java.io.IOException: Login failed on server - 10.27.178.207, port - 21
at 
org.apache.hadoop.fs.ftp.FTPFileSystem.connect(FTPFileSystem.java:133)
at 
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:390)
at 
com.suning.cybertron.superion.util.FSDownload.copy(FSDownload.java:172)
at 
com.suning.cybertron.superion.util.FSDownload.call(FSDownload.java:279)
at 
com.suning.cybertron.superion.util.FSDownload.call(FSDownload.java:52)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-06-09 11:14:37,885 INFO  [Public Localizer] localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
ftp://10.27.178.207:21/home/cbt/1213/jxf.sql transitioned from DOWNLOADING to 
FAILED
2015-06-09 11:14:37,886 INFO  [Public Localizer] 
localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:run(672)) - Failed to download rsrc { { 
ftp://10.27.89.13:21/home/cbt/common/2/sql.jar, 1433225415000, FILE, null 
},pending,[(container_20150608111420_41540_1213_1503_)],4237640866988089,DOWNLOADING}
java.io.IOException: Login failed on server - 10.27.178.207, port - 21
at 
org.apache.hadoop.fs.ftp.FTPFileSystem.connect(FTPFileSystem.java:133)
at 
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:390)
at 
com.suning.cybertron.superion.util.FSDownload.copy(FSDownload.java:172)
at 
com.suning.cybertron.superion.util.FSDownload.call(FSDownload.java:279)
at 
com.suning.cybertron.superion.util.FSDownload.call(FSDownload.java:52)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-06-09 11:14:37,886 INFO  [AsyncDispatcher event handler] 
container.Container (ContainerImpl.java:handle(853)) - Container 
container_20150608111420_41540_1213_1503_ transitioned from LOCALIZING to 
LOCALIZATION_FAILED
2015-06-09 11:14:37,887 INFO  [Public Localizer] localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
ftp://10.27.89.13:21/home/cbt/common/2/sql.jar transitioned from DOWNLOADING to 
FAILED
2015-06-09 11:14:37,887 INFO  [AsyncDispatcher event handler] 
localizer.LocalResourcesTrackerImpl 
(LocalResourcesTrackerImpl.java:handle(133)) - Container 
container_20150608111420_41540_1213_1503_ sent RELEASE event on a resource 
request { ftp://10.27.89.13:21/home/cbt/common/2/sql.jar, 1433225415000, FILE, 
null } not present
{code}

I debug the code of yarn.I found the piont is 
org.apache.hadoop.fs.FileSystem#cache 

the code source is here:

{code}
private

[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS


[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580508#comment-14580508
 ] 

Naganarasimha G R commented on YARN-3044:
-

[~zjshen],
Seems like many of the test case failures in TestDistributedShell, 
TestDistributedShellWithNodeLabels etc.. are not related to this jira, opening 
new jira to handle it, based on past exp better to handle in new jira so that 
duplicate effort will be avoided.

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
 YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
 YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
 YARN-3044-YARN-2928.011.patch, YARN-3044.20150325-1.patch, 
 YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (YARN-3792) Test case failures in TestDistributedShell after changes for subjira's of YARN-2928

Naganarasimha G R created YARN-3792:
---

 Summary: Test case failures in TestDistributedShell after changes 
for subjira's of YARN-2928
 Key: YARN-3792
 URL: https://issues.apache.org/jira/browse/YARN-3792
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R


encountered [testcase 
failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] 
which was happening even without the patch modifications in YARN-3044

TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow
TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow
TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails in trunk for FS scheduler

2015-06-10 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580228#comment-14580228
 ] 

Rohith commented on YARN-3790:
--

bq. I think this test fails intermittently.
Yes, it is failing intermittenlty. May be issue summary can be updated.

 TestWorkPreservingRMRestart#testSchedulerRecovery fails in trunk for FS 
 scheduler
 -

 Key: YARN-3790
 URL: https://issues.apache.org/jira/browse/YARN-3790
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Rohith
Assignee: zhihai xu

 Failure trace is as follows
 {noformat}
 Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 6.502 sec   FAILURE!
 java.lang.AssertionError: expected:6144 but was:8192
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk for FS scheduler

2015-06-10 Thread Rohith (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3790:
-
Summary: TestWorkPreservingRMRestart#testSchedulerRecovery fails 
intermittently in trunk for FS scheduler  (was: 
TestWorkPreservingRMRestart#testSchedulerRecovery fails in trunk for FS 
scheduler)

 TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in 
 trunk for FS scheduler
 

 Key: YARN-3790
 URL: https://issues.apache.org/jira/browse/YARN-3790
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Rohith
Assignee: zhihai xu

 Failure trace is as follows
 {noformat}
 Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 6.502 sec   FAILURE!
 java.lang.AssertionError: expected:6144 but was:8192
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS


[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580253#comment-14580253
 ] 

Naganarasimha G R commented on YARN-3044:
-

Hi [~zjshen],
Have taken care of the issue which you have mentioned and also added 
some test cases in TestDistributedShell to cover it (along with some code 
refactoring). Please review
bq. I'm not sure because as far as I can tell, NM's impl is different from 
RM's, but it's up to you to figure out the proper solution
Yep will start doing that now, but getting experts advise to make my job easy ;)

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
 YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
 YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
 YARN-3044-YARN-2928.011.patch, YARN-3044.20150325-1.patch, 
 YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3785) Support for Resource as an argument during submitApp call in MockRM test class

2015-06-10 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580101#comment-14580101
 ] 

Sunil G commented on YARN-3785:
---

Looks like YARN-3790 is filed to track the test failure separately. This 
failure is independent of this patch. [~xgong] could you please take a look.

 Support for Resource as an argument during submitApp call in MockRM test class
 --

 Key: YARN-3785
 URL: https://issues.apache.org/jira/browse/YARN-3785
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Sunil G
Assignee: Sunil G
Priority: Minor
 Attachments: 0001-YARN-3785.patch, 0002-YARN-3785.patch


 Currently MockRM#submitApp supports only memory. Adding test cases to support 
 vcores so that DominentResourceCalculator can be tested with this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers


[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580961#comment-14580961
 ] 

Hadoop QA commented on YARN-3051:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 26s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |  10m 12s | The applied patch generated  11  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 22s | The applied patch generated  
25 new checkstyle issues (total was 243, now 267). |
| {color:green}+1{color} | shellcheck |   0m  6s | There were no new shellcheck 
(v0.3.3) issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 40s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   4m  2s | The patch appears to introduce 5 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 22s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   1m 27s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  48m  2s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-timelineservice |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738884/YARN-3051-YARN-2928.04.patch
 |
| Optional Tests | shellcheck javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 0a3c147 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/8234/artifact/patchprocess/diffJavadocWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8234/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8234/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-timelineservice.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8234/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8234/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8234/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8234/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8234/console |


This message was automatically generated.

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051-YARN-2928.003.patch, 
 YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
 YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk for FS scheduler


[ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581026#comment-14581026
 ] 

zhihai xu commented on YARN-3790:
-

I uploaded a patch YARN-3790.000.patch which will move 
{{updateRootQueueMetrics}} after {{recoverContainersOnNode}} in {{addNode}}.

 TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in 
 trunk for FS scheduler
 

 Key: YARN-3790
 URL: https://issues.apache.org/jira/browse/YARN-3790
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Rohith
Assignee: zhihai xu
 Attachments: YARN-3790.000.patch


 Failure trace is as follows
 {noformat}
 Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 6.502 sec   FAILURE!
 java.lang.AssertionError: expected:6144 but was:8192
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (YARN-3793) Several NPEs when deleting local files on NM recovery

2015-06-10 Thread Karthik Kambatla (JIRA)

Karthik Kambatla created YARN-3793:
--

 Summary: Several NPEs when deleting local files on NM recovery
 Key: YARN-3793
 URL: https://issues.apache.org/jira/browse/YARN-3793
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical


When NM work-preserving restart is enabled, we see several NPEs on recovery. 
These seem to correspond to sub-directories that need to be deleted. I wonder 
if null pointers here mean incorrect tracking of these resources and a 
potential leak. This JIRA is to investigate and fix anything required.

Logs show:
{noformat}
2015-05-18 07:06:10,225 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting 
absolute path : null
2015-05-18 07:06:10,224 ERROR 
org.apache.hadoop.yarn.server.nodemanager.DeletionService: Exception during 
execution of task in DeletionService
java.lang.NullPointerException
at 
org.apache.hadoop.fs.FileContext.fixRelativePart(FileContext.java:274)
at org.apache.hadoop.fs.FileContext.delete(FileContext.java:755)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.deleteAsUser(DefaultContainerExecutor.java:458)
at 
org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:293)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3793) Several NPEs when deleting local files on NM recovery

2015-06-10 Thread Karthik Kambatla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3793:
---
Priority: Major  (was: Critical)

 Several NPEs when deleting local files on NM recovery
 -

 Key: YARN-3793
 URL: https://issues.apache.org/jira/browse/YARN-3793
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

 When NM work-preserving restart is enabled, we see several NPEs on recovery. 
 These seem to correspond to sub-directories that need to be deleted. I wonder 
 if null pointers here mean incorrect tracking of these resources and a 
 potential leak. This JIRA is to investigate and fix anything required.
 Logs show:
 {noformat}
 2015-05-18 07:06:10,225 INFO 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting 
 absolute path : null
 2015-05-18 07:06:10,224 ERROR 
 org.apache.hadoop.yarn.server.nodemanager.DeletionService: Exception during 
 execution of task in DeletionService
 java.lang.NullPointerException
 at 
 org.apache.hadoop.fs.FileContext.fixRelativePart(FileContext.java:274)
 at org.apache.hadoop.fs.FileContext.delete(FileContext.java:755)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.deleteAsUser(DefaultContainerExecutor.java:458)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:293)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk for FS scheduler


[ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580965#comment-14580965
 ] 

zhihai xu commented on YARN-3790:
-

[~rohithsharma], thanks for updating the title.
The containers are recovered. {{rootMetrics}}'s used resource is also updated, 
But {{rootMetrics}}'s available resource is not updated.
The following logs in the failed test proved it:
{code}
2015-06-09 22:55:42,964 INFO  [ResourceManager Event Processor] 
fair.FairScheduler (FairScheduler.java:addNode(855)) - Added node 
127.0.0.1:1234 cluster capacity: memory:8192, vCores:8
2015-06-09 22:55:42,964 DEBUG [AsyncDispatcher event handler] rmapp.RMAppImpl 
(RMAppImpl.java:handle(756)) - Processing event for 
application_1433915736884_0001 of type NODE_UPDATE
2015-06-09 22:55:42,964 DEBUG [AsyncDispatcher event handler] rmapp.RMAppImpl 
(RMAppImpl.java:processNodeUpdate(820)) - Received node update 
event:NODE_USABLE for node:127.0.0.1:1234 with state:RUNNING
2015-06-09 22:55:42,964 DEBUG [FairSchedulerUpdateThread] fair.FSLeafQueue 
(FSLeafQueue.java:updateDemand(287)) - The updated demand for root.default is 
memory:0, vCores:0; the max is memory:2147483647, vCores:2147483647
2015-06-09 22:55:42,964 DEBUG [FairSchedulerUpdateThread] fair.FSLeafQueue 
(FSLeafQueue.java:updateDemand(289)) - The updated fairshare for root.default 
is memory:0, vCores:0
2015-06-09 22:55:42,964 DEBUG [FairSchedulerUpdateThread] fair.FSParentQueue 
(FSParentQueue.java:updateDemand(163)) - Counting resource from root.default 
memory:0, vCores:0; Total resource consumption for root now memory:0, 
vCores:0
2015-06-09 22:55:42,964 DEBUG [FairSchedulerUpdateThread] fair.FSLeafQueue 
(FSLeafQueue.java:updateDemandForApp(298)) - Counting resource from 
application_1433915736884_0001 memory:0, vCores:0; Total resource consumption 
for root.zxu now memory:0, vCores:0
2015-06-09 22:55:42,964 DEBUG [FairSchedulerUpdateThread] fair.FSLeafQueue 
(FSLeafQueue.java:updateDemand(287)) - The updated demand for root.zxu is 
memory:0, vCores:0; the max is memory:2147483647, vCores:2147483647
2015-06-09 22:55:42,964 DEBUG [FairSchedulerUpdateThread] fair.FSLeafQueue 
(FSLeafQueue.java:updateDemand(289)) - The updated fairshare for root.zxu is 
memory:0, vCores:0
2015-06-09 22:55:42,964 DEBUG [FairSchedulerUpdateThread] fair.FSParentQueue 
(FSParentQueue.java:updateDemand(163)) - Counting resource from root.zxu 
memory:0, vCores:0; Total resource consumption for root now memory:0, 
vCores:0
2015-06-09 22:55:42,964 DEBUG [FairSchedulerUpdateThread] fair.FSParentQueue 
(FSParentQueue.java:updateDemand(177)) - The updated demand for root is 
memory:0, vCores:0; the max is memory:2147483647, vCores:2147483647
2015-06-09 22:55:42,964 DEBUG [FairSchedulerUpdateThread] fair.FSQueue 
(FSQueue.java:setFairShare(196)) - The updated fairShare for root is 
memory:8192, vCores:8
2015-06-09 22:55:42,965 INFO  [ResourceManager Event Processor] 
scheduler.AbstractYarnScheduler 
(AbstractYarnScheduler.java:recoverContainersOnNode(349)) - Recovering 
container container_id { app_attempt_id { application_id { id: 1 
cluster_timestamp: 1433915736884 } attemptId: 1 } id: 1 } container_state: 
C_RUNNING resource { memory: 1024 virtual_cores: 1 } priority { priority: 0 } 
diagnostics: recover container container_exit_status: 0 creation_time: 0 
nodeLabelExpression: 
2015-06-09 22:55:42,965 DEBUG [ResourceManager Event Processor] 
rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(382)) - Processing 
container_1433915736884_0001_01_01 of type RECOVER
2015-06-09 22:55:42,965 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(167)) - Dispatching the 
event 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppRunningOnNodeEvent.EventType:
 APP_RUNNING_ON_NODE
2015-06-09 22:55:42,965 INFO  [ResourceManager Event Processor] 
rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(394)) - 
container_1433915736884_0001_01_01 Container Transitioned from NEW to 
RUNNING
2015-06-09 22:55:42,965 DEBUG [AsyncDispatcher event handler] rmapp.RMAppImpl 
(RMAppImpl.java:handle(756)) - Processing event for 
application_1433915736884_0001 of type APP_RUNNING_ON_NODE
2015-06-09 22:55:42,965 INFO  [ResourceManager Event Processor] 
scheduler.SchedulerNode (SchedulerNode.java:allocateContainer(154)) - Assigned 
container container_1433915736884_0001_01_01 of capacity memory:1024, 
vCores:1 on host 127.0.0.1:1234, which has 1 containers, memory:1024, 
vCores:1 used and memory:7168, vCores:7 available after allocation
2015-06-09 22:55:42,966 INFO  [ResourceManager Event Processor] 
scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:recoverContainer(651)) - SchedulerAttempt 
appattempt_1433915736884_0001_01 is recovering container 
container_1433915736884_0001_01_01
2015-06-09 22:55:42,966 INFO  [ResourceManager Event

[jira] [Updated] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk for FS scheduler


 [ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3790:

Attachment: YARN-3790.000.patch

 TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in 
 trunk for FS scheduler
 

 Key: YARN-3790
 URL: https://issues.apache.org/jira/browse/YARN-3790
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Rohith
Assignee: zhihai xu
 Attachments: YARN-3790.000.patch


 Failure trace is as follows
 {noformat}
 Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 6.502 sec   FAILURE!
 java.lang.AssertionError: expected:6144 but was:8192
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

[
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580858#comment-14580858
]

Varun Saxena commented on YARN-3051:

[~zjshen], thanks for your inputs. I will brief you about the APIs' I have
decided as of now.

# APIs' for querying individual entity/flow/flow run/user and APIs' for
querying a set of entities/flow runs/flows/users. APIs' such a set of
flows/users will contain aggregated data. The reason for separate endpoints for
entities, flows, users,etc. is because of the different tables in HBase/Phoenix
schema.
# Most the APIs' will be variations of either getting a single entity or a set
of entities. So I will primarily talk about entity and a set of entities in
subsequent points.
# For getting a set of entities, there will be 3 kinds of filters - filtering
on the basis of info, filtering on configs and filtering on metrics. Filtering
on the basis of info and field will be based on equality, for instance, fetch
entities which have a config name matching a specific config value. Metrics
filtering though will be on the basis of relational operator. For instance,
user can query entities which have a specific metric = a certain value.
# In addition to that certain predicates such as limit, windowStart, windowEnd,
etc. which used to exist in ATSv1 exist even now.Some predicates such as
fromId, fromTs may not make sense in ATSv2 but I have included them for now
with the intention of discussion.
# Additional predicates such as metricswindowStart and end has been specified
to fetch metrics data for a specific time span. The reason I included this is
because this can aid in plotting graphs on UI for a specific metric of some
entity.
# Only entity id, type, created and modified time will be returned if fields
are not specified in REST URL. This will be the default view of an entity.
# Moreover you can also specify which configurations and metrics to return.
# Every query param will be received as a String, even timestamp. Now from
backing storage implementation viewpoint, would it make more sense to let these
query params be passed as strings or do datatype conversion ?

Few concerns from Li Lu regarding parameter list becoming too long are quite
valid as most of them will be nulls. We can also club multiple related
parameters in a different classes to reduce them. Or as he said have different
methods for frequently occurring use cases. Thoughts ?

Comments are welcome so that this JIRA can speed up, probably after Hadoop
Summit :)

[Storage abstraction] Create backing storage read interface for ATS readers
---

Key: YARN-3051
URL: https://issues.apache.org/jira/browse/YARN-3051
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Varun Saxena
Attachments: YARN-3051-YARN-2928.003.patch,
YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch,
YARN-3051.wip.patch, YARN-3051_temp.patch

Per design in YARN-2928, create backing storage read interface that can be
implemented by multiple backing storage implementations.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3789) Refactor logs for LeafQueue#activateApplications() to remove duplicate logging

2015-06-10 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580821#comment-14580821
 ] 

Bibin A Chundatt commented on YARN-3789:


With this patch there is no increase in number  of lines. Checkstyle issue 
seems unrelated

 Refactor logs for LeafQueue#activateApplications() to remove duplicate logging
 --

 Key: YARN-3789
 URL: https://issues.apache.org/jira/browse/YARN-3789
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
 Attachments: 0001-YARN-3789.patch, 0002-YARN-3789.patch, 
 0003-YARN-3789.patch


 Duplicate logging from resource manager
 during am limit check for each application
 {code}
 015-06-09 17:32:40,019 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 not starting application as amIfStarted exceeds amLimit
 2015-06-09 17:32:40,019 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 not starting application as amIfStarted exceeds amLimit
 2015-06-09 17:32:40,019 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 not starting application as amIfStarted exceeds amLimit
 2015-06-09 17:32:40,019 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 not starting application as amIfStarted exceeds amLimit
 2015-06-09 17:32:40,019 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 not starting application as amIfStarted exceeds amLimit
 2015-06-09 17:32:40,019 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 not starting application as amIfStarted exceeds amLimit
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2578) NM does not failover timely if RM node network connection fails

2015-06-10 Thread Masatake Iwasaki (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-2578:
---
Attachment: YARN-2578.002.patch

I attached 002 which makes rpcTimeout configurable by ipc.client.rpc.timeout. 
The default value is 0 in order to keep current behaviour. We can test timeout 
by changing the value explicitly and change the default value later after some 
tests. I also left Client#getTimeout as is to keep compatibility.

 NM does not failover timely if RM node network connection fails
 ---

 Key: YARN-2578
 URL: https://issues.apache.org/jira/browse/YARN-2578
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg
 Attachments: YARN-2578.002.patch, YARN-2578.patch


 The NM does not fail over correctly when the network cable of the RM is 
 unplugged or the failure is simulated by a service network stop or a 
 firewall that drops all traffic on the node. The RM fails over to the standby 
 node when the failure is detected as expected. The NM should than re-register 
 with the new active RM. This re-register takes a long time (15 minutes or 
 more). Until then the cluster has no nodes for processing and applications 
 are stuck.
 Reproduction test case which can be used in any environment:
 - create a cluster with 3 nodes
 node 1: ZK, NN, JN, ZKFC, DN, RM, NM
 node 2: ZK, NN, JN, ZKFC, DN, RM, NM
 node 3: ZK, JN, DN, NM
 - start all services make sure they are in good health
 - kill the network connection of the RM that is active using one of the 
 network kills from above
 - observe the NN and RM failover
 - the DN's fail over to the new active NN
 - the NM does not recover for a long time
 - the logs show a long delay and traces show no change at all
 The stack traces of the NM all show the same set of threads. The main thread 
 which should be used in the re-register is the Node Status Updater This 
 thread is stuck in:
 {code}
 Node Status Updater prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in 
 Object.wait() [0x7f5a51fc1000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
   - locked 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at org.apache.hadoop.ipc.Client.call(Client.java:1362)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
 {code}
 The client connection which goes through the proxy can be traced back to the 
 ResourceTrackerPBClientImpl. The generated proxy does not time out and we 
 should be using a version which takes the RPC timeout (from the 
 configuration) as a parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers


 [ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3051:
---
Attachment: YARN-3051-YARN-2928.04.patch

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051-YARN-2928.003.patch, 
 YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
 YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers


[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580910#comment-14580910
 ] 

Varun Saxena commented on YARN-3051:


As of now, there are very similar APIs' for 
getEntity/getFlowEntity/getUserEntity etc. Will it be fine to combine these 
APIs' and pass something like a query type(ENTITY/USER/FLOW,etc.) in the API 
which storage implementation can then use to decide which type of query it is ?

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051-YARN-2928.003.patch, 
 YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
 YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk for FS scheduler


[ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581151#comment-14581151
 ] 

Hadoop QA commented on YARN-3790:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m  3s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 45s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  50m 55s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  88m 51s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738905/YARN-3790.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c7729ef |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8235/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8235/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8235/console |


This message was automatically generated.

 TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in 
 trunk for FS scheduler
 

 Key: YARN-3790
 URL: https://issues.apache.org/jira/browse/YARN-3790
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Rohith
Assignee: zhihai xu
 Attachments: YARN-3790.000.patch


 Failure trace is as follows
 {noformat}
 Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 6.502 sec   FAILURE!
 java.lang.AssertionError: expected:6144 but was:8192
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster

2015-06-10 Thread Xuan Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581430#comment-14581430
 ] 

Xuan Gong commented on YARN-3779:
-

[~varun_saxena] Thanks for the logs. Could you apply the patch and print the 
ugi ?

 Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings 
 in secure cluster
 --

 Key: YARN-3779
 URL: https://issues.apache.org/jira/browse/YARN-3779
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: mrV2, secure mode
Reporter: Zhang Wei
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3779.01.patch, YARN-3779.02.patch


 {{GSSException}} is thrown everytime log aggregation deletion is attempted 
 after executing bin/mapred hsadmin -refreshLogRetentionSettings in a secure 
 cluster.
 The problem can be reproduced by following steps:
 1. startup historyserver in secure cluster.
 2. Log deletion happens as per expectation. 
 3. execute {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh 
 the configuration value.
 4. All the subsequent attempts of log deletion fail with {{GSSException}}
 Following exception can be found in historyserver's log if log deletion is 
 enabled. 
 {noformat}
 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this 
 deletion attempt is being aborted | AggregatedLogDeletionService.java:127
 java.io.IOException: Failed on local exception: java.io.IOException: 
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt)]; Host Details : local host is: vm-31/9.91.12.31; 
 destination host is: vm-33:25000; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
 at org.apache.hadoop.ipc.Client.call(Client.java:1414)
 at org.apache.hadoop.ipc.Client.call(Client.java:1363)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy9.getListing(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519)
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy10.getListing(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505)
 Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS 
 initiate failed [Caused by GSSException: No valid credentials provided 
 (Mechanism level: Failed to find any Kerberos tgt)]
 at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
 at 
 org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640)
 at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724)
 at 
 org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
 at org.apache.hadoop.ipc.Client.call(Client.java:1381)
 ... 21 more
 Caused by: javax.security.sasl.SaslException: GSS

[jira] [Created] (YARN-3794) TestRMEmbeddedElector fails because of ambiguous LOG reference

2015-06-10 Thread Chengbing Liu (JIRA)

Chengbing Liu created YARN-3794:
---

 Summary: TestRMEmbeddedElector fails because of ambiguous LOG 
reference
 Key: YARN-3794
 URL: https://issues.apache.org/jira/browse/YARN-3794
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu


After YARN-2921, {{MockRM}} has also a {{LOG}} field. Therefore {{LOG}} in the 
following code snippet is ambiguous.
{code}
protected AdminService createAdminService() {
  return new AdminService(MockRMWithElector.this, getRMContext()) {
@Override
protected EmbeddedElectorService createEmbeddedElectorService() {
  return new EmbeddedElectorService(getRMContext()) {
@Override
public void becomeActive() throws
ServiceFailedException {
  try {
callbackCalled.set(true);
LOG.info(Callback called. Sleeping now);
Thread.sleep(delayMs);
LOG.info(Sleep done);
  } catch (InterruptedException e) {
e.printStackTrace();
  }
  super.becomeActive();
}
  };
}
  };
}
{code}
Eclipse gives the following error:
{quote}
The field LOG is defined in an inherited type and an enclosing scope
{quote}

IMO, we should fix this as {{TestRMEmbeddedElector.LOG}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3794) TestRMEmbeddedElector fails because of ambiguous LOG reference

2015-06-10 Thread Chengbing Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-3794:

Attachment: YARN-3794.01.patch

 TestRMEmbeddedElector fails because of ambiguous LOG reference
 --

 Key: YARN-3794
 URL: https://issues.apache.org/jira/browse/YARN-3794
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3794.01.patch


 After YARN-2921, {{MockRM}} has also a {{LOG}} field. Therefore {{LOG}} in 
 the following code snippet is ambiguous.
 {code}
 protected AdminService createAdminService() {
   return new AdminService(MockRMWithElector.this, getRMContext()) {
 @Override
 protected EmbeddedElectorService createEmbeddedElectorService() {
   return new EmbeddedElectorService(getRMContext()) {
 @Override
 public void becomeActive() throws
 ServiceFailedException {
   try {
 callbackCalled.set(true);
 LOG.info(Callback called. Sleeping now);
 Thread.sleep(delayMs);
 LOG.info(Sleep done);
   } catch (InterruptedException e) {
 e.printStackTrace();
   }
   super.becomeActive();
 }
   };
 }
   };
 }
 {code}
 Eclipse gives the following error:
 {quote}
 The field LOG is defined in an inherited type and an enclosing scope
 {quote}
 IMO, we should fix this as {{TestRMEmbeddedElector.LOG}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3785) Support for Resource as an argument during submitApp call in MockRM test class

2015-06-10 Thread Xuan Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581452#comment-14581452
 ] 

Xuan Gong commented on YARN-3785:
-

Committed into trunk/branch-2. Thanks, [~sunilg]

 Support for Resource as an argument during submitApp call in MockRM test class
 --

 Key: YARN-3785
 URL: https://issues.apache.org/jira/browse/YARN-3785
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Sunil G
Assignee: Sunil G
Priority: Minor
 Fix For: 2.8.0

 Attachments: 0001-YARN-3785.patch, 0002-YARN-3785.patch


 Currently MockRM#submitApp supports only memory. Adding test cases to support 
 vcores so that DominentResourceCalculator can be tested with this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2015-06-10 Thread Hong Zhiguo (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581349#comment-14581349
 ] 

Hong Zhiguo commented on YARN-2768:
---

[~kasha], the excution time displayed in the profiling output is cumulative.
Actually, I repeated such profiling a lot of times and got the same ratio.
The profiling is done with a cluster of NM/AM simulators and I don't have such 
resource now.

I wrote a testcase which creates 8000 nodes, 4500 apps within 1200 queues, and 
then performs 1 rounds of FairScheduler.update(), and print the average 
execution time of one call to update. With this patch, the average execution 
time decreased from about 35ms to 20ms.

I think the effect comes from GC and memory allocation since in each round of 
FairScheduler.update(), Resource.multiply is called as many times as the number 
of pending ResourceRequests, which is more than 3 million in our production 
cluster.

 optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% 
 of computing time of update thread
 

 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2768.patch, profiling_FairScheduler_update.png


 See the attached picture of profiling result. The clone of Resource object 
 within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
 function FairScheduler.update().
 The code of FSAppAttempt.updateDemand:
 {code}
 public void updateDemand() {
 demand = Resources.createResource(0);
 // Demand is current consumption plus outstanding requests
 Resources.addTo(demand, app.getCurrentConsumption());
 // Add up outstanding resource requests
 synchronized (app) {
   for (Priority p : app.getPriorities()) {
 for (ResourceRequest r : app.getResourceRequests(p).values()) {
   Resource total = Resources.multiply(r.getCapability(), 
 r.getNumContainers());
   Resources.addTo(demand, total);
 }
   }
 }
   }
 {code}
 The code of Resources.multiply:
 {code}
 public static Resource multiply(Resource lhs, double by) {
 return multiplyTo(clone(lhs), by);
 }
 {code}
 The clone could be skipped by directly update the value of this.demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3791) FSDownload