[jira] [Commented] (YARN-3787) loading applications by filtering appstartedTime period for ATS Web UI

2015-06-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577524#comment-14577524
 ] 

Xuan Gong commented on YARN-3787:
-

In this ticket, we will introduce two more filtering parameters, startedTimeBegin 
and startedTimeEnd, to let users load applications based on the app 
start time.
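
For illustration, a rough sketch of how the two parameters could be applied when 
rendering the apps list; the helper class below is hypothetical and not part of the 
attached patch:

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationReport;

// Hypothetical helper (not from the patch): keep only the apps whose start time
// falls within [startedTimeBegin, startedTimeEnd].
public class StartTimeFilter {
  public static List<ApplicationReport> filter(List<ApplicationReport> apps,
      long startedTimeBegin, long startedTimeEnd) {
    List<ApplicationReport> filtered = new ArrayList<ApplicationReport>();
    for (ApplicationReport app : apps) {
      long started = app.getStartTime();
      if (started >= startedTimeBegin && started <= startedTimeEnd) {
        filtered.add(app);
      }
    }
    return filtered;
  }
}
{code}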

 loading applications by filtering appstartedTime period for ATS Web UI
 --

 Key: YARN-3787
 URL: https://issues.apache.org/jira/browse/YARN-3787
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong

 After YARN-3700, we have defined a parameter called apps.num for loading a 
 given number of applications in the ATS web page.
 We could also define several additional parameters for a similar purpose. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577565#comment-14577565
 ] 

Karthik Kambatla commented on YARN-2716:


The test failures seem unrelated. TestWorkPreservingRMRestart fails on trunk as 
well, and the ATS tests are likely completely unrelated.

 Refactor ZKRMStateStore retry code with Apache Curator
 --

 Key: YARN-2716
 URL: https://issues.apache.org/jira/browse/YARN-2716
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Karthik Kambatla
 Attachments: yarn-2716-1.patch, yarn-2716-2.patch, yarn-2716-3.patch, 
 yarn-2716-prelim.patch, yarn-2716-prelim.patch, yarn-2716-super-prelim.patch


 Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
 simplify the retry logic in ZKRMStateStore.
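
For context, a minimal sketch of the kind of Curator retry wiring the description 
refers to; the connection string, retry values, and znode path below are 
placeholders, not the actual ZKRMStateStore configuration:

{code}
// Minimal sketch (placeholders only): Curator centralizes ZooKeeper retries
// in a retry policy instead of hand-rolled retry loops around each operation.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorRetrySketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.builder()
        .connectString("localhost:2181")                  // placeholder quorum
        .retryPolicy(new ExponentialBackoffRetry(1000, 3)) // retries handled here
        .build();
    client.start();
    // Every operation below is retried by the framework according to the policy.
    client.create().creatingParentsIfNeeded().forPath("/rmstore/example", new byte[0]);
    client.close();
  }
}
{code}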



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-08 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577614#comment-14577614
 ] 

Tsuyoshi Ozawa commented on YARN-3017:
--

Sorry for the delay and thanks for your pings. Let me check.

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02.
 Whereas the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then while using
 filtering tools to, say, grep events surrounding a specific attempt by the
 numeric ID part, information may slip out during troubleshooting.
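
For context, the mismatch can be reproduced with the public ID factories; a minimal 
sketch (values taken from the log excerpt above):

{code}
// Minimal sketch: reproduce the two formats with the public ID factories.
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class IdFormatSketch {
  public static void main(String[] args) {
    ApplicationId appId = ApplicationId.newInstance(1412150883650L, 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 2);
    ContainerId containerId = ContainerId.newContainerId(attemptId, 1);
    // The attempt id zero-pads the attempt number to six digits...
    System.out.println(attemptId);
    // ...while the container id embeds the same attempt number padded to two
    // digits, which is the mismatch described above.
    System.out.println(containerId);
  }
}
{code}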



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster

2015-06-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577570#comment-14577570
 ] 

Varun Saxena commented on YARN-3779:


[~zjshen], thanks for looking at this.
It's the same user that is used both for starting the history server and for 
executing the refresh command.
Timer will create a new thread on refresh, and from then on the problem occurs.

There is no problem if I use a ScheduledThreadPoolExecutor (with 1 thread) 
instead, as that doesn't spawn a new thread.
So it seems the new thread doesn't pick up the correct UGI.

Are you able to simulate the issue?
I hope there is no issue in the way Kerberos has been set up in my cluster.
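
One possible direction (a hedged sketch, not the actual patch) is to pin the 
rescheduled task to the login UGI, so it does not matter which thread or caller 
triggers the refresh:

{code}
// Hedged sketch (not the actual patch): run the deletion work as the login user,
// so a task rescheduled from an RPC handler thread still uses the service keytab UGI.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public final class RunAsLoginUser {
  public static void run(final Runnable work) throws Exception {
    UserGroupInformation.getLoginUser().doAs(
        new PrivilegedExceptionAction<Void>() {
          @Override
          public Void run() {
            work.run();
            return null;
          }
        });
  }
}
{code}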

 Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings 
 in secure cluster
 --

 Key: YARN-3779
 URL: https://issues.apache.org/jira/browse/YARN-3779
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: mrV2, secure mode
Reporter: Zhang Wei
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3779.01.patch, YARN-3779.02.patch


 {{GSSException}} is thrown every time log aggregation deletion is attempted 
 after executing {{bin/mapred hsadmin -refreshLogRetentionSettings}} in a secure 
 cluster.
 The problem can be reproduced by the following steps:
 1. Start up the historyserver in a secure cluster.
 2. Log deletion happens as expected. 
 3. Execute the {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh 
 the configuration value.
 4. All subsequent attempts at log deletion fail with {{GSSException}}.
 The following exception can be found in the historyserver's log if log deletion is 
 enabled. 
 {noformat}
 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this 
 deletion attempt is being aborted | AggregatedLogDeletionService.java:127
 java.io.IOException: Failed on local exception: java.io.IOException: 
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt)]; Host Details : local host is: vm-31/9.91.12.31; 
 destination host is: vm-33:25000; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
 at org.apache.hadoop.ipc.Client.call(Client.java:1414)
 at org.apache.hadoop.ipc.Client.call(Client.java:1363)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy9.getListing(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519)
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy10.getListing(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505)
 Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS 
 initiate failed [Caused by GSSException: No valid credentials provided 
 (Mechanism level: Failed to find any Kerberos tgt)]
 at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
 at 
 

[jira] [Commented] (YARN-3787) loading applications by filtering appstartedTime period for ATS Web UI

2015-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577625#comment-14577625
 ] 

Hadoop QA commented on YARN-3787:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m  4s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 38s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 45s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m  7s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| | |  48m 39s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738394/YARN-3787.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0e80d51 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8220/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8220/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8220/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8220/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8220/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8220/console |


This message was automatically generated.

 loading applications by filtering appstartedTime period for ATS Web UI
 --

 Key: YARN-3787
 URL: https://issues.apache.org/jira/browse/YARN-3787
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3787.1.patch


 After YARN-3700, we have defined a parameter called apps.num for loading a 
 given number of applications in the ATS web page.
 We could also define several additional parameters for a similar purpose. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3787) loading applications by filtering appstartedTime period for ATS Web UI

2015-06-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3787:

Attachment: YARN-3787.1.patch

 loading applications by filtering appstartedTime period for ATS Web UI
 --

 Key: YARN-3787
 URL: https://issues.apache.org/jira/browse/YARN-3787
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3787.1.patch


 After YARN-3700, we have defined a parameter called apps.num for loading a 
 given number of applications in the ATS web page.
 We could also define several additional parameters for a similar purpose. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3787) loading applications by filtering appstartedTime period for ATS Web UI

2015-06-08 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-3787:
---

 Summary: loading applications by filtering appstartedTime period 
for ATS Web UI
 Key: YARN-3787
 URL: https://issues.apache.org/jira/browse/YARN-3787
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong


After YARN-3700, we have defined a parameter called apps.num for loading a 
given number of applications in the ATS web page.
We could also define several additional parameters for a similar purpose. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables

2015-06-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577534#comment-14577534
 ] 

Sangjin Lee commented on YARN-3706:
---

Thanks for the update [~jrottinghuis]! I haven't gone over the latest patch, 
but will do so some time soon. Just to reply to your questions earlier...

{quote}
Reason I had the relatively simple BaseTable.setTableName(...) is that it 
allows me to not have to leak the name of the configuration value for the 
table to be a public attribute. Do you think it is better to just have a public 
member on EntityTable and set the value directly, or to keep that private?
{quote}

Understood. That said, though, configuration names and default values seem like 
pretty public attributes, given that we allow people to set and override them. 
Is it critical that we hide these strings from the code? I suspect not...

{quote}
Wrt. EntityTable.java
I92 should be static. Not sure what you mean by this.
{quote}

In the v.5 patch I reviewed, DEFAULT_ENTITY_NAME_BYTES was not static. It looks 
like it is no longer there in the latest version?

bq. Would you prefer it if getInstance simply news up a table instance each time, 
or are you generally against the pattern of being able to call getInstance()?

It's really about the singleton pattern. To be sure, this is a fairly minor 
design bias of mine. I believe it is normally rather uncommon that something 
truly must be a singleton or the code becomes incorrect. Also, the singleton 
becomes a bit of an anti-pattern with regard to dependency injection and can 
cause some issues when writing unit tests.
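
For illustration only (generic code, not the actual EntityTable API), the 
testability difference between the two approaches:

{code}
// Generic illustration, not the actual EntityTable API: a singleton pins the
// dependency, while constructor injection lets a test pass a fake.
class Registry {
  private static final Registry INSTANCE = new Registry();
  static Registry getInstance() { return INSTANCE; }   // hard to swap in tests
}

class Writer {
  private final Registry registry;
  Writer(Registry registry) {                          // injectable: a test can pass a stub
    this.registry = registry;
  }
}
{code}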

{quote}
I agree with the USERNAME_SPLITS being public. I've left a comment to remove 
this completely and have this read from the configuration. I think it would be 
better to provide a default property in a config file for this. This was in 
place in the code currently checked in and I did not tackle that in this patch. 
Would it be OK if I file a separate jira for this?
{quote}

It was really a small nit about the hadoop coding convention. All static final 
variables (constants), whether they are public or not, must be in upper case.


 Generalize native HBase writer for additional tables
 

 Key: YARN-3706
 URL: https://issues.apache.org/jira/browse/YARN-3706
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Minor
 Attachments: YARN-3706-YARN-2928.001.patch, 
 YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, 
 YARN-3726-YARN-2928.004.patch, YARN-3726-YARN-2928.005.patch, 
 YARN-3726-YARN-2928.006.patch, YARN-3726-YARN-2928.007.patch, 
 YARN-3726-YARN-2928.008.patch, YARN-3726-YARN-2928.009.patch


 When reviewing YARN-3411 we noticed that we could change the class hierarchy 
 a little in order to accommodate additional tables easily.
 In order to get ready for benchmark testing we left the original layout in 
 place, as performance would not be impacted by the code hierarchy.
 Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3787) loading applications by filtering appstartedTime period for ATS Web UI

2015-06-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577624#comment-14577624
 ] 

Zhijie Shen commented on YARN-3787:
---

The patch looks good to me overall. Some comments: 

1. startedTimeBegin -> app.started-time.begin and startedTimeEnd -> 
app.started-time.end?
{code}
String APP_START_TIME_BEGIN = "startedTimeBegin";
String APP_START_TIME_END = "startedTimeEnd";
{code}

2. The semantics enforced here for app blocks conflict with those of the web services.
{code}
if (appStartedTimeBegain > appStartedTimeEnd) {
  throw new BadRequestException(
      "startedTimeEnd must be greater than startTimeBegin");
}
{code}
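
For comment 1, the rename being suggested would look roughly like this 
(hypothetical, not from the attached patch):

{code}
// Hypothetical rename per comment 1 above; not part of the attached patch.
String APP_START_TIME_BEGIN = "app.started-time.begin";
String APP_START_TIME_END = "app.started-time.end";
{code}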

 loading applications by filtering appstartedTime period for ATS Web UI
 --

 Key: YARN-3787
 URL: https://issues.apache.org/jira/browse/YARN-3787
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3787.1.patch


 After YARN-3700, we have defined a parameter called apps.num for loading a 
 given number of applications in the ATS web page.
 We could also define several additional parameters for a similar purpose. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3785) Support for vcores during submitApp in MockRM test class

2015-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577652#comment-14577652
 ] 

Hadoop QA commented on YARN-3785:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   6m 29s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 34s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 19s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 46s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  50m 22s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  68m 59s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738393/0001-YARN-3785.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 0e80d51 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8221/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8221/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8221/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8221/console |


This message was automatically generated.

 Support for vcores during submitApp in MockRM test class
 

 Key: YARN-3785
 URL: https://issues.apache.org/jira/browse/YARN-3785
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Sunil G
Assignee: Sunil G
Priority: Minor
 Attachments: 0001-YARN-3785.patch


 Currently MockRM#submitApp supports only memory. Adding test cases to support 
 vcores so that DominantResourceCalculator can be tested with this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster

2015-06-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577527#comment-14577527
 ] 

Zhijie Shen commented on YARN-3779:
---

So the problem is that, after refreshing, the deletion task is scheduled and 
executed under the ugi of whoever executes the refresh command, right?

 Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings 
 in secure cluster
 --

 Key: YARN-3779
 URL: https://issues.apache.org/jira/browse/YARN-3779
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: mrV2, secure mode
Reporter: Zhang Wei
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3779.01.patch, YARN-3779.02.patch


 {{GSSException}} is thrown every time log aggregation deletion is attempted 
 after executing {{bin/mapred hsadmin -refreshLogRetentionSettings}} in a secure 
 cluster.
 The problem can be reproduced by the following steps:
 1. Start up the historyserver in a secure cluster.
 2. Log deletion happens as expected. 
 3. Execute the {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh 
 the configuration value.
 4. All subsequent attempts at log deletion fail with {{GSSException}}.
 The following exception can be found in the historyserver's log if log deletion is 
 enabled. 
 {noformat}
 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this 
 deletion attempt is being aborted | AggregatedLogDeletionService.java:127
 java.io.IOException: Failed on local exception: java.io.IOException: 
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt)]; Host Details : local host is: vm-31/9.91.12.31; 
 destination host is: vm-33:25000; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
 at org.apache.hadoop.ipc.Client.call(Client.java:1414)
 at org.apache.hadoop.ipc.Client.call(Client.java:1363)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy9.getListing(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519)
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy10.getListing(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505)
 Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS 
 initiate failed [Caused by GSSException: No valid credentials provided 
 (Mechanism level: Failed to find any Kerberos tgt)]
 at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
 at 
 org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640)
 at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724)
 at 
 org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
 at org.apache.hadoop.ipc.Client.call(Client.java:1381)
 ... 21 more

[jira] [Commented] (YARN-3785) Support for vcores during submitApp in MockRM test class

2015-06-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577611#comment-14577611
 ] 

Xuan Gong commented on YARN-3785:
-

Thanks for working on this, [~sunilg].
It looks like we have added a new subApps() method in MockRM that accepts a 
Resource as a parameter, and rewritten all the other subApps() methods to use 
this new method.
But I did not see vcores. I understand that with this patch, if we want to 
support vcores in the future, we already have support for Resource as a parameter.
Could you rename the ticket title a little bit?
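
For reference, usage with a Resource-based overload would look roughly like this; 
this is a hypothetical call shape, assuming the overload the patch introduces, and 
is a fragment from a test method rather than a complete class:

{code}
// Hypothetical usage, assuming the patch adds a Resource-accepting submitApp overload.
MockRM rm = new MockRM(new YarnConfiguration());
rm.start();
// 2048 MB and 4 vcores, so DominantResourceCalculator has both dimensions to work with.
RMApp app = rm.submitApp(Resource.newInstance(2048, 4));
rm.stop();
{code}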

 Support for vcores during submitApp in MockRM test class
 

 Key: YARN-3785
 URL: https://issues.apache.org/jira/browse/YARN-3785
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Sunil G
Assignee: Sunil G
Priority: Minor
 Attachments: 0001-YARN-3785.patch


 Currently MockRM#submitApp supports only memory. Adding test cases to support 
 vcores so that DominantResourceCalculator can be tested with this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster

2015-06-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577709#comment-14577709
 ] 

Zhijie Shen commented on YARN-3779:
---

No, I didn't simulate the problem; I just had a quick glance at the code. Log 
retention refresh will reschedule the deletion task, but this is done in the 
RPC call by the request user. So I'm now wondering if this changes the ugi of 
the subsequent deletion task. Can you try to print the ugi? Then we can see 
what has changed.
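
A minimal debugging sketch for that (not from any patch; it just prints which UGI 
the task body sees versus the login UGI):

{code}
// Debugging sketch only: call this from inside the deletion task body.
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public final class UgiProbe {
  public static void log() throws IOException {
    UserGroupInformation current = UserGroupInformation.getCurrentUser();
    UserGroupInformation login = UserGroupInformation.getLoginUser();
    System.out.println("task ugi=" + current.getUserName()
        + ", login ugi=" + login.getUserName());
  }
}
{code}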

 Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings 
 in secure cluster
 --

 Key: YARN-3779
 URL: https://issues.apache.org/jira/browse/YARN-3779
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: mrV2, secure mode
Reporter: Zhang Wei
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3779.01.patch, YARN-3779.02.patch


 {{GSSException}} is thrown every time log aggregation deletion is attempted 
 after executing {{bin/mapred hsadmin -refreshLogRetentionSettings}} in a secure 
 cluster.
 The problem can be reproduced by the following steps:
 1. Start up the historyserver in a secure cluster.
 2. Log deletion happens as expected. 
 3. Execute the {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh 
 the configuration value.
 4. All subsequent attempts at log deletion fail with {{GSSException}}.
 The following exception can be found in the historyserver's log if log deletion is 
 enabled. 
 {noformat}
 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this 
 deletion attempt is being aborted | AggregatedLogDeletionService.java:127
 java.io.IOException: Failed on local exception: java.io.IOException: 
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt)]; Host Details : local host is: vm-31/9.91.12.31; 
 destination host is: vm-33:25000; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
 at org.apache.hadoop.ipc.Client.call(Client.java:1414)
 at org.apache.hadoop.ipc.Client.call(Client.java:1363)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy9.getListing(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519)
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy10.getListing(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505)
 Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS 
 initiate failed [Caused by GSSException: No valid credentials provided 
 (Mechanism level: Failed to find any Kerberos tgt)]
 at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
 at 
 org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640)
 at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724)
 at 
 

[jira] [Updated] (YARN-3785) Support for vcores during submitApp in MockRM test class

2015-06-08 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3785:
--
Attachment: 0001-YARN-3785.patch

Updating a patch for the same.

 Support for vcores during submitApp in MockRM test class
 

 Key: YARN-3785
 URL: https://issues.apache.org/jira/browse/YARN-3785
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Sunil G
Assignee: Sunil G
Priority: Minor
 Attachments: 0001-YARN-3785.patch


 Currently MockRM#submitApp supports only memory. Adding test cases to support 
 vcores so that DominantResourceCalculator can be tested with this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3787) loading applications by filtering appstartedTime period for ATS Web UI

2015-06-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3787:

Attachment: YARN-3787.2.patch

New patch addresses all the comments.

 loading applications by filtering appstartedTime period for ATS Web UI
 --

 Key: YARN-3787
 URL: https://issues.apache.org/jira/browse/YARN-3787
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3787.1.patch, YARN-3787.2.patch


 After YARN-3700, we have defined a parameter called apps.num for loading a 
 given number of applications in the ATS web page.
 We could also define several additional parameters for a similar purpose. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation

2015-06-08 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577891#comment-14577891
 ] 

Li Lu commented on YARN-2928:
-

Oh, one more thing, [~jamestaylor]: are there any plans to make the PDataTypes 
APIs public and/or stable, or at least limited-public to YARN? I believe that 
would be very helpful for us. Thanks! 

 YARN Timeline Service: Next generation
 --

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
 v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
 TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1457#comment-1457
 ] 

Karthik Kambatla commented on YARN-2716:


[~jianhe] - I believe the patch is good to go. Do you have any further 
comments? 

 Refactor ZKRMStateStore retry code with Apache Curator
 --

 Key: YARN-2716
 URL: https://issues.apache.org/jira/browse/YARN-2716
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Karthik Kambatla
 Attachments: yarn-2716-1.patch, yarn-2716-2.patch, yarn-2716-3.patch, 
 yarn-2716-prelim.patch, yarn-2716-prelim.patch, yarn-2716-super-prelim.patch


 Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
 simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-08 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577861#comment-14577861
 ] 

Tsuyoshi Ozawa commented on YARN-3017:
--

[~zxu] [~rohithsharma] An ApplicationMaster can use ContainerId.fromString() via 
ConverterUtils.toContainerId() and ContainerId.toString(); e.g., MRAppMaster 
parses a string, containerIdStr, passed in an environment variable to learn its 
container id. Fortunately, I think this change is a compatible one, since both 
the old format and the new format can be parsed with ContainerId.fromString().

[~mohdshahidkhan], could you update the following tests to check that parsing the 
old format works fine?

TestConverterUtils.testContainerId():
{code}
// Check to parse old format correctly.
assertEquals(
    ConverterUtils.toContainerId("container_0__00_00"), id);
{code}

TestConverterUtils.testContainerIdWithEpoch():
{code}
// Check to parse old format correctly.
String id4 = "container_0__00_25645811";
ContainerId gen4 = ConverterUtils.toContainerId(cid);
assertEquals(gen4.toString(), id4.toString());
{code}

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02.
 Whereas the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then while using
 filtering tools to, say, grep events surrounding a specific attempt by the
 numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3778) Fix Yarn resourcemanger CLI usage

2015-06-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577992#comment-14577992
 ] 

Xuan Gong commented on YARN-3778:
-

+1 LGTM.

 Fix Yarn resourcemanger CLI usage
 -

 Key: YARN-3778
 URL: https://issues.apache.org/jira/browse/YARN-3778
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Attachments: YARN-3778.patch


 The usage message from the code does not match the one documented. 
 1. "java ResourceManager" should be "yarn resourcemanager".
 {code}
  private static void printUsage(PrintStream out) {
    out.println("Usage: java ResourceManager [-format-state-store]");
    out.println("                            "
        + "[-remove-application-from-state-store <appId>]" + "\n");
  }
 {code}
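
The suggested fix is essentially a wording change in that string, roughly:

{code}
// Sketch of the suggested wording change (exact indentation may differ from the patch):
out.println("Usage: yarn resourcemanager [-format-state-store]");
{code}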



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3778) Fix Yarn resourcemanger CLI usage

2015-06-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577997#comment-14577997
 ] 

Xuan Gong commented on YARN-3778:
-

Committed into trunk/branch-2. Thanks, Brahma Reddy Battula.

 Fix Yarn resourcemanger CLI usage
 -

 Key: YARN-3778
 URL: https://issues.apache.org/jira/browse/YARN-3778
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Fix For: 2.8.0

 Attachments: YARN-3778.patch


 The usage message from the code does not match the one documented. 
 1. "java ResourceManager" should be "yarn resourcemanager".
 {code}
  private static void printUsage(PrintStream out) {
    out.println("Usage: java ResourceManager [-format-state-store]");
    out.println("                            "
        + "[-remove-application-from-state-store <appId>]" + "\n");
  }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577899#comment-14577899
 ] 

Jian He commented on YARN-2716:
---

Looks good, committing.

 Refactor ZKRMStateStore retry code with Apache Curator
 --

 Key: YARN-2716
 URL: https://issues.apache.org/jira/browse/YARN-2716
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Karthik Kambatla
 Attachments: yarn-2716-1.patch, yarn-2716-2.patch, yarn-2716-3.patch, 
 yarn-2716-prelim.patch, yarn-2716-prelim.patch, yarn-2716-super-prelim.patch


 Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
 simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-08 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577864#comment-14577864
 ] 

Tsuyoshi Ozawa commented on YARN-3017:
--

Oops, TestConverterUtils.testContainerIdWithEpoch() should be fixed as follows:
{code}
ContainerId id = TestContainerId.newContainerId(0, 0, 0, 25645811);
...
// Check to parse old format correctly.
String cid4 = "container_0__00_25645811";
ContainerId gen4 = ConverterUtils.toContainerId(cid4);
assertEquals(id.toString(), gen4.toString());
{code}

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02.
 Whereas the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then while using
 filtering tools to, say, grep events surrounding a specific attempt by the
 numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3787) loading applications by filtering appstartedTime period for ATS Web UI

2015-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577991#comment-14577991
 ] 

Hadoop QA commented on YARN-3787:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 12s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 41s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 51s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 46s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 37s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m 25s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 31s | Tests passed in 
hadoop-yarn-server-common. |
| | |  49m 37s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738432/YARN-3787.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0e80d51 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8222/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8222/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8222/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8222/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8222/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8222/console |


This message was automatically generated.

 loading applications by filtering appstartedTime period for ATS Web UI
 --

 Key: YARN-3787
 URL: https://issues.apache.org/jira/browse/YARN-3787
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3787.1.patch, YARN-3787.2.patch


 After YARN-3700, we have defined a parameter called apps.num for loading a 
 given number of applications in the ATS web page.
 We could also define several additional parameters for a similar purpose. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation

2015-06-08 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577885#comment-14577885
 ] 

Li Lu commented on YARN-2928:
-

Hi [~jamestaylor], thank you very much for your great help! Some clarifications 
on my questions...

bq. For your configuration/metric key-value pair, how are they named? Do you 
know the possible set of key values in advance? Or are they known more-or-less 
on-the-fly? 

For our use case they're completely on-the-fly. For each timeline entity, we 
plan to store each of its configurations/metrics in one dynamic column. It is 
possible that different entities may have completely different configs/metrics. 
For example, a mapreduce job may have a completely different set of configs from 
a tez job. Therefore, we need to generate all columns for configs/metrics 
dynamically. I'm wondering, when adding the dynamic columns into a view, 
do I still need to explicitly declare those dynamic columns (I assume yes, but 
I would like to double check)? 

bq. Are you thinking to have a secondary table that's a rollup aggregation of 
more raw data? Is that required, or is it more of a convenience for the user? 
If the raw data is Phoenix-queryable, then I think you have a lot of options. 
Can you point me to some more info on your design?

Yes, we are considering having multiple levels of aggregation tables, each 
with a different granularity. For example, right now we're planning to do the 
first-level (application-level) aggregation from an HBase table to a Phoenix 
table. Then we can aggregate flow-level information based on our application-level 
aggregation (since each application belongs to one and only one flow). 
In this way, we can temporarily get around the write-throughput limitation of 
Phoenix, but still support SQL queries on aggregated data. If the Phoenix 
PDataTypes are stable, then is it possible for us to do the following two 
things (see the sketch below)? 
# Use the HBase API and PDataTypes to read a Phoenix table, and read dynamic 
columns iteratively. 
# Use the HBase API and PDataTypes to write a Phoenix table, and write dynamic 
columns iteratively. 
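
To make the intent concrete, a rough sketch of the encode/decode piece only; the 
package names assume Phoenix 4.x, and the column name and value are made up:

{code}
// Hedged sketch: serialize a dynamic column qualifier and a metric value with
// Phoenix PDataTypes so a plain HBase writer/reader stays byte-compatible with a
// Phoenix view. Package names assume Phoenix 4.x.
import org.apache.phoenix.schema.types.PLong;
import org.apache.phoenix.schema.types.PVarchar;

public class PhoenixCodecSketch {
  public static void main(String[] args) {
    byte[] qualifier = PVarchar.INSTANCE.toBytes("mapTasksCompleted"); // made-up metric name
    byte[] value = PLong.INSTANCE.toBytes(42L);                        // made-up metric value
    // ... write (qualifier, value) into the column family with the HBase API ...
    Long decoded = (Long) PLong.INSTANCE.toObject(value);              // read path
    System.out.println(decoded);
  }
}
{code}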

 YARN Timeline Service: Next generation
 --

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
 v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
 TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2015-06-08 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577998#comment-14577998
 ] 

Sidharta Seethana commented on YARN-2194:
-

[~ywskycn] , thanks! Looking forward to your patch.

 Cgroups cease to work in RHEL7
 --

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Critical
 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch


 In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
 controller name leads to container launch failure. 
 RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
 systemd has certain shortcomings as identified in this JIRA (see comments). 
 This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577947#comment-14577947
 ] 

Hudson commented on YARN-2716:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7990 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7990/])
YARN-2716. Refactor ZKRMStateStore retry code with Apache Curator. Contributed 
by Karthik Kambatla (jianhe: rev 960b8f19ca98dbcfdd30f2f1f275b8718d2e872f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java


 Refactor ZKRMStateStore retry code with Apache Curator
 --

 Key: YARN-2716
 URL: https://issues.apache.org/jira/browse/YARN-2716
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Karthik Kambatla
 Fix For: 2.8.0

 Attachments: yarn-2716-1.patch, yarn-2716-2.patch, yarn-2716-3.patch, 
 yarn-2716-prelim.patch, yarn-2716-prelim.patch, yarn-2716-super-prelim.patch


 Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
 simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3778) Fix Yarn resourcemanger CLI usage

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578006#comment-14578006
 ] 

Hudson commented on YARN-3778:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7991 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7991/])
YARN-3778. Fix Yarn resourcemanger CLI usage. Contributed by Brahma Reddy 
Battula (xgong: rev 2b2465dfac1f147b6bb20d878b69a8cc3e85c8ad)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java


 Fix Yarn resourcemanger CLI usage
 -

 Key: YARN-3778
 URL: https://issues.apache.org/jira/browse/YARN-3778
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Fix For: 2.8.0

 Attachments: YARN-3778.patch


 The usage message from the code does not match the one documented. 
 1. "java ResourceManager" should be "yarn resourcemanager".
 {code}
  private static void printUsage(PrintStream out) {
    out.println("Usage: java ResourceManager [-format-state-store]");
    out.println("                            "
        + "[-remove-application-from-state-store <appId>]" + "\n");
  }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2015-06-08 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577490#comment-14577490
 ] 

Sidharta Seethana commented on YARN-2194:
-

Hi [~ywskycn], 

Would you be able to submit a patch with the requested changes? Maybe we should 
consider pulling this into 2.7.1 ? 

/cc [~vinodkv]



 Cgroups cease to work in RHEL7
 --

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Critical
 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch


 In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the 
 controller name leads to container launch failure. 
 RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
 systemd has certain shortcomings as identified in this JIRA (see comments). 
 This JIRA only fixes the failure, and doesn't try to use systemd.
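 For context, a sketch of the kind of mount-table handling involved, assuming the NodeManager discovers controller mount points from /proc/mounts. On RHEL7 the cpu and cpuacct controllers share one mount whose option list contains "cpu,cpuacct", so the options must be split on commas instead of being treated as a single controller name. This is an illustration only, not the code in the attached patches.
 {code}
 import java.io.BufferedReader;
 import java.io.FileReader;
 import java.io.IOException;
 import java.util.HashMap;
 import java.util.Map;

 public class CgroupMountSketch {
   // Maps each cgroup controller (e.g. "cpu", "cpuacct") to its mount point.
   public static Map<String, String> parseCgroupMounts() throws IOException {
     Map<String, String> controllerToPath = new HashMap<String, String>();
     try (BufferedReader reader =
         new BufferedReader(new FileReader("/proc/mounts"))) {
       String line;
       while ((line = reader.readLine()) != null) {
         // Format: <device> <mount-point> <fs-type> <options> <dump> <pass>
         String[] fields = line.split("\\s+");
         if (fields.length < 4 || !"cgroup".equals(fields[2])) {
           continue;
         }
         // RHEL7: options may contain "cpu,cpuacct"; register each controller
         // separately instead of keying on the raw, comma-containing string.
         // (Real code would also filter out non-controller options like "rw".)
         for (String controller : fields[3].split(",")) {
           controllerToPath.put(controller, fields[1]);
         }
       }
     }
     return controllerToPath;
   }
 }
 {code}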



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2015-06-08 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577496#comment-14577496
 ] 

Wei Yan commented on YARN-2194:
---

[~sidharta-s], yes, working on it.

 Cgroups cease to work in RHEL7
 --

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Critical
 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch


 In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the 
 controller name leads to container launch failure. 
 RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
 systemd has certain shortcomings as identified in this JIRA (see comments). 
 This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3780) Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition

2015-06-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577497#comment-14577497
 ] 

zhihai xu commented on YARN-3780:
-

thanks [~rohithsharma] for the review and thanks [~devaraj.k] for the review 
and committing the patch.

 Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition
 -

 Key: YARN-3780
 URL: https://issues.apache.org/jira/browse/YARN-3780
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3780.000.patch


 Should use equals when comparing Resource in RMNodeImpl#ReconnectNodeTransition 
 to avoid an unnecessary NodeResourceUpdateSchedulerEvent.
 The current code uses {{!=}} to compare the Resource totalCapability, which 
 compares references rather than the real values in the Resource, so we should 
 use equals to compare Resources.
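 A small illustration of the point, using two value-equal Resource instances built with Resource.newInstance (a standalone snippet, not RM code):
 {code}
 import org.apache.hadoop.yarn.api.records.Resource;

 public class ResourceCompareSketch {
   public static void main(String[] args) {
     Resource oldCapability = Resource.newInstance(2048, 1);
     Resource newCapability = Resource.newInstance(2048, 1);

     // Reference comparison: true even though the values are identical, so an
     // unnecessary NodeResourceUpdateSchedulerEvent would be triggered.
     System.out.println(oldCapability != newCapability);        // true

     // Value comparison: false, so no spurious update event is needed.
     System.out.println(!oldCapability.equals(newCapability));  // false
   }
 }
 {code}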



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3786) Document yarn class path options

2015-06-08 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577461#comment-14577461
 ] 

Brahma Reddy Battula commented on YARN-3786:


[~cnauroth] Attached the patch..Kindly Review..

 Document yarn class path options
 

 Key: YARN-3786
 URL: https://issues.apache.org/jira/browse/YARN-3786
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Attachments: YARN-3786.patch


 --glob and --jar options are not documented.
 {code}
 $ yarn classpath --help
 classpath [--glob|--jar path|-h|--help] :
   Prints the classpath needed to get the Hadoop jar and the required
   libraries.
   Options:
   --glob   expand wildcards
   --jar path write classpath as manifest in jar named path
   -h, --help   print help
 {code}
 current document:
 {code}
 User Commands
 Commands useful for users of a hadoop cluster.
 classpath
 Usage: yarn classpath
 Prints the class path needed to get the Hadoop jar and the required libraries
 {code}
 http://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#classpath



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3786) Document yarn class path options

2015-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577470#comment-14577470
 ] 

Hadoop QA commented on YARN-3786:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   3m  5s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | release audit |   0m 20s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   2m 55s | Site still builds. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| | |   6m 24s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738386/YARN-3786.patch |
| Optional Tests | site |
| git revision | trunk / 0e80d51 |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8219/console |


This message was automatically generated.

 Document yarn class path options
 

 Key: YARN-3786
 URL: https://issues.apache.org/jira/browse/YARN-3786
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Attachments: YARN-3786.patch


 --glob and --jar options are not documented.
 {code}
 $ yarn classpath --help
 classpath [--glob|--jar path|-h|--help] :
   Prints the classpath needed to get the Hadoop jar and the required
   libraries.
   Options:
   --glob   expand wildcards
   --jar path write classpath as manifest in jar named path
   -h, --help   print help
 {code}
 current document:
 {code}
 User Commands
 Commands useful for users of a hadoop cluster.
 classpath
 Usage: yarn classpath
 Prints the class path needed to get the Hadoop jar and the required libraries
 {code}
 http://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#classpath



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2038) Revisit how AMs learn of containers from previous attempts

2015-06-08 Thread Henry Saputra (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577498#comment-14577498
 ] 

Henry Saputra commented on YARN-2038:
-

What are the desired changes needed for the issue to be resolved?

 Revisit how AMs learn of containers from previous attempts
 --

 Key: YARN-2038
 URL: https://issues.apache.org/jira/browse/YARN-2038
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla

 Based on YARN-556, we need to update the way AMs learn about containers 
 allocation previous attempts. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3786) Document yarn class path options

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578043#comment-14578043
 ] 

Hudson commented on YARN-3786:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7992 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7992/])
YARN-3786. Document yarn class path options. Contributed by Brahma Reddy 
Battula. (cnauroth: rev a531b058aef48c9bf2e5366ed110e1f817316c1a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md
* hadoop-yarn-project/CHANGES.txt


 Document yarn class path options
 

 Key: YARN-3786
 URL: https://issues.apache.org/jira/browse/YARN-3786
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Fix For: 2.8.0

 Attachments: YARN-3786.patch


 --glob and --jar options are not documented.
 {code}
 $ yarn classpath --help
 classpath [--glob|--jar path|-h|--help] :
   Prints the classpath needed to get the Hadoop jar and the required
   libraries.
   Options:
   --glob   expand wildcards
   --jar path write classpath as manifest in jar named path
   -h, --help   print help
 {code}
 current document:
 {code}
 User Commands
 Commands useful for users of a hadoop cluster.
 classpath
 Usage: yarn classpath
 Prints the class path needed to get the Hadoop jar and the required libraries
 {code}
 http://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#classpath



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete

2015-06-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1279:

Fix Version/s: 2.8.0

 Expose a client API to allow clients to figure if log aggregation is complete
 -

 Key: YARN-1279
 URL: https://issues.apache.org/jira/browse/YARN-1279
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Arun C Murthy
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1279.1.patch, YARN-1279.11.patch, 
 YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, 
 YARN-1279.4.patch, YARN-1279.4.patch, YARN-1279.5.patch, YARN-1279.6.patch, 
 YARN-1279.7.patch, YARN-1279.8.patch, YARN-1279.8.patch, YARN-1279.9.patch


 Expose a client API to allow clients to figure if log aggregation is complete



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete

2015-06-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong resolved YARN-1279.
-
Resolution: Fixed

 Expose a client API to allow clients to figure if log aggregation is complete
 -

 Key: YARN-1279
 URL: https://issues.apache.org/jira/browse/YARN-1279
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Arun C Murthy
Assignee: Xuan Gong
 Attachments: YARN-1279.1.patch, YARN-1279.11.patch, 
 YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, 
 YARN-1279.4.patch, YARN-1279.4.patch, YARN-1279.5.patch, YARN-1279.6.patch, 
 YARN-1279.7.patch, YARN-1279.8.patch, YARN-1279.8.patch, YARN-1279.9.patch


 Expose a client API to allow clients to figure if log aggregation is complete



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete

2015-06-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578063#comment-14578063
 ] 

Xuan Gong commented on YARN-1279:
-

Close this ticket since YARN-1376 and YARN-1402 have been fixed.

 Expose a client API to allow clients to figure if log aggregation is complete
 -

 Key: YARN-1279
 URL: https://issues.apache.org/jira/browse/YARN-1279
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Arun C Murthy
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1279.1.patch, YARN-1279.11.patch, 
 YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, 
 YARN-1279.4.patch, YARN-1279.4.patch, YARN-1279.5.patch, YARN-1279.6.patch, 
 YARN-1279.7.patch, YARN-1279.8.patch, YARN-1279.8.patch, YARN-1279.9.patch


 Expose a client API to allow clients to figure if log aggregation is complete



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3786) Document yarn class path options

2015-06-08 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3786:

Component/s: documentation
 Issue Type: Improvement  (was: Bug)

 Document yarn class path options
 

 Key: YARN-3786
 URL: https://issues.apache.org/jira/browse/YARN-3786
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Attachments: YARN-3786.patch


 --glob and --jar options are not documented.
 {code}
 $ yarn classpath --help
 classpath [--glob|--jar path|-h|--help] :
   Prints the classpath needed to get the Hadoop jar and the required
   libraries.
   Options:
   --glob   expand wildcards
   --jar path write classpath as manifest in jar named path
   -h, --help   print help
 {code}
 current document:
 {code}
 User Commands
 Commands useful for users of a hadoop cluster.
 classpath
 Usage: yarn classpath
 Prints the class path needed to get the Hadoop jar and the required libraries
 {code}
 http://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#classpath



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-76) killApplication doesn't fully kill application master on Mac OS

2015-06-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong resolved YARN-76.
---
Resolution: Duplicate

 killApplication doesn't fully kill application master on Mac OS
 ---

 Key: YARN-76
 URL: https://issues.apache.org/jira/browse/YARN-76
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: Failed on MacOS. OK on Linux
Reporter: Bo Wang
Assignee: Xuan Gong

 When a client sends a ClientRMProtocol#killApplication to the RM, the 
 corresponding AM is supposed to be killed. However, on Mac OS the AM is still 
 alive (without any interruption).
 I figured out part of the reason after some debugging. The NM starts an AM with 
 a command like /bin/bash -c /path/to/java SampleAM. This command is executed 
 in a process (say with PID 0001), which starts another Java process (say with 
 PID 0002). When the NM kills the AM, it sends SIGTERM and then SIGKILL to the 
 bash process (PID 0001). On Linux, the death of the bash process (PID 0001) 
 triggers the kill of the Java process (PID 0002). However, on Mac OS only the 
 bash process is killed; the Java process is left running on its own from then 
 on.
 Note: on Mac OS, DefaultContainerExecutor is used rather than 
 LinuxContainerExecutor.
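 A small, standalone illustration of the process-tree behaviour described above (not NM code): Java's Process#destroy signals only the shell it started directly, so whether the grandchild JVM also dies depends on how the platform and the shell propagate the signal.
 {code}
 public class WrapperKillSketch {
   public static void main(String[] args) throws Exception {
     // A bash wrapper, similar in shape to a container launch: bash stays
     // alive as the parent and forks a long-running child.
     Process wrapper = new ProcessBuilder(
         "/bin/bash", "-c", "sleep 600; echo done").inheritIO().start();

     Thread.sleep(2000);

     // destroy() signals only the bash process we spawned. Whether the
     // "sleep" child is also terminated is platform-dependent behaviour,
     // which is the difference reported above between Linux and Mac OS.
     wrapper.destroy();
     System.out.println("wrapper exited with " + wrapper.waitFor());
   }
 }
 {code}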



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1263) Clean up unused imports in TestFairScheduler after YARN-899

2015-06-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578082#comment-14578082
 ] 

Xuan Gong commented on YARN-1263:
-

No, this is invalid now. Closing this.

 Clean up unused imports in TestFairScheduler after YARN-899
 ---

 Key: YARN-1263
 URL: https://issues.apache.org/jira/browse/YARN-1263
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Xuan Gong

 YARN-899 added a bunch of unused imports to TestFairScheduler.  It might be 
 useful to check to see whether it added these in other files as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1263) Clean up unused imports in TestFairScheduler after YARN-899

2015-06-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong resolved YARN-1263.
-
Resolution: Invalid

 Clean up unused imports in TestFairScheduler after YARN-899
 ---

 Key: YARN-1263
 URL: https://issues.apache.org/jira/browse/YARN-1263
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Xuan Gong

 YARN-899 added a bunch of unused imports to TestFairScheduler.  It might be 
 useful to check to see whether it added these in other files as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1377) Log aggregation via node manager should expose a way to cancel log aggregation at application or container level

2015-06-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong resolved YARN-1377.
-
Resolution: Duplicate

 Log aggregation via node manager should expose a way to cancel log 
 aggregation at application or container level
 ---

 Key: YARN-1377
 URL: https://issues.apache.org/jira/browse/YARN-1377
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Xuan Gong

 Today, when an application finishes, it starts aggregating all the logs, but 
 that may slow down the whole process significantly...
 There can be situations where certain containers overwrote the logs, say into 
 multiple GBs... In these scenarios we need a way to cancel log aggregation 
 for certain containers. This could be done at the per-application level or at 
 the per-container level.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1377) Log aggregation via node manager should expose a way to cancel log aggregation at application or container level

2015-06-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578086#comment-14578086
 ] 

Xuan Gong commented on YARN-1377:
-

I think that YARN-221 can be used to fix this. Closing this as a duplicate.

 Log aggregation via node manager should expose a way to cancel log 
 aggregation at application or container level
 ---

 Key: YARN-1377
 URL: https://issues.apache.org/jira/browse/YARN-1377
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Xuan Gong

 Today, when an application finishes, it starts aggregating all the logs, but 
 that may slow down the whole process significantly...
 There can be situations where certain containers overwrote the logs, say into 
 multiple GBs... In these scenarios we need a way to cancel log aggregation 
 for certain containers. This could be done at the per-application level or at 
 the per-container level.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-76) killApplication doesn't fully kill application master on Mac OS

2015-06-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578065#comment-14578065
 ] 

Xuan Gong commented on YARN-76:
---

I think this is a duplicate of YARN-3561. We already have a discussion there. 
Closing this as a duplicate.

 killApplication doesn't fully kill application master on Mac OS
 ---

 Key: YARN-76
 URL: https://issues.apache.org/jira/browse/YARN-76
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: Failed on MacOS. OK on Linux
Reporter: Bo Wang
Assignee: Xuan Gong

 When a client sends a ClientRMProtocol#killApplication to the RM, the 
 corresponding AM is supposed to be killed. However, on Mac OS the AM is still 
 alive (without any interruption).
 I figured out part of the reason after some debugging. The NM starts an AM with 
 a command like /bin/bash -c /path/to/java SampleAM. This command is executed 
 in a process (say with PID 0001), which starts another Java process (say with 
 PID 0002). When the NM kills the AM, it sends SIGTERM and then SIGKILL to the 
 bash process (PID 0001). On Linux, the death of the bash process (PID 0001) 
 triggers the kill of the Java process (PID 0002). However, on Mac OS only the 
 bash process is killed; the Java process is left running on its own from then 
 on.
 Note: on Mac OS, DefaultContainerExecutor is used rather than 
 LinuxContainerExecutor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3786) Document yarn class path options

2015-06-08 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578245#comment-14578245
 ] 

Brahma Reddy Battula commented on YARN-3786:


[~cnauroth], thanks for the review and commit.

 Document yarn class path options
 

 Key: YARN-3786
 URL: https://issues.apache.org/jira/browse/YARN-3786
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Fix For: 2.8.0

 Attachments: YARN-3786.patch


 --glob and --jar options are not documented.
 {code}
 $ yarn classpath --help
 classpath [--glob|--jar path|-h|--help] :
   Prints the classpath needed to get the Hadoop jar and the required
   libraries.
   Options:
   --glob   expand wildcards
   --jar path write classpath as manifest in jar named path
   -h, --help   print help
 {code}
 current document:
 {code}
 User Commands
 Commands useful for users of a hadoop cluster.
 classpath
 Usage: yarn classpath
 Prints the class path needed to get the Hadoop jar and the required libraries
 {code}
 http://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#classpath



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3778) Fix Yarn resourcemanger CLI usage

2015-06-08 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578243#comment-14578243
 ] 

Brahma Reddy Battula commented on YARN-3778:


[~xgong], thanks for the review and commit.

 Fix Yarn resourcemanger CLI usage
 -

 Key: YARN-3778
 URL: https://issues.apache.org/jira/browse/YARN-3778
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Fix For: 2.8.0

 Attachments: YARN-3778.patch


 The usage message from the code does not match the one documented. 
 1. "java ResourceManager" should be "yarn resourcemanager".
 {code}
 private static void printUsage(PrintStream out) {
   out.println("Usage: java ResourceManager [-format-state-store]");
   out.println("                            "
       + "[-remove-application-from-state-store <appId>]" + "\n");
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3706) Generalize native HBase writer for additional tables

2015-06-08 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated YARN-3706:
---
Attachment: YARN-3706-YARN-2928.010.patch

Thanks for your feedback, [~sjlee0].
The latest patch (YARN-3706-YARN-2928.010.patch) addresses all comments.

Note that I had to upgrade Eclipse to Luna to fix the formatter adding trailing 
spaces to empty lines in comment blocks, which the "remove trailing spaces" 
action refused to remove. See https://bugs.eclipse.org/bugs/show_bug.cgi?id=49619

 Generalize native HBase writer for additional tables
 

 Key: YARN-3706
 URL: https://issues.apache.org/jira/browse/YARN-3706
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Minor
 Attachments: YARN-3706-YARN-2928.001.patch, 
 YARN-3706-YARN-2928.010.patch, YARN-3726-YARN-2928.002.patch, 
 YARN-3726-YARN-2928.003.patch, YARN-3726-YARN-2928.004.patch, 
 YARN-3726-YARN-2928.005.patch, YARN-3726-YARN-2928.006.patch, 
 YARN-3726-YARN-2928.007.patch, YARN-3726-YARN-2928.008.patch, 
 YARN-3726-YARN-2928.009.patch


 When reviewing YARN-3411 we noticed that we could change the class hierarchy 
 a little in order to accommodate additional tables easily.
 In order to get ready for benchmark testing we left the original layout in 
 place, as performance would not be impacted by the code hierarchy.
 Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578296#comment-14578296
 ] 

Rohith commented on YARN-3017:
--

Thanks [~ozawa] for confirmation:-)

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02.
 Whereas the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then while 
 using filtering tools to, say, grep events surrounding a specific attempt by 
 the numeric ID part, information may slip out during troubleshooting.
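 For reference, the two formats can be printed side by side with the public YARN records API (a standalone snippet, assuming the 2.6+ ContainerId.newContainerId factory); the differing zero-padding of the attempt number is the discrepancy being discussed:
 {code}
 import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
 import org.apache.hadoop.yarn.api.records.ApplicationId;
 import org.apache.hadoop.yarn.api.records.ContainerId;

 public class IdFormatSketch {
   public static void main(String[] args) {
     ApplicationId appId = ApplicationId.newInstance(1412150883650L, 1);
     ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 2);
     ContainerId containerId = ContainerId.newContainerId(attemptId, 1);

     // e.g. appattempt_1412150883650_0001_000002   (attempt padded to 6 digits)
     System.out.println(attemptId);
     // e.g. container_1412150883650_0001_02_000001 (attempt padded to 2 digits)
     System.out.println(containerId);
   }
 }
 {code}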



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3780) Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition

2015-06-08 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3780:

Hadoop Flags: Reviewed

+1, Looks good to me, will commit it shortly.

 Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition
 -

 Key: YARN-3780
 URL: https://issues.apache.org/jira/browse/YARN-3780
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
 Attachments: YARN-3780.000.patch


 Should use equals when comparing Resource in RMNodeImpl#ReconnectNodeTransition 
 to avoid an unnecessary NodeResourceUpdateSchedulerEvent.
 The current code uses {{!=}} to compare the Resource totalCapability, which 
 compares references rather than the real values in the Resource, so we should 
 use equals to compare Resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576652#comment-14576652
 ] 

Rohith commented on YARN-3017:
--

I see.. Thanks for the detailed explanation..

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02.
 Whereas the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then while 
 using filtering tools to, say, grep events surrounding a specific attempt by 
 the numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-06-08 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576702#comment-14576702
 ] 

Sidharta Seethana commented on YARN-3366:
-

* Classful, hierarchical traffic shaping with tc (and cgroups) works only for 
outbound network traffic - at this point there is no good way to do this on a 
per-container basis for incoming traffic. When packets have arrived and are 
'known' (somehow) to be destined for a specific container, the network cost has 
already been incurred. At best, we could penalize the (remote) application that 
is sending the packets.
* (Outbound) network bandwidth utilization is collected on a per-container 
basis (in terms of 'total bytes sent', which has to be periodically queried), 
but it has not been hooked up to any metrics collection mechanism. See the 
'getBytesSentPerContainer()' function here: 
https://github.com/apache/hadoop/blob/branch-2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficControlBandwidthHandlerImpl.java#L213
Again, this works only for outbound bandwidth and is restricted to containers 
only (shuffle, HDFS, YARN NM traffic etc. isn't accounted for by this hook). 
Technically speaking, stat parsing based on tc output does gather outbound 
bandwidth utilization for all of the traffic shaping classes associated with a 
network interface, not just the containers, but through the resource handler we 
only expose utilization data for containers.

While we haven't done specific microbenchmarks, I do remember running into some 
data in a presentation by a Red Hat engineer - let me see if I can find it 
again.
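To make the "periodically queried" point concrete, a rough sketch of a poller; the source interface and its Map return type below are assumptions for illustration, not the actual TrafficControlBandwidthHandlerImpl signature:
{code}
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OutboundBytesPollerSketch {
  /** Assumed shape of the per-container counter exposed by the handler. */
  public interface BytesSentSource {
    Map<String, Long> getBytesSentPerContainer();
  }

  public static ScheduledExecutorService startPolling(final BytesSentSource source) {
    ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();
    poller.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        // Counters are cumulative "total bytes sent", so a metrics sink would
        // typically diff successive samples to derive a rate.
        for (Map.Entry<String, Long> e : source.getBytesSentPerContainer().entrySet()) {
          System.out.println(e.getKey() + " sent " + e.getValue() + " bytes");
        }
      }
    }, 0, 30, TimeUnit.SECONDS);
    return poller;
  }
}
{code}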



 Outbound network bandwidth : classify/shape traffic originating from YARN 
 containers
 

 Key: YARN-3366
 URL: https://issues.apache.org/jira/browse/YARN-3366
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Fix For: 2.8.0

 Attachments: YARN-3366.001.patch, YARN-3366.002.patch, 
 YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, 
 YARN-3366.006.patch, YARN-3366.007.patch


 In order to be able to isolate based on/enforce outbound traffic bandwidth 
 limits, we need  a mechanism to classify/shape network traffic in the 
 nodemanager. For more information on the design, please see the attached 
 design document in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1042) add ability to specify affinity/anti-affinity in container requests

2015-06-08 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576703#comment-14576703
 ] 

Weiwei Yang commented on YARN-1042:
---

Hi Steve

Thanks for the comments. I have corrected the wording. And regarding
{quote}
That way the AM can choose to wait 1 minute or more for an anti-affine 
placement before giving up and accepting a node already in use.
{quote}
this is exactly the reason I proposed the PREFERRED rules: you can set a 
maximum wait time before compromising the rule. For example, if you use the 
ANTI_AFFINITY rule and set the maximum wait time to 1 minute, then the RM will 
wait for at least 1 minute before assigning a container to a node which already 
has a container running on it. Or ... forgetting about REQUIRED and PREFERRED, 
we could directly define these preferences in the ContainerAllocateRule class, 
with an attribute like *maxTimeAwaitBeforeCompromise*; the default is 0, which 
means never compromise the rule (REQUIRED). 
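A rough sketch of the shape of that proposal; ContainerAllocateRule and maxTimeAwaitBeforeCompromise mirror the names used in this discussion and are not an existing YARN API:
{code}
public class ContainerAllocateRuleSketch {
  /** Shape of the rule proposed above; not an existing YARN API. */
  public static class ContainerAllocateRule {
    public enum Type { AFFINITY, ANTI_AFFINITY }

    private final Type type;
    // 0 means never compromise (REQUIRED); a positive value means PREFERRED:
    // wait up to this long before relaxing the rule.
    private final long maxTimeAwaitBeforeCompromiseMs;

    public ContainerAllocateRule(Type type, long maxTimeAwaitBeforeCompromiseMs) {
      this.type = type;
      this.maxTimeAwaitBeforeCompromiseMs = maxTimeAwaitBeforeCompromiseMs;
    }

    public Type getType() {
      return type;
    }

    /** True once the scheduler may relax the rule for a pending request. */
    public boolean mayCompromise(long waitedMs) {
      return maxTimeAwaitBeforeCompromiseMs > 0
          && waitedMs >= maxTimeAwaitBeforeCompromiseMs;
    }
  }

  public static void main(String[] args) {
    // Anti-affinity, willing to compromise after one minute of waiting.
    ContainerAllocateRule rule =
        new ContainerAllocateRule(ContainerAllocateRule.Type.ANTI_AFFINITY, 60000L);
    System.out.println(rule.getType() + " compromise after 30s? "
        + rule.mayCompromise(30000L));   // false
    System.out.println(rule.getType() + " compromise after 61s? "
        + rule.mayCompromise(61000L));   // true
  }
}
{code}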



 add ability to specify affinity/anti-affinity in container requests
 ---

 Key: YARN-1042
 URL: https://issues.apache.org/jira/browse/YARN-1042
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Arun C Murthy
 Attachments: YARN-1042-demo.patch


 container requests to the AM should be able to request anti-affinity to 
 ensure that things like Region Servers don't come up on the same failure 
 zones. 
 Similarly, you may be able to want to specify affinity to same host or rack 
 without specifying which specific host/rack. Example: bringing up a small 
 giraph cluster in a large YARN cluster would benefit from having the 
 processes in the same rack purely for bandwidth reasons.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3381) A typographical error in InvalidStateTransitonException

2015-06-08 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3381:
---
Attachment: YARN-3381-004.patch

 A typographical error in InvalidStateTransitonException
 -

 Key: YARN-3381
 URL: https://issues.apache.org/jira/browse/YARN-3381
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Xiaoshuang LU
Assignee: Brahma Reddy Battula
  Labels: BB2015-05-TBR
 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, 
 YARN-3381-004.patch, YARN-3381.patch


 Appears that InvalidStateTransitonException should be 
 InvalidStateTransitionException.  Transition was misspelled.
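 One common backward-compatible way to fix a typo in a public exception name is to add the correctly spelled class and keep the misspelled one as a deprecated subclass. A minimal sketch only (the base type is simplified here, and the attached patches may take a different approach):
 {code}
 // New, correctly spelled exception.
 public class InvalidStateTransitionException extends RuntimeException {
   public InvalidStateTransitionException(String message) {
     super(message);
   }
 }

 // Old, misspelled name kept for source and binary compatibility.
 @Deprecated
 class InvalidStateTransitonException extends InvalidStateTransitionException {
   public InvalidStateTransitonException(String message) {
     super(message);
   }
 }
 {code}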



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3780) Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576659#comment-14576659
 ] 

Hudson commented on YARN-3780:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7986 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7986/])
YARN-3780. Should use equals when compare Resource in (devaraj: rev 
c7ee6c151c5771043a6de3b8a951cea13f59dd7b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/CHANGES.txt


 Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition
 -

 Key: YARN-3780
 URL: https://issues.apache.org/jira/browse/YARN-3780
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3780.000.patch


 Should use equals when comparing Resource in RMNodeImpl#ReconnectNodeTransition 
 to avoid an unnecessary NodeResourceUpdateSchedulerEvent.
 The current code uses {{!=}} to compare the Resource totalCapability, which 
 compares references rather than the real values in the Resource, so we should 
 use equals to compare Resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException

2015-06-08 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576670#comment-14576670
 ] 

Brahma Reddy Battula commented on YARN-3381:


Thanks a lot [~vinodkv], [~sidharta-s], [~rchiang] and [~ozawa] for taking a 
look at this issue. Attached a patch based on [~vinodkv]'s comment. Kindly review.

 A typographical error in InvalidStateTransitonException
 -

 Key: YARN-3381
 URL: https://issues.apache.org/jira/browse/YARN-3381
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Xiaoshuang LU
Assignee: Brahma Reddy Battula
  Labels: BB2015-05-TBR
 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, 
 YARN-3381-004.patch, YARN-3381.patch


 Appears that InvalidStateTransitonException should be 
 InvalidStateTransitionException.  Transition was misspelled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576671#comment-14576671
 ] 

Rohith commented on YARN-3017:
--

+1 lgtm (non-binding)

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02.
 Whereas the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then while 
 using filtering tools to, say, grep events surrounding a specific attempt by 
 the numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException

2015-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576765#comment-14576765
 ] 

Hadoop QA commented on YARN-3381:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m  3s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:red}-1{color} | javac |   7m 35s | The applied patch generated  11  
additional warning messages. |
| {color:green}+1{color} | javadoc |   9m 41s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  2s | The applied patch generated  1 
new checkstyle issues (total was 466, now 466). |
| {color:red}-1{color} | checkstyle |   2m 43s | The applied patch generated  1 
new checkstyle issues (total was 48, now 49). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   6m 11s | The patch appears to introduce 2 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   9m 27s | Tests passed in 
hadoop-mapreduce-client-app. |
| {color:green}+1{color} | yarn tests |   6m 59s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   6m 12s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  50m 23s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 124m 28s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-common |
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738304/YARN-3381-004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c7ee6c1 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8214/artifact/patchprocess/trunkFindbugsWarningshadoop-mapreduce-client-app.html
 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/8214/artifact/patchprocess/diffJavacWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8214/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-app.txt
 
https://builds.apache.org/job/PreCommit-YARN-Build/8214/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8214/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
 |
| hadoop-mapreduce-client-app test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8214/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8214/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8214/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8214/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8214/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8214/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8214/console |


This message was automatically generated.

 A typographical error in InvalidStateTransitonException
 -

 Key: YARN-3381
 URL: https://issues.apache.org/jira/browse/YARN-3381
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Xiaoshuang LU
Assignee: Brahma Reddy Battula
   

[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-06-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576774#comment-14576774
 ] 

Rohith commented on YARN-3535:
--

Recently in testing we faced the same issue. [~peng.zhang], would you mind 
updating the patch?

  ResourceRequest should be restored back to scheduler when RMContainer is 
 killed at ALLOCATED
 -

 Key: YARN-3535
 URL: https://issues.apache.org/jira/browse/YARN-3535
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Peng Zhang
Priority: Critical
  Labels: BB2015-05-TBR
 Attachments: YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
 yarn-app.log


 During a rolling update of the NM, the AM's start of a container on the NM 
 failed, and then the job hung there.
 AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3381) A typographical error in InvalidStateTransitonException

2015-06-08 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3381:
---
Attachment: YARN-3381-004.patch

 A typographical error in InvalidStateTransitonException
 -

 Key: YARN-3381
 URL: https://issues.apache.org/jira/browse/YARN-3381
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Xiaoshuang LU
Assignee: Brahma Reddy Battula
  Labels: BB2015-05-TBR
 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, 
 YARN-3381-004.patch, YARN-3381.patch


 Appears that InvalidStateTransitonException should be 
 InvalidStateTransitionException.  Transition was misspelled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-06-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576761#comment-14576761
 ] 

zhihai xu commented on YARN-3591:
-

Hi [~lavkesh], thanks for the update.
IMHO, although storing the local error directories in the NM state store will 
be implemented in a separate follow-up JIRA, it would be good to make this 
patch accommodate it. Upon NM start, we can consider the previous error dirs to 
be the error dirs stored in the NM state store.
{{DirectoryCollection#checkDirs}} is already called at 
{{LocalDirsHandlerService#serviceInit}} before 
{{registerLocalDirsChangeListener}} is called at 
{{ResourceLocalizationService#serviceStart}}. {{onDirsChanged}} will be called 
in {{registerLocalDirsChangeListener}} for the first time. You can see we 
already have the previous error dirs when {{onDirsChanged}} is called for the 
first time; we just need the current error dirs to calculate newErrorDirs and 
newRepairedDirs, which is what my proposal #4 implements.
So instead of adding three APIs ({{getDiskNewErrorDirs}}, 
{{getDiskNewRepairedDirs}} and {{getErrorDirs}}) in DirectoryCollection, we can 
just add one API, {{getErrorDirs}}. It will make the interface simpler and the 
code cleaner.
And even with three APIs, when {{onDirsChanged}} is called for the first time, 
you would still need to recalculate newErrorDirs and newRepairedDirs based on 
the error dirs stored in the NM state store.

bq. upon start we can do a cleanUpLocalDir on the errordirs.
We needn't do that, because we can handle it in {{onDirsChanged}}.

As [~sunilg] suggested, changing the checkLocalizedResources implementation to 
call removeResource on those localized resources whose parent is present in 
newErrorDirs would be better, because it gives better performance.

By the way, {{checkAndInitializeLocalDirs}} should be called after 
{{cleanUpLocalDir}}, because once the directory is cleaned up, it needs to be 
reinitialized.

 Resource Localisation on a bad disk causes subsequent containers failure 
 -

 Key: YARN-3591
 URL: https://issues.apache.org/jira/browse/YARN-3591
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
 YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch


 It happens when a resource is localised on a disk and, after localising, that 
 disk goes bad. The NM keeps paths for localised resources in memory. At the 
 time of a resource request, isResourcePresent(rsrc) will be called, which 
 calls file.exists() on the localised path.
 In some cases when the disk has gone bad, inodes are still cached and 
 file.exists() returns true, but at read time the file will not open.
 Note: file.exists() actually calls stat64 natively, which returns true because 
 it was able to find the inode information from the OS.
 A proposal is to call file.list() on the parent path of the resource, which 
 will call open() natively. If the disk is good it should return an array of 
 paths with length at least 1.
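 A minimal standalone sketch of that proposed check (illustrative, not the NM code): listing the parent directory forces an open on the underlying filesystem, whereas exists() may succeed from cached inode data on a failed disk.
 {code}
 import java.io.File;

 public class LocalResourceCheckSketch {
   /**
    * Returns true only if the localized path exists and its parent directory
    * can actually be listed (which requires opening it on the disk).
    */
   public static boolean isResourcePresent(File localizedPath) {
     File parent = localizedPath.getParentFile();
     if (parent == null) {
       return localizedPath.exists();
     }
     String[] children = parent.list();
     // On a bad disk, list() may return null even though exists() is true.
     return children != null && children.length >= 1 && localizedPath.exists();
   }

   public static void main(String[] args) {
     System.out.println(isResourcePresent(new File(args[0])));
   }
 }
 {code}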



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-06-08 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3535:
-
Priority: Critical  (was: Major)

  ResourceRequest should be restored back to scheduler when RMContainer is 
 killed at ALLOCATED
 -

 Key: YARN-3535
 URL: https://issues.apache.org/jira/browse/YARN-3535
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Peng Zhang
Priority: Critical
  Labels: BB2015-05-TBR
 Attachments: YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
 yarn-app.log


 During a rolling update of the NM, the AM's start of a container on the NM 
 failed, and then the job hung there.
 AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3747) TestLocalDirsHandlerService should delete the created test directory logDir2

2015-06-08 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3747:

Summary: TestLocalDirsHandlerService should delete the created test 
directory logDir2  (was: TestLocalDirsHandlerService.java: test directory 
logDir2 not deleted)

 TestLocalDirsHandlerService should delete the created test directory logDir2
 

 Key: YARN-3747
 URL: https://issues.apache.org/jira/browse/YARN-3747
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test, yarn
Affects Versions: 2.7.0
Reporter: David Moore
Assignee: David Moore
Priority: Minor
  Labels: patch, test, yarn
 Attachments: YARN-3747.patch


 During a code review of 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
  I noted that logDir2 is never deleted while logDir1 is deleted twice. This 
 is not in keeping with the rest of the function and appears to be a bug. 
 I will be submitting a patch shortly.
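 A minimal sketch of the kind of cleanup the patch would add, assuming logDir1 and logDir2 are java.io.File directories created by the test; Hadoop's FileUtil.fullyDelete is the usual helper for this (the actual patch may differ):
 {code}
 import java.io.File;

 import org.apache.hadoop.fs.FileUtil;

 public class TestDirCleanupSketch {
   // Delete each created test directory exactly once.
   public static void cleanup(File logDir1, File logDir2) {
     FileUtil.fullyDelete(logDir1);
     FileUtil.fullyDelete(logDir2);
   }
 }
 {code}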



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3747) TestLocalDirsHandlerService should delete the created test directory logDir2

2015-06-08 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3747:

 Component/s: (was: yarn)
Target Version/s: 2.8.0  (was: 2.7.0)
  Labels:   (was: patch test yarn)
Hadoop Flags: Reviewed

+1, looks good to me, will commit it shortly.

 TestLocalDirsHandlerService should delete the created test directory logDir2
 

 Key: YARN-3747
 URL: https://issues.apache.org/jira/browse/YARN-3747
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.7.0
Reporter: David Moore
Assignee: David Moore
Priority: Minor
 Attachments: YARN-3747.patch


 During a code review of 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
  I noted that logDir2 is never deleted while logDir1 is deleted twice. This 
 is not in keeping with the rest of the function and appears to be a bug. 
 I will be submitting a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-06-08 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576970#comment-14576970
 ] 

Peng Zhang commented on YARN-3535:
--

Sorry for the late reply.

Thanks for your comments.

bq. 1. I think the method recoverResourceRequestForContainer should be 
synchronized, any thought?
I noticed it isn't synchronized originally. I checked this method and found 
that only applications needs to be protected (it is obtained by calling 
getCurrentAttemptForContainer()). applications is instantiated as a 
ConcurrentHashMap in the derived schedulers, so I think there's no need to add 
synchronized (see the sketch after this comment).

The other three comments are all related to tests.
# TestAMRestart.java is changed because the test case 
testAMRestartWithExistingContainers triggers this logic. After this patch one 
more container may be scheduled, so attempt.getJustFinishedContainers().size() 
may be bigger than expectedNum and the loop never ends. So I simply changed 
that situation.
# I agree that this issue exists in all schedulers and should be tested 
generally, but I didn't find a good way to reproduce it. I'll give 
ParameterizedSchedulerTestBase a try.
# I changed RMContextImpl.java to get the schedulerDispatcher and start it in 
TestFairScheduler; otherwise the event handler cannot be triggered. I'll check 
whether this can also be solved based on ParameterizedSchedulerTestBase.
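A small, generic illustration of the concurrency point above (the sketch referred to in the first answer): single lookups on a ConcurrentHashMap are thread-safe without external synchronization; only compound check-then-act sequences need extra coordination. The names here are placeholders, not the scheduler's actual types:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentLookupSketch {
  private final ConcurrentMap<String, String> applications =
      new ConcurrentHashMap<String, String>();

  // Safe without 'synchronized': get() on a ConcurrentHashMap is atomic and
  // never observes a corrupted table, even with concurrent writers.
  public String getCurrentAttemptForContainer(String appId) {
    return applications.get(appId);
  }

  // A compound check-then-act sequence is what would still need coordination;
  // ConcurrentMap provides atomic helpers such as putIfAbsent for that case.
  public String registerIfAbsent(String appId, String attempt) {
    return applications.putIfAbsent(appId, attempt);
  }
}
{code}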


  ResourceRequest should be restored back to scheduler when RMContainer is 
 killed at ALLOCATED
 -

 Key: YARN-3535
 URL: https://issues.apache.org/jira/browse/YARN-3535
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Peng Zhang
Priority: Critical
  Labels: BB2015-05-TBR
 Attachments: YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
 yarn-app.log


 During a rolling update of the NM, the AM's start of a container on the NM 
 failed, and then the job hung there.
 AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3747) TestLocalDirsHandlerService should delete the created test directory logDir2

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576980#comment-14576980
 ] 

Hudson commented on YARN-3747:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7987 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7987/])
YARN-3747. TestLocalDirsHandlerService should delete the created test (devaraj: 
rev 126321eded7dc38c1eef2cfde9365404c924a5cb)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java


 TestLocalDirsHandlerService should delete the created test directory logDir2
 

 Key: YARN-3747
 URL: https://issues.apache.org/jira/browse/YARN-3747
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.7.0
Reporter: David Moore
Assignee: David Moore
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3747.patch


 During a code review of 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
  I noted that logDir2 is never deleted while logDir1 is deleted twice. This 
 is not in keeping with the rest of the function and appears to be a bug. 
 I will be submitting a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3381) A typographical error in InvalidStateTransitonException

2015-06-08 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3381:
---
Attachment: (was: YARN-3381-004.patch)

 A typographical error in InvalidStateTransitonException
 -

 Key: YARN-3381
 URL: https://issues.apache.org/jira/browse/YARN-3381
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Xiaoshuang LU
Assignee: Brahma Reddy Battula
  Labels: BB2015-05-TBR
 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381.patch


 Appears that InvalidStateTransitonException should be 
 InvalidStateTransitionException.  Transition was misspelled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3747) TestLocalDirsHandlerService.java: test directory logDir2 not deleted

2015-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576782#comment-14576782
 ] 

Hadoop QA commented on YARN-3747:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   6m 21s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 40s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 19s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 36s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 11s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m  4s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  24m 19s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12736555/YARN-3747.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / c7ee6c1 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8215/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8215/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8215/console |


This message was automatically generated.

 TestLocalDirsHandlerService.java: test directory logDir2 not deleted
 

 Key: YARN-3747
 URL: https://issues.apache.org/jira/browse/YARN-3747
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test, yarn
Affects Versions: 2.7.0
Reporter: David Moore
Priority: Minor
  Labels: patch, test, yarn
 Attachments: YARN-3747.patch


 During a code review of 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
  I noted that logDir2 is never deleted while logDir1 is deleted twice. This 
 is not in keeping with the rest of the function and appears to be a bug. 
 I will be submitting a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1042) add ability to specify affinity/anti-affinity in container requests

2015-06-08 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-1042:
--
Attachment: YARN-1042-001.patch

I have worked out a patch. It is not complete yet, but the part I've done works 
the way I expected. I tested it on my 5-node cluster against AFFINITY and 
ANTI_AFFINITY in NODE scope. Please kindly help review the patch; comments and 
suggestions are appreciated.

Work left to be done:
1. Complete the container allocation handlers for RACK scope
2. Write test cases for the rules in RACK scope
3. Add support for the maxTimeAwaitBeforeCompromise argument
4. Complete the code changes for the other schedulers

 add ability to specify affinity/anti-affinity in container requests
 ---

 Key: YARN-1042
 URL: https://issues.apache.org/jira/browse/YARN-1042
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Arun C Murthy
 Attachments: YARN-1042-001.patch, YARN-1042-demo.patch


 container requests to the AM should be able to request anti-affinity to 
 ensure that things like Region Servers don't come up in the same failure 
 zones. 
 Similarly, you may want to specify affinity to the same host or rack 
 without specifying which specific host/rack. Example: bringing up a small 
 Giraph cluster in a large YARN cluster would benefit from having the 
 processes in the same rack purely for bandwidth reasons.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3747) TestLocalDirsHandlerService.java: test directory logDir2 not deleted

2015-06-08 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3747:

Assignee: David Moore

 TestLocalDirsHandlerService.java: test directory logDir2 not deleted
 

 Key: YARN-3747
 URL: https://issues.apache.org/jira/browse/YARN-3747
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test, yarn
Affects Versions: 2.7.0
Reporter: David Moore
Assignee: David Moore
Priority: Minor
  Labels: patch, test, yarn
 Attachments: YARN-3747.patch


 During a code review of 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
  I noted that logDir2 is never deleted while logDir1 is deleted twice. This 
 is not in keeping with the rest of the function and appears to be a bug. 
 I will be submitting a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3508) Preemption processing occuring on the main RM dispatcher

2015-06-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577021#comment-14577021
 ] 

Varun Saxena commented on YARN-3508:


Just to clarify: in the implementation I have spawned a new preemption 
dispatcher thread instead of posting preemption events to the scheduler dispatcher.
This is because, IMHO, container preemption events should have priority over 
scheduler events. This approach does, however, add one extra thread 
contending for the scheduler lock.

Another approach would be to post preemption events to the scheduler 
dispatcher and use a {{LinkedBlockingDeque}} for storing events instead. That 
way preemption events could be posted to the front of the queue. However, a linked 
blocking deque uses a single lock for put and take operations, whereas a linked 
blocking queue uses two different locks for these operations, making the latter 
better from a performance viewpoint.

[~jlowe], [~jianhe], [~leftnoteasy], thoughts on the approaches mentioned above?
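
To make the deque trade-off concrete, here is a minimal sketch of the second 
approach (the event and queue types are hypothetical, not the actual RM 
dispatcher classes): preemption events jump to the head, ordinary scheduler 
events queue at the tail, and both ends share one internal lock.
{code}
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

class PriorityEventQueue<E> {
  private final BlockingDeque<E> deque = new LinkedBlockingDeque<>();

  // Ordinary scheduler events are appended at the tail.
  void postScheduler(E event) throws InterruptedException {
    deque.putLast(event);
  }

  // Preemption events are pushed to the head so they are dispatched first.
  void postPreemption(E event) throws InterruptedException {
    deque.putFirst(event);
  }

  // A single dispatcher thread takes from the head. Note that putFirst,
  // putLast and takeFirst all contend on the same lock, which is the
  // performance concern mentioned above.
  E next() throws InterruptedException {
    return deque.takeFirst();
  }
}
{code}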

 Preemption processing occuring on the main RM dispatcher
 

 Key: YARN-3508
 URL: https://issues.apache.org/jira/browse/YARN-3508
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Attachments: YARN-3508.002.patch, YARN-3508.01.patch


 We recently saw the RM for a large cluster lag far behind on the 
 AsyncDispatcher event queue.  The AsyncDispatcher thread was consistently 
 blocked on the highly-contended CapacityScheduler lock trying to dispatch 
 preemption-related events for RMContainerPreemptEventDispatcher.  Preemption 
 processing should occur on the scheduler event dispatcher thread or a 
 separate thread to avoid delaying the processing of other events in the 
 primary dispatcher queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3508) Preemption processing occuring on the main RM dispatcher

2015-06-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577026#comment-14577026
 ] 

Rohith commented on YARN-3508:
--

The problem I see with clubbing them with scheduler events is that if there are many 
scheduler events already in the event queue, preemption events are delayed. 
As [~varun_saxena] said, container preemption events should be 
considered higher priority than scheduler events. Having a separate event 
dispatcher for preemption events would allow them to contend for the lock at an 
earlier stage rather than waiting for the scheduler event queue to drain. I think 
the current patch approach makes sense to me, i.e. having an individual dispatcher 
thread for preemption events. 

 Preemption processing occuring on the main RM dispatcher
 

 Key: YARN-3508
 URL: https://issues.apache.org/jira/browse/YARN-3508
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Attachments: YARN-3508.002.patch, YARN-3508.01.patch


 We recently saw the RM for a large cluster lag far behind on the 
 AsyncDispatcher event queue.  The AsyncDispatcher thread was consistently 
 blocked on the highly-contended CapacityScheduler lock trying to dispatch 
 preemption-related events for RMContainerPreemptEventDispatcher.  Preemption 
 processing should occur on the scheduler event dispatcher thread or a 
 separate thread to avoid delaying the processing of other events in the 
 primary dispatcher queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3770) SerializedException should also handle java.lang.Error

2015-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577245#comment-14577245
 ] 

Hadoop QA commented on YARN-3770:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 35s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 52s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 52s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 39s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| | |  41m 26s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738350/YARN-3770.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 18f6809 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8218/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8218/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8218/console |


This message was automatically generated.

 SerializedException should also handle java.lang.Error 
 ---

 Key: YARN-3770
 URL: https://issues.apache.org/jira/browse/YARN-3770
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3770.1.patch, YARN-3770.patch


 In the SerializedExceptionPBImpl deSerialize() method:
 {code}
 Class classType = null;
 if (YarnException.class.isAssignableFrom(realClass)) {
   classType = YarnException.class;
 } else if (IOException.class.isAssignableFrom(realClass)) {
   classType = IOException.class;
 } else if (RuntimeException.class.isAssignableFrom(realClass)) {
   classType = RuntimeException.class;
 } else {
   classType = Exception.class;
 }
 return instantiateException(realClass.asSubclass(classType), getMessage(),
   cause == null ? null : cause.deSerialize());
   }
 {code}
 If realClass is a subclass of java.lang.Error, deSerialize() throws a 
 ClassCastException.
 In the last else branch, classType should be Throwable.class 
 instead of Exception.class.
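
 A hedged sketch of the proposed change (simplified, not the actual 
 SerializedExceptionPBImpl code, and with the YARN-specific branch omitted): 
 widening the final fallback to Throwable.class keeps asSubclass() from failing 
 for Error subclasses.
 {code}
 import java.io.IOException;

 class ClassTypeSelection {
   static Class<? extends Throwable> select(Class<?> realClass) {
     if (IOException.class.isAssignableFrom(realClass)) {
       return IOException.class;
     } else if (RuntimeException.class.isAssignableFrom(realClass)) {
       return RuntimeException.class;
     } else {
       // Throwable.class also covers java.lang.Error subclasses, which
       // Exception.class would reject in realClass.asSubclass(classType).
       return Throwable.class;
     }
   }
 }
 {code}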



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3775) Job does not exit after all node become unhealthy

2015-06-08 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith resolved YARN-3775.
--
Resolution: Not A Problem

Closing as Not A Problem. Please reopen if you disagree.

 Job does not exit after all node become unhealthy
 -

 Key: YARN-3775
 URL: https://issues.apache.org/jira/browse/YARN-3775
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.1
 Environment: Environment:
 Version : 2.7.0
 OS: RHEL7 
 NameNodes:  xiachsh11 xiachsh12 (HA enabled)
 DataNodes:  5 xiachsh13-17
 ResourceManage:  xiachsh11
 NodeManage: 5 xiachsh13-17 
 all nodes are openstack provisioned:  
 MEM: 1.5G 
 Disk: 16G 
Reporter: Chengshun Xia
 Attachments: logs.tar.gz


 Running Terasort with a data size of 10 GB, all the containers exited once the disk 
 space threshold of 0.90 was reached; at this point, the job did not exit with an error: 
 15/06/05 13:13:28 INFO mapreduce.Job:  map 9% reduce 0%
 15/06/05 13:13:52 INFO mapreduce.Job:  map 10% reduce 0%
 15/06/05 13:14:30 INFO mapreduce.Job:  map 11% reduce 0%
 15/06/05 13:15:11 INFO mapreduce.Job:  map 12% reduce 0%
 15/06/05 13:15:43 INFO mapreduce.Job:  map 13% reduce 0%
 15/06/05 13:16:38 INFO mapreduce.Job:  map 14% reduce 0%
 15/06/05 13:16:41 INFO mapreduce.Job:  map 15% reduce 0%
 15/06/05 13:16:53 INFO mapreduce.Job:  map 16% reduce 0%
 15/06/05 13:17:24 INFO mapreduce.Job:  map 17% reduce 0%
 15/06/05 13:17:53 INFO mapreduce.Job:  map 18% reduce 0%
 15/06/05 13:18:36 INFO mapreduce.Job:  map 19% reduce 0%
 15/06/05 13:19:03 INFO mapreduce.Job:  map 20% reduce 0%
 15/06/05 13:19:09 INFO mapreduce.Job:  map 15% reduce 0%
 15/06/05 13:19:32 INFO mapreduce.Job:  map 16% reduce 0%
 15/06/05 13:20:00 INFO mapreduce.Job:  map 17% reduce 0%
 15/06/05 13:20:36 INFO mapreduce.Job:  map 18% reduce 0%
 15/06/05 13:20:57 INFO mapreduce.Job:  map 19% reduce 0%
 15/06/05 13:21:22 INFO mapreduce.Job:  map 18% reduce 0%
 15/06/05 13:21:24 INFO mapreduce.Job:  map 14% reduce 0%
 15/06/05 13:21:25 INFO mapreduce.Job:  map 9% reduce 0%
 15/06/05 13:21:28 INFO mapreduce.Job:  map 10% reduce 0%
 15/06/05 13:22:22 INFO mapreduce.Job:  map 11% reduce 0%
 15/06/05 13:23:06 INFO mapreduce.Job:  map 12% reduce 0%
 15/06/05 13:23:41 INFO mapreduce.Job:  map 9% reduce 0%
 15/06/05 13:23:42 INFO mapreduce.Job:  map 5% reduce 0%
 15/06/05 13:24:38 INFO mapreduce.Job:  map 6% reduce 0%
 15/06/05 13:25:16 INFO mapreduce.Job:  map 7% reduce 0%
 15/06/05 13:25:53 INFO mapreduce.Job:  map 8% reduce 0%
 15/06/05 13:26:35 INFO mapreduce.Job:  map 9% reduce 0%
 The last response time is 15/06/05 13:26:35, 
 and the current time is:
 [root@xiachsh11 logs]# date
 Fri Jun  5 19:19:59 EDT 2015
 [root@xiachsh11 logs]#
 [root@xiachsh11 logs]# yarn node -list
 15/06/05 19:20:18 INFO client.RMProxy: Connecting to ResourceManager at 
 xiachsh11.eng.platformlab.ibm.com/9.21.62.234:8032
 Total Nodes:0
  Node-Id Node-State Node-Http-Address   
 Number-of-Running-Containers
 [root@xiachsh11 logs]#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3779) Log Aggregation Deletion doesnt work after refreshing Log Retention Settings in secure cluster

2015-06-08 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3779:
---
Description: 
{{GSSException}} is thrown every time log aggregation deletion is attempted 
after executing bin/mapred hsadmin -refreshLogRetentionSettings in a secure 
cluster.

The problem can be reproduced by the following steps:
1. Start the historyserver in a secure cluster.
2. Log deletion happens as expected. 
3. Execute the {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh 
the configuration value.
4. All subsequent attempts at log deletion fail with {{GSSException}}.

The following exception can be found in the historyserver's log if log deletion is 
enabled. 
{noformat}
2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this 
deletion attempt is being aborted | AggregatedLogDeletionService.java:127
java.io.IOException: Failed on local exception: java.io.IOException: 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]; Host Details : local host is: vm-31/9.91.12.31; destination host is: 
vm-33:25000; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1414)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy9.getListing(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749)
at 
org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate 
failed [Caused by GSSException: No valid credentials provided (Mechanism level: 
Failed to find any Kerberos tgt)]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
at 
org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
... 21 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
at 
org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:411)
at 
org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:550)
at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:367)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:716)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:712)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 

[jira] [Commented] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp

2015-06-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577100#comment-14577100
 ] 

Steve Loughran commented on YARN-3774:
--

...and then there's the transitive Guava dependency.

 ZKRMStateStore should use Curator 3.0 and avail CuratorOp
 -

 Key: YARN-3774
 URL: https://issues.apache.org/jira/browse/YARN-3774
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker

 YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are 
 somewhat involved, and could be improved using CuratorOp introduced in 
 Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version 
 and make this change. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3770) SerializedException should also handle java.lang.Error

2015-06-08 Thread Lavkesh Lahngir (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lavkesh Lahngir updated YARN-3770:
--
Attachment: YARN-3770.1.patch

 SerializedException should also handle java.lang.Error 
 ---

 Key: YARN-3770
 URL: https://issues.apache.org/jira/browse/YARN-3770
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3770.1.patch, YARN-3770.patch


 In the SerializedExceptionPBImpl deSerialize() method:
 {code}
 Class classType = null;
 if (YarnException.class.isAssignableFrom(realClass)) {
   classType = YarnException.class;
 } else if (IOException.class.isAssignableFrom(realClass)) {
   classType = IOException.class;
 } else if (RuntimeException.class.isAssignableFrom(realClass)) {
   classType = RuntimeException.class;
 } else {
   classType = Exception.class;
 }
 return instantiateException(realClass.asSubclass(classType), getMessage(),
   cause == null ? null : cause.deSerialize());
   }
 {code}
 If realClass is a subclass of java.lang.Error, deSerialize() throws a 
 ClassCastException.
 In the last else branch, classType should be Throwable.class 
 instead of Exception.class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3783) Modify onChangeDir() to remove cached resources from the memory.

2015-06-08 Thread Lavkesh Lahngir (JIRA)
Lavkesh Lahngir created YARN-3783:
-

 Summary: Modify onChangeDir() to remove cached resources from the 
memory. 
 Key: YARN-3783
 URL: https://issues.apache.org/jira/browse/YARN-3783
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577254#comment-14577254
 ] 

Hudson commented on YARN-3655:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2150 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2150/])
YARN-3655. FairScheduler: potential livelock due to maxAMShare limitation and 
container reservation. (Zhihai Xu via kasha) (kasha: rev 
bd69ea408f8fdd8293836ce1089fe9b01616f2f7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation 
 -

 Key: YARN-3655
 URL: https://issues.apache.org/jira/browse/YARN-3655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
 YARN-3655.002.patch, YARN-3655.003.patch, YARN-3655.004.patch


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation.
 If a node is reserved by an application, no other application has any 
 chance to assign a new container on this node unless the 
 application which reserves the node assigns a new container on it or 
 releases the reserved container on it.
 The problem is that if an application tries to call assignReservedContainer and 
 fails to get a new container due to the maxAMShare limitation, it blocks all 
 other applications from using the nodes it reserves. If all other running 
 applications can't release their AM containers because they are blocked by these 
 reserved containers, a livelock situation can happen.
 The following is the code in FSAppAttempt#assignContainer which can cause 
 this potential livelock.
 {code}
 // Check the AM resource usage for the leaf queue
 if (!isAmRunning() && !getUnmanagedAM()) {
   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
   if (ask.isEmpty() || !getQueue().canRunAppAM(
       ask.get(0).getCapability())) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Skipping allocation because maxAMShare limit would " +
           "be exceeded");
     }
     return Resources.none();
   }
 }
 {code}
 To fix this issue, we can unreserve the node if we can't allocate the AM 
 container on the node due to the maxAMShare limitation and the node is reserved 
 by the application.
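
 A rough sketch of that fix idea, using hypothetical stand-ins for the 
 FairScheduler types quoted above rather than the actual FSAppAttempt code:
 {code}
 // Hypothetical node abstraction: only what the sketch needs.
 interface SchedulerNode {
   boolean isReservedBy(String appId);
   void unreserve();
 }

 class AmShareAwareAssignment {
   private final String appId;
   AmShareAwareAssignment(String appId) { this.appId = appId; }

   // Stands in for the maxAMShare check quoted above.
   private boolean amWouldExceedMaxShare() { return true; }

   // When the AM cannot be allocated because of maxAMShare and this
   // application holds the reservation, drop the reservation so the node
   // becomes usable by other applications, then give up as before.
   void assignContainer(SchedulerNode node, boolean amRunning) {
     if (!amRunning && amWouldExceedMaxShare()) {
       if (node.isReservedBy(appId)) {
         node.unreserve();
       }
       return; // i.e. Resources.none()
     }
     // ... normal allocation path ...
   }
 }
 {code}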



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3747) TestLocalDirsHandlerService should delete the created test directory logDir2

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577251#comment-14577251
 ] 

Hudson commented on YARN-3747:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2150 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2150/])
YARN-3747. TestLocalDirsHandlerService should delete the created test (devaraj: 
rev 126321eded7dc38c1eef2cfde9365404c924a5cb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
* hadoop-yarn-project/CHANGES.txt


 TestLocalDirsHandlerService should delete the created test directory logDir2
 

 Key: YARN-3747
 URL: https://issues.apache.org/jira/browse/YARN-3747
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.7.0
Reporter: David Moore
Assignee: David Moore
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3747.patch


 During a code review of 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
  I noted that logDir2 is never deleted while logDir1 is deleted twice. This 
 is not in keeping with the rest of the function and appears to be a bug. 
 I will be submitting a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3780) Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577253#comment-14577253
 ] 

Hudson commented on YARN-3780:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2150 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2150/])
YARN-3780. Should use equals when compare Resource in (devaraj: rev 
c7ee6c151c5771043a6de3b8a951cea13f59dd7b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/CHANGES.txt


 Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition
 -

 Key: YARN-3780
 URL: https://issues.apache.org/jira/browse/YARN-3780
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3780.000.patch


 Should use equals when comparing Resource in RMNodeImpl#ReconnectNodeTransition 
 to avoid an unnecessary NodeResourceUpdateSchedulerEvent.
 The current code uses {{!=}} to compare the Resource totalCapability, which 
 compares references rather than the actual values in Resource. So we should use 
 equals to compare Resources.
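
 A minimal illustration of the reference-versus-value comparison, using a 
 simplified stand-in for the Resource class:
 {code}
 import java.util.Objects;

 class Res {
   final int memory, vcores;
   Res(int memory, int vcores) { this.memory = memory; this.vcores = vcores; }

   @Override public boolean equals(Object o) {
     return o instanceof Res
         && ((Res) o).memory == memory && ((Res) o).vcores == vcores;
   }
   @Override public int hashCode() { return Objects.hash(memory, vcores); }

   public static void main(String[] args) {
     Res reported = new Res(2048, 2);
     Res current = new Res(2048, 2);
     // true: different references, so '!=' would wrongly signal a change
     System.out.println(reported != current);
     // true: equal values, so no NodeResourceUpdateSchedulerEvent is needed
     System.out.println(reported.equals(current));
   }
 }
 {code}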



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException

2015-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577194#comment-14577194
 ] 

Hadoop QA commented on YARN-3381:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 42s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:red}-1{color} | javac |   7m 43s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |   9m 50s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 49s | The applied patch generated  1 
new checkstyle issues (total was 48, now 49). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   6m 29s | The patch appears to introduce 3 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   9m 28s | Tests passed in 
hadoop-mapreduce-client-app. |
| {color:green}+1{color} | yarn tests |   7m  2s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   6m  9s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |  50m 36s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 121m 52s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-mapreduce-client-app |
| FindBugs | module:hadoop-yarn-common |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738342/YARN-3381-004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 126321e |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/8216/artifact/patchprocess/diffJavacWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8216/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8216/artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8216/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
 |
| hadoop-mapreduce-client-app test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8216/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8216/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8216/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8216/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8216/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8216/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8216/console |


This message was automatically generated.

 A typographical error in InvalidStateTransitonException
 -

 Key: YARN-3381
 URL: https://issues.apache.org/jira/browse/YARN-3381
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Xiaoshuang LU
Assignee: Brahma Reddy Battula
  Labels: BB2015-05-TBR
 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, 
 YARN-3381-004.patch, YARN-3381.patch


 Appears that InvalidStateTransitonException should be 
 InvalidStateTransitionException.  Transition was misspelled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3780) Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577053#comment-14577053
 ] 

Hudson commented on YARN-3780:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #222 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/222/])
YARN-3780. Should use equals when compare Resource in (devaraj: rev 
c7ee6c151c5771043a6de3b8a951cea13f59dd7b)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java


 Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition
 -

 Key: YARN-3780
 URL: https://issues.apache.org/jira/browse/YARN-3780
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3780.000.patch


 Should use equals when comparing Resource in RMNodeImpl#ReconnectNodeTransition 
 to avoid an unnecessary NodeResourceUpdateSchedulerEvent.
 The current code uses {{!=}} to compare the Resource totalCapability, which 
 compares references rather than the actual values in Resource. So we should use 
 equals to compare Resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577054#comment-14577054
 ] 

Hudson commented on YARN-3655:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #222 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/222/])
YARN-3655. FairScheduler: potential livelock due to maxAMShare limitation and 
container reservation. (Zhihai Xu via kasha) (kasha: rev 
bd69ea408f8fdd8293836ce1089fe9b01616f2f7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation 
 -

 Key: YARN-3655
 URL: https://issues.apache.org/jira/browse/YARN-3655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
 YARN-3655.002.patch, YARN-3655.003.patch, YARN-3655.004.patch


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation.
 If a node is reserved by an application, no other application has any 
 chance to assign a new container on this node unless the 
 application which reserves the node assigns a new container on it or 
 releases the reserved container on it.
 The problem is that if an application tries to call assignReservedContainer and 
 fails to get a new container due to the maxAMShare limitation, it blocks all 
 other applications from using the nodes it reserves. If all other running 
 applications can't release their AM containers because they are blocked by these 
 reserved containers, a livelock situation can happen.
 The following is the code in FSAppAttempt#assignContainer which can cause 
 this potential livelock.
 {code}
 // Check the AM resource usage for the leaf queue
 if (!isAmRunning() && !getUnmanagedAM()) {
   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
   if (ask.isEmpty() || !getQueue().canRunAppAM(
       ask.get(0).getCapability())) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Skipping allocation because maxAMShare limit would " +
           "be exceeded");
     }
     return Resources.none();
   }
 }
 {code}
 To fix this issue, we can unreserve the node if we can't allocate the AM 
 container on the node due to the maxAMShare limitation and the node is reserved 
 by the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3747) TestLocalDirsHandlerService should delete the created test directory logDir2

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577051#comment-14577051
 ] 

Hudson commented on YARN-3747:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #222 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/222/])
YARN-3747. TestLocalDirsHandlerService should delete the created test (devaraj: 
rev 126321eded7dc38c1eef2cfde9365404c924a5cb)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java


 TestLocalDirsHandlerService should delete the created test directory logDir2
 

 Key: YARN-3747
 URL: https://issues.apache.org/jira/browse/YARN-3747
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.7.0
Reporter: David Moore
Assignee: David Moore
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3747.patch


 During a code review of 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
  I noted that logDir2 is never deleted while logDir1 is deleted twice. This 
 is not in keeping with the rest of the function and appears to be a bug. 
 I will be submitting a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1042) add ability to specify affinity/anti-affinity in container requests

2015-06-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577079#comment-14577079
 ] 

Steve Loughran commented on YARN-1042:
--

Thanks for this. I'm afraid I don't know enough about YARN scheduling to 
review it properly; hopefully the experts will.

Meanwhile:
# does this let us specify different rules for different container requests? 
The uses we see in Slider do differentiate requests (e.g. we care 
more about affinity for HBase masters than we do for workers). 
# how would we set up the enum if we ever want to add different placement 
policies in future? I think the strategy would be to have a no-preferences 
policy, which would be the default and distinct from affinity (== I really 
want them on the same node) and anti-affinity. Then we could have a {{switch}} 
statement to choose placement, rather than just an {{isAffinity()}} predicate 
(see the sketch after this list).
# hadoop's [code 
rules|https://github.com/steveloughran/formality/blob/master/styleguide/styleguide.md#code-style]
 are two spaces, no tabs. 
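
A rough sketch of the enum/switch idea in point 2; the names are hypothetical, 
not an existing YARN API:
{code}
// Hypothetical placement policy enum with a no-preference default.
enum PlacementPolicy { NONE, AFFINITY, ANTI_AFFINITY }

class PlacementChooser {
  void place(PlacementPolicy policy) {
    switch (policy) {
      case AFFINITY:
        // try to co-locate with the referenced containers
        break;
      case ANTI_AFFINITY:
        // avoid nodes/racks already hosting the referenced containers
        break;
      case NONE:
      default:
        // no preference: fall back to the normal scheduling path
        break;
    }
  }
}
{code}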

 add ability to specify affinity/anti-affinity in container requests
 ---

 Key: YARN-1042
 URL: https://issues.apache.org/jira/browse/YARN-1042
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Arun C Murthy
 Attachments: YARN-1042-001.patch, YARN-1042-demo.patch


 container requests to the AM should be able to request anti-affinity to 
 ensure that things like Region Servers don't come up in the same failure 
 zones. 
 Similarly, you may want to specify affinity to the same host or rack 
 without specifying which specific host/rack. Example: bringing up a small 
 Giraph cluster in a large YARN cluster would benefit from having the 
 processes in the same rack purely for bandwidth reasons.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-3755) Log the command of launching containers

2015-06-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened YARN-3755:
--

 Log the command of launching containers
 ---

 Key: YARN-3755
 URL: https://issues.apache.org/jira/browse/YARN-3755
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: YARN-3755-1.patch, YARN-3755-2.patch


 In the resource manager log, YARN logs the command for launching the AM, 
 which is very useful. But there's no such log in the NM log for launching 
 containers. That makes it difficult to diagnose when containers fail to launch 
 due to some issue in the commands. Although the user can look at the commands in 
 the container launch script file, that is an internal detail of YARN and usually 
 users don't know about it. From the user's perspective, they only know what commands 
 they specified when building the YARN application. 
 {code}
 2015-06-01 16:06:42,245 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command 
 to launch container container_1433145984561_0001_01_01 : 
 $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true 
 -Dhadoop.metrics.log.level=WARN  -Xmx1024m  
 -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
 -Dlog4j.configuration=tez-container-log4j.properties 
 -Dyarn.app.container.log.dir=<LOG_DIR> -Dtez.root.logger=info,CLA 
 -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster 
 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3755) Log the command of launching containers

2015-06-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577097#comment-14577097
 ] 

Steve Loughran commented on YARN-3755:
--

I think this should be re-opened with YARN-3759 tagged as a duplicate.
# one of the problems with pushing it to the framework is: what if you can't 
control that framework? I spent a lot of time last week having to add 
logging inside my own fork of Spark just to do an {{env > $LOG_DIR/env}}
# your framework code doesn't know the CLI and env that are finally generated

I propose, then, one of these options:
# that YARN has the option to log the env and CLI to the log directory
# that we have a specific YARN log for app & container launch, which can be set 
to INFO to get this log information
# that we start a section in the docs on troubleshooting YARN which explains how 
to use this stuff.

This patch would be the basis of option 2 on that list; we just create a 
specific log for those messages.
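
Option 2 could be as small as routing the launch context through a dedicated 
logger name that operators can raise to INFO independently of the rest of the 
NM/RM logging; a hedged sketch (the logger name and helper are made up, not an 
existing YARN class):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class ContainerLaunchLogging {
  // Dedicated logger, e.g. enabled via log4j.logger.yarn.container-launch=INFO
  private static final Log LAUNCH_LOG =
      LogFactory.getLog("yarn.container-launch");

  static void logLaunch(String containerId, String command, String env) {
    if (LAUNCH_LOG.isInfoEnabled()) {
      LAUNCH_LOG.info("Launching " + containerId
          + " command: " + command + " env: " + env);
    }
  }
}
{code}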



 Log the command of launching containers
 ---

 Key: YARN-3755
 URL: https://issues.apache.org/jira/browse/YARN-3755
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: YARN-3755-1.patch, YARN-3755-2.patch


 In the resource manager log, YARN logs the command for launching the AM, 
 which is very useful. But there's no such log in the NM log for launching 
 containers. That makes it difficult to diagnose when containers fail to launch 
 due to some issue in the commands. Although the user can look at the commands in 
 the container launch script file, that is an internal detail of YARN and usually 
 users don't know about it. From the user's perspective, they only know what commands 
 they specified when building the YARN application. 
 {code}
 2015-06-01 16:06:42,245 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command 
 to launch container container_1433145984561_0001_01_01 : 
 $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true 
 -Dhadoop.metrics.log.level=WARN  -Xmx1024m  
 -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
 -Dlog4j.configuration=tez-container-log4j.properties 
 -Dyarn.app.container.log.dir=<LOG_DIR> -Dtez.root.logger=info,CLA 
 -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster 
 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3780) Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577083#comment-14577083
 ] 

Hudson commented on YARN-3780:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #952 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/952/])
YARN-3780. Should use equals when compare Resource in (devaraj: rev 
c7ee6c151c5771043a6de3b8a951cea13f59dd7b)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java


 Should use equals when compare Resource in RMNodeImpl#ReconnectNodeTransition
 -

 Key: YARN-3780
 URL: https://issues.apache.org/jira/browse/YARN-3780
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3780.000.patch


 Should use equals when comparing Resource in RMNodeImpl#ReconnectNodeTransition 
 to avoid an unnecessary NodeResourceUpdateSchedulerEvent.
 The current code uses {{!=}} to compare the Resource totalCapability, which 
 compares references rather than the actual values in Resource. So we should use 
 equals to compare Resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3747) TestLocalDirsHandlerService should delete the created test directory logDir2

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577081#comment-14577081
 ] 

Hudson commented on YARN-3747:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #952 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/952/])
YARN-3747. TestLocalDirsHandlerService should delete the created test (devaraj: 
rev 126321eded7dc38c1eef2cfde9365404c924a5cb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
* hadoop-yarn-project/CHANGES.txt


 TestLocalDirsHandlerService should delete the created test directory logDir2
 

 Key: YARN-3747
 URL: https://issues.apache.org/jira/browse/YARN-3747
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.7.0
Reporter: David Moore
Assignee: David Moore
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3747.patch


 During a code review of 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
  I noted that logDir2 is never deleted while logDir1 is deleted twice. This 
 is not in keeping with the rest of the function and appears to be a bug. 
 I will be submitting a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577084#comment-14577084
 ] 

Hudson commented on YARN-3655:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #952 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/952/])
YARN-3655. FairScheduler: potential livelock due to maxAMShare limitation and 
container reservation. (Zhihai Xu via kasha) (kasha: rev 
bd69ea408f8fdd8293836ce1089fe9b01616f2f7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation 
 -

 Key: YARN-3655
 URL: https://issues.apache.org/jira/browse/YARN-3655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
 YARN-3655.002.patch, YARN-3655.003.patch, YARN-3655.004.patch


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation.
 If a node is reserved by an application, no other application has any 
 chance to assign a new container on this node unless the 
 application which reserves the node assigns a new container on it or 
 releases the reserved container on it.
 The problem is that if an application tries to call assignReservedContainer and 
 fails to get a new container due to the maxAMShare limitation, it blocks all 
 other applications from using the nodes it reserves. If all other running 
 applications can't release their AM containers because they are blocked by these 
 reserved containers, a livelock situation can happen.
 The following is the code in FSAppAttempt#assignContainer which can cause 
 this potential livelock.
 {code}
 // Check the AM resource usage for the leaf queue
 if (!isAmRunning() && !getUnmanagedAM()) {
   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
   if (ask.isEmpty() || !getQueue().canRunAppAM(
       ask.get(0).getCapability())) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Skipping allocation because maxAMShare limit would " +
           "be exceeded");
     }
     return Resources.none();
   }
 }
 {code}
 To fix this issue, we can unreserve the node if we can't allocate the AM 
 container on the node due to the maxAMShare limitation and the node is reserved 
 by the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3745:
-
Target Version/s: 2.8.0  (was: 2.7.0)

 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.1.patch, YARN-3745.2.patch, YARN-3745.3.patch, 
 YARN-3745.patch


 While deserialising a SerializedException, it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter, it throws a 
 NoSuchMethodException, 
 for example for the ClosedChannelException class.  
 We should also try to instantiate the exception with the default constructor so that 
 the inner exception can be propagated.
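
 A minimal sketch of the fallback being described, with a simplified helper 
 rather than the real instantiateException():
 {code}
 import java.lang.reflect.Constructor;

 class ExceptionFactory {
   // Try the (String) constructor first; if the class lacks one (e.g.
   // ClosedChannelException), fall back to the default constructor.
   static <T extends Throwable> T newInstance(Class<T> cls, String message,
       Throwable cause) throws Exception {
     T instance;
     try {
       Constructor<T> cn = cls.getConstructor(String.class);
       instance = cn.newInstance(message);
     } catch (NoSuchMethodException e) {
       Constructor<T> cn = cls.getConstructor();
       instance = cn.newInstance();
     }
     if (cause != null) {
       instance.initCause(cause);
     }
     return instance;
   }
 }
 {code}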



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3782) Add api for storing/getting Error dirs from NMStateStore

2015-06-08 Thread Lavkesh Lahngir (JIRA)
Lavkesh Lahngir created YARN-3782:
-

 Summary: Add api for storing/getting Error dirs from NMStateStore
 Key: YARN-3782
 URL: https://issues.apache.org/jira/browse/YARN-3782
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577120#comment-14577120
 ] 

Hadoop QA commented on YARN-3745:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 55s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 55s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. |
| | | 39m 53s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12738345/YARN-3745.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 18f6809 |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8217/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8217/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8217/console |


This message was automatically generated.

 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.1.patch, YARN-3745.2.patch, YARN-3745.3.patch, 
 YARN-3745.patch


 While deserialising a SerializedException, it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter, it throws 
 NoSuchMethodException (for example, the ClosedChannelException class).
 We should also try to instantiate the exception with the default constructor 
 so that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3781) Add api for getErrorDirs()

2015-06-08 Thread Lavkesh Lahngir (JIRA)
Lavkesh Lahngir created YARN-3781:
-

 Summary: Add api for getErrorDirs()
 Key: YARN-3781
 URL: https://issues.apache.org/jira/browse/YARN-3781
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir


Add an API for getting error dirs, which will be used to calculate recently 
repaired dirs and recently gone-bad dirs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-06-08 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577124#comment-14577124
 ] 

Lavkesh Lahngir commented on YARN-3591:
---

[~zxu]: Thanks for the review and comments. 
I have added subtasks for more clarity. Please feel free to suggest changes. 

 Resource Localisation on a bad disk causes subsequent containers failure 
 -

 Key: YARN-3591
 URL: https://issues.apache.org/jira/browse/YARN-3591
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
 YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch


 It happens when a resource is localised on a disk and, after localisation, 
 that disk goes bad. The NM keeps the paths of localised resources in memory. 
 At the time of a resource request, isResourcePresent(rsrc) is called, which 
 calls file.exists() on the localised path.
 In some cases when the disk has gone bad, inodes are still cached and 
 file.exists() returns true, but at read time the file will not open.
 Note: file.exists() actually calls stat64 natively, which returns true 
 because it was able to find the inode information from the OS.
 A proposal is to call file.list() on the parent path of the resource, which 
 will call open() natively. If the disk is good, it should return an array of 
 paths with length at least 1.
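 A minimal, self-contained sketch of that proposed check (illustration only, 
 not the actual NM localisation code; the class name below is made up for the 
 example):
 {code}
 import java.io.File;

 public class DiskCheckSketch {
   // Instead of trusting file.exists(), list the parent directory, which
   // forces a native open() and returns null on a bad or unreadable disk.
   static boolean isResourcePresent(File localizedPath) {
     File parent = localizedPath.getParentFile();
     if (parent == null) {
       return false;
     }
     String[] children = parent.list();
     return children != null && children.length >= 1 && localizedPath.exists();
   }

   public static void main(String[] args) {
     System.out.println(isResourcePresent(new File(args[0])));
   }
 }
 {code}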



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3775) Job does not exit after all node become unhealthy

2015-06-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577065#comment-14577065
 ] 

Rohith commented on YARN-3775:
--

[~xiachengs...@yeah.net] Thanks for reporting the issue. IIUC, this is expected 
behavior.
If the application attempt is killed for one of the following reasons, it is 
not counted towards the attempt-failure count:
# Preempted
# Aborted
# Disk failed (i.e. NM unhealthy)
# Killed by ResourceManager

In your case, the application attempt got killed because of disk failure, 
which the RM never counts as an attempt failure. So the RM waits for this 
application to launch and run once further NMs register with it.
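A rough, self-contained sketch of that accounting (illustration only; the real 
logic lives in the RM's app attempt implementation, and the exit-status 
constant below is an assumption based on YARN's ContainerExitStatus):
{code}
public class AttemptFailureCountSketch {
  // Assumed to match ContainerExitStatus.DISKS_FAILED; hard-coded only to
  // keep the sketch self-contained.
  static final int DISKS_FAILED = -101;

  // Failures caused by preemption, aborts, bad disks, or an RM-initiated kill
  // do not burn one of the application's max attempts.
  static boolean countsTowardsMaxAttempts(int exitStatus, boolean preempted,
      boolean aborted, boolean killedByRM) {
    if (preempted || aborted || killedByRM || exitStatus == DISKS_FAILED) {
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(countsTowardsMaxAttempts(DISKS_FAILED, false, false, false));
  }
}
{code}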

 Job does not exit after all node become unhealthy
 -

 Key: YARN-3775
 URL: https://issues.apache.org/jira/browse/YARN-3775
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.1
 Environment: Environment:
 Version : 2.7.0
 OS: RHEL7 
 NameNodes:  xiachsh11 xiachsh12 (HA enabled)
 DataNodes:  5 xiachsh13-17
 ResourceManage:  xiachsh11
 NodeManage: 5 xiachsh13-17 
 all nodes are openstack provisioned:  
 MEM: 1.5G 
 Disk: 16G 
Reporter: Chengshun Xia
 Attachments: logs.tar.gz


 Running Terasort with a data size of 10G, all the containers exit once the 
 disk space threshold of 0.90 is reached; at this point, the job does not exit 
 with an error:
 15/06/05 13:13:28 INFO mapreduce.Job:  map 9% reduce 0%
 15/06/05 13:13:52 INFO mapreduce.Job:  map 10% reduce 0%
 15/06/05 13:14:30 INFO mapreduce.Job:  map 11% reduce 0%
 15/06/05 13:15:11 INFO mapreduce.Job:  map 12% reduce 0%
 15/06/05 13:15:43 INFO mapreduce.Job:  map 13% reduce 0%
 15/06/05 13:16:38 INFO mapreduce.Job:  map 14% reduce 0%
 15/06/05 13:16:41 INFO mapreduce.Job:  map 15% reduce 0%
 15/06/05 13:16:53 INFO mapreduce.Job:  map 16% reduce 0%
 15/06/05 13:17:24 INFO mapreduce.Job:  map 17% reduce 0%
 15/06/05 13:17:53 INFO mapreduce.Job:  map 18% reduce 0%
 15/06/05 13:18:36 INFO mapreduce.Job:  map 19% reduce 0%
 15/06/05 13:19:03 INFO mapreduce.Job:  map 20% reduce 0%
 15/06/05 13:19:09 INFO mapreduce.Job:  map 15% reduce 0%
 15/06/05 13:19:32 INFO mapreduce.Job:  map 16% reduce 0%
 15/06/05 13:20:00 INFO mapreduce.Job:  map 17% reduce 0%
 15/06/05 13:20:36 INFO mapreduce.Job:  map 18% reduce 0%
 15/06/05 13:20:57 INFO mapreduce.Job:  map 19% reduce 0%
 15/06/05 13:21:22 INFO mapreduce.Job:  map 18% reduce 0%
 15/06/05 13:21:24 INFO mapreduce.Job:  map 14% reduce 0%
 15/06/05 13:21:25 INFO mapreduce.Job:  map 9% reduce 0%
 15/06/05 13:21:28 INFO mapreduce.Job:  map 10% reduce 0%
 15/06/05 13:22:22 INFO mapreduce.Job:  map 11% reduce 0%
 15/06/05 13:23:06 INFO mapreduce.Job:  map 12% reduce 0%
 15/06/05 13:23:41 INFO mapreduce.Job:  map 9% reduce 0%
 15/06/05 13:23:42 INFO mapreduce.Job:  map 5% reduce 0%
 15/06/05 13:24:38 INFO mapreduce.Job:  map 6% reduce 0%
 15/06/05 13:25:16 INFO mapreduce.Job:  map 7% reduce 0%
 15/06/05 13:25:53 INFO mapreduce.Job:  map 8% reduce 0%
 15/06/05 13:26:35 INFO mapreduce.Job:  map 9% reduce 0%
 the last response time is  15/06/05 13:26:35
 and current time :
 [root@xiachsh11 logs]# date
 Fri Jun  5 19:19:59 EDT 2015
 [root@xiachsh11 logs]#
 [root@xiachsh11 logs]# yarn node -list
 15/06/05 19:20:18 INFO client.RMProxy: Connecting to ResourceManager at 
 xiachsh11.eng.platformlab.ibm.com/9.21.62.234:8032
 Total Nodes:0
  Node-Id Node-State Node-Http-Address   
 Number-of-Running-Containers
 [root@xiachsh11 logs]#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-08 Thread Lavkesh Lahngir (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lavkesh Lahngir updated YARN-3745:
--
Attachment: YARN-3745.3.patch

 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.1.patch, YARN-3745.2.patch, YARN-3745.3.patch, 
 YARN-3745.patch


 While deserialising a SerializedException, it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter, it throws 
 NoSuchMethodException (for example, the ClosedChannelException class).
 We should also try to instantiate the exception with the default constructor 
 so that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3759) Include command line, localization info and env vars on AM launch failure

2015-06-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-3759.
--
Resolution: Duplicate

 Include command line, localization info and env vars on AM launch failure
 -

 Key: YARN-3759
 URL: https://issues.apache.org/jira/browse/YARN-3759
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Steve Loughran
Priority: Minor

 While trying to diagnose AM launch failures, it's important to be able to get 
 at the final, expanded {{CLASSPATH}} and other env variables. We don't get 
 that today: you can log the unexpanded values on the client, and tweak NM 
 ContainerExecutor log levels to DEBUG and get some of this, but you don't get 
 it in the task logs, and tuning the NM log level isn't viable on a large, 
 busy cluster.
 Launch failures should include some env specifics:
 # list of env vars (ideally, full getenv values), with some stripping of 
 sensitive options (I'm thinking of AWS env vars here); see the sketch after 
 this list
 # command line
 # path localisations
 These can go in the task logs; we don't need to include them in the 
 application report.
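 A minimal sketch of what the env-var stripping could look like (illustration 
 only; the name patterns and the class below are assumptions, not a YARN API):
 {code}
 import java.util.Map;
 import java.util.TreeMap;

 public class EnvRedactSketch {
   // Copy the environment, masking values whose names look sensitive
   // (secrets, tokens, passwords, AWS credentials) before logging.
   static Map<String, String> redact(Map<String, String> env) {
     Map<String, String> out = new TreeMap<String, String>();
     for (Map.Entry<String, String> e : env.entrySet()) {
       String key = e.getKey().toUpperCase();
       boolean sensitive = key.contains("SECRET") || key.contains("TOKEN")
           || key.contains("PASSWORD") || key.startsWith("AWS_");
       out.put(e.getKey(), sensitive ? "<redacted>" : e.getValue());
     }
     return out;
   }

   public static void main(String[] args) {
     System.out.println(redact(System.getenv()));
   }
 }
 {code}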



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3755) Log the command of launching containers

2015-06-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-3755:
-
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-522

 Log the command of launching containers
 ---

 Key: YARN-3755
 URL: https://issues.apache.org/jira/browse/YARN-3755
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: YARN-3755-1.patch, YARN-3755-2.patch


 In the ResourceManager log, YARN logs the command for launching the AM, which 
 is very useful. But there's no such log in the NM log for launching 
 containers, so it is difficult to diagnose when containers fail to launch due 
 to some issue in the commands. Although users can look at the commands in the 
 container launch script file, that is an internal detail of YARN which users 
 usually don't know about. From a user's perspective, they only know what 
 commands they specified when building the YARN application. 
 {code}
 2015-06-01 16:06:42,245 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command 
 to launch container container_1433145984561_0001_01_01 : 
 $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true 
 -Dhadoop.metrics.log.level=WARN  -Xmx1024m  
 -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
 -Dlog4j.configuration=tez-container-log4j.properties 
 -Dyarn.app.container.log.dir=<LOG_DIR> -Dtez.root.logger=info,CLA 
 -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster 
 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
 {code}
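 A minimal sketch of the kind of NM-side log being asked for (illustration 
 only; the class and container ID below are made up, and the exact insertion 
 point in the NM's container-launch path is an assumption):
 {code}
 import java.util.Arrays;
 import java.util.List;
 import java.util.logging.Logger;

 public class ContainerLaunchLogSketch {
   private static final Logger LOG =
       Logger.getLogger(ContainerLaunchLogSketch.class.getName());

   // Log the fully expanded launch commands on the NM side, mirroring what
   // the RM's AMLauncher already does for the AM container.
   static void logLaunchCommand(String containerId, List<String> commands) {
     LOG.info("Command to launch container " + containerId + " : "
         + String.join(" ", commands));
   }

   public static void main(String[] args) {
     logLaunchCommand("container_1433145984561_0001_01_000002",
         Arrays.asList("$JAVA_HOME/bin/java", "-Xmx1024m", "SomeAppMaster"));
   }
 }
 {code}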



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3747) TestLocalDirsHandlerService should delete the created test directory logDir2

2015-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577270#comment-14577270
 ] 

Hudson commented on YARN-3747:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #211 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/211/])
YARN-3747. TestLocalDirsHandlerService should delete the created test (devaraj: 
rev 126321eded7dc38c1eef2cfde9365404c924a5cb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
* hadoop-yarn-project/CHANGES.txt


 TestLocalDirsHandlerService should delete the created test directory logDir2
 

 Key: YARN-3747
 URL: https://issues.apache.org/jira/browse/YARN-3747
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.7.0
Reporter: David Moore
Assignee: David Moore
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3747.patch


 During a code review of 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java
  I noted that logDir2 is never deleted while logDir1 is deleted twice. This 
 is not in keeping with the rest of the function and appears to be a bug. 
 I will be submitting a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

