[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever

2015-05-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544988#comment-14544988
 ] 

Rohith commented on YARN-3646:
--

Setting RetryPolicies.RETRY_FOREVER as the default policy in exceptionToPolicyMap 
is not sufficient; {{RetryPolicies.RetryForever.shouldRetry()}} should also check 
for connect exceptions and handle them. Otherwise shouldRetry always returns the 
RetryAction.RETRY action.
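
A minimal sketch of that idea, assuming the shape of the 
{{org.apache.hadoop.io.retry.RetryPolicy}} interface in hadoop-common (this is not 
the committed fix): retry forever only on connection-level failures and fail fast 
otherwise, so exceptions such as ApplicationNotFoundException surface to the caller.

{code}
import java.net.ConnectException;
import java.net.NoRouteToHostException;

import org.apache.hadoop.io.retry.RetryPolicy;

public class RetryForeverOnConnectionFailure implements RetryPolicy {
  @Override
  public RetryAction shouldRetry(Exception e, int retries, int failovers,
      boolean isIdempotentOrAtMostOnce) throws Exception {
    if (e instanceof ConnectException || e instanceof NoRouteToHostException) {
      // Connection-level failure: keep retrying, as RETRY_FOREVER intends.
      return RetryAction.RETRY;
    }
    // Application-level exception (e.g. ApplicationNotFoundException): give up.
    return RetryAction.FAIL;
  }
}
{code}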

 Applications are getting stuck some times in case of retry policy forever
 -

 Key: YARN-3646
 URL: https://issues.apache.org/jira/browse/YARN-3646
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Raju Bairishetti

 We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER 
 retry policy.
 The YARN client retries infinitely on exceptions from the RM because it is 
 using the FOREVER retry policy. The problem is that it retries for all kinds 
 of exceptions (like ApplicationNotFoundException), even when the failure is 
 not a connection failure. Because of this, my application does not progress.
 *The YARN client should not retry infinitely for non-connection failures.*
 We have written a simple YARN client that tries to get an application 
 report for an invalid or old appId. The ResourceManager throws an 
 ApplicationNotFoundException because the appId is invalid or old. But 
 because of the FOREVER retry policy, the client keeps retrying to get the 
 application report and the ResourceManager keeps throwing 
 ApplicationNotFoundException.
 {code}
 private void testYarnClientRetryPolicy() throws Exception {
   YarnConfiguration conf = new YarnConfiguration();
   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
   YarnClient yarnClient = YarnClient.createYarnClient();
   yarnClient.init(conf);
   yarnClient.start();
   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
   ApplicationReport report = yarnClient.getApplicationReport(appId);
 }
 {code}
 *RM logs:*
 {noformat}
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875162 Retry#0
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1430126768987_10645' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875163 Retry#0
 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3653) Expose the scheduler's KPI in web UI

2015-05-15 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-3653:
--
Component/s: webapp

 Expose the scheduler's KPI in web UI
 

 Key: YARN-3653
 URL: https://issues.apache.org/jira/browse/YARN-3653
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: webapp
Reporter: Xianyin Xin

 As discussed in YARN-3630, exposing the scheduler's KPI in the web UI is very 
 useful for administrators to track the scheduler's performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.

2015-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545101#comment-14545101
 ] 

Hadoop QA commented on YARN-3583:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m  8s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 55s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m  1s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 36s | The patch appears to introduce 1 
new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | mapreduce tests | 108m 59s | Tests passed in 
hadoop-mapreduce-client-jobclient. |
| {color:green}+1{color} | yarn tests |   0m 27s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   7m  0s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  50m  7s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 212m 38s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS;
 locked 66% of time  Unsynchronized access at FileSystemRMStateStore.java:66% 
of time  Unsynchronized access at FileSystemRMStateStore.java:[line 156] |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733060/0002-YARN-3583.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cbc01ed |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/7949/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7949/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7949/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7949/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7949/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7949/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7949/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7949/console |


This message was automatically generated.

 Support of NodeLabel object instead of plain String in YarnClient side.
 ---

 Key: YARN-3583
 URL: https://issues.apache.org/jira/browse/YARN-3583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.6.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch


 Similar to YARN-3521, use NodeLabel objects in YarnClient side apis.
 getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of 
 using plain label name.
 This will help to bring other label details such as Exclusivity to client 
 side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3651) Tracking url in ApplicaitonCLI wrong for running application

2015-05-15 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-3651:
--

 Summary: Tracking url in ApplicaitonCLI wrong for running 
application
 Key: YARN-3651
 URL: https://issues.apache.org/jira/browse/YARN-3651
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt


Application URL in Application CLI wrong

Steps to reproduce
==
1. Start HA setup
2. Submit application to cluster
3. Execute command ./yarn application -list
4. Observe the tracking URL shown

{code}
15/05/15 13:34:38 INFO client.AHSProxy: Connecting to Application History 
server at /IP:45034
Total number of applications (application-types: [] and states: [SUBMITTED, 
ACCEPTED, RUNNING]):1
Application-Id --- Tracking-URL
application_1431672734347_0003   *http://host-10-19-92-117:13013*

{code}

*Expected*

http://IP:64323/proxy/application_1431672734347_0003 /





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications

2015-05-15 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545188#comment-14545188
 ] 

Xianyin Xin commented on YARN-3630:
---

Thanks, [~leftnoteasy] and [~kasha].
[~leftnoteasy], I think {{SchedulerMetrics}} is a good idea, but we should also 
consider a few things:
# {{SchedulerMetrics}} would track various indicators of a scheduler, but we also 
have {{QueueMetrics}}. To be precise, {{root.metrics}} already gives us most of 
the scheduler's information, so how do we deal with the relation between the two?
# The number of events waiting to be handled is an important indicator of the 
scheduler's load, but it is not owned by the scheduler; it is maintained by 
{{ResourceManager#SchedulerEventDispatcher}}. So who would maintain the 
{{SchedulerMetrics}}, the {{SchedulerEventDispatcher}} or the scheduler itself? 
Going by the name, {{SchedulerMetrics}} should be maintained by the scheduler.

Anyway, considering the web UI improvement you mentioned, a {{SchedulerMetrics}} 
is needed. I created YARN-3652 to discuss this.

Thanks [~kasha] for the valuable suggestions on the policy for determining the 
heartbeat interval; I'll work on a draft in the following days.

 YARN should suggest a heartbeat interval for applications
 -

 Key: YARN-3630
 URL: https://issues.apache.org/jira/browse/YARN-3630
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.7.0
Reporter: Zoltán Zvara
Assignee: Xianyin Xin
Priority: Minor

 It seems currently applications - for example Spark - are not adaptive to RM 
 regarding heartbeat intervals. RM should be able to suggest a desired 
 heartbeat interval to applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml

2015-05-15 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545097#comment-14545097
 ] 

Akira AJISAKA commented on YARN-3069:
-

Thanks [~rchiang] for the update. I looked at the patch from 
{{yarn.nodemanager.aux-services.mapreduce_shuffle.class}} to 
{{yarn.client.app-submission.poll-interval}}.

bq. {code}
<!-- Minicluster Configuration -->
{code}
I think it would be better for users to document that these configurations are 
only used for testing.

bq. yarn.minicluster.yarn.nodemanager.resource.memory-mb
Default value is 4096.

bq. yarn.node-labels.fs-store.retry-policy-spec
Retry policy used for the FileSystem node label store. The policy is specified by 
N pairs of sleep-time in milliseconds and number-of-retries: s1,n1,s2,n2,... 
The default value is 2000, 500. (I think the default number of retries is too 
high.)

bq. {code}
<description>
  URI for NodeLabelManager
</description>
{code}
Would you document that the default is local: 
{{/tmp/hadoop-yarn-$\{user\}/node-labels/}} in the description? It is described 
in {{FileSystemNodeLabelsStore#getDefaultFSNodeLabelsRootDir}}.

bq. yarn.node-labels.configuration-type
Set configuration type for node labels. Administrators can specify 
centralized or distributed.

bq. yarn.client.app-submission.poll-interval
Can we move this parameter to DeprecatedProperties.md?


 Document missing properties in yarn-default.xml
 ---

 Key: YARN-3069
 URL: https://issues.apache.org/jira/browse/YARN-3069
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: BB2015-05-TBR, supportability
 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, 
 YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, 
 YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch


 The following properties are currently not defined in yarn-default.xml.  
 These properties should either be
   A) documented in yarn-default.xml OR
   B)  listed as an exception (with comments, e.g. for internal use) in the 
 TestYarnConfigurationFields unit test
 Any comments for any of the properties below are welcome.
   org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker
   org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore
   security.applicationhistory.protocol.acl
   yarn.app.container.log.backups
   yarn.app.container.log.dir
   yarn.app.container.log.filesize
   yarn.client.app-submission.poll-interval
   yarn.client.application-client-protocol.poll-timeout-ms
   yarn.is.minicluster
   yarn.log.server.url
   yarn.minicluster.control-resource-monitoring
   yarn.minicluster.fixed.ports
   yarn.minicluster.use-rpc
   yarn.node-labels.fs-store.retry-policy-spec
   yarn.node-labels.fs-store.root-dir
   yarn.node-labels.manager-class
   yarn.nodemanager.container-executor.os.sched.priority.adjustment
   yarn.nodemanager.container-monitor.process-tree.class
   yarn.nodemanager.disk-health-checker.enable
   yarn.nodemanager.docker-container-executor.image-name
   yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms
   yarn.nodemanager.linux-container-executor.group
   yarn.nodemanager.log.deletion-threads-count
   yarn.nodemanager.user-home-dir
   yarn.nodemanager.webapp.https.address
   yarn.nodemanager.webapp.spnego-keytab-file
   yarn.nodemanager.webapp.spnego-principal
   yarn.nodemanager.windows-secure-container-executor.group
   yarn.resourcemanager.configuration.file-system-based-store
   yarn.resourcemanager.delegation-token-renewer.thread-count
   yarn.resourcemanager.delegation.key.update-interval
   yarn.resourcemanager.delegation.token.max-lifetime
   yarn.resourcemanager.delegation.token.renew-interval
   yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size
   yarn.resourcemanager.metrics.runtime.buckets
   yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs
   yarn.resourcemanager.reservation-system.class
   yarn.resourcemanager.reservation-system.enable
   yarn.resourcemanager.reservation-system.plan.follower
   yarn.resourcemanager.reservation-system.planfollower.time-step
   yarn.resourcemanager.rm.container-allocation.expiry-interval-ms
   yarn.resourcemanager.webapp.spnego-keytab-file
   yarn.resourcemanager.webapp.spnego-principal
   yarn.scheduler.include-port-in-node-name
   yarn.timeline-service.delegation.key.update-interval
   yarn.timeline-service.delegation.token.max-lifetime
   yarn.timeline-service.delegation.token.renew-interval
   yarn.timeline-service.generic-application-history.enabled
   
 yarn.timeline-service.generic-application-history.fs-history-store.compression-type
   yarn.timeline-service.generic-application-history.fs-history-store.uri
   

[jira] [Updated] (YARN-3651) Tracking url in ApplicationCLI wrong for running application

2015-05-15 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3651:
---
Summary: Tracking url in ApplicationCLI wrong for running application  
(was: Tracking url in ApplicaitonCLI wrong for running application)

 Tracking url in ApplicationCLI wrong for running application
 

 Key: YARN-3651
 URL: https://issues.apache.org/jira/browse/YARN-3651
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt

 Application URL in Application CLI wrong
 Steps to reproduce
 ==
 1. Start HA setup
 2. Submit application to cluster
 3. Execute command ./yarn application -list
 4. Observe the tracking URL shown
 {code}
 15/05/15 13:34:38 INFO client.AHSProxy: Connecting to Application History 
 server at /IP:45034
 Total number of applications (application-types: [] and states: [SUBMITTED, 
 ACCEPTED, RUNNING]):1
 Application-Id --- Tracking-URL
 application_1431672734347_0003   *http://host-10-19-92-117:13013*
 {code}
 *Expected*
 http://IP:64323/proxy/application_1431672734347_0003 /



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3547) FairScheduler: Apps that have no resource demand should not participate scheduling

2015-05-15 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-3547:
--
Attachment: YARN-3547.005.patch

A patch using {{getDemand() - getResourceUsage()}}.
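
As a rough illustration of that approach (a hypothetical helper, not the attached 
patch; it assumes {{Resources.subtract}} from 
{{org.apache.hadoop.yarn.util.resource.Resources}} and the 2.x 
{{Resource#getMemory()}} API), an app would only participate in scheduling when 
its demand minus current usage is still positive:

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class PendingDemandCheck {
  private PendingDemandCheck() {
  }

  /** True if the app still has unmet demand in memory or vcores. */
  public static boolean hasPendingDemand(Resource demand, Resource usage) {
    Resource pending = Resources.subtract(demand, usage);
    return pending.getMemory() > 0 || pending.getVirtualCores() > 0;
  }
}
{code}

A scheduler loop could skip apps for which this returns false before sorting them 
and offering them resources.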

 FairScheduler: Apps that have no resource demand should not participate 
 scheduling
 --

 Key: YARN-3547
 URL: https://issues.apache.org/jira/browse/YARN-3547
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Xianyin Xin
Assignee: Xianyin Xin
 Attachments: YARN-3547.001.patch, YARN-3547.002.patch, 
 YARN-3547.003.patch, YARN-3547.004.patch, YARN-3547.005.patch


 At present, all of the 'running' apps participate in the scheduling process; 
 however, most of them may have no resource demand on a production cluster, 
 since an app's status is 'running' rather than 'waiting for resources' for 
 most of its lifetime. It is not wise to sort all of the 'running' apps and 
 try to fulfill them, especially on a large-scale cluster with a heavy 
 scheduling load.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications

2015-05-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545218#comment-14545218
 ] 

Sunil G commented on YARN-3630:
---

bq.Are we considering automatically slowing down the NM heartbeats as well? 

 YARN should suggest a heartbeat interval for applications
 -

 Key: YARN-3630
 URL: https://issues.apache.org/jira/browse/YARN-3630
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.7.0
Reporter: Zoltán Zvara
Assignee: Xianyin Xin
Priority: Minor

 It seems currently applications - for example Spark - are not adaptive to RM 
 regarding heartbeat intervals. RM should be able to suggest a desired 
 heartbeat interval to applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.

2015-05-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545131#comment-14545131
 ] 

Sunil G commented on YARN-3583:
---

bq.Inconsistent synchronization of 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS;
 locked 66% of time
This doesn't look related to this patch.

 Support of NodeLabel object instead of plain String in YarnClient side.
 ---

 Key: YARN-3583
 URL: https://issues.apache.org/jira/browse/YARN-3583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.6.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch


 Similar to YARN-3521, use NodeLabel objects in YarnClient side apis.
 getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of 
 using plain label name.
 This will help to bring other label details such as Exclusivity to client 
 side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3547) FairScheduler: Apps that have no resource demand should not participate scheduling

2015-05-15 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544995#comment-14544995
 ] 

Xianyin Xin commented on YARN-3547:
---

[~kasha], can you please have a look?

 FairScheduler: Apps that have no resource demand should not participate 
 scheduling
 --

 Key: YARN-3547
 URL: https://issues.apache.org/jira/browse/YARN-3547
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Xianyin Xin
Assignee: Xianyin Xin
 Attachments: YARN-3547.001.patch, YARN-3547.002.patch, 
 YARN-3547.003.patch, YARN-3547.004.patch, YARN-3547.005.patch


 At present, all of the 'running' apps participate in the scheduling process; 
 however, most of them may have no resource demand on a production cluster, 
 since an app's status is 'running' rather than 'waiting for resources' for 
 most of its lifetime. It is not wise to sort all of the 'running' apps and 
 try to fulfill them, especially on a large-scale cluster with a heavy 
 scheduling load.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3652) A SchedulerMetrics may be need for evaluating the scheduler's performance

2015-05-15 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-3652:
--
Summary: A SchedulerMetrics may be need for evaluating the scheduler's 
performance  (was: A {{SchedulerMetrics}} may be need for evaluating the 
scheduler's performance)

 A SchedulerMetrics may be need for evaluating the scheduler's performance
 -

 Key: YARN-3652
 URL: https://issues.apache.org/jira/browse/YARN-3652
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Xianyin Xin

 As discussed in YARN-3630, a {{SchedulerMetrics}} may be needed for evaluating 
 the scheduler's performance. The performance indicators include the number of 
 events waiting to be handled by the scheduler, the throughput, the scheduling 
 delay, and/or other indicators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3652) A {{SchedulerMetrics}} may be need for evaluating the scheduler's performance

2015-05-15 Thread Xianyin Xin (JIRA)
Xianyin Xin created YARN-3652:
-

 Summary: A {{SchedulerMetrics}} may be need for evaluating the 
scheduler's performance
 Key: YARN-3652
 URL: https://issues.apache.org/jira/browse/YARN-3652
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Xianyin Xin


As discussed in YARN-3630, a {{SchedulerMetrics}} may be needed for evaluating 
the scheduler's performance. The performance indicators include the number of 
events waiting to be handled by the scheduler, the throughput, the scheduling 
delay, and/or other indicators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications

2015-05-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545219#comment-14545219
 ] 

Sunil G commented on YARN-3630:
---

bq.Are we considering automatically slowing down the NM heartbeats as well? 

 YARN should suggest a heartbeat interval for applications
 -

 Key: YARN-3630
 URL: https://issues.apache.org/jira/browse/YARN-3630
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.7.0
Reporter: Zoltán Zvara
Assignee: Xianyin Xin
Priority: Minor

 It seems currently applications - for example Spark - are not adaptive to RM 
 regarding heartbeat intervals. RM should be able to suggest a desired 
 heartbeat interval to applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3653) Expose the scheduler's KPI in web UI

2015-05-15 Thread Xianyin Xin (JIRA)
Xianyin Xin created YARN-3653:
-

 Summary: Expose the scheduler's KPI in web UI
 Key: YARN-3653
 URL: https://issues.apache.org/jira/browse/YARN-3653
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Xianyin Xin


As discussed in YARN-3630, exposing the scheduler's KPI in the web UI is very 
useful for administrators to track the scheduler's performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3651) Tracking url in ApplicationCLI wrong for running application

2015-05-15 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3651:
---
Description: 
Application URL in Application CLI wrong

Steps to reproduce
==
1. Start HA setup in insecure mode
2. Configure HTTPS_ONLY
3. Submit application to cluster
4. Execute command ./yarn application -list
5. Observe the tracking URL shown

{code}
15/05/15 13:34:38 INFO client.AHSProxy: Connecting to Application History 
server at /IP:45034
Total number of applications (application-types: [] and states: [SUBMITTED, 
ACCEPTED, RUNNING]):1
Application-Id --- Tracking-URL
application_1431672734347_0003   *http://host-10-19-92-117:13013*
{code}



*Expected*

https://IP:64323/proxy/application_1431672734347_0003 /



  was:
Application URL in Application CLI wrong

Steps to reproduce
==
1. Start HA setup
2. Submit application to cluster
3. Execute command ./yarn application -list
4. Observe the tracking URL shown

{code}
15/05/15 13:34:38 INFO client.AHSProxy: Connecting to Application History 
server at /IP:45034
Total number of applications (application-types: [] and states: [SUBMITTED, 
ACCEPTED, RUNNING]):1
Application-Id --- Tracking-URL
application_1431672734347_0003   *http://host-10-19-92-117:13013*

{code}

*Expected*

http://IP:64323/proxy/application_1431672734347_0003 /



   Priority: Minor  (was: Major)

 Tracking url in ApplicationCLI wrong for running application
 

 Key: YARN-3651
 URL: https://issues.apache.org/jira/browse/YARN-3651
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Priority: Minor

 Application URL in Application CLI wrong
 Steps to reproduce
 ==
 1. Start HA setup in insecure mode
 2. Configure HTTPS_ONLY
 3. Submit application to cluster
 4. Execute command ./yarn application -list
 5. Observe the tracking URL shown
 {code}
 15/05/15 13:34:38 INFO client.AHSProxy: Connecting to Application History 
 server at /IP:45034
 Total number of applications (application-types: [] and states: [SUBMITTED, 
 ACCEPTED, RUNNING]):1
 Application-Id --- Tracking-URL
 application_1431672734347_0003   *http://host-10-19-92-117:13013*
 {code}
 *Expected*
 https://IP:64323/proxy/application_1431672734347_0003 /



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications

2015-05-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545225#comment-14545225
 ] 

Sunil G commented on YARN-3630:
---

bq.Are we considering automatically slowing down the NM heartbeats as well? 
+1. It would be wise to slow down the heartbeats from NMs that are under a 
lighter load. However, there should be a limit or range to which they can be 
slowed down, even under light load. Otherwise I feel more starvation can happen 
for applications.

 YARN should suggest a heartbeat interval for applications
 -

 Key: YARN-3630
 URL: https://issues.apache.org/jira/browse/YARN-3630
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.7.0
Reporter: Zoltán Zvara
Assignee: Xianyin Xin
Priority: Minor

 It seems currently applications - for example Spark - are not adaptive to RM 
 regarding heartbeat intervals. RM should be able to suggest a desired 
 heartbeat interval to applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever

2015-05-15 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545264#comment-14545264
 ] 

Devaraj K commented on YARN-3646:
-

You can probably avoid this situation by setting a bigger value for 
yarn.resourcemanager.connect.max-wait.ms (like below) if you want to wait a 
long time to establish a connection to the RM with retries.

{code}
conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS,
    Integer.MAX_VALUE);
{code}

Anyway, it seems this issue needs to be fixed.

 Applications are getting stuck some times in case of retry policy forever
 -

 Key: YARN-3646
 URL: https://issues.apache.org/jira/browse/YARN-3646
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Raju Bairishetti

 We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER 
 retry policy.
 The YARN client retries infinitely on exceptions from the RM because it is 
 using the FOREVER retry policy. The problem is that it retries for all kinds 
 of exceptions (like ApplicationNotFoundException), even when the failure is 
 not a connection failure. Because of this, my application does not progress.
 *The YARN client should not retry infinitely for non-connection failures.*
 We have written a simple YARN client that tries to get an application 
 report for an invalid or old appId. The ResourceManager throws an 
 ApplicationNotFoundException because the appId is invalid or old. But 
 because of the FOREVER retry policy, the client keeps retrying to get the 
 application report and the ResourceManager keeps throwing 
 ApplicationNotFoundException.
 {code}
 private void testYarnClientRetryPolicy() throws Exception {
   YarnConfiguration conf = new YarnConfiguration();
   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
   YarnClient yarnClient = YarnClient.createYarnClient();
   yarnClient.init(conf);
   yarnClient.start();
   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
   ApplicationReport report = yarnClient.getApplicationReport(appId);
 }
 {code}
 *RM logs:*
 {noformat}
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875162 Retry#0
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1430126768987_10645' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875163 Retry#0
 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3651) Tracking url in ApplicationCLI wrong for running application

2015-05-15 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545353#comment-14545353
 ] 

Bibin A Chundatt commented on YARN-3651:


Also, when HTTPS_ONLY is configured, why is the HTTP port opened?
Hi all, any comments on the same?




 Tracking url in ApplicationCLI wrong for running application
 

 Key: YARN-3651
 URL: https://issues.apache.org/jira/browse/YARN-3651
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Priority: Minor

 Application URL in Application CLI wrong
 Steps to reproduce
 ==
 1. Start HA setup in insecure mode
 2. Configure HTTPS_ONLY
 3. Submit application to cluster
 4. Execute command ./yarn application -list
 5. Observe the tracking URL shown
 {code}
 15/05/15 13:34:38 INFO client.AHSProxy: Connecting to Application History 
 server at /IP:45034
 Total number of applications (application-types: [] and states: [SUBMITTED, 
 ACCEPTED, RUNNING]):1
 Application-Id --- Tracking-URL
 application_1431672734347_0003   *http://host-10-19-92-117:13013*
 {code}
 *Expected*
 https://IP:64323/proxy/application_1431672734347_0003 /



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3652) A SchedulerMetrics may be need for evaluating the scheduler's performance

2015-05-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545260#comment-14545260
 ] 

Sunil G commented on YARN-3652:
---

Hi [~xinxianyin],
This will be a very helpful feature; thanks for working on it.

A few points:
1. *Throughput*: Are you referring to the number of events processed over a 
period of time? If so, how do we set the window over which throughput is 
calculated (configurable?)?
A clear indicator from this would be that we can predict a likely completion 
time for the pending events in the dispatcher queue. Combining throughput with 
the number of pending events may give a much better indication of RM overload.

2. Since many kinds of events come to the scheduler, a filter on event type may 
be helpful, if possible, to make the throughput and scheduling-delay figures 
more accurate.
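
To make the indicators discussed above concrete, here is a small, hypothetical 
plain-Java sketch (the {{SimpleSchedulerMetrics}} name and API are made up and 
not tied to Hadoop's metrics2 framework) of tracking pending scheduler events 
plus throughput over a configurable window:

{code}
import java.util.concurrent.atomic.AtomicLong;

public class SimpleSchedulerMetrics {
  private final AtomicLong pendingEvents = new AtomicLong();
  private final AtomicLong handledEvents = new AtomicLong();
  private final long windowMs;                  // configurable throughput window
  private long windowStart = System.currentTimeMillis();
  private long handledAtWindowStart = 0;

  public SimpleSchedulerMetrics(long windowMs) {
    this.windowMs = windowMs;
  }

  public void eventQueued() {                   // called by the event dispatcher
    pendingEvents.incrementAndGet();
  }

  public void eventHandled() {                  // called when the scheduler finishes an event
    pendingEvents.decrementAndGet();
    handledEvents.incrementAndGet();
  }

  public long pending() {
    return pendingEvents.get();
  }

  /** Events handled per second since the current window started; rolls the window when it expires. */
  public synchronized double throughputPerSecond() {
    long now = System.currentTimeMillis();
    long handled = handledEvents.get();
    double rate = (handled - handledAtWindowStart) * 1000.0 / Math.max(1, now - windowStart);
    if (now - windowStart >= windowMs) {
      windowStart = now;
      handledAtWindowStart = handled;
    }
    return rate;
  }
}
{code}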

 A SchedulerMetrics may be need for evaluating the scheduler's performance
 -

 Key: YARN-3652
 URL: https://issues.apache.org/jira/browse/YARN-3652
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Xianyin Xin

 As discussed in YARN-3630, a {{SchedulerMetrics}} may be needed for evaluating 
 the scheduler's performance. The performance indicators include the number of 
 events waiting to be handled by the scheduler, the throughput, the scheduling 
 delay, and/or other indicators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1519) check if sysconf is implemented before using it

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545392#comment-14545392
 ] 

Hudson commented on YARN-1519:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #928 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/928/])
YARN-1519. Check in container-executor if sysconf is implemented before using 
it (Radim Kolar and Eric Payne via raviprak) (raviprak: rev 
53fe4eff09fdaeed75a8cad3a26156bf963a8d37)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


 check if sysconf is implemented before using it
 ---

 Key: YARN-1519
 URL: https://issues.apache.org/jira/browse/YARN-1519
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.3.0
Reporter: Radim Kolar
Assignee: Radim Kolar
  Labels: BB2015-05-TBR
 Fix For: 2.8.0

 Attachments: YARN-1519.002.patch, YARN-1519.003.patch, 
 nodemgr-sysconf.txt


 If the sysconf value _SC_GETPW_R_SIZE_MAX is not implemented, it leads to a 
 segfault because an invalid pointer gets passed to a libc function.
 Fix: enforce a minimum value of 1024; the same method is used in the 
 hadoop-common native code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545410#comment-14545410
 ] 

Hudson commented on YARN-3505:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #197 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/197/])
YARN-3505. Node's Log Aggregation Report with SUCCEED should not cached in 
RMApps. Contributed by Xuan Gong. (junping_du: rev 
15ccd967ee3e7046a50522089f67ba01f36ec76a)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppLogAggregationStatusBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/LogAggregationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/LogAggregationReportPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, 
 YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch


 Per the discussion in YARN-1402, we shouldn't cache every node's log aggregation 
 report in RMApps forever, especially for those finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1519) check if sysconf is implemented before using it

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545415#comment-14545415
 ] 

Hudson commented on YARN-1519:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #197 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/197/])
YARN-1519. Check in container-executor if sysconf is implemented before using 
it (Radim Kolar and Eric Payne via raviprak) (raviprak: rev 
53fe4eff09fdaeed75a8cad3a26156bf963a8d37)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


 check if sysconf is implemented before using it
 ---

 Key: YARN-1519
 URL: https://issues.apache.org/jira/browse/YARN-1519
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.3.0
Reporter: Radim Kolar
Assignee: Radim Kolar
  Labels: BB2015-05-TBR
 Fix For: 2.8.0

 Attachments: YARN-1519.002.patch, YARN-1519.003.patch, 
 nodemgr-sysconf.txt


 If the sysconf value _SC_GETPW_R_SIZE_MAX is not implemented, it leads to a 
 segfault because an invalid pointer gets passed to a libc function.
 Fix: enforce a minimum value of 1024; the same method is used in the 
 hadoop-common native code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-05-15 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545250#comment-14545250
 ] 

Lavkesh Lahngir commented on YARN-3591:
---

What about the zombie files lying in the various paths? In the case of the disk 
becoming good again, they will be there forever. Do we not care?
Also, I was thinking of removing only resources that have public and user-level 
visibility, because app-level resources will be deleted automatically. Thoughts?

 Resource Localisation on a bad disk causes subsequent containers failure 
 -

 Key: YARN-3591
 URL: https://issues.apache.org/jira/browse/YARN-3591
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
 Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
 YARN-3591.2.patch


 It happens when a resource is localised on a disk and, after localising, that 
 disk has gone bad. The NM keeps paths for localised resources in memory. At the 
 time of a resource request, isResourcePresent(rsrc) will be called, which calls 
 file.exists() on the localised path.
 In some cases when the disk has gone bad, inodes are still cached and 
 file.exists() returns true. But at the time of reading, the file will not open.
 Note: file.exists() actually calls stat64 natively, which returns true because 
 it was able to find inode information from the OS.
 A proposal is to call file.list() on the parent path of the resource, which 
 will call open() natively. If the disk is good it should return an array of 
 paths with length at least 1.
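 A rough sketch of that proposal (a hypothetical helper, not the attached patch): 
 treat the resource as present only if the parent directory can actually be 
 listed, which forces a native open() instead of trusting {{file.exists()}}:
 {code}
 import java.io.File;

 public final class LocalResourceCheck {
   private LocalResourceCheck() {
   }

   /** Present only if the parent directory can actually be read from the disk. */
   public static boolean isResourcePresent(File localizedPath) {
     File parent = localizedPath.getParentFile();
     if (parent == null) {
       return false;
     }
     // File.list() opens the directory natively and returns null when the read
     // fails, unlike File.exists(), which may be satisfied by a cached inode on
     // a bad disk.
     String[] entries = parent.list();
     return entries != null && entries.length >= 1;
   }
 }
 {code}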



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545387#comment-14545387
 ] 

Hudson commented on YARN-3505:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #928 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/928/])
YARN-3505. Node's Log Aggregation Report with SUCCEED should not cached in 
RMApps. Contributed by Xuan Gong. (junping_du: rev 
15ccd967ee3e7046a50522089f67ba01f36ec76a)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/LogAggregationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppLogAggregationStatusBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/LogAggregationReportPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto


 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, 
 YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch


 Per the discussion in YARN-1402, we shouldn't cache every node's log aggregation 
 report in RMApps forever, especially for those finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3605) _ as method name may not be supported much longer

2015-05-15 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated YARN-3605:
--
Labels: newbie  (was: )

 _ as method name may not be supported much longer
 -

 Key: YARN-3605
 URL: https://issues.apache.org/jira/browse/YARN-3605
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Robert Joseph Evans
  Labels: newbie

 I was trying to run the precommit test on my Mac under JDK 8, and I got the 
 following error related to javadocs:
 (use of '_' as an identifier might not be supported in releases after Java 
 SE 8)
 It looks like we need to at least change the method name so that it is not '_' 
 any more, or possibly replace the HTML generation with something more 
 standard.
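 A tiny, hypothetical illustration of the warning (the class and method names 
 here are made up; this is not the actual Hamlet code):
 {code}
 public class UnderscoreName {
   void _() {        // javac 8 warns: '_' may not be supported after Java SE 8
   }

   void endTag() {   // one possible rename
   }
 }
 {code}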



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545471#comment-14545471
 ] 

Hudson commented on YARN-3505:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7841 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7841/])
YARN-3505 addendum: fix an issue in previous patch. (junping_du: rev 
03a293aed6de101b0cae1a294f506903addcaa75)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, 
 YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch


 Per the discussion in YARN-1402, we shouldn't cache every node's log aggregation 
 report in RMApps forever, especially for those finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2015-05-15 Thread john lilley (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545486#comment-14545486
 ] 

john lilley commented on YARN-624:
--

I would like to +1 this feature and illustrate our use cases. Currently there 
are two:
-- Finding strongly-connected subgraphs. This is a central step in 
data-quality/matching applications, because after record-matching is performed 
in a distributed fashion, the match pairs (edges) must be turned into match 
groups (subgraphs). It is very inefficient to process this with a traditional 
independent-task YARN model.
-- Machine-learning model training. There are many models that lend themselves 
to distributed processing, and even those that don't can benefit from a 
parallel genetic algorithm that competes multiple models and topologies against 
each other.

In both these cases we are considering a custom AM that runs like this (a rough 
sketch follows below):
-- Asks for M containers.
-- Accepts as few as N containers, but only after not getting M for some period 
of time (heuristics TBD).
-- Possibly, after getting a non-zero number but fewer than N containers for 
some time, releases them all, sleeps a while, and tries again (deadlock 
avoidance).

This algorithm would be much better run by the RM, because it can:
-- Immediately fail the AM if N containers are impossible.
-- Avoid idle, incomplete sets of containers while waiting for a sufficient gang.
-- Avoid deadlock.
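
A rough, abstract sketch of that AM-side loop (the {{GangAllocator}} interface 
and all names here are hypothetical; this is not the AMRMClient API):

{code}
interface GangAllocator {
  void request(int containers);   // ask the RM for the desired gang size
  int allocatedCount();           // containers received so far
  void releaseAll();              // hand back a partial gang
}

public class GangWaitLoop {
  /** Returns true if at least the minimum gang was assembled before giving up. */
  public static boolean waitForGang(GangAllocator alloc, int desired, int minimum,
      long deadlineMs, long pollMs) throws InterruptedException {
    alloc.request(desired);
    long start = System.currentTimeMillis();
    while (System.currentTimeMillis() - start < deadlineMs) {
      if (alloc.allocatedCount() >= desired) {
        return true;                         // full gang arrived
      }
      Thread.sleep(pollMs);
    }
    if (alloc.allocatedCount() >= minimum) {
      return true;                           // settle for the smaller gang after the deadline
    }
    alloc.releaseAll();                      // deadlock avoidance: back off and retry later
    return false;
  }
}
{code}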

 Support gang scheduling in the AM RM protocol
 -

 Key: YARN-624
 URL: https://issues.apache.org/jira/browse/YARN-624
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 Per discussion on YARN-392 and elsewhere, gang scheduling, in which a 
 scheduler runs a set of tasks when they can all be run at the same time, 
 would be a useful feature for YARN schedulers to support.
 Currently, AMs can approximate this by holding on to containers until they 
 get all the ones they need.  However, this lends itself to deadlocks when 
 different AMs are waiting on the same containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever

2015-05-15 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545548#comment-14545548
 ] 

Srikanth Sundarrajan commented on YARN-3646:


{quote}
You can probably avoid this situation by setting a bigger value
{quote}

Would this not cause the client to wait too long (well after the RM has come 
back online)?

 Applications are getting stuck some times in case of retry policy forever
 -

 Key: YARN-3646
 URL: https://issues.apache.org/jira/browse/YARN-3646
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Raju Bairishetti

 We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER 
 retry policy.
 The YARN client retries infinitely on exceptions from the RM because it is 
 using the FOREVER retry policy. The problem is that it retries for all kinds 
 of exceptions (like ApplicationNotFoundException), even when the failure is 
 not a connection failure. Because of this, my application does not progress.
 *The YARN client should not retry infinitely for non-connection failures.*
 We have written a simple YARN client that tries to get an application 
 report for an invalid or old appId. The ResourceManager throws an 
 ApplicationNotFoundException because the appId is invalid or old. But 
 because of the FOREVER retry policy, the client keeps retrying to get the 
 application report and the ResourceManager keeps throwing 
 ApplicationNotFoundException.
 {code}
 private void testYarnClientRetryPolicy() throws Exception {
   YarnConfiguration conf = new YarnConfiguration();
   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
   YarnClient yarnClient = YarnClient.createYarnClient();
   yarnClient.init(conf);
   yarnClient.start();
   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
   ApplicationReport report = yarnClient.getApplicationReport(appId);
 }
 {code}
 *RM logs:*
 {noformat}
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875162 Retry#0
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1430126768987_10645' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875163 Retry#0
 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545710#comment-14545710
 ] 

Hudson commented on YARN-3505:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #196 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/196/])
YARN-3505. Node's Log Aggregation Report with SUCCEED should not cached in 
RMApps. Contributed by Xuan Gong. (junping_du: rev 
15ccd967ee3e7046a50522089f67ba01f36ec76a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppLogAggregationStatusBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/LogAggregationReportPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/LogAggregationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
YARN-3505 addendum: fix an issue in previous patch. (junping_du: rev 
03a293aed6de101b0cae1a294f506903addcaa75)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, 
 YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch


 Per discussions in YARN-1402, we shouldn't cache all nodes' log aggregation 
 reports in RMApps forever, especially for those finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1519) check if sysconf is implemented before using it

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545715#comment-14545715
 ] 

Hudson commented on YARN-1519:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #196 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/196/])
YARN-1519. Check in container-executor if sysconf is implemented before using 
it (Radim Kolar and Eric Payne via raviprak) (raviprak: rev 
53fe4eff09fdaeed75a8cad3a26156bf963a8d37)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


 check if sysconf is implemented before using it
 ---

 Key: YARN-1519
 URL: https://issues.apache.org/jira/browse/YARN-1519
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.3.0
Reporter: Radim Kolar
Assignee: Radim Kolar
  Labels: BB2015-05-TBR
 Fix For: 2.8.0

 Attachments: YARN-1519.002.patch, YARN-1519.003.patch, 
 nodemgr-sysconf.txt


 If sysconf value _SC_GETPW_R_SIZE_MAX is not implemented, it leads to 
 segfault because invalid pointer gets passed to libc function.
 fix: enforce minimum value 1024, same method is used in hadoop-common native 
 code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever

2015-05-15 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545702#comment-14545702
 ] 

Devaraj K commented on YARN-3646:
-

bq. Would this not cause the client to wait for too long (well after the RM has 
come back online)?
yarn.resourcemanager.connect.max-wait.ms is the maximum time to wait to establish 
a connection to the RM; if the RM comes back online before this time, the client 
connects immediately. The IPC client internally retries connecting to the RM every 
yarn.resourcemanager.connect.retry-interval.ms (default value 30 * 1000 ms), and an 
exception is thrown only if it cannot connect within 
yarn.resourcemanager.connect.max-wait.ms.
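
For reference, a minimal sketch of the finite-wait alternative to -1 (the max-wait 
constant is the one already used in the snippet quoted below; the retry-interval 
key is set by its property name, and the concrete values are only examples):

{code}
YarnConfiguration conf = new YarnConfiguration();
// Stop retrying after 15 minutes instead of forever ...
conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, 15 * 60 * 1000);
// ... while still probing the RM every 30 seconds, so the client reconnects
// almost as soon as the RM is back, well before max-wait expires.
conf.setLong("yarn.resourcemanager.connect.retry-interval.ms", 30 * 1000);
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(conf);
yarnClient.start();
{code}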

 Applications are getting stuck some times in case of retry policy forever
 -

 Key: YARN-3646
 URL: https://issues.apache.org/jira/browse/YARN-3646
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Raju Bairishetti

 We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER 
 retry policy.
 The Yarn client retries infinitely in case of exceptions from the RM, as it is 
 using the FOREVER retry policy. The problem is that it retries for all kinds 
 of exceptions (like ApplicationNotFoundException), even though they are not 
 connection failures. Due to this my application is not progressing further.
 *The Yarn client should not retry infinitely in case of non-connection failures.*
 We have written a simple yarn-client which tries to get an application 
 report for an invalid or older appId. The ResourceManager throws an 
 ApplicationNotFoundException as this is an invalid or older appId.  But 
 because of the FOREVER retry policy, the client keeps retrying to get the 
 application report and the ResourceManager keeps throwing 
 ApplicationNotFoundException.
 {code}
 private void testYarnClientRetryPolicy() throws Exception {
   YarnConfiguration conf = new YarnConfiguration();
   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
   YarnClient yarnClient = YarnClient.createYarnClient();
   yarnClient.init(conf);
   yarnClient.start();
   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
   ApplicationReport report = yarnClient.getApplicationReport(appId);
 }
 {code}
 *RM logs:*
 {noformat}
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875162 Retry#0
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1430126768987_10645' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875163 Retry#0
 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1519) check if sysconf is implemented before using it

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545732#comment-14545732
 ] 

Hudson commented on YARN-1519:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2144 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2144/])
YARN-1519. Check in container-executor if sysconf is implemented before using 
it (Radim Kolar and Eric Payne via raviprak) (raviprak: rev 
53fe4eff09fdaeed75a8cad3a26156bf963a8d37)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* hadoop-yarn-project/CHANGES.txt


 check if sysconf is implemented before using it
 ---

 Key: YARN-1519
 URL: https://issues.apache.org/jira/browse/YARN-1519
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.3.0
Reporter: Radim Kolar
Assignee: Radim Kolar
  Labels: BB2015-05-TBR
 Fix For: 2.8.0

 Attachments: YARN-1519.002.patch, YARN-1519.003.patch, 
 nodemgr-sysconf.txt


 If sysconf value _SC_GETPW_R_SIZE_MAX is not implemented, it leads to 
 segfault because invalid pointer gets passed to libc function.
 fix: enforce minimum value 1024, same method is used in hadoop-common native 
 code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545727#comment-14545727
 ] 

Hudson commented on YARN-3505:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2144 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2144/])
YARN-3505. Node's Log Aggregation Report with SUCCEED should not cached in 
RMApps. Contributed by Xuan Gong. (junping_du: rev 
15ccd967ee3e7046a50522089f67ba01f36ec76a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/LogAggregationReportPBImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppLogAggregationStatusBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/LogAggregationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
YARN-3505 addendum: fix an issue in previous patch. (junping_du: rev 
03a293aed6de101b0cae1a294f506903addcaa75)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, 
 YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch


 Per discussions in YARN-1402, we shouldn't cache all nodes' log aggregation 
 reports in RMApps forever, especially for those finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2421) RM still allocates containers to an app in the FINISHING state

2015-05-15 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2421:
-
Summary: RM still allocates containers to an app in the FINISHING state  
(was: CapacityScheduler still allocates containers to an app in the FINISHING 
state)

 RM still allocates containers to an app in the FINISHING state
 --

 Key: YARN-2421
 URL: https://issues.apache.org/jira/browse/YARN-2421
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Thomas Graves
Assignee: Chang Li
 Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, 
 YARN-2421.7.patch, YARN-2421.8.patch, YARN-2421.9.patch, yarn2421.patch, 
 yarn2421.patch, yarn2421.patch


 I saw an instance of a bad application master where it unregistered with the 
 RM but then continued to call into allocate.  The RMAppAttempt went to the 
 FINISHING state, but the capacity scheduler kept allocating it containers.   
 We should probably have the capacity scheduler check that the application 
 isn't in one of the terminal states before giving it containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3609) Move load labels from storage from serviceInit to serviceStart to make it works with RM HA case.

2015-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546262#comment-14546262
 ] 

Hadoop QA commented on YARN-3609:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 58s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 45s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 38s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 3  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   2m 44s | The patch appears to introduce 1 
new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  50m 25s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  91m 45s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS;
 locked 66% of time  Unsynchronized access at FileSystemRMStateStore.java:66% 
of time  Unsynchronized access at FileSystemRMStateStore.java:[line 156] |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733214/YARN-3609.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 03a293a |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/7951/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/7951/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7951/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7951/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7951/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7951/console |


This message was automatically generated.

 Move load labels from storage from serviceInit to serviceStart to make it 
 works with RM HA case.
 

 Key: YARN-3609
 URL: https://issues.apache.org/jira/browse/YARN-3609
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3609.1.preliminary.patch, YARN-3609.2.patch, 
 YARN-3609.3.patch


 Now RMNodeLabelsManager loads labels in serviceInit, but 
 RMActiveService.start() is what gets called when an RM HA transition happens.
 We haven't done this before because queue initialization happens in 
 serviceInit as well and we need to make sure labels are added to the system 
 before the queues are initialized; after YARN-2918, we should be able to do this.
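 To illustrate the move being described (a hypothetical sketch, not the attached 
 patch; the class and the loadLabelsFromStore() helper stand in for the real code):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.service.AbstractService;

 public class LabelLoadingSketch extends AbstractService {
   public LabelLoadingSketch() {
     super("LabelLoadingSketch");
   }

   @Override
   protected void serviceInit(Configuration conf) throws Exception {
     // Keep init cheap; the label store is no longer touched here.
     super.serviceInit(conf);
   }

   @Override
   protected void serviceStart() throws Exception {
     // Load labels here: this is tied to RMActiveService.start(), and thus to
     // the HA transition to active, rather than to construction-time init.
     loadLabelsFromStore();
     super.serviceStart();
   }

   private void loadLabelsFromStore() {
     // placeholder for the real label recovery logic
   }
 }
 {code}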



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-15 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546374#comment-14546374
 ] 

Craig Welch commented on YARN-3626:
---

Right, going back to [~cnauroth] and [~vinodkv]: we chatted and you asserted that 
the original approach can't work, but it seemed to work, and it's not entirely 
clear to me why it shouldn't...

 On Windows localized resources are not moved to the front of the classpath 
 when they should be
 --

 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.7.1

 Attachments: YARN-3626.0.patch, YARN-3626.4.patch, YARN-3626.6.patch, 
 YARN-3626.9.patch


 In response to the mapreduce.job.user.classpath.first setting the classpath 
 is ordered differently so that localized resources will appear before system 
 classpath resources when tasks execute.  On Windows this does not work 
 because the localized resources are not linked into their final location when 
 the classpath jar is created.  To compensate for that localized jar resources 
 are added directly to the classpath generated for the jar rather than being 
 discovered from the localized directories.  Unfortunately, they are always 
 appended to the classpath, and so are never preferred over system resources.
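 For context, a user opts into this ordering from the MR side roughly like this 
 (a sketch; only the property name above is taken from this issue, the rest is 
 generic job-submission code):
 {code}
 // On the command line (when the driver uses ToolRunner/GenericOptionsParser):
 //   hadoop jar my-job.jar MyJob -Dmapreduce.job.user.classpath.first=true ...
 // or programmatically before submitting the job:
 Configuration conf = new Configuration();
 conf.setBoolean("mapreduce.job.user.classpath.first", true);
 Job job = Job.getInstance(conf);   // throws IOException
 {code}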



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3659) Federation Router (hiding multiple RMs for ApplicationClientProtocol)

2015-05-15 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-3659:
-
Description: 
This JIRA tracks the design/implementation of the layer for routing 
ApplicaitonClientProtocol requests to the appropriate
RM(s) in a federated YARN cluster.


  was:
This JIRA tracks the design/implementation of the layer for routing 
ApplicationClientProtocol requests to the appropriate
RM(s) in a federated YARN cluster.



 Federation Router (hiding multiple RMs for ApplicationClientProtocol)
 -

 Key: YARN-3659
 URL: https://issues.apache.org/jira/browse/YARN-3659
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Giovanni Matteo Fumarola

 This JIRA tracks the design/implementation of the layer for routing 
 ApplicaitonClientProtocol requests to the appropriate
 RM(s) in a federated YARN cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3661) Federation UI

2015-05-15 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-3661:
--

 Summary: Federation UI 
 Key: YARN-3661
 URL: https://issues.apache.org/jira/browse/YARN-3661
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Giovanni Matteo Fumarola


The UIs provided by each RM provide a correct local view of what is running 
in a sub-cluster. In the context of federation we need new 
UIs that can track load, jobs, and users across sub-clusters.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-05-15 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546454#comment-14546454
 ] 

Gour Saha commented on YARN-3561:
-

[~jianhe] it is consistently reproducible on Debian 7. Can you provide quick 
instructions on how to enable debug-level logging for the NM?

[~chackra] If possible, can you turn on debug-level logging for the NM, re-run the 
test, and provide the logs again?

 Non-AM Containers continue to run even after AM is stopped
 --

 Key: YARN-3561
 URL: https://issues.apache.org/jira/browse/YARN-3561
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, yarn
Affects Versions: 2.6.0
 Environment: debian 7
Reporter: Gour Saha
Priority: Critical
 Attachments: app0001.zip


 Non-AM containers continue to run even after application is stopped. This 
 occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
 Hadoop 2.6 deployment. 
 Following are the NM logs from 2 different nodes:
 *host-07* - where Slider AM was running
 *host-03* - where Storm NIMBUS container was running.
 *Note:* The logs are partial, starting from the time when the relevant Slider 
 AM and NIMBUS containers were allocated, until the time when the Slider AM was 
 stopped. Also, the large number of memory-usage log lines was trimmed, 
 keeping only a few from the start and end of every segment.
 *NM log from host-07 where Slider AM container was running:*
 {noformat}
 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
 container_1428575950531_0020_02_01
 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
 Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
 container_1428575950531_0021_01_01 by user yarn
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
 application reference for app application_1428575950531_0021
 2015-04-29 00:41:10,323 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from NEW to INITING
 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
 (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
 OPERATION=Start Container Request   TARGET=ContainerManageImpl  
 RESULT=SUCCESS  APPID=application_1428575950531_0021
 CONTAINERID=container_1428575950531_0021_01_01
 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
 (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
 Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
 [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
 users.
 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:init(182)) - rollingMonitorInterval is set as 
 -1. The log rolling mornitoring interval is disabled. The logs will be 
 aggregated after this application is finished.
 2015-04-29 00:41:10,351 INFO  application.Application 
 (ApplicationImpl.java:transition(304)) - Adding 
 container_1428575950531_0021_01_01 to application 
 application_1428575950531_0021
 2015-04-29 00:41:10,352 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from INITING to RUNNING
 2015-04-29 00:41:10,356 INFO  container.Container 
 (ContainerImpl.java:handle(999)) - Container 
 container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
 (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
 application_1428575950531_0021
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  

[jira] [Assigned] (YARN-3659) Federation Router (hiding multiple RMs for ApplicationClientProtocol)

2015-05-15 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola reassigned YARN-3659:
--

Assignee: Giovanni Matteo Fumarola

 Federation Router (hiding multiple RMs for ApplicationClientProtocol)
 -

 Key: YARN-3659
 URL: https://issues.apache.org/jira/browse/YARN-3659
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Giovanni Matteo Fumarola
Assignee: Giovanni Matteo Fumarola

 This JIRA tracks the design/implementation of the layer for routing 
 ApplicaitonClientProtocol requests to the appropriate
 RM(s) in a federated YARN cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3654) ContainerLogsPage web UI should not have meta-refresh

2015-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546217#comment-14546217
 ] 

Hadoop QA commented on YARN-3654:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 14s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 56s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 40s | The applied patch generated  2 
new checkstyle issues (total was 12, now 13). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m  7s | The patch appears to introduce 1 
new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   6m  4s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  43m 25s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
|  |  Class org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebAppFilter 
defines non-transient non-serializable instance field nmConf  In 
NMWebAppFilter.java:instance field nmConf  In NMWebAppFilter.java |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733248/YARN-3654.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 03a293a |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7953/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/7953/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7953/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7953/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7953/console |


This message was automatically generated.

 ContainerLogsPage web UI should not have meta-refresh
 -

 Key: YARN-3654
 URL: https://issues.apache.org/jira/browse/YARN-3654
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.1
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3654.1.patch


 Currently, when we try to find the container logs for a finished 
 application, it will redirect to the URL which we configured for 
 yarn.log.server.url in yarn-site.xml. But in ContainerLogsPage, we are using 
 meta-refresh:
 {code}
 set(TITLE, join("Redirecting to log server for ", $(CONTAINER_ID)));
 html.meta_http("refresh", "1; url=" + redirectUrl);
 {code}
 which does not work well in browsers that require meta-refresh to be enabled in 
 their security settings, especially IE, where meta-refresh is considered a 
 security hole.
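 For illustration only (this is not the attached patch, just the usual way to 
 avoid meta-refresh): a plain HTTP redirect issued server-side, e.g. from a 
 servlet filter, needs no browser setting at all:
 {code}
 // Hypothetical filter, not YARN-3654.1.patch: send an HTTP 302 to the log
 // server instead of serving a meta-refresh page.
 import java.io.IOException;
 import javax.servlet.Filter;
 import javax.servlet.FilterChain;
 import javax.servlet.FilterConfig;
 import javax.servlet.ServletException;
 import javax.servlet.ServletRequest;
 import javax.servlet.ServletResponse;
 import javax.servlet.http.HttpServletResponse;

 public class LogServerRedirectFilter implements Filter {
   private String logServerUrl;   // assumed to carry the yarn.log.server.url value

   @Override
   public void init(FilterConfig cfg) {
     logServerUrl = cfg.getInitParameter("log.server.url");
   }

   @Override
   public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
       throws IOException, ServletException {
     if (logServerUrl != null /* and the application has already finished */) {
       ((HttpServletResponse) resp).sendRedirect(logServerUrl);
       return;   // no meta-refresh page, so no browser security setting is involved
     }
     chain.doFilter(req, resp);
   }

   @Override
   public void destroy() {
   }
 }
 {code}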



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state

2015-05-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546254#comment-14546254
 ] 

Jason Lowe commented on YARN-2421:
--

+1 latest patch lgtm.  Committing this.

 CapacityScheduler still allocates containers to an app in the FINISHING state
 -

 Key: YARN-2421
 URL: https://issues.apache.org/jira/browse/YARN-2421
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Thomas Graves
Assignee: Chang Li
 Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, 
 YARN-2421.7.patch, YARN-2421.8.patch, YARN-2421.9.patch, yarn2421.patch, 
 yarn2421.patch, yarn2421.patch


 I saw an instance of a bad application master where it unregistered with the 
 RM but then continued to call into allocate.  The RMAppAttempt went to the 
 FINISHING state, but the capacity scheduler kept allocating it containers.   
 We should probably have the capacity scheduler check that the application 
 isn't in one of the terminal states before giving it containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546257#comment-14546257
 ] 

Hadoop QA commented on YARN-3632:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 35s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:red}-1{color} | javac |   7m 32s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |   9m 32s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 45s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  2s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 18s | The patch appears to introduce 1 
new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |  50m  1s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  86m 18s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS;
 locked 66% of time  Unsynchronized access at FileSystemRMStateStore.java:66% 
of time  Unsynchronized access at FileSystemRMStateStore.java:[line 156] |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733235/YARN-3632.4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 03a293a |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/7952/artifact/patchprocess/diffJavacWarnings.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/7952/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/7952/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7952/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7952/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7952/console |


This message was automatically generated.

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch


 At present, ordering policies have the option to have an application 
 re-ordered (for allocation and preemption) when it is allocated to or a 
 container is recovered from the application.  Some ordering policies may also 
 need to reorder when demand changes, if that is part of the ordering 
 comparison; this needs to be made available (and used by the 
 FairOrderingPolicy when sizeBasedWeight is true).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2421) RM still allocates containers to an app in the FINISHING state

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546268#comment-14546268
 ] 

Hudson commented on YARN-2421:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7842 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7842/])
YARN-2421. RM still allocates containers to an app in the FINISHING state. 
Contributed by Chang Li (jlowe: rev f7e051c4310024d4040ad466c34432c72e88b0fc)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java


 RM still allocates containers to an app in the FINISHING state
 --

 Key: YARN-2421
 URL: https://issues.apache.org/jira/browse/YARN-2421
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Thomas Graves
Assignee: Chang Li
 Fix For: 2.8.0

 Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, 
 YARN-2421.7.patch, YARN-2421.8.patch, YARN-2421.9.patch, yarn2421.patch, 
 yarn2421.patch, yarn2421.patch


 I saw an instance of a bad application master where it unregistered with the 
 RM but then continued to call into allocate.  The RMAppAttempt went to the 
 FINISHING state, but the capacity scheduler kept allocating it containers.   
 We should probably have the capacity scheduler check that the application 
 isn't in one of the terminal states before giving it containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-15 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3632:
--
Attachment: YARN-3632.5.patch

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch


 At present, ordering policies have the option to have an application 
 re-ordered (for allocation and preemption) when it is allocated to or a 
 container is recovered from the application.  Some ordering policies may also 
 need to reorder when demand changes, if that is part of the ordering 
 comparison; this needs to be made available (and used by the 
 FairOrderingPolicy when sizeBasedWeight is true).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-15 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546271#comment-14546271
 ] 

Craig Welch commented on YARN-3632:
---

One-line change to address the whitespace issue.  Again, the javac and 
findbugs warnings don't appear to have anything to do with the patch.

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch


 At present, ordering policies have the option to have an application 
 re-ordered (for allocation and preemption) when it is allocated to or a 
 container is recovered from the application.  Some ordering policies may also 
 need to reorder when demand changes, if that is part of the ordering 
 comparison; this needs to be made available (and used by the 
 FairOrderingPolicy when sizeBasedWeight is true).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running

2015-05-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546296#comment-14546296
 ] 

Jian He commented on YARN-2268:
---

I think the lock-file solution only suits the ZK store, not other state-store 
implementations. 
The current approach of polling the web service should be more general. 

 Disallow formatting the RMStateStore when there is an RM running
 

 Key: YARN-2268
 URL: https://issues.apache.org/jira/browse/YARN-2268
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Rohith
 Attachments: 0001-YARN-2268.patch


 YARN-2131 adds a way to format the RMStateStore. However, it can be a problem 
 if we format the store while an RM is actively using it. It would be nice to 
 fail the format if there is an RM running and using this store. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546331#comment-14546331
 ] 

Hadoop QA commented on YARN-3626:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 24s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  7s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 16s | The applied patch generated  2 
new checkstyle issues (total was 59, now 58). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 33s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | mapreduce tests |   0m 45s | Tests passed in 
hadoop-mapreduce-client-common. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   6m  1s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  48m 49s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733257/YARN-3626.9.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f7e051c |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7954/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
 |
| hadoop-mapreduce-client-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7954/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7954/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7954/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7954/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7954/console |


This message was automatically generated.

 On Windows localized resources are not moved to the front of the classpath 
 when they should be
 --

 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.7.1

 Attachments: YARN-3626.0.patch, YARN-3626.4.patch, YARN-3626.6.patch, 
 YARN-3626.9.patch


 In response to the mapreduce.job.user.classpath.first setting the classpath 
 is ordered differently so that localized resources will appear before system 
 classpath resources when tasks execute.  On Windows this does not work 
 because the localized resources are not linked into their final location when 
 the classpath jar is created.  To compensate for that localized jar resources 
 are added directly to the classpath generated for the jar rather than being 
 discovered from the localized directories.  Unfortunately, they are always 
 appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-05-15 Thread Ishai Menache (JIRA)
Ishai Menache created YARN-3656:
---

 Summary: LowCost: A Cost-Based Placement Agent for YARN 
Reservations
 Key: YARN-3656
 URL: https://issues.apache.org/jira/browse/YARN-3656
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Ishai Menache


YARN-1051 enables SLA support by allowing users to reserve cluster capacity 
ahead of time. YARN-1710 introduced a greedy agent for placing user 
reservations. The greedy agent makes fast placement decisions but at the cost 
of ignoring the cluster committed resources, which might result in blocking the 
cluster resources for certain periods of time, and in turn rejecting some 
arriving jobs.

We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” the 
demand of the job throughout the allowed time-window according to a global, 
load-based cost function. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-05-15 Thread Ishai Menache (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishai Menache updated YARN-3656:

Attachment: LowCostRayonExternal.pdf

This tech report summarizes the details of LowCost, as well as our experimental 
results, which show the benefits of using LowCost across a variety of performance 
metrics.

 LowCost: A Cost-Based Placement Agent for YARN Reservations
 ---

 Key: YARN-3656
 URL: https://issues.apache.org/jira/browse/YARN-3656
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Ishai Menache
 Attachments: LowCostRayonExternal.pdf


 YARN-1051 enables SLA support by allowing users to reserve cluster capacity 
 ahead of time. YARN-1710 introduced a greedy agent for placing user 
 reservations. The greedy agent makes fast placement decisions but at the cost 
 of ignoring the cluster committed resources, which might result in blocking 
 the cluster resources for certain periods of time, and in turn rejecting some 
 arriving jobs.
 We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” 
 the demand of the job throughout the allowed time-window according to a 
 global, load-based cost function. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546365#comment-14546365
 ] 

Chris Nauroth commented on YARN-3626:
-

I don't fully understand the objection to the former patch that had been 
committed.

bq. The new configuration added is supposed to be per app, but it is now a 
server side configuration.

There was a new YARN configuration property for triggering this behavior, but 
the MR application would toggle on that YARN property only if the MR job 
submission had {{MAPREDUCE_JOB_USER_CLASSPATH_FIRST}} on.  From {{MRApps}}:

{code}
boolean userClassesTakesPrecedence = 
  conf.getBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, false);

if (userClassesTakesPrecedence) {
  conf.set(YarnConfiguration.YARN_APPLICATION_CLASSPATH_PREPEND_DISTCACHE,
true);
}
{code}

I thought this implemented per app behavior, because it could vary between MR 
app submission instances.  It would not be a requirement to put 
{{YARN_APPLICATION_CLASSPATH_PREPEND_DISTCACHE}} into the server configs and 
have the client and server share configs.

Is there a detail I'm missing?

 On Windows localized resources are not moved to the front of the classpath 
 when they should be
 --

 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.7.1

 Attachments: YARN-3626.0.patch, YARN-3626.4.patch, YARN-3626.6.patch, 
 YARN-3626.9.patch


 In response to the mapreduce.job.user.classpath.first setting the classpath 
 is ordered differently so that localized resources will appear before system 
 classpath resources when tasks execute.  On Windows this does not work 
 because the localized resources are not linked into their final location when 
 the classpath jar is created.  To compensate for that localized jar resources 
 are added directly to the classpath generated for the jar rather than being 
 discovered from the localized directories.  Unfortunately, they are always 
 appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2015-05-15 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546377#comment-14546377
 ] 

Bikas Saha commented on YARN-1902:
--

The AMRMClient was not written to automatically remove requests because it does 
not know which requests will be matched to allocated containers. The explicit 
contract is for users of AMRMClient to remove requests that have been matched 
to containers.

If we change that behavior to automatically remove requests then it may lead to 
issues where two entities are removing requests: 1) the user and 2) the 
AMRMClient. So that change should only be made in a different version of 
AMRMClient, or else existing users will break.

In the worst case, if the AMRMClient (automatically) removes the wrong request 
then the application will hang because the RM will not provide it the container 
that is needed. Not automatically removing the request has the downside of 
getting additional containers that need to be released by the application. We 
chose excess containers over hanging for the original implementation. 

Excess containers should happen rarely because the user controls when 
AMRMClient heartbeats to the RM and can do that after having removed all 
matched requests, so that the remote request table reflects the current state 
of outstanding requests. There may still be a race condition on the RM side 
that gives more containers. Excess containers can happen more often with 
AMRMClientAsync, because it heartbeats on a regular schedule and can send more 
requests than are really outstanding if the heartbeat goes out before the user 
has removed the matched requests.
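
A minimal sketch of the contract described above (the application removes each 
matched request before the next heartbeat goes out), assuming the synchronous 
AMRMClient API; the {{pendingAsks}} bookkeeping and the way asks are matched to 
containers below are illustrative only, not the actual AMRMClient internals:

{code:title=RemoveMatchedRequests.java (sketch)|borderStyle=solid}
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class RemoveMatchedRequests {

  // Asks we have submitted but not yet matched to a container
  // (bookkeeping kept by the application, not by YARN).
  private final Queue<ContainerRequest> pendingAsks =
      new ConcurrentLinkedQueue<ContainerRequest>();

  void onAllocated(AMRMClient<ContainerRequest> amRMClient,
      List<Container> allocated) {
    for (Container container : allocated) {
      ContainerRequest matched = pendingAsks.poll();
      if (matched != null) {
        // Deduct the satisfied ask so the remote request table reflects
        // only outstanding requests before the next allocate() heartbeat.
        amRMClient.removeContainerRequest(matched);
      }
      // ... launch the container here ...
    }
  }
}
{code}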


 Allocation of too many containers when a second request is done with the same 
 resource capability
 -

 Key: YARN-1902
 URL: https://issues.apache.org/jira/browse/YARN-1902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 2.3.0, 2.4.0
Reporter: Sietse T. Au
Assignee: Sietse T. Au
  Labels: client
 Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch


 Regarding AMRMClientImpl
 Scenario 1:
 Given a ContainerRequest x with Resource y, when addContainerRequest is 
 called z times with x, allocate is called and at least one of the z allocated 
 containers is started, then if another addContainerRequest call is done and 
 subsequently an allocate call to the RM, (z+1) containers will be allocated, 
 where 1 container is expected.
 Scenario 2:
 No containers are started between the allocate calls. 
 Analyzing debug logs of the AMRMClientImpl, I have found that (z+1) containers 
 are indeed requested in both scenarios, but that only in the second scenario 
 is the correct behavior observed.
 Looking at the implementation I have found that this (z+1) request is caused 
 by the structure of the remoteRequestsTable. The consequence of Map<Resource, 
 ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
 information about whether a request has been sent to the RM yet or not.
 There are workarounds for this, such as releasing the excess containers 
 received.
 The solution implemented is to initialize a new ResourceRequest in 
 ResourceRequestInfo when a request has been successfully sent to the RM.
 The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2015-05-15 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546421#comment-14546421
 ] 

Bikas Saha commented on YARN-1902:
--

Yes. And then the RM may give a container on H1 which is not useful for the 
app. If we again auto-decrement and release the container then we end up with 2 
outstanding requests and the job will hang because it needs 3 containers.

 Allocation of too many containers when a second request is done with the same 
 resource capability
 -

 Key: YARN-1902
 URL: https://issues.apache.org/jira/browse/YARN-1902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 2.3.0, 2.4.0
Reporter: Sietse T. Au
Assignee: Sietse T. Au
  Labels: client
 Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch


 Regarding AMRMClientImpl
 Scenario 1:
 Given a ContainerRequest x with Resource y, when addContainerRequest is 
 called z times with x, allocate is called and at least one of the z allocated 
 containers is started, then if another addContainerRequest call is done and 
 subsequently an allocate call to the RM, (z+1) containers will be allocated, 
 where 1 container is expected.
 Scenario 2:
 No containers are started between the allocate calls. 
 Analyzing debug logs of the AMRMClientImpl, I have found that (z+1) containers 
 are indeed requested in both scenarios, but that only in the second scenario 
 is the correct behavior observed.
 Looking at the implementation I have found that this (z+1) request is caused 
 by the structure of the remoteRequestsTable. The consequence of Map<Resource, 
 ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
 information about whether a request has been sent to the RM yet or not.
 There are workarounds for this, such as releasing the excess containers 
 received.
 The solution implemented is to initialize a new ResourceRequest in 
 ResourceRequestInfo when a request has been successfully sent to the RM.
 The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-05-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546420#comment-14546420
 ] 

zhihai xu commented on YARN-3655:
-

I uploaded a patch YARN-3655.000.patch for review.

 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation 
 -

 Key: YARN-3655
 URL: https://issues.apache.org/jira/browse/YARN-3655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3655.000.patch


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation.
 If a node is reserved by an application, all the other applications don't 
 have any chance to assign a new container on this node, unless the 
 application which reserves the node assigns a new container on this node or 
 releases the reserved container on this node.
 The problem is if an application tries to call assignReservedContainer and 
 fails to get a new container due to the maxAMShare limitation, it will block 
 all other applications from using the nodes it reserves. If all other running 
 applications can't release their AM containers because they are blocked by 
 these reserved containers, a livelock situation can happen.
 The following is the code at FSAppAttempt#assignContainer which can cause 
 this potential livelock.
 {code}
 // Check the AM resource usage for the leaf queue
 if (!isAmRunning() && !getUnmanagedAM()) {
   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
   if (ask.isEmpty() || !getQueue().canRunAppAM(
       ask.get(0).getCapability())) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Skipping allocation because maxAMShare limit would " +
           "be exceeded");
     }
     return Resources.none();
   }
 }
 {code}
 To fix this issue, we can unreserve the node if we can't allocate the AM 
 container on the node due to Max AM share limitation and the node is reserved 
 by the application.
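
A rough illustration of the proposed fix (drop the reservation when the AM 
container cannot be placed because of the maxAMShare limit and this attempt 
holds the reservation); the interface and method names below are hypothetical 
stand-ins, not the actual FSAppAttempt code or the attached patch:

{code:title=MaxAMShareUnreserve.java (sketch)|borderStyle=solid}
public class MaxAMShareUnreserve {

  /** Hypothetical view of an application attempt, for illustration only. */
  interface AppAttempt {
    boolean isAmRunning();
    boolean canRunAmWithinMaxAMShare();   // the leaf-queue maxAMShare check
    boolean hasReservationOn(String node);
    void unreserve(String node);          // hand the node back to other apps
  }

  /**
   * Returns true if the attempt may proceed to allocate on the node.
   * If the AM container cannot be placed because of the maxAMShare limit
   * and this attempt is the one reserving the node, unreserve it so the
   * node is not blocked for every other application (the livelock above).
   */
  static boolean checkAmShareOrUnreserve(AppAttempt app, String node) {
    if (!app.isAmRunning() && !app.canRunAmWithinMaxAMShare()) {
      if (app.hasReservationOn(node)) {
        app.unreserve(node);
      }
      return false;
    }
    return true;
  }
}
{code}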



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3662) Federation StateStore APIs

2015-05-15 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-3662:


 Summary: Federation StateStore APIs
 Key: YARN-3662
 URL: https://issues.apache.org/jira/browse/YARN-3662
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Subru Krishnan


The Federation State defines the additional state that needs to be maintained 
to loosely couple multiple individual sub-clusters into a single large 
federated cluster



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3663) Federation State and Policy Store (DBMS implementation)

2015-05-15 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-3663:
--

 Summary: Federation State and Policy Store (DBMS implementation)
 Key: YARN-3663
 URL: https://issues.apache.org/jira/browse/YARN-3663
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Giovanni Matteo Fumarola


This JIRA tracks a SQL-based implementation of the Federation State and Policy 
Store, which implements YARN-3662 APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-05-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546461#comment-14546461
 ] 

zhihai xu commented on YARN-3591:
-

[~vinodkv], yes, keeping the ownership of turning disks good or bad in one 
single place is a very good suggestion. So it is reasonable to keep all the 
disk checking at DirectoryCollection.
Normally the CacheCleanup thread will periodically send a CACHE_CLEANUP event to 
clean up these localized files in LocalResourcesTrackerImpl.
If we only remove the localized resources on the bad disk which can't be 
recovered, it will be OK. Here a bad disk is different from a full disk. I 
suppose all the files on the bad disk will be lost/deleted by the time it 
becomes good again. Keeping app-level resources sounds reasonable to me.

 Resource Localisation on a bad disk causes subsequent containers failure 
 -

 Key: YARN-3591
 URL: https://issues.apache.org/jira/browse/YARN-3591
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
 Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
 YARN-3591.2.patch


 It happens when a resource is localised on a disk and, after localising, that 
 disk has gone bad. The NM keeps paths for localised resources in memory. At the 
 time of a resource request, isResourcePresent(rsrc) will be called, which calls 
 file.exists() on the localised path.
 In some cases when the disk has gone bad, inodes are still cached and 
 file.exists() returns true. But at the time of reading, the file will not open.
 Note: file.exists() actually calls stat64 natively, which returns true because 
 it was able to find inode information from the OS.
 A proposal is to call file.list() on the parent path of the resource, which 
 will call open() natively. If the disk is good it should return an array of 
 paths with length at least 1.
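
A small sketch of the proposed check, assuming a plain java.io.File based test; 
the class and method names are hypothetical and the real isResourcePresent() 
change may differ:

{code:title=LocalizedResourceCheck.java (sketch)|borderStyle=solid}
import java.io.File;

public class LocalizedResourceCheck {

  /**
   * Returns true only if the parent directory of the localized path can
   * actually be listed. Unlike File.exists(), which is just a stat and can
   * be satisfied from cached inodes on a bad disk, File.list() has to open
   * and read the directory, so a disk that has gone bad is detected.
   */
  static boolean isResourceUsable(File localizedPath) {
    File parent = localizedPath.getParentFile();
    if (parent == null) {
      return false;
    }
    String[] entries = parent.list();  // null if the directory can't be read
    return entries != null && entries.length >= 1 && localizedPath.exists();
  }
}
{code}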



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3666) Federation Intercepting and propagating AM-RM communications

2015-05-15 Thread Kishore Chaliparambil (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishore Chaliparambil updated YARN-3666:

External issue ID:   (was: YARN-2884)

 Federation Intercepting and propagating AM-RM communications
 

 Key: YARN-3666
 URL: https://issues.apache.org/jira/browse/YARN-3666
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Kishore Chaliparambil

 In order to support transparent spanning of jobs across sub-clusters, all 
 AM-RM communications are proxied (via YARN-2884).
 This JIRA tracks federation-specific mechanisms that decide how to 
 split/broadcast requests to the RMs and merge answers to 
 the AM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546213#comment-14546213
 ] 

Vinod Kumar Vavilapalli commented on YARN-3591:
---

Essentially keeping the ownership of turning disks good or bad in one single 
place.

 Resource Localisation on a bad disk causes subsequent containers failure 
 -

 Key: YARN-3591
 URL: https://issues.apache.org/jira/browse/YARN-3591
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
 Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
 YARN-3591.2.patch


 It happens when a resource is localised on a disk and, after localising, that 
 disk has gone bad. The NM keeps paths for localised resources in memory. At the 
 time of a resource request, isResourcePresent(rsrc) will be called, which calls 
 file.exists() on the localised path.
 In some cases when the disk has gone bad, inodes are still cached and 
 file.exists() returns true. But at the time of reading, the file will not open.
 Note: file.exists() actually calls stat64 natively, which returns true because 
 it was able to find inode information from the OS.
 A proposal is to call file.list() on the parent path of the resource, which 
 will call open() natively. If the disk is good it should return an array of 
 paths with length at least 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-15 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546288#comment-14546288
 ] 

Craig Welch commented on YARN-3626:
---

[~cnauroth], [~vvasudev] - This patch goes back to the original approach I 
passed by you offline - the fix itself is the same, but it uses the classpath 
instead of configuration to determine when the behavior should change.  Your 
thoughts?

 On Windows localized resources are not moved to the front of the classpath 
 when they should be
 --

 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.7.1

 Attachments: YARN-3626.0.patch, YARN-3626.4.patch, YARN-3626.6.patch, 
 YARN-3626.9.patch


 In response to the mapreduce.job.user.classpath.first setting the classpath 
 is ordered differently so that localized resources will appear before system 
 classpath resources when tasks execute.  On Windows this does not work 
 because the localized resources are not linked into their final location when 
 the classpath jar is created.  To compensate for that localized jar resources 
 are added directly to the classpath generated for the jar rather than being 
 discovered from the localized directories.  Unfortunately, they are always 
 appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546359#comment-14546359
 ] 

Vinod Kumar Vavilapalli commented on YARN-3626:
---

bq. We may have to depend on some sort of a named environment variable or 
something, assuming adding a new field in CLC is not desirable.
Can't we do the above? We definitely cannot insert mapreduce incantations like 
job.jar into YARN.

 On Windows localized resources are not moved to the front of the classpath 
 when they should be
 --

 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.7.1

 Attachments: YARN-3626.0.patch, YARN-3626.4.patch, YARN-3626.6.patch, 
 YARN-3626.9.patch


 In response to the mapreduce.job.user.classpath.first setting the classpath 
 is ordered differently so that localized resources will appear before system 
 classpath resources when tasks execute.  On Windows this does not work 
 because the localized resources are not linked into their final location when 
 the classpath jar is created.  To compensate for that localized jar resources 
 are added directly to the classpath generated for the jar rather than being 
 discovered from the localized directories.  Unfortunately, they are always 
 appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-15 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546370#comment-14546370
 ] 

Craig Welch commented on YARN-3626:
---

bq. Can't we do the above? We definitely cannot insert mapreduce incantations 
like job.jar into YARN.

That's why I took the config-based approach, which apparently is invalid... but 
it also worked, which is quite confusing.  I'm going to go back and validate 
our reasoning for believing it shouldn't.

bq. Can't we do the above? We definitely cannot insert mapreduce incantations 
like job.jar into YARN.

I suppose we can if it would work.  It needs to be something which can be 
propagated from Oozie, which adds additional complexity.  Ideally, we need 
something that MRApps can set based on the presence of the mapred configuration 
so that it propagates through.  Do we have an example of this being done 
elsewhere?

 On Windows localized resources are not moved to the front of the classpath 
 when they should be
 --

 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.7.1

 Attachments: YARN-3626.0.patch, YARN-3626.4.patch, YARN-3626.6.patch, 
 YARN-3626.9.patch


 In response to the mapreduce.job.user.classpath.first setting the classpath 
 is ordered differently so that localized resources will appear before system 
 classpath resources when tasks execute.  On Windows this does not work 
 because the localized resources are not linked into their final location when 
 the classpath jar is created.  To compensate for that localized jar resources 
 are added directly to the classpath generated for the jar rather than being 
 discovered from the localized directories.  Unfortunately, they are always 
 appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546393#comment-14546393
 ] 

Vinod Kumar Vavilapalli commented on YARN-3626:
---

bq. I thought this implemented per app behavior, because it could vary 
between MR app submission instances. It would not be a requirement to put 
YARN_APPLICATION_CLASSPATH_PREPEND_DISTCACHE into the server configs and have 
the client and server share configs.
YARN doesn't have a notion of app configs; it doesn't know an app's config files, 
etc. So, the app cannot set a config property that it expects the server to 
respect.

No idea how your original patch apparently worked. Maybe we are missing 
something.

[~cwelch], what I was proposing was something along the lines of: (a) the user 
sets the MR user-classpath-first config, (b) MR converts that into a special env 
for YARN, and (c) YARN looks at the env to figure out how to order the classpath.

Overall, it is terrible that we are talking classpaths in YARN, but that's for 
another JIRA.
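
A hedged sketch of steps (b) and (c) above; the environment variable name and 
both helper methods are hypothetical, not existing MapReduce or YARN constants:

{code:title=ClasspathPrependEnvSketch.java (sketch)|borderStyle=solid}
import java.util.Map;

public class ClasspathPrependEnvSketch {

  // Hypothetical marker; the real name would need to be agreed on.
  static final String PREPEND_DISTCACHE_ENV = "CLASSPATH_PREPEND_DISTCACHE";

  /** (b) MR client side: translate the MR config into a container env entry. */
  static void applyUserClasspathFirst(boolean userClasspathFirst,
      Map<String, String> containerEnv) {
    if (userClasspathFirst) {
      containerEnv.put(PREPEND_DISTCACHE_ENV, "true");
    }
  }

  /** (c) YARN side: order the classpath from the env, not from any config. */
  static String buildClasspath(Map<String, String> containerEnv,
      String localizedEntries, String systemEntries, String separator) {
    String marker = containerEnv.get(PREPEND_DISTCACHE_ENV);
    boolean prepend = marker != null && Boolean.parseBoolean(marker);
    return prepend
        ? localizedEntries + separator + systemEntries
        : systemEntries + separator + localizedEntries;
  }
}
{code}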

 On Windows localized resources are not moved to the front of the classpath 
 when they should be
 --

 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.7.1

 Attachments: YARN-3626.0.patch, YARN-3626.4.patch, YARN-3626.6.patch, 
 YARN-3626.9.patch


 In response to the mapreduce.job.user.classpath.first setting the classpath 
 is ordered differently so that localized resources will appear before system 
 classpath resources when tasks execute.  On Windows this does not work 
 because the localized resources are not linked into their final location when 
 the classpath jar is created.  To compensate for that localized jar resources 
 are added directly to the classpath generated for the jar rather than being 
 discovered from the localized directories.  Unfortunately, they are always 
 appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546400#comment-14546400
 ] 

Chris Nauroth commented on YARN-3626:
-

I see now.  Thanks for the clarification.  In that case, I agree with the new 
proposal.

 On Windows localized resources are not moved to the front of the classpath 
 when they should be
 --

 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.7.1

 Attachments: YARN-3626.0.patch, YARN-3626.4.patch, YARN-3626.6.patch, 
 YARN-3626.9.patch


 In response to the mapreduce.job.user.classpath.first setting the classpath 
 is ordered differently so that localized resources will appear before system 
 classpath resources when tasks execute.  On Windows this does not work 
 because the localized resources are not linked into their final location when 
 the classpath jar is created.  To compensate for that localized jar resources 
 are added directly to the classpath generated for the jar rather than being 
 discovered from the localized directories.  Unfortunately, they are always 
 appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-05-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3655:

Attachment: YARN-3655.000.patch

 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation 
 -

 Key: YARN-3655
 URL: https://issues.apache.org/jira/browse/YARN-3655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3655.000.patch


 FairScheduler: potential deadlock due to maxAMShare limitation and container 
 reservation.
 If a node is reserved by an application, all the other applications don't 
 have any chance to assign a new container on this node, unless the 
 application which reserves the node assigns a new container on this node or 
 releases the reserved container on this node.
 The problem is if an application tries to call assignReservedContainer and 
 fails to get a new container due to the maxAMShare limitation, it will block 
 all other applications from using the nodes it reserves. If all other running 
 applications can't release their AM containers because they are blocked by 
 these reserved containers, a deadlock situation can happen.
 The following is the code at FSAppAttempt#assignContainer which can cause 
 this potential deadlock.
 {code}
 // Check the AM resource usage for the leaf queue
 if (!isAmRunning() && !getUnmanagedAM()) {
   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
   if (ask.isEmpty() || !getQueue().canRunAppAM(
       ask.get(0).getCapability())) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Skipping allocation because maxAMShare limit would " +
           "be exceeded");
     }
     return Resources.none();
   }
 }
 {code}
 To fix this issue, we can unreserve the node if we can't allocate the AM 
 container on the node due to Max AM share limitation and the node is reserved 
 by the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-05-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3655:

Description: 
FairScheduler: potential livelock due to maxAMShare limitation and container 
reservation.
If a node is reserved by an application, all the other applications don't have 
any chance to assign a new container on this node, unless the application which 
reserves the node assigns a new container on this node or releases the reserved 
container on this node.
The problem is if an application tries to call assignReservedContainer and fails 
to get a new container due to the maxAMShare limitation, it will block all other 
applications from using the nodes it reserves. If all other running applications 
can't release their AM containers because they are blocked by these reserved 
containers, a livelock situation can happen.
The following is the code at FSAppAttempt#assignContainer which can cause this 
potential livelock.
{code}
// Check the AM resource usage for the leaf queue
if (!isAmRunning() && !getUnmanagedAM()) {
  List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
  if (ask.isEmpty() || !getQueue().canRunAppAM(
      ask.get(0).getCapability())) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Skipping allocation because maxAMShare limit would " +
          "be exceeded");
    }
    return Resources.none();
  }
}
{code}
To fix this issue, we can unreserve the node if we can't allocate the AM 
container on the node due to Max AM share limitation and the node is reserved 
by the application.

  was:
FairScheduler: potential deadlock due to maxAMShare limitation and container 
reservation.
If a node is reserved by an application, all the other applications don't have 
any chance to assign a new container on this node, unless the application which 
reserves the node assigns a new container on this node or releases the reserved 
container on this node.
The problem is if an application tries to call assignReservedContainer and fails 
to get a new container due to the maxAMShare limitation, it will block all other 
applications from using the nodes it reserves. If all other running applications 
can't release their AM containers because they are blocked by these reserved 
containers, a deadlock situation can happen.
The following is the code at FSAppAttempt#assignContainer which can cause this 
potential deadlock.
{code}
// Check the AM resource usage for the leaf queue
if (!isAmRunning() && !getUnmanagedAM()) {
  List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
  if (ask.isEmpty() || !getQueue().canRunAppAM(
      ask.get(0).getCapability())) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Skipping allocation because maxAMShare limit would " +
          "be exceeded");
    }
    return Resources.none();
  }
}
{code}
To fix this issue, we can unreserve the node if we can't allocate the AM 
container on the node due to Max AM share limitation and the node is reserved 
by the application.


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation 
 -

 Key: YARN-3655
 URL: https://issues.apache.org/jira/browse/YARN-3655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3655.000.patch


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation.
 If a node is reserved by an application, all the other applications don't 
 have any chance to assign a new container on this node, unless the 
 application which reserves the node assigns a new container on this node or 
 releases the reserved container on this node.
 The problem is if an application tries to call assignReservedContainer and 
 fails to get a new container due to the maxAMShare limitation, it will block 
 all other applications from using the nodes it reserves. If all other running 
 applications can't release their AM containers because they are blocked by 
 these reserved containers, a livelock situation can happen.
 The following is the code at FSAppAttempt#assignContainer which can cause 
 this potential livelock.
 {code}
 // Check the AM resource usage for the leaf queue
 if (!isAmRunning() && !getUnmanagedAM()) {
   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
   if (ask.isEmpty() || !getQueue().canRunAppAM(
       ask.get(0).getCapability())) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Skipping allocation because maxAMShare limit would " +
           "be exceeded");
     }
     return Resources.none();
   }
 }
 {code}
 To fix this issue, 

[jira] [Updated] (YARN-3659) Federation Router (hiding multiple RMs for ApplicationClientProtocol)

2015-05-15 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-3659:
---
Description: 
This JIRA tracks the design/implementation of the layer for routing 
ApplicationClientProtocol requests to the appropriate
RM(s) in a federated YARN cluster.


  was:
This JIRA tracks the design/implementation of the layer for routing 
ApplicationSubmissionProtocol requests to the appropriate
RM(s) in a federated YARN cluster.



 Federation Router (hiding multiple RMs for ApplicationClientProtocol)
 -

 Key: YARN-3659
 URL: https://issues.apache.org/jira/browse/YARN-3659
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Giovanni Matteo Fumarola

 This JIRA tracks the design/implementation of the layer for routing 
 ApplicationClientProtocol requests to the appropriate
 RM(s) in a federated YARN cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3660) Federation Global Policy Generator (load balancing)

2015-05-15 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-3660:
--

 Summary: Federation Global Policy Generator (load balancing)
 Key: YARN-3660
 URL: https://issues.apache.org/jira/browse/YARN-3660
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Subru Krishnan


In a federated environment, local impairments of one sub-cluster might unfairly 
affect users/queues that are mapped to that sub-cluster. A centralized 
component (GPG) runs out-of-band and edits the policies governing how 
users/queues are allocated to sub-clusters. This allows us to enforce global 
invariants (by dynamically updating locally-enforced invariants).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3659) Federation Router (hiding multiple RMs for ApplicationSubmissionProtocol)

2015-05-15 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-3659:
--

 Summary: Federation Router (hiding multiple RMs for 
ApplicationSubmissionProtocol)
 Key: YARN-3659
 URL: https://issues.apache.org/jira/browse/YARN-3659
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Giovanni Matteo Fumarola


This JIRA tracks the design/implementation of the layer for routing 
ApplicationSubmissionProtocol requests to the appropriate
RM(s) in a federated YARN cluster.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3659) Federation Router (hiding multiple RMs for ApplicationClientProtocol)

2015-05-15 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-3659:
-
Summary: Federation Router (hiding multiple RMs for 
ApplicationClientProtocol)  (was: Federation Router (hiding multiple RMs for 
ApplicationSubmissionProtocol))

 Federation Router (hiding multiple RMs for ApplicationClientProtocol)
 -

 Key: YARN-3659
 URL: https://issues.apache.org/jira/browse/YARN-3659
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Giovanni Matteo Fumarola

 This JIRA tracks the design/implementation of the layer for routing 
 ApplicationSubmissionProtocol requests to the appropriate
 RM(s) in a federated YARN cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3664) Federation PolicyStore APIs

2015-05-15 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan reassigned YARN-3664:


Assignee: Subru Krishnan

 Federation PolicyStore APIs
 ---

 Key: YARN-3664
 URL: https://issues.apache.org/jira/browse/YARN-3664
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan

 The federation Policy Store contains information about the capacity 
 allocations made by users, their mapping to sub-clusters and the policies 
 that each of the components (Router, AMRMProxy, RMs) should enforce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3665) Federation subcluster membership mechanisms

2015-05-15 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-3665:


 Summary: Federation subcluster membership mechanisms
 Key: YARN-3665
 URL: https://issues.apache.org/jira/browse/YARN-3665
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan


The member YARN RMs continuously heartbeat to the state store to keep alive and 
publish their current capability/load information. This JIRA tracks these 
mechanisms.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3663) Federation State and Policy Store (DBMS implementation)

2015-05-15 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola reassigned YARN-3663:
--

Assignee: Giovanni Matteo Fumarola

 Federation State and Policy Store (DBMS implementation)
 ---

 Key: YARN-3663
 URL: https://issues.apache.org/jira/browse/YARN-3663
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Giovanni Matteo Fumarola
Assignee: Giovanni Matteo Fumarola

 This JIRA tracks a SQL-based implementation of the Federation State and 
 Policy Store, which implements YARN-3662 APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3661) Federation UI

2015-05-15 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola reassigned YARN-3661:
--

Assignee: Giovanni Matteo Fumarola

 Federation UI 
 --

 Key: YARN-3661
 URL: https://issues.apache.org/jira/browse/YARN-3661
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Giovanni Matteo Fumarola
Assignee: Giovanni Matteo Fumarola

 The UIs provided by each RM provide a correct local view of what is 
 running in a sub-cluster. In the context of federation we need new 
 UIs that can track load, jobs, and users across sub-clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3666) Federation Intercepting and propagating AM-RM communications

2015-05-15 Thread Kishore Chaliparambil (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishore Chaliparambil reassigned YARN-3666:
---

Assignee: Kishore Chaliparambil

 Federation Intercepting and propagating AM-RM communications
 

 Key: YARN-3666
 URL: https://issues.apache.org/jira/browse/YARN-3666
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Kishore Chaliparambil
Assignee: Kishore Chaliparambil

 In order to support transparent spanning of jobs across sub-clusters, all 
 AM-RM communications are proxied (via YARN-2884).
 This JIRA tracks federation-specific mechanisms that decide how to 
 split/broadcast requests to the RMs and merge answers to 
 the AM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-05-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546278#comment-14546278
 ] 

Jian He commented on YARN-3561:
---

[~gsaha], from the description, this is running against 2.6?  This could be 
related to YARN-2825, but that's fixed in 2.6.
From the logs, I can only see that the container is still waiting for the 
process to finish.  Is this easy to reproduce? It'll be great if we can get NM 
logs with debug level on.

 Non-AM Containers continue to run even after AM is stopped
 --

 Key: YARN-3561
 URL: https://issues.apache.org/jira/browse/YARN-3561
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, yarn
Affects Versions: 2.6.0
 Environment: debian 7
Reporter: Gour Saha
Priority: Critical
 Attachments: app0001.zip


 Non-AM containers continue to run even after application is stopped. This 
 occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
 Hadoop 2.6 deployment. 
 Following are the NM logs from 2 different nodes:
 *host-07* - where Slider AM was running
 *host-03* - where Storm NIMBUS container was running.
 *Note:* The logs are partial, starting with the time when the relevant Slider 
 AM and NIMBUS containers were allocated, till the time when the Slider AM was 
 stopped. Also, the large number of Memory usage log lines were removed 
 keeping only a few starts and ends of every segment.
 *NM log from host-07 where Slider AM container was running:*
 {noformat}
 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
 container_1428575950531_0020_02_01
 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
 Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
 container_1428575950531_0021_01_01 by user yarn
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
 application reference for app application_1428575950531_0021
 2015-04-29 00:41:10,323 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from NEW to INITING
 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
 (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
 OPERATION=Start Container Request   TARGET=ContainerManageImpl  
 RESULT=SUCCESS  APPID=application_1428575950531_0021
 CONTAINERID=container_1428575950531_0021_01_01
 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
 (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
 Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
 [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
 users.
 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:init(182)) - rollingMonitorInterval is set as 
 -1. The log rolling mornitoring interval is disabled. The logs will be 
 aggregated after this application is finished.
 2015-04-29 00:41:10,351 INFO  application.Application 
 (ApplicationImpl.java:transition(304)) - Adding 
 container_1428575950531_0021_01_01 to application 
 application_1428575950531_0021
 2015-04-29 00:41:10,352 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from INITING to RUNNING
 2015-04-29 00:41:10,356 INFO  container.Container 
 (ContainerImpl.java:handle(999)) - Container 
 container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
 (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
 application_1428575950531_0021
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar
  transitioned from INIT to 

[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-05-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3655:
---
Summary: FairScheduler: potential livelock due to maxAMShare limitation and 
container reservation   (was: FairScheduler: potential deadlock due to 
maxAMShare limitation and container reservation )

 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation 
 -

 Key: YARN-3655
 URL: https://issues.apache.org/jira/browse/YARN-3655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu

 FairScheduler: potential deadlock due to maxAMShare limitation and container 
 reservation.
 If a node is reserved by an application, all the other applications don't 
 have any chance to assign a new container on this node, unless the 
 application which reserves the node assigns a new container on this node or 
 releases the reserved container on this node.
 The problem is if an application tries to call assignReservedContainer and 
 fails to get a new container due to the maxAMShare limitation, it will block 
 all other applications from using the nodes it reserves. If all other running 
 applications can't release their AM containers because they are blocked by 
 these reserved containers, a deadlock situation can happen.
 The following is the code at FSAppAttempt#assignContainer which can cause 
 this potential deadlock.
 {code}
 // Check the AM resource usage for the leaf queue
 if (!isAmRunning() && !getUnmanagedAM()) {
   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
   if (ask.isEmpty() || !getQueue().canRunAppAM(
       ask.get(0).getCapability())) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Skipping allocation because maxAMShare limit would " +
           "be exceeded");
     }
     return Resources.none();
   }
 }
 {code}
 To fix this issue, we can unreserve the node if we can't allocate the AM 
 container on the node due to Max AM share limitation and the node is reserved 
 by the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2015-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546391#comment-14546391
 ] 

Vinod Kumar Vavilapalli commented on YARN-1902:
---

This was discussed multiple times before.

Two kinds of races can happen. A resource-table deduction happens when
 # allocated containers are already sitting in the RM (tracked at YARN-110)
 # allocated containers are already sitting in the client library

Seems like this JIRA is talking about both (1) and (2).

The dist-shell example above sounds like it could be because of (1).

Re (2), as Bikas says, the notion of forcing apps to deduct requests after a 
successful allocation (using AMRMClient.removeContainerRequest()) was 
introduced because the library clearly doesn't have an idea of which 
ResourceRequest to deduct from. [~leftnoteasy] mentioned offline that we could 
at-least deduct the count against the over-all number (ANY request) for a given 
priority. /cc [~bikassaha]
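
An illustrative sketch of the deduct-against-ANY idea mentioned above; this is 
not the actual AMRMClientImpl bookkeeping (which uses the remoteRequestsTable 
rather than a flat list), just the shape of the decrement:

{code:title=DeductAnyRequestSketch.java (sketch)|borderStyle=solid}
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class DeductAnyRequestSketch {

  /**
   * For each newly allocated container, decrement the outstanding count of
   * the ANY (resource name "*") request at the same priority, so the
   * over-all number at that priority shrinks even if the client cannot
   * tell which specific host/rack request was satisfied.
   */
  static void deductAgainstAny(List<ResourceRequest> asks,
      List<Container> allocated) {
    for (Container container : allocated) {
      for (ResourceRequest ask : asks) {
        if (ask.getPriority().equals(container.getPriority())
            && ResourceRequest.ANY.equals(ask.getResourceName())
            && ask.getNumContainers() > 0) {
          ask.setNumContainers(ask.getNumContainers() - 1);
          break;
        }
      }
    }
  }
}
{code}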

 Allocation of too many containers when a second request is done with the same 
 resource capability
 -

 Key: YARN-1902
 URL: https://issues.apache.org/jira/browse/YARN-1902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 2.3.0, 2.4.0
Reporter: Sietse T. Au
Assignee: Sietse T. Au
  Labels: client
 Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch


 Regarding AMRMClientImpl
 Scenario 1:
 Given a ContainerRequest x with Resource y, when addContainerRequest is 
 called z times with x, allocate is called and at least one of the z allocated 
 containers is started, then if another addContainerRequest call is done and 
 subsequently an allocate call to the RM, (z+1) containers will be allocated, 
 where 1 container is expected.
 Scenario 2:
 No containers are started between the allocate calls. 
 Analyzing debug logs of the AMRMClientImpl, I have found that (z+1) containers 
 are indeed requested in both scenarios, but that only in the second scenario 
 is the correct behavior observed.
 Looking at the implementation I have found that this (z+1) request is caused 
 by the structure of the remoteRequestsTable. The consequence of Map<Resource, 
 ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
 information about whether a request has been sent to the RM yet or not.
 There are workarounds for this, such as releasing the excess containers 
 received.
 The solution implemented is to initialize a new ResourceRequest in 
 ResourceRequestInfo when a request has been successfully sent to the RM.
 The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546396#comment-14546396
 ] 

Hadoop QA commented on YARN-3632:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 25s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:red}-1{color} | javac |   7m 28s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |   9m 34s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 45s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 17s | The patch appears to introduce 1 
new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |  50m  6s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  86m 10s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS;
 locked 66% of time  Unsynchronized access at FileSystemRMStateStore.java:66% 
of time  Unsynchronized access at FileSystemRMStateStore.java:[line 156] |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733262/YARN-3632.5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f7e051c |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/7955/artifact/patchprocess/diffJavacWarnings.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/7955/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7955/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7955/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7955/console |


This message was automatically generated.

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch


 At present, ordering policies have the option to have an application 
 re-ordered (for allocation and preemption) when it is allocated to or a 
 container is recovered from the application.  Some ordering policies may also 
 need to reorder when demand changes if that is part of the ordering 
 comparison, this needs to be made available (and used by the 
 fairorderingpolicy when sizebasedweight is true)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3664) Federation PolicyStore APIs

2015-05-15 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-3664:


 Summary: Federation PolicyStore APIs
 Key: YARN-3664
 URL: https://issues.apache.org/jira/browse/YARN-3664
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan


The federation Policy Store contains information about the capacity allocations 
made by users, their mapping to sub-clusters and the policies that each of the 
components (Router, AMRMProxy, RMs) should enforce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3666) Federation Intercepting and propagating AM-RM communications

2015-05-15 Thread Kishore Chaliparambil (JIRA)
Kishore Chaliparambil created YARN-3666:
---

 Summary: Federation Intercepting and propagating AM-RM 
communications
 Key: YARN-3666
 URL: https://issues.apache.org/jira/browse/YARN-3666
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Kishore Chaliparambil


In order to support transparent spanning of jobs across sub-clusters, all 
AM-RM communications are proxied (via YARN-2884).

This JIRA tracks federation-specific mechanisms that decide how to 
split/broadcast requests to the RMs and merge answers to 
the AM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2015-05-15 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546209#comment-14546209
 ] 

MENG DING commented on YARN-1902:
-

I was almost going to log the same issue when I saw this thread (and also 
YARN-3020) :-).

After reading all the discussions, and after reading the related code, I still 
believe this is a bug.

I understand what [~bikassaha] has said: that the AM-RM protocol is NOT a delta 
protocol, and that currently the user (i.e., the ApplicationMaster) is 
responsible for calling removeContainerRequest() after receiving an allocation. 
But consider the following simple modification to the packaged 
*distributedshell* application:

{code:title=ApplicationMaster.java|borderStyle=solid}
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
@@ -805,6 +805,8 @@ public void onContainersAllocated(List<Container> 
allocatedContainers) {
 // as all containers may not be allocated at one go.
 launchThreads.add(launchThread);
 launchThread.start();
+ContainerRequest containerAsk = setupContainerAskForRM();
+amRMClient.removeContainerRequest(containerAsk);
   }
 }
{code}

The code simply removes a container request after successfully receiving an 
allocated container in the ApplicationMaster. When you submit this application 
by specifying, say, 3 containers on the CLI, you will sometimes get 4 
containers allocated (not counting the AM container)! 

{code}
root@node2:~# hadoop 
org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar
 -shell_command sleep 10 -num_containers 3 -timeout 2
{code}
{code}
root@node2:~# yarn container -list appattempt_1431531743796_0015_01
15/05/15 20:49:01 INFO client.RMProxy: Connecting to ResourceManager at 
node2/10.211.55.102:8032
15/05/15 20:49:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Total number of containers :5
  Container-IdStart Time Finish Time
   StateHost   Node Http Address
 LOG-URL
container_1431531743796_0015_01_05  Fri May 15 20:44:12 + 2015  
 N/A RUNNING node3:50093
http://node3:8042
http://node3:8042/node/containerlogs/container_1431531743796_0015_01_05/root
container_1431531743796_0015_01_01  Fri May 15 20:44:06 + 2015  
 N/A RUNNING node3:50093
http://node3:8042
http://node3:8042/node/containerlogs/container_1431531743796_0015_01_01/root
container_1431531743796_0015_01_02  Fri May 15 20:44:10 + 2015  
 N/A RUNNING node3:50093
http://node3:8042
http://node3:8042/node/containerlogs/container_1431531743796_0015_01_02/root
container_1431531743796_0015_01_04  Fri May 15 20:44:11 + 2015  
 N/A RUNNING node3:50093
http://node3:8042
http://node3:8042/node/containerlogs/container_1431531743796_0015_01_04/root
container_1431531743796_0015_01_03  Fri May 15 20:44:10 + 2015  
 N/A RUNNING node4:41128
http://node4:8042
http://node4:8042/node/containerlogs/container_1431531743796_0015_01_03/root
{code}

The *fundamental* problem here, I believe, is that the AMRMClient maintains an 
internal request table *remoteRequestsTable* that keeps track of the *total* 
container requests (i.e., including both container requests that have already 
been satisfied and those that are not yet satisfied):

{code:title=AMRMClient.java|borderStyle=solid}
protected final 
  Map<Priority, Map<String, TreeMap<Resource, ResourceRequestInfo>>>
    remoteRequestsTable =
    new TreeMap<Priority, Map<String, TreeMap<Resource, ResourceRequestInfo>>>();
{code}

However, the corresponding table *requests* at the scheduler side (inside 
AppSchedulingInfo.java) keeps track of *outstanding* container requests (i.e., 
container requests that are not yet satisfied):

{code:title=AppSchedulingInfo.java|borderStyle=solid}
  final Map<Priority, Map<String, ResourceRequest>> requests =
    new ConcurrentHashMap<Priority, Map<String, ResourceRequest>>();
{code}
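
To make the mismatch concrete, here is a toy simulation (plain Java, not YARN 
code; the sequence mirrors the distributedshell experiment above, and every name 
is made up for illustration) of one way a stale client-side total can overwrite 
the scheduler's lower outstanding count and produce an extra container:

{code:title=RequestBookkeepingDemo.java (illustrative only)|borderStyle=solid}
/**
 * Toy simulation: the client-side table keeps TOTAL requests, the scheduler
 * keeps OUTSTANDING requests, and the client's periodic updates overwrite the
 * scheduler's count.
 */
public class RequestBookkeepingDemo {
  public static void main(String[] args) {
    int clientTotal = 3;       // remoteRequestsTable-style: everything ever asked
    int schedOutstanding = 3;  // AppSchedulingInfo-style: not yet satisfied
    int allocated = 0;

    // RM allocates one container; the scheduler decrements outstanding.
    allocated++; schedOutstanding--;          // allocated=1, outstanding=2

    // AM receives it and removes one request, but before the updated ask
    // reaches the RM, the RM allocates the remaining two containers.
    clientTotal--;                            // clientTotal=2
    allocated += 2; schedOutstanding -= 2;    // allocated=3, outstanding=0

    // The AM's (stale) ask of 2 now arrives and overwrites outstanding.
    schedOutstanding = clientTotal;           // outstanding=2, demand re-added

    // The RM allocates one more before the next correction arrives.
    allocated++; schedOutstanding--;          // allocated=4 for a 3-container job

    System.out.println("allocated=" + allocated + " for an original ask of 3");
  }
}
{code}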

Every time an allocation is successfully made, the decResourceRequest() or 
decrementOutstanding() call will update the *requests* table so that it only 

[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-15 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3626:
--
Attachment: YARN-3626.9.patch

In that case, here's a patch which goes back to the original approach used 
during troubleshooting, which uses the classpath itself to communicate the 
difference. It only touches other code to revert parts of the earlier patch 
that are no longer needed; the actual change, when done this way, is solely in 
ContainerLaunch.java, and it makes the conditional determination based on the 
classpath differences already present due to the manipulation earlier in the 
chain (in this case, by mapreduce due to user.classpath.first).
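
For readers following along, here is a rough, purely illustrative sketch of the 
kind of reordering this issue is about (not the actual ContainerLaunch change; 
the separator and the "localized" prefix test are assumptions for the example):

{code:title=ClasspathReorderSketch.java (illustrative only)|borderStyle=solid}
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Rough illustration of "localized resources first": move classpath entries
 * that live under the container's local directory to the front.
 */
public class ClasspathReorderSketch {
  static String preferLocalized(String classpath, String localDirPrefix, String sep) {
    List<String> localized = new ArrayList<>();
    List<String> rest = new ArrayList<>();
    for (String entry : classpath.split(Pattern.quote(sep))) {
      if (entry.startsWith(localDirPrefix)) {
        localized.add(entry);   // localized resource: goes to the front
      } else {
        rest.add(entry);        // system resource: keeps its relative order
      }
    }
    localized.addAll(rest);
    return String.join(sep, localized);
  }

  public static void main(String[] args) {
    // Windows-style ';' separator; the paths are made up.
    System.out.println(preferLocalized(
        "C:\\hadoop\\lib\\a.jar;C:\\nm-local\\usercache\\job.jar", "C:\\nm-local", ";"));
  }
}
{code}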

 On Windows localized resources are not moved to the front of the classpath 
 when they should be
 --

 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.7.1

 Attachments: YARN-3626.0.patch, YARN-3626.4.patch, YARN-3626.6.patch, 
 YARN-3626.9.patch


 In response to the mapreduce.job.user.classpath.first setting the classpath 
 is ordered differently so that localized resources will appear before system 
 classpath resources when tasks execute.  On Windows this does not work 
 because the localized resources are not linked into their final location when 
 the classpath jar is created.  To compensate for that localized jar resources 
 are added directly to the classpath generated for the jar rather than being 
 discovered from the localized directories.  Unfortunately, they are always 
 appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-05-15 Thread Ishai Menache (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546361#comment-14546361
 ] 

Ishai Menache commented on YARN-3656:
-

LowCost judiciously “spreads” the demand of the job throughout the allowed 
time-window according to a global, load-based cost function. This leads to more 
balanced allocations, and in turn substantially improves the acceptance rate of 
jobs and the cluster utilization. 
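
As a generic illustration only (the actual LowCost cost function and algorithm 
are described in the attached document, not reproduced here), the "spreading" 
idea amounts to repeatedly placing each unit of the job's demand into the 
currently least-loaded slot of its allowed time-window:

{code:title=SpreadingSketch.java (illustrative only)|borderStyle=solid}
/** Toy flavor of load-based spreading; numbers and cost are made up. */
public class SpreadingSketch {
  public static void main(String[] args) {
    double[] load = {0.7, 0.3, 0.5, 0.2}; // cluster load per slot in the window
    int demandUnits = 3;                  // units of work to place
    for (int u = 0; u < demandUnits; u++) {
      int best = 0;
      for (int t = 1; t < load.length; t++) {
        if (load[t] < load[best]) best = t; // cheapest slot under a load-based cost
      }
      load[best] += 0.1;                    // place one unit there
      System.out.println("unit " + u + " -> slot " + best);
    }
  }
}
{code}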

 LowCost: A Cost-Based Placement Agent for YARN Reservations
 ---

 Key: YARN-3656
 URL: https://issues.apache.org/jira/browse/YARN-3656
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Ishai Menache

 YARN-1051 enables SLA support by allowing users to reserve cluster capacity 
 ahead of time. YARN-1710 introduced a greedy agent for placing user 
 reservations. The greedy agent makes fast placement decisions but at the cost 
 of ignoring the cluster committed resources, which might result in blocking 
 the cluster resources for certain periods of time, and in turn rejecting some 
 arriving jobs.
 We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” 
 the demand of the job throughout the allowed time-window according to a 
 global, load-based cost function. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-15 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546403#comment-14546403
 ] 

Craig Welch commented on YARN-3632:
---

findbugs and javac appear to be irrelevant...

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch


 At present, ordering policies have the option to have an application 
 re-ordered (for allocation and preemption) when it is allocated to or a 
 container is recovered from the application.  Some ordering policies may also 
 need to reorder when demand changes if that is part of the ordering 
 comparison, this needs to be made available (and used by the 
 fairorderingpolicy when sizebasedweight is true)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2015-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546418#comment-14546418
 ] 

Vinod Kumar Vavilapalli commented on YARN-1902:
---

bq. Wangda Tan mentioned offline that we could at-least deduct the count 
against the over-all number (ANY request) for a given priority.
On further thought, this is not desirable in some cases either.

Take the following example.

The user originally wants 1 container on H1, 1 container on H2, and 2 containers 
on R1 (a rack). The request table becomes:
|H1|1|
|H2|1|
|R1|2|
|*|4|

Now, assuming the RM returns a container on R2 (a different rack), 
auto-decrementing the request table will make it:
|H1|1|
|H2|1|
|R1|2|
|*|3|

But the user may actually want something like the following, depending on the 
user's scheduling preferences:
|H1|0|
|H2|1|
|R1|2|
|*|3|
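
A small toy example (plain Java, not YARN code) of that ambiguity: after the 
allocation on R2, the only safe automatic adjustment is the ANY (*) row; whether 
H1, H2, or neither should also be decremented is known only to the application:

{code:title=AutoDecrementAmbiguity.java (illustrative only)|borderStyle=solid}
import java.util.LinkedHashMap;
import java.util.Map;

public class AutoDecrementAmbiguity {
  public static void main(String[] args) {
    Map<String, Integer> table = new LinkedHashMap<>();
    table.put("H1", 1);
    table.put("H2", 1);
    table.put("R1", 2);
    table.put("*", 4);

    // The only safe automatic step is to decrement the ANY (*) row...
    table.put("*", table.get("*") - 1);

    // ...but whether H1 or H2 should also drop to 0 depends on the
    // application's placement preferences, which the client cannot infer.
    System.out.println(table); // {H1=1, H2=1, R1=2, *=3}
  }
}
{code}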

 Allocation of too many containers when a second request is done with the same 
 resource capability
 -

 Key: YARN-1902
 URL: https://issues.apache.org/jira/browse/YARN-1902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 2.3.0, 2.4.0
Reporter: Sietse T. Au
Assignee: Sietse T. Au
  Labels: client
 Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch


 Regarding AMRMClientImpl
 Scenario 1:
 Given a ContainerRequest x with Resource y, when addContainerRequest is 
 called z times with x, allocate is called and at least one of the z allocated 
 containers is started, then if another addContainerRequest call is done and 
 subsequently an allocate call to the RM, (z+1) containers will be allocated, 
 where 1 container is expected.
 Scenario 2:
 No containers are started between the allocate calls. 
 Analyzing the debug logs of the AMRMClientImpl, I have found that indeed (z+1) 
 containers are requested in both scenarios, but that only in the second scenario 
 is the correct behavior observed.
 Looking at the implementation, I have found that this (z+1) request is caused 
 by the structure of the remoteRequestsTable. The consequence of Map<Resource, 
 ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
 information about whether a request has been sent to the RM yet or not.
 There are workarounds for this, such as releasing the excess containers 
 received.
 The solution implemented is to initialize a new ResourceRequest in 
 ResourceRequestInfo when a request has been successfully sent to the RM.
 The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3657) Federation maintenance mechanisms (simple CLI and command propagation)

2015-05-15 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-3657:
--

 Summary: Federation maintenance mechanisms (simple CLI and command 
propagation)
 Key: YARN-3657
 URL: https://issues.apache.org/jira/browse/YARN-3657
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino


The maintenance mechanisms provided by the RM are not sufficient in a federated 
environment. In this JIRA we track a few extensions (more to come later) that 
allow basic maintenance mechanisms (and command propagation) for the federated 
components.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3658) Federation Capacity Allocation across sub-cluster

2015-05-15 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-3658:
--

 Summary: Federation Capacity Allocation across sub-cluster
 Key: YARN-3658
 URL: https://issues.apache.org/jira/browse/YARN-3658
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino


This JIRA will track mechanisms to map federation-level capacity allocations to 
sub-cluster-level ones (possibly via reservation mechanisms).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3665) Federation subcluster membership mechanisms

2015-05-15 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan reassigned YARN-3665:


Assignee: Subru Krishnan

 Federation subcluster membership mechanisms
 ---

 Key: YARN-3665
 URL: https://issues.apache.org/jira/browse/YARN-3665
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan

 The member YARN RMs continuously heartbeat to the state store to stay alive 
 and to publish their current capability/load information. This JIRA tracks 
 that mechanism.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-05-15 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546467#comment-14546467
 ] 

Carlo Curino commented on YARN-3656:


I worked closely with Ishai and Jonathan on this, and the integration with 
YARN-1051 is done rather carefully.
After a month of running experiments they consistently saw better performance 
on all the key metrics, with reasonable runtimes.

I would argue that, after a careful code review and some more testing, the 
LowCost agent they propose should become our default agent for reservations, 
as it dominates the greedy agent we have today.

 LowCost: A Cost-Based Placement Agent for YARN Reservations
 ---

 Key: YARN-3656
 URL: https://issues.apache.org/jira/browse/YARN-3656
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Ishai Menache
 Attachments: LowCostRayonExternal.pdf


 YARN-1051 enables SLA support by allowing users to reserve cluster capacity 
 ahead of time. YARN-1710 introduced a greedy agent for placing user 
 reservations. The greedy agent makes fast placement decisions but at the cost 
 of ignoring the cluster committed resources, which might result in blocking 
 the cluster resources for certain periods of time, and in turn rejecting some 
 arriving jobs.
 We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” 
 the demand of the job throughout the allowed time-window according to a 
 global, load-based cost function. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3652) A SchedulerMetrics may be need for evaluating the scheduler's performance

2015-05-15 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545607#comment-14545607
 ] 

Xianyin Xin commented on YARN-3652:
---

Thanks for comments, [~sunilg].
{quote}
1. *Throughput* : Are you mentioning about #events processed over a period of 
time? If so, how can we set the timeline by which throughput is calculated 
(configurable?)?
A clear indicator from this will be like we can predict possible end timeline 
for the pending events in dispatcher queue. Adding throughput with #no of 
pending events may give much more better indication about RM overload.
{quote}
In fact, the first thing that comes to my mind is the #containers allocated by 
the scheduler per second, because container allocation is what users care about 
and the node update event is the most important scheduler event. The rate of 
processing events is also a nice indicator, just as you comment.
{quote}
2. However there are many events coming to scheduler, if possible a filter for 
the events based on events type may be helpful to give an accuracy for 
throughout and scheduling delay.
{quote}
+1 for the idea. Besides, the #events processed by the scheduler per second is 
large, so indexes based on it are volatile. We may consider some method to 
smooth the fluctuation, such as sampling or aggregating statistics; a rough 
sketch follows below.
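
A minimal sketch of what such a smoothed indicator could look like (illustrative 
only; the class name, smoothing factor and sampling hook are assumptions, not an 
existing YARN metric):

{code:title=AllocationRateMetric.java (illustrative only)|borderStyle=solid}
import java.util.concurrent.atomic.LongAdder;

/**
 * Smoothed "containers allocated per second" gauge using an exponential
 * moving average over fixed sampling intervals.
 */
public class AllocationRateMetric {
  private final LongAdder allocated = new LongAdder();
  private volatile double smoothedRate = 0.0; // exponential moving average
  private final double alpha;                 // smoothing factor in (0, 1]

  public AllocationRateMetric(double alpha) {
    this.alpha = alpha;
  }

  /** Called by the scheduler each time a container is allocated. */
  public void containerAllocated() {
    allocated.increment();
  }

  /** Called once per sampling interval, e.g. every second from a timer. */
  public void sample(double intervalSeconds) {
    double instantRate = allocated.sumThenReset() / intervalSeconds;
    smoothedRate = alpha * instantRate + (1 - alpha) * smoothedRate;
  }

  public double getSmoothedAllocationsPerSecond() {
    return smoothedRate;
  }
}
{code}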

 A SchedulerMetrics may be need for evaluating the scheduler's performance
 -

 Key: YARN-3652
 URL: https://issues.apache.org/jira/browse/YARN-3652
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Xianyin Xin

 As discussed in YARN-3630, a {{SchedulerMetrics}} may be needed for evaluating 
 the scheduler's performance. The performance indexes include the #events waiting 
 to be handled by the scheduler, the throughput, the scheduling delay and/or 
 other indicators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545567#comment-14545567
 ] 

Hudson commented on YARN-3505:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2126 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2126/])
YARN-3505. Node's Log Aggregation Report with SUCCEED should not cached in 
RMApps. Contributed by Xuan Gong. (junping_du: rev 
15ccd967ee3e7046a50522089f67ba01f36ec76a)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/LogAggregationReportPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/LogAggregationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppLogAggregationStatusBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java


 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, 
 YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch


 Per discussions in YARN-1402, we shouldn't cache every node's log aggregation 
 report in RMApps forever, especially for those finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1519) check if sysconf is implemented before using it

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545572#comment-14545572
 ] 

Hudson commented on YARN-1519:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2126 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2126/])
YARN-1519. Check in container-executor if sysconf is implemented before using 
it (Radim Kolar and Eric Payne via raviprak) (raviprak: rev 
53fe4eff09fdaeed75a8cad3a26156bf963a8d37)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


 check if sysconf is implemented before using it
 ---

 Key: YARN-1519
 URL: https://issues.apache.org/jira/browse/YARN-1519
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.3.0
Reporter: Radim Kolar
Assignee: Radim Kolar
  Labels: BB2015-05-TBR
 Fix For: 2.8.0

 Attachments: YARN-1519.002.patch, YARN-1519.003.patch, 
 nodemgr-sysconf.txt


 If the sysconf value _SC_GETPW_R_SIZE_MAX is not implemented, it leads to a 
 segfault because an invalid pointer gets passed to a libc function.
 Fix: enforce a minimum value of 1024; the same method is used in the 
 hadoop-common native code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545597#comment-14545597
 ] 

Hudson commented on YARN-3505:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #186 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/186/])
YARN-3505. Node's Log Aggregation Report with SUCCEED should not cached in 
RMApps. Contributed by Xuan Gong. (junping_du: rev 
15ccd967ee3e7046a50522089f67ba01f36ec76a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/LogAggregationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppLogAggregationStatusBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/LogAggregationReportPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, 
 YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch


 Per discussions in YARN-1402, we shouldn't cache every node's log aggregation 
 report in RMApps forever, especially for those finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1519) check if sysconf is implemented before using it

2015-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545602#comment-14545602
 ] 

Hudson commented on YARN-1519:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #186 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/186/])
YARN-1519. Check in container-executor if sysconf is implemented before using 
it (Radim Kolar and Eric Payne via raviprak) (raviprak: rev 
53fe4eff09fdaeed75a8cad3a26156bf963a8d37)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


 check if sysconf is implemented before using it
 ---

 Key: YARN-1519
 URL: https://issues.apache.org/jira/browse/YARN-1519
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.3.0
Reporter: Radim Kolar
Assignee: Radim Kolar
  Labels: BB2015-05-TBR
 Fix For: 2.8.0

 Attachments: YARN-1519.002.patch, YARN-1519.003.patch, 
 nodemgr-sysconf.txt


 If the sysconf value _SC_GETPW_R_SIZE_MAX is not implemented, it leads to a 
 segfault because an invalid pointer gets passed to a libc function.
 Fix: enforce a minimum value of 1024; the same method is used in the 
 hadoop-common native code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2915) Enable YARN RM scale out via federation using multiple RM's

2015-05-15 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2915:
-
Attachment: Yarn_federation_design_v1.pdf

Uploading a design proposal based on offline design discussions within our 
team and with [~kasha], [~adhoot], [~vinodkv], [~acmurthy], [~tucu00] and more 
people (apologies if I missed anyone).  We validated the proposed design by 
developing a prototype, and we have a basic end-to-end functioning system where 
we can stitch multiple YARN clusters into a unified federated cluster and run 
jobs that transparently span all of them.

 Enable YARN RM scale out via federation using multiple RM's
 ---

 Key: YARN-2915
 URL: https://issues.apache.org/jira/browse/YARN-2915
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Sriram Rao
Assignee: Subru Krishnan
 Attachments: Yarn_federation_design_v1.pdf


 This is an umbrella JIRA that proposes to scale out YARN to support large 
 clusters comprising tens of thousands of nodes.  That is, rather than 
 limiting a YARN-managed cluster to about 4k nodes, the proposal is to 
 enable the YARN-managed cluster to be elastically scalable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546508#comment-14546508
 ] 

Hadoop QA commented on YARN-3655:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 39s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 31s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 53s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 19s | The patch appears to introduce 1 
new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  60m 16s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  96m 47s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS;
 locked 66% of time  Unsynchronized access at FileSystemRMStateStore.java:66% 
of time  Unsynchronized access at FileSystemRMStateStore.java:[line 156] |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733282/YARN-3655.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f37873 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/7956/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7956/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7956/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7956/console |


This message was automatically generated.

 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation 
 -

 Key: YARN-3655
 URL: https://issues.apache.org/jira/browse/YARN-3655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3655.000.patch


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation.
 If a node is reserved by an application, all the other applications don't 
 have any chance to assign a new container on this node, unless the 
 application which reserves the node assigns a new container on this node or 
 releases the reserved container on this node.
 The problem is that if an application tries to call assignReservedContainer and 
 fails to get a new container due to the maxAMShare limitation, it will block all 
 other applications from using the nodes it reserves. If all the other running 
 applications can't release their AM containers because they are blocked by these 
 reserved containers, a livelock situation can happen.
 The following is the code at FSAppAttempt#assignContainer which can cause 
 this potential livelock.
 {code}
 // Check the AM resource usage for the leaf queue
 if (!isAmRunning() && !getUnmanagedAM()) {
   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
   if (ask.isEmpty() || !getQueue().canRunAppAM(
       ask.get(0).getCapability())) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Skipping allocation because maxAMShare limit would " +
           "be exceeded");
     }
     return Resources.none();
   }
 }
 {code}
 To fix 
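
 Purely as an illustration of one possible direction, and not necessarily what 
 the attached YARN-3655.000.patch does: skip or release the reservation when the 
 reserving application cannot currently start its AM under the maxAMShare check, 
 so the node is not held idle. All names in the sketch below are hypothetical.
 {code:title=ReservationGuardSketch.java (illustrative only)|borderStyle=solid}
 /**
  * Hypothetical guard, not the actual patch: if the reserving application
  * cannot start its AM anyway (maxAMShare would be exceeded), keeping the
  * reservation only blocks the node for everyone else.
  */
 public class ReservationGuardSketch {
   static boolean shouldKeepReservation(boolean amRunning, boolean unmanagedAM,
       boolean queueCanRunAppAM) {
     if (!amRunning && !unmanagedAM && !queueCanRunAppAM) {
       return false; // drop/skip the reservation so other apps can use the node
     }
     return true;
   }
 
   public static void main(String[] args) {
     System.out.println(shouldKeepReservation(false, false, false)); // false
     System.out.println(shouldKeepReservation(true,  false, false)); // true
   }
 }
 {code}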

[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-15 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3626:
--
Attachment: YARN-3626.11.patch

Now using the environment to pass the configuration.

 On Windows localized resources are not moved to the front of the classpath 
 when they should be
 --

 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.7.1

 Attachments: YARN-3626.0.patch, YARN-3626.11.patch, 
 YARN-3626.4.patch, YARN-3626.6.patch, YARN-3626.9.patch


 In response to the mapreduce.job.user.classpath.first setting the classpath 
 is ordered differently so that localized resources will appear before system 
 classpath resources when tasks execute.  On Windows this does not work 
 because the localized resources are not linked into their final location when 
 the classpath jar is created.  To compensate for that localized jar resources 
 are added directly to the classpath generated for the jar rather than being 
 discovered from the localized directories.  Unfortunately, they are always 
 appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String

2015-05-15 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3565:

Attachment: YARN-3565.20150516-1.patch

Hi [~wangda],
Uploading a patch that fixes the applicable comments from [~vinodkv]; the ones 
which are not addressed are:
* ??Not directly related to this patch, but in LabelsToNodeIdsProto?? : As 
discussed offline, this will be handled in YARN-3583
* ??ResourceTrackerService shouldn't have convertToStringSet(). 
RMNodeLabelsManager.replaceLabelsOnNode() etc.. should be modified to use the 
NodeLabel object??: As per the above comment from [~wangda], this is not 
required now.
* ??NodeReportProto also have a string for node_labels instead of an object.?? 
As discussed offline, this will be handled in YARN-3583

 NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object 
 instead of String
 -

 Key: YARN-3565
 URL: https://issues.apache.org/jira/browse/YARN-3565
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
Priority: Blocker
 Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, 
 YARN-3565.20150516-1.patch


 Now NM HB/Register uses Set<String>; it will be hard to add new fields if we 
 want to support specifying NodeLabel attributes such as exclusivity/constraints, 
 etc. We need to make sure rolling upgrade works.
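
 Purely illustrative (not the actual protocol-record change, and the class name 
 is made up): carrying a small label object instead of a bare string leaves room 
 for extra attributes such as exclusivity without changing the wire format again 
 later.
 {code:title=NodeLabelSketch.java (illustrative only)|borderStyle=solid}
 import java.util.Collections;
 import java.util.Set;
 
 /** A Set<NodeLabelSketch> can grow attributes; a Set<String> cannot. */
 public final class NodeLabelSketch {
   private final String name;
   private final boolean exclusive; // example attribute a plain String can't carry
 
   public NodeLabelSketch(String name, boolean exclusive) {
     this.name = name;
     this.exclusive = exclusive;
   }
 
   public String getName() { return name; }
   public boolean isExclusive() { return exclusive; }
 
   public static void main(String[] args) {
     Set<NodeLabelSketch> labels =
         Collections.singleton(new NodeLabelSketch("GPU", true));
     for (NodeLabelSketch l : labels) {
       System.out.println(l.getName() + " exclusive=" + l.isExclusive());
     }
   }
 }
 {code}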



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

