[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278804#comment-14278804
 ] 

Hudson commented on YARN-1492:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2025 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2025/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache, which enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, copying of jobjars and libjars sometimes becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 mention defeating the purpose of bringing compute to where the data is. This 
 is wasteful because in most cases the code doesn't change much across many jobs.
 I'd like to propose and discuss the feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278805#comment-14278805
 ] 

Hudson commented on YARN-2807:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2025 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2025/])
YARN-2807. Option --forceactive not works as described in usage of (xgong: 
rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm


 Option --forceactive not works as described in usage of yarn rmadmin 
 -transitionToActive
 

 Key: YARN-2807
 URL: https://issues.apache.org/jira/browse/YARN-2807
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation, resourcemanager
Reporter: Wangda Tan
Assignee: Masatake Iwasaki
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, 
 YARN-2807.4.patch


 Currently the help message of yarn rmadmin -transitionToActive is:
 {code}
 transitionToActive: incorrect number of arguments
 Usage: HAAdmin [-transitionToActive <serviceId> [--forceactive]]
 {code}
 But --forceactive does not work as expected. When transitioning the RM state with 
 --forceactive:
 {code}
 yarn rmadmin -transitionToActive rm2 --forceactive
 Automatic failover is enabled for 
 org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e
 Refusing to manually manage HA state, since it may cause
 a split-brain scenario or other incorrect state.
 If you are very sure you know what you are doing, please
 specify the forcemanual flag.
 {code}
 As shown above, we still cannot transitionToActive when automatic failover is 
 enabled, even with --forceactive.
 The option that does work is {{--forcemanual}}, but there is no place in the usage 
 that describes this option. I think we should fix this.
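 For comparison, the flag that the error message actually asks for would be used roughly like this (illustrative only; exact flag placement not verified against the attached patches):
 {code}
 yarn rmadmin -transitionToActive rm2 --forcemanual
 {code}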



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278811#comment-14278811
 ] 

Hudson commented on YARN-3005:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2025 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2025/])
YARN-3005. [JDK7] Use switch statement for String instead of if-else statement 
in RegistrySecurity.java (Contributed by Kengo Seki) (aajisaka: rev 
533e551eb42af188535aeb0ab35f8ebf150a0da1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/client/impl/zk/RegistrySecurity.java
* hadoop-yarn-project/CHANGES.txt


 [JDK7] Use switch statement for String instead of if-else statement in 
 RegistrySecurity.java
 

 Key: YARN-3005
 URL: https://issues.apache.org/jira/browse/YARN-3005
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Priority: Trivial
  Labels: newbie
 Fix For: 2.7.0

 Attachments: YARN-3005.001.patch, YARN-3005.002.patch


 Since we have moved to JDK7, we can refactor the below if-else statement for 
 String.
 {code}
 // TODO JDK7 SWITCH
 if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) {
   access = AccessPolicy.sasl;
 } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) {
   access = AccessPolicy.digest;
 } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) {
   access = AccessPolicy.anon;
 } else {
   throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
  + "\"" + auth + "\"");
 }
 {code}
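 For reference, the JDK7 String-switch form would look roughly like this (a sketch; the committed patch may differ in details):
 {code}
 // note: unlike the if/else chain, a String switch throws NullPointerException if auth is null
 switch (auth) {
   case REGISTRY_CLIENT_AUTH_KERBEROS:
     access = AccessPolicy.sasl;
     break;
   case REGISTRY_CLIENT_AUTH_DIGEST:
     access = AccessPolicy.digest;
     break;
   case REGISTRY_CLIENT_AUTH_ANONYMOUS:
     access = AccessPolicy.anon;
     break;
   default:
     throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
         + "\"" + auth + "\"");
 }
 {code}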



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278807#comment-14278807
 ] 

Hudson commented on YARN-2217:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2025 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2025/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java


 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, 
 YARN-2217-trunk-v9.patch


 Implement the client side changes for the shared cache.
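 A rough usage sketch based on the files listed in the commit above (method and class names here are assumptions for illustration and may not match the committed patch exactly):
 {code}
 // Hypothetical sketch: try to resolve a jar through the shared cache before uploading it.
 SharedCacheClient scClient = SharedCacheClient.createSharedCacheClient();
 scClient.init(conf);
 scClient.start();
 try {
   // checksum of the local jar acts as the cache key
   String checksum = scClient.getFileChecksum(localJarPath);
   Path cached = scClient.use(appId, checksum);   // null if the resource is not in the cache
   if (cached == null) {
     // fall back to the normal per-application upload of the jar
   }
 } finally {
   scClient.stop();
 }
 {code}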



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk

2015-01-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3064:
--
Attachment: YARN-3064.2.patch

thanks Junping !
updated the patch

 TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with 
 allocation timeout in trunk
 ---

 Key: YARN-3064
 URL: https://issues.apache.org/jira/browse/YARN-3064
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Wangda Tan
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3064.1.patch, YARN-3064.2.patch


 Noticed consistent tests failure, see:
 https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/
 Logs like:
 {code}
 Error Message
 Attempt state is not correct (timedout) expected:<ALLOCATED> but 
 was:<SCHEDULED>
 Stacktrace
 java.lang.AssertionError: Attempt state is not correct (timedout) 
 expected:<ALLOCATED> but was:<SCHEDULED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794)
 {code}
 I can reproduce it in local environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) create backing storage write interface for ATS writers

2015-01-15 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279045#comment-14279045
 ] 

Vrushali C commented on YARN-3031:
--

Hi Varun,
I'd like to take ownership of this JIRA, and hope you're OK with that. Do let 
me know.

thanks
Vrushali


 create backing storage write interface for ATS writers
 --

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena

 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.
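 To make the discussion concrete, a hypothetical shape for such an interface might be the following (names invented for illustration; defining the real interface is the point of this JIRA):
 {code}
 /** Hypothetical sketch of a backing-storage write interface for the ATS writer. */
 public interface TimelineWriter extends Service {
   /** Persist a batch of entities for a given application; implementations may buffer internally. */
   void write(String clusterId, String userId, String flowId, String appId,
       TimelineEntities entities) throws IOException;

   /** Flush any buffered writes to the backing storage. */
   void flush() throws IOException;
 }
 {code}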



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2015-01-15 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279056#comment-14279056
 ] 

Mayank Bansal commented on YARN-2933:
-

Thanks [~jianhe] and [~wangda] for the review

bq. looks good overall, we should use priority.AMCONTAINER here ?

The name was confusing, so I changed the names and updated accordingly.

bq. it's better to use enum type instead of int in mockContainer, which can 
avoid call getValue() from enum.
Priority is overridden differently in multiple tests, so I didn't want to 
change the signature of the functions. Moreover, it's the same.

Uploading the updated patch

Thanks,
Mayank

 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, 
 YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, 
 YARN-2933-8.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 is targeting preemption that respects node labels, but we have 
 some gaps in the code base; for example, queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc., by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking through carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid 
 regressions like the following: a cluster has some nodes with labels and some without; assume 
 queueA isn't satisfied for resources without labels, but for now the preemption 
 policy may preempt resources from nodes with labels for queueA, which is not 
 correct.
 Again, this is just a short-term enhancement; YARN-2498 will consider 
 preemption respecting node labels for the Capacity Scheduler, which is our final 
 target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk

2015-01-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279062#comment-14279062
 ] 

Hadoop QA commented on YARN-3064:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692450/YARN-3064.1.patch
  against trunk revision ce29074.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6341//console

This message is automatically generated.

 TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with 
 allocation timeout in trunk
 ---

 Key: YARN-3064
 URL: https://issues.apache.org/jira/browse/YARN-3064
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Wangda Tan
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3064.1.patch


 Noticed consistent tests failure, see:
 https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/
 Logs like:
 {code}
 Error Message
 Attempt state is not correct (timedout) expected:<ALLOCATED> but 
 was:<SCHEDULED>
 Stacktrace
 java.lang.AssertionError: Attempt state is not correct (timedout) 
 expected:<ALLOCATED> but was:<SCHEDULED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794)
 {code}
 I can reproduce it in local environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279116#comment-14279116
 ] 

Varun Saxena commented on YARN-2928:


As branch has been created, we can now decide the order of tasks and assignees 
as [~vinodkv] suggested.

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3037) create HBase cluster backing storage implementation for ATS writes

2015-01-15 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279046#comment-14279046
 ] 

Vrushali C commented on YARN-3037:
--

Hi Zhijie
I'd like to take ownership of this JIRA, and hope you're OK with that. Do let 
me know.

thanks
Vrushali

 create HBase cluster backing storage implementation for ATS writes
 --

 Key: YARN-3037
 URL: https://issues.apache.org/jira/browse/YARN-3037
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen

 Per design in YARN-2928, create a backing storage implementation for ATS 
 writes based on a full HBase cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param

2015-01-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3062.
---
Resolution: Invalid

Thanks for your confirmation, [~pramachandran]! Closing the Jira.

 timelineserver gives inconsistent data for otherinfo field based on the 
 filter param
 

 Key: YARN-3062
 URL: https://issues.apache.org/jira/browse/YARN-3062
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Prakash Ramachandran
 Attachments: withfilter.json, withoutfilter.json


 When the otherinfo field gets updated, in some cases the data returned for an 
 entity depends on the filter usage. 
 For example, in the attached files, for
 - entity: vertex_1421164610335_0020_1_01,
 - entitytype: TEZ_VERTEX_ID,
 the otherinfo.numTasks field got updated from 1009 to 253:
 - using 
 {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/
  {code} gives the updated value: 253
 - using 
 {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code}
  gives the old value: 1009
  
 For the otherinfo.status field, which also gets updated, both of them show the 
 updated value. 
 TEZ-1942 has more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2015-01-15 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-2933:

Attachment: YARN-2933-9.patch

 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, 
 YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, 
 YARN-2933-8.patch, YARN-2933-9.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 is targeting preemption that respects node labels, but we have 
 some gaps in the code base; for example, queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc., by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking through carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid 
 regressions like the following: a cluster has some nodes with labels and some without; assume 
 queueA isn't satisfied for resources without labels, but for now the preemption 
 policy may preempt resources from nodes with labels for queueA, which is not 
 correct.
 Again, this is just a short-term enhancement; YARN-2498 will consider 
 preemption respecting node labels for the Capacity Scheduler, which is our final 
 target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-15 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279109#comment-14279109
 ] 

Anubhav Dhoot commented on YARN-3021:
-

I don't see any security holes. This token is only for the application's own 
use. The validation and renewal that you are turning off via the new parameter 
should not impact security of YARN or other applications.

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (a one-way trust in both cases), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails because the B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously, 
 and once the renewal attempt failed we simply ceased to schedule any further 
 renewal attempts, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip only the scheduling, rather than bubble an error back 
 to the client, failing the app submission. This way the old behaviour is 
 retained.
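 A conceptual sketch of the proposed behaviour (not the actual DelegationTokenRenewer code; helper names here are invented for illustration):
 {code}
 try {
   long expiry = token.renew(conf);          // may fail across a one-way trust boundary
   scheduleAutomaticRenewal(token, expiry);  // hypothetical helper: keep renewing tokens we could renew
 } catch (Exception e) {
   LOG.warn("Could not renew " + token + "; skipping automatic renewal", e);
   // do not bubble the error back to the client -- preserve the 1.x JobTracker behaviour
 }
 {code}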



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278767#comment-14278767
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #75 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/75/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache, which enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, copying of jobjars and libjars sometimes becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 mention defeating the purpose of bringing compute to where the data is. This 
 is wasteful because in most cases the code doesn't change much across many jobs.
 I'd like to propose and discuss the feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278774#comment-14278774
 ] 

Hudson commented on YARN-3005:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #75 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/75/])
YARN-3005. [JDK7] Use switch statement for String instead of if-else statement 
in RegistrySecurity.java (Contributed by Kengo Seki) (aajisaka: rev 
533e551eb42af188535aeb0ab35f8ebf150a0da1)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/client/impl/zk/RegistrySecurity.java


 [JDK7] Use switch statement for String instead of if-else statement in 
 RegistrySecurity.java
 

 Key: YARN-3005
 URL: https://issues.apache.org/jira/browse/YARN-3005
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Priority: Trivial
  Labels: newbie
 Fix For: 2.7.0

 Attachments: YARN-3005.001.patch, YARN-3005.002.patch


 Since we have moved to JDK7, we can refactor the below if-else statement for 
 String.
 {code}
 // TODO JDK7 SWITCH
 if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) {
   access = AccessPolicy.sasl;
 } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) {
   access = AccessPolicy.digest;
 } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) {
   access = AccessPolicy.anon;
 } else {
   throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
  + "\"" + auth + "\"");
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number

2015-01-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278914#comment-14278914
 ] 

Naganarasimha G R commented on YARN-3009:
-

Hi [~cwensel]
I hope the approach proposed as part of the patch is fine with you.

[~zjshen], 
Any comments on the earlier workaround patch? 

 TimelineWebServices always parses primary and secondary filters as numbers if 
 first char is a number
 

 Key: YARN-3009
 URL: https://issues.apache.org/jira/browse/YARN-3009
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Chris K Wensel
Assignee: Naganarasimha G R
 Attachments: YARN-3009.20150108-1.patch, YARN-3009.20150111-1.patch


 If you pass a filter value that starts with a number (7CCA...), the filter 
 value will be parsed into the Number '7', causing the filter to fail the 
 search.
 It should be noted that the actual value, as stored via a PUT operation, is properly 
 parsed and stored as a String.
 This manifests as a very hard-to-identify issue with DAGClient in Apache Tez 
 when naming dags/vertices with alphanumeric guid values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param

2015-01-15 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278878#comment-14278878
 ] 

Prakash Ramachandran commented on YARN-3062:


[~zjshen], looks like that was the issue. We can close this jira now. Thanks

 timelineserver gives inconsistent data for otherinfo field based on the 
 filter param
 

 Key: YARN-3062
 URL: https://issues.apache.org/jira/browse/YARN-3062
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Prakash Ramachandran
 Attachments: withfilter.json, withoutfilter.json


 When the otherinfo field gets updated, in some cases the data returned for an 
 entity depends on the filter usage. 
 For example, in the attached files, for
 - entity: vertex_1421164610335_0020_1_01,
 - entitytype: TEZ_VERTEX_ID,
 the otherinfo.numTasks field got updated from 1009 to 253:
 - using 
 {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/
  {code} gives the updated value: 253
 - using 
 {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code}
  gives the old value: 1009
  
 For the otherinfo.status field, which also gets updated, both of them show the 
 updated value. 
 TEZ-1942 has more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278768#comment-14278768
 ] 

Hudson commented on YARN-2807:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #75 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/75/])
YARN-2807. Option --forceactive not works as described in usage of (xgong: 
rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm


 Option --forceactive not works as described in usage of yarn rmadmin 
 -transitionToActive
 

 Key: YARN-2807
 URL: https://issues.apache.org/jira/browse/YARN-2807
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation, resourcemanager
Reporter: Wangda Tan
Assignee: Masatake Iwasaki
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, 
 YARN-2807.4.patch


 Currently the help message of yarn rmadmin -transitionToActive is:
 {code}
 transitionToActive: incorrect number of arguments
 Usage: HAAdmin [-transitionToActive <serviceId> [--forceactive]]
 {code}
 But --forceactive does not work as expected. When transitioning the RM state with 
 --forceactive:
 {code}
 yarn rmadmin -transitionToActive rm2 --forceactive
 Automatic failover is enabled for 
 org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e
 Refusing to manually manage HA state, since it may cause
 a split-brain scenario or other incorrect state.
 If you are very sure you know what you are doing, please
 specify the forcemanual flag.
 {code}
 As shown above, we still cannot transitionToActive when automatic failover is 
 enabled, even with --forceactive.
 The option that does work is {{--forcemanual}}, but there is no place in the usage 
 that describes this option. I think we should fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278770#comment-14278770
 ] 

Hudson commented on YARN-2217:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #75 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/75/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java


 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, 
 YARN-2217-trunk-v9.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) create backing storage write interface for ATS writers

2015-01-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279130#comment-14279130
 ] 

Varun Saxena commented on YARN-3031:


Sure, go ahead.

 create backing storage write interface for ATS writers
 --

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena

 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3015) yarn classpath command should support same options as hadoop classpath.

2015-01-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3015:
---
Attachment: YARN-3015.004.patch

 yarn classpath command should support same options as hadoop classpath.
 ---

 Key: YARN-3015
 URL: https://issues.apache.org/jira/browse/YARN-3015
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Reporter: Chris Nauroth
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3015.001.patch, YARN-3015.002.patch, 
 YARN-3015.003.patch, YARN-3015.004.patch


 HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional 
 expansion of the wildcards and bundling the classpath into a jar file 
 containing a manifest with the Class-Path attribute. The other classpath 
 commands should do the same for consistency.
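 For reference, the options HADOOP-10903 added to {{hadoop classpath}}, which this JIRA would mirror for {{yarn classpath}} (expected usage, not verified against the attached patches; the jar file name is just an example):
 {code}
 yarn classpath                 # print the classpath, possibly containing wildcards
 yarn classpath --glob          # expand wildcard entries to the matching jars
 yarn classpath --jar mycp.jar  # write a jar whose manifest Class-Path carries the classpath
 {code}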



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage

2015-01-15 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279182#comment-14279182
 ] 

Robert Kanter commented on YARN-2984:
-

+1

 Metrics for container's actual memory usage
 ---

 Key: YARN-2984
 URL: https://issues.apache.org/jira/browse/YARN-2984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2984-1.patch, yarn-2984-2.patch, yarn-2984-3.patch, 
 yarn-2984-prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track memory usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage

2015-01-15 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279205#comment-14279205
 ] 

Robert Kanter commented on YARN-2984:
-

One minor thing: the patch has this:
{code:java}new HashMap<>();{code}
which is a Java 7 feature.  Did we officially decide to drop Java 6 in the 
Hadoop 2.7.0 release?
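For context, the construct in question is the Java 7 diamond operator; a trivial illustration (the type parameters here are arbitrary):
{code:java}
Map<String, Long> usage = new HashMap<>();              // Java 7+: diamond operator
Map<String, Long> java6 = new HashMap<String, Long>();  // Java 6 spelling of the same thing
{code}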

 Metrics for container's actual memory usage
 ---

 Key: YARN-2984
 URL: https://issues.apache.org/jira/browse/YARN-2984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2984-1.patch, yarn-2984-2.patch, yarn-2984-3.patch, 
 yarn-2984-prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track memory usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279135#comment-14279135
 ] 

Hudson commented on YARN-2861:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6869 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6869/])
YARN-2861. Fixed Timeline DT secret manager to not reuse RM's configs. 
Contributed by Zhijie Shen (jianhe: rev 
9e33116d1d8944a393937337b3963e192b9c74d1)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMSecretManagerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineDelegationTokenSecretManagerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


 Timeline DT secret manager should not reuse the RM's configs.
 -

 Key: YARN-2861
 URL: https://issues.apache.org/jira/browse/YARN-2861
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.7.0

 Attachments: YARN-2861.1.patch, YARN-2861.2.patch


 This is the configs for RM DT secret manager. We should create separate ones 
 for timeline DT only.
 {code}
   @Override
   protected void serviceInit(Configuration conf) throws Exception {
     long secretKeyInterval =
         conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY,
             YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT);
     long tokenMaxLifetime =
         conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY,
             YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT);
     long tokenRenewInterval =
         conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY,
             YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT);
     secretManager =
         new TimelineDelegationTokenSecretManager(secretKeyInterval,
             tokenMaxLifetime, tokenRenewInterval, 360);
     secretManager.startThreads();
     serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig());
     super.init(conf);
   }
 {code}
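 A sketch of the direction a fix could take, reading timeline-service-scoped settings instead of the RM DT ones (the key names below are placeholders for illustration; see the committed YarnConfiguration changes for the real ones):
 {code}
   // Hypothetical: timeline-specific delegation token settings, independent of the RM's.
   long secretKeyInterval = conf.getLong(
       "yarn.timeline-service.delegation.key.update-interval",   // placeholder key name
       24 * 60 * 60 * 1000L);
   long tokenMaxLifetime = conf.getLong(
       "yarn.timeline-service.delegation.token.max-lifetime",    // placeholder key name
       7 * 24 * 60 * 60 * 1000L);
   long tokenRenewInterval = conf.getLong(
       "yarn.timeline-service.delegation.token.renew-interval",  // placeholder key name
       24 * 60 * 60 * 1000L);
 {code}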



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2932) Add entry for preemptable status to scheduler web UI and queue initialize/refresh logging

2015-01-15 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2932:
-
Attachment: YARN-2932.v4.txt

Thank you [~leftnoteasy] for your review and comments.

{quote}
Re 2:
 You're partially correct, queue finally calls setupQueueConfig when 
reinitialize is invoked. 
 The CapacityScheduler reinitialization is creating a new set of queues, and 
copy new parameters to your old queues via
{code}
setupQueueConfigs(
clusterResource,
newlyParsedLeafQueue.capacity, 
...
{code}
So you need put the parameter you wants to update to setupQueueConfig as well. 
Without that, queue will not be refreshed. I didn't find any changes to 
parameter of setupQueueConfig, so I guess so, it's better to add a test to 
verify it.
{quote}
I made the changes to the API for {{AbstractCSQueue#setupQueueConfigs}} to take 
the additional preemptable parameter. When it is called from 
{{[Leaf|Parent]Queue#setupQueueConfigs}}, it calls 
{{AbstractCSQueue#isQueuePathHierarchyPreemptable}} to get the preemptability 
of the queue.

I tested the fixes in both version 3 and version 4 of this patch on a one-node 
cluster and on a 10-node cluster. In both versions, I was able to change the 
{{disable_preemption}} properties, refresh the queues using {{yarn rmadmin 
-refreshQueues}}, and see the updates on the Scheduler UI page. 
However, I think I see that if the new list of queues is different from the old 
list of queues, it would not pick up the parameters for the new queues without 
this change.

{quote}
Re 3:
You can take a look at how AbstractCSQueue initialize labels,

I think they have similar logic – For node label is trying to get value from 
configuration, if not set, inherit from parent. With this, you can make 
getPreemptable interface without defaultVal in CapacitySchedulerConfiguration.
{quote}
I did change {{CapacitySchedulerConfiguration#getQueuePreemptable}} to not take 
a default value, but in order to pass back the {{null}} information, it has to 
return a {{String}} and then the caller has to convert the {{String}} to a 
Boolean, which I think is a little awkward.
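Roughly, the pattern being discussed looks like this (a sketch only, not the exact patch code; {{parent.isPreemptable()}} is an assumed accessor):
{code}
// Hypothetical: null from the config means "not set on this queue", so inherit from the parent.
String raw = csConf.getQueuePreemptable(getQueuePath());   // returns null when unset
boolean preemptable = (raw != null)
    ? Boolean.parseBoolean(raw)
    : (parent == null || parent.isPreemptable());          // inherit, similar to node labels
{code}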

{quote}
Since YARN-2056 is also planned in 2.7 (I thought it's already included in 
2.6), do you think is it better to make configuration option name to 
queue-patch.preemptable for consistency?
{quote}
Well, that would be ideal, I think, but it isn't that simple on our side. We 
have already started using the code in YARN-2056 and are using the 
{{disable_preemption}} property.

An argument could be made that {{disable_preemption}} is better because it 
indicates that it is turning off the {{...monitor.capacity.preemption...}} 
property. If {{disable_preemption}} were changed to {{preemptable}}, someone 
may look at that property and think that the queue should have that property 
without considering the overall, system property 
{{...monitor.capacity.preemption...}}.

How important is it to you that {{disable_preemption}} property be changed to 
{{preemptable}}?

 Add entry for preemptable status to scheduler web UI and queue 
 initialize/refresh logging
 ---

 Key: YARN-2932
 URL: https://issues.apache.org/jira/browse/YARN-2932
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, 
 YARN-2932.v4.txt


 YARN-2056 enables the ability to turn preemption on or off on a per-queue 
 level. This JIRA will provide the preemption status for each queue in the 
 {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue 
 refresh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2932) Add entry for preemptable status to scheduler web UI and queue initialize/refresh logging

2015-01-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279193#comment-14279193
 ] 

Hadoop QA commented on YARN-2932:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692570/YARN-2932.v4.txt
  against trunk revision 9e33116.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6344//console

This message is automatically generated.

 Add entry for preemptable status to scheduler web UI and queue 
 initialize/refresh logging
 ---

 Key: YARN-2932
 URL: https://issues.apache.org/jira/browse/YARN-2932
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, 
 YARN-2932.v4.txt


 YARN-2056 enables the ability to turn preemption on or off on a per-queue 
 level. This JIRA will provide the preemption status for each queue in the 
 {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue 
 refresh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) create backing storage write interface for ATS writers

2015-01-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279129#comment-14279129
 ] 

Varun Saxena commented on YARN-3031:


Sure, go ahead.

 create backing storage write interface for ATS writers
 --

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena

 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed

2015-01-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279202#comment-14279202
 ] 

Chris Nauroth commented on YARN-3066:
-

I'm not familiar with {{ssid}} on FreeBSD.  Does it have the same usage as 
Linux {{setsid}}?  If so, then perhaps an appropriate workaround is to copy 
that binary to {{setsid}} and make sure it's available on the {{PATH}}.  This 
might not require any YARN code changes.

bq. I propose to make Shell.isSetsidAvailable test more strict and fail to 
start if it is not found.

This would likely have to be considered backwards-incompatible, because 
applications would fail to start on existing systems that don't have 
{{setsid}}.  I suppose the new behavior could be hidden behind an opt-in 
configuration property.  Also, we need to keep in mind that 
{{Shell.isSetsidAvailable}} is always {{false}} on Windows.  (On Windows, we 
handle the issue of orphaned processes by using Windows API job objects instead 
of {{setsid}}.)

 Hadoop leaves orphaned tasks running after job is killed
 

 Key: YARN-3066
 URL: https://issues.apache.org/jira/browse/YARN-3066
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
 Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1
Reporter: Dmitry Sivachenko

 When spawning a user task, the node manager checks for the setsid(1) utility and spawns 
 the task program via it. See 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
  for instance:
 String exec = Shell.isSetsidAvailable ? "exec setsid " : "exec ";
 FreeBSD, unlike Linux, does not have a setsid(1) utility, so plain exec is 
 used to spawn the user task.  If that task spawns other external programs (a 
 common case if the task program is a shell script) and the user kills the job via 
 mapred job -kill Job, these child processes remain running.
 1) Why do you silently ignore the absence of setsid(1) and spawn the task process 
 via plain exec? This guarantees orphaned processes when a job is 
 prematurely killed.
 2) FreeBSD has a third-party replacement program called ssid (which does 
 almost the same as Linux's setsid).  It would be nice to detect which binary 
 is present during the configure stage and put a @SETSID@ macro into the Java file so 
 the correct name is used.
 I propose making the Shell.isSetsidAvailable test more strict and failing to start 
 if it is not found: at least we will know about the problem at startup rather 
 than guess why there are orphaned tasks running forever.
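 A minimal sketch of the stricter fail-fast check being proposed (illustrative only; not existing Hadoop code):
 {code}
 // Hypothetical check at NodeManager startup instead of silently falling back to plain exec.
 if (!Shell.isSetsidAvailable) {
   throw new YarnRuntimeException(
       "setsid (or an equivalent such as FreeBSD's ssid) was not found on the PATH; "
       + "container process trees cannot be killed reliably, refusing to start");
 }
 {code}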



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk

2015-01-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279243#comment-14279243
 ] 

Hadoop QA commented on YARN-3064:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692561/YARN-3064.2.patch
  against trunk revision ce29074.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6343//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6343//console

This message is automatically generated.

 TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with 
 allocation timeout in trunk
 ---

 Key: YARN-3064
 URL: https://issues.apache.org/jira/browse/YARN-3064
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Wangda Tan
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3064.1.patch, YARN-3064.2.patch


 Noticed consistent tests failure, see:
 https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/
 Logs like:
 {code}
 Error Message
 Attempt state is not correct (timedout) expected:<ALLOCATED> but 
 was:<SCHEDULED>
 Stacktrace
 java.lang.AssertionError: Attempt state is not correct (timedout) 
 expected:<ALLOCATED> but was:<SCHEDULED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794)
 {code}
 I can reproduce it in local environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk

2015-01-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279587#comment-14279587
 ] 

Hadoop QA commented on YARN-3064:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692561/YARN-3064.2.patch
  against trunk revision 780a6bf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6349//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6349//console

This message is automatically generated.

 TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with 
 allocation timeout in trunk
 ---

 Key: YARN-3064
 URL: https://issues.apache.org/jira/browse/YARN-3064
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Wangda Tan
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3064.1.patch, YARN-3064.2.patch


 Noticed consistent tests failure, see:
 https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/
 Logs like:
 {code}
 Error Message
 Attempt state is not correct (timedout) expected:<ALLOCATED> but 
 was:<SCHEDULED>
 Stacktrace
 java.lang.AssertionError: Attempt state is not correct (timedout) 
 expected:<ALLOCATED> but was:<SCHEDULED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794)
 {code}
 I can reproduce it in local environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279383#comment-14279383
 ] 

Sangjin Lee commented on YARN-2928:
---

And sorry for going back on the name. :)

I realized that the term aggregating is now quite overloaded. When we said 
aggregation, we tried to stick with the definition of adding up metrics to the 
next parent. In that sense, I'm not sure the timeline aggregator would be the 
best name, as it would not do that type of aggregation. Aggregation up to the 
app level would be done by the AM, and the flow run level aggregation is done 
by the backing storage.

How about Timeline writer?

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279377#comment-14279377
 ] 

Sangjin Lee commented on YARN-2928:
---

bq. We will have to carve out some capacity for the per-node companions. I see 
some sort of static allocation like 1GB similar to NodeManager.

The required memory for the per-node aggregator might be larger than 
anticipated. One reference point may be the memory footprint of an MR AM. The 
bulk of the job-related pinned-down memory would be needed on the aggregator, 
and that can easily be several hundreds of MB. Also, buffering multiple sets of 
such data for writes would require more room. On top of that, one would need to 
multiply by the number of apps the aggregator needs to support (x2 or x3 at most); 
for example, a few hundred MB per app times two or three apps is already close to 
1 GB before any write buffering.

All in all, my gut feeling is that 1 GB might be rather tight. I think we'll 
know more as we start testing it with realistic size apps and backing storage.

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279401#comment-14279401
 ] 

Sangjin Lee commented on YARN-2928:
---

Hi [~varun_saxena], give us just a little more time. I think it might make more 
sense for some of us who got involved in the design process earlier to start 
working out the initial key pieces. As those pieces fall into place, we'll be 
in much better shape to go more parallel.

If you haven't had a chance to do so yet, could you also go over the attached 
design doc and let us know if you have any questions/feedback/suggestions? That 
would certainly be useful in getting up to speed.

Thanks again for your interest! Much appreciated.

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2932) Add entry for preemptable status to scheduler web UI and queue initialize/refresh logging

2015-01-15 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2932:
-
Attachment: YARN-2932.v5.txt

Uploading patch v5. Should apply, now, to both branch-2 and trunk.

 Add entry for preemptable status to scheduler web UI and queue 
 initialize/refresh logging
 ---

 Key: YARN-2932
 URL: https://issues.apache.org/jira/browse/YARN-2932
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, 
 YARN-2932.v4.txt, YARN-2932.v5.txt


 YARN-2056 enables the ability to turn preemption on or off on a per-queue 
 level. This JIRA will provide the preemption status for each queue in the 
 {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue 
 refresh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle

2015-01-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279510#comment-14279510
 ] 

Sangjin Lee commented on YARN-3030:
---

Jotting down my initial thoughts (I wish one could create subtasks of subtasks):

We need to satisfy 3 use cases with this: *the current per-node aggregator, 
RM's aggregator, and the (future) per-app aggregator*.

The baseline idea is to create a *logical* per-app aggregator as a service 
(CompositeService). The per-node aggregator can then be thought of as a thin 
container/manager/router of per-app aggregators which will come and go with 
applications. This will give us ease of development and maximal isolation 
between apps. Furthermore, it would help us support the per-app aggregator 
more easily.

However, since we want to serve RM's aggregator as well, it makes sense to 
create an (abstract?) base aggregator service that is common between the RM (not 
app-specific) and per-app aggregators. The RM one could be a very thin 
extension of the base. The per-app aggregator would add app-based logic (mostly 
lifecycle management).

These are the pieces for this JIRA:
- set up this class hierarchy
- work out the timeline client API (both sync and async)
- implement the lifecycle of the base aggregator service
- implement the timeline client RPC server end (can be no-op for now)
- work out some batching-related logic

We would still need to work out the backing storage interface and serving 
reads, etc. but they are captured in other tickets.

Thoughts? Feedback?
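
As a rough sketch (names below are illustrative placeholders, not a proposed 
implementation), the hierarchy described above could look something like this on top 
of CompositeService:
{code}
import org.apache.hadoop.service.CompositeService;

/** Hypothetical base aggregator service shared by the RM-level and per-app aggregators. */
abstract class BaseAggregatorService extends CompositeService {
  BaseAggregatorService(String name) {
    super(name);
  }
  // Common lifecycle plumbing (client RPC/REST endpoint, batching) would live here.
}

/** Hypothetical per-app aggregator: adds app-scoped lifecycle logic on top of the base. */
class PerAppAggregatorService extends BaseAggregatorService {
  PerAppAggregatorService(String appId) {
    super("PerAppAggregatorService_" + appId);
  }
}

/** Hypothetical per-node aggregator: a thin container/manager/router of per-app aggregators. */
class PerNodeAggregatorService extends CompositeService {
  PerNodeAggregatorService() {
    super("PerNodeAggregatorService");
  }
  void appStarted(String appId) {
    addService(new PerAppAggregatorService(appId)); // per-app aggregators come and go with apps
  }
}
{code}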

 set up ATS writer with basic request serving structure and lifecycle
 

 Key: YARN-3030
 URL: https://issues.apache.org/jira/browse/YARN-3030
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 Per design in YARN-2928, create an ATS writer as a service, and implement the 
 basic service structure including the lifecycle management.
 Also, as part of this JIRA, we should come up with the ATS client API for 
 sending requests to this ATS writer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279506#comment-14279506
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~adhoot], Thanks for clarifying. That sounds good.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails because the B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.
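
 A minimal sketch of the proposed behaviour (class and method names below are 
 hypothetical, not the actual RM/DelegationTokenRenewer code):
 {code}
 import java.io.IOException;

 /** Illustrative only: try the renewal, but treat failure as "skip renewals", not "reject the app". */
 class TokenRenewalSketch {
   void handleTokenAtSubmission(Object token) {
     try {
       renew(token);           // fails when the remote realm does not trust the RM's principal
       scheduleRenewal(token); // only schedule automatic renewal if the first renewal worked
     } catch (IOException e) {
       // Old JobTracker-style behaviour: log and skip scheduling further renewals,
       // rather than bubbling the error back to the client and failing app submission.
       System.err.println("Token renewal failed, skipping automatic renewal: " + e);
     }
   }
   void renew(Object token) throws IOException { /* placeholder */ }
   void scheduleRenewal(Object token) { /* placeholder */ }
 }
 {code}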



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage

2015-01-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279822#comment-14279822
 ] 

Karthik Kambatla commented on YARN-2984:


Yes. 2.6.x is the last Java 6 release. 2.7.x is all Java 7, dropping Java 6. 

 Metrics for container's actual memory usage
 ---

 Key: YARN-2984
 URL: https://issues.apache.org/jira/browse/YARN-2984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2984-1.patch, yarn-2984-2.patch, yarn-2984-3.patch, 
 yarn-2984-prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track memory usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping

2015-01-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279890#comment-14279890
 ] 

Varun Saxena commented on YARN-3003:


[~leftnoteasy], the approach sounds good. 

 Provide API for client to retrieve label to node mapping
 

 Key: YARN-3003
 URL: https://issues.apache.org/jira/browse/YARN-3003
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena

 Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to set 
 of labels associated with the node.
 A client (such as Slider) may be interested in the label-to-node mapping: given a 
 label, return the nodes with this label.
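
 One possible shape for such an API (a sketch only, not a committed signature):
 {code}
 import java.util.Map;
 import java.util.Set;
 import org.apache.hadoop.yarn.api.records.NodeId;

 /** Illustrative inverse of getNodeToLabels(): for each requested label, the nodes carrying it. */
 interface LabelToNodesQuery {
   Map<String, Set<NodeId>> getLabelsToNodes(Set<String> labels);
 }
 {code}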



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279916#comment-14279916
 ] 

Zhijie Shen commented on YARN-2928:
---

bq. I suppose the reason the client-side API resides in yarn-api and 
yarn-common rather than yarn-client is to accommodate RM's use of ATS?

Right, this is because we want to prevent a cyclic dependency issue (RM -> ATS -> 
server-tests -> RM). Another issue is that TimelineDelegationToken#renewer inside the 
common module uses the timeline client too. YARN-2506 is investigating the 
solution to correct the packaging.

bq.  but we need to make a decision on where we will put the client and common 
pieces.

IMHO, common code goes to hadoop-yarn-common (or hadoop-yarn-api if it's API 
related). If we can prevent the cyclic dependency, the client code is best placed in 
hadoop-yarn-client.

bq. My suggestion would be to use

The package naming looks good. However, there are already some conventions. For 
example, all client libs are under {{org.apache.hadoop.yarn.client.api}}. It 
may be better to keep to it. As to the common code, I saw the style in 
hadoop-yarn-common is {{org.apache.hadoop.yarn.\[feature name\]}}. Finally, the 
server code doesn't have server in the package name. It may be organized 
like {{org.apache.hadoop.yarn.timelineservice.aggregator.*}}.


 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk

2015-01-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279933#comment-14279933
 ] 

Junping Du commented on YARN-3064:
--

v2 patch LGTM. +1. Will commit it shortly.

 TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with 
 allocation timeout in trunk
 ---

 Key: YARN-3064
 URL: https://issues.apache.org/jira/browse/YARN-3064
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Wangda Tan
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3064.1.patch, YARN-3064.2.patch


 Noticed consistent tests failure, see:
 https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/
 Logs like:
 {code}
 Error Message
 Attempt state is not correct (timedout) expected:<ALLOCATED> but 
 was:<SCHEDULED>
 Stacktrace
 java.lang.AssertionError: Attempt state is not correct (timedout) 
 expected:<ALLOCATED> but was:<SCHEDULED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794)
 {code}
 I can reproduce it in local environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2015-01-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279210#comment-14279210
 ] 

Hadoop QA commented on YARN-2933:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692557/YARN-2933-9.patch
  against trunk revision ce29074.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6342//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6342//console

This message is automatically generated.

 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, 
 YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, 
 YARN-2933-8.patch, YARN-2933-9.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 is targeting preemption that respects node labels, but we have 
 some gaps in the code base; for example, queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking about carefully.
 For now, what we can do immediately is calculate the ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid a 
 regression like the following: a cluster has some nodes with labels and some without; 
 assume queueA isn't satisfied for resources without labels, but for now the preemption 
 policy may preempt resources from nodes with labels for queueA, which is not 
 correct.
 Again, it is just a short-term enhancement; YARN-2498 will consider 
 preemption respecting node labels for the Capacity Scheduler, which is our final 
 target.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279655#comment-14279655
 ] 

Sangjin Lee commented on YARN-2928:
---

One observation on the code organization. The existing ATS code is actually 
spread out in several places:
- entities, etc. API: {{org.apache.hadoop.yarn.api.records.timeline.\*}} at 
hadoop-yarn-api
- TimelineClient API: {{org.apache.hadoop.yarn.client.api.\*}} at 
hadoop-yarn-common
- server: {{org.apache.hadoop.yarn.server.timeline.\*}} at 
hadoop-yarn-server-applicationhistoryservice

I suppose the reason the client-side API resides in yarn-api and yarn-common 
rather than yarn-client is to accommodate RM's use of ATS?

How should we organize new code? We settled the question on the server piece 
(hadoop-yarn-server-timelineservice), but we need to make a decision on where 
we will put the client and common pieces.

Also, we may want to organize the package names to be coherent. My suggestion 
would be to use
{noformat}
org.apache.hadoop.yarn.[common|client|server].timelineservice.detailed_subfeature
{noformat}

For example, the timeline aggregator would go to 
{{org.apache.hadoop.yarn.server.timelineservice.aggregator.\*}}. The timeline 
client API would go to {{org.apache.hadoop.yarn.client.timelineservice.api.\*}}.

What is the best practice in terms of package naming?

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle

2015-01-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279702#comment-14279702
 ] 

Sangjin Lee commented on YARN-3030:
---

You're right. We need the object model to be able to hash out the client API 
fully. Here I was suggesting putting a skeleton in (mostly empty classes for 
the object model). We'll have to come back to it as we work on the object model.

I'm OK with going with REST for now. We'll need to get quick consensus on the 
code/package organization however (see 
https://issues.apache.org/jira/browse/YARN-2928?focusedCommentId=14279655&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14279655).

 set up ATS writer with basic request serving structure and lifecycle
 

 Key: YARN-3030
 URL: https://issues.apache.org/jira/browse/YARN-3030
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 Per design in YARN-2928, create an ATS writer as a service, and implement the 
 basic service structure including the lifecycle management.
 Also, as part of this JIRA, we should come up with the ATS client API for 
 sending requests to this ATS writer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-1418) Add Tracing to YARN

2015-01-15 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu reassigned YARN-1418:


Assignee: Yi Liu

 Add Tracing to YARN
 ---

 Key: YARN-1418
 URL: https://issues.apache.org/jira/browse/YARN-1418
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, nodemanager, resourcemanager
Reporter: Masatake Iwasaki
Assignee: Yi Liu

 Adding tracing using HTrace in the same way as HBASE-6449 and HDFS-5274.
 The most part of changes needed for basis such as RPC seems to be almost 
 ready in HDFS-5274.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2467) Add SpanReceiverHost to YARN daemons

2015-01-15 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu reassigned YARN-2467:


Assignee: Yi Liu

 Add SpanReceiverHost to YARN daemons 
 -

 Key: YARN-2467
 URL: https://issues.apache.org/jira/browse/YARN-2467
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, nodemanager, resourcemanager
Reporter: Masatake Iwasaki
Assignee: Yi Liu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.

2015-01-15 Thread Matteo Mazzucchelli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Mazzucchelli updated YARN-2664:
--
Attachment: YARN-2664.8.patch

The submitted patch adds two new features:
- a switch to change the resource shown in the graph (memory, cpu)
- a query parameter to get data for a specific queue
(http://url/cluster/planner/queue_name)


 Improve RM webapp to expose info about reservations.
 

 Key: YARN-2664
 URL: https://issues.apache.org/jira/browse/YARN-2664
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Matteo Mazzucchelli
 Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, 
 YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, 
 YARN-2664.6.patch, YARN-2664.7.patch, YARN-2664.8.patch, YARN-2664.patch, 
 legal.patch, screenshot_reservation_UI.pdf


 YARN-1051 provides new functionality in the RM to ask for reservations on 
 resources. Exposing this through the webapp GUI is important.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278538#comment-14278538
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #74 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/74/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278541#comment-14278541
 ] 

Hudson commented on YARN-2217:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #74 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/74/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java


 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, 
 YARN-2217-trunk-v9.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278539#comment-14278539
 ] 

Hudson commented on YARN-2807:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #74 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/74/])
YARN-2807. Option --forceactive not works as described in usage of (xgong: 
rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm
* hadoop-yarn-project/CHANGES.txt


 Option --forceactive not works as described in usage of yarn rmadmin 
 -transitionToActive
 

 Key: YARN-2807
 URL: https://issues.apache.org/jira/browse/YARN-2807
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation, resourcemanager
Reporter: Wangda Tan
Assignee: Masatake Iwasaki
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, 
 YARN-2807.4.patch


 Currently the help message of yarn rmadmin -transitionToActive is:
 {code}
 transitionToActive: incorrect number of arguments
 Usage: HAAdmin [-transitionToActive <serviceId> [--forceactive]]
 {code}
 But --forceactive does not work as expected. When transitioning the RM state with 
 --forceactive:
 {code}
 yarn rmadmin -transitionToActive rm2 --forceactive
 Automatic failover is enabled for 
 org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e
 Refusing to manually manage HA state, since it may cause
 a split-brain scenario or other incorrect state.
 If you are very sure you know what you are doing, please
 specify the forcemanual flag.
 {code}
 As shown above, we still cannot transitionToActive when automatic failover is 
 enabled with --forceactive.
 The option that works is {{--forcemanual}}, but there's no place in the usage 
 that describes this option. I think we should fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278549#comment-14278549
 ] 

Hudson commented on YARN-2217:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #808 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/808/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java


 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, 
 YARN-2217-trunk-v9.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278546#comment-14278546
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #808 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/808/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278547#comment-14278547
 ] 

Hudson commented on YARN-2807:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #808 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/808/])
YARN-2807. Option --forceactive not works as described in usage of (xgong: 
rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm


 Option --forceactive not works as described in usage of yarn rmadmin 
 -transitionToActive
 

 Key: YARN-2807
 URL: https://issues.apache.org/jira/browse/YARN-2807
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation, resourcemanager
Reporter: Wangda Tan
Assignee: Masatake Iwasaki
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, 
 YARN-2807.4.patch


 Currently the help message of yarn rmadmin -transitionToActive is:
 {code}
 transitionToActive: incorrect number of arguments
 Usage: HAAdmin [-transitionToActive <serviceId> [--forceactive]]
 {code}
 But --forceactive does not work as expected. When transitioning the RM state with 
 --forceactive:
 {code}
 yarn rmadmin -transitionToActive rm2 --forceactive
 Automatic failover is enabled for 
 org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e
 Refusing to manually manage HA state, since it may cause
 a split-brain scenario or other incorrect state.
 If you are very sure you know what you are doing, please
 specify the forcemanual flag.
 {code}
 As shown above, we still cannot transitionToActive when automatic failover is 
 enabled with --forceactive.
 The option that works is {{--forcemanual}}, but there's no place in the usage 
 that describes this option. I think we should fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278461#comment-14278461
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6864 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6864/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278462#comment-14278462
 ] 

Hudson commented on YARN-2217:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6864 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6864/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java


 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, 
 YARN-2217-trunk-v9.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk

2015-01-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278436#comment-14278436
 ] 

Junping Du commented on YARN-3064:
--

Patch looks good to me. However, for the failures in TestAMRestart, I saw we set 
the YarnConfiguration.RECOVERY_ENABLED configuration in some test cases. Maybe 
we should apply the same change there?
{code}
conf.setBoolean(YarnConfiguration.RECOVERY_ENABLED, true);
{code}

 TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with 
 allocation timeout in trunk
 ---

 Key: YARN-3064
 URL: https://issues.apache.org/jira/browse/YARN-3064
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Wangda Tan
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3064.1.patch


 Noticed consistent tests failure, see:
 https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/
 Logs like:
 {code}
 Error Message
 Attempt state is not correct (timedout) expected:<ALLOCATED> but 
 was:<SCHEDULED>
 Stacktrace
 java.lang.AssertionError: Attempt state is not correct (timedout) 
 expected:<ALLOCATED> but was:<SCHEDULED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794)
 {code}
 I can reproduce it in local environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.

2015-01-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278601#comment-14278601
 ] 

Hadoop QA commented on YARN-2664:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692499/YARN-2664.8.patch
  against trunk revision ba5116e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 5 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6340//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6340//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6340//console

This message is automatically generated.

 Improve RM webapp to expose info about reservations.
 

 Key: YARN-2664
 URL: https://issues.apache.org/jira/browse/YARN-2664
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Matteo Mazzucchelli
 Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, 
 YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, 
 YARN-2664.6.patch, YARN-2664.7.patch, YARN-2664.8.patch, YARN-2664.patch, 
 legal.patch, screenshot_reservation_UI.pdf


 YARN-1051 provides new functionality in the RM to ask for reservations on 
 resources. Exposing this through the webapp GUI is important.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278637#comment-14278637
 ] 

Hudson commented on YARN-3005:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6866 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6866/])
YARN-3005. [JDK7] Use switch statement for String instead of if-else statement 
in RegistrySecurity.java (Contributed by Kengo Seki) (aajisaka: rev 
533e551eb42af188535aeb0ab35f8ebf150a0da1)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/client/impl/zk/RegistrySecurity.java


 [JDK7] Use switch statement for String instead of if-else statement in 
 RegistrySecurity.java
 

 Key: YARN-3005
 URL: https://issues.apache.org/jira/browse/YARN-3005
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Priority: Trivial
  Labels: newbie
 Fix For: 2.7.0

 Attachments: YARN-3005.001.patch, YARN-3005.002.patch


 Since we have moved to JDK7, we can refactor the below if-else statement for 
 String.
 {code}
 // TODO JDK7 SWITCH
 if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) {
   access = AccessPolicy.sasl;
 } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) {
   access = AccessPolicy.digest;
 } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) {
   access = AccessPolicy.anon;
 } else {
   throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
   + "\"" + auth + "\"");
 }
 {code}
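
 For reference, a sketch of how the same block could read as a JDK7 String switch 
 (assuming the same constants and AccessPolicy values as above; not the committed patch):
 {code}
 switch (auth) {
   case REGISTRY_CLIENT_AUTH_KERBEROS:
     access = AccessPolicy.sasl;
     break;
   case REGISTRY_CLIENT_AUTH_DIGEST:
     access = AccessPolicy.digest;
     break;
   case REGISTRY_CLIENT_AUTH_ANONYMOUS:
     access = AccessPolicy.anon;
     break;
   default:
     throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
         + "\"" + auth + "\"");
 }
 {code}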



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java

2015-01-15 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278632#comment-14278632
 ] 

Akira AJISAKA commented on YARN-3005:
-

Can anyone assign [~sekikn] to this issue? Now I don't have the permission to 
do this.

 [JDK7] Use switch statement for String instead of if-else statement in 
 RegistrySecurity.java
 

 Key: YARN-3005
 URL: https://issues.apache.org/jira/browse/YARN-3005
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Priority: Trivial
  Labels: newbie
 Fix For: 2.7.0

 Attachments: YARN-3005.001.patch, YARN-3005.002.patch


 Since we have moved to JDK7, we can refactor the below if-else statement for 
 String.
 {code}
 // TODO JDK7 SWITCH
 if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) {
   access = AccessPolicy.sasl;
 } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) {
   access = AccessPolicy.digest;
 } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) {
   access = AccessPolicy.anon;
 } else {
   throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
   + "\"" + auth + "\"");
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java

2015-01-15 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278627#comment-14278627
 ] 

Akira AJISAKA commented on YARN-3005:
-

LGTM, +1. The patch is just to refactor the code, so new tests are not needed.

 [JDK7] Use switch statement for String instead of if-else statement in 
 RegistrySecurity.java
 

 Key: YARN-3005
 URL: https://issues.apache.org/jira/browse/YARN-3005
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Priority: Trivial
  Labels: newbie
 Attachments: YARN-3005.001.patch, YARN-3005.002.patch


 Since we have moved to JDK7, we can refactor the below if-else statement for 
 String.
 {code}
 // TODO JDK7 SWITCH
 if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) {
   access = AccessPolicy.sasl;
 } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) {
   access = AccessPolicy.digest;
 } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) {
   access = AccessPolicy.anon;
 } else {
   throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
   + "\"" + auth + "\"");
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2419) RM applications page doesn't sort application id properly

2015-01-15 Thread Andrew Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278696#comment-14278696
 ] 

Andrew Johnson commented on YARN-2419:
--

I am encountering this same problem.  Is there a fix in the works?

 RM applications page doesn't sort application id properly
 -

 Key: YARN-2419
 URL: https://issues.apache.org/jira/browse/YARN-2419
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Thomas Graves

 The ResourceManager apps page doesn't sort the application ids properly when 
 the app id rolls over from  to 1.
 When it rolls over the 1+ application ids end up being many pages down by 
 the 0XXX numbers.
 I assume we just sort alphabetically so we would need a special sorter that 
 knows about application ids.
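
 A sketch of the kind of application-id-aware sorter this would need (assuming the 
 standard application_<clusterTimestamp>_<sequence> id format; illustrative only):
 {code}
 import java.util.Comparator;

 /** Compares IDs like application_1420000000000_10000 numerically instead of alphabetically. */
 class AppIdComparator implements Comparator<String> {
   @Override
   public int compare(String a, String b) {
     String[] pa = a.split("_");
     String[] pb = b.split("_");
     int byCluster = Long.compare(Long.parseLong(pa[1]), Long.parseLong(pb[1]));
     return byCluster != 0 ? byCluster
         : Long.compare(Long.parseLong(pa[2]), Long.parseLong(pb[2]));
   }
 }
 {code}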



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed

2015-01-15 Thread Dmitry Sivachenko (JIRA)
Dmitry Sivachenko created YARN-3066:
---

 Summary: Hadoop leaves orphaned tasks running after job is killed
 Key: YARN-3066
 URL: https://issues.apache.org/jira/browse/YARN-3066
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
 Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1
Reporter: Dmitry Sivachenko


When spawning a user task, the node manager checks for the setsid(1) utility and spawns 
the task program via it. See 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
 for instance:

String exec = Shell.isSetsidAvailable? "exec setsid " : "exec ";

FreeBSD, unlike Linux, does not have the setsid(1) utility, so plain exec is 
used to spawn the user task. If that task spawns other external programs (a 
common case if the task program is a shell script) and the user kills the job via mapred 
job -kill Job, these child processes remain running.

1) Why do you silently ignore the absence of setsid(1) and spawn the task process 
via plain exec? This is a guarantee of orphaned processes when a job is 
prematurely killed.
2) FreeBSD has a replacement third-party program called ssid (which does almost 
the same as Linux's setsid). It would be nice to detect which binary is 
present during the configure stage and put a @SETSID@ macro into the java file to use 
the correct name.

I propose to make the Shell.isSetsidAvailable test stricter and fail to start if 
it is not found: at least we will know about the problem at startup rather than 
guess why there are orphaned tasks running forever.
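
A minimal sketch of the kind of stricter startup check being proposed (the binary 
names and the fail-fast behaviour are illustrative only, not the actual Shell code):
{code}
import java.io.IOException;

/** Illustrative stand-alone check; not the actual Shell.isSetsidAvailable implementation. */
class SetsidCheck {
  /** Returns the first working session-leader wrapper ("setsid" on Linux, "ssid" on FreeBSD). */
  static String findSetsid() {
    for (String candidate : new String[] {"setsid", "ssid"}) {
      try {
        Process p = new ProcessBuilder(candidate, "true").start();
        if (p.waitFor() == 0) {
          return candidate;
        }
      } catch (IOException | InterruptedException ignored) {
        // candidate not present or not runnable; try the next one
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String setsid = findSetsid();
    if (setsid == null) {
      // Proposed behaviour: fail fast instead of silently spawning tasks without a new session.
      throw new IllegalStateException("No setsid/ssid found; refusing to start");
    }
    System.out.println("Using session wrapper: " + setsid);
  }
}
{code}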



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed

2015-01-15 Thread Dmitry Sivachenko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279287#comment-14279287
 ] 

Dmitry Sivachenko commented on YARN-3066:
-

Windows case is tested separately, see private static boolean 
isSetsidSupported() in
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java

for instance:

if (Shell.WINDOWS) {
  return false;
}

In any UNIX-like case I suppose it will leave orphaned processes, because if 
isSetsidSupported()==false it uses kill(pid) to kill the task instead of kill(pgid) 
to kill the whole process group.

ssid(1) in FreeBSD is the analog of setsid(1) in Linux: a userland wrapper for 
the setsid() system call.

Renaming does not sound like a sane idea, because it is hard to convince everyone 
to rename installed binaries by hand.

I propose to treat it as a system-dependent option and act accordingly.

(I suppose other OSes like Solaris also lack the setsid(1) utility, so they could 
also benefit.)

For the ssid source see http://tools.suckless.org/ssid/

As for backwards compatibility, we can change that in 3.0; it is not fatal. 
Failing to start without setsid will just remind users to install setsid or 
ssid and proceed further, and be sure that there are no side effects like 
orphaned tasks eating CPU.

 Hadoop leaves orphaned tasks running after job is killed
 

 Key: YARN-3066
 URL: https://issues.apache.org/jira/browse/YARN-3066
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
 Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1
Reporter: Dmitry Sivachenko

 When spawning a user task, the node manager checks for the setsid(1) utility and spawns 
 the task program via it. See 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
  for instance:
 String exec = Shell.isSetsidAvailable? "exec setsid " : "exec ";
 FreeBSD, unlike Linux, does not have the setsid(1) utility, so plain exec is 
 used to spawn the user task. If that task spawns other external programs (a 
 common case if the task program is a shell script) and the user kills the job via 
 mapred job -kill Job, these child processes remain running.
 1) Why do you silently ignore the absence of setsid(1) and spawn the task process 
 via plain exec? This is a guarantee of orphaned processes when a job is 
 prematurely killed.
 2) FreeBSD has a replacement third-party program called ssid (which does 
 almost the same as Linux's setsid). It would be nice to detect which binary 
 is present during the configure stage and put a @SETSID@ macro into the java file to 
 use the correct name.
 I propose to make the Shell.isSetsidAvailable test stricter and fail to start 
 if it is not found: at least we will know about the problem at startup rather 
 than guess why there are orphaned tasks running forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278720#comment-14278720
 ] 

Hudson commented on YARN-2807:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2006 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2006/])
YARN-2807. Option --forceactive not works as described in usage of (xgong: 
rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java


 Option --forceactive not works as described in usage of yarn rmadmin 
 -transitionToActive
 

 Key: YARN-2807
 URL: https://issues.apache.org/jira/browse/YARN-2807
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation, resourcemanager
Reporter: Wangda Tan
Assignee: Masatake Iwasaki
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, 
 YARN-2807.4.patch


 Currently the help message of yarn rmadmin -transitionToActive is:
 {code}
 transitionToActive: incorrect number of arguments
 Usage: HAAdmin [-transitionToActive <serviceId> [--forceactive]]
 {code}
 But --forceactive does not work as expected. When transitioning the RM state with 
 --forceactive:
 {code}
 yarn rmadmin -transitionToActive rm2 --forceactive
 Automatic failover is enabled for 
 org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e
 Refusing to manually manage HA state, since it may cause
 a split-brain scenario or other incorrect state.
 If you are very sure you know what you are doing, please
 specify the forcemanual flag.
 {code}
 As shown above, we still cannot transitionToActive when automatic failover is 
 enabled, even with --forceactive.
 The option that actually works is {{--forcemanual}}, but no place in the usage 
 message describes this option. I think we should fix this.
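 For reference, the flag the error message actually asks for is 
 {{--forcemanual}}, e.g. {{yarn rmadmin -transitionToActive rm2 --forcemanual}} 
 (the exact flag placement is an assumption here and may vary by version).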



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278731#comment-14278731
 ] 

Hudson commented on YARN-2217:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #71 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/71/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java


 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, 
 YARN-2217-trunk-v9.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278722#comment-14278722
 ] 

Hudson commented on YARN-2217:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2006 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2006/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java


 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, 
 YARN-2217-trunk-v9.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278719#comment-14278719
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2006 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2006/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278729#comment-14278729
 ] 

Hudson commented on YARN-2807:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #71 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/71/])
YARN-2807. Option --forceactive not works as described in usage of (xgong: 
rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* hadoop-yarn-project/CHANGES.txt


 Option --forceactive not works as described in usage of yarn rmadmin 
 -transitionToActive
 

 Key: YARN-2807
 URL: https://issues.apache.org/jira/browse/YARN-2807
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation, resourcemanager
Reporter: Wangda Tan
Assignee: Masatake Iwasaki
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, 
 YARN-2807.4.patch


 Currently the help message of yarn rmadmin -transitionToActive is:
 {code}
 transitionToActive: incorrect number of arguments
 Usage: HAAdmin [-transitionToActive serviceId [--forceactive]]
 {code}
 But --forceactive does not work as expected. When transitioning the RM state 
 with --forceactive:
 {code}
 yarn rmadmin -transitionToActive rm2 --forceactive
 Automatic failover is enabled for 
 org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e
 Refusing to manually manage HA state, since it may cause
 a split-brain scenario or other incorrect state.
 If you are very sure you know what you are doing, please
 specify the forcemanual flag.
 {code}
 As shown above, we still cannot transitionToActive when automatic failover is 
 enabled, even with --forceactive.
 The option that actually works is {{--forcemanual}}, but no place in the usage 
 message describes this option. I think we should fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2015-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278728#comment-14278728
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #71 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/71/])
YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via 
kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle

2015-01-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279611#comment-14279611
 ] 

Sangjin Lee commented on YARN-3030:
---

Also, the current ATS timeline client API is based on REST. Do we want to use 
REST similarly, or do we want to consider using RPC? The standard pros and cons 
apply here: REST would be a bit better for arbitrary off-cluster clients, would 
minimize code coupling between client and server, and would make things more 
symmetric between reads and writes. On the other hand, RPC would provide more 
flexibility in terms of operations.

Thoughts?

 set up ATS writer with basic request serving structure and lifecycle
 

 Key: YARN-3030
 URL: https://issues.apache.org/jira/browse/YARN-3030
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 Per design in YARN-2928, create an ATS writer as a service, and implement the 
 basic service structure including the lifecycle management.
 Also, as part of this JIRA, we should come up with the ATS client API for 
 sending requests to this ATS writer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number

2015-01-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279664#comment-14279664
 ] 

Zhijie Shen commented on YARN-3009:
---

[~Naganarasimha], thanks for the patch. I think your workaround is going to 
mitigate the problem. However, my concern is whether we should do this 
workaround instead of doing it correctly. While I understand it is 
counter-intuitive to use (double) quotes to enforce the value as a string, I'm 
afraid the *atoi* or *atof* of the jackson parser is probably doing the right 
thing. A string that starts with a numeric char but contains non-numeric chars 
could still be a valid number, for example {{123456D}} or {{123.45E+6}}. On the 
other hand, we could also consider such values as strings, e.g., representing an ID.
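
A purely illustrative example of the lenient prefix parsing being discussed 
(it uses java.text.NumberFormat for simplicity, not the timeline server's 
actual jackson-based parsing):
{code}
// Illustration only: a lenient numeric parse consumes just the leading digits
// of a value such as "7CCA...", which is how a filter value can end up as the
// Number 7 instead of the intended string.
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public final class LenientParseDemo {
  public static void main(String[] args) throws ParseException {
    NumberFormat nf = NumberFormat.getInstance(Locale.US);
    System.out.println(nf.parse("7CCA1234"));  // prints 7
    System.out.println(nf.parse("123.45E+6")); // prints 123.45 (NumberFormat stops
                                               // at 'E'; jackson may well accept the
                                               // whole token as a number)
  }
}
{code}
Quoting the value, as mentioned above, forces it to be treated as a string, 
which is the counter-intuitive workaround under discussion.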


 TimelineWebServices always parses primary and secondary filters as numbers if 
 first char is a number
 

 Key: YARN-3009
 URL: https://issues.apache.org/jira/browse/YARN-3009
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Chris K Wensel
Assignee: Naganarasimha G R
 Attachments: YARN-3009.20150108-1.patch, YARN-3009.20150111-1.patch


 If you pass a filter value that starts with a number (7CCA...), the filter 
 value will be parsed into the Number '7', causing the filter to fail the 
 search.
 It should be noted that the actual value, as stored via a PUT operation, is 
 properly parsed and stored as a String.
 This manifests as a very hard-to-identify issue with DAGClient in Apache Tez 
 when naming dags/vertices with alphanumeric guid values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle

2015-01-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279680#comment-14279680
 ] 

Zhijie Shen commented on YARN-3030:
---

bq. work out the timeline client API (both sync and async)

I'm wondering if we have to finalize the data model first, such as entities, 
events and metrics, because the APIs are going to operate on these, right?

bq. Do we want to use REST similarly, or do we want to consider using RPC?

I suggest going with REST for now, as we can easily reuse the existing REST 
communication stack, but we can isolate the client/server interface from the 
underlying communication layer. In the future, if we want to take advantage of 
the operational flexibility of RPC, we can implement the interface with protos 
and replace the REST one.
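
A hypothetical sketch of that isolation (the names and payload shape are made 
up, not the actual YARN API): callers depend on a small interface, and the REST 
transport is just one implementation that could later be swapped for an 
RPC/protobuf one without touching callers.
{code}
// Hypothetical client/server interface, decoupled from the transport.
public interface TimelineWriteChannel {
  /** Sends one serialized entity to the ATS writer; the payload shape is open
   *  until the data model (entities, events, metrics) is finalized. */
  void putEntity(String serializedEntity) throws java.io.IOException;
}

// One possible REST-backed implementation (sketch only).
final class RestTimelineWriteChannel implements TimelineWriteChannel {
  private final java.net.URI writerUri;

  RestTimelineWriteChannel(java.net.URI writerUri) {
    this.writerUri = writerUri;
  }

  @Override
  public void putEntity(String serializedEntity) throws java.io.IOException {
    java.net.HttpURLConnection conn =
        (java.net.HttpURLConnection) writerUri.toURL().openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (java.io.OutputStream out = conn.getOutputStream()) {
      out.write(serializedEntity.getBytes(java.nio.charset.StandardCharsets.UTF_8));
    }
    if (conn.getResponseCode() >= 300) {
      throw new java.io.IOException("PUT failed: HTTP " + conn.getResponseCode());
    }
  }
}
{code}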

 set up ATS writer with basic request serving structure and lifecycle
 

 Key: YARN-3030
 URL: https://issues.apache.org/jira/browse/YARN-3030
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 Per design in YARN-2928, create an ATS writer as a service, and implement the 
 basic service structure including the lifecycle management.
 Also, as part of this JIRA, we should come up with the ATS client API for 
 sending requests to this ATS writer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)