[jira] [Updated] (YARN-3764) CapacityScheduler forbid of moving LeafQueue from one parent to another

2015-06-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3764:
-
Description: 
Currently CapacityScheduler doesn't handle this case well. For example, given
a queue structure:
{code}
      root
        |
     a (100)
      /    \
     x      y
   (50)    (50)
{code}

And reinitializing with the following structure:
{code}
       root
       /  \
   a (50)  x (50)
   |
   y
 (100)
{code}

The actual queue structure after reinitialization is:
{code}
       root
       /  \
   a (50)  x (50)
    /  \
   x    y
 (50)  (100)
{code}

We should forbid the admin from doing that.
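
A sketch of the kind of check that could enforce this during reinitialization; the method below is a hypothetical fragment meant to live in the queue re-parsing path (not the actual patch), and CSQueue#getParent()/getQueuePath() are the only real APIs it relies on:
{code}
// Hypothetical validation run while re-parsing queues in
// CapacityScheduler.reinitialize(): reject any existing queue whose parent
// path changed in the new configuration.
private void validateParentsUnchanged(Map<String, CSQueue> existingQueues,
    Map<String, CSQueue> newQueues) throws IOException {
  for (Map.Entry<String, CSQueue> e : existingQueues.entrySet()) {
    CSQueue newQueue = newQueues.get(e.getKey());
    if (newQueue == null) {
      continue; // queue removal is a separate problem
    }
    CSQueue oldParent = e.getValue().getParent();
    CSQueue newParent = newQueue.getParent();
    String oldPath = oldParent == null ? null : oldParent.getQueuePath();
    String newPath = newParent == null ? null : newParent.getQueuePath();
    if (oldPath != null && !oldPath.equals(newPath)) {
      throw new IOException("Moving queue " + e.getKey() + " from "
          + oldPath + " to " + newPath + " is not supported");
    }
  }
}
{code}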


  was:
Currently CapacityScheduler doesn't handle this case well. For example, given
a queue structure:
{code}
      root
        |
     a (100)
      /    \
     x      y
   (50)    (50)
{code}

And reinitializing with the following structure:
{code}
       root
       /  \
   a (50)  x (50)
   |
   y
 (100)
{code}

The actual queue structure after reinitialization is:
{code}
       root
       /  \
   a (50)  x (50)
    /  \
   x    y
 (50)  (100)
{code}

We should handle this case better.



 CapacityScheduler forbid of moving LeafQueue from one parent to another
 ---

 Key: YARN-3764
 URL: https://issues.apache.org/jira/browse/YARN-3764
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker

 Currently CapacityScheduler doesn't handle this case well. For example, given 
 a queue structure:
 {code}
       root
         |
      a (100)
       /    \
      x      y
    (50)    (50)
 {code}
 And reinitializing with the following structure:
 {code}
        root
        /  \
    a (50)  x (50)
    |
    y
  (100)
 {code}
 The actual queue structure after reinitialization is:
 {code}
        root
        /  \
    a (50)  x (50)
     /  \
    x    y
  (50)  (100)
 {code}
 We should forbid the admin from doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571500#comment-14571500
 ] 

Sunil G commented on YARN-3751:
---

Thank you [~zjshen] for committing the patch!

 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.
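
 The committed fix checks whether used resources are null before reading them; a minimal illustrative sketch (the helper class and method names are made up, not the actual AppInfo members):
 {code}
 import org.apache.hadoop.yarn.api.records.ApplicationReport;
 import org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport;
 import org.apache.hadoop.yarn.api.records.Resource;

 // Illustrative helper: used resources may be absent when the report comes
 // from the timeline store instead of a running RM, so guard against null.
 public final class UsedResourceGuard {
   public static int allocatedMb(ApplicationReport report) {
     ApplicationResourceUsageReport usage =
         report.getApplicationResourceUsageReport();
     Resource used = (usage == null) ? null : usage.getUsedResources();
     return (used == null) ? 0 : used.getMemory();
   }
 }
 {code}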



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-06-03 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571607#comment-14571607
 ] 

Vrushali C commented on YARN-3411:
--

After evaluating both approaches to the backend storage implementation in terms of 
performance, scalability, usability, and maintenance, as given by YARN-3134 
(Phoenix-based HBase schema) and YARN-3411 (hybrid HBase schema: vanilla 
HBase tables in the direct write path and Phoenix-based tables for reporting), 
the conclusion is to use vanilla HBase tables in the direct write path.
Attached to YARN-2928 is a write-up that describes how we ended up choosing the 
approach of writing to vanilla HBase tables (YARN-3411) in the direct write 
path.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Fix For: YARN-2928

 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.
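
 For context, a minimal sketch of what a write through the native HBase client API looks like; the table name, row key, and column below are placeholders, not the schema this JIRA proposes:
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.TableName;
 import org.apache.hadoop.hbase.client.Connection;
 import org.apache.hadoop.hbase.client.ConnectionFactory;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.client.Table;
 import org.apache.hadoop.hbase.util.Bytes;

 public class NativeHBaseWriteSketch {
   public static void main(String[] args) throws Exception {
     Configuration conf = HBaseConfiguration.create();
     try (Connection conn = ConnectionFactory.createConnection(conf);
          Table table = conn.getTable(TableName.valueOf("timeline_entity"))) {
       // Row key and column family/qualifier are made up for illustration.
       Put put = new Put(Bytes.toBytes("cluster!user!flow!app_0001"));
       put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("created_time"),
           Bytes.toBytes(1433376000000L));
       table.put(put);
     }
   }
 }
 {code}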



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-06-03 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571606#comment-14571606
 ] 

Vrushali C commented on YARN-3134:
--


After evaluating both approaches to the backend storage implementation in terms of 
performance, scalability, usability, and maintenance, as given by YARN-3134 
(Phoenix-based HBase schema) and YARN-3411 (hybrid HBase schema: vanilla 
HBase tables in the direct write path and Phoenix-based tables for reporting), 
the conclusion is to use vanilla HBase tables in the direct write path.

Attached to YARN-2928 is a write-up that describes how we ended up choosing the 
approach of writing to vanilla HBase tables (YARN-3411) in the direct write 
path.


 [Storage implementation] Exploiting the option of using Phoenix to access 
 HBase backend
 ---

 Key: YARN-3134
 URL: https://issues.apache.org/jira/browse/YARN-3134
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Li Lu
 Fix For: YARN-2928

 Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, 
 YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
 YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, 
 YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, 
 YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, 
 YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, 
 YARN-3134-YARN-2928.007.patch, YARN-3134DataSchema.pdf, 
 hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out


 Quote the introduction on Phoenix web page:
 {code}
 Apache Phoenix is a relational database layer over HBase delivered as a 
 client-embedded JDBC driver targeting low latency queries over HBase data. 
 Apache Phoenix takes your SQL query, compiles it into a series of HBase 
 scans, and orchestrates the running of those scans to produce regular JDBC 
 result sets. The table metadata is stored in an HBase table and versioned, 
 such that snapshot queries over prior versions will automatically use the 
 correct schema. Direct use of the HBase API, along with coprocessors and 
 custom filters, results in performance on the order of milliseconds for small 
 queries, or seconds for tens of millions of rows.
 {code}
 It may simplify our implementation's reads and writes of data from/to HBase, and 
 make it easy to build indexes and compose complex queries.
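
 As a concrete illustration of the client-embedded JDBC model described above, a hypothetical Phoenix sketch; the table, columns, and ZooKeeper quorum are placeholders and the table is assumed to already exist:
 {code}
 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.PreparedStatement;
 import java.sql.ResultSet;
 import java.sql.Statement;

 public class PhoenixSketch {
   public static void main(String[] args) throws Exception {
     // A Phoenix connection is plain JDBC; the URL points at the HBase
     // ZooKeeper quorum (placeholder host/port below).
     try (Connection conn =
              DriverManager.getConnection("jdbc:phoenix:zk-host:2181")) {
       PreparedStatement upsert = conn.prepareStatement(
           "UPSERT INTO TIMELINE_METRIC (ENTITY_ID, METRIC, VALUE) VALUES (?, ?, ?)");
       upsert.setString(1, "app_0001");
       upsert.setString(2, "cpu");
       upsert.setLong(3, 42L);
       upsert.executeUpdate();
       conn.commit();   // Phoenix connections do not auto-commit by default

       try (Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(
                "SELECT METRIC, VALUE FROM TIMELINE_METRIC WHERE ENTITY_ID = 'app_0001'")) {
         while (rs.next()) {
           System.out.println(rs.getString(1) + " = " + rs.getLong(2));
         }
       }
     }
   }
 }
 {code}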



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3765) Fix findbugs the warning in YARN-2928 branch, TimelineMetric

2015-06-03 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3765:

Description: 
There is one warning about reversing the return value of comparisons in 
YARN-2928 branch. This is a valid warning. Quoting the findbugs warning message:

RV_NEGATING_RESULT_OF_COMPARETO: Negating the result of compareTo()/compare()

This code negatives the return value of a compareTo or compare method. This is 
a questionable or bad programming practice, since if the return value is 
Integer.MIN_VALUE, negating the return value won't negate the sign of the 
result. You can achieve the same intended result by reversing the order of the 
operands rather than by negating the results.

  was:There is one warning about reversing the return value of comparisons in 
YARN-2928 branch. I believe this is a false alarm since we intentionally said 
the comparator is a reversed comparator. 


 Fix findbugs the warning in YARN-2928 branch, TimelineMetric
 

 Key: YARN-3765
 URL: https://issues.apache.org/jira/browse/YARN-3765
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3765-YARN-2928.001.patch


 There is one warning about reversing the return value of comparisons in 
 YARN-2928 branch. This is a valid warning. Quoting the findbugs warning 
 message:
 RV_NEGATING_RESULT_OF_COMPARETO: Negating the result of compareTo()/compare()
 This code negatives the return value of a compareTo or compare method. This 
 is a questionable or bad programming practice, since if the return value is 
 Integer.MIN_VALUE, negating the return value won't negate the sign of the 
 result. You can achieve the same intended result by reversing the order of 
 the operands rather than by negating the results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571480#comment-14571480
 ] 

Hadoop QA commented on YARN-3453:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 55s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 37s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 46s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 25s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  50m 14s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  88m 10s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737319/YARN-3453.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c59e745 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8179/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8179/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8179/console |


This message was automatically generated.

 Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator 
 even in DRF mode causing thrashing
 

 Key: YARN-3453
 URL: https://issues.apache.org/jira/browse/YARN-3453
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Ashwin Shankar
Assignee: Arun Suresh
 Attachments: YARN-3453.1.patch


 There are two places in the preemption code flow where DefaultResourceCalculator 
 is used, even in DRF mode.
 This basically results in more resources getting preempted than needed, and 
 those extra preempted containers aren’t even getting to the “starved” queue, 
 since the scheduling logic is based on DRF's calculator.
 Following are the two places:
 1. {code:title=FSLeafQueue.java|borderStyle=solid}
 private boolean isStarved(Resource share)
 {code}
 A queue shouldn’t be marked as “starved” if the dominant resource usage
 is >= fair/minshare.
 2. {code:title=FairScheduler.java|borderStyle=solid}
 protected Resource resToPreempt(FSLeafQueue sched, long curTime)
 {code}
 --
 One more thing that I believe needs to change in DRF mode: during a 
 preemption round, if preempting a few containers satisfies the needs of a 
 resource type, then we should exit that preemption round, since the 
 containers that we just preempted should bring the dominant resource usage to 
 min/fair share.
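
 To make the calculator difference concrete, a hedged illustration using the YARN resource utilities (the numbers are made up): DefaultResourceCalculator compares memory only, while DominantResourceCalculator compares the dominant share against the cluster total.
 {code}
 import org.apache.hadoop.yarn.api.records.Resource;
 import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
 import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
 import org.apache.hadoop.yarn.util.resource.Resources;

 public class CalculatorSketch {
   public static void main(String[] args) {
     Resource cluster   = Resource.newInstance(100 * 1024, 100);
     Resource usage     = Resource.newInstance(10 * 1024, 60);  // little memory, many vcores
     Resource fairShare = Resource.newInstance(50 * 1024, 50);

     // Memory-only comparison: 10G < 50G, so the queue looks starved even
     // though its dominant resource (vcores) is already above fair share.
     boolean starvedByDefault = Resources.lessThan(
         new DefaultResourceCalculator(), cluster, usage, fairShare);

     // Dominant-share comparison: 60% vcores vs 50% fair share, not starved.
     boolean starvedByDrf = Resources.lessThan(
         new DominantResourceCalculator(), cluster, usage, fairShare);

     System.out.println(starvedByDefault + " " + starvedByDrf);  // true false
   }
 }
 {code}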



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-06-03 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3044:

Attachment: YARN-3044-YARN-2928.010.patch

Hi [~zjshen],
Please find the attached rebased patch to incorporate YARN-1462

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
 YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
 YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
 YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
 YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings

2015-06-03 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571610#comment-14571610
 ] 

Li Lu commented on YARN-3276:
-

The -1 appears to be irrelevant to the fix in this patch. I can confirm we 
actually have one findbugs warning in TimelineMetric's reverse comparator. Will 
open a separate JIRA to do a quick fix. 

 Refactor and fix null casting in some map cast for TimelineEntity (old and 
 new) and fix findbug warnings
 

 Key: YARN-3276
 URL: https://issues.apache.org/jira/browse/YARN-3276
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3276-YARN-2928.v3.patch, 
 YARN-3276-YARN-2928.v4.patch, YARN-3276-YARN-2928.v5-fix-checkstyle.patch, 
 YARN-3276-YARN-2928.v5.patch, YARN-3276-YARN-2928.v6.patch, 
 YARN-3276-v2.patch, YARN-3276-v3.patch, YARN-3276.patch


 Per discussion in YARN-3087, we need to refactor some similar logic to cast 
 map to hashmap and get rid of NPE issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3765) Fix findbugs the warning in YARN-2928 branch, TimelineMetric

2015-06-03 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3765:

Attachment: YARN-3765-YARN-2928.001.patch

I looked into the warning message, and now I believe it's not a false alarm. 
Previously we were directly negating the comparison result, which may potentially hit 
integer overflow. A simple fix is to reverse the direction of the comparison. 
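
A minimal illustration of the flagged pattern and the fix (illustrative, not the exact TimelineMetric code):
{code}
import java.util.Comparator;
import java.util.TreeMap;

public class ReversedComparatorSketch {
  // Pattern findbugs flags: compareTo() may legally return any negative int,
  // including Integer.MIN_VALUE, and -Integer.MIN_VALUE overflows back to
  // Integer.MIN_VALUE, so the sign is not flipped.
  static final Comparator<Long> NEGATED = new Comparator<Long>() {
    @Override
    public int compare(Long l1, Long l2) {
      return -l1.compareTo(l2);   // RV_NEGATING_RESULT_OF_COMPARETO
    }
  };

  // Fix: swap the operands instead of negating the result.
  static final Comparator<Long> REVERSED = new Comparator<Long>() {
    @Override
    public int compare(Long l1, Long l2) {
      return l2.compareTo(l1);
    }
  };

  public static void main(String[] args) {
    TreeMap<Long, String> newestFirst = new TreeMap<Long, String>(REVERSED);
    newestFirst.put(1L, "old");
    newestFirst.put(2L, "new");
    System.out.println(newestFirst.firstKey());   // 2
  }
}
{code}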

 Fix findbugs the warning in YARN-2928 branch, TimelineMetric
 

 Key: YARN-3765
 URL: https://issues.apache.org/jira/browse/YARN-3765
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3765-YARN-2928.001.patch


 There is one warning about reversing the return value of comparisons in 
 YARN-2928 branch. I believe this is a false alarm since we intentionally said 
 the comparator is a reversed comparator. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings

2015-06-03 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3276:
--
Attachment: YARN-3276-YARN-2928.v6.patch

+1 for the last patch. Rebase it against the latest branch. Will commit it 
after jenkins comment.

 Refactor and fix null casting in some map cast for TimelineEntity (old and 
 new) and fix findbug warnings
 

 Key: YARN-3276
 URL: https://issues.apache.org/jira/browse/YARN-3276
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3276-YARN-2928.v3.patch, 
 YARN-3276-YARN-2928.v4.patch, YARN-3276-YARN-2928.v5-fix-checkstyle.patch, 
 YARN-3276-YARN-2928.v5.patch, YARN-3276-YARN-2928.v6.patch, 
 YARN-3276-v2.patch, YARN-3276-v3.patch, YARN-3276.patch


 Per discussion in YARN-3087, we need to refactor some similar logic to cast 
 map to hashmap and get rid of NPE issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2928) YARN Timeline Service: Next generation

2015-06-03 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-2928:
-
Attachment: TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf


We decided to evaluate two approaches to the backend storage implementation in 
terms of their performance, scalability, usability, and maintenance: YARN-3134 
(Phoenix-based HBase schema) and YARN-3411 (hybrid HBase schema: vanilla 
HBase tables in the direct write path and Phoenix-based tables for reporting).

Attaching a write-up that describes how we ended up choosing the approach of 
writing to vanilla HBase tables (YARN-3411) in the direct write path.


 YARN Timeline Service: Next generation
 --

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
 v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
 TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3765) Fix findbugs the warning in YARN-2928 branch, TimelineMetric

2015-06-03 Thread Li Lu (JIRA)
Li Lu created YARN-3765:
---

 Summary: Fix findbugs the warning in YARN-2928 branch, 
TimelineMetric
 Key: YARN-3765
 URL: https://issues.apache.org/jira/browse/YARN-3765
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Li Lu
Assignee: Li Lu


There is one warning about reversing the return value of comparisons in 
YARN-2928 branch. I believe this is a false alarm since we intentionally said 
the comparator is a reversed comparator. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571503#comment-14571503
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7952 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7952/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java
* hadoop-yarn-project/CHANGES.txt


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571502#comment-14571502
 ] 

Karthik Kambatla commented on YARN-3762:


Thanks for the review, Arun. Good points.

bq. what happens if the collection is modified in between..
The two possible modifications are adding/removing a child queue. Adding a 
child queue to the end of the list doesn't affect container assignment. 
Removing a child queue affects container assignment, but that is a good thing. 
We should probably add a comment to that effect so we don't forget this in the 
future.

bq. instead of using a List and sorting it everytime, we could use a Sorted Bag 
(MultiSet) ? 
One issue with using a sorted collection is that the sorting happens only on 
addition/removal: FSQueues already in the list can change as well, affecting the 
order. Maybe we could remove and re-insert a queue whenever it changes, but that 
is a much bigger change and needs to be carefully evaluated for performance.
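
For reference, a sketch of the lock-then-iterate pattern that avoids the CME; this is an illustration of the idea, not necessarily the exact shape of the committed patch:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative stand-in for FSParentQueue's child-queue handling.
class ParentQueueSketch {
  private final List<String> childQueues = new ArrayList<String>();
  private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

  void addChildQueue(String child) {
    rwLock.writeLock().lock();
    try {
      childQueues.add(child);
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  // Iterating while holding the read lock (or iterating a snapshot taken
  // under the lock) means a concurrent add/remove can no longer trigger a
  // ConcurrentModificationException.
  List<String> getQueueUserAclInfo() {
    List<String> acls = new ArrayList<String>();
    rwLock.readLock().lock();
    try {
      for (String child : childQueues) {
        acls.add("acl-for-" + child);
      }
    } finally {
      rwLock.readLock().unlock();
    }
    return acls;
  }
}
{code}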

 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-3762-1.patch, yarn-3762-1.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces

2015-06-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571623#comment-14571623
 ] 

Zhijie Shen commented on YARN-1942:
---

It seems that we have more than just ConverterUtils being referenced by 
external projects. For example, in YARN-1462, we just encountered the issue 
that newInstance is marked as \@Private, but it's actually referenced by Tez.

We need to check the public methods that are annotated as \@Private in the 
api/common modules. If they are useful to or reasonably referenced by 
downstream projects, we should mark them \@Public. Sid has suggested taking MR 
as the example: if such methods are used by MR, they are very likely to 
be used by others too.

 Many of ConverterUtils methods need to have public interfaces
 -

 Key: YARN-1942
 URL: https://issues.apache.org/jira/browse/YARN-1942
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.4.0
Reporter: Thomas Graves
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-1942.1.patch, YARN-1942.2.patch


 ConverterUtils has a bunch of functions that are useful to application 
 masters. It should either be made public, or we should make some of the 
 utilities in it public, or we should provide other external APIs for 
 application masters to use. Note that distributedshell and MR are both using 
 these interfaces. 
 For instance, the main use case I see right now is getting the application 
 attempt id within the appmaster:
   String containerIdStr =
       System.getenv(Environment.CONTAINER_ID.name());
   ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
   ApplicationAttemptId applicationAttemptId =
       containerId.getApplicationAttemptId();
 I don't see any other way for the application master to get this information. 
 If there is, please let me know.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3764:
-
Attachment: YARN-3764.1.patch

Attached initial patch for review.

 CapacityScheduler should forbid moving LeafQueue from one parent to another
 ---

 Key: YARN-3764
 URL: https://issues.apache.org/jira/browse/YARN-3764
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3764.1.patch


 Currently CapacityScheduler doesn't handle this case well. For example, given 
 a queue structure:
 {code}
       root
         |
      a (100)
       /    \
      x      y
    (50)    (50)
 {code}
 And reinitializing with the following structure:
 {code}
        root
        /  \
    a (50)  x (50)
    |
    y
  (100)
 {code}
 The actual queue structure after reinitialization is:
 {code}
        root
        /  \
    a (50)  x (50)
     /  \
    x    y
  (50)  (100)
 {code}
 We should forbid the admin from doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should properly handle moving LeafQueue from one parent to another

2015-06-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571479#comment-14571479
 ] 

Vinod Kumar Vavilapalli commented on YARN-3764:
---

bq. A short term fix is don't allow remove queue under parentQueue.
We never supported removing queues. So this is not just a short-term fix, this 
is the right fix for now.

 CapacityScheduler should properly handle moving LeafQueue from one parent to 
 another
 

 Key: YARN-3764
 URL: https://issues.apache.org/jira/browse/YARN-3764
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker

 Currently CapacityScheduler doesn't handle this case well. For example, given 
 a queue structure:
 {code}
       root
         |
      a (100)
       /    \
      x      y
    (50)    (50)
 {code}
 And reinitializing with the following structure:
 {code}
        root
        /  \
    a (50)  x (50)
    |
    y
  (100)
 {code}
 The actual queue structure after reinitialization is:
 {code}
        root
        /  \
    a (50)  x (50)
     /  \
    x    y
  (50)  (100)
 {code}
 We should handle this case better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571486#comment-14571486
 ] 

Wangda Tan commented on YARN-3764:
--

[~vinodkv], agree. Updated the title/description, and will search for or file a 
separate ticket for moving/removing queues.

 CapacityScheduler should forbid moving LeafQueue from one parent to another
 ---

 Key: YARN-3764
 URL: https://issues.apache.org/jira/browse/YARN-3764
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker

 Currently CapacityScheduler doesn't handle this case well. For example, given 
 a queue structure:
 {code}
       root
         |
      a (100)
       /    \
      x      y
    (50)    (50)
 {code}
 And reinitializing with the following structure:
 {code}
        root
        /  \
    a (50)  x (50)
    |
    y
  (100)
 {code}
 The actual queue structure after reinitialization is:
 {code}
        root
        /  \
    a (50)  x (50)
     /  \
    x    y
  (50)  (100)
 {code}
 We should forbid the admin from doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3764:
-
Summary: CapacityScheduler should forbid moving LeafQueue from one parent 
to another  (was: CapacityScheduler forbid of moving LeafQueue from one parent 
to another)

 CapacityScheduler should forbid moving LeafQueue from one parent to another
 ---

 Key: YARN-3764
 URL: https://issues.apache.org/jira/browse/YARN-3764
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker

 Currently CapacityScheduler doesn't handle this case well. For example, given 
 a queue structure:
 {code}
       root
         |
      a (100)
       /    \
      x      y
    (50)    (50)
 {code}
 And reinitializing with the following structure:
 {code}
        root
        /  \
    a (50)  x (50)
    |
    y
  (100)
 {code}
 The actual queue structure after reinitialization is:
 {code}
        root
        /  \
    a (50)  x (50)
     /  \
    x    y
  (50)  (100)
 {code}
 We should forbid the admin from doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571590#comment-14571590
 ] 

Hadoop QA commented on YARN-3276:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  21m 40s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |  11m  0s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  12m  7s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 41s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   2m 15s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 50s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 55s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings, and fixes 1 pre-existing warnings. |
| {color:green}+1{color} | yarn tests |   0m 28s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m 18s | Tests passed in 
hadoop-yarn-common. |
| | |  56m 43s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737351/YARN-3276-YARN-2928.v6.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 2e12480 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8183/artifact/patchprocess/YARN-2928FindbugsWarningshadoop-yarn-api.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8183/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8183/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8183/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8183/console |


This message was automatically generated.

 Refactor and fix null casting in some map cast for TimelineEntity (old and 
 new) and fix findbug warnings
 

 Key: YARN-3276
 URL: https://issues.apache.org/jira/browse/YARN-3276
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3276-YARN-2928.v3.patch, 
 YARN-3276-YARN-2928.v4.patch, YARN-3276-YARN-2928.v5-fix-checkstyle.patch, 
 YARN-3276-YARN-2928.v5.patch, YARN-3276-YARN-2928.v6.patch, 
 YARN-3276-v2.patch, YARN-3276-v3.patch, YARN-3276.patch


 Per discussion in YARN-3087, we need to refactor some similar logic to cast 
 map to hashmap and get rid of NPE issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571611#comment-14571611
 ] 

Hadoop QA commented on YARN-3762:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 56s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 45s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  50m  7s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  87m 55s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737349/yarn-3762-2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / dbc4f64 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8182/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8182/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8182/console |


This message was automatically generated.

 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-03 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reopened YARN-1462:
---

Per discussion offline with Sid, my proposal is:

1. Revert the current commit, then create and commit a new patch with a 
compatible newInstance change.

2. Do not change the annotation from Private to Public, as that is a separate 
issue. File another jira or link to the existing jira to track the problem of 
downstream projects' references to private methods in the api/common modules.

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571618#comment-14571618
 ] 

Karthik Kambatla commented on YARN-3762:


Thanks Arun. Checking this in. 

 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-03 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3762:
---
Attachment: yarn-3762-2.patch

Updated the patch to add more comments. 

 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571626#comment-14571626
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7955 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7955/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* hadoop-yarn-project/CHANGES.txt


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3764) CapacityScheduler forbid of moving LeafQueue from one parent to another

2015-06-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3764:
-
Summary: CapacityScheduler forbid of moving LeafQueue from one parent to 
another  (was: CapacityScheduler should properly handle moving LeafQueue from 
one parent to another)

 CapacityScheduler forbid of moving LeafQueue from one parent to another
 ---

 Key: YARN-3764
 URL: https://issues.apache.org/jira/browse/YARN-3764
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker

 Currently CapacityScheduler doesn't handle this case well. For example, given 
 a queue structure:
 {code}
       root
         |
      a (100)
       /    \
      x      y
    (50)    (50)
 {code}
 And reinitializing with the following structure:
 {code}
        root
        /  \
    a (50)  x (50)
    |
    y
  (100)
 {code}
 The actual queue structure after reinitialization is:
 {code}
        root
        /  \
    a (50)  x (50)
     /  \
    x    y
  (50)  (100)
 {code}
 We should handle this case better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-03 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571512#comment-14571512
 ] 

Arun Suresh commented on YARN-3762:
---

Makes sense
+1, LGTM

 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571614#comment-14571614
 ] 

Hadoop QA commented on YARN-3749:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m  2s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 8 new or modified test files. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 14s | The applied patch generated  1 
new checkstyle issues (total was 212, now 213). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m 11s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   7m  0s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  60m 24s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests |   1m 52s | Tests passed in 
hadoop-yarn-server-tests. |
| | | 120m 57s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737337/YARN-3749.7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c59e745 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8181/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8181/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8181/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8181/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8181/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8181/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8181/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8181/console |


This message was automatically generated.

 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 during RM failover, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
YarnConfiguration.RM_ADDRESS,

 YarnConfiguration.DEFAULT_RM_ADDRESS,

[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571481#comment-14571481
 ] 

Zhijie Shen commented on YARN-3751:
---

+1 LGTM

No new test case is required, as the existing one already covers the code.

Will commit the patch.

 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should properly handle moving LeafQueue from one parent to another

2015-06-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571477#comment-14571477
 ] 

Wangda Tan commented on YARN-3764:
--

Following test case can verify this issue:

{code}
  @Test
  public void testQueueParsingWithMoveQueue()
      throws IOException {
    YarnConfiguration conf = new YarnConfiguration();
    CapacitySchedulerConfiguration csConf =
        new CapacitySchedulerConfiguration(conf);
    csConf.setQueues("root", new String[] { "a" });
    csConf.setQueues("root.a", new String[] { "x", "y" });
    csConf.setCapacity("root.a", 100);
    csConf.setCapacity("root.a.x", 50);
    csConf.setCapacity("root.a.y", 50);

    CapacityScheduler capacityScheduler = new CapacityScheduler();
    RMContextImpl rmContext =
        new RMContextImpl(null, null, null, null, null, null,
            new RMContainerTokenSecretManager(csConf),
            new NMTokenSecretManagerInRM(csConf),
            new ClientToAMTokenSecretManagerInRM(), null);
    rmContext.setNodeLabelManager(nodeLabelManager);
    capacityScheduler.setConf(csConf);
    capacityScheduler.setRMContext(rmContext);
    capacityScheduler.init(csConf);
    capacityScheduler.start();

    csConf.setQueues("root", new String[] { "a", "x" });
    csConf.setQueues("root.a", new String[] { "y" });
    csConf.setCapacity("root.x", 50);
    csConf.setCapacity("root.a", 50);
    csConf.setCapacity("root.a.y", 100);

    capacityScheduler.reinitialize(csConf, rmContext);

    Assert.assertEquals(1, ((ParentQueue) capacityScheduler.getQueue("a"))
        .getChildQueues().size());
  }
{code}

 CapacityScheduler should properly handle moving LeafQueue from one parent to 
 another
 

 Key: YARN-3764
 URL: https://issues.apache.org/jira/browse/YARN-3764
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker

 Currently CapacityScheduler doesn't handle this case well. For example, given 
 a queue structure:
 {code}
       root
         |
      a (100)
       /    \
      x      y
    (50)    (50)
 {code}
 And reinitializing with the following structure:
 {code}
        root
        /  \
    a (50)  x (50)
    |
    y
  (100)
 {code}
 The actual queue structure after reinitialization is:
 {code}
        root
        /  \
    a (50)  x (50)
     /  \
    x    y
  (50)  (100)
 {code}
 We should handle this case better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571633#comment-14571633
 ] 

Xuan Gong commented on YARN-1462:
-

I am ok with this plan.
bq. 1.Revert the current commit, create and commit a new patch with compatible 
newInstance change.

Looks like we have to revert two commits:
{code}
commit 0b5cfacde638bc25cc010cd9236369237b4e51a8
Author: Xuan xg...@apache.org
Date:   Mon Jun 1 11:39:00 2015 -0700

YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in
CHANGES.txt
{code}

And

{code}
commit 4a9ec1a8243e2394ff7221b1c20dfaa80e9f5111
Author: Zhijie Shen zjs...@apache.org
Date:   Sat May 30 09:35:59 2015 -0700

YARN-1462. Made RM write application tags to timeline server and exposed 
them to users via generic history web UI and REST API. Contributed by Xuan Gong.
{code}

bq. 2. Do not change the annotation from Private to Public as it's separate 
issue. File another jira or link to the existing jira to track the problem of 
downstream projects' reference to private methods in api/common module.

Link https://issues.apache.org/jira/browse/YARN-1942

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-19) 4-layer topology (with NodeGroup layer) implementation of Container Assignment and Task Scheduling (for YARN)

2015-06-03 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571632#comment-14571632
 ] 

Abin Shahab commented on YARN-19:
-

Hi [~jdu] Can this be merged in 2.8? We at Altiscale are encountering issues 
with split NM and DN. If YARN does not know how to schedule a container based 
on topology, locality suffers. [~raviprakash] and [~aw], what do you guys think?

 4-layer topology (with NodeGroup layer) implementation of Container 
 Assignment and Task Scheduling (for YARN)
 -

 Key: YARN-19
 URL: https://issues.apache.org/jira/browse/YARN-19
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Junping Du
Assignee: Junping Du
 Attachments: 
 HADOOP-8475-ContainerAssignmentTaskScheduling-withNodeGroup.patch, 
 MAPREDUCE-4310-v1.patch, MAPREDUCE-4310.patch, YARN-19-v2.patch, 
 YARN-19-v3-alpha.patch, YARN-19-v4.patch, YARN-19.patch


 There are several classes in YARN’s container assignment and task scheduling 
 algorithms related to data locality that were updated to give preference to 
 running a container on the same nodegroup. This section summarizes the 
 changes in the patch that provides a new implementation to support a 
 four-layer hierarchy.
 When the ApplicationMaster makes a resource allocation request to the 
 scheduler of the ResourceManager, it will add the nodegroup to the list of 
 attributes in the ResourceRequest. The parameters of the resource request 
 will change from <priority, (host, rack, *), memory, #containers> to 
 <priority, (host, nodegroup, rack, *), memory, #containers>.
 After receiving the ResourceRequest, the RM scheduler will assign containers 
 for requests in the sequence of data-local, nodegroup-local, rack-local and 
 off-switch. Then, the ApplicationMaster schedules tasks on allocated 
 containers in the sequence of data-local, nodegroup-local, rack-local and 
 off-switch.
 In terms of code changes made to YARN task scheduling, we updated the class 
 ContainerRequestEvent so that applications' requests for containers can 
 include a nodegroup. In the RM schedulers, FifoScheduler and CapacityScheduler 
 were updated. For the FifoScheduler, the changes were in the method 
 assignContainers. For the CapacityScheduler, the method assignContainersOnNode 
 in the class LeafQueue was updated. In both changes, a new method, 
 assignNodeGroupLocalContainers(), was added between the data-local and 
 rack-local assignments (see the sketch below).
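
A rough sketch of the four-level ordering described above (helper names are illustrative, not the exact methods in the patch):
{code}
// Illustrative only: try each locality level in order, falling through to the
// next level when nothing could be assigned at the current one.
int assigned = assignDataLocalContainers(node, application, priority);
if (assigned == 0) {
  assigned = assignNodeGroupLocalContainers(node, application, priority); // new level
}
if (assigned == 0) {
  assigned = assignRackLocalContainers(node, application, priority);
}
if (assigned == 0) {
  assigned = assignOffSwitchContainers(node, application, priority);
}
{code}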



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571542#comment-14571542
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7953 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7953/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* hadoop-yarn-project/CHANGES.txt


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission, the NodeManager log shows it 
 stopped but the process cannot exit. 
 Non-daemon threads:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and jni leveldb thread stack
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness

2015-06-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571575#comment-14571575
 ] 

Wangda Tan commented on YARN-3510:
--

[~sunilg],
bq.  and in general if the new approach gives a more fair preemption, then we 
can move to that.
The approach mentioned by [~cwelch] at 
https://issues.apache.org/jira/browse/YARN-3510?focusedCommentId=14571405page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14571405
 is the truly fair one. You can come back and look at the patch once it is uploaded.

 Create an extension of ProportionalCapacityPreemptionPolicy which preempts a 
 number of containers from each application in a way which respects fairness
 

 Key: YARN-3510
 URL: https://issues.apache.org/jira/browse/YARN-3510
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: yarn
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3510.2.patch, YARN-3510.3.patch, YARN-3510.5.patch, 
 YARN-3510.6.patch


 The ProportionalCapacityPreemptionPolicy preempts as many containers from 
 applications as it can during it's preemption run.  For fifo this makes 
 sense, as it is prempting in reverse order  therefore maintaining the 
 primacy of the oldest.  For fair ordering this does not have the desired 
 effect - instead, it should preempt a number of containers from each 
 application which maintains a fair balance /close to a fair balance between 
 them



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-06-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571593#comment-14571593
 ] 

Karthik Kambatla commented on YARN-3655:


bq. IMHO, It is not good to add if (isValidReservation) check in 
FSAppAttempt#reserve because all the conditions checked in isValidReservation 
are already checked before we call FSAppAttempt#reserve, it will be duplicate 
code which will affect the performance.
Is it possible to avoid the checks before the call, and do all the checks in 
the call? The reasoning behind this is to have all reservation-related code in 
as few places as possible. If this is not possible, we can leave it as the 
patch has it now.

bq. While adding this check in FSAppAttempt#assignContainer(node) might work in 
practice, it somehow feels out of place. 
Instead of adding the check to assignContainer(node), can we add it to 
assignContainer(node, request, nodeType, reserved)?

 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation 
 -

 Key: YARN-3655
 URL: https://issues.apache.org/jira/browse/YARN-3655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
 YARN-3655.002.patch, YARN-3655.003.patch


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation.
 If a node is reserved by an application, all the other applications don't 
 have any chance to assign a new container on this node, unless the 
 application which reserves the node assigns a new container on this node or 
 releases the reserved container on this node.
 The problem is that if an application tries to call assignReservedContainer 
 and fails to get a new container due to the maxAMShare limitation, it will 
 block all other applications from using the nodes it reserves. If all other 
 running applications can't release their AM containers because they are 
 blocked by these reserved containers, a livelock situation can happen.
 The following is the code at FSAppAttempt#assignContainer which can cause 
 this potential livelock.
 {code}
 // Check the AM resource usage for the leaf queue
 if (!isAmRunning() && !getUnmanagedAM()) {
   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
   if (ask.isEmpty() || !getQueue().canRunAppAM(
   ask.get(0).getCapability())) {
 if (LOG.isDebugEnabled()) {
   LOG.debug("Skipping allocation because maxAMShare limit would " +
   "be exceeded");
 }
 return Resources.none();
   }
 }
 {code}
 To fix this issue, we can unreserve the node if we can't allocate the AM 
 container on the node due to Max AM share limitation and the node is reserved 
 by the application.
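
 A hedged sketch of that idea in FSAppAttempt (not the committed patch; the reservation accessor names may differ slightly):
{code}
// Sketch: when the maxAMShare check fails, also drop this application's own
// reservation on the node so other applications can use it.
if (LOG.isDebugEnabled()) {
  LOG.debug("Skipping allocation because maxAMShare limit would be exceeded");
}
RMContainer reserved = node.getReservedContainer();
if (reserved != null
    && reserved.getApplicationAttemptId().equals(getApplicationAttemptId())) {
  unreserve(reserved.getReservedPriority(), node);
}
return Resources.none();
{code}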



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-06-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571732#comment-14571732
 ] 

Zhijie Shen commented on YARN-3044:
---

[~Naganarasimha], thanks for updating the patch. It looks good to me so far, 
but I want to hold the patch for the following issues.

1. After YARN-3276 is committed, this patch will conflict on {{return 
l2.compareTo(l1);}}.

2. We're reworking YARN-1462. It won't affect this patch, but there's commit 
revert. Let's wait until YARN-1462 is done.

3. It's not caused by this patch, but I found a race condition when publishing 
the app finish event:
{code}
15/06/03 14:59:56 INFO rmapp.RMAppImpl: application_1433367826630_0002 State 
change from FINISHING to FINISHED
15/06/03 14:59:56 INFO capacity.LeafQueue: completedContainer 
container=Container: [ContainerId: container_1433367826630_0002_01_01, 
NodeId: localhost:9105, NodeHttpAddress: localhost:8042, Resource: 
memory:2048, vCores:1, Priority: 0, Token: Token { kind: ContainerToken, 
service: 127.0.0.1:9105 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, 
usedResources=memory:0, vCores:0, usedCapacity=0.0, absoluteUsedCapacity=0.0, 
numApps=1, numContainers=0 cluster=memory:8192, vCores:8
15/06/03 14:59:56 INFO resourcemanager.RMAuditLogger: USER=zshen
OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
RESULT=SUCCESS  APPID=application_1433367826630_0002
15/06/03 14:59:56 INFO capacity.ParentQueue: completedContainer queue=root 
usedCapacity=0.0 absoluteUsedCapacity=0.0 used=memory:0, vCores:0 
cluster=memory:8192, vCores:8
15/06/03 14:59:56 ERROR metrics.TimelineServiceV2Publisher: Error when 
publishing entity TimelineEntity[type='YARN_APPLICATION', 
id='application_1433367826630_0002']
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:273)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.publishApplicationFinishedEvent(TimelineServiceV2Publisher.java:133)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(AbstractTimelineServicePublisher.java:70)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(AbstractTimelineServicePublisher.java:35)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
at java.lang.Thread.run(Thread.java:745)
15/06/03 14:59:56 INFO amlauncher.AMLauncher: Cleaning master 
appattempt_1433367826630_0002_01
{code}

I think the problem is that we stop the timeline collector immediately after 
calling appFinished, which is an async call, and the publishing operation is 
executed asynchronously on another thread. One option is to call 
stopTimelineCollector after publishing the finish event in the publisher. Can 
you take care of it?
{code}
  app.rmContext.getSystemMetricsPublisher()
  .appFinished(app, finalState, app.finishTime);

  app.stopTimelineCollector();
{code}
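
Something along these lines in the publisher might work (a sketch only; helper names are illustrative, not the branch's actual API):
{code}
// Sketch: publish the finished entity first, then tear down the per-app
// collector from the publisher, so the async put cannot race with the stop.
private void publishApplicationFinishedEvent(ApplicationFinishedEvent event) {
  TimelineEntity entity = createApplicationEntity(event.getApplicationId());
  // ... fill in finish time, final state and the finished event ...
  putEntity(entity, event.getApplicationId());
  // Only stop the collector once the finish entity has been handed off.
  stopTimelineCollector(event.getApplicationId());
}
{code}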

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
 YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
 YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
 YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
 YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces

2015-06-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571761#comment-14571761
 ] 

Sergey Shelukhin commented on YARN-1942:


No, it's used in production code as far as I can tell

 Many of ConverterUtils methods need to have public interfaces
 -

 Key: YARN-1942
 URL: https://issues.apache.org/jira/browse/YARN-1942
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.4.0
Reporter: Thomas Graves
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-1942.1.patch, YARN-1942.2.patch


 ConverterUtils has a bunch of functions that are useful to application 
 masters.   It should either be made public or we make some of the utilities 
 in it public or we provide other external apis for application masters to 
 use.  Note that distributedshell and MR are both using these interfaces. 
 For instance the main use case I see right now is for getting the application 
 attempt id within the appmaster:
 String containerIdStr =
   System.getenv(Environment.CONTAINER_ID.name());
 ConverterUtils.toContainerId
 ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
   ApplicationAttemptId applicationAttemptId =
   containerId.getApplicationAttemptId();
 I don't see any other way for the application master to get this information. 
  If there is please let me know.
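
 Cleaned up, the usage in the description amounts to something like this inside an AM (a sketch of existing calls, not new API):
{code}
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Recover the attempt id from the container id that the NodeManager puts in
// the environment of the AM container.
String containerIdStr = System.getenv(Environment.CONTAINER_ID.name());
ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
ApplicationAttemptId applicationAttemptId =
    containerId.getApplicationAttemptId();
{code}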



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing

2015-06-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571667#comment-14571667
 ] 

Karthik Kambatla commented on YARN-3453:


Should we add {{SchedulingPolicy#getResourceCalculator()}} and use that 
instead? 
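
A sketch of that suggestion (not the committed change): DRF would expose a DominantResourceCalculator, the other policies the default one.
{code}
// Sketch: each policy exposes the calculator it schedules with, so the
// preemption path stops hard-coding DefaultResourceCalculator.
public abstract class SchedulingPolicy {
  public abstract ResourceCalculator getResourceCalculator();
}

public class DominantResourceFairnessPolicy extends SchedulingPolicy {
  private static final ResourceCalculator CALCULATOR =
      new DominantResourceCalculator();

  @Override
  public ResourceCalculator getResourceCalculator() {
    return CALCULATOR;
  }
}
{code}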

 Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator 
 even in DRF mode causing thrashing
 

 Key: YARN-3453
 URL: https://issues.apache.org/jira/browse/YARN-3453
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Ashwin Shankar
Assignee: Arun Suresh
 Attachments: YARN-3453.1.patch


 There are two places in preemption code flow where DefaultResourceCalculator 
 is used, even in DRF mode.
 This basically results in more resources getting preempted than needed, and 
 those extra preempted containers aren’t even getting to the “starved” queue, 
 since the scheduling logic is based on DRF's Calculator.
 Following are the two places :
 1. {code:title=FSLeafQueue.java|borderStyle=solid}
 private boolean isStarved(Resource share)
 {code}
 A queue shouldn’t be marked as “starved” if the dominant resource usage
 is <= fair/minshare.
 2. {code:title=FairScheduler.java|borderStyle=solid}
 protected Resource resToPreempt(FSLeafQueue sched, long curTime)
 {code}
 --
 One more thing that I believe needs to change in DRF mode is : during a 
 preemption round,if preempting a few containers results in satisfying needs 
 of a resource type, then we should exit that preemption round, since the 
 containers that we just preempted should bring the dominant resource usage to 
 min/fair share.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces

2015-06-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571668#comment-14571668
 ] 

Vinod Kumar Vavilapalli commented on YARN-1942:
---

bq. It seems that we have more than ConverterUtils that has been referenced by 
external projects. For example, in YARN-1462, we just encountered the issue 
that newInstance is marked as @Private, but it's actually referenced by Tez.
Is this only in tests? Then you need YARN-2792.

 Many of ConverterUtils methods need to have public interfaces
 -

 Key: YARN-1942
 URL: https://issues.apache.org/jira/browse/YARN-1942
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.4.0
Reporter: Thomas Graves
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-1942.1.patch, YARN-1942.2.patch


 ConverterUtils has a bunch of functions that are useful to application 
 masters.   It should either be made public or we make some of the utilities 
 in it public or we provide other external apis for application masters to 
 use.  Note that distributedshell and MR are both using these interfaces. 
 For instance the main use case I see right now is for getting the application 
 attempt id within the appmaster:
 String containerIdStr =
   System.getenv(Environment.CONTAINER_ID.name());
 ConverterUtils.toContainerId
 ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
   ApplicationAttemptId applicationAttemptId =
   containerId.getApplicationAttemptId();
 I don't see any other way for the application master to get this information. 
  If there is please let me know.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571669#comment-14571669
 ] 

Zhijie Shen commented on YARN-1462:
---

Reverted.

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation

2015-06-03 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571765#comment-14571765
 ] 

Li Lu commented on YARN-2928:
-

Thanks [~sjlee0], [~jrottinghuis], and [~vrushalic] for hosting the benchmark 
session. This is very helpful! 

 YARN Timeline Service: Next generation
 --

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
 v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
 TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1462:

Attachment: YARN-1462.4.patch

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3766:

Attachment: YARN-3766.1.patch

Created a patch to fix it. No test cases are needed.

 ATS Web UI breaks because of YARN-3467
 --

 Key: YARN-3766
 URL: https://issues.apache.org/jira/browse/YARN-3766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Affects Versions: 2.8.0
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch


 The ATS web UI breaks because of the following changes made in YARN-3467.
 {code}
 +++ 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
 @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
    .append(", 'mRender': renderHadoopDate }")
    .append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
  if (isFairSchedulerPage) {
 -  sb.append("[11]");
 +  sb.append("[13]");
  } else if (isResourceManager) {
 -  sb.append("[10]");
 +  sb.append("[12]");
  } else {
    sb.append("[9]");
  }
 {code}
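
 One way to avoid this class of break would be to stop deriving the render-target index from page flags and pass it in explicitly; the following is purely illustrative, not the actual fix:
{code}
// Hypothetical refactor: callers supply the column index that holds the date,
// so adding columns to one page cannot silently shift another page's index.
private static void appendDateColumnDefs(StringBuilder sb, int dateColumnIndex) {
  sb.append(", 'mRender': renderHadoopDate }")
    .append("\n, {'sType':'numeric', bSearchable:false, 'aTargets': [")
    .append(dateColumnIndex)
    .append("]");
}
{code}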



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571859#comment-14571859
 ] 

Hadoop QA commented on YARN-3453:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m  7s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 39s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 30s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 25s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  50m  7s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  86m 58s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737386/YARN-3453.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / bc85959 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8187/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8187/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8187/console |


This message was automatically generated.

 Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator 
 even in DRF mode causing thrashing
 

 Key: YARN-3453
 URL: https://issues.apache.org/jira/browse/YARN-3453
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Ashwin Shankar
Assignee: Arun Suresh
 Attachments: YARN-3453.1.patch, YARN-3453.2.patch


 There are two places in preemption code flow where DefaultResourceCalculator 
 is used, even in DRF mode.
 This basically results in more resources getting preempted than needed, and 
 those extra preempted containers aren’t even getting to the “starved” queue, 
 since the scheduling logic is based on DRF's Calculator.
 Following are the two places :
 1. {code:title=FSLeafQueue.java|borderStyle=solid}
 private boolean isStarved(Resource share)
 {code}
 A queue shouldn’t be marked as “starved” if the dominant resource usage
 is <= fair/minshare.
 2. {code:title=FairScheduler.java|borderStyle=solid}
 protected Resource resToPreempt(FSLeafQueue sched, long curTime)
 {code}
 --
 One more thing that I believe needs to change in DRF mode is : during a 
 preemption round,if preempting a few containers results in satisfying needs 
 of a resource type, then we should exit that preemption round, since the 
 containers that we just preempted should bring the dominant resource usage to 
 min/fair share.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571676#comment-14571676
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7956 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7956/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571797#comment-14571797
 ] 

Hadoop QA commented on YARN-3764:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m 52s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   9m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 48s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 41s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 4  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  54m  5s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 100m  6s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737374/YARN-3764.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / bc85959 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8186/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8186/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8186/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8186/console |


This message was automatically generated.

 CapacityScheduler should forbid moving LeafQueue from one parent to another
 ---

 Key: YARN-3764
 URL: https://issues.apache.org/jira/browse/YARN-3764
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3764.1.patch


 Currently CapacityScheduler doesn't handle the case well, for example:
 A queue structure:
 {code}
 root
   |
   a (100)
 /   \
x y
   (50)   (50)
 {code}
 And reinitialize using following structure:
 {code}
  root
  /   \ 
 (50)a x (50)
 |
 y
(100)
 {code}
 The actual queue structure after reinitialize is:
 {code}
  root
 /\
a (50) x (50)
   /  \
  xy
 (50)  (100)
 {code}
 We should forbid admin doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571867#comment-14571867
 ] 

Hadoop QA commented on YARN-3766:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 25s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 54s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 59s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   3m 12s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-server-common. |
| | |  42m 39s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737404/YARN-3766.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / bc85959 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8189/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8189/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8189/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8189/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8189/console |


This message was automatically generated.

 ATS Web UI breaks because of YARN-3467
 --

 Key: YARN-3766
 URL: https://issues.apache.org/jira/browse/YARN-3766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Affects Versions: 2.8.0
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch


 The ATS web UI breaks because of the following changes made in YARN-3467.
 {code}
 +++ 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
 @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
    .append(", 'mRender': renderHadoopDate }")
    .append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
  if (isFairSchedulerPage) {
 -  sb.append("[11]");
 +  sb.append("[13]");
  } else if (isResourceManager) {
 -  sb.append("[10]");
 +  sb.append("[12]");
  } else {
    sb.append("[9]");
  }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571896#comment-14571896
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7958 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7958/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when RM failover. But I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032, 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 was 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
     YarnConfiguration.RM_ADDRESS,
     YarnConfiguration.DEFAULT_RM_ADDRESS,
     server.getListenerAddress());
 {code}
 Since we use the same configuration instance for rm1 and rm2, and we init 
 both RMs before starting either of them, yarn.resourcemanager.ha.id is set to 
 rm2 during the init of rm2 and is still rm2 when rm1 starts.
 So I think it is safe to make a copy of the configuration when initializing 
 each RM.
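
 A sketch of that idea in MiniYARNCluster (names such as rmIds are illustrative; the committed patch may differ):
{code}
// Sketch: each RM gets its own copy of the configuration, so setting
// yarn.resourcemanager.ha.id for rm2 cannot leak into rm1.
for (int i = 0; i < resourceManagers.length; i++) {
  Configuration rmConf = new YarnConfiguration(conf);  // per-RM copy
  rmConf.set(YarnConfiguration.RM_HA_ID, rmIds[i]);    // e.g. "rm1", "rm2"
  resourceManagers[i].init(rmConf);
}
{code}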



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571872#comment-14571872
 ] 

Jian He commented on YARN-3764:
---

looks good, +1

 CapacityScheduler should forbid moving LeafQueue from one parent to another
 ---

 Key: YARN-3764
 URL: https://issues.apache.org/jira/browse/YARN-3764
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3764.1.patch


 Currently CapacityScheduler doesn't handle the case well, for example:
 A queue structure:
 {code}
 root
   |
   a (100)
 /   \
x y
   (50)   (50)
 {code}
 And reinitialize using following structure:
 {code}
  root
  /   \ 
 (50)a x (50)
 |
 y
(100)
 {code}
 The actual queue structure after reinitialize is:
 {code}
  root
 /\
a (50) x (50)
   /  \
  xy
 (50)  (100)
 {code}
 We should forbid admin doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571886#comment-14571886
 ] 

Xuan Gong commented on YARN-3749:
-

Committed to trunk/branch-2. Thanks, Chun Chen. And thanks for the review, zhihai.

 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when RM failover. But I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032, 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 was 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
     YarnConfiguration.RM_ADDRESS,
     YarnConfiguration.DEFAULT_RM_ADDRESS,
     server.getListenerAddress());
 {code}
 Since we use the same configuration instance for rm1 and rm2, and we init 
 both RMs before starting either of them, yarn.resourcemanager.ha.id is set to 
 rm2 during the init of rm2 and is still rm2 when rm1 starts.
 So I think it is safe to make a copy of the configuration when initializing 
 each RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-06-03 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571821#comment-14571821
 ] 

Srikanth Kandula commented on YARN-3366:


1) Does this also capture the network usage due to non-container traffic, e.g. 
that due to evacuation, replication, or data downloads? 

2) What about receive bandwidth?

3) Perhaps I missed this above, but what are the overhead microbenchmark 
numbers re: added latency for normal packets and extra CPU usage overall due to 
sending packets through tc / due to polling tc counters periodically?

 Outbound network bandwidth : classify/shape traffic originating from YARN 
 containers
 

 Key: YARN-3366
 URL: https://issues.apache.org/jira/browse/YARN-3366
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Fix For: 2.8.0

 Attachments: YARN-3366.001.patch, YARN-3366.002.patch, 
 YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, 
 YARN-3366.006.patch, YARN-3366.007.patch


 In order to be able to isolate based on/enforce outbound traffic bandwidth 
 limits, we need  a mechanism to classify/shape network traffic in the 
 nodemanager. For more information on the design, please see the attached 
 design document in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS

2015-06-03 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571822#comment-14571822
 ] 

Hitesh Shah commented on YARN-2513:
---

+1 to making this available for ATS v1. Would be useful in various deployments.

 Host framework UIs in YARN for use with the ATS
 ---

 Key: YARN-2513
 URL: https://issues.apache.org/jira/browse/YARN-2513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, 
 YARN-2513.v3.patch


 Allow for pluggable UIs as described by TEZ-8. YARN can provide the 
 infrastructure to host JavaScript and possibly Java UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571685#comment-14571685
 ] 

Hadoop QA commented on YARN-3044:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 50s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 43s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 35s | The applied patch generated  1 
new checkstyle issues (total was 242, now 242). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   4m 48s | The patch appears to introduce 7 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   0m 28s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |  50m 54s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests |   1m 14s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  98m  2s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737355/YARN-3044-YARN-2928.010.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 2e12480 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/YARN-2928FindbugsWarningshadoop-yarn-api.html
 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/YARN-2928FindbugsWarningshadoop-yarn-server-common.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8184/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8184/console |


This message was automatically generated.

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
 YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
 YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
 YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
 YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3765) Fix findbugs the warning in YARN-2928 branch, TimelineMetric

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571709#comment-14571709
 ] 

Hadoop QA commented on YARN-3765:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 22s | Pre-patch YARN-2928 has 1 
extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 47s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  1s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 42s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 34s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings, and fixes 1 pre-existing warnings. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-api. |
| | |  39m 40s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737370/YARN-3765-YARN-2928.001.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 2e12480 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8185/artifact/patchprocess/YARN-2928FindbugsWarningshadoop-yarn-api.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8185/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8185/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8185/console |


This message was automatically generated.

 Fix findbugs the warning in YARN-2928 branch, TimelineMetric
 

 Key: YARN-3765
 URL: https://issues.apache.org/jira/browse/YARN-3765
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3765-YARN-2928.001.patch


 There is one warning about reversing the return value of comparisons in 
 YARN-2928 branch. This is a valid warning. Quoting the findbugs warning 
 message:
 RV_NEGATING_RESULT_OF_COMPARETO: Negating the result of compareTo()/compare()
 This code negatives the return value of a compareTo or compare method. This 
 is a questionable or bad programming practice, since if the return value is 
 Integer.MIN_VALUE, negating the return value won't negate the sign of the 
 result. You can achieve the same intended result by reversing the order of 
 the operands rather than by negating the results.
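 
 A minimal, self-contained sketch (not the TimelineMetric code itself) of why the 
 negation is risky and what the suggested rewrite looks like:
 {code}
public class CompareToNegationDemo {
  // Questionable: negating compareTo() overflows if it ever returns
  // Integer.MIN_VALUE, so the sign does not flip and the ordering breaks.
  static int badDescending(Long a, Long b) {
    return -a.compareTo(b);
  }

  // The rewrite findbugs suggests: reverse the operands instead of negating.
  static int goodDescending(Long a, Long b) {
    return b.compareTo(a);
  }

  public static void main(String[] args) {
    System.out.println(badDescending(1L, 2L));   // 1
    System.out.println(goodDescending(1L, 2L));  // 1
  }
}
 {code}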



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing

2015-06-03 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-3453:
--
Attachment: YARN-3453.2.patch

Agreed..
Updating patch with your suggestion.. 

 Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator 
 even in DRF mode causing thrashing
 

 Key: YARN-3453
 URL: https://issues.apache.org/jira/browse/YARN-3453
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Ashwin Shankar
Assignee: Arun Suresh
 Attachments: YARN-3453.1.patch, YARN-3453.2.patch


 There are two places in preemption code flow where DefaultResourceCalculator 
 is used, even in DRF mode.
 This basically results in more resources getting preempted than needed, and 
 those extra preempted containers aren't even getting to the "starved" queue, 
 since the scheduling logic is based on DRF's calculator.
 Following are the two places :
 1. {code:title=FSLeafQueue.java|borderStyle=solid}
 private boolean isStarved(Resource share)
 {code}
 A queue shouldn't be marked as "starved" if the dominant resource usage
 is >= fair/minshare.
 2. {code:title=FairScheduler.java|borderStyle=solid}
 protected Resource resToPreempt(FSLeafQueue sched, long curTime)
 {code}
 --
 One more thing that I believe needs to change in DRF mode: during a 
 preemption round, if preempting a few containers satisfies the need for one 
 resource type, then we should exit that preemption round, since the 
 containers that we just preempted should bring the dominant resource usage to 
 min/fair share.
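 
 A minimal sketch of the DRF-aware starvation check being suggested (the class 
 and method here are illustrative, not the actual FSLeafQueue change), assuming 
 the scheduler's cluster resource is available:
 {code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DrfStarvationCheck {
  private final ResourceCalculator calc = new DominantResourceCalculator();

  // Hypothetical DRF-aware isStarved(): the queue is starved only if its
  // dominant share of usage is strictly below the given (fair or min) share.
  boolean isStarved(Resource clusterResource, Resource usage, Resource share) {
    return Resources.lessThan(calc, clusterResource, usage, share);
  }
}
 {code}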



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces

2015-06-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571755#comment-14571755
 ] 

Zhijie Shen commented on YARN-1942:
---

[~sershe], would you please comment?

 Many of ConverterUtils methods need to have public interfaces
 -

 Key: YARN-1942
 URL: https://issues.apache.org/jira/browse/YARN-1942
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.4.0
Reporter: Thomas Graves
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-1942.1.patch, YARN-1942.2.patch


 ConverterUtils has a bunch of functions that are useful to application 
 masters.   It should either be made public or we make some of the utilities 
 in it public or we provide other external apis for application masters to 
 use.  Note that distributedshell and MR are both using these interfaces. 
 For instance the main use case I see right now is for getting the application 
 attempt id within the appmaster:
 String containerIdStr =
     System.getenv(Environment.CONTAINER_ID.name());
 ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
 ApplicationAttemptId applicationAttemptId =
     containerId.getApplicationAttemptId();
 I don't see any other way for the application master to get this information. 
  If there is please let me know.
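 
 Cleaned up and made self-contained, the usage above amounts to the following 
 sketch (the wrapper class is illustrative; only the ConverterUtils call and the 
 CONTAINER_ID environment variable come from the snippet):
 {code}
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class AmAttemptIdLookup {
  public static ApplicationAttemptId currentAttemptId() {
    // The NM exports the AM container's id in the environment.
    String containerIdStr = System.getenv(Environment.CONTAINER_ID.name());
    ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
    return containerId.getApplicationAttemptId();
  }
}
 {code}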



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3766:

Priority: Blocker  (was: Major)

 ATS Web UI breaks because of YARN-3467
 --

 Key: YARN-3766
 URL: https://issues.apache.org/jira/browse/YARN-3766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Affects Versions: 2.8.0
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker

 The ATS web UI breaks because of the following changes made in YARN-3467.
 {code}
 +++ 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
 @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
   .append(", 'mRender': renderHadoopDate }")
   .append("\n, {'sType':'numeric', bSearchable:false, 'aTargets': ");
 if (isFairSchedulerPage) {
-  sb.append("[11]");
+  sb.append("[13]");
 } else if (isResourceManager) {
-  sb.append("[10]");
+  sb.append("[12]");
 } else {
   sb.append("[9]");
  }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3766:

Affects Version/s: 2.8.0

 ATS Web UI breaks because of YARN-3467
 --

 Key: YARN-3766
 URL: https://issues.apache.org/jira/browse/YARN-3766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Affects Versions: 2.8.0
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker

 The ATS web UI breaks because of the following changes made in YARN-3467.
 {code}
 +++ 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
 @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
   .append(", 'mRender': renderHadoopDate }")
   .append("\n, {'sType':'numeric', bSearchable:false, 'aTargets': ");
 if (isFairSchedulerPage) {
-  sb.append("[11]");
+  sb.append("[13]");
 } else if (isResourceManager) {
-  sb.append("[10]");
+  sb.append("[12]");
 } else {
   sb.append("[9]");
  }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-03 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-3766:
---

 Summary: ATS Web UI breaks because of YARN-3467
 Key: YARN-3766
 URL: https://issues.apache.org/jira/browse/YARN-3766
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong


The ATS web UI breaks because of the following changes made in YARN-3467.
{code}
+++ 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
@@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
   .append(", 'mRender': renderHadoopDate }")
   .append("\n, {'sType':'numeric', bSearchable:false, 'aTargets': ");
 if (isFairSchedulerPage) {
-  sb.append("[11]");
+  sb.append("[13]");
 } else if (isResourceManager) {
-  sb.append("[10]");
+  sb.append("[12]");
 } else {
   sb.append("[9]");
 }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-06-03 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571925#comment-14571925
 ] 

Naganarasimha G R commented on YARN-3044:
-

Hi [~zjshen],
bq. It's not caused by this patch, but I found a race condition in publishing the app 
finish event
I got stuck for quite a while with YARN-3045 on a similar issue on the NM side, and 
wanted to propose the same but was not sure whether the approach was fine. 
I will take care of this on the RM side as you mentioned, but shall I adopt the 
same approach on the NM side? 

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
 YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
 YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
 YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
 YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571658#comment-14571658
 ] 

Wangda Tan commented on YARN-3733:
--

Patch LGTM generally, will commit the patch once [~sunilg] +1.

 DominantRC#compare() does not work as expected if cluster resource is empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent task 
 5. Switch RM
 Actual
 =
 For 12 Jobs AM gets allocated and all 12 starts running
 No other Yarn child is initiated , *all 12 Jobs in Running state for ever*
 Expected
 ===
 Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation

2015-06-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571660#comment-14571660
 ] 

Sangjin Lee commented on YARN-2928:
---

Thanks [~vrushalic] for the summary!

 YARN Timeline Service: Next generation
 --

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
 v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
 TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571811#comment-14571811
 ] 

Xuan Gong commented on YARN-1462:
-

Created a new patch with a compatible newInstance change.

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3766:

Attachment: ATSWebPageBreaks.png

Uploaded a screen shot

 ATS Web UI breaks because of YARN-3467
 --

 Key: YARN-3766
 URL: https://issues.apache.org/jira/browse/YARN-3766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Affects Versions: 2.8.0
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: ATSWebPageBreaks.png


 The ATS web UI breaks because of the following changes made in YARN-3467.
 {code}
 +++ 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
 @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
   .append(", 'mRender': renderHadoopDate }")
   .append("\n, {'sType':'numeric', bSearchable:false, 'aTargets': ");
 if (isFairSchedulerPage) {
-  sb.append("[11]");
+  sb.append("[13]");
 } else if (isResourceManager) {
-  sb.append("[10]");
+  sb.append("[12]");
 } else {
   sb.append("[9]");
  }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571879#comment-14571879
 ] 

Xuan Gong commented on YARN-3749:
-

+1. LGTM. Will commit

 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when the RM failed over, even though I had initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
                                            YarnConfiguration.RM_ADDRESS,
                                            YarnConfiguration.DEFAULT_RM_ADDRESS,
                                            server.getListenerAddress());
 {code}
 Since we use the same Configuration instance for rm1 and rm2, and both RMs are 
 initialized before either is started, yarn.resourcemanager.ha.id is switched to 
 rm2 during the init of rm2 and is still rm2 when rm1 starts.
 So I think it is safer to make a copy of the configuration when initializing 
 each RM.
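 
 A minimal sketch of the proposed fix (the init loop and method names here are 
 illustrative, not the exact MiniYARNCluster patch):
 {code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmInitSketch {
  void initResourceManagers(Configuration conf, int numRMs) {
    for (int i = 0; i < numRMs; i++) {
      // Give each RM its own copy so that updateConnectAddr() and the per-RM
      // yarn.resourcemanager.ha.id setting cannot leak from one RM to the other.
      Configuration rmConf = new Configuration(conf);
      rmConf.set(YarnConfiguration.RM_HA_ID, "rm" + (i + 1));
      // initResourceManager(i, rmConf);  // hypothetical per-RM init hook
    }
  }
}
 {code}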



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-03 Thread Lavkesh Lahngir (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lavkesh Lahngir updated YARN-3745:
--
Attachment: YARN-3745.1.patch

Added a test. 
With the previous implementation the test was failing with NoSuchMethodException:
{code}
testDeserializeWithDefaultConstructor(org.apache.hadoop.yarn.api.records.impl.pb.TestSerializedExceptionPBImpl)
  Time elapsed: 0.129 sec   ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.lang.NoSuchMethodException: 
java.nio.channels.ClosedChannelException.<init>(java.lang.String)
at java.lang.Class.getConstructor0(Class.java:2892)
at java.lang.Class.getConstructor(Class.java:1723)
at 
org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:181)
at 
org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at 
org.apache.hadoop.yarn.api.records.impl.pb.TestSerializedExceptionPBImpl.testDeserializeWithDefaultConstructor(TestSerializedExceptionPBImpl.java:72)
{code}
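
A minimal sketch of the fallback this patch is about (illustrative only, not the
exact SerializedExceptionPBImpl change): if the (String) constructor is missing,
fall back to the default constructor and still attach the cause.
{code}
import java.lang.reflect.Constructor;

public class ExceptionInstantiation {
  static Throwable instantiate(Class<? extends Throwable> cls, String message,
      Throwable cause) throws Exception {
    Throwable t;
    try {
      Constructor<? extends Throwable> cn = cls.getConstructor(String.class);
      t = cn.newInstance(message);
    } catch (NoSuchMethodException e) {
      // e.g. ClosedChannelException has no (String) constructor: fall back to
      // the default constructor so the inner exception can still be propagated.
      Constructor<? extends Throwable> cn = cls.getConstructor();
      t = cn.newInstance();
    }
    t.initCause(cause);
    return t;
  }
}
{code}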

 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.1.patch, YARN-3745.patch


 While deserialising a SerializedException it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter it throws 
 NoSuchMethodException, 
 for example for the ClosedChannelException class.  
 We should also try to instantiate the exception with the default constructor so 
 that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness

2015-06-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570664#comment-14570664
 ] 

Sunil G commented on YARN-3510:
---

Hi [~cwelch],
Thanks for taking this optimization up. I have a few doubts here.

Here an evenly distributed preemption policy across applications is tried. But 
each application internally has containers at different priorities, and the 
least-priority container is selected first from an application for preemption.

Now consider a scenario where we have 2 applications (assuming MapReduce). 
{noformat}
App1 has 10 containers:Priority 10, 5 containers:Priority 20   Old timestamp
App2 has 10 containers:Priority 10, 2 containers:Priority 20   New timestamp
{noformat}
As per the new implementation, after 2 rounds, some containers of priority 10 
(maps) may get preempted if I am not wrong. Is this intentional, given that 
killing maps is costlier?

I feel we can group containers by priority across all applications, and then do 
the preemption at each container priority level. It may be better, but we may 
have more priority buckets. Please share your thoughts.
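
To make the grouping idea concrete, a small illustrative sketch (plain Java, not
scheduler code) of bucketing candidate containers by priority across all
applications before a preemption round:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PriorityBuckets {
  // Illustrative only: group preemption candidates (container id -> priority)
  // from all applications by priority, so a round can drain one priority bucket
  // across apps before touching the next. Which end of the map counts as "least
  // important" depends on the scheduler's priority convention.
  static Map<Integer, List<String>> bucketByPriority(
      Map<String, Integer> containerPriorities) {
    Map<Integer, List<String>> buckets = new TreeMap<Integer, List<String>>();
    for (Map.Entry<String, Integer> e : containerPriorities.entrySet()) {
      List<String> bucket = buckets.get(e.getValue());
      if (bucket == null) {
        bucket = new ArrayList<String>();
        buckets.put(e.getValue(), bucket);
      }
      bucket.add(e.getKey());
    }
    return buckets;
  }
}
{code}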



 Create an extension of ProportionalCapacityPreemptionPolicy which preempts a 
 number of containers from each application in a way which respects fairness
 

 Key: YARN-3510
 URL: https://issues.apache.org/jira/browse/YARN-3510
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: yarn
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3510.2.patch, YARN-3510.3.patch, YARN-3510.5.patch, 
 YARN-3510.6.patch


 The ProportionalCapacityPreemptionPolicy preempts as many containers from 
 applications as it can during it's preemption run.  For fifo this makes 
 sense, as it is prempting in reverse order  therefore maintaining the 
 primacy of the oldest.  For fair ordering this does not have the desired 
 effect - instead, it should preempt a number of containers from each 
 application which maintains a fair balance /close to a fair balance between 
 them



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3432) Cluster metrics have wrong Total Memory when there is reserved memory on CS

2015-06-03 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570680#comment-14570680
 ] 

Brahma Reddy Battula commented on YARN-3432:


Kindly review the attached patch!!!

 Cluster metrics have wrong Total Memory when there is reserved memory on CS
 ---

 Key: YARN-3432
 URL: https://issues.apache.org/jira/browse/YARN-3432
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Brahma Reddy Battula
 Attachments: YARN-3432-002.patch, YARN-3432.patch


 I noticed that when reservations happen while using the Capacity Scheduler, 
 the UI and web services report the wrong total memory.
 For example, I have 300GB of total memory in my cluster. I allocate 50GB 
 and reserve 10GB. The cluster metrics then report the total memory as 290GB.
 This was broken by https://issues.apache.org/jira/browse/YARN-656, so perhaps 
 there is a difference between the fair scheduler and the capacity scheduler.
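 
 A quick arithmetic sketch of the symptom, using the numbers from the example 
 above (the variable names are illustrative, not the actual metric fields):
 {code}
public class ClusterMetricsSketch {
  public static void main(String[] args) {
    long totalGB = 300, allocatedGB = 50, reservedGB = 10;
    // What the scheduler sees as available once 10 GB is reserved:
    long availableGB = totalGB - allocatedGB - reservedGB;          // 240
    // A "total" that forgets to add the reservation back:
    long reportedTotalGB = availableGB + allocatedGB;               // 290 (wrong)
    // The expected total keeps the reserved memory in the sum:
    long expectedTotalGB = availableGB + allocatedGB + reservedGB;  // 300
    System.out.println(reportedTotalGB + " GB reported vs " + expectedTotalGB
        + " GB expected");
  }
}
 {code}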



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-06-03 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570652#comment-14570652
 ] 

Lavkesh Lahngir commented on YARN-3591:
---

Thanks [~sunilg] and [~zxu] for the comments and review. I did it slightly 
differently: I added newRepairedDirs and newErrorDirs to DirectoryCollection. 
 
In this version checkLocalizedResources(dirsTocheck) takes the list of good 
dirs.

{code:title=DirectoryCollection.java|borderStyle=solid}
+  private List<String> newErrorDirs;
+  private List<String> newRepariedDirs;
 
   private int numFailures;
   
@@ -159,6 +161,8 @@ public DirectoryCollection(String[] dirs,
     localDirs = new CopyOnWriteArrayList<String>(dirs);
     errorDirs = new CopyOnWriteArrayList<String>();
     fullDirs = new CopyOnWriteArrayList<String>();
+    newErrorDirs = new CopyOnWriteArrayList<String>();
+    newRepariedDirs = new CopyOnWriteArrayList<String>();
 
 
@@ -213,6 +217,20 @@ synchronized int getNumFailures() {
   }
 
   /**
+   * @return Recently discovered error dirs
+   */
+  synchronized List<String> getNewErrorDirs() {
+    return newErrorDirs;
+  }
+
+  /**
+   * @return Recently discovered repaired dirs
+   */
+  synchronized List<String> getNewRepairedDirs() {
+    return newRepariedDirs;
+  }
+

@@ -259,6 +277,8 @@ synchronized boolean checkDirs() {
     localDirs.clear();
     errorDirs.clear();
     fullDirs.clear();
+    newRepariedDirs.clear();
+    newErrorDirs.clear();
 
     for (Map.Entry<String, DiskErrorInformation> entry : dirsFailedCheck
         .entrySet()) {
@@ -292,6 +312,11 @@ synchronized boolean checkDirs() {
     }
     Set<String> postCheckFullDirs = new HashSet<String>(fullDirs);
     Set<String> postCheckOtherDirs = new HashSet<String>(errorDirs);
+    for (String dir : preCheckGoodDirs) {
+      if (postCheckOtherDirs.contains(dir)) {
+        newErrorDirs.add(dir);
+      }
+    }
     for (String dir : preCheckFullDirs) {
       if (postCheckOtherDirs.contains(dir)) {
         LOG.warn("Directory " + dir + " error "
@@ -304,6 +329,9 @@ synchronized boolean checkDirs() {
         LOG.warn("Directory " + dir + " error "
             + dirsFailedCheck.get(dir).message);
       }
+      if (localDirs.contains(dir) || postCheckFullDirs.contains(dir)) {
+        newRepariedDirs.add(dir);
+      }
     }
{code}

{code:title=LocalDirsHandlerService.java|borderStyle=solid}
+  /**
+   * @return Recently added error dirs
+   */
+  public List<String> getDiskNewErrorDirs() {
+    return localDirs.getNewErrorDirs();
+  }
+
+  /**
+   * @return Recently added repaired dirs
+   */
+  public List<String> getDiskNewRepairedDirs() {
+    return localDirs.getNewRepairedDirs();
+  }
{code}

{code:title=ResourceLocalizationService.java|borderStyle=solid}
   @Override
   public void onDirsChanged() {
     checkAndInitializeLocalDirs();
+    List<String> dirsTocheck =
+        new ArrayList<String>(dirsHandler.getLocalDirs());
+    dirsTocheck.addAll(dirsHandler.getDiskFullLocalDirs());
+    // checks if resources are present in the dirsTocheck
+    publicRsrc.checkLocalizedResources(dirsTocheck);
     for (LocalResourcesTracker tracker : privateRsrc.values()) {
+      tracker.checkLocalizedResources(dirsTocheck);
+    }
+    List<String> newRepairedDirs = dirsHandler.getDiskNewRepairedDirs();
+    // Delete any resources found in the newly repaired dirs.
+    for (String dir : newRepairedDirs) {
+      cleanUpLocalDir(lfs, delService, dir);
     }
+    // Add code here to add error dirs to the state store.
   }
 };
{code}

{code:title=DirectoryCollection.java|borderStyle=solid}
  synchronized List<String> getErrorDirs() {
    return Collections.unmodifiableList(errorDirs);
  }
{code}
We can use getErrorDirs() and keep it in the NM state store as suggested, and upon 
start we can do a cleanUpLocalDir on the error dirs.
 

 Resource Localisation on a bad disk causes subsequent containers failure 
 -

 Key: YARN-3591
 URL: https://issues.apache.org/jira/browse/YARN-3591
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
 YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch


 It happens when a resource is localised on a disk, and that disk goes bad after 
 localisation. The NM keeps the paths of localised resources in memory. At the 
 time of a resource request, isResourcePresent(rsrc) will be called, which calls 
 file.exists() on the localised path.
 In some cases when the disk has gone bad, inodes are still cached and 
 file.exists() returns true. But at the time of reading, the file will not open.
 Note: file.exists() actually calls stat64 natively which returns true because 
 it was 

[jira] [Updated] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-03 Thread Lavkesh Lahngir (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lavkesh Lahngir updated YARN-3745:
--
Attachment: YARN-3745.patch

 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.patch


 While deserialising a SerializedException it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter it throws 
 NoSuchMethodException, 
 for example for the ClosedChannelException class.  
 We should also try to instantiate the exception with the default constructor so 
 that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570706#comment-14570706
 ] 

Hadoop QA commented on YARN-3745:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m  3s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 36s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 54s | The applied patch generated  1 
new checkstyle issues (total was 8, now 9). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 33s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| | |  40m  9s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737155/YARN-3745.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c59e745 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8171/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8171/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8171/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8171/console |


This message was automatically generated.

 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.patch


 While deserialising a SerializedException it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter it throws 
 NoSuchMethodException, 
 for example for the ClosedChannelException class.  
 We should also try to instantiate the exception with the default constructor so 
 that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3432) Cluster metrics have wrong Total Memory when there is reserved memory on CS

2015-06-03 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3432:
---
Attachment: YARN-3432-002.patch

 Cluster metrics have wrong Total Memory when there is reserved memory on CS
 ---

 Key: YARN-3432
 URL: https://issues.apache.org/jira/browse/YARN-3432
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Brahma Reddy Battula
 Attachments: YARN-3432-002.patch, YARN-3432.patch


 I noticed that when reservations happen while using the Capacity Scheduler, 
 the UI and web services report the wrong total memory.
 For example, I have 300GB of total memory in my cluster. I allocate 50GB 
 and reserve 10GB. The cluster metrics then report the total memory as 290GB.
 This was broken by https://issues.apache.org/jira/browse/YARN-656, so perhaps 
 there is a difference between the fair scheduler and the capacity scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570869#comment-14570869
 ] 

Hadoop QA commented on YARN-3745:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m  8s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 34s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 52s | The applied patch generated  1 
new checkstyle issues (total was 8, now 9). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 33s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 55s | Tests passed in 
hadoop-yarn-common. |
| | |  40m  7s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737170/YARN-3745.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c59e745 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8173/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8173/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8173/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8173/console |


This message was automatically generated.

 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.1.patch, YARN-3745.patch


 While deserialising a SerializedException it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter it throws 
 NoSuchMethodException, 
 for example for the ClosedChannelException class.  
 We should also try to instantiate the exception with the default constructor so 
 that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server

2015-06-03 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570921#comment-14570921
 ] 

Chang Li commented on YARN-2556:


[~zjshen], [~djp] could you please help review the latest patch? Thanks!

 Tool to measure the performance of the timeline server
 --

 Key: YARN-2556
 URL: https://issues.apache.org/jira/browse/YARN-2556
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Chang Li
  Labels: BB2015-05-TBR
 Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, 
 YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.11.patch, 
 YARN-2556.12.patch, YARN-2556.13.patch, YARN-2556.13.whitespacefix.patch, 
 YARN-2556.14.patch, YARN-2556.14.whitespacefix.patch, YARN-2556.2.patch, 
 YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, 
 YARN-2556.7.patch, YARN-2556.8.patch, YARN-2556.9.patch, YARN-2556.patch, 
 yarn2556.patch, yarn2556.patch, yarn2556_wip.patch


 We need to be able to understand the capacity model for the timeline server 
 to give users the tools they need to deploy a timeline server with the 
 correct capacity.
 I propose we create a mapreduce job that can measure timeline server write 
 and read performance. Transactions per second, I/O for both read and write 
 would be a good start.
 This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570950#comment-14570950
 ] 

Hadoop QA commented on YARN-3733:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 20s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 34s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 40s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  0s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  50m 12s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  93m 53s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737171/0002-YARN-3733.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c59e745 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8174/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8174/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8174/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8174/console |


This message was automatically generated.

 DominantRC#compare() does not work as expected if cluster resource is empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent task 
 5. Switch RM
 Actual
 =
 For 12 Jobs AM gets allocated and all 12 starts running
 No other Yarn child is initiated , *all 12 Jobs in Running state for ever*
 Expected
 ===
 Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570937#comment-14570937
 ] 

Sunil G commented on YARN-3733:
---

Thank you [~rohithsharma] for the detailed information and patch.

1. Could we add a test case in TestCapacityScheduler where only memory (or only 
vcores) is higher, e.g.:
{code}
Resource amResource2 =
    Resource.newInstance(amResourceLimit.getMemory() + 1,
        amResourceLimit.getVirtualCores());
{code}

2. In TestCapacityScheduler#verifyAMLimitForLeafQueue, while submitting the 
second app, you could change the app name to app-2.

 DominantRC#compare() does not work as expected if cluster resource is empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent task 
 5. Switch RM
 Actual
 =
 For 12 Jobs AM gets allocated and all 12 starts running
 No other Yarn child is initiated , *all 12 Jobs in Running state for ever*
 Expected
 ===
 Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-06-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570941#comment-14570941
 ] 

Varun Saxena commented on YARN-3051:


bq. I noticed there is a readerLimit for read operations, which works for ATS 
v1. I'm wondering if it's fine to use -1 to indicate there's no such limit? Not 
sure if this feature is already there.
Do you mean a limit on the number of records returned?

bq. The fromId parameter, we may need to be careful on the concept of id. In 
timeline v2 we need context information to identify each entity, such as 
cluster, user, flow, run. When querying with fromId, what kind of assumptions 
should we make on the id here?
{{fromId}} is primarily there for backward compatibility with ATS v1. It is used 
in the context of entity IDs only. This will be documented in the javadoc. I have 
not changed the names of the query params (where these parameters are supported 
in ATS v1).
Whether we need to support the same REST endpoints as ATS v1 for the sake of 
backward compatibility, or whether we can break backward compatibility (when 
there is no use case), is something I wanted to discuss. I commented on 
YARN-3411 as well regarding one such param.

bq. In some APIs, we're requiring clusterID and appID, but not having flow/run 
information... Maybe we can have flow and run information as optional 
parameters so that we can avoid full table scans when the caller does have flow 
and run information?
I agree with your suggestion. I was also thinking about including them in the 
next patch as query params. This will make the parameter list even longer :)

bq. The current APIs require a pretty long list of parameters. For most of the 
use cases, I think we can abstract something much simpler.
These parameters are fetched directly from the query params coming in on the REST 
API and are passed down to the storage layer (after minor verification). Yes, we 
can decide on a few of the key parameters (those which correspond to the row 
key/primary key) and have separate reader API methods for them.
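
To make the discussion concrete, a hypothetical shape of such a reader call
(names and types here are purely illustrative, not a proposed interface):
{code}
import java.util.EnumSet;
import java.util.Set;

interface TimelineReaderSketch {
  enum Field { INFO, CONFIGS, METRICS, EVENTS, RELATES_TO }

  // cluster/app/entityType are required context; flowId/flowRunId are optional
  // hints (null if unknown) that let the storage layer avoid a full table scan;
  // limit = -1 could mean "no limit"; fromId is kept for ATS v1 compatibility
  // and is scoped to entity ids only.
  Set<String> getEntityIds(String clusterId, String appId, String entityType,
      String flowId, Long flowRunId, long limit, String fromId,
      EnumSet<Field> fieldsToRetrieve);
}
{code}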

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051-YARN-2928.003.patch, 
 YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, 
 YARN-3051.wip.patch, YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-03 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3733:
-
Attachment: 0002-YARN-3733.patch

Thanks [~sunilg] and [~leftnoteasy] for sharing your thoughts..

I modified the logic a bit, and the order of the if checks, so that it handles 
all the possible combinations of inputs in the table below. The problem was with 
the 5th and 7th inputs: the validation was returning 1 where 0 was expected for 
the 5th combination, i.e. the flow never reached the 2nd check since the 1st 
step ORs memory and cpu.
||Sl.no||cr||lhs||rhs||Output||
|1|0,0| 1,1 | 1,1 | 0 |
|2|0,0| 1,1 | 0,0 | 1 |
|3|0,0| 0,0 | 1,1 | -1 |
|4|0,0| 0,1 | 1,0 |  0 |
|5|0,0| 1,0 | 0,1 |  0 |
|6|0,0| 1,1 | 1,0 | 1  |
|7|0,0| 1,0 | 1,1 | -1  |

The updated patch has the following changes: 
# Changed the logic for comparing lhs and rhs resources when clusterResource is 
empty, as suggested.
# Added a test for AMLimit usage.
# Added a test for all of the above combinations of inputs.

Kindly review the patch
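
For reference, a minimal sketch of the zero-cluster-resource handling described
above (illustrative only, not the exact patch); it reproduces the outputs in the
table, including rows 5 and 7:
{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class EmptyClusterCompare {
  // Fallback ordering when the cluster resource is (0,0) and dominant shares
  // are undefined: lhs wins only if it is >= rhs in both dimensions (and not
  // equal), loses only in the mirror case, and mixed cases compare as equal.
  static int compareWhenClusterEmpty(Resource lhs, Resource rhs) {
    boolean lhsGE = lhs.getMemory() >= rhs.getMemory()
        && lhs.getVirtualCores() >= rhs.getVirtualCores();
    boolean rhsGE = rhs.getMemory() >= lhs.getMemory()
        && rhs.getVirtualCores() >= lhs.getVirtualCores();
    if (lhsGE && rhsGE) {
      return 0;   // identical (row 1)
    } else if (lhsGE) {
      return 1;   // lhs dominates both dimensions (rows 2 and 6)
    } else if (rhsGE) {
      return -1;  // rhs dominates both dimensions (rows 3 and 7)
    }
    return 0;     // mixed, e.g. rows 4 and 5
  }
}
{code}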

 DominantRC#compare() does not work as expected if cluster resource is empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent task 
 5. Switch RM
 Actual
 =
 For 12 Jobs AM gets allocated and all 12 starts running
 No other Yarn child is initiated , *all 12 Jobs in Running state for ever*
 Expected
 ===
 Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3754) Race condition when the NodeManager is shutting down and container is launched

2015-06-03 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3754:
--
Attachment: NM.log

 Race condition when the NodeManager is shutting down and container is launched
 --

 Key: YARN-3754
 URL: https://issues.apache.org/jira/browse/YARN-3754
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Sunil G
Priority: Critical
 Attachments: NM.log


 Container is launched and returned to ContainerImpl
 NodeManager closed the DB connection which resulting in 
 {{org.iq80.leveldb.DBException: Closed}}. 
 *Attaching the exception trace*
 {code}
 2015-05-30 02:11:49,122 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
  Unable to update state store diagnostics for 
 container_e310_1432817693365_3338_01_02
 java.io.IOException: org.iq80.leveldb.DBException: Closed
 at 
 org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:261)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1109)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1101)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1129)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:246)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: org.iq80.leveldb.DBException: Closed
 at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:123)
 at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:106)
 at 
 org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:259)
 ... 15 more
 {code}
 we can add a check whether DB is closed while we move container from ACQUIRED 
 state.
 As per the discussion in YARN-3585 have add the same



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3754) Race condition when the NodeManager is shutting down and container is launched

2015-06-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570790#comment-14570790
 ] 

Sunil G commented on YARN-3754:
---

I have got the logs from [~bibinchundatt] offline.

{noformat}
2015-05-30 01:11:16,179 WARN 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception 
from container-launch with container ID: 
container_e313_1432908361253_4506_01_01 and exit code: 0
java.io.IOException: java.lang.InterruptedException
...
...
2015-05-30 01:11:16,179 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Unable to update diagnostics in state store for 
container_e313_1432908361253_4506_01_01
java.io.IOException: org.iq80.leveldb.DBException: Closed
at 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostic
{noformat}

When the NM is shutting down, ContainerLaunch is also interrupted. While handling 
this InterruptedException, the NM tries to update the container diagnostics, but 
the state store has already been closed from the main thread, hence the 
DBException: Closed.

This scenario is already handled in YARN-3641 by [~djp]. [~bibinchundatt], could 
you please pick up that patch and re-verify, and then we can close this ticket 
as a duplicate. Attaching the NM logs too.
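
A minimal sketch of the kind of guard being discussed (the flag and method are
hypothetical; the real change would sit in the NM state store / ContainerImpl
path):
{code}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

public class StateStoreGuard {
  private final AtomicBoolean storeClosed = new AtomicBoolean(false);

  void close() {
    storeClosed.set(true);
    // ... close the leveldb handle here ...
  }

  void storeContainerDiagnostics(String containerId, String diagnostics)
      throws IOException {
    if (storeClosed.get()) {
      // NM is shutting down: skip the write instead of surfacing
      // "DBException: Closed" from the interrupted container launch.
      return;
    }
    // ... the real leveldb put(...) would go here ...
  }
}
{code}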


 Race condition when the NodeManager is shutting down and container is launched
 --

 Key: YARN-3754
 URL: https://issues.apache.org/jira/browse/YARN-3754
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Sunil G
Priority: Critical

 Container is launched and returned to ContainerImpl
 NodeManager closed the DB connection which resulting in 
 {{org.iq80.leveldb.DBException: Closed}}. 
 *Attaching the exception trace*
 {code}
 2015-05-30 02:11:49,122 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
  Unable to update state store diagnostics for 
 container_e310_1432817693365_3338_01_02
 java.io.IOException: org.iq80.leveldb.DBException: Closed
 at 
 org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:261)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1109)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1101)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1129)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:246)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: org.iq80.leveldb.DBException: Closed
 at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:123)
 at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:106)
 at 
 org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:259)
 ... 15 more
 {code}
 We can add a check for whether the DB is already closed when we move the 
 container out of the ACQUIRED state.
 As per the discussion in YARN-3585, the same check should be added here.
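
 A minimal sketch of the kind of guard being discussed, assuming a hypothetical 
 closed flag on the state store (the actual NMLeveldbStateStoreService may 
 expose this differently):
 {code}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified stand-in for the NM leveldb-backed state store (illustrative only).
class DiagnosticsStoreSketch {
  private final AtomicBoolean closed = new AtomicBoolean(false);

  void close() {
    // The real close() would also release the leveldb handle.
    closed.set(true);
  }

  void storeContainerDiagnostics(String containerId, String diagnostics)
      throws IOException {
    if (closed.get()) {
      // The store is shutting down; skip the write instead of surfacing
      // "DBException: Closed" from the interrupted container-launch path.
      return;
    }
    // ... leveldb put(containerId, diagnostics) would go here ...
  }
}
 {code}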



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570970#comment-14570970
 ] 

Sunil G commented on YARN-3745:
---

HI [~lavkesh]
Thanks for working on this patch.
In initExceptionWithConstructor, I feel *IllegalArgumentException* also has to 
be declared as thrown. It's missing now.

 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.1.patch, YARN-3745.patch


 While deserialising a SerializedException, it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter, it throws 
 NoSuchMethodException; ClosedChannelException is one such class.
 We should also try to instantiate the exception with the default constructor 
 so that the inner exception can be propagated.
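
 A sketch of the proposed fallback using plain reflection (this is the idea, 
 not the actual SerializedException code):
 {code}
class ExceptionInstantiatorSketch {
  // Prefer the (String) constructor; fall back to the default constructor for
  // classes such as ClosedChannelException that do not provide one.
  static Throwable instantiate(Class<? extends Throwable> cls,
      String message, Throwable cause) throws Exception {
    Throwable t;
    try {
      t = cls.getConstructor(String.class).newInstance(message);
    } catch (NoSuchMethodException e) {
      t = cls.getConstructor().newInstance();
    }
    if (cause != null) {
      t.initCause(cause);
    }
    return t;
  }
}
 {code}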



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-03 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570989#comment-14570989
 ] 

Lavkesh Lahngir commented on YARN-3745:
---

[~sunilg] : Uh.. IllegalArgumentException is not a checked exception, so it 
does not need to be declared as thrown.

 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.1.patch, YARN-3745.patch


 While deserialising a SerializedException, it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter, it throws 
 NoSuchMethodException; ClosedChannelException is one such class.
 We should also try to instantiate the exception with the default constructor 
 so that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571235#comment-14571235
 ] 

Sunil G commented on YARN-3745:
---

Yes. Missed it :) Thanks!

 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.1.patch, YARN-3745.patch


 While deserialising a SerializedException, it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter, it throws 
 NoSuchMethodException; ClosedChannelException is one such class.
 We should also try to instantiate the exception with the default constructor 
 so that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571275#comment-14571275
 ] 

Hadoop QA commented on YARN-41:
---

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m 30s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 12 new or modified test files. |
| {color:green}+1{color} | javac |   9m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  12m 20s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 30s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m 15s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 42s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   2m  6s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 42s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m  4s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 35s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   0m 39s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |   7m 23s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  48m  0s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-server-tests. |
| | | 115m 37s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
 |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735565/YARN-41-8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c59e745 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8175/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8175/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8175/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8175/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8175/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8175/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8175/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8175/console |


This message was automatically generated.

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
 YARN-41-8.patch, YARN-41.patch


 Instead of waiting for the NM expiry, RM should remove and handle the NM, 
 which is shutdown gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node

2015-06-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571318#comment-14571318
 ] 

Hadoop QA commented on YARN-3534:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 42s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:red}-1{color} | javac |   7m 34s | The applied patch generated  1  
additional warning messages. |
| {color:red}-1{color} | javadoc |   9m 39s | The applied patch generated  3  
additional warning messages. |
| {color:red}-1{color} | release audit |   0m 18s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 24s | The applied patch generated  9 
new checkstyle issues (total was 212, now 220). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 18s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 21s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   6m  4s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  50m 53s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737047/YARN-3534-10.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c59e745 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/diffJavacWarnings.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/diffJavadocWarnings.txt
 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8176/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8176/console |


This message was automatically generated.

 Collect memory/cpu usage on the node
 

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-10.patch, 
 YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, 
 YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, 
 YARN-3534-9.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the collection of memory/cpu 
 usage on the node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571335#comment-14571335
 ] 

Sunil G commented on YARN-3751:
---

Hi

 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that the used resource is not null. That 
 is not always true, as this information is not published to the timeline server.
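
 A minimal illustration of the kind of null guard such a fix needs (names here 
 are illustrative, not the actual AppInfo code):
 {code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class AppInfoNullGuardSketch {
  // Reports served from the timeline store may carry no usage information,
  // so fall back to an empty Resource instead of dereferencing null.
  static Resource safeUsedResource(Resource usedFromReport) {
    return usedFromReport == null ? Resources.none() : usedFromReport;
  }
}
 {code}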



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571272#comment-14571272
 ] 

Karthik Kambatla commented on YARN-3762:


The patch just adds read-write locks to address the race. I haven't added any 
tests since race conditions are hard to test; the changed code is exercised 
directly or indirectly by existing tests. Running jcarder should catch any 
deadlocks introduced by this change. A rough sketch of the locking pattern is 
below.
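
A minimal sketch of the locking pattern, with simplified types (the actual 
FSParentQueue fields and methods differ):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ParentQueueSketch {
  private final List<String> childQueues = new ArrayList<>();
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

  // Writers (adding/removing child queues) take the write lock.
  void addChildQueue(String name) {
    rwLock.writeLock().lock();
    try {
      childQueues.add(name);
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  // Readers (e.g. building queue/user ACL info) take the read lock, so
  // iteration can no longer race with a concurrent modification.
  List<String> snapshotChildQueues() {
    rwLock.readLock().lock();
    try {
      return new ArrayList<>(childQueues);
    } finally {
      rwLock.readLock().unlock();
    }
  }
}
{code}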

 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-3762-1.patch, yarn-3762-1.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing

2015-06-03 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-3453:
-

Assignee: Arun Suresh

 Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator 
 even in DRF mode causing thrashing
 

 Key: YARN-3453
 URL: https://issues.apache.org/jira/browse/YARN-3453
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Ashwin Shankar
Assignee: Arun Suresh

 There are two places in the preemption code flow where DefaultResourceCalculator 
 is used, even in DRF mode.
 This basically results in more resources getting preempted than needed, and 
 those extra preempted containers aren’t even getting to the “starved” queue, 
 since the scheduling logic is based on DRF's calculator.
 Following are the two places:
 1. {code:title=FSLeafQueue.java|borderStyle=solid}
 private boolean isStarved(Resource share)
 {code}
 A queue shouldn’t be marked as “starved” if the dominant resource usage
 is >= fair/min share.
 2. {code:title=FairScheduler.java|borderStyle=solid}
 protected Resource resToPreempt(FSLeafQueue sched, long curTime)
 {code}
 --
 One more thing that I believe needs to change in DRF mode is: during a 
 preemption round, if preempting a few containers results in satisfying the 
 needs of a resource type, then we should exit that preemption round, since the 
 containers that we just preempted should bring the dominant resource usage to 
 min/fair share.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-03 Thread Mohammad Shahid Khan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Shahid Khan updated YARN-3017:
---
Attachment: YARN-3017_2.patch

Fixed the review comment.

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02, whereas the 
 associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then when 
 using filtering tools to, say, grep events surrounding a specific attempt by 
 the numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing

2015-06-03 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-3453:
--
Attachment: YARN-3453.1.patch

[~peng.zhang], [~ashwinshankar77],
Thank you for reporting this and for the associated discussion.

I vote that we:
# fix the {{isStarved()}} method to use the correct Calculator
# fix the {{resToPreempt()}} method to use componentWiseMin for the target, 
but defer using the {{targetRatio}}, since it is probably an optimization and 
can be addressed in a future JIRA

I have attached a preliminary patch that does this; a rough sketch of the idea 
is below. Will upload one with test cases shortly.
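
A minimal sketch of the two changes being proposed, with simplified signatures 
and using the scheduler's ResourceCalculator (the actual FSLeafQueue/FairScheduler 
code will differ):

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

class DrfPreemptionSketch {
  // 1. Starvation check driven by the configured calculator (DRF here),
  //    instead of comparing only the memory component.
  static boolean isStarved(ResourceCalculator calc, Resource clusterResource,
      Resource usage, Resource share) {
    return !Resources.greaterThanOrEqual(calc, clusterResource, usage, share);
  }

  // 2. Preemption target as the component-wise minimum of the two shares,
  //    rather than the minimum of their memory values only.
  static Resource preemptionTarget(Resource fairShare, Resource minShare) {
    return Resources.componentwiseMin(fairShare, minShare);
  }

  public static void main(String[] args) {
    ResourceCalculator drf = new DominantResourceCalculator();
    Resource cluster = Resource.newInstance(8192, 8);
    Resource usage = Resource.newInstance(2048, 6);   // dominant share 6/8
    Resource share = Resource.newInstance(4096, 4);   // dominant share 4/8
    // Not starved: the dominant resource usage already exceeds the share.
    System.out.println(isStarved(drf, cluster, usage, share));  // false
  }
}
{code}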


 Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator 
 even in DRF mode causing thrashing
 

 Key: YARN-3453
 URL: https://issues.apache.org/jira/browse/YARN-3453
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Ashwin Shankar
Assignee: Arun Suresh
 Attachments: YARN-3453.1.patch


 There are two places in the preemption code flow where DefaultResourceCalculator 
 is used, even in DRF mode.
 This basically results in more resources getting preempted than needed, and 
 those extra preempted containers aren’t even getting to the “starved” queue, 
 since the scheduling logic is based on DRF's calculator.
 Following are the two places:
 1. {code:title=FSLeafQueue.java|borderStyle=solid}
 private boolean isStarved(Resource share)
 {code}
 A queue shouldn’t be marked as “starved” if the dominant resource usage
 is >= fair/min share.
 2. {code:title=FairScheduler.java|borderStyle=solid}
 protected Resource resToPreempt(FSLeafQueue sched, long curTime)
 {code}
 --
 One more thing that I believe needs to change in DRF mode is: during a 
 preemption round, if preempting a few containers results in satisfying the 
 needs of a resource type, then we should exit that preemption round, since the 
 containers that we just preempted should bring the dominant resource usage to 
 min/fair share.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571338#comment-14571338
 ] 

Sunil G commented on YARN-3751:
---

Hi [~zjshen]
I checked the patch and the tests are passing now. Please check if this 
is fine.

 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that the used resource is not null. That 
 is not always true, as this information is not published to the timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-03 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1462:
--
Target Version/s: 2.8.0  (was: 2.7.1)

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-03 Thread Chun Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571957#comment-14571957
 ] 

Chun Chen commented on YARN-3749:
-

Thanks for reviewing and committing the patch, [~xgong].

 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 during RM failover, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
     YarnConfiguration.RM_ADDRESS,
     YarnConfiguration.DEFAULT_RM_ADDRESS,
     server.getListenerAddress());
 {code}
 Since we use the same configuration instance for rm1 and rm2, and both RMs 
 are initialized before either is started, yarn.resourcemanager.ha.id is 
 changed to rm2 while initializing rm2 and is therefore still rm2 when rm1 is 
 started.
 So I think it is safe to make a copy of the configuration when initializing 
 each RM.
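
 A minimal sketch of the proposed fix, giving each RM its own configuration 
 copy when the mini-cluster is initialized (names are illustrative, not the 
 exact MiniYARNCluster code):
 {code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class RmConfigCopySketch {
  // Each RM gets its own copy, so setting yarn.resourcemanager.ha.id for rm2
  // can no longer leak into the configuration that rm1 is started with.
  static Configuration[] perRmConfigs(Configuration base, String... rmIds) {
    Configuration[] confs = new Configuration[rmIds.length];
    for (int i = 0; i < rmIds.length; i++) {
      Configuration copy = new YarnConfiguration(base);
      copy.set(YarnConfiguration.RM_HA_ID, rmIds[i]);
      confs[i] = copy;
    }
    return confs;
  }
}
 {code}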



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables

2015-06-03 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572000#comment-14572000
 ] 

Joep Rottinghuis commented on YARN-3706:


It turns out that TimelineWriterUtils.join has a bug where it returns an 
extra byte at the end of the return value if a null argument is passed. In 
attempting to fix this I realized we're having a hard time distinguishing nulls 
from spaces.

As I was discussing the fix with [~sjlee0] I realized that we currently have a 
mix of replace, cleanse, etc. Sometimes we replace, sometimes we strip. That is 
a bit of a mess. He wondered if we could simply URL-encode all columns.
Rather than doing that, I'm now taking the approach of URL-encoding the 
separators that are needed, and of ensuring that we set a limit when splitting 
the separators out again.

The only downside is that we still cannot differentiate between null values and 
empty strings, but in most cases where we need to encode qualifiers in columns 
this will not happen (entity IDs are never null). The other disadvantage is 
that if an identifier (rowkey, related entity key, etc.) contains URL-encoded 
strings, we might end up decoding them. I think that is an acceptable trade-off.

New patch with these fixes coming up.
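
A minimal sketch of the idea, using only the standard library (the real 
TimelineWriterUtils helpers are different): URL-encode each component so the 
separator can never occur inside a value, and split with an explicit limit so 
components are not silently dropped.

{code}
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class SeparatorSketch {
  private static final String SEP = "!";   // illustrative separator
  private static final String UTF8 = "UTF-8";

  // Encode each part before joining, so SEP cannot appear inside a value.
  static String join(String... parts) throws UnsupportedEncodingException {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) {
        sb.append(SEP);
      }
      // Nulls and empty strings collapse to the same encoding here, which is
      // the known limitation mentioned above.
      sb.append(URLEncoder.encode(parts[i] == null ? "" : parts[i], UTF8));
    }
    return sb.toString();
  }

  // Split with a limit so trailing empty components are preserved.
  static String[] split(String joined, int expectedParts)
      throws UnsupportedEncodingException {
    String[] raw = joined.split(SEP, expectedParts);
    String[] out = new String[raw.length];
    for (int i = 0; i < raw.length; i++) {
      out[i] = URLDecoder.decode(raw[i], UTF8);
    }
    return out;
  }
}
{code}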

 Generalize native HBase writer for additional tables
 

 Key: YARN-3706
 URL: https://issues.apache.org/jira/browse/YARN-3706
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Minor
 Attachments: YARN-3706-YARN-2928.001.patch, 
 YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, 
 YARN-3726-YARN-2928.004.patch


 When reviewing YARN-3411 we noticed that we could change the class hierarchy 
 a little in order to accommodate additional tables easily.
 In order to get ready for benchmark testing we left the original layout in 
 place, as performance would not be impacted by the code hierarchy.
 Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness

2015-06-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572010#comment-14572010
 ] 

Sunil G commented on YARN-3510:
---

Thank you [~leftnoteasy] for the pointer. I have mostly understood the overall 
idea. I will also take a look when [~cwelch] shares the patch. 


 Create an extension of ProportionalCapacityPreemptionPolicy which preempts a 
 number of containers from each application in a way which respects fairness
 

 Key: YARN-3510
 URL: https://issues.apache.org/jira/browse/YARN-3510
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: yarn
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3510.2.patch, YARN-3510.3.patch, YARN-3510.5.patch, 
 YARN-3510.6.patch


 The ProportionalCapacityPreemptionPolicy preempts as many containers from 
 applications as it can during its preemption run. For FIFO ordering this makes 
 sense, as it is preempting in reverse order and therefore maintaining the 
 primacy of the oldest applications. For fair ordering this does not have the 
 desired effect - instead, it should preempt a number of containers from each 
 application that maintains a fair balance, or close to a fair balance, between 
 them.
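
 A toy sketch of the fairness-respecting selection described above: repeatedly 
 take one container from whichever application currently holds the most, until 
 the preemption target is met. This is purely illustrative; the real policy 
 works on Resource amounts and live scheduler state.
 {code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FairPreemptionSketch {
  // appId -> number of running containers (toy model).
  static List<String> selectVictims(Map<String, Integer> usage, int toPreempt) {
    Map<String, Integer> remaining = new TreeMap<>(usage);
    List<String> victims = new ArrayList<>();
    for (int i = 0; i < toPreempt; i++) {
      String biggest = null;
      int max = 0;
      // Always take from the currently largest app, keeping apps balanced.
      for (Map.Entry<String, Integer> e : remaining.entrySet()) {
        if (e.getValue() > max) {
          max = e.getValue();
          biggest = e.getKey();
        }
      }
      if (biggest == null) {
        break;  // nothing left to preempt
      }
      victims.add(biggest);
      remaining.put(biggest, max - 1);
    }
    return victims;
  }

  public static void main(String[] args) {
    Map<String, Integer> usage = new TreeMap<>();
    usage.put("app1", 6);
    usage.put("app2", 2);
    // Preempting 4 containers takes all 4 from app1, leaving both apps at 2.
    System.out.println(selectVictims(usage, 4));
  }
}
 {code}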



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3752) TestRMFailover fails due to intermittent UnknownHostException

2015-06-03 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki resolved YARN-3752.

   Resolution: Duplicate
Fix Version/s: 2.8.0

I cannot reproduce the issue after YARN-3749 was committed. I'm closing this 
issue as a duplicate of YARN-3749.

 TestRMFailover fails due to intermittent UnknownHostException
 -

 Key: YARN-3752
 URL: https://issues.apache.org/jira/browse/YARN-3752
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Minor
 Fix For: 2.8.0


 Client fails to create connection due to UnknownHostException while client 
 retries to connect to next RM after failover in unit test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572116#comment-14572116
 ] 

Zhijie Shen commented on YARN-1462:
---

+1 for the last patch. The change in ApplicationReport should be backward 
compatible. [~sershe], would you please take a look?

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2578) NM does not failover timely if RM node network connection fails

2015-06-03 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572143#comment-14572143
 ] 

Masatake Iwasaki commented on YARN-2578:


Hi [~wilfreds], do you have any update on this? I saw the same issue in our 
cluster and the attached patch worked. I would like the fix to come in the 
next release. If you do not have enough time, I would like to take over. 
Otherwise we can commit the current patch and fix hadoop-common later. It still 
applies to trunk and branch-2.

 NM does not failover timely if RM node network connection fails
 ---

 Key: YARN-2578
 URL: https://issues.apache.org/jira/browse/YARN-2578
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg
 Attachments: YARN-2578.patch


 The NM does not fail over correctly when the network cable of the RM is 
 unplugged or the failure is simulated by a service network stop or a 
 firewall that drops all traffic on the node. The RM fails over to the standby 
 node when the failure is detected, as expected. The NM should then re-register 
 with the new active RM. This re-registration takes a long time (15 minutes or 
 more). Until then the cluster has no nodes for processing and applications 
 are stuck.
 Reproduction test case which can be used in any environment:
 - create a cluster with 3 nodes
 node 1: ZK, NN, JN, ZKFC, DN, RM, NM
 node 2: ZK, NN, JN, ZKFC, DN, RM, NM
 node 3: ZK, JN, DN, NM
 - start all services make sure they are in good health
 - kill the network connection of the RM that is active using one of the 
 network kills from above
 - observe the NN and RM failover
 - the DN's fail over to the new active NN
 - the NM does not recover for a long time
 - the logs show a long delay and traces show no change at all
 The stack traces of the NM all show the same set of threads. The main thread 
 which should be used in the re-registration is the Node Status Updater. This 
 thread is stuck in:
 {code}
 Node Status Updater prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in 
 Object.wait() [0x7f5a51fc1000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
   - locked 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at org.apache.hadoop.ipc.Client.call(Client.java:1362)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
 {code}
 The client connection which goes through the proxy can be traced back to the 
 ResourceTrackerPBClientImpl. The generated proxy does not time out and we 
 should be using a version which takes the RPC timeout (from the 
 configuration) as a parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-19) 4-layer topology (with NodeGroup layer) implementation of Container Assignment and Task Scheduling (for YARN)

2015-06-03 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572210#comment-14572210
 ] 

Junping Du commented on YARN-19:


Hi [~ashahab], thanks for your feedback on this! 
I remember that a long time ago the community decided to go the hierarchical 
way instead of the pluggable way, so the patch here may not be suitable to move 
forward (please check the YARN-18 design doc for details). I haven't had the 
bandwidth to follow up on the new design with a new implementation, given other 
priorities. However, if you are interested, please feel free to take over 
YARN-18 and YARN-19 and move them forward (better to conform with the new 
design), and I will try to help with the review.

 4-layer topology (with NodeGroup layer) implementation of Container 
 Assignment and Task Scheduling (for YARN)
 -

 Key: YARN-19
 URL: https://issues.apache.org/jira/browse/YARN-19
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Junping Du
Assignee: Junping Du
 Attachments: 
 HADOOP-8475-ContainerAssignmentTaskScheduling-withNodeGroup.patch, 
 MAPREDUCE-4310-v1.patch, MAPREDUCE-4310.patch, YARN-19-v2.patch, 
 YARN-19-v3-alpha.patch, YARN-19-v4.patch, YARN-19.patch


 Several classes in YARN’s container assignment and task scheduling algorithms 
 that relate to data locality were updated to give preference to running a 
 container on the same nodegroup. This section summarizes the changes in the 
 patch, which provides a new implementation to support a four-layer hierarchy.
 When the ApplicationMaster makes a resource allocation request to the 
 ResourceManager's scheduler, it adds the nodegroup to the list of attributes 
 in the ResourceRequest. The parameters of the resource request change from 
 (priority, (host, rack, *), memory, #containers) to 
 (priority, (host, nodegroup, rack, *), memory, #containers).
 After receiving the ResourceRequest, the RM scheduler assigns containers to 
 requests in the sequence data-local, nodegroup-local, rack-local and 
 off-switch. Then the ApplicationMaster schedules tasks on allocated containers 
 in the sequence data-local, nodegroup-local, rack-local and off-switch.
 In terms of code changes made to YARN task scheduling, the class 
 ContainerRequestEvent was updated so that applications' requests for 
 containers can include a nodegroup. In the RM schedulers, FifoScheduler and 
 CapacityScheduler were updated. For the FifoScheduler, the changes were in 
 the method assignContainers. For the CapacityScheduler, the method 
 assignContainersOnNode in the class LeafQueue was updated. In both cases a 
 new method, assignNodeGroupLocalContainers(), was added between the 
 data-local and rack-local assignment.
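
 A toy illustration of the four-level locality order described above (type 
 names are illustrative, not the actual scheduler classes):
 {code}
class NodeGroupLocalitySketch {
  enum Locality { DATA_LOCAL, NODEGROUP_LOCAL, RACK_LOCAL, OFF_SWITCH }

  // Assignment walks the locality levels in order of preference and stops at
  // the first level the node can satisfy, with nodegroup-local inserted
  // between data-local and rack-local.
  static Locality assign(boolean dataLocal, boolean nodeGroupLocal,
      boolean rackLocal) {
    if (dataLocal) {
      return Locality.DATA_LOCAL;
    }
    if (nodeGroupLocal) {
      return Locality.NODEGROUP_LOCAL;
    }
    if (rackLocal) {
      return Locality.RACK_LOCAL;
    }
    return Locality.OFF_SWITCH;
  }

  public static void main(String[] args) {
    // No data-local node available, but one in the same nodegroup is.
    System.out.println(assign(false, true, true));  // NODEGROUP_LOCAL
  }
}
 {code}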



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-03 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572085#comment-14572085
 ] 

Rohith commented on YARN-3733:
--

bq. only memory or vcores are more in TestCapacityScheduler.
All the combinations of inputs are verified in TestResourceCalculator. In 
TestCapacityScheduler, app submission happens only with memory in 
{{MockRM.submitApp}}, so the default vcore minimum allocation of 1 is taken by 
default. So just changing the memory to {{amResourceLimit.getMemory() + 2}} 
should be enough.

bq. TestCapacityScheduler#verifyAMLimitForLeafQueue, while submitting second 
app, you could change the app name to app-2.
Agree.

I will upload a patch soon.

 DominantRC#compare() does not work as expected if cluster resource is empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB after changing the scheduler 
 minimum size to 512 MB
 3. Configure the capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent tasks
 5. Switch RM
 Actual
 =
 AMs get allocated for 12 jobs and all 12 start running.
 No other YARN child task is initiated; *all 12 jobs stay in the Running state 
 forever*.
 Expected
 ===
 Only 6 should be running at a time, since the max AM resource is .5 (3072 MB)
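
 A simplified, standalone model of why a dominant-resource comparison 
 degenerates when the cluster resource is empty (e.g. right after an RM switch, 
 before any NM has registered). This is a toy model, not the actual 
 DominantResourceCalculator code:
 {code}
class EmptyClusterCompareSketch {
  // Toy dominant share: max of per-resource ratios against the cluster total.
  static float dominantShare(int mem, int vcores, int clusterMem,
      int clusterVcores) {
    return Math.max((float) mem / clusterMem, (float) vcores / clusterVcores);
  }

  public static void main(String[] args) {
    float used = dominantShare(2048, 2, 0, 0);   // Infinity
    float limit = dominantShare(3072, 3, 0, 0);  // Infinity
    // Both shares blow up the same way, so the comparison can no longer tell
    // "within the AM limit" apart from "over the AM limit".
    System.out.println(used > limit);   // false
    System.out.println(used == limit);  // true
  }
}
 {code}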



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

