[jira] [Assigned] (YARN-1494) YarnClient doesn't wrap renewDelegationToken/cancelDelegationToken of ApplicationClientProtocol

2014-11-24 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-1494:
--

Assignee: Varun Saxena

 YarnClient doesn't wrap renewDelegationToken/cancelDelegationToken of 
 ApplicationClientProtocol
 ---

 Key: YARN-1494
 URL: https://issues.apache.org/jira/browse/YARN-1494
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena

 YarnClient doesn't wrap renewDelegationToken/cancelDelegationToken of 
 ApplicationClientProtocol, though it does wrap getDelegationToken. After 
 YARN-1363, renewDelegationToken/cancelDelegationToken are going to be async, 
 so the procedure of canceling/renewing a DT is no longer straightforward. It's 
 better to wrap these two APIs as well.
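
For illustration, a minimal sketch of what such wrapper methods could look like 
(hedged: the method names are illustrative only, and it assumes the protocol 
records expose newInstance(Token) factories like the other YARN records do):

{code}
// Hedged sketch only -- illustrative wrappers, not the final API.
public long renewRMDelegationToken(org.apache.hadoop.yarn.api.records.Token dtoken)
    throws YarnException, IOException {
  RenewDelegationTokenRequest request =
      RenewDelegationTokenRequest.newInstance(dtoken);
  // rmClient is the ApplicationClientProtocol proxy held by the client impl
  return rmClient.renewDelegationToken(request).getNextExpirationTime();
}

public void cancelRMDelegationToken(org.apache.hadoop.yarn.api.records.Token dtoken)
    throws YarnException, IOException {
  rmClient.cancelDelegationToken(CancelDelegationTokenRequest.newInstance(dtoken));
}
{code}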



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2014-11-24 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated YARN-2190:

Attachment: YARN-2190.7.patch

Attaching a new patch that incorporates the latest changes in winutils.

 Provide a Windows container executor that can limit memory and CPU
 --

 Key: YARN-2190
 URL: https://issues.apache.org/jira/browse/YARN-2190
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Reporter: Chuan Liu
Assignee: Chuan Liu
 Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
 YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, 
 YARN-2190.6.patch, YARN-2190.7.patch


 The default YARN container executor on Windows does not currently set resource 
 limits on the containers. The memory limit is enforced by a separate 
 monitoring thread. The container implementation on Windows uses a Job Object 
 right now. The latest Windows API (Windows 8 or later) allows CPU and memory 
 limits on job objects. We want to create a Windows container executor that 
 sets the limits on job objects and thus provides resource enforcement at the 
 OS level.
 http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222776#comment-14222776
 ] 

Hadoop QA commented on YARN-2190:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683290/YARN-2190.7.patch
  against trunk revision 555fa2d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5915//console

This message is automatically generated.

 Provide a Windows container executor that can limit memory and CPU
 --

 Key: YARN-2190
 URL: https://issues.apache.org/jira/browse/YARN-2190
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Reporter: Chuan Liu
Assignee: Chuan Liu
 Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
 YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, 
 YARN-2190.6.patch, YARN-2190.7.patch


 The default YARN container executor on Windows does not currently set resource 
 limits on the containers. The memory limit is enforced by a separate 
 monitoring thread. The container implementation on Windows uses a Job Object 
 right now. The latest Windows API (Windows 8 or later) allows CPU and memory 
 limits on job objects. We want to create a Windows container executor that 
 sets the limits on job objects and thus provides resource enforcement at the 
 OS level.
 http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling

2014-11-24 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222823#comment-14222823
 ] 

Devaraj K commented on YARN-2877:
-

+1 for the idea [~sriramsrao], [~curino]. I just wanted to check the following, 
in case I am missing something from the above.

1. If an OPTIMISTIC container is assigned to the AM, and at the same time the 
RM assigns a CONSERVATIVE container for the same resource, which one will the 
NM consider and start?

2. If an OPTIMISTIC container has been assigned to the AM and started, and the 
NM then receives a start request for a CONSERVATIVE container while resources 
are not available, will the NM preempt the running OPTIMISTIC containers, or 
will it make the CONSERVATIVE request wait for the OPTIMISTIC containers to 
complete?

3. Is there any provision for the AM to request OPTIMISTIC containers on a 
remote NM as well?


 Extend YARN to support distributed scheduling
 -

 Key: YARN-2877
 URL: https://issues.apache.org/jira/browse/YARN-2877
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Sriram Rao

 This is an umbrella JIRA that proposes to extend YARN to support distributed 
 scheduling.  Briefly, some of the motivations for distributed scheduling are 
 the following:
 1. Improve cluster utilization by opportunistically executing tasks on 
 otherwise idle resources on individual machines.
 2. Reduce allocation latency for tasks where the scheduling time dominates 
 (i.e., task execution time is much less than the time required for obtaining 
 a container from the RM).
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster

2014-11-24 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned YARN-2892:
---

Assignee: Sevada Abraamyan  (was: Rohith)

I added [~sevada] as contributor and assigned this.

 Unable to get AMRMToken in unmanaged AM when using a secure cluster
 ---

 Key: YARN-2892
 URL: https://issues.apache.org/jira/browse/YARN-2892
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Sevada Abraamyan
Assignee: Sevada Abraamyan

 An AMRMToken is retrieved from the ApplicationReport by the YarnClient. 
 When the RM creates the ApplicationReport and sends it back to the client, it 
 makes a simple security check on whether it should include the AMRMToken in 
 the report (see createAndGetApplicationReport in RMAppImpl). This security 
 check verifies that the user who submitted the original application is the 
 same user who is requesting the ApplicationReport. If they are indeed the same 
 user then it includes the AMRMToken, otherwise it does not include it.
 The problem arises from the fact that when an application is submitted, the 
 RM saves the short username of the user who created the application (see 
 submitApplication in ClientRmService). Afterwards, when the ApplicationReport 
 is requested, the system tries to match the full username of the requester 
 against the previously stored short username. 
 In a secure cluster using Kerberos this check fails because the principal is 
 stripped from the username when we request a short username. So for example 
 the short username might be Foo whereas the full username is 
 f...@company.com
 Note: A very similar problem has been previously reported 
 ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232])
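
A small illustration of the mismatch (hedged: the principal and variable names 
below are made up for the example; the actual mapping depends on the configured 
hadoop.security.auth_to_local rules):

{code}
// At submit time the RM stores the *short* name of the submitter:
String storedUser = submitterUgi.getShortUserName();  // e.g. "foo"

// Later, the report requester is identified by the *full* principal:
String callerUser = callerUgi.getUserName();          // e.g. "foo@EXAMPLE.COM"

// The check in createAndGetApplicationReport effectively compares the two,
// so it fails on a secure cluster even though both belong to the same person:
boolean includeAMRMToken = storedUser.equals(callerUser);  // false
{code}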



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor

2014-11-24 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-2243:

Affects Version/s: 2.5.1

 Order of arguments for Preconditions.checkNotNull() is wrong in 
 SchedulerApplicationAttempt ctor
 

 Key: YARN-2243
 URL: https://issues.apache.org/jira/browse/YARN-2243
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Ted Yu
Assignee: Devaraj K
Priority: Minor
 Attachments: YARN-2243.patch, YARN-2243.patch


 {code}
 public SchedulerApplicationAttempt(ApplicationAttemptId applicationAttemptId, 
     String user, Queue queue, ActiveUsersManager activeUsersManager,
     RMContext rmContext) {
   Preconditions.checkNotNull("RMContext should not be null", rmContext);
 {code}
 Order of arguments is wrong for Preconditions.checkNotNull().
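
For reference, Guava's signature is checkNotNull(T reference, Object 
errorMessage): the value under test comes first and the error message second, 
so the corrected call would look like this:

{code}
// Correct argument order for com.google.common.base.Preconditions.checkNotNull:
Preconditions.checkNotNull(rmContext, "RMContext should not be null");
{code}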



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling

2014-11-24 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222872#comment-14222872
 ] 

Konstantinos Karanasos commented on YARN-2877:
--

[~wangda], regarding your question about how the AM will know which NM is more 
idle than others, this is related to YARN-2886. Each NM estimates its queue 
waiting time (based on the tasks running and those already waiting in the 
queue) and sends this waiting time to the RM through the heartbeat. Note that 
this is just an integer, so it is very lightweight. Then the RM can push this 
information to the rest of the NMs (again through the heartbeats). This way 
each node knows the queue status of the other NMs and can decide where to queue 
its queueable requests. However, since this information may not always be 
precise (due to bad estimation or stale info), we also introduce correction 
mechanisms for rebalancing the queues, if need be (YARN-2888).

Regarding your other questions:
# Such malicious AMs are one of the basic reasons we have introduced the 
Local RM. The AMs can make queueable requests only to the Local RM, which can 
throttle down aggressive AMs without even needing to reach the central RM. 
Clearly, as you mention, the central RM can also be involved to impose more 
elaborate fairness/capacity constraints, if those are needed.
# Promoting a queueable container to a guaranteed-start one is indeed 
interesting, and we have been investigating the cases in which it would bring 
benefits. One is the case you mention. Another is the case where a queueable 
container has been preempted/killed many times due to other guaranteed-start 
requests.


 Extend YARN to support distributed scheduling
 -

 Key: YARN-2877
 URL: https://issues.apache.org/jira/browse/YARN-2877
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Sriram Rao

 This is an umbrella JIRA that proposes to extend YARN to support distributed 
 scheduling.  Briefly, some of the motivations for distributed scheduling are 
 the following:
 1. Improve cluster utilization by opportunistically executing tasks on 
 otherwise idle resources on individual machines.
 2. Reduce allocation latency for tasks where the scheduling time dominates 
 (i.e., task execution time is much less than the time required for obtaining 
 a container from the RM).
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling

2014-11-24 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222876#comment-14222876
 ] 

Konstantinos Karanasos commented on YARN-2877:
--

[~devaraj], to answer your questions:
# Guaranteed-start containers always have priority over queueable ones. Thus, 
in the case you describe, if the NM cannot accommodate both requests, the 
guaranteed-start one will start first. 
# If the queueable one was started before the guaranteed-start arrived, it will 
be preempted/killed so that the guaranteed-start one can begin execution.
# Queueable requests are submitted by the AM to the Local RM running on the 
same node as the AM, but those requests can be queued at any NM of the cluster 
(at each moment we pick the most idle ones to queue those requests on).

 Extend YARN to support distributed scheduling
 -

 Key: YARN-2877
 URL: https://issues.apache.org/jira/browse/YARN-2877
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Sriram Rao

 This is an umbrella JIRA that proposes to extend YARN to support distributed 
 scheduling.  Briefly, some of the motivations for distributed scheduling are 
 the following:
 1. Improve cluster utilization by opportunistically executing tasks on 
 otherwise idle resources on individual machines.
 2. Reduce allocation latency for tasks where the scheduling time dominates 
 (i.e., task execution time is much less than the time required for obtaining 
 a container from the RM).
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling

2014-11-24 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222880#comment-14222880
 ] 

Konstantinos Karanasos commented on YARN-2877:
--

I used the wrong name in the above comment -- it was referring to 
[~devaraj.k]'s comment.

 Extend YARN to support distributed scheduling
 -

 Key: YARN-2877
 URL: https://issues.apache.org/jira/browse/YARN-2877
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Sriram Rao

 This is an umbrella JIRA that proposes to extend YARN to support distributed 
 scheduling.  Briefly, some of the motivations for distributed scheduling are 
 the following:
 1. Improve cluster utilization by opportunistically executing tasks on 
 otherwise idle resources on individual machines.
 2. Reduce allocation latency for tasks where the scheduling time dominates 
 (i.e., task execution time is much less than the time required for obtaining 
 a container from the RM).
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2894) Disallow binding of aclManagers while starting RMWebApp

2014-11-24 Thread Rohith (JIRA)
Rohith created YARN-2894:


 Summary: Disallow binding of aclManagers while starting RMWebApp
 Key: YARN-2894
 URL: https://issues.apache.org/jira/browse/YARN-2894
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith
Assignee: Rohith
 Fix For: 2.7.0


Binding the aclManagers to RMWebApp can cause problems if the RM is switched 
over: some validation checks may fail.
I think we should not bind the aclManagers for RMWebApp; instead we should get 
them from the RM instance.
In RMWebApp,
{code}
if (rm != null) {
  bind(ResourceManager.class).toInstance(rm);
  bind(RMContext.class).toInstance(rm.getRMContext());
  bind(ApplicationACLsManager.class).toInstance(
  rm.getApplicationACLsManager());
  bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager());
}
{code}

and in AppBlock#render the check below may fail (need to test and confirm)
{code}
if (callerUGI != null
    && !(this.aclsManager.checkAccess(callerUGI,
          ApplicationAccessType.VIEW_APP, app.getUser(), appID) ||
        this.queueACLsManager.checkAccess(callerUGI,
          QueueACL.ADMINISTER_QUEUE, app.getQueue()))) {
  puts("You (User " + remoteUser
      + ") are not authorized to view application " + appID);
  return;
}
{code}
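
A minimal sketch of the alternative suggested above (hedged: illustrative only, 
not the committed fix) -- keep only the ResourceManager binding in RMWebApp and 
resolve the ACL managers from it at render time, so a failover never leaves 
stale manager instances behind:

{code}
// Hedged sketch, not the committed fix: look the managers up from the live RM
// on each request instead of injecting them at webapp start-up.
ApplicationACLsManager aclsManager = rm.getApplicationACLsManager();
QueueACLsManager queueACLsManager = rm.getQueueACLsManager();

if (callerUGI != null
    && !(aclsManager.checkAccess(callerUGI,
          ApplicationAccessType.VIEW_APP, app.getUser(), appID)
        || queueACLsManager.checkAccess(callerUGI,
          QueueACL.ADMINISTER_QUEUE, app.getQueue()))) {
  puts("You (User " + remoteUser
      + ") are not authorized to view application " + appID);
  return;
}
{code}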



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223079#comment-14223079
 ] 

Jason Lowe commented on YARN-1984:
--

Thanks for picking this up, Varun.

getStartTimeLong can leak the runtime DBException and shouldn't.

Is there a reason to have deleteNextEntity throw DBException rather than 
IOException?  It would be cleaner for callers if deleteNextEntity handled this.

loadVersion can leak the runtime DBException.
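
For reference, a hedged sketch of the wrap-and-rethrow pattern being discussed 
(not the actual patch): org.iq80.leveldb.DBException is a RuntimeException, so 
a method that only declares IOException should catch it and rethrow it wrapped.

{code}
try {
  byte[] value = db.get(key);   // DB#get can throw the runtime DBException
  // ... use value ...
} catch (DBException e) {
  throw new IOException(e);     // surface it to callers as the declared IOException
}
{code}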



 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1963) Support priorities across applications within the same queue

2014-11-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1963:
--
Attachment: YARN Application Priorities Design_01.pdf

Updated design doc as per the comments from [~wangda]

 Support priorities across applications within the same queue 
 -

 Key: YARN-1963
 URL: https://issues.apache.org/jira/browse/YARN-1963
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Reporter: Arun C Murthy
Assignee: Sunil G
 Attachments: YARN Application Priorities Design.pdf, YARN Application 
 Priorities Design_01.pdf


 It will be very useful to support priorities among applications within the 
 same queue, particularly in production scenarios. It allows for finer-grained 
 controls without having to force admins to create a multitude of queues, plus 
 allows existing applications to continue using existing queues which are 
 usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2517) Implement TimelineClientAsync

2014-11-24 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223091#comment-14223091
 ] 

Mit Desai commented on YARN-2517:
-

I had similar concerns. Do we really need this at this point? And, as Hitesh 
pointed out, this may hinder the design in the future.

bq.  Also, is the timeline layer meant to eventually be reliable and always up? 
As far as I am aware, this is not going to happen in the near future.

 Implement TimelineClientAsync
 -

 Key: YARN-2517
 URL: https://issues.apache.org/jira/browse/YARN-2517
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2517.1.patch, YARN-2517.2.patch


 In some scenarios, we'd like to put timeline entities in another thread so as 
 not to block the current one.
 It's good to have a TimelineClientAsync like AMRMClientAsync and 
 NMClientAsync. It can buffer entities, put them in a separate thread, and 
 have callbacks to handle the responses.
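
As a rough illustration of the idea (a hedged sketch, not a proposed API; 
method and field names are made up), a single worker thread could drain a queue 
and call the blocking TimelineClient so callers never block:

{code}
// Hedged sketch only -- names are illustrative, not a proposed API.
private final TimelineClient client = TimelineClient.createTimelineClient();
private final BlockingQueue<TimelineEntity> queue =
    new LinkedBlockingQueue<TimelineEntity>();

void putEntityAsync(TimelineEntity entity) {
  queue.add(entity);                 // non-blocking for the caller
}

void drainLoop() throws Exception {  // runs on a dedicated worker thread
  while (true) {
    TimelineEntity next = queue.take();
    TimelinePutResponse response = client.putEntities(next);
    // hand "response" to a registered callback here
  }
}
{code}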



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2691) User level API support for priority label

2014-11-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223100#comment-14223100
 ] 

Sunil G commented on YARN-2691:
---

Hi [~rohithsharma],
This patch might need rebasing. Please rebase against trunk.

 User level API support for priority label
 -

 Key: YARN-2691
 URL: https://issues.apache.org/jira/browse/YARN-2691
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Sunil G
Assignee: Rohith
 Attachments: YARN-2691.patch


 Support for handling the Application-Priority label coming from the client in 
 ApplicationSubmissionContext.
 Common API support for users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2895) Integrate distributed scheduling with capacity scheduler

2014-11-24 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2895:


 Summary: Integrate distributed scheduling with capacity scheduler
 Key: YARN-2895
 URL: https://issues.apache.org/jira/browse/YARN-2895
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Wangda Tan
Assignee: Wangda Tan


There are some benefits to integrating the distributed scheduling mechanism 
(LocalRM) with the capacity scheduler:
- Resource usage of opportunistic containers can be tracked by the central RM 
and capacity can be enforced
- Opportunity to transfer an opportunistic container to a conservative container 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2014-11-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223127#comment-14223127
 ] 

Wangda Tan commented on YARN-1963:
--

[~sunilg],
I agree with your latest comment.
Will get back to you once I read the new design doc.

Thanks,

 Support priorities across applications within the same queue 
 -

 Key: YARN-1963
 URL: https://issues.apache.org/jira/browse/YARN-1963
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Reporter: Arun C Murthy
Assignee: Sunil G
 Attachments: YARN Application Priorities Design.pdf, YARN Application 
 Priorities Design_01.pdf


 It will be very useful to support priorities among applications within the 
 same queue, particularly in production scenarios. It allows for finer-grained 
 controls without having to force admins to create a multitude of queues, plus 
 allows existing applications to continue using existing queues which are 
 usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support

2014-11-24 Thread Sunil G (JIRA)
Sunil G created YARN-2896:
-

 Summary: Server side PB changes for Priority Label Manager and 
Admin CLI support
 Key: YARN-2896
 URL: https://issues.apache.org/jira/browse/YARN-2896
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sunil G
Assignee: Sunil G






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-11-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223138#comment-14223138
 ] 

Rohith commented on YARN-2025:
--

The impact of this is that both RMs are in standby and not able to recover at all.

 Possible NPE in schedulers#addApplicationAttempt()
 --

 Key: YARN-2025
 URL: https://issues.apache.org/jira/browse/YARN-2025
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2025.1.patch


 In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we 
 don't check whether {{application}} is null. This can cause an NPE in the 
 following sequence: addApplication() -> doneApplication() (e.g. 
 AppKilledTransition) -> addApplicationAttempt().
 {code}
 SchedulerApplication application =
 applications.get(applicationAttemptId.getApplicationId());
 String user = application.getUser();
 {code}
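
A minimal sketch of the kind of guard being proposed (hedged: illustrative, not 
necessarily the final patch):

{code}
SchedulerApplication application =
    applications.get(applicationAttemptId.getApplicationId());
if (application == null) {
  // The app has already been removed (e.g. killed before the attempt was added);
  // log and bail out instead of hitting the NPE on application.getUser().
  LOG.warn("Application " + applicationAttemptId.getApplicationId()
      + " not found, skipping addApplicationAttempt");
  return;
}
String user = application.getUser();
{code}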



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support

2014-11-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2896:
--
Description: 
Common changes:
 * PB support changes required for Admin APIs 
 * PB support for File System store (Priority Label Store)

 Server side PB changes for Priority Label Manager and Admin CLI support
 ---

 Key: YARN-2896
 URL: https://issues.apache.org/jira/browse/YARN-2896
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Reporter: Sunil G
Assignee: Sunil G

 Common changes:
  * PB support changes required for Admin APIs 
  * PB support for File System store (Priority Label Store)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support

2014-11-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2896:
--
Attachment: 0001-YARN-2896.patch

Uploading an initial patch for common PB support. This patch is needed for 
Priority Label manager. Tests also will be added soon.

 Server side PB changes for Priority Label Manager and Admin CLI support
 ---

 Key: YARN-2896
 URL: https://issues.apache.org/jira/browse/YARN-2896
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-2896.patch


 Common changes:
  * PB support changes required for Admin APIs 
  * PB support for File System store (Priority Label Store)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-11-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223136#comment-14223136
 ] 

Rohith commented on YARN-2025:
--

I ran into a weird scenario where I got the NPE in 
{{CapacityScheduler.addApplicationAttempt}} in a different manner. I was able 
to get some information from the logs, but not all of it, since the logs had 
rolled over.

The application's final state is FAILED but the ApplicationAttempt's final 
state is null. This looks very strange: how can the RMApp be FAILED but the 
RMAppAttempt be null?
The log extracted from the RM is below. Because of this scenario, application 
recovery throws an NPE, since RMAppAttempt tries to add the attempt to the 
scheduler but the application details were never added to the schedulers.
{noformat}
2014-11-24 23:53:32,608 | INFO  | main-EventThread | Recovering app: 
application_1416805604019_0038 with 1 attempts and final state = FAILED | 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:700)
2014-11-24 23:53:32,609 | INFO  | main-EventThread | Recovering attempt: 
appattempt_1416805604019_0038_01 with final state: null | 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:735)
{noformat}

The NPE trace is as follows.
{noformat}
2014-11-24 23:53:32,610 | ERROR | main-EventThread | Failed to load/recover 
state | 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:527)
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:607)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:941)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:97)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:963)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:931)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:698)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:803)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:95)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:825)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:808)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:681)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:335)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1148)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:523)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:927)
{noformat}

 Possible NPE in schedulers#addApplicationAttempt()
 --

 Key: YARN-2025
 URL: https://issues.apache.org/jira/browse/YARN-2025
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: 

[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223142#comment-14223142
 ] 

Hadoop QA commented on YARN-2025:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643596/YARN-2025.1.patch
  against trunk revision 555fa2d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5916//console

This message is automatically generated.

 Possible NPE in schedulers#addApplicationAttempt()
 --

 Key: YARN-2025
 URL: https://issues.apache.org/jira/browse/YARN-2025
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2025.1.patch


 In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we 
 don't check whether {{application}} is null. This can cause an NPE in the 
 following sequence: addApplication() -> doneApplication() (e.g. 
 AppKilledTransition) -> addApplicationAttempt().
 {code}
 SchedulerApplication application =
 applications.get(applicationAttemptId.getApplicationId());
 String user = application.getUser();
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage priority labels

2014-11-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2693:
--
Attachment: 0002-YARN-2693.patch

Updating patch after moving the PB changes to a common JIRA which handles only 
PB related changes.

Also moved the ApplicationPriority class to user api support JIRA. Tests will 
be added soon. Kindly check.

 Priority Label Manager in RM to manage priority labels
 --

 Key: YARN-2693
 URL: https://issues.apache.org/jira/browse/YARN-2693
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch


 The focus of this JIRA is to have a centralized service to handle priority 
 labels.
 Supported operations include:
 * Add/Delete a priority label to/from a specified queue
 * Manage the integer mapping associated with each priority label
 * Support managing the default priority label of a given queue
 * ACL support at queue level for priority labels
 * Expose an interface to the RM to validate priority labels
 Storage for these labels will be done in FileSystem and in Memory, similar to 
 NodeLabels:
 * FileSystem based: persistent across RM restart
 * Memory based: non-persistent across RM restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling

2014-11-24 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223197#comment-14223197
 ] 

Carlo Curino commented on YARN-2877:


I am going to echo [~kkaranasos] regarding malicious AMs. 

The key architectural change we propose is to introduce a proxy layer 
(YARN-2884). This gives us a place that is both distributed and part of the 
infrastructure (thus inherently trusted) in which to enact policies. 
This is where we host the LocalRM functionality of YARN-2885. With this in 
place we do not have to depend on trusting the AM for distributed decisions 
(the AM only exposes its need for containers of different types). 
On the contrary, we can enable a broad spectrum of infrastructure-level 
policies that can leverage explicit or implicit information to impose caps, or 
to balance (or skew) where the queueable containers should be allocated, etc.

As we have done in the past, we are working towards providing rather *general 
purpose mechanisms*, and propose a *first set of policies* (AM, LocalRM, NM 
start/stop of containers). Policies can be evolved/overridden 
easily depending on use cases, while mechanisms are a little harder to change. 
To this end, carefully discussing other use cases, such as the conversation 
around using queueable containers for Impala, is very important, 
as we might have missed hooks in the mechanisms that are necessary to support 
those scenarios.




 Extend YARN to support distributed scheduling
 -

 Key: YARN-2877
 URL: https://issues.apache.org/jira/browse/YARN-2877
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Sriram Rao

 This is an umbrella JIRA that proposes to extend YARN to support distributed 
 scheduling.  Briefly, some of the motivations for distributed scheduling are 
 the following:
 1. Improve cluster utilization by opportunistically executing tasks on 
 otherwise idle resources on individual machines.
 2. Reduce allocation latency for tasks where the scheduling time dominates 
 (i.e., task execution time is much less than the time required for obtaining 
 a container from the RM).
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2801) Documentation development for Node labels requirement

2014-11-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223203#comment-14223203
 ] 

Wangda Tan commented on YARN-2801:
--

Since the assignee field is empty and I got no response, I am taking this over.

 Documentation development for Node labels requirement
 

 Key: YARN-2801
 URL: https://issues.apache.org/jira/browse/YARN-2801
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Gururaj Shetty
Assignee: Wangda Tan

 Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2801) Documentation development for Node labels requirement

2014-11-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-2801:


Assignee: Wangda Tan

 Documentation development for Node labels requirement
 

 Key: YARN-2801
 URL: https://issues.apache.org/jira/browse/YARN-2801
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Gururaj Shetty
Assignee: Wangda Tan

 Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation

2014-11-24 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2637:
--
Attachment: YARN-2637.9.patch

This patch seems to pass all the existing unit tests on my box; verifying.  
Still to do: a unit test for the change itself, and removing some extra logging.

 maximum-am-resource-percent could be violated when resource of AM is > 
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.2.patch, 
 YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it will check whether the 
 app can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); 
      i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();
   
   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }
   
   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < 
       getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() + 
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example:
 If a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
 resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the 
 number of AMs that can be launched is 200. If a user uses 5M for each AM 
 (> minimum_allocation), all apps can still be activated, and they will occupy 
 all the resources of the queue instead of only max_am_resource_percent of the 
 queue.
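
A hedged sketch of the direction such a fix could take (resource-based rather 
than count-based; the variable names below are illustrative, not actual fields, 
and this is not the exact committed code):

{code}
// Compare the *resources* used by AMs against the queue's AM limit instead of
// counting applications against a count derived from minimum_allocation.
Resource amLimit = Resources.multiply(queueMaxResource, maxAMResourcePercent);

if (!Resources.fitsIn(Resources.add(amResourceUsage, amResource), amLimit)) {
  break;   // activating this app would push AM usage past maximum-am-resource-percent
}
{code}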



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-11-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223219#comment-14223219
 ] 

Rohith commented on YARN-2762:
--

I am a little confused about the Hadoop QA result. I am able to apply the patch 
successfully. I re-kicked Jenkins to check again whether there really is a 
compilation problem.

 RMAdminCLI node-labels-related args should be trimmed and checked before 
 sending to RM
 --

 Key: YARN-2762
 URL: https://issues.apache.org/jira/browse/YARN-2762
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: YARN-2762.1.patch, YARN-2762.patch


 All NodeLabel args validations are done on the server side. The same can be 
 done in RMAdminCLI so that unnecessary RPC calls can be avoided.
 And for input such as x,y,,z,, there is no need to add an empty string; it can 
 instead be skipped.
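
A minimal sketch of the client-side normalization being suggested (hedged: 
illustrative only, not the actual patch):

{code}
// Trim each label and silently skip empty entries such as those in "x,y,,z,".
Set<String> labels = new HashSet<String>();
for (String label : args.split(",")) {
  String trimmed = label.trim();
  if (!trimmed.isEmpty()) {
    labels.add(trimmed);
  }
}
{code}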



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-1984:
---
Attachment: YARN-1984.001.patch

 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223221#comment-14223221
 ] 

Varun Saxena commented on YARN-1984:


Thanks for the review [~jlowe].

I had missed handling DBException in one place; the other places didn't handle 
it because the caller method was handling DBException.
But in hindsight, I think we should handle DBException in all the methods 
you mentioned above, as the method signatures don't advertise throwing 
DBException.

I have uploaded a new patch. Kindly review.

 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223225#comment-14223225
 ] 

Zhijie Shen commented on YARN-1984:
---

Thanks for your effort, Varun and Jason!

bq. Is there a reason to have deleteNextEntity throw DBException rather than 
IOException? It would be cleaner for callers if deleteNextEntity handled this.

Maybe we don't need to do that. It's consistent to catch DBException in the 
same method where LeveldbIterator is constructed, but not in the inner method 
where LeveldbIterator is passed in. The test case should be fine if we need to 
catch DBException separately when testing a private method.

bq. loadVersion can leak the runtime DBException

It seems that loadVersion doesn't use an iterator. Or can LeveldbIterator help 
the get method too?

 


 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2897) CrossOriginFilter needs more log statements

2014-11-24 Thread Mit Desai (JIRA)
Mit Desai created YARN-2897:
---

 Summary: CrossOriginFilter needs more log statements
 Key: YARN-2897
 URL: https://issues.apache.org/jira/browse/YARN-2897
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Mit Desai
Assignee: Mit Desai


CrossOriginFilter does not log as much as it should to make debugging easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223225#comment-14223225
 ] 

Zhijie Shen edited comment on YARN-1984 at 11/24/14 6:15 PM:
-

Thanks for your effort, Varun and Jason!

bq. Is there a reason to have deleteNextEntity throw DBException rather than 
IOException? It would be cleaner for callers if deleteNextEntity handled this.

Maybe we don't need to do that. It's consistent to catch DBException in the 
same method where LeveldbIterator is constructed, but not in the inner method 
where LeveldbIterator is passed in. The test case should be fine if we need to 
catch DBException separately when testing a private method.

bq. loadVersion can leak the runtime DBException

It seems that loadVersion doesn't use an iterator. Or can LeveldbIterator help 
the get method too?

BTW, handleException can be static and more general, taking one more param (an 
error code), such that it can be reused in more places in this class.



was (Author: zjshen):
Thanks for your effort, Varun and Jason!

bq. Is there a reason to have deleteNextEntity throw DBException rather than 
IOException? It would be cleaner for callers if deleteNextEntity handled this.

Maybe we don't need to do that. It's consistent to catch DBException in the 
same method where LeveldbIterator is constructed, but not in the inner method 
where LeveldbIterator is passed in. The test case should be fine if we need to 
catch DBException separately when testing a private method.

bq. loadVersion can leak the runtime DBException

It seems that loadVersion doesn't use an iterator. Or can LeveldbIterator help 
the get method too?

 


 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223261#comment-14223261
 ] 

Varun Saxena commented on YARN-1984:


Thanks [~zjshen] for the review.

loadVersion needs to handle DBException because DB#get can throw DBException. 
I guess handling DBException inside deleteNextEntity is a matter of choice. But 
as the method advertises throwing only IOException, handling DBException inside 
the method would avoid any mistakes in the future if a developer chooses to 
call this method and overlooks handling DBException.

You are correct. handleException can be changed to static and take one more 
param for the error code. Will upload a patch with these changes.

 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-1984:
---
Attachment: YARN-1984.002.patch

Made the changes as per review. Kindly review [~jlowe] and [~zjshen]

 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.002.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2898) Container-executor prints out wrong error information when failed

2014-11-24 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2898:
--
Summary: Container-executor prints out wrong error information when failed 
 (was: Container-executor may fail with wrong information)

 Container-executor prints out wrong error information when failed
 --

 Key: YARN-2898
 URL: https://issues.apache.org/jira/browse/YARN-2898
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor

 *Settings*: YARN cluster using LinuxContainerExecutor, and the banned.users 
 field is left empty in container-executor.cfg. The default banned list is 
 \{mapred, hdfs, bin\}.
 *Problem*: if user mapred submits a job, it will fail with "ExitCodeException 
 exitCode=139", which is a segmentation fault. This is incorrect; the correct 
 information should be "Requested user mapred is banned".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2898) Container-executor prints out wrong error information when failed

2014-11-24 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2898:
--
Attachment: YARN-2898-1.patch

 Contaniner-executor prints out wrong error information when failed
 --

 Key: YARN-2898
 URL: https://issues.apache.org/jira/browse/YARN-2898
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2898-1.patch


 *Settings*: YARN cluster using LinuxContainerExecutor, and the banned.users 
 field is left empty in container-executor.cfg. The default banned list is 
 \{mapred, hdfs, bin\}.
 *Problem*: if user mapred submits a job, it will fail with "ExitCodeException 
 exitCode=139", which is a segmentation fault. This is incorrect; the correct 
 information should be "Requested user mapred is banned".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2898) Container-executor may fail with wrong information

2014-11-24 Thread Wei Yan (JIRA)
Wei Yan created YARN-2898:
-

 Summary: Container-executor may fail with wrong information
 Key: YARN-2898
 URL: https://issues.apache.org/jira/browse/YARN-2898
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor


*Settings*: YARN cluster using LinuxContainerExecutor, and the banned.users 
field is left empty in container-executor.cfg. The default banned list is 
\{mapred, hdfs, bin\}.

*Problem*: if user mapred submits a job, it will fail with "ExitCodeException 
exitCode=139", which is a segmentation fault. This is incorrect; the correct 
information should be "Requested user mapred is banned".





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-11-24 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2762:
-
Attachment: YARN-2762.2.patch

 RMAdminCLI node-labels-related args should be trimmed and checked before 
 sending to RM
 --

 Key: YARN-2762
 URL: https://issues.apache.org/jira/browse/YARN-2762
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.patch


 All NodeLabel args validations are done on the server side. The same can be 
 done in RMAdminCLI so that unnecessary RPC calls can be avoided.
 And for input such as x,y,,z,, there is no need to add an empty string; it can 
 instead be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-2898) Container-executor prints out wrong error information when failed

2014-11-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223288#comment-14223288
 ] 

Jason Lowe edited comment on YARN-2898 at 11/24/14 6:52 PM:


This is a duplicate of YARN-2847.


was (Author: jlowe):
This is a duplicate of YAN-2847.

 Container-executor prints out wrong error information when failed
 --

 Key: YARN-2898
 URL: https://issues.apache.org/jira/browse/YARN-2898
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2898-1.patch


 *Settings*: YARN cluster using LinuxContainerExecutor, and the banned.users 
 field is left empty in container-executor.cfg. The default banned list is 
 \{mapred, hdfs, bin\}.
 *Problem*: if user mapred submits a job, it will fail with "ExitCodeException 
 exitCode=139", which is a segmentation fault. This is incorrect; the correct 
 information should be "Requested user mapred is banned".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-11-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223303#comment-14223303
 ] 

Rohith commented on YARN-2762:
--

I updated the patch by creating it from a different branch, but I don't see any 
difference between the two patches. Still, I attached it. Let's wait for Jenkins 
to run!

 RMAdminCLI node-labels-related args should be trimmed and checked before 
 sending to RM
 --

 Key: YARN-2762
 URL: https://issues.apache.org/jira/browse/YARN-2762
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.patch


 All NodeLabel args validations are done on the server side. The same can be 
 done in RMAdminCLI so that unnecessary RPC calls can be avoided.
 And for input such as x,y,,z,, there is no need to add an empty string; it can 
 instead be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2898) Container-executor prints out wrong error information when failed

2014-11-24 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved YARN-2898.
--
Resolution: Duplicate

This is a duplicate of YARN-2847.

 Container-executor prints out wrong error information when failed
 --

 Key: YARN-2898
 URL: https://issues.apache.org/jira/browse/YARN-2898
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2898-1.patch


 *Settings*: YARN cluster using LinuxContainerExecutor, and the banned.users 
 field is left empty in container-executor.cfg. The default banned list is 
 \{mapred, hdfs, bin\}.
 *Problem*: if user mapred submits a job, it will fail with "ExitCodeException 
 exitCode=139", which is a segmentation fault. This is incorrect; the correct 
 information should be "Requested user mapred is banned".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2897) CrossOriginFilter needs more log statements

2014-11-24 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2897:

Affects Version/s: 2.6.0

 CrossOriginFilter needs more log statements
 ---

 Key: YARN-2897
 URL: https://issues.apache.org/jira/browse/YARN-2897
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-2897.patch


 CrossOriginFilter does not log enough to make debugging easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2898) Container-executor prints out wrong error information when failed

2014-11-24 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223291#comment-14223291
 ] 

Wei Yan commented on YARN-2898:
---

Oh, yes. Thanks, [~jlowe].

 Container-executor prints out wrong error information when failed
 --

 Key: YARN-2898
 URL: https://issues.apache.org/jira/browse/YARN-2898
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2898-1.patch


 *Settings*: a YARN cluster using LinuxContainerExecutor, with banned.users 
 left empty in container-executor.cfg. The default banned list is 
 \{mapred, hdfs, bin\}.
 *Problem*: when user mapred submits a job, it fails with 
 ExitCodeException exitCode=139, which is a segmentation fault. This is incorrect; 
 the correct information should be Requested user mapred is banned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-1984:
---
Attachment: (was: YARN-1984.002.patch)

 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2897) CrossOriginFilter needs more log statements

2014-11-24 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2897:

Attachment: YARN-2897.patch

Attaching the patch
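
For illustration only, a hypothetical filter showing the kind of extra debug 
logging this JIRA asks for; it is not the attached patch and not the actual 
CrossOriginFilter source:
{code}
import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical filter: log the inputs to the CORS decision so a rejected or
// ignored Origin header can be diagnosed from the logs.
public class LoggingCorsFilterSketch implements Filter {
  private static final Log LOG = LogFactory.getLog(LoggingCorsFilterSketch.class);

  public void init(FilterConfig conf) {
    LOG.info("CORS filter initialized");
  }

  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    if (LOG.isDebugEnabled() && req instanceof HttpServletRequest) {
      String origin = ((HttpServletRequest) req).getHeader("Origin");
      LOG.debug("Request origin: " + origin);
    }
    chain.doFilter(req, res);
  }

  public void destroy() {
  }
}
{code}
Guarding the message construction with isDebugEnabled() keeps the extra logging 
cheap when debug is off.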

 CrossOriginFilter needs more log statements
 ---

 Key: YARN-2897
 URL: https://issues.apache.org/jira/browse/YARN-2897
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-2897.patch


 CrossOriginFilter does not log enough to make debugging easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-1984:
---
Attachment: YARN-1984.002.patch

discardOldEntities does not need to handle DBException if deleteNextEntity 
handles it. Updated the patch with this change.
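
A minimal sketch of the overall approach, i.e. wrapping the LevelDB runtime 
exceptions into IOException so they cannot escape and kill a thread; the class 
and method below are made up and are not code from the patch:
{code}
import java.io.IOException;

import org.iq80.leveldb.DB;
import org.iq80.leveldb.DBException;

// DB.get() may throw DBException, a RuntimeException; convert it so callers
// can handle it like any other I/O failure instead of letting it propagate.
public class LeveldbAccessSketch {
  private final DB db;

  public LeveldbAccessSketch(DB db) {
    this.db = db;
  }

  public byte[] get(byte[] key) throws IOException {
    try {
      return db.get(key);
    } catch (DBException e) {
      throw new IOException(e);
    }
  }
}
{code}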

 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.002.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2897) CrossOriginFilter needs more log statements

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223336#comment-14223336
 ] 

Hadoop QA commented on YARN-2897:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683387/YARN-2897.patch
  against trunk revision f636f9d.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5922//console

This message is automatically generated.

 CrossOriginFilter needs more log statements
 ---

 Key: YARN-2897
 URL: https://issues.apache.org/jira/browse/YARN-2897
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-2897.patch


 CrossOriginFilter does not log enough to make debugging easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2691) User level API support for priority label

2014-11-24 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2691:
-
Attachment: YARN-2691.patch

Updated the patch by rebasing it, and also addressed the comment that 
ApplicationPriority should implement the Comparable interface.

 User level API support for priority label
 -

 Key: YARN-2691
 URL: https://issues.apache.org/jira/browse/YARN-2691
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Sunil G
Assignee: Rohith
 Attachments: YARN-2691.patch, YARN-2691.patch


 Support for handling Application-Priority label coming from client to 
 ApplicationSubmissionContext.
 Common api support for user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223351#comment-14223351
 ] 

Hadoop QA commented on YARN-1984:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683377/YARN-1984.002.patch
  against trunk revision 555fa2d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5919//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5919//console

This message is automatically generated.

 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.002.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated

2014-11-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223363#comment-14223363
 ] 

Zhijie Shen commented on YARN-2854:
---

[~Naganarasimha], thanks for taking on this documentation work. I suggest doing the 
following updates:

1. If you read through the document, the per-framework data (which is actually 
the basic timeline service) and the generic data (aka the generic history 
service) sound like two equal pieces of this daemon. It may be better to promote the 
timeline service as the first-class citizen of this document, and then explain 
the generic history service as the built-in payload of it.

2. We need to update the current status section. Up till now, the essential 
functionality of the timeline server is done, and it can work in both insecure 
and secure modes. The generic history service already rides on the 
timeline store. The coming work may be the scalability and reliability of the 
timeline service. As the target fix version is set to 2.7.0, we may update this 
section around the end of the release.

3. Add a section about enabling security of the timeline server. I can help 
with this section if necessary.

4. The configurations for the generic history service need to be updated.

5. It may be better to enhance the client example code, at least to show how to 
create a domain and put an entity into a particular domain (see the sketch below).

6. For the API specifications, let's still keep them separate: YARN-1876
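
A minimal sketch of such a client example, assuming a made-up domain id, entity 
type, and reader/writer lists (illustrative only, not the final documented example):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.timeline.TimelineDomain;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Create a domain, then put an entity into that domain.
public class TimelineDomainExampleSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      TimelineDomain domain = new TimelineDomain();
      domain.setId("example_domain");          // hypothetical domain id
      domain.setReaders("user1,user2");        // who may read entities in it
      domain.setWriters("user1");              // who may write into it
      client.putDomain(domain);

      TimelineEntity entity = new TimelineEntity();
      entity.setEntityType("EXAMPLE_TYPE");    // hypothetical entity type
      entity.setEntityId("entity_1");
      entity.setStartTime(System.currentTimeMillis());
      entity.setDomainId("example_domain");    // place the entity in the domain
      TimelinePutResponse response = client.putEntities(entity);
      System.out.println("put errors: " + response.getErrors().size());
    } finally {
      client.stop();
    }
  }
}
{code}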



 The document about timeline service and generic service needs to be updated
 ---

 Key: YARN-2854
 URL: https://issues.apache.org/jira/browse/YARN-2854
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Naganarasimha G R
Priority: Critical
 Attachments: YARN-2854.20141120-1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223364#comment-14223364
 ] 

Hadoop QA commented on YARN-2637:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683370/YARN-2637.9.patch
  against trunk revision 555fa2d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 13 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5917//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5917//console

This message is automatically generated.

 maximum-am-resource-percent could be violated when resource of AM is > 
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.2.patch, 
 YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app can be 
 activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example:
 if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
 resource that AMs can use is 200M. Assuming minimum_allocation=1M, up to 200 AMs 
 can be launched. If each AM actually uses 5M (> minimum_allocation), all 
 apps can still be activated, and they will occupy all of the queue's resource 
 instead of only max_am_resource_percent of it.
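
Working through the numbers of that example as a quick arithmetic sketch (the 
values are the assumed ones above, and this is not scheduler code):
{code}
public class AmLimitExampleSketch {
  public static void main(String[] args) {
    int queueCapacityMb = 1000;            // 1G queue capacity
    double maxAmResourcePercent = 0.2;     // maximum_am_resource_percent
    int minimumAllocationMb = 1;           // minimum_allocation
    int actualAmSizeMb = 5;                // what each AM really asks for

    double maxAmResourceMb = queueCapacityMb * maxAmResourcePercent;  // 200 MB
    int maxAmNumber = (int) (maxAmResourceMb / minimumAllocationMb);  // 200 AMs

    // Because the AM count is derived from minimum_allocation, all 200 AMs are
    // activated even though together they use 200 * 5 MB = 1000 MB, i.e. the
    // whole queue rather than 20% of it.
    System.out.println("max AM resource: " + maxAmResourceMb + " MB");
    System.out.println("activated AMs:   " + maxAmNumber);
    System.out.println("actual AM usage: " + (maxAmNumber * actualAmSizeMb) + " MB");
  }
}
{code}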



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2517) Implement TimelineClientAsync

2014-11-24 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223299#comment-14223299
 ] 

Sangjin Lee commented on YARN-2517:
---

It might be conceptually cleaner (and simpler for the clients) to have the 
clients post the events but not deal with what would happen if the result was 
not successfully sent. Even the very concept of whether the result was sent 
is problematic. An ATS writer implementation could decide to buffer 
a certain amount of data before it physically writes it to the backing storage 
(as an optimization).

If the client needs to know whether the event was truly sent to the backing 
storage and also deal with its failure, it may lead to ATS writer 
implementations leaking to clients and may limit the ways an ATS writer 
implementation can be optimized, etc.

How about a sync write (as it stands now) for critical data and an async write 
which is basically fire-and-forget? The understanding there would be that the async 
write is a best effort, and it would fall on the ATS writer 
implementation to try to deliver the events to the backing storage as reliably and 
optimally as it can (but in theory still with no guarantees). The async write 
could even be enabled with a boolean flag.
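
A rough sketch of that idea: a hypothetical wrapper (not an existing YARN class) 
where a boolean flag selects between a synchronous put and a best-effort, 
fire-and-forget asynchronous put:
{code}
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Hypothetical wrapper: sync puts surface the response/failure to the caller,
// async puts are best-effort fire-and-forget.
public class TimelinePutSketch {
  private final TimelineClient client;
  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  public TimelinePutSketch(TimelineClient client) {
    this.client = client;
  }

  public TimelinePutResponse put(final TimelineEntity entity, boolean async)
      throws IOException, YarnException {
    if (!async) {
      return client.putEntities(entity);    // caller inspects the response
    }
    pool.submit(new Runnable() {
      public void run() {
        try {
          client.putEntities(entity);       // best effort; failures only logged
        } catch (Exception e) {
          System.err.println("async timeline put failed: " + e);
        }
      }
    });
    return null;                            // fire-and-forget: no response
  }
}
{code}
Executor shutdown, retries, and queue bounding are omitted for brevity; a real 
TimelineClientAsync would also need callbacks for responses.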

 Implement TimelineClientAsync
 -

 Key: YARN-2517
 URL: https://issues.apache.org/jira/browse/YARN-2517
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2517.1.patch, YARN-2517.2.patch


 In some scenarios, we'd like to put timeline entities in another thread so as not to 
 block the current one.
 It would be good to have a TimelineClientAsync like AMRMClientAsync and 
 NMClientAsync. It can buffer entities, put them from a separate thread, and 
 have callbacks to handle the responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2014-11-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223369#comment-14223369
 ] 

Eric Payne commented on YARN-1963:
--

Hi [~sunilg]. Thanks for the work you are doing on this issue.

bq. {{yarn.scheduler.capacity.root.queue_name.priority_label.acl}}
If this property doesn't exist, will queue admins still be able to change 
priorities of jobs in the queue?

 Support priorities across applications within the same queue 
 -

 Key: YARN-1963
 URL: https://issues.apache.org/jira/browse/YARN-1963
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Reporter: Arun C Murthy
Assignee: Sunil G
 Attachments: YARN Application Priorities Design.pdf, YARN Application 
 Priorities Design_01.pdf


 It will be very useful to support priorities among applications within the 
 same queue, particularly in production scenarios. It allows for finer-grained 
 controls without having to force admins to create a multitude of queues, plus 
 allows existing applications to continue using existing queues which are 
 usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223386#comment-14223386
 ] 

Hadoop QA commented on YARN-2762:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683382/YARN-2762.2.patch
  against trunk revision f636f9d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5920//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5920//console

This message is automatically generated.

 RMAdminCLI node-labels-related args should be trimmed and checked before 
 sending to RM
 --

 Key: YARN-2762
 URL: https://issues.apache.org/jira/browse/YARN-2762
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.patch


 All NodeLabel argument validations are done on the server side. The same can be done 
 at RMAdminCLI so that unnecessary RPC calls can be avoided.
 And for input such as x,y,,z,, there is no need to add empty strings; they can 
 simply be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223401#comment-14223401
 ] 

Hadoop QA commented on YARN-1984:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683388/YARN-1984.002.patch
  against trunk revision f636f9d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5921//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5921//console

This message is automatically generated.

 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.002.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2691) User level API support for priority label

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223468#comment-14223468
 ] 

Hadoop QA commented on YARN-2691:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683391/YARN-2691.patch
  against trunk revision 380a361.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1229 javac 
compiler warnings (more than the trunk's current 1219 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5923//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5923//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5923//console

This message is automatically generated.

 User level API support for priority label
 -

 Key: YARN-2691
 URL: https://issues.apache.org/jira/browse/YARN-2691
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Sunil G
Assignee: Rohith
 Attachments: YARN-2691.patch, YARN-2691.patch


 Support for handling Application-Priority label coming from client to 
 ApplicationSubmissionContext.
 Common api support for user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2899) Run TestDockerContainerExecutorWithMocks on Linux only

2014-11-24 Thread Ming Ma (JIRA)
Ming Ma created YARN-2899:
-

 Summary: Run TestDockerContainerExecutorWithMocks on Linux only
 Key: YARN-2899
 URL: https://issues.apache.org/jira/browse/YARN-2899
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma
Priority: Minor


It seems the test should strictly check for Linux; otherwise, it will fail when 
the OS isn't Linux.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy

2014-11-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223490#comment-14223490
 ] 

Eric Payne commented on YARN-2009:
--

If we are to choose the less complicated route, I believe that, at the very 
least, when {{ProportionalCapacityPreemptionPolicy}} determines that {{queueA}} 
needs to give up some containers, it should first select containers from the 
lowest priority apps.
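
As a standalone illustration of that ordering (not ProportionalCapacityPreemptionPolicy 
code; it assumes a plain integer priority where a smaller value means lower priority):
{code}
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Order applications so containers are taken from the lowest-priority apps
// first when a queue has to give up resources.
public class PreemptionOrderSketch {
  public static class App {
    final String id;
    final int priority;   // assumption: smaller value = lower priority

    public App(String id, int priority) {
      this.id = id;
      this.priority = priority;
    }
  }

  public static void sortForPreemption(List<App> apps) {
    Collections.sort(apps, new Comparator<App>() {
      public int compare(App a, App b) {
        return Integer.compare(a.priority, b.priority);  // lowest priority first
      }
    });
  }
}
{code}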

 Priority support for preemption in ProportionalCapacityPreemptionPolicy
 ---

 Key: YARN-2009
 URL: https://issues.apache.org/jira/browse/YARN-2009
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Devaraj K
Assignee: Sunil G

 While preempting containers based on the queue ideal assignment, we may need 
 to consider preempting the low priority application containers first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2899) Run TestDockerContainerExecutorWithMocks on Linux only

2014-11-24 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-2899:
--
Attachment: YARN-2899.patch

 Run TestDockerContainerExecutorWithMocks on Linux only
 --

 Key: YARN-2899
 URL: https://issues.apache.org/jira/browse/YARN-2899
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma
Priority: Minor
 Attachments: YARN-2899.patch


 It seems the test should strictly check for Linux; otherwise, it will fail 
 when the OS isn't Linux.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223511#comment-14223511
 ] 

Jason Lowe commented on YARN-1984:
--

+1 latest patch lgtm.  [~zjshen] do you have further comments?

 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.002.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-2900:
-

 Summary: Application Not Found in AHS throws Internal Server Error 
with NPE
 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223550#comment-14223550
 ] 

Zhijie Shen commented on YARN-1984:
---

LGTM

 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.002.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223552#comment-14223552
 ] 

Jonathan Eagles commented on YARN-2900:
---

Application not found in the history store should be a normal case, not an 
exceptional one, in the REST API case, since the application id is user-provided 
information.

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2900:
--
Description: 
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
at 
org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
at 
org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
at 
org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
... 59 more

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223557#comment-14223557
 ] 

Zhijie Shen commented on YARN-2900:
---

[~jeagles], it seems that you're still using ApplicationHistoryManagerImpl and 
the old application history store.
{code}
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
{code}

Otherwise, you should see ApplicationHistoryManagerOnTimelineStore instead. Per 
discussion in 
[YARN-2900|https://issues.apache.org/jira/browse/YARN-2033?focusedCommentId=14126073page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14126073],
 we will no longer support the old storage stack. 

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223562#comment-14223562
 ] 

Zhijie Shen commented on YARN-2900:
---

BTW, it's a known issue. and I've filed a ticket before: YARN-1835

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223557#comment-14223557
 ] 

Zhijie Shen edited comment on YARN-2900 at 11/24/14 9:38 PM:
-

[~jeagles], it seems that you're still using ApplicationHistoryManagerImpl and 
the old application history store.
{code}
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
{code}

Otherwise, you should see ApplicationHistoryManagerOnTimelineStore instead. Per 
discussion in 
[YARN-2033|https://issues.apache.org/jira/browse/YARN-2033?focusedCommentId=14126073page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14126073],
 we will no longer support the old storage stack. 


was (Author: zjshen):
[~jeagles], it seems that you're still using ApplicationHistoryManagerImpl and 
the old application history store.
{code}
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
{code}

Otherwise, you should see ApplicationHistoryManagerOnTimelineStore instead. Per 
discussion in 
[YARN-2900|https://issues.apache.org/jira/browse/YARN-2033?focusedCommentId=14126073page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14126073],
 we will no longer support the old storage stack. 

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2188) Client service for cache manager

2014-11-24 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2188:
---
Attachment: YARN-2188-trunk-v5.patch

[~kasha] v5 attached.

Link to a diff between v4 and v5: 
https://github.com/ctrezzo/hadoop/commit/99b80eba32af42d8032fa47e58e4c1068f2707e4

Thanks!

 Client service for cache manager
 

 Key: YARN-2188
 URL: https://issues.apache.org/jira/browse/YARN-2188
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2188-trunk-v1.patch, YARN-2188-trunk-v2.patch, 
 YARN-2188-trunk-v3.patch, YARN-2188-trunk-v4.patch, YARN-2188-trunk-v5.patch


 Implement the client service for the shared cache manager. This service is 
 responsible for handling client requests to use and release resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2900:

Attachment: YARN-2900.patch

Attaching the patch that checks for null and returns an appropriate result so that 
we can catch the NotFoundException in WebServices.java.

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223621#comment-14223621
 ] 

Jonathan Eagles commented on YARN-2900:
---

[~zjshen], please don't jump to any conclusions. This is my setup, which I 
believe is a supported configuration for 2.6.0.

{quote}
yarn.timeline-service.generic-application-history.enabled=false
yarn.timeline-service.generic-application-history.store-class=org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore
{quote}

The Tez UI makes applicationhistory REST API calls to gather fine details for 
those who have it enabled. In my case, where generic history is disabled, it is 
causing massive flooding of the log files.

As for not finding the duplicate JIRA, I was unable to find this issue in 
the search. Try to include details that are searchable (stack trace, logs, 
class/file names) so that users are able to find the appropriate issue.

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223624#comment-14223624
 ] 

Hadoop QA commented on YARN-2900:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683426/YARN-2900.patch
  against trunk revision 2967c17.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5926//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5926//console

This message is automatically generated.

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2188) Client service for cache manager

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223632#comment-14223632
 ] 

Hadoop QA commented on YARN-2188:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12683425/YARN-2188-trunk-v5.patch
  against trunk revision 2967c17.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5925//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5925//console

This message is automatically generated.

 Client service for cache manager
 

 Key: YARN-2188
 URL: https://issues.apache.org/jira/browse/YARN-2188
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2188-trunk-v1.patch, YARN-2188-trunk-v2.patch, 
 YARN-2188-trunk-v3.patch, YARN-2188-trunk-v4.patch, YARN-2188-trunk-v5.patch


 Implement the client service for the shared cache manager. This service is 
 responsible for handling client requests to use and release resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223671#comment-14223671
 ] 

Jonathan Eagles commented on YARN-2900:
---

I do see this in the log file, which is suspicious now that I am looking at the 
code. 

2014-11-24 22:12:42,107 [main] WARN 
applicationhistoryservice.ApplicationHistoryServer: The filesystem based 
application history store is deprecated.

Looking into this.

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223676#comment-14223676
 ] 

Jonathan Eagles commented on YARN-2900:
---

The issue is spacing in the config file. Here is the updated stack trace.

{quote}
2014-11-24 22:34:53,900 [17694135@qtp-11347161-6] WARN 
webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: 
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: The entity for 
application application_1416586084624_0011 doesn't exist in the timeline store
at 
org.apache.hadoop.yarn.server.webapp.WebServices.rewrapAndThrowException(WebServices.java:452)
at 
org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:227)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSWebServices.getApp(AHSWebServices.java:95)

Caused by: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: The 
entity for application application_1416586084624_0011 doesn't exist in the 
timeline store
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplication(ApplicationHistoryManagerOnTimelineStore.java:542)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplication(ApplicationHistoryManagerOnTimelineStore.java:94)
at 
org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
at 
org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
at 
org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
... 59 more
{quote}

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223679#comment-14223679
 ] 

Zhijie Shen commented on YARN-2900:
---

bq.  This is my setup, which I believe is a supported configuration for 2.6.0.

Yeah, the configuration should be supported. However, with that configuration, 
ApplicationHistoryManagerOnTimelineStore should be used instead. Here's the 
related code in ApplicationHistoryServer.
{code}
  private ApplicationHistoryManager createApplicationHistoryManager(
  Configuration conf) {
// Backward compatibility:
// APPLICATION_HISTORY_STORE is neither null nor empty, it means that the
// user has enabled it explicitly.
if (conf.get(YarnConfiguration.APPLICATION_HISTORY_STORE) == null ||
conf.get(YarnConfiguration.APPLICATION_HISTORY_STORE).length() == 0 ||
conf.get(YarnConfiguration.APPLICATION_HISTORY_STORE).equals(
NullApplicationHistoryStore.class.getName())) {
  return new ApplicationHistoryManagerOnTimelineStore(
  timelineDataManager, aclsManager);
} else {
  LOG.warn("The filesystem based application history store is deprecated.");
  return new ApplicationHistoryManagerImpl();
}
  }
{code}

I tested this config locally. It seems that the new 
ApplicationHistoryManagerOnTimelineStore was picked. If it doesn't pick the 
right manager, then it is really bad.

But given that ApplicationHistoryManagerOnTimelineStore is picked, we shouldn't see 
the NPE shown in the description.

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2014-11-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223689#comment-14223689
 ] 

Hudson commented on YARN-1984:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6597 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6597/])
YARN-1984. LeveldbTimelineStore does not handle db exceptions properly. 
Contributed by Varun Saxena (jlowe: rev 
1ce4d33c2dc86d711b227a04d2f9a2ab696a24a1)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java


 LeveldbTimelineStore does not handle db exceptions properly
 ---

 Key: YARN-1984
 URL: https://issues.apache.org/jira/browse/YARN-1984
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-1984.001.patch, YARN-1984.002.patch, YARN-1984.patch


 The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
 rather than IOException which can easily leak up the stack and kill threads 
 (e.g.: the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223691#comment-14223691
 ] 

Zhijie Shen commented on YARN-2900:
---

bq. Issue is spacing in the config file. Here is the updated stack trace.

Then, this log message was expected when the app is not found. But 
INTERNAL_SERVER_ERROR is bad. We should return NOT_FOUND instead. We can 
capture NotFoundException and convert it to webapp.NotFoundException.
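
A minimal sketch of that conversion at the web layer, assuming the history side 
surfaces a missing application as ApplicationNotFoundException (the wrapper class 
and lookup interface below are hypothetical, not the committed fix):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.webapp.NotFoundException;

class AppReportWebHelper {
  /** Hypothetical stand-in for the history manager, only to keep the sketch self-contained. */
  interface ReportSource {
    ApplicationReport get(ApplicationId appId) throws Exception;
  }

  static ApplicationReport getOr404(ReportSource source, ApplicationId appId) {
    try {
      return source.get(appId);
    } catch (ApplicationNotFoundException e) {
      // Translate "app not found" into an HTTP 404 instead of a 500.
      throw new NotFoundException("app with id: " + appId + " not found");
    } catch (Exception e) {
      // Anything else still surfaces as an internal error.
      throw new RuntimeException(e);
    }
  }
}
{code}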

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223694#comment-14223694
 ] 

Jonathan Eagles commented on YARN-2900:
---

FYI: Here is the config that was causing the original failure. Notice the 
newline as part of the value.

{quote}
   <property>
     <description>Store class name for history store, defaulting to file system
     store</description>
     <name>yarn.timeline-service.generic-application-history.store-class</name>
     <value>org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore
     </value>
   </property>
{quote}

The Internal Server Error still happens with 
ApplicationHistoryManagerOnTimelineStore, which this issue now tracks.

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2679) Add metric for container launch duration

2014-11-24 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2679:

Labels: metrics supportability  (was: )

 Add metric for container launch duration
 

 Key: YARN-2679
 URL: https://issues.apache.org/jira/browse/YARN-2679
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2679.000.patch, YARN-2679.001.patch, 
 YARN-2679.002.patch


 Add a metric in NodeManagerMetrics to capture the time it takes to prepare and 
 launch a container. The prepare time is the duration between sending the 
 ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving the 
 ContainerEventType.CONTAINER_LAUNCHED event.
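
 A rough sketch of what such a metric could look like with the usual Hadoop 
 metrics2 annotations (class and method names below are illustrative, not the 
 committed patch):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Sketch of a container launch duration metric", context = "yarn")
public class ContainerLaunchDurationSketch {
  @Metric("Container launch duration") MutableRate containerLaunchDuration;

  // Called with: (time CONTAINER_LAUNCHED was received) - (time LAUNCH_CONTAINER was sent).
  public void addContainerLaunchDuration(long durationMillis) {
    containerLaunchDuration.add(durationMillis);
  }
}
{code}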



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.

2014-11-24 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2675:

Labels: metrics supportability  (was: )

 the containersKilled metrics is not updated when the container is killed 
 during localization.
 -

 Key: YARN-2675
 URL: https://issues.apache.org/jira/browse/YARN-2675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Attachments: YARN-2675.000.patch, YARN-2675.001.patch, 
 YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch, 
 YARN-2675.005.patch


 The containersKilled metric is not updated when the container is killed 
 during localization. We should handle the KILLING state in the finished() 
 method of ContainerImpl.java so that the killedContainer metric is updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2802) ClusterMetrics to include AM launch and register delays

2014-11-24 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2802:

Labels: metrics supportability  (was: )

 ClusterMetrics to include AM launch and register delays
 ---

 Key: YARN-2802
 URL: https://issues.apache.org/jira/browse/YARN-2802
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2802.000.patch, YARN-2802.001.patch, 
 YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, 
 YARN-2802.005.patch


 Add AM container launch and register delay metrics in QueueMetrics to help 
 diagnose performance issues.
 Added two metrics in QueueMetrics:
 aMLaunchDelay: the time spent from sending the AMLauncherEventType.LAUNCH event 
 to receiving the RMAppAttemptEventType.LAUNCHED event in RMAppAttemptImpl.
 aMRegisterDelay: the time from receiving the RMAppAttemptEventType.LAUNCHED 
 event to receiving the RMAppAttemptEventType.REGISTERED event 
 (ApplicationMasterService#registerApplicationMaster) in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223734#comment-14223734
 ] 

Mit Desai commented on YARN-2900:
-

bq. We can capture NotFoundException and convert it to 
webapp.NotFoundException
[~zjshen] How about we just return null if the entity == null in 
ApplicationHistoryManagerOnTimelineStore? That way, when the call returns to 
WebServices, it will throw NotFoundException from its current implementation. 
This is the exact same approach that is used in the patch that I submitted 
earlier.
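
A minimal sketch of that alternative, with the entity lookup and report 
conversion abstracted so the snippet stands alone (the real 
ApplicationHistoryManagerOnTimelineStore method differs in its details):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

class HistoryLookupSketch {
  interface EntityStore {          // hypothetical stand-in for the timeline store lookup
    TimelineEntity getApplicationEntity(ApplicationId appId);
  }
  interface ReportConverter {      // hypothetical stand-in for convertToApplicationReport()
    ApplicationReport convert(TimelineEntity entity);
  }

  static ApplicationReport getApplication(EntityStore store, ReportConverter converter,
      ApplicationId appId) {
    TimelineEntity entity = store.getApplicationEntity(appId);
    if (entity == null) {
      // A null report lets WebServices throw its usual NotFoundException.
      return null;
    }
    return converter.convert(entity);
  }
}
{code}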

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2697) RMAuthenticationHandler is no longer useful

2014-11-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223735#comment-14223735
 ] 

Zhijie Shen commented on YARN-2697:
---

+1. Remove useless code path. It should be okay with tests. Will commit the 
patch

 RMAuthenticationHandler is no longer useful
 ---

 Key: YARN-2697
 URL: https://issues.apache.org/jira/browse/YARN-2697
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: haosdent
 Attachments: YARN-2697.patch


 After YARN-2656, RMAuthenticationHandler is no longer useful, because 
 authentication mechanism is reusing the common DT auth filter stack. It 
 should be safe to remove this unused code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2697) RMAuthenticationHandler is no longer useful

2014-11-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223748#comment-14223748
 ] 

Hudson commented on YARN-2697:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6598 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6598/])
YARN-2697. Remove useless RMAuthenticationHandler. Contributed by Haosong 
Huang. (zjshen: rev e37a4ff0c1712a1cb80e0412ec53a5d10b8d30f9)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMAuthenticationHandler.java


 RMAuthenticationHandler is no longer useful
 ---

 Key: YARN-2697
 URL: https://issues.apache.org/jira/browse/YARN-2697
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: haosdent
 Attachments: YARN-2697.patch


 After YARN-2656, RMAuthenticationHandler is no longer useful, because 
 authentication mechanism is reusing the common DT auth filter stack. It 
 should be safe to remove this unused code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223767#comment-14223767
 ] 

Zhijie Shen commented on YARN-2900:
---

bq. How about we just return a null if the entity == null in 
ApplicationHistoryManagerOnTimelineStore?

This is fine, too. It should also benefit web UI.

But please make sure that in ApplicationHistoryClientService, if the report == 
null, then throw the corresponding XxxNotFoundException. The initial 
consideration of throwing XxxNotFoundException rather than returning null was to 
be consistent with ClientRMService. Thanks!
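
A rough sketch of that guard on the ApplicationHistoryClientService side, 
assuming the manager may now hand back a null report (helper name and message 
are hypothetical):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

class ClientServiceGuardSketch {
  static ApplicationReport requireReport(ApplicationReport report, ApplicationId appId)
      throws ApplicationNotFoundException {
    if (report == null) {
      // Mirror ClientRMService: a missing application becomes ApplicationNotFoundException
      // rather than a null report handed back over RPC.
      throw new ApplicationNotFoundException(
          "Application with id '" + appId + "' doesn't exist in the history store.");
    }
    return report;
  }
}
{code}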

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)

2014-11-24 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2900:
--
Summary: Application (Attempt and Container) Not Found in AHS results in 
Internal Server Error (500)  (was: Application Not Found in AHS throws Internal 
Server Error with NPE)

 Application (Attempt and Container) Not Found in AHS results in Internal 
 Server Error (500)
 ---

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN

2014-11-24 Thread Swapnil Daingade (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223772#comment-14223772
 ] 

Swapnil Daingade commented on YARN-2139:


+1 for having an abstract policy to wrap spindles / disk affinity / iops / 
bandwidth, etc.

 [Umbrella] Support for Disk as a Resource in YARN 
 --

 Key: YARN-2139
 URL: https://issues.apache.org/jira/browse/YARN-2139
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
 Attachments: Disk_IO_Scheduling_Design_1.pdf, 
 Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, 
 YARN-2139-prototype.patch


 YARN should consider disk as another resource for (1) scheduling tasks on 
 nodes, (2) isolation at runtime, (3) spindle locality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)

2014-11-24 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223781#comment-14223781
 ] 

Mit Desai commented on YARN-2900:
-

Thanks! I will post the updated patch following the discussion here. You can 
review it once it's up.

 Application (Attempt and Container) Not Found in AHS results in Internal 
 Server Error (500)
 ---

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1996) Provide alternative policies for UNHEALTHY nodes.

2014-11-24 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-1996:
--
Attachment: YARN-1996-2.patch

[~jira.shegalov], [~maysamyabandeh] and I identified the root cause of 
https://issues.apache.org/jira/browse/MAPREDUCE-6043 and came up with the 
updated patch to address that scenario.

MRAppMaster's RMContainerAllocator depends on the RM's CompletedContainers 
messages to make allocation requests. In some corner cases when a node becomes 
unhealthy, CompletedContainers messages might be lost. The new patch makes sure 
the RM will deliver CompletedContainers messages to the AM in the following scenarios.

* When NM delivers unhealthy and completed containers notifications in the same 
heartbeat to RM.
* NM becomes unhealthy first, then it restarts.
* NM becomes unhealthy first, then it becomes healthy.
* NM becomes unhealthy first, then RM asks it to reboot.
* NM becomes unhealthy first, then it is decommissioned.
* NM becomes unhealthy first, then RM lost it.

For work preserving RM restart, an unhealthy NM will first be transitioned to 
RUNNING state after RM restart, and then to UNHEALTHY state. So if the RM 
restarts while it is draining unhealthy nodes, it should be able to continue to 
drain unhealthy nodes after the restart.

Appreciate any input on this.



 Provide alternative policies for UNHEALTHY nodes.
 -

 Key: YARN-1996
 URL: https://issues.apache.org/jira/browse/YARN-1996
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, scheduler
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1996-2.patch, YARN-1996.v01.patch


 Currently, UNHEALTHY nodes can significantly prolong execution of large 
 expensive jobs as demonstrated by MAPREDUCE-5817, and downgrade the cluster 
 health even further due to [positive 
 feedback|http://en.wikipedia.org/wiki/Positive_feedback]. A container set 
 that might have deemed the node unhealthy in the first place starts spreading 
 across the cluster because the current node is declared unusable and all its 
 containers are killed and rescheduled on different nodes.
 To mitigate this, we experiment with a patch that allows containers already 
 running on a node turning UNHEALTHY to complete (drain) whereas no new 
 container can be assigned to it until it turns healthy again.
 This mechanism can also be used for graceful decommissioning of NM. To this 
 end, we have to write a health script  such that it can deterministically 
 report UNHEALTHY. For example with 
 {code}
 if [ -e $1 ] ; then
   echo "ERROR Node decommissioning via health script hack"
 fi
 {code}
 In the current version patch, the behavior is controlled by a boolean 
 property {{yarn.nodemanager.unhealthy.drain.containers}}. More versatile 
 policies are possible in the future work. Currently, the health state of a 
 node is binary determined based on the disk checker and the health script 
 ERROR outputs. However, we can as well interpret health script output similar 
 to java logging levels (one of which is ERROR) such as WARN, FATAL. Each 
 level can then be treated differently. E.g.,
 - FATAL:  unusable like today 
 - ERROR: drain
 - WARN: halve the node capacity.
 complemented with some equivalence rules such as 3 WARN messages == ERROR, 
 2*ERROR == FATAL, etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2014-11-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223827#comment-14223827
 ] 

Sunil G commented on YARN-1963:
---

Thank you Wangda and [~eepayne] for the comments.

The ACL, if configured for a queue, will be considered before submitting the 
job. If there is no such configuration, only the queue ACL will be checked, 
which is the same as what happens now. The priority-label-level ACL sits on top 
of the queue-level ACL; it is extra and can be configured as needed by the admin.

 Support priorities across applications within the same queue 
 -

 Key: YARN-1963
 URL: https://issues.apache.org/jira/browse/YARN-1963
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Reporter: Arun C Murthy
Assignee: Sunil G
 Attachments: YARN Application Priorities Design.pdf, YARN Application 
 Priorities Design_01.pdf


 It will be very useful to support priorities among applications within the 
 same queue, particularly in production scenarios. It allows for finer-grained 
 controls without having to force admins to create a multitude of queues, plus 
 allows existing applications to continue using existing queues which are 
 usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy

2014-11-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223860#comment-14223860
 ] 

Sunil G commented on YARN-2009:
---

Thank you [~curino] for the thoughts.
I understand the complexity for users when a few of their containers are 
preempted: it can be hard for them to understand why a particular container was 
preempted and for what reason. Even today, when only the timestamp is considered 
(and to an extent the user-limit factor), that is already hard to express 
through logs.

Hence solving some of the small imbalances I mentioned may not help users much 
at a larger level; based on use cases we can check later whether these are 
needed. Coming to the focus of this JIRA: within a queue, if slow, low-priority 
applications are running and consuming all the resources, it would be good to 
make some space by preempting the lower-priority ones. This preemption can be 
done within a queue. We have seen lower-priority applications taking more of the 
cluster while higher-priority applications wait a long time to launch. Please 
share your thoughts on this.

 Priority support for preemption in ProportionalCapacityPreemptionPolicy
 ---

 Key: YARN-2009
 URL: https://issues.apache.org/jira/browse/YARN-2009
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Devaraj K
Assignee: Sunil G

 While preempting containers based on the queue ideal assignment, we may need 
 to consider preempting the low priority application containers first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2774) shared cache service should authorize calls properly

2014-11-24 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2774:
---
Description: 
The shared cache manager (SCM) services should authorize calls properly.

Currently, the uploader service (done in YARN-2186) does not authorize calls to 
notify the SCM on newly uploaded resource. Proper security/authorization needs 
to be done in this RPC call. Also, the use/release calls (done in YARN-2188) 
and the scmAdmin commands (done in YARN-2189) are not properly authorized.

  was:
The shared cache manager (SCM) services should authorize calls properly.

Currently, the uploader service (done in YARN-2186) does not authorize calls to 
notify the SCM on newly uploaded resource. Proper security/authorization needs 
to be done in this RPC call. Also, the use/release calls (done in YARN-2188) 
are not properly authorized.


 shared cache service should authorize calls properly
 

 Key: YARN-2774
 URL: https://issues.apache.org/jira/browse/YARN-2774
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee

 The shared cache manager (SCM) services should authorize calls properly.
 Currently, the uploader service (done in YARN-2186) does not authorize calls 
 to notify the SCM on newly uploaded resource. Proper security/authorization 
 needs to be done in this RPC call. Also, the use/release calls (done in 
 YARN-2188) and the scmAdmin commands (done in YARN-2189) are not properly 
 authorized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1996) Provide alternative policies for UNHEALTHY nodes.

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223917#comment-14223917
 ] 

Hadoop QA commented on YARN-1996:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683446/YARN-1996-2.patch
  against trunk revision 8caf537.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1223 javac 
compiler warnings (more than the trunk's current 1219 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5927//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5927//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5927//console

This message is automatically generated.

 Provide alternative policies for UNHEALTHY nodes.
 -

 Key: YARN-1996
 URL: https://issues.apache.org/jira/browse/YARN-1996
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, scheduler
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1996-2.patch, YARN-1996.v01.patch


 Currently, UNHEALTHY nodes can significantly prolong execution of large 
 expensive jobs as demonstrated by MAPREDUCE-5817, and downgrade the cluster 
 health even further due to [positive 
 feedback|http://en.wikipedia.org/wiki/Positive_feedback]. A container set 
 that might have deemed the node unhealthy in the first place starts spreading 
 across the cluster because the current node is declared unusable and all its 
 containers are killed and rescheduled on different nodes.
 To mitigate this, we experiment with a patch that allows containers already 
 running on a node turning UNHEALTHY to complete (drain) whereas no new 
 container can be assigned to it until it turns healthy again.
 This mechanism can also be used for graceful decommissioning of NM. To this 
 end, we have to write a health script  such that it can deterministically 
 report UNHEALTHY. For example with 
 {code}
 if [ -e $1 ] ; then
   echo "ERROR Node decommissioning via health script hack"
 fi
 {code}
 In the current version patch, the behavior is controlled by a boolean 
 property {{yarn.nodemanager.unhealthy.drain.containers}}. More versatile 
 policies are possible in the future work. Currently, the health state of a 
 node is binary determined based on the disk checker and the health script 
 ERROR outputs. However, we can as well interpret health script output similar 
 to java logging levels (one of which is ERROR) such as WARN, FATAL. Each 
 level can then be treated differently. E.g.,
 - FATAL:  unusable like today 
 - ERROR: drain
 - WARN: halve the node capacity.
 complemented with some equivalence rules such as 3 WARN messages == ERROR, 
 2*ERROR == FATAL, etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2517) Implement TimelineClientAsync

2014-11-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223923#comment-14223923
 ] 

Zhijie Shen commented on YARN-2517:
---

Thanks for sharing your ideas, Hitesh, Mit and Sangjin! I'd like to make some 
clarifications. The error that the handler wants to take care of is not the 
communication problem, but the problem that happens when the server is 
processing the posted timeline entity (see TimelinePutResponse). It could be a 
data integrity issue introduced by the app: for example, the posted Entity A in 
Domain 1 tries to relate to Entity B in Domain 2. It's fine if in some use cases 
the app doesn't care about this and consequently doesn't require an ack; the app 
can just go ahead without providing a handler. However, it's still better to be 
generic enough to cover the other use cases where the app wants to make sure its 
timeline data is persisted, or at least to know whether the put succeeded or not.

A queueing/messaging layer may help mitigate the communication problem, but the 
aforementioned data problems can still happen, so the app may still want to hear 
about the put response. However, it sounds right that the implementation of this 
layer will affect that of the async call. It makes sense to wait until it is 
clear how to make the client-to-TS communication reliable. If we eventually find 
that a handler in the async call is difficult and would further prevent 
optimization, then a sync write (as it stands now) for critical data plus an 
async write that is basically fire-and-forget sounds like a reasonable 
alternative plan.
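
A rough sketch of the callback shape under discussion (interface and method 
names are assumptions for illustration, not the YARN-2517 API):
{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;

// The handler is about server-side processing results (per-entity errors reported in
// TimelinePutResponse), which is a different concern from transport-level failures.
interface TimelinePutHandler {
  void onPutResponse(TimelinePutResponse response);
  void onError(Throwable cause);
}

interface TimelineClientAsyncSketch {
  // Caller wants an ack: the handler eventually sees the TimelinePutResponse or the failure.
  void putEntitiesAsync(TimelinePutHandler handler, TimelineEntity... entities);

  // Fire-and-forget variant for apps that don't care about the outcome.
  void putEntitiesAsync(TimelineEntity... entities);
}
{code}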

 Implement TimelineClientAsync
 -

 Key: YARN-2517
 URL: https://issues.apache.org/jira/browse/YARN-2517
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2517.1.patch, YARN-2517.2.patch


 In some scenarios, we'd like to put timeline entities in another thread no to 
 block the current one.
 It's good to have a TimelineClientAsync like AMRMClientAsync and 
 NMClientAsync. It can buffer entities, put them in a separate thread, and 
 have callback to handle the responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-11-24 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223993#comment-14223993
 ] 

Tsuyoshi OZAWA commented on YARN-2025:
--

Thanks for your point, [~rohithsharma]. I'll take a look.

 Possible NPE in schedulers#addApplicationAttempt()
 --

 Key: YARN-2025
 URL: https://issues.apache.org/jira/browse/YARN-2025
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2025.1.patch


 In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we 
 don't check whether {{application}} is null. This can cause an NPE in the following 
 sequence: addApplication() -> doneApplication() (e.g. AppKilledTransition) 
 -> addApplicationAttempt().
 {code}
 SchedulerApplication application =
 applications.get(applicationAttemptId.getApplicationId());
 String user = application.getUser();
 {code}
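
 A minimal sketch of the guard the description implies, written against the 
 snippet above (the log message and early return are assumptions, not the 
 committed patch):
{code}
SchedulerApplication application =
    applications.get(applicationAttemptId.getApplicationId());
if (application == null) {
  // The app may already be gone, e.g. addApplication() -> doneApplication()
  // (AppKilledTransition) raced with this addApplicationAttempt() call.
  LOG.warn("Application " + applicationAttemptId.getApplicationId()
      + " is gone, ignoring attempt " + applicationAttemptId);
  return;
}
String user = application.getUser();
{code}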



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class

2014-11-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224012#comment-14224012
 ] 

Jian He commented on YARN-2404:
---

Looks good, one minor thing:
we could just return after checking attemptTokens == null
{code}
if (attemptTokens == null) {
  builder.clearAppAttemptTokens();
}
{code}
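
A short sketch of the suggested early return (names taken from the snippet above):
{code}
if (attemptTokens == null) {
  builder.clearAppAttemptTokens();
  return;  // nothing more to set once the tokens are cleared
}
{code}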

 Remove ApplicationAttemptState and ApplicationState class in RMStateStore 
 class 
 

 Key: YARN-2404
 URL: https://issues.apache.org/jira/browse/YARN-2404
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, 
 YARN-2404.4.patch, YARN-2404.5.patch, YARN-2404.6.patch, YARN-2404.7.patch


 We can remove ApplicationState and ApplicationAttemptState class in 
 RMStateStore, given that we already have ApplicationStateData and 
 ApplicationAttemptStateData records. we may just replace ApplicationState 
 with ApplicationStateData, similarly for ApplicationAttemptState.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2014-11-24 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2637:
--
Attachment: YARN-2637.12.patch

Added a test specific to the changed behavior; all existing tests should still 
pass, and this patch should be ready for review.

 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
 YARN-2637.9.patch


 Currently, number of AM in leaf queue will be calculated in following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when submit new application to RM, it will check if an app can be 
 activated in following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
      i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() <
       getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example is:
 if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
 resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number 
 of AMs that can be launched is 200; if the user uses 5M for each AM 
 (> minimum_allocation), all apps can still be activated, and they will occupy 
 all the resources of the queue instead of only max_am_resource_percent of the 
 queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-11-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224028#comment-14224028
 ] 

Jian He commented on YARN-2025:
---

bq. This looks very strange that how can RMApp-FAILED but RMAppAttempt-null..?
YARN-2834 should fix this. [~rohithsharma], are you running a build with the 
patch or without?

 Possible NPE in schedulers#addApplicationAttempt()
 --

 Key: YARN-2025
 URL: https://issues.apache.org/jira/browse/YARN-2025
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2025.1.patch


 In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we 
 don't check whether {{application}} is null. This can cause an NPE in the following 
 sequence: addApplication() -> doneApplication() (e.g. AppKilledTransition) 
 -> addApplicationAttempt().
 {code}
 SchedulerApplication application =
 applications.get(applicationAttemptId.getApplicationId());
 String user = application.getUser();
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2014-11-24 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224034#comment-14224034
 ] 

Craig Welch commented on YARN-2637:
---

One open question still in my mind is whether or not the configuration 
parameter should be changed to actually behave as a percent.  Other things so 
named (userlimit, at least) are actually a percentage - and the name of this 
parameter tends to suggest that - but it is actually just a float value (so you 
would use .1 to limit to 10 percent of cluster resource, not 10...).  I did 
take a pass at making the change, it looks doable (with quite a few more test 
changes...).  On the one hand, it seems like the time to make this change, as 
the meaning of the value is changing considerably as it is.  On the other hand, 
it may be more impact than we want - as users who have configured, say, .3, 
will still have about the same behavior on a sizable cluster as they do today 
with the change as it is now, but if we modify it to actually behave as a 
percent value (e.g. / 100), then it will have a far more limiting impact (if 
the users do not adjust their configuration).  Thoughts?  Myself, I can see 
arguments both ways, though I'm leaning toward making the change to remove all 
of the surprise factor of how this parameter works... (e.g. make it a proper 
% value)
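
A tiny worked illustration of the two readings (numbers are illustrative only):
{code}
// Today the property is a raw fraction of the queue's resources:
float configured = 0.3f;
float maxAmShareToday = configured;             // 0.3 -> 30% of the queue

// If it were reinterpreted as a true percent (divided by 100):
float maxAmShareAsPercent = configured / 100f;  // 0.3 -> 0.3% of the queue, far more limiting
{code}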

 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
 YARN-2637.9.patch


 Currently, number of AM in leaf queue will be calculated in following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when submit new application to RM, it will check if an app can be 
 activated in following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
      i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() <
       getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example is:
 if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
 resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number 
 of AMs that can be launched is 200; if the user uses 5M for each AM 
 (> minimum_allocation), all apps can still be activated, and they will occupy 
 all the resources of the queue instead of only max_am_resource_percent of the 
 queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

