[jira] [Updated] (YARN-1702) Expose kill app functionality as part of RM web services

2014-02-24 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-1702:


Attachment: apache-yarn-1702.5.patch

 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, 
 apache-yarn-1702.4.patch, apache-yarn-1702.5.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1702) Expose kill app functionality as part of RM web services

2014-02-24 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-1702:


Attachment: (was: apache-yarn-1702.5.patch)

 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, 
 apache-yarn-1702.4.patch, apache-yarn-1702.5.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910095#comment-13910095
 ] 

Hadoop QA commented on YARN-1702:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12630626/apache-yarn-1702.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3162//console

This message is automatically generated.

 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, 
 apache-yarn-1702.4.patch, apache-yarn-1702.5.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1702) Expose kill app functionality as part of RM web services

2014-02-24 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-1702:


Attachment: apache-yarn-1702.5.patch

 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, 
 apache-yarn-1702.4.patch, apache-yarn-1702.5.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1702) Expose kill app functionality as part of RM web services

2014-02-24 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-1702:


Attachment: (was: apache-yarn-1702.5.patch)

 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, 
 apache-yarn-1702.4.patch, apache-yarn-1702.5.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910142#comment-13910142
 ] 

Hadoop QA commented on YARN-1702:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12630633/apache-yarn-1702.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3163//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3163//console

This message is automatically generated.

 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, 
 apache-yarn-1702.4.patch, apache-yarn-1702.5.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1686) NodeManager.resyncWithRM() does not handle exception which cause NodeManger to Hang.

2014-02-24 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1686:
-

Attachment: YARN-1686.2.patch

Thank you, Vinod, for reviewing the patch.

I have updated the patch to address all your comments. Please review the new patch.

Jian He, thanks for the motivation. :-)

 NodeManager.resyncWithRM() does not handle exception which cause NodeManger 
 to Hang.
 

 Key: YARN-1686
 URL: https://issues.apache.org/jira/browse/YARN-1686
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Rohith
Assignee: Rohith
 Fix For: 3.0.0

 Attachments: YARN-1686.1.patch, YARN-1686.2.patch


 During start of the NodeManager, if registration with the ResourceManager throws 
 an exception, the NodeManager shuts down. 
 Consider the case where NM-1 is registered with the RM and the RM issues a resync 
 to the NM. If any exception is thrown in resyncWithRM (which starts a new thread 
 that does not handle exceptions) during the RESYNC event, that thread is lost and 
 the NodeManager hangs. 
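
 For illustration, a minimal sketch of the kind of fix being discussed, assuming 
 the resync work runs in its own thread; the class, interface, and hook names 
 below are hypothetical and this is not the actual YARN-1686 patch:
 {code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch: the resync work runs in its own thread, so any uncaught
// exception would otherwise vanish with the thread and leave the NM hanging.
public class ResyncSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ResyncSketch.class);

  interface ResyncAction {
    void resync() throws Exception;   // e.g. re-register with the RM
  }

  static void resyncWithRM(final ResyncAction action, final Runnable shutdownHook) {
    new Thread("nm-resync") {
      @Override
      public void run() {
        try {
          action.resync();
        } catch (Throwable t) {
          // Surface the failure instead of silently losing it with the thread.
          LOG.error("Resync with RM failed, shutting down NodeManager", t);
          shutdownHook.run();
        }
      }
    }.start();
  }
}
 {code}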



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1686) NodeManager.resyncWithRM() does not handle exception which cause NodeManger to Hang.

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910219#comment-13910219
 ] 

Hadoop QA commented on YARN-1686:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630643/YARN-1686.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3164//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3164//console

This message is automatically generated.

 NodeManager.resyncWithRM() does not handle exception which cause NodeManger 
 to Hang.
 

 Key: YARN-1686
 URL: https://issues.apache.org/jira/browse/YARN-1686
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Rohith
Assignee: Rohith
 Fix For: 3.0.0

 Attachments: YARN-1686.1.patch, YARN-1686.2.patch


 During start of the NodeManager, if registration with the ResourceManager throws 
 an exception, the NodeManager shuts down. 
 Consider the case where NM-1 is registered with the RM and the RM issues a resync 
 to the NM. If any exception is thrown in resyncWithRM (which starts a new thread 
 that does not handle exceptions) during the RESYNC event, that thread is lost and 
 the NodeManager hangs. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2014-02-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910411#comment-13910411
 ] 

Jason Lowe commented on YARN-221:
-

bq. We can have RM & AM wait for notification as in container exit -> NM notifies 
RM -> RM notifies AM. That will create some delay for the AM to declare the job is 
done. With the NM -> RM heartbeat value used in big clusters, it could add a 
couple of seconds of delay for the job. That might not be a big deal for regular MR 
jobs.

The NM does out-of-band heartbeats when containers exit, so the turnaround time 
can be shorter than a full NM heartbeat interval. 

If we're really concerned about any additional time added for graceful task 
exit we can also have the AM unregister when the job succeeds/fails but before 
all tasks exit, and eventually the RM will kill all containers of the 
application when the AM eventually exits (or times out waiting).  In that sense 
it would not add any time from the job client's perspective, as the job could 
report completion at the same time it did before.  However it would add some 
time from the YARN perspective, as the application is lingering on the cluster 
a few extra seconds in the FINISHING state than it did before.

bq. One thing to add we need the definition and policy on how to handle those 
tasks that are in the finishing state and MR AM ends up stopping them as they 
don't exit by themselves.

I don't think we need to get too tricky here.  The NM will see the container 
return a non-zero exit code and assume that's failure.  If tasks are succeeding 
but returning non-zero exit codes then that's probably a bug and arguably a 
good thing we're grabbing the logs to show what went wrong when it tried to 
tear down.  IMHO we should fix what's causing the non-zero exit code rather 
than try to add a mechanism to prevent logs from being aggregated in what 
should be a rare and abnormal case.

 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Robert Joseph Evans
Assignee: Chris Trezzo
 Attachments: YARN-221-trunk-v1.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1336) Work-preserving nodemanager restart

2014-02-24 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1336:
-

Attachment: YARN-1336-rollup.patch

Attaching a rollup patch for the prototype that [~raviprak] and I developed.  
This recovers resource localization state, applications and containers, tokens, 
log aggregation, deletion service, and the MR shuffle auxiliary service.  A 
quick high-level overview:

- Restart functionality is enabled by configuring 
yarn.nodemanager.recovery.enabled to true and yarn.nodemanager.recovery.dir to 
a directory on the local filesystem where the state will be stored (see the 
sketch after this list).
- Containers are launched with an additional shell layer which places the exit 
code of the container in an .exitcode file.  This allows the restarted NM 
instance to recover containers that are already running or have exited since 
the last NM instance.
- NMStateStoreService is the abstraction layer for the state store.  
NMNullStateStoreService is used when recovery is disabled and 
NMLevelDBStateStoreService is used when it is enabled.
- Rather than explicitly record localized resource reference counts, resources 
are recovered with no references and recovered containers re-request their 
resources as during a normal container lifecycle to restore the reference 
counts.
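
A quick illustration of the two keys from the first bullet, set programmatically; 
the key strings are quoted from that bullet, and the directory value is just an 
example, not a recommended path:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: enable NM recovery and point it at a local state directory.
public class NmRecoveryConfigSketch {
  public static Configuration nmRecoveryConf() {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean("yarn.nodemanager.recovery.enabled", true);
    conf.set("yarn.nodemanager.recovery.dir", "/var/lib/hadoop-yarn/nm-recovery");
    return conf;
  }
}
{code}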

Some things that are still missing:
- ability to distinguish shutdown for restart vs. decommission
- proper handling of state store errors
- adding unit tests
- adding formal documentation.

Feedback is greatly appreciated.  I'll be working on addressing the missing 
items and splitting the patch into smaller pieces across the appropriate 
subtasks to simplify reviews.

 Work-preserving nodemanager restart
 ---

 Key: YARN-1336
 URL: https://issues.apache.org/jira/browse/YARN-1336
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
 Attachments: YARN-1336-rollup.patch


 This serves as an umbrella ticket for tasks related to work-preserving 
 nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (YARN-1336) Work-preserving nodemanager restart

2014-02-24 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned YARN-1336:


Assignee: Jason Lowe

 Work-preserving nodemanager restart
 ---

 Key: YARN-1336
 URL: https://issues.apache.org/jira/browse/YARN-1336
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1336-rollup.patch


 This serves as an umbrella ticket for tasks related to work-preserving 
 nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits

2014-02-24 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910615#comment-13910615
 ] 

Robert Kanter commented on YARN-1490:
-

By the way, the issue I mentioned a few comments 
[up|https://issues.apache.org/jira/browse/YARN-1490?focusedCommentId=13895329&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13895329]
 is actually now fixed by YARN-1689.  

 RM should optionally not kill all containers when an ApplicationMaster exits
 

 Key: YARN-1490
 URL: https://issues.apache.org/jira/browse/YARN-1490
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He
 Fix For: 2.4.0

 Attachments: YARN-1490.1.patch, YARN-1490.10.patch, 
 YARN-1490.11.patch, YARN-1490.11.patch, YARN-1490.12.patch, 
 YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, 
 YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch, 
 org.apache.oozie.service.TestRecoveryService_thread-dump.txt


 This is needed to enable work-preserving AM restart. Some apps can choose to 
 reconnect with old running containers; some may not want to. This should be 
 an option.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1730) Leveldb timeline store needs simple write locking

2014-02-24 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910612#comment-13910612
 ] 

Billie Rinaldi commented on YARN-1730:
--

I don't think using hold count will be sufficient.  The hold count only returns 
the number of holds that have been obtained by the current thread.  So as soon 
as the current thread is done with the lock, it would drop the lock from the 
lock map, which is not what we want.
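
To make the point concrete, here is a minimal sketch (not the patch) of a 
per-entity lock map that removes entries by a reference count of interested 
threads instead of by hold count; the refcount is updated inside 
ConcurrentHashMap.compute() so the decision to drop an entry is atomic with 
respect to other threads that still want the lock:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a hold count only describes the current thread, so it
// cannot tell us whether other threads still need the same lock object.
// Instead, count every interested thread and remove the entry only when that
// count reaches zero.
public class EntityLockMap {
  private static final class CountedLock {
    final ReentrantLock lock = new ReentrantLock();
    int users;  // mutated only inside ConcurrentHashMap.compute()
  }

  private final ConcurrentHashMap<String, CountedLock> locks = new ConcurrentHashMap<>();

  public void withEntityLock(String entityId, Runnable body) {
    // Atomically create-or-increment, so the entry cannot be removed underneath us.
    CountedLock cl = locks.compute(entityId, (k, v) -> {
      if (v == null) {
        v = new CountedLock();
      }
      v.users++;
      return v;
    });
    cl.lock.lock();
    try {
      body.run();  // e.g. look up / write the entity start time
    } finally {
      cl.lock.unlock();
      // Atomically decrement and drop the entry only when nobody else needs it.
      locks.compute(entityId, (k, v) -> (--v.users == 0) ? null : v);
    }
  }
}
{code}

The essential point is that the removal decision has to see all interested 
threads, not just the current one.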

 Leveldb timeline store needs simple write locking
 -

 Key: YARN-1730
 URL: https://issues.apache.org/jira/browse/YARN-1730
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1730.1.patch, YARN-1730.2.patch


 The actual data writes are performed atomically in a batch, but a lock should 
 be held while identifying a start time for the entity, which precedes every 
 write.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1755) Add support for web services to the WebApp proxy

2014-02-24 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-1755:
---

 Summary: Add support for web services to the WebApp proxy
 Key: YARN-1755
 URL: https://issues.apache.org/jira/browse/YARN-1755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev


The RM currently has an inbuilt web proxy that is used to serve requests. The 
web proxy is necessary for security reasons which are described on the Apache 
Hadoop website 
(http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html).
  The web application proxy is a part of YARN and can be configured to run as a 
standalone proxy. Currently, the RM itself supports web services. Adding 
support for all the web service calls in the web app proxy allows it to support 
failover and retry for all web services. The changes involved are the following 
–
a.  Add support for web service calls to the RM web application proxy and 
have it make the equivalent RPC calls.
b.  Add support for failover and retry to the web application proxy. We can 
refactor a lot of the existing client code from the Yarn client.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1686) NodeManager.resyncWithRM() does not handle exception which cause NodeManger to Hang.

2014-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1686:
--

Attachment: YARN-1686.3.patch

Same patch as before but with a test time-out. Will check it in once Jenkins 
says okay..

 NodeManager.resyncWithRM() does not handle exception which cause NodeManger 
 to Hang.
 

 Key: YARN-1686
 URL: https://issues.apache.org/jira/browse/YARN-1686
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Rohith
Assignee: Rohith
 Attachments: YARN-1686.1.patch, YARN-1686.2.patch, YARN-1686.3.patch


 During start of NodeManager,if registration with resourcemanager throw 
 exception then nodemager shutdown happens. 
 Consider case where NM-1 is registered with RM. RM issued Resync to NM. If 
 any exception thrown in resyncWithRM (starts new thread which does not 
 handle exception) during RESYNC evet, then this thread is lost. NodeManger 
 enters hanged state. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-986) YARN should use cluster-id as token service address

2014-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910690#comment-13910690
 ] 

Vinod Kumar Vavilapalli commented on YARN-986:
--

Couldn't find time last week, will look at it today..

 YARN should use cluster-id as token service address
 ---

 Key: YARN-986
 URL: https://issues.apache.org/jira/browse/YARN-986
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-986-1.patch, yarn-986-prelim-0.patch


 This needs to be done to support non-IP-based failover of the RM. Once the 
 server sets the token service address to be this generic ClusterId/ServiceId, 
 clients can translate it to appropriate final IP and then be able to select 
 tokens via TokenSelectors.
 Some workarounds for other related issues were put in place at YARN-945.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1754) Container process is not really killed

2014-02-24 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910693#comment-13910693
 ] 

Gera Shegalov commented on YARN-1754:
-

Get https://github.com/jerrykuch/ersatz-setsid and make sure that setsid is on 
your standard PATH.

 Container process is not really killed
 --

 Key: YARN-1754
 URL: https://issues.apache.org/jira/browse/YARN-1754
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.2.0
 Environment: Mac
Reporter: Jeff Zhang

 I test the following distributed shell example on my mac:
 hadoop jar 
 share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar 
 -appname shell -jar 
 share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar 
 -shell_command=sleep -shell_args=10 -num_containers=1
 This starts two processes for one container: one is the shell process, and the 
 other is the real command I execute (here, sleep 10). 
 I then kill this application by running the command yarn application -kill 
 app_id.
 It kills the shell process, but it does not kill the real command process. The 
 reason is that YARN uses the kill command to kill the process, but that does not 
 kill its child processes; using pkill could resolve this issue.
 IMHO, this is a very important case: it makes resource usage accounting 
 inconsistent and is a potential security problem. 
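
 For reference, a rough sketch of the setsid/process-group approach suggested 
 above; the commands and class are illustrative only and this is not the 
 NodeManager's actual launch path:
 {code:java}
import java.io.IOException;

// Rough sketch of the workaround discussed above: start the container shell
// under setsid so it becomes a process-group leader, then signal the whole
// group (negative pid) so children like "sleep 10" are killed too.
public class ProcessGroupKillSketch {
  public static Process launchInOwnGroup(String command) throws IOException {
    // setsid gives the child its own session and process group (pid == pgid).
    return new ProcessBuilder("setsid", "bash", "-c", command).inheritIO().start();
  }

  public static void killGroup(long pgid) throws IOException, InterruptedException {
    // "kill -- -PGID" sends SIGTERM to every process in the group.
    new ProcessBuilder("kill", "--", "-" + pgid).inheritIO().start().waitFor();
  }
}
 {code}
 On a Mac, the setsid binary is not installed by default, which is why the 
 ersatz-setsid suggestion above applies.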



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits

2014-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910697#comment-13910697
 ] 

Vinod Kumar Vavilapalli commented on YARN-1490:
---

Thanks for the update [~rkanter].

 RM should optionally not kill all containers when an ApplicationMaster exits
 

 Key: YARN-1490
 URL: https://issues.apache.org/jira/browse/YARN-1490
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He
 Fix For: 2.4.0

 Attachments: YARN-1490.1.patch, YARN-1490.10.patch, 
 YARN-1490.11.patch, YARN-1490.11.patch, YARN-1490.12.patch, 
 YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, 
 YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch, 
 org.apache.oozie.service.TestRecoveryService_thread-dump.txt


 This is needed to enable work-preserving AM restart. Some apps can choose to 
 reconnect with old running containers; some may not want to. This should be 
 an option.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1741) XInclude support broken for YARN ResourceManager

2014-02-24 Thread Eric Sirianni (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910703#comment-13910703
 ] 

Eric Sirianni commented on YARN-1741:
-

Yes - This was the approach I was planning on investigating with a potential 
patch.  The trick is how to most cleanly get that to work with the 
{{ConfigurationProvider}} API.  Two main approaches seem possible:
# Change {{ConfigurationProvider.getConfigurationInputStream()}} to return a 
{{(String, InputStream)}} pair.
# Change {{ConfigurationProvider}} to provide directly into the 
{{Configuration}} object itself.  Something like 
{{ConfigurationProvider.provideTo(Configuration conf)}}.  With this approach, 
the different {{ConfigurationProvider}} subclasses could invoke the specific 
{{conf.addResource()}} overload that made sense for the subclass.

Based on investigating the usages of 
{{ConfigurationProvider.getConfigurationInputStream()}}, I was leaning towards 
the 2nd approach.
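
To make approach #2 concrete, a hypothetical sketch (the provideTo() name and the 
interface below are illustrative, not the actual ConfigurationProvider API): each 
provider picks the Configuration.addResource() overload that preserves the most 
information, so a local file keeps its absolute path as the systemId and XIncludes 
keep resolving:

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical shape of approach #2; names and signatures are illustrative only.
interface ConfigurationProviderSketch {
  void provideTo(Configuration conf, String resourceName) throws IOException;
}

// Local files: the Path overload keeps the absolute path as the systemId,
// so relative XIncludes resolve as before.
class LocalProviderSketch implements ConfigurationProviderSketch {
  @Override
  public void provideTo(Configuration conf, String resourceName) {
    conf.addResource(new Path(resourceName));
  }
}

// Remote (e.g. filesystem-backed) resources: only an InputStream is available,
// so this provider would still need another way to supply a base URI.
class FileSystemProviderSketch implements ConfigurationProviderSketch {
  private final FileSystem fs;
  private final Path baseDir;

  FileSystemProviderSketch(FileSystem fs, Path baseDir) {
    this.fs = fs;
    this.baseDir = baseDir;
  }

  @Override
  public void provideTo(Configuration conf, String resourceName) throws IOException {
    conf.addResource(fs.open(new Path(baseDir, resourceName)));
  }
}
{code}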

 XInclude support broken for YARN ResourceManager
 

 Key: YARN-1741
 URL: https://issues.apache.org/jira/browse/YARN-1741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Eric Sirianni
Priority: Minor
  Labels: regression

 The XInclude support in Hadoop configuration files (introduced via 
 HADOOP-4944) was broken by the recent {{ConfigurationProvider}} changes to 
 YARN ResourceManager.  Specifically, YARN-1459 and, more generally, the 
 YARN-1611 family of JIRAs for ResourceManager HA.
 The issue is that {{ConfigurationProvider}} provides a raw {{InputStream}} as 
 a {{Configuration}} resource for what was previously a {{Path}}-based 
 resource.  
 For {{Path}} resources, the absolute file path is used as the {{systemId}} 
 for the {{DocumentBuilder.parse()}} call:
 {code}
   } else if (resource instanceof Path) {  // a file resource
 ...
   doc = parse(builder, new BufferedInputStream(
   new FileInputStream(file)), ((Path)resource).toString());
 }
 {code}
 The {{systemId}} is used to resolve XIncludes (among other things):
 {code}
 /**
  * Parse the content of the given <code>InputStream</code> as an
  * XML document and return a new DOM Document object.
 ...
  * @param systemId Provide a base for resolving relative URIs.
 ...
  */
 public Document parse(InputStream is, String systemId)
 {code}
 However, for loading raw {{InputStream}} resources, the {{systemId}} is set 
 to {{null}}:
 {code}
   } else if (resource instanceof InputStream) {
 doc = parse(builder, (InputStream) resource, null);
 {code}
 causing XInclude resolution to fail.
 In our particular environment, we make extensive use of XIncludes to 
 standardize common configuration parameters across multiple Hadoop clusters.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1740) Redirection from AM-URL is broken with HTTPS_ONLY policy

2014-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1740:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1280

 Redirection from AM-URL is broken with HTTPS_ONLY policy
 

 Key: YARN-1740
 URL: https://issues.apache.org/jira/browse/YARN-1740
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora
Assignee: Jian He
 Attachments: YARN-1740.1.patch


 Steps to reproduce:
 1) Run a sleep job
 2) Run: yarn application -list command to find AM URL.
 root@host1:~# yarn application -list
 Total number of applications (application-types: [] and states: SUBMITTED, 
 ACCEPTED, RUNNING):1
 Application-Id Application-Name Application-Type User Queue State Final-State 
 Progress Tracking-URL
 application_1383251398986_0003 Sleep job MAPREDUCE hdfs default RUNNING 
 UNDEFINED 5% http://host1:40653
 3) Try to access the http://host1:40653/ws/v1/mapreduce/info URL.
 This URL redirects to 
 http://RM_host:RM_https_port/proxy/application_1383251398986_0003/ws/v1/mapreduce/info
 Here, the HTTP protocol is used with the HTTPS port of the RM.
 The expected Url is 
 https://RM_host:RM_https_port/proxy/application_1383251398986_0003/ws/v1/mapreduce/info



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1515) Ability to dump the container threads and stop the containers in a single RPC

2014-02-24 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated YARN-1515:


Attachment: YARN-1515.v05.patch

v05 adds an auto thread dump for stuck AMs as well.

 Ability to dump the container threads and stop the containers in a single RPC
 -

 Key: YARN-1515
 URL: https://issues.apache.org/jira/browse/YARN-1515
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, nodemanager
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, 
 YARN-1515.v03.patch, YARN-1515.v04.patch, YARN-1515.v05.patch


 This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for 
 timed-out task attempts.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1686) NodeManager.resyncWithRM() does not handle exception which cause NodeManger to Hang.

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910731#comment-13910731
 ] 

Hadoop QA commented on YARN-1686:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630777/YARN-1686.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3165//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3165//console

This message is automatically generated.

 NodeManager.resyncWithRM() does not handle exception which cause NodeManger 
 to Hang.
 

 Key: YARN-1686
 URL: https://issues.apache.org/jira/browse/YARN-1686
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Rohith
Assignee: Rohith
 Attachments: YARN-1686.1.patch, YARN-1686.2.patch, YARN-1686.3.patch


 During start of the NodeManager, if registration with the ResourceManager throws 
 an exception, the NodeManager shuts down. 
 Consider the case where NM-1 is registered with the RM and the RM issues a resync 
 to the NM. If any exception is thrown in resyncWithRM (which starts a new thread 
 that does not handle exceptions) during the RESYNC event, that thread is lost and 
 the NodeManager hangs. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-02-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910744#comment-13910744
 ] 

Jian He commented on YARN-1734:
---

ServiceFailedException is also a type of IOException that will be retried at the 
RPC level by RMProxy.

 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch


 Currently, we have ConfigurationProvider, which can support 
 LocalConfiguration and FileSystemBasedConfiguration. When HA is enabled and 
 FileSystemBasedConfiguration is enabled, the RM cannot get the updated 
 Configuration when it transitions from Standby to Active.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-02-24 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910756#comment-13910756
 ] 

Xuan Gong commented on YARN-1734:
-

bq. ServiceFailedException is also a type of IOException that will be retried 
at the RPC level by RMProxy

In HA, we provide a different RetryPolicy, which is failoverOnNetworkException.
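
For context, a small hedged example of constructing that kind of policy with the 
RetryPolicies factory; the fallback policy and the exact arguments YARN uses in 
HA mode are not shown in this thread and are only assumptions here:

{code:java}
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

// Illustration only: fail over to the other RM on network-type exceptions,
// otherwise fall back to try-once-then-fail.
public class HaRetryPolicySketch {
  public static RetryPolicy haPolicy(int maxFailovers) {
    return RetryPolicies.failoverOnNetworkException(
        RetryPolicies.TRY_ONCE_THEN_FAIL, maxFailovers);
  }
}
{code}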

 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch


 Currently, we have ConfigurationProvider, which can support 
 LocalConfiguration and FileSystemBasedConfiguration. When HA is enabled and 
 FileSystemBasedConfiguration is enabled, the RM cannot get the updated 
 Configuration when it transitions from Standby to Active.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1741) XInclude support broken for YARN ResourceManager

2014-02-24 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910759#comment-13910759
 ] 

Xuan Gong commented on YARN-1741:
-

Note that ConfigurationProvider not only provides the InputStream for 
Configuration files, it also provides the InputStream for the include_node and 
exclude_node files.

 XInclude support broken for YARN ResourceManager
 

 Key: YARN-1741
 URL: https://issues.apache.org/jira/browse/YARN-1741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Eric Sirianni
Priority: Minor
  Labels: regression

 The XInclude support in Hadoop configuration files (introduced via 
 HADOOP-4944) was broken by the recent {{ConfigurationProvider}} changes to 
 YARN ResourceManager.  Specifically, YARN-1459 and, more generally, the 
 YARN-1611 family of JIRAs for ResourceManager HA.
 The issue is that {{ConfigurationProvider}} provides a raw {{InputStream}} as 
 a {{Configuration}} resource for what was previously a {{Path}}-based 
 resource.  
 For {{Path}} resources, the absolute file path is used as the {{systemId}} 
 for the {{DocumentBuilder.parse()}} call:
 {code}
   } else if (resource instanceof Path) {  // a file resource
 ...
   doc = parse(builder, new BufferedInputStream(
   new FileInputStream(file)), ((Path)resource).toString());
 }
 {code}
 The {{systemId}} is used to resolve XIncludes (among other things):
 {code}
 /**
  * Parse the content of the given <code>InputStream</code> as an
  * XML document and return a new DOM Document object.
 ...
  * @param systemId Provide a base for resolving relative URIs.
 ...
  */
 public Document parse(InputStream is, String systemId)
 {code}
 However, for loading raw {{InputStream}} resources, the {{systemId}} is set 
 to {{null}}:
 {code}
   } else if (resource instanceof InputStream) {
 doc = parse(builder, (InputStream) resource, null);
 {code}
 causing XInclude resolution to fail.
 In our particular environment, we make extensive use of XIncludes to 
 standardize common configuration parameters across multiple Hadoop clusters.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1741) XInclude support broken for YARN ResourceManager

2014-02-24 Thread Eric Sirianni (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910766#comment-13910766
 ] 

Eric Sirianni commented on YARN-1741:
-

OK - approach 2 would not work then.  I thought when I did a usage search that 
all callers of {{ConfigurationProvider.getConfigurationInputStream()}} were 
immediately handing the returned {{InputStream}} to a {{Configuration}} object. 
 Guess I missed some usages.

 XInclude support broken for YARN ResourceManager
 

 Key: YARN-1741
 URL: https://issues.apache.org/jira/browse/YARN-1741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Eric Sirianni
Priority: Minor
  Labels: regression

 The XInclude support in Hadoop configuration files (introduced via 
 HADOOP-4944) was broken by the recent {{ConfigurationProvider}} changes to 
 YARN ResourceManager.  Specifically, YARN-1459 and, more generally, the 
 YARN-1611 family of JIRAs for ResourceManager HA.
 The issue is that {{ConfigurationProvider}} provides a raw {{InputStream}} as 
 a {{Configuration}} resource for what was previously a {{Path}}-based 
 resource.  
 For {{Path}} resources, the absolute file path is used as the {{systemId}} 
 for the {{DocumentBuilder.parse()}} call:
 {code}
   } else if (resource instanceof Path) {  // a file resource
 ...
   doc = parse(builder, new BufferedInputStream(
   new FileInputStream(file)), ((Path)resource).toString());
 }
 {code}
 The {{systemId}} is used to resolve XIncludes (among other things):
 {code}
 /**
  * Parse the content of the given <code>InputStream</code> as an
  * XML document and return a new DOM Document object.
 ...
  * @param systemId Provide a base for resolving relative URIs.
 ...
  */
 public Document parse(InputStream is, String systemId)
 {code}
 However, for loading raw {{InputStream}} resources, the {{systemId}} is set 
 to {{null}}:
 {code}
   } else if (resource instanceof InputStream) {
 doc = parse(builder, (InputStream) resource, null);
 {code}
 causing XInclude resolution to fail.
 In our particular environment, we make extensive use of XIncludes to 
 standardize common configuration parameters across multiple Hadoop clusters.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1619) Add cli to kill yarn container

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1619:


Fix Version/s: (was: 2.3.0)
   2.4.0

 Add cli to kill yarn container
 --

 Key: YARN-1619
 URL: https://issues.apache.org/jira/browse/YARN-1619
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Ramya Sunil
 Fix For: 2.4.0


 It will be useful to have a generic cli tool to kill containers.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1621) Add CLI to list states of yarn container-IDs/hosts

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1621:


Fix Version/s: (was: 2.3.0)
   2.4.0

 Add CLI to list states of yarn container-IDs/hosts
 --

 Key: YARN-1621
 URL: https://issues.apache.org/jira/browse/YARN-1621
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Tassapol Athiapinya
 Fix For: 2.4.0


 As more applications are moved to YARN, we need generic CLI to list states of 
 yarn containers and their hosts. Today if YARN application running in a 
 container does hang, there is no way other than to manually kill its process.
 For each running application, it is useful to differentiate between 
 running/succeeded/failed/killed containers. 
 {code:title=proposed yarn cli}
 $ yarn application -list-containers <appId> <status>
 where <status> is one of running/succeeded/killed/failed/all
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1334) YARN should give more info on errors when running failed distributed shell command

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1334:


Fix Version/s: (was: 2.3.0)
   2.4.0

 YARN should give more info on errors when running failed distributed shell 
 command
 --

 Key: YARN-1334
 URL: https://issues.apache.org/jira/browse/YARN-1334
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Affects Versions: 2.3.0
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1334.1.patch


 Running an incorrect command such as:
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar <distributedshell jar> -shell_command ./test1.sh -shell_script ./
 shows a shell exit code exception with no useful message. It should print 
 out the sysout/syserr of the containers/AM to explain why it is failing.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1514:


Fix Version/s: (was: 2.3.0)
   2.4.0

 Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
 

 Key: YARN-1514
 URL: https://issues.apache.org/jira/browse/YARN-1514
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.4.0


 ZKRMStateStore is very sensitive to ZNode-related operations as discussed in 
 YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is 
 called when an RM-HA cluster does a failover, so its execution time 
 impacts the failover time of RM-HA.
 We need a utility to benchmark the execution time of ZKRMStateStore#loadState 
 as a development tool.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1147) Add end-to-end tests for HA

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1147:


Fix Version/s: (was: 2.3.0)
   2.4.0

 Add end-to-end tests for HA
 ---

 Key: YARN-1147
 URL: https://issues.apache.org/jira/browse/YARN-1147
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.4.0


 While individual sub-tasks add tests for the code they include, it will be 
 handy to write end-to-end tests for HA including some stress testing.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1301) Need to log the blacklist additions/removals when YarnSchedule#allocate

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1301:


Fix Version/s: (was: 2.3.0)
   2.4.0

 Need to log the blacklist additions/removals when YarnSchedule#allocate
 ---

 Key: YARN-1301
 URL: https://issues.apache.org/jira/browse/YARN-1301
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
Priority: Minor
 Fix For: 2.4.0

 Attachments: YARN-1301.1.patch, YARN-1301.2.patch, YARN-1301.3.patch, 
 YARN-1301.4.patch, YARN-1301.5.patch


 Without the log, it's hard to debug whether the blacklist is updated on the 
 scheduler side or not.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1561) Fix a generic type warning in FairScheduler

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1561:


Fix Version/s: (was: 2.3.0)
   2.4.0

 Fix a generic type warning in FairScheduler
 ---

 Key: YARN-1561
 URL: https://issues.apache.org/jira/browse/YARN-1561
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Junping Du
Assignee: Chen He
Priority: Minor
  Labels: newbie
 Fix For: 2.4.0

 Attachments: yarn-1561.patch


 The Comparator below should be specified with type:
 private Comparator nodeAvailableResourceComparator =
   new NodeAvailableResourceComparator(); 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1142:


Fix Version/s: (was: 2.3.0)
   2.4.0

 MiniYARNCluster web ui does not work properly
 -

 Key: YARN-1142
 URL: https://issues.apache.org/jira/browse/YARN-1142
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
 Fix For: 2.4.0


 When going to the RM HTTP port, the NM web UI is displayed. It seems there is 
 a singleton somewhere that breaks things when the RM & NMs run in the same 
 process.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1375) RM logs get filled with scheduler monitor logs when we enable scheduler monitoring

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1375:


Fix Version/s: (was: 2.3.0)
   2.4.0

 RM logs get filled with scheduler monitor logs when we enable scheduler 
 monitoring
 --

 Key: YARN-1375
 URL: https://issues.apache.org/jira/browse/YARN-1375
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: haosdent
 Fix For: 2.4.0

 Attachments: YARN-1375.patch


 When we enable the scheduler monitor, it fills the RM logs with the same 
 queue state periodically. We could log only when there is a difference from the 
 previous state instead of repeating the same message (see the sketch after the 
 log excerpt below). 
 {code:xml}
 2013-10-30 23:30:08,464 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:11,464 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:14,465 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:17,466 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:20,466 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:23,467 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:26,468 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:29,468 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156029468, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:32,469 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156032469, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 {code}
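
 A minimal sketch of the log-on-change idea (illustrative only, not the 
 preemption policy's actual code); note the comparison has to exclude the leading 
 timestamp field of the QUEUESTATE line, which changes every interval:
 {code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Remember the last logged queue state and emit a new INFO line only when it
// changes. The caller should pass the state string without the timestamp field.
public class ChangeOnlyQueueStateLogger {
  private static final Logger LOG = LoggerFactory.getLogger(ChangeOnlyQueueStateLogger.class);
  private String lastState;

  public synchronized void maybeLog(String queueStateWithoutTimestamp) {
    if (!queueStateWithoutTimestamp.equals(lastState)) {
      LOG.info("QUEUESTATE: {}", queueStateWithoutTimestamp);
      lastState = queueStateWithoutTimestamp;
    }
  }
}
 {code}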



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-745:
---

Fix Version/s: (was: 2.3.0)
   2.4.0

 Move UnmanagedAMLauncher to yarn client package
 ---

 Key: YARN-745
 URL: https://issues.apache.org/jira/browse/YARN-745
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Fix For: 2.4.0


 It's currently sitting in the yarn applications project, which seems wrong. The 
 client project sounds better, since it contains the utilities/libraries that 
 clients use to write and debug YARN applications.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1330) Fair Scheduler: defaultQueueSchedulingPolicy does not take effect

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1330:


Fix Version/s: (was: 2.3.0)
   2.4.0

 Fair Scheduler: defaultQueueSchedulingPolicy does not take effect
 -

 Key: YARN-1330
 URL: https://issues.apache.org/jira/browse/YARN-1330
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.4.0

 Attachments: YARN-1330-1.patch, YARN-1330-1.patch, YARN-1330.patch


 The defaultQueueSchedulingPolicy property for the Fair Scheduler allocations 
 file doesn't take effect.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1477) No Submit time on AM web pages

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1477:


Fix Version/s: (was: 2.3.0)
   2.4.0

 No Submit time on AM web pages
 --

 Key: YARN-1477
 URL: https://issues.apache.org/jira/browse/YARN-1477
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Chen He
Assignee: Chen He
  Labels: features
 Fix For: 2.4.0


 Similar to MAPREDUCE-5052, this is a fix on the AM side: add a submitTime field 
 to the AM's web services REST API.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1234:


Fix Version/s: (was: 2.3.0)
   2.4.0

  Container localizer logs are not created in secured cluster
 

 Key: YARN-1234
 URL: https://issues.apache.org/jira/browse/YARN-1234
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Fix For: 2.4.0


 When we run the ContainerLocalizer in a secured cluster, we potentially do 
 not create any log file to track log messages. Creating one would help in 
 identifying ContainerLocalizer issues in a secured cluster.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1156:


Fix Version/s: (was: 2.3.0)
   2.4.0

 Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
 -

 Key: YARN-1156
 URL: https://issues.apache.org/jira/browse/YARN-1156
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.1.0-beta
Reporter: Akira AJISAKA
Assignee: Tsuyoshi OZAWA
Priority: Minor
  Labels: metrics, newbie
 Fix For: 2.4.0

 Attachments: YARN-1156.1.patch


 The AllocatedGB and AvailableGB metrics are currently integers. If 500MB of 
 memory is allocated to containers four times, AllocatedGB is incremented four 
 times by {{(int)500/1024}}, which is 0. That is, the memory actually allocated 
 is 2000MB, but the metric shows 0GB. Let's use a float type for these 
 metrics.
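
 A tiny standalone illustration of the rounding problem described above (not YARN 
 code; the class name is made up for the demo):
 {code:java}
public class GbMetricDemo {
  public static void main(String[] args) {
    int allocatedGbInt = 0;
    float allocatedGbFloat = 0f;
    for (int i = 0; i < 4; i++) {
      allocatedGbInt += 500 / 1024;     // integer division: adds 0 each time
      allocatedGbFloat += 500 / 1024f;  // float division: adds ~0.488 each time
    }
    System.out.println(allocatedGbInt);    // 0
    System.out.println(allocatedGbFloat);  // 1.953125
  }
}
 {code}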



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-650) User guide for preemption

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-650:
---

Fix Version/s: (was: 2.3.0)
   2.4.0

 User guide for preemption
 -

 Key: YARN-650
 URL: https://issues.apache.org/jira/browse/YARN-650
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Chris Douglas
Priority: Minor
 Fix For: 2.4.0

 Attachments: Y650-0.patch


 YARN-45 added a protocol for the RM to ask back resources. The docs on 
 writing YARN applications should include a section on how to interpret this 
 message.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-153) PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-153:
---

Fix Version/s: (was: 2.3.0)
   2.4.0

 PaaS on YARN: an YARN application to demonstrate that YARN can be used as a 
 PaaS
 

 Key: YARN-153
 URL: https://issues.apache.org/jira/browse/YARN-153
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Jacob Jaigak Song
Assignee: Jacob Jaigak Song
 Fix For: 2.4.0

 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, 
 MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, 
 MAPREDUCE4393.patch

   Original Estimate: 336h
  Time Spent: 336h
  Remaining Estimate: 0h

 This application demonstrates that YARN can be used for non-MapReduce 
 applications. As Hadoop has already been widely adopted and deployed, and its 
 deployment will only grow, we think it has good potential to be used as a 
 PaaS.
 I have implemented a proof of concept to demonstrate that YARN can be used as 
 a PaaS (Platform as a Service). I did a gap analysis against VMware's Cloud 
 Foundry and tried to achieve as many PaaS functionalities as possible on 
 YARN.
 I'd like to check in this POC as a YARN example application.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-314:
---

Fix Version/s: (was: 2.3.0)
   2.4.0

 Schedulers should allow resource requests of different sizes at the same 
 priority and location
 --

 Key: YARN-314
 URL: https://issues.apache.org/jira/browse/YARN-314
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.4.0


 Currently, resource requests for the same container and locality are expected 
 to all be the same size.
 While it doesn't look like this is needed for apps currently, and it can be 
 circumvented by specifying different priorities if absolutely necessary, it 
 seems to me that the ability to request containers with different resource 
 requirements at the same priority level should be there for the future and 
 for completeness' sake.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-160:
---

Fix Version/s: (was: 2.3.0)
   2.4.0

 nodemanagers should obtain cpu/memory values from underlying OS
 ---

 Key: YARN-160
 URL: https://issues.apache.org/jira/browse/YARN-160
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
 Fix For: 2.4.0


 As mentioned in YARN-2:
 *NM memory and CPU configs*
 Currently these values come from the NM's config. We should be able to obtain 
 them from the OS (i.e., in the case of Linux, from /proc/meminfo and 
 /proc/cpuinfo). As this is highly OS dependent, we should have an interface 
 that obtains this information. In addition, implementations of this interface 
 should be able to specify a mem/cpu offset (the amount of mem/cpu not to be 
 made available as a YARN resource), which would allow reserving mem/cpu for 
 the OS and other services running outside of YARN containers.
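 As a rough illustration of the kind of interface described above (all names 
 here are hypothetical, not an existing YARN API):
 {code}
 /** Hypothetical plugin that reports the node's resources to the NM. */
 public interface NodeResourceCalculator {

   /** Total physical memory in MB, e.g. parsed from /proc/meminfo on Linux. */
   long getTotalMemoryMB();

   /** Number of usable cores, e.g. parsed from /proc/cpuinfo on Linux. */
   int getNumCores();

   /** Memory in MB to hold back for the OS and other non-YARN services. */
   long getMemoryOffsetMB();

   /** Cores to hold back for the OS and other non-YARN services. */
   int getCoreOffset();

   /** What the NM would actually advertise to the RM as available memory. */
   default long getAvailableMemoryMB() {
     return Math.max(0, getTotalMemoryMB() - getMemoryOffsetMB());
   }

   /** What the NM would actually advertise to the RM as available cores. */
   default int getAvailableCores() {
     return Math.max(0, getNumCores() - getCoreOffset());
   }
 }
 {code}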



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-113) WebAppProxyServlet must use SSLFactory for the HttpClient connections

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-113:
---

Fix Version/s: (was: 2.3.0)
   2.4.0

 WebAppProxyServlet must use SSLFactory for the HttpClient connections
 -

 Key: YARN-113
 URL: https://issues.apache.org/jira/browse/YARN-113
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.4.0


 The HttpClient must be configured to use the SSLFactory when the web UIs are 
 over HTTPS, otherwise the proxy servlet fails to connect to the AM because of 
 unknown (self-signed) certificates.
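 A minimal sketch of what using Hadoop's SSLFactory on the client side could 
 look like; this is illustrative only (the host/port URL is a placeholder) and 
 is not the actual servlet code, which has its own HTTP client plumbing:
 {code}
 import java.net.URL;
 import javax.net.ssl.HttpsURLConnection;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.security.ssl.SSLFactory;

 public class SslClientSketch {
   public static void main(String[] args) throws Exception {
     Configuration conf = new Configuration();
     // Client-mode factory reads ssl-client.xml (truststore with the AM's cert).
     SSLFactory sslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
     sslFactory.init();
     try {
       HttpsURLConnection conn = (HttpsURLConnection)
           new URL("https://am-host.example.com:8090/").openConnection();
       conn.setSSLSocketFactory(sslFactory.createSSLSocketFactory());
       conn.setHostnameVerifier(sslFactory.getHostnameVerifier());
       System.out.println("HTTP " + conn.getResponseCode());
     } finally {
       sslFactory.destroy();
     }
   }
 }
 {code}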



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1064:


Fix Version/s: (was: 2.3.0)
   2.4.0

 YarnConfiguration scheduler configuration constants are not consistent
 --

 Key: YARN-1064
 URL: https://issues.apache.org/jira/browse/YARN-1064
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Priority: Blocker
  Labels: newbie
 Fix For: 2.4.0


 Some of the scheduler configuration constants in YarnConfiguration have 
 RM_PREFIX and others YARN_PREFIX. For consistency we should move all under 
 the same prefix.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-322) Add cpu information to queue metrics

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-322:
---

Fix Version/s: (was: 2.3.0)
   2.4.0

 Add cpu information to queue metrics
 

 Key: YARN-322
 URL: https://issues.apache.org/jira/browse/YARN-322
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, scheduler
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 2.4.0


 Post YARN-2 we need to add cpu information to queue metrics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-965) NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-965:
---

Fix Version/s: (was: 2.3.0)
   2.4.0

 NodeManager Metrics containersRunning is not correct When localizing 
 container process is failed or killed
 --

 Key: YARN-965
 URL: https://issues.apache.org/jira/browse/YARN-965
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha
 Environment: suse linux
Reporter: Li Yuan
 Fix For: 2.4.0


 When a container is successfully launched, its state moves from LOCALIZED to 
 RUNNING and containersRunning is incremented. When the state moves from 
 EXITED_WITH_FAILURE or KILLING to DONE, containersRunning is decremented.
 However, EXITED_WITH_FAILURE or KILLING can also be reached from LOCALIZING 
 (or LOCALIZED), not just RUNNING, which makes containersRunning smaller than 
 the actual number. Furthermore, the metrics become inconsistent: 
 containersLaunched != containersCompleted + containersFailed + 
 containersKilled + containersRunning + containersIniting
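 A hedged sketch of one possible fix, only decrementing the running counter for 
 containers that actually reached RUNNING (the method and field names below are 
 illustrative, not the real NodeManager code):
 {code}
 // Illustrative only: remember whether this container was counted as running.
 private boolean countedAsRunning = false;

 void onStateTransition(ContainerState from, ContainerState to) {
   if (to == ContainerState.RUNNING) {
     metrics.runningContainer();        // containersRunning++
     countedAsRunning = true;
   }
   if (to == ContainerState.DONE && countedAsRunning) {
     metrics.endRunningContainer();     // containersRunning-- only if it ran
     countedAsRunning = false;
   }
   // Containers that fail or are killed during localization never touch
   // containersRunning; they are counted via containersFailed/containersKilled.
 }
 {code}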



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-308) Improve documentation about what asks means in AMRMProtocol

2014-02-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-308:
---

Fix Version/s: (was: 2.3.0)
   2.4.0

 Improve documentation about what asks means in AMRMProtocol
 -

 Key: YARN-308
 URL: https://issues.apache.org/jira/browse/YARN-308
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, documentation, resourcemanager
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.4.0

 Attachments: YARN-308.patch


 It's unclear to me from reading the javadoc exactly what "asks" means when 
 the AM sends a heartbeat to the RM. Is the AM supposed to send a list of all 
 resources that it is waiting for? Or just inform the RM about new ones that 
 it wants?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1515) Ability to dump the container threads and stop the containers in a single RPC

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910783#comment-13910783
 ] 

Hadoop QA commented on YARN-1515:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630783/YARN-1515.v05.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3166//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3166//console

This message is automatically generated.

 Ability to dump the container threads and stop the containers in a single RPC
 -

 Key: YARN-1515
 URL: https://issues.apache.org/jira/browse/YARN-1515
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, nodemanager
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, 
 YARN-1515.v03.patch, YARN-1515.v04.patch, YARN-1515.v05.patch


 This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for 
 timed-out task attempts.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910788#comment-13910788
 ] 

Bikas Saha commented on YARN-1410:
--

Yes. I would like to understand why we are proposing a custom solution that 
only works for application submission instead of laying down a common pattern 
(using the Retry Cache) that can be subsequently used in a uniform manner for 
all other remaining non-idempotent operations. Given that HDFS already uses 
that layer, it would be good to depend on a common framework that has already 
been debugged and proven to work on HDFS. Given that YARN and HDFS will be 
commonly deployed together, sharing these basic pieces will go a long way in 
making it easier to build/deploy and operate. Given so many pros for this 
approach, why should we not invest in adopting it?

 Handle client failover during 2 step client API's like app submission
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
 YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, 
 YARN-1410.5.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 App submission involves
 1) creating appId
 2) using that appId to submit an ApplicationSubmissionContext to the user.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of cluster timestamp (used to create 
 app id) the new RM may reject the app submission resulting in unexpected 
 failure on the client side.
 The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1330) Fair Scheduler: defaultQueueSchedulingPolicy does not take effect

2014-02-24 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910789#comment-13910789
 ] 

Sandy Ryza commented on YARN-1330:
--

The above issue was fixed by the AllocationFileLoaderService work.  
Re-resolving this.

 Fair Scheduler: defaultQueueSchedulingPolicy does not take effect
 -

 Key: YARN-1330
 URL: https://issues.apache.org/jira/browse/YARN-1330
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-1330-1.patch, YARN-1330-1.patch, YARN-1330.patch


 The defaultQueueSchedulingPolicy property for the Fair Scheduler allocations 
 file doesn't take effect.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (YARN-1330) Fair Scheduler: defaultQueueSchedulingPolicy does not take effect

2014-02-24 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza resolved YARN-1330.
--

   Resolution: Fixed
Fix Version/s: (was: 2.4.0)
   2.3.0

 Fair Scheduler: defaultQueueSchedulingPolicy does not take effect
 -

 Key: YARN-1330
 URL: https://issues.apache.org/jira/browse/YARN-1330
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-1330-1.patch, YARN-1330-1.patch, YARN-1330.patch


 The defaultQueueSchedulingPolicy property for the Fair Scheduler allocations 
 file doesn't take effect.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1334) YARN should give more info on errors when running failed distributed shell command

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910790#comment-13910790
 ] 

Hadoop QA commented on YARN-1334:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609555/YARN-1334.1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3169//console

This message is automatically generated.

 YARN should give more info on errors when running failed distributed shell 
 command
 --

 Key: YARN-1334
 URL: https://issues.apache.org/jira/browse/YARN-1334
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Affects Versions: 2.3.0
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1334.1.patch


 Running an incorrect command such as:
 /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar <distributedshell jar> -shell_command ./test1.sh -shell_script ./
 shows a shell exit code exception with no useful message. It should print out 
 the sysout/syserr of the containers/AM to explain why it is failing.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1756) capture the time when newApplication is called in RM

2014-02-24 Thread Ming Ma (JIRA)
Ming Ma created YARN-1756:
-

 Summary: capture the time when newApplication is called in RM
 Key: YARN-1756
 URL: https://issues.apache.org/jira/browse/YARN-1756
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma


The application submission time (when submitApplication is called) is 
collected by the RM and the application history server, but the time when the 
client calls the newApplication method is not captured. The delta between 
newApplication and submitApplication could be useful if the client submits 
large jar files. This metric will be useful for 
https://issues.apache.org/jira/browse/YARN-1492.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1327) Fix nodemgr native compilation problems on FreeBSD9

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910815#comment-13910815
 ] 

Hadoop QA commented on YARN-1327:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12609276/nodemgr-portability.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3168//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3168//console

This message is automatically generated.

 Fix nodemgr native compilation problems on FreeBSD9
 ---

 Key: YARN-1327
 URL: https://issues.apache.org/jira/browse/YARN-1327
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Radim Kolar
Assignee: Radim Kolar
 Fix For: 3.0.0, 2.4.0

 Attachments: nodemgr-portability.txt


 There are several portability problems preventing the native component from 
 compiling on FreeBSD:
 1. libgen.h is not included. The correct function prototype is there, but 
 Linux glibc has a workaround that defines it for the user if libgen.h is not 
 directly included. Include this file directly.
 2. Query the maximum size of the login name using sysconf. This follows the 
 same code style as the rest of the code that uses sysconf.
 3. cgroups are a Linux-only feature; make the compilation conditional and 
 return an error if mount_cgroup is attempted on a non-Linux OS.
 4. Do not use the POSIX function setpgrp(), since it clashes with the 
 function of the same name from BSD 4.2; use an equivalent function. After 
 inspecting the glibc sources, it is just a shortcut for setpgid(0,0).
 These changes make it compile on both Linux and FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1375) RM logs get filled with scheduler monitor logs when we enable scheduler monitoring

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910835#comment-13910835
 ] 

Hadoop QA commented on YARN-1375:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611764/YARN-1375.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3167//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3167//console

This message is automatically generated.

 RM logs get filled with scheduler monitor logs when we enable scheduler 
 monitoring
 --

 Key: YARN-1375
 URL: https://issues.apache.org/jira/browse/YARN-1375
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: haosdent
 Fix For: 2.4.0

 Attachments: YARN-1375.patch


 When we enable the scheduler monitor, it fills the RM logs with the same 
 queue states periodically. We could log only when there is a difference from 
 the previous state instead of repeating the same message (a sketch of this 
 follows the log excerpt below).
 {code:xml}
 2013-10-30 23:30:08,464 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:11,464 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:14,465 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:17,466 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:20,466 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:23,467 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:26,468 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:29,468 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156029468, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:32,469 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156032469, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 {code}
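 A minimal sketch of the suggested change, assuming the policy builds the 
 QUEUESTATE line as a single string (the names here are illustrative):
 {code}
 // Remember the last logged queue state, minus the leading timestamp column.
 private String lastLoggedQueueState;

 private void logQueueState(String queueStateCsv) {
   String withoutTimestamp =
       queueStateCsv.substring(queueStateCsv.indexOf(',') + 1);
   if (!withoutTimestamp.equals(lastLoggedQueueState)) {
     LOG.info("QUEUESTATE: " + queueStateCsv);
     lastLoggedQueueState = withoutTimestamp;
   } else if (LOG.isDebugEnabled()) {
     LOG.debug("QUEUESTATE (unchanged): " + queueStateCsv);
   }
 }
 {code}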



--
This message was sent by Atlassian JIRA

[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-24 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910845#comment-13910845
 ] 

Xuan Gong commented on YARN-1410:
-

I really doubt that the Retry cache would work for us. Look at the code on how 
they are using RetryCache. Take FSNameSystem.delete() as an example, 
{code}
  boolean delete(String src, boolean recursive)
  throws AccessControlException, SafeModeException,
  UnresolvedLinkException, IOException {
CacheEntry cacheEntry = RetryCache.waitForCompletion(retryCache);
if (cacheEntry != null && cacheEntry.isSuccess()) {
  return true; // Return previous response
}
boolean ret = false;
try {
  ret = deleteInt(src, recursive, cacheEntry != null);
} catch (AccessControlException e) {
  logAuditEvent(false, "delete", src);
  throw e;
} finally {
  RetryCache.setState(cacheEntry, ret);
}
return ret;
  }
{code}

Before it starts the operation, it checks whether this operation has already 
completed successfully. Before it sends the response, it marks the operation 
as successful. This works perfectly for these HDFS operations, because once we 
receive the operation response we can say that the operation is finished.

But this does not work for the YARN operations. Take ApplicationSubmission as 
an example: can we say application submission is finished when we receive the 
response from ClientRMService? No, we cannot make that conclusion. Then how 
will we set the state for the cacheEntry in the RetryCache? Set it in 
YarnClientImpl#submitApplication? Then we need to find a way to expose the 
RetryCache to client code. Or maybe we can add extra logic in ClientRMService 
to check whether the app is submitted before returning the response? But that 
would add another hop and decrease performance, just like my old 
check-before-submission proposal.

I think the overall logic of the RetryCache does not work, or is at least not 
that useful, for the YARN operations, except that it can provide a globally 
unique ID for detecting repeated operations. But just for providing such an 
ID, I really do not think we need such a "complicated" structure.

Also, regarding "proposing a custom solution": I think the proposal that saves 
enough information, such as the ClientId and ServiceId, in the 
ApplicationSubmissionContext and then reads them back to rebuild the 
RetryCache is a custom solution for ApplicationSubmission, too. I do not think 
that approach can work for other non-idempotent APIs, such as 
renewDelegationToken(), etc.



 Handle client failover during 2 step client API's like app submission
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
 YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, 
 YARN-1410.5.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 App submission involves
 1) creating appId
 2) using that appId to submit an ApplicationSubmissionContext to the user.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of cluster timestamp (used to create 
 app id) the new RM may reject the app submission resulting in unexpected 
 failure on the client side.
 The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910848#comment-13910848
 ] 

Karthik Kambatla commented on YARN-1410:


bq. can we say applicationSubmission is finished when we receives the response 
from ClientRMService?
I think the response of ClientRMService#submitApplication() should tell us 
whether the submission is successful or not. If that is not the case, we should 
probably fix that first. 

 Handle client failover during 2 step client API's like app submission
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
 YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, 
 YARN-1410.5.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 App submission involves
 1) creating appId
 2) using that appId to submit an ApplicationSubmissionContext to the user.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of cluster timestamp (used to create 
 app id) the new RM may reject the app submission resulting in unexpected 
 failure on the client side.
 The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover

2014-02-24 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910849#comment-13910849
 ] 

Hitesh Shah commented on YARN-1666:
---

{code}
-if (!(this.configurationProvider instanceof LocalConfigurationProvider)) {
-  // load yarn-site.xml
-  this.conf =
-  this.configurationProvider.getConfiguration(this.conf,
-  YarnConfiguration.YARN_SITE_XML_FILE);
-  // load core-site.xml
-  this.conf =
-  this.configurationProvider.getConfiguration(this.conf,
-  YarnConfiguration.CORE_SITE_CONFIGURATION_FILE);
-  // Do refreshUserToGroupsMappings with loaded core-site.xml
-  Groups.getUserToGroupsMappingServiceWithLoadedConfiguration(this.conf)
-  .refresh();
-}
+
+// load yarn-site.xml
+this.conf.addResource(this.configurationProvider
+.getConfigurationInputStream(this.conf,
+YarnConfiguration.YARN_SITE_CONFIGURATION_FILE));
+// load core-site.xml
+this.conf.addResource(this.configurationProvider
+.getConfigurationInputStream(this.conf,
+YarnConfiguration.CORE_SITE_CONFIGURATION_FILE));
+// Do refreshUserToGroupsMappings with loaded core-site.xml
+Groups.getUserToGroupsMappingServiceWithLoadedConfiguration(this.conf)
+.refresh();

{code}


The above code seems to be breaking MiniClusters. Is the expectation now that 
anyone using a MiniCluster has to create the appropriate config files and add 
them into the unit test class path? 

Stack trace below:

{code}
Exception: null
java.lang.NullPointerException
  at 
org.apache.hadoop.conf.Configuration$Resource.init(Configuration.java:182)
  at org.apache.hadoop.conf.Configuration.addResource(Configuration.java:751)
  at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:193)
  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
  at 
org.apache.hadoop.yarn.server.MiniYARNCluster.initResourceManager(MiniYARNCluster.java:268)
  at 
org.apache.hadoop.yarn.server.MiniYARNCluster.access$400(MiniYARNCluster.java:90)
  at 
org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceInit(MiniYARNCluster.java:419)
{code} 


 Make admin refreshNodes work across RM failover
 ---

 Key: YARN-1666
 URL: https://issues.apache.org/jira/browse/YARN-1666
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, 
 YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, 
 YARN-1666.6.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover

2014-02-24 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910869#comment-13910869
 ] 

Hitesh Shah commented on YARN-1666:
---

http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/
 doesn't show those files.

 Make admin refreshNodes work across RM failover
 ---

 Key: YARN-1666
 URL: https://issues.apache.org/jira/browse/YARN-1666
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, 
 YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, 
 YARN-1666.6.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover

2014-02-24 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910871#comment-13910871
 ] 

Hitesh Shah commented on YARN-1666:
---

My point is that those files should be in the same jar that contains 
MiniYARNCluster.

 Make admin refreshNodes work across RM failover
 ---

 Key: YARN-1666
 URL: https://issues.apache.org/jira/browse/YARN-1666
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, 
 YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, 
 YARN-1666.6.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover

2014-02-24 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910873#comment-13910873
 ] 

Xuan Gong commented on YARN-1666:
-

But I did include them in the YARN-1666.6.patch

 Make admin refreshNodes work across RM failover
 ---

 Key: YARN-1666
 URL: https://issues.apache.org/jira/browse/YARN-1666
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, 
 YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, 
 YARN-1666.6.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1757) Auxiliary service support for nodemanager recovery

2014-02-24 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-1757:


 Summary: Auxiliary service support for nodemanager recovery
 Key: YARN-1757
 URL: https://issues.apache.org/jira/browse/YARN-1757
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe


There needs to be a mechanism for communicating to auxiliary services whether 
nodemanager recovery is enabled and where they should store their state.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910882#comment-13910882
 ] 

Karthik Kambatla commented on YARN-1410:


I guess we need to define what it means for an application submission to be 
successful. As a user, I would assume the submission is successful if the RM 
has stored it in a place where it will not be lost. In a restart/HA setup, 
this translates to the app being saved to the store. So, 
ClientRMService#submitApplication should ideally return only after the app is 
saved.

When a scheduler rejects an application, we should probably kick it out of the 
store or add a REJECTED final state so we don't try recovering a rejected app 
in case of a failover. 

 Handle client failover during 2 step client API's like app submission
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
 YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, 
 YARN-1410.5.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 App submission involves
 1) creating appId
 2) using that appId to submit an ApplicationSubmissionContext to the user.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of cluster timestamp (used to create 
 app id) the new RM may reject the app submission resulting in unexpected 
 failure on the client side.
 The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover

2014-02-24 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910888#comment-13910888
 ] 

Hitesh Shah commented on YARN-1666:
---

[~xgong] Those newly added files are in the wrong location. 

[~vinodkv] In any case, the above committed patch seems a bit wrong to me. If 
someone is using a Configuration object with loaded resources, say core-site, 
yarn-site and foo-site, followed by some Configuration::set() calls, the above 
code will override all conflicting settings. This seems wrong, especially in 
the MiniYARNCluster case. 
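One way to keep MiniYARNCluster's in-memory settings intact would be to retain 
the old guard around the remote reload and tolerate a missing resource, roughly 
as follows (a sketch only, based on the removed lines quoted earlier in this 
thread; it is not a committed fix):
{code}
// Sketch: only reload from the remote provider; skip it for the local provider
// so explicit Configuration.set() calls in MiniYARNCluster are not clobbered.
if (!(this.configurationProvider instanceof LocalConfigurationProvider)) {
  InputStream yarnSite = this.configurationProvider
      .getConfigurationInputStream(this.conf,
          YarnConfiguration.YARN_SITE_CONFIGURATION_FILE);
  if (yarnSite != null) {   // guard against the null stream that NPEs above
    this.conf.addResource(yarnSite);
  }
}
{code}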

 Make admin refreshNodes work across RM failover
 ---

 Key: YARN-1666
 URL: https://issues.apache.org/jira/browse/YARN-1666
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, 
 YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, 
 YARN-1666.6.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover

2014-02-24 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910896#comment-13910896
 ] 

Hitesh Shah commented on YARN-1666:
---

Done. See related jiras for the new issues filed. 

 Make admin refreshNodes work across RM failover
 ---

 Key: YARN-1666
 URL: https://issues.apache.org/jira/browse/YARN-1666
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, 
 YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, 
 YARN-1666.6.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1758) MiniYARNCluster broken post YARN-1666

2014-02-24 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910894#comment-13910894
 ] 

Hitesh Shah commented on YARN-1758:
---

Exception: null
java.lang.NullPointerException
  at 
org.apache.hadoop.conf.Configuration$Resource.init(Configuration.java:182)
  at org.apache.hadoop.conf.Configuration.addResource(Configuration.java:751)
  at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:193)
  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
  at 
org.apache.hadoop.yarn.server.MiniYARNCluster.initResourceManager(MiniYARNCluster.java:268)
  at 
org.apache.hadoop.yarn.server.MiniYARNCluster.access$400(MiniYARNCluster.java:90)
  at 
org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceInit(MiniYARNCluster.java:419)

 MiniYARNCluster broken post YARN-1666
 -

 Key: YARN-1758
 URL: https://issues.apache.org/jira/browse/YARN-1758
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 NPE seen when trying to use MiniYARNCluster



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1758) MiniYARNCluster broken post YARN-1666

2014-02-24 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-1758:
--

Description: NPE seen when trying to use MiniYARNCluster

 MiniYARNCluster broken post YARN-1666
 -

 Key: YARN-1758
 URL: https://issues.apache.org/jira/browse/YARN-1758
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 NPE seen when trying to use MiniYARNCluster



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover

2014-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910889#comment-13910889
 ] 

Vinod Kumar Vavilapalli commented on YARN-1666:
---

[~hitesh]/[~xgong], can you file a ticket? Unless it's a minor tweak to the 
committed patch.

 Make admin refreshNodes work across RM failover
 ---

 Key: YARN-1666
 URL: https://issues.apache.org/jira/browse/YARN-1666
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, 
 YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, 
 YARN-1666.6.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910898#comment-13910898
 ] 

Bikas Saha commented on YARN-1410:
--

There is considerable confusion here. I haven't seen the latest code, but here 
is my understanding of app submission in YARN.
1) The client calls submitApp(). This submits the app context and returns 
success or failure after initial static checks.
2) If success is returned, the client calls getAppReport() and waits for the 
app to be accepted. If the app gets accepted, the client reports to the user 
that the app has been successfully submitted. Otherwise app submission fails.
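For reference, a bare-bones sketch of that two-step flow from the client's 
side, using the public YarnClient API (imports, AM container setup and error 
handling omitted):
{code}
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(new YarnConfiguration());
yarnClient.start();

// Step 1: obtain an appId and submit the ApplicationSubmissionContext.
YarnClientApplication app = yarnClient.createApplication();
ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
ApplicationId appId = context.getApplicationId();
// ... fill in AM container spec, resource, queue, etc. ...
yarnClient.submitApplication(context);             // non-idempotent RPC

// Step 2: poll the (idempotent) report API until the app is accepted.
YarnApplicationState state;
do {
  Thread.sleep(1000);
  state = yarnClient.getApplicationReport(appId).getYarnApplicationState();
} while (state == YarnApplicationState.NEW
    || state == YarnApplicationState.NEW_SAVING
    || state == YarnApplicationState.SUBMITTED);
{code}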

Now there can be retries in step 1) or step 2). Step 2) is idempotent, so we 
don't need to worry about it. Step 1) is non-idempotent. With the retry cache 
approach, upon retry (directly to the same RM or to a failed-over RM), a 
correctly working RetryCache will return the same response as was originally 
sent by the RM. So if the RM returned success, the RetryCache will return 
success. If the RM returned immediate failure (based on static checks), then 
the RetryCache will return failure. It's not clear to me why this would cause 
issues or why it wouldn't work in YARN.

The RetryCache is used for per-RPC retries. It is not related to the 2-step 
process that we use in YARN, where each step is a different RPC request. Final 
success for the user is based on the completion of both steps. The RetryCache 
can be used to return the same RPC response for step 1) as many times as the 
client retries that same RPC request. That's exactly what we want. The crucial 
piece is storing what's needed to re-populate the RetryCache upon failover. 
Here, we are piggy-backing on AppSubmissionContext storage just like HDFS 
piggybacks on the edit log entry.

I hope this makes things clear. [~sureshms] Does this make sense?

Side Note: 
RetryCache also has an option to store a payload along with the response. This 
is useful when the response has a large internal object that is hard/expensive 
to re-create and can be fetched from the RetryCache directly.


 Handle client failover during 2 step client API's like app submission
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
 YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, 
 YARN-1410.5.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 App submission involves
 1) creating appId
 2) using that appId to submit an ApplicationSubmissionContext to the user.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of cluster timestamp (used to create 
 app id) the new RM may reject the app submission resulting in unexpected 
 failure on the client side.
 The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1758) MiniYARNCluster broken post YARN-1666

2014-02-24 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-1758:
-

 Summary: MiniYARNCluster broken post YARN-1666
 Key: YARN-1758
 URL: https://issues.apache.org/jira/browse/YARN-1758
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-1760:
--

 Summary: TestRMAdminService assumes the use of CapacityScheduler
 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 

{noformat}
java.lang.ClassCastException: 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
cannot be cast to 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1375) RM logs get filled with scheduler monitor logs when we enable scheduler monitoring

2014-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1375:
--

 Description: 
When we enable scheduler monitor, it is filling the RM logs with the same queue 
states periodically. We can log only when any difference with the previous 
state instead of logging the same message. 

{code:xml}
2013-10-30 23:30:08,464 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:11,464 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:14,465 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:17,466 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:20,466 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:23,467 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:26,468 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:29,468 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156029468, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:32,469 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156032469, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
{code}


  was:

When we enable scheduler monitor, it is filling the RM logs with the same queue 
states periodically. We can log only when any difference with the previous 
state instead of logging the same message. 

{code:xml}
2013-10-30 23:30:08,464 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:11,464 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:14,465 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:17,466 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:20,466 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:23,467 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:26,468 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:29,468 INFO 

[jira] [Updated] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1760:
---

Priority: Trivial  (was: Major)

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: test
 Attachments: yarn-1760-1.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1760:
---

Attachment: yarn-1760-1.patch

Trivial patch - the test explicitly sets the scheduler to CS.

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: test
 Attachments: yarn-1760-1.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1678) Fair scheduler gabs incessantly about reservations

2014-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910937#comment-13910937
 ] 

Hudson commented on YARN-1678:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5216 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5216/])
YARN-1678. Fair scheduler gabs incessantly about reservations (Sandy Ryza) 
(sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1571468)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


 Fair scheduler gabs incessantly about reservations
 --

 Key: YARN-1678
 URL: https://issues.apache.org/jira/browse/YARN-1678
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.5.0

 Attachments: YARN-1678-1.patch, YARN-1678-1.patch, YARN-1678.patch


 Come on FS. We really don't need to know every time a node with a reservation 
 on it heartbeats.
 {code}
 2014-01-29 03:48:16,043 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Trying to fulfill reservation for application 
 appattempt_1390547864213_0347_01 on node: host: 
 a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 
 used=memory:8192, vCores:8
 2014-01-29 03:48:16,043 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: 
 Making reservation: node=a2330.halxg.cloudera.com 
 app_id=application_1390547864213_0347
 2014-01-29 03:48:16,043 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
  Application application_1390547864213_0347 reserved container 
 container_1390547864213_0347_01_03 on node host: 
 a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 
 used=memory:8192, vCores:8, currently has 6 at priority 0; 
 currentReservation 6144
 2014-01-29 03:48:16,044 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: 
 Updated reserved container container_1390547864213_0347_01_03 on node 
 host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, 
 vCores:8 used=memory:8192, vCores:8 for application 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@1cb01d20
 {code}
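
One common way to quiet this kind of per-heartbeat chatter is to demote the 
messages to debug level so they only appear when explicitly enabled. This is a 
sketch only, not necessarily what the committed patch does; LOG is the 
scheduler's existing logger, and appAttemptId and node stand in for whatever 
the surrounding code already has in scope:
{code}
// Per-heartbeat reservation bookkeeping is routine, so log it at DEBUG and
// guard the string concatenation behind isDebugEnabled().
if (LOG.isDebugEnabled()) {
  LOG.debug("Trying to fulfill reservation for application " + appAttemptId
      + " on node: " + node);
}
{code}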



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1686) NodeManager.resyncWithRM() does not handle exceptions, which causes NodeManager to hang.

2014-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910936#comment-13910936
 ] 

Hudson commented on YARN-1686:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5216 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5216/])
YARN-1686. Fixed NodeManager to properly handle any errors during 
re-registration after a RESYNC and thus avoid hanging. Contributed by Rohith 
Sharma. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571474)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java


 NodeManager.resyncWithRM() does not handle exceptions, which causes 
 NodeManager to hang.
 

 Key: YARN-1686
 URL: https://issues.apache.org/jira/browse/YARN-1686
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Rohith
Assignee: Rohith
 Fix For: 2.4.0

 Attachments: YARN-1686.1.patch, YARN-1686.2.patch, YARN-1686.3.patch


 During NodeManager startup, if registration with the ResourceManager throws 
 an exception, the NodeManager shuts down. 
 Consider the case where NM-1 is registered with the RM and the RM issues a 
 RESYNC to the NM. If any exception is thrown in resyncWithRM (which starts a 
 new thread that does not handle exceptions) during the RESYNC event, that 
 thread is lost and the NodeManager hangs. 
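
Schematically, the guard being described would wrap the body of the resync 
thread so that a failure during re-registration shuts the NodeManager down 
instead of silently killing the thread. A sketch only, with illustrative call 
names (rebootNodeStatusUpdaterAndRegisterWithRM() and shutDown() stand in for 
whatever the NM actually invokes); this is not the committed patch:
{code}
// Inside NodeManager.resyncWithRM(): the resync work runs on a separate
// thread, so any exception must be caught there or it is simply lost and the
// NodeManager keeps running in a hung state.
new Thread() {
  @Override
  public void run() {
    try {
      // Re-register with the ResourceManager, restart the status updater, etc.
      rebootNodeStatusUpdaterAndRegisterWithRM();  // illustrative call
    } catch (Throwable t) {
      LOG.error("Error while handling RESYNC; shutting down NodeManager.", t);
      shutDown();                                  // illustrative call
    }
  }
}.start();
{code}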



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910944#comment-13910944
 ] 

Sandy Ryza commented on YARN-1760:
--

A couple of nits:
* The same configuration is used for all the tests.  If the goal is to use the 
capacity scheduler for only a couple of tests, it should be instantiated in 
setup().
* The line below looks like it goes over 80 characters; it is also probably 
better to use CapacityScheduler.class.getName().
{code}
+configuration.set(YarnConfiguration.RM_SCHEDULER,
+
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler);
{code}
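
For reference, a minimal sketch of what the suggested shape could look like, 
assuming the test class keeps a YarnConfiguration field named configuration as 
the quoted diff implies (the class name and structure are illustrative, not 
the committed patch):
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.junit.Before;

public class TestRMAdminServiceSketch {
  private YarnConfiguration configuration;

  @Before
  public void setup() {
    configuration = new YarnConfiguration();
    // Pin the scheduler explicitly so the tests pass even on distros whose
    // default scheduler is not the CapacityScheduler; using the class literal
    // also keeps the line comfortably under 80 characters.
    configuration.set(YarnConfiguration.RM_SCHEDULER,
        CapacityScheduler.class.getName());
  }
}
{code}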

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: test
 Attachments: yarn-1760-1.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1760:
---

Attachment: yarn-1760-2.patch

Thanks Sandy. Here is an updated patch. 

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: test
 Attachments: yarn-1760-1.patch, yarn-1760-2.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910984#comment-13910984
 ] 

Hadoop QA commented on YARN-1760:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630822/yarn-1760-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3170//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3170//console

This message is automatically generated.

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: test
 Attachments: yarn-1760-1.patch, yarn-1760-2.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-02-24 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911009#comment-13911009
 ] 

Xuan Gong commented on YARN-1734:
-

bq. we will retry in the nonHA case? That also seems unwanted.

AdminService#transitionToActive/transitionToStandby can only be called when HA 
is enabled.

bq. One other comment related to the patch: The RefreshContext code is adding 
unnecessary complexity, let's just directly call each of the individual refresh 
methods?

Sure. Removed.

 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch


 Currently, we have ConfigurationProvider which can support 
 LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and 
 FileSystemBasedConfiguration is enabled, RM can not get the updated 
 Configurations when it transits from Standby to Active



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-02-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1734:


Attachment: YARN-1734.7.patch

 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch


 Currently, we have ConfigurationProvider which can support 
 LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and 
 FileSystemBasedConfiguration is enabled, RM can not get the updated 
 Configurations when it transits from Standby to Active



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1761) RMAdminCLI should check whether HA is enabled before executing transitionToActive/transitionToStandby

2014-02-24 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-1761:
---

 Summary: RMAdminCLI should check whether HA is enabled before 
executing transitionToActive/transitionToStandby
 Key: YARN-1761
 URL: https://issues.apache.org/jira/browse/YARN-1761
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911034#comment-13911034
 ] 

Hadoop QA commented on YARN-1760:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630834/yarn-1760-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3171//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3171//console

This message is automatically generated.

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: test
 Attachments: yarn-1760-1.patch, yarn-1760-2.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911038#comment-13911038
 ] 

Sandy Ryza commented on YARN-1760:
--

Thanks. One more thing: Configuration.addDefaultResource is a static method 
that applies to all configurations.  So it should either go in setup or the 
non-static configuration.addResource should be used. 
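
To make the distinction concrete, a minimal sketch (illustrative only; 
fair-scheduler-test.xml is a hypothetical resource name, not something from 
the patch):
{code}
import org.apache.hadoop.conf.Configuration;

public class ConfigurationScopeSketch {
  static Configuration perTestConf() {
    // Static: registers the resource for every Configuration created in this
    // JVM from now on, so it leaks across tests unless it lives in setup().
    // Configuration.addDefaultResource("fair-scheduler-test.xml");

    // Instance: only this one Configuration object sees the resource, which
    // is usually what an individual test wants.
    Configuration conf = new Configuration();
    conf.addResource("fair-scheduler-test.xml");
    return conf;
  }
}
{code}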

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: test
 Attachments: yarn-1760-1.patch, yarn-1760-2.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911052#comment-13911052
 ] 

Hadoop QA commented on YARN-1734:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630839/YARN-1734.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3172//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3172//console

This message is automatically generated.

 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch


 Currently, we have ConfigurationProvider which can support 
 LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and 
 FileSystemBasedConfiguration is enabled, RM can not get the updated 
 Configurations when it transits from Standby to Active



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1363) Get / Cancel / Renew delegation token api should be non blocking

2014-02-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911074#comment-13911074
 ] 

Zhijie Shen commented on YARN-1363:
---

Talked to Jian offline. Canceled the patch; we will look for a lighter-weight solution.

 Get / Cancel / Renew delegation token api should be non blocking
 

 Key: YARN-1363
 URL: https://issues.apache.org/jira/browse/YARN-1363
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Zhijie Shen
 Attachments: YARN-1363.1.patch, YARN-1363.2.patch, YARN-1363.3.patch, 
 YARN-1363.4.patch, YARN-1363.5.patch, YARN-1363.6.patch, YARN-1363.7.patch


 Today GetDelgationToken, CancelDelegationToken and RenewDelegationToken are 
 all blocking apis.
 * As a part of these calls we try to update RMStateStore and that may slow it 
 down.
 * Now as we have limited number of client request handlers we may fill up 
 client handlers quickly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911086#comment-13911086
 ] 

Vinod Kumar Vavilapalli commented on YARN-1760:
---

If you agree, then we can close this as invalid.

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: test
 Attachments: yarn-1760-1.patch, yarn-1760-2.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911085#comment-13911085
 ] 

Vinod Kumar Vavilapalli commented on YARN-1760:
---

Wait, from what I understand, Xuan will add a similar FairScheduler test via 
YARN-1679. This test was explicitly for the CapacityScheduler; we will very 
likely rename it in YARN-1679.

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: test
 Attachments: yarn-1760-1.patch, yarn-1760-2.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911094#comment-13911094
 ] 

Sandy Ryza commented on YARN-1760:
--

The goal here is just to make the use of the Capacity Scheduler in the existing 
tests explicit, so that they will pass on distros that set other schedulers as 
default.

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: test
 Attachments: yarn-1760-1.patch, yarn-1760-2.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911100#comment-13911100
 ] 

Vinod Kumar Vavilapalli commented on YARN-1760:
---

I have seen other JIRAs like this and I think I understand the goal. But I 
don't see this JIRA adding any value once YARN-1679 adds a fair-scheduler 
specific test in the same class.

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: test
 Attachments: yarn-1760-1.patch, yarn-1760-2.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911102#comment-13911102
 ] 

Sandy Ryza commented on YARN-1760:
--

I assume that YARN-1679 will have 
conf.setClass(YarnConfiguration.RM_SCHEDULER_CLASS, FairScheduler.class) in 
the FS-specific tests that it adds.  This JIRA adds the same to the CS-specific 
tests.  In some other JIRAs, I've tried to make it so that certain tests pass 
independent of whether the Fair or Capacity scheduler is used. But the goal 
with this patch is just to make the dependency of the existing tests on the 
Capacity Scheduler explicit so that it will override a non-CS default.
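
For illustration, a sketch of what pinning each scheduler explicitly can look 
like, assuming the scheduler key is exposed as YarnConfiguration.RM_SCHEDULER 
(as in the snippet quoted earlier in this thread); this is not code from 
either patch:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler;

public class SchedulerPinningSketch {
  // CS-specific test setup: override any non-CS default the distro ships.
  static YarnConfiguration csConf() {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setClass(YarnConfiguration.RM_SCHEDULER,
        CapacityScheduler.class, ResourceScheduler.class);
    return conf;
  }

  // FS-specific test setup: the mirror image for FairScheduler tests.
  static YarnConfiguration fsConf() {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setClass(YarnConfiguration.RM_SCHEDULER,
        FairScheduler.class, ResourceScheduler.class);
    return conf;
  }
}
{code}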

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: test
 Attachments: yarn-1760-1.patch, yarn-1760-2.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911105#comment-13911105
 ] 

Vinod Kumar Vavilapalli commented on YARN-1410:
---

Finally on to this.

There are three types of fail-over conditions w.r.t. submission:
 # RM fails over after getApplicationID() and *before* submitApplication().
 # RM fails over *during* the submitApplication call.
 # RM fails over *after* the submitApplication call and before the subsequent 
getApplicationReport().

This JIRA started out solving (1) above (as described in the description) and 
completely degenerated into (2).

In the interest of making progress, can we focus only on (1) here and track (2) 
and (3) separately? (1) itself has implications on the user APIs depending on 
the implementation. I had looked at a few of the very early patches, and I 
believe Xuan was trying to solve those in this JIRA.

 Handle client failover during 2 step client API's like app submission
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
 YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, 
 YARN-1410.5.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 App submission involves
 1) creating appId
 2) using that appId to submit an ApplicationSubmissionContext to the RM.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of cluster timestamp (used to create 
 app id) the new RM may reject the app submission resulting in unexpected 
 failure on the client side.
 The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1390#comment-1390
 ] 

Vinod Kumar Vavilapalli commented on YARN-1734:
---

bq. AdminService#transitionToActive/transitionToStandby can only be called when 
HA is enabled.
Ah yes. That makes sense.

The latest patch looks good. Checking this in.

 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch


 Currently, we have ConfigurationProvider which can support 
 LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and 
 FileSystemBasedConfiguration is enabled, RM can not get the updated 
 Configurations when it transits from Standby to Active



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1561) Fix a generic type warning in FairScheduler

2014-02-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1397#comment-1397
 ] 

Junping Du commented on YARN-1561:
--

Thanks Chen for the patch! It looks good to me. Will commit it shortly.

 Fix a generic type warning in FairScheduler
 ---

 Key: YARN-1561
 URL: https://issues.apache.org/jira/browse/YARN-1561
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Junping Du
Assignee: Chen He
Priority: Minor
  Labels: newbie
 Fix For: 2.4.0

 Attachments: yarn-1561.patch


 The Comparator below should be parameterized with a type:
 private Comparator nodeAvailableResourceComparator =
   new NodeAvailableResourceComparator(); 
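
As a sketch of the requested change, assuming (as in FairScheduler's 
continuous-scheduling path) that the comparator orders NodeId objects; the 
actual element type is whatever NodeAvailableResourceComparator declares:
{code}
// Raw type, which is what produces the generics warning:
//   private Comparator nodeAvailableResourceComparator =
//       new NodeAvailableResourceComparator();

// Parameterized field matching the comparator's element type
// (java.util.Comparator, org.apache.hadoop.yarn.api.records.NodeId):
private Comparator<NodeId> nodeAvailableResourceComparator =
    new NodeAvailableResourceComparator();
{code}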



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-153) PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS

2014-02-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911122#comment-13911122
 ] 

Junping Du commented on YARN-153:
-

Hi [~jaigak.song], any update on this JIRA? I happen to have some experience 
with Cloud Foundry and have some thoughts too. Would you mind having a 
discussion?

 PaaS on YARN: a YARN application to demonstrate that YARN can be used as a 
 PaaS
 

 Key: YARN-153
 URL: https://issues.apache.org/jira/browse/YARN-153
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Jacob Jaigak Song
Assignee: Jacob Jaigak Song
 Fix For: 2.4.0

 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, 
 MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, 
 MAPREDUCE4393.patch

   Original Estimate: 336h
  Time Spent: 336h
  Remaining Estimate: 0h

 This application demonstrates that YARN can be used for non-MapReduce 
 applications. Since Hadoop has already been widely adopted and deployed, and 
 its deployment will only grow, we see good potential for using it as a PaaS.  
 I have implemented a proof of concept to demonstrate that YARN can be used as 
 a PaaS (Platform as a Service). I have done a gap analysis against VMware's 
 Cloud Foundry and tried to achieve as many PaaS functionalities as possible 
 on YARN.
 I'd like to check in this POC as a YARN example application.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1588) Rebind NM tokens for previous attempt's running containers to the new attempt

2014-02-24 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1588:
--

Attachment: YARN-1588.4.patch

 Rebind NM tokens for previous attempt's running containers to the new attempt
 -

 Key: YARN-1588
 URL: https://issues.apache.org/jira/browse/YARN-1588
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-1588.1.patch, YARN-1588.1.patch, YARN-1588.2.patch, 
 YARN-1588.3.patch, YARN-1588.4.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler

2014-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911130#comment-13911130
 ] 

Vinod Kumar Vavilapalli commented on YARN-1760:
---

hm.. okay.

 TestRMAdminService assumes the use of CapacityScheduler
 ---

 Key: YARN-1760
 URL: https://issues.apache.org/jira/browse/YARN-1760
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: test
 Attachments: yarn-1760-1.patch, yarn-1760-2.patch


 YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. 
 {noformat}
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
 cannot be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911135#comment-13911135
 ] 

Hudson commented on YARN-1734:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5218 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5218/])
YARN-1734. Fixed ResourceManager to update the configurations when it transits 
from standby to active mode so as to assimilate any changes that happened while 
it was in standby mode. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571539)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java


 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch


 Currently, we have ConfigurationProvider which can support 
 LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and 
 FileSystemBasedConfiguration is enabled, RM can not get the updated 
 Configurations when it transits from Standby to Active



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

