[jira] [Commented] (YARN-7265) Hadoop Server Log Correlation

2018-02-08 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357531#comment-16357531
 ] 

Arun C Murthy commented on YARN-7265:
-

[~tanping] - Like [~jlowe] suggested, separating this from YARN makes sense. 
For example, Ambari has "log search"; leveraging that would be a better option. 
Makes sense? Thanks.

> Hadoop Server Log Correlation  
> ---
>
> Key: YARN-7265
> URL: https://issues.apache.org/jira/browse/YARN-7265
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: log-aggregation
>Reporter: Tanping Wang
>Priority: Major
>
> Hadoop has many server logs: YARN task logs, NodeManager logs, HDFS logs, 
> etc. There are also many different ways to expose the logs, build 
> relationships horizontally to correlate the logs, or search the logs by 
> keyword. There is a need for a default yet convenient log-analytics 
> mechanism in Hadoop itself that at least covers all the server logs of 
> Hadoop. This log-analytics system can correlate the Hadoop server logs by 
> grouping them by various dimensions, including application ID, task ID, job 
> ID, or node ID. The raw logs with correlation can then be easily accessed by 
> application developers or cluster administrators via a web page for managing 
> and debugging.
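As a rough illustration of the grouping idea, a few lines of Java can bucket raw log lines by the application ID embedded in them. The class and method names below are invented for this sketch; only the `application_<timestamp>_<seq>` ID pattern is standard YARN naming.

```java
import java.util.*;
import java.util.regex.*;

// Sketch of the correlation idea: group raw log lines by the YARN
// application ID embedded in them. Names are illustrative, not from
// any actual Hadoop class.
public class LogCorrelator {
    // YARN application IDs follow the pattern application_<timestamp>_<seq>.
    private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

    public static Map<String, List<String>> groupByAppId(List<String> lines) {
        Map<String, List<String>> byApp = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = APP_ID.matcher(line);
            if (m.find()) {
                // First match in the line decides the bucket.
                byApp.computeIfAbsent(m.group(), k -> new ArrayList<>()).add(line);
            }
        }
        return byApp;
    }
}
```

The same approach generalizes to the other dimensions mentioned (task ID, node ID) by swapping the pattern.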



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1126) Add validation of users input nodes-states options to nodes CLI

2015-02-06 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1126:

Attachment: YARN-905-addendum.patch

Uploading patch from YARN-905 on behalf of [~ywskycn].

 Add validation of users input nodes-states options to nodes CLI
 ---

 Key: YARN-1126
 URL: https://issues.apache.org/jira/browse/YARN-1126
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-905-addendum.patch


 Follow the discussion in YARN-905.
 (1) case-insensitive checks for all.
 (2) validation of user input: exit with a non-zero code and print all valid 
 states when the user gives an invalid state.
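A minimal sketch of the proposed validation, assuming a simplified subset of node states; the enum and method names here are illustrative, not YARN's actual CLI code:

```java
import java.util.*;

// Sketch of the validation described above: accept node states
// case-insensitively and reject invalid input by listing all valid
// states (a CLI would print the message and exit non-zero).
public class NodeStateValidator {
    enum NodeState { NEW, RUNNING, UNHEALTHY, DECOMMISSIONED, LOST, REBOOTED }

    // Parses a comma-separated list of states, ignoring case.
    public static EnumSet<NodeState> parse(String csv) {
        EnumSet<NodeState> result = EnumSet.noneOf(NodeState.class);
        for (String token : csv.split(",")) {
            try {
                result.add(NodeState.valueOf(token.trim().toUpperCase(Locale.ROOT)));
            } catch (IllegalArgumentException e) {
                throw new IllegalArgumentException("Invalid node state '" + token
                    + "'. Valid states: " + Arrays.toString(NodeState.values()));
            }
        }
        return result;
    }
}
```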



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-1126) Add validation of users input nodes-states options to nodes CLI

2015-02-06 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reopened YARN-1126:
-

I'm re-opening this to commit the addendum patch from YARN-905 
(https://issues.apache.org/jira/secure/attachment/12606009/YARN-905-addendum.patch)
 since the other jira already went out in 2.3.0.

Targeting this for 2.7.0.

 Add validation of users input nodes-states options to nodes CLI
 ---

 Key: YARN-1126
 URL: https://issues.apache.org/jira/browse/YARN-1126
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan

 Follow the discussion in YARN-905.
 (1) case-insensitive checks for all.
 (2) validation of users input, exit with non-zero code and print all valid 
 states when user gives an invalid state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-893) Capacity scheduler allocates vcores to containers but does not report it in headroom

2015-02-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308705#comment-14308705
 ] 

Arun C Murthy commented on YARN-893:


[~kj-ki] & [~ozawa] - the {{DefaultResourceCalculator}} is designed not to use 
vcores - if vcores are desired, one should use the 
{{DominantResourceCalculator}}. Makes sense?
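For reference, switching calculators is a configuration change; a sketch of the relevant capacity-scheduler.xml entry, assuming the standard property name:

```xml
<!-- capacity-scheduler.xml: switch from the default (memory-only)
     calculator to DRF so vcores count in allocation and headroom. -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```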

 Capacity scheduler allocates vcores to containers but does not report it in 
 headroom
 

 Key: YARN-893
 URL: https://issues.apache.org/jira/browse/YARN-893
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta, 2.3.0
Reporter: Bikas Saha
Assignee: Kenji Kikushima
 Attachments: YARN-893-2.patch, YARN-893.patch


 In non-DRF mode, it reports 0 vcores in the headroom but it allocates 1 vcore 
 to containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1449) Protocol changes and implementations in NM side to support change container resource

2015-02-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308786#comment-14308786
 ] 

Arun C Murthy commented on YARN-1449:
-

Cleaning up stale PA (Patch Available) patches.

 Protocol changes and implementations in NM side to support change container 
 resource
 

 Key: YARN-1449
 URL: https://issues.apache.org/jira/browse/YARN-1449
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Wangda Tan (No longer used)
Assignee: Wangda Tan (No longer used)
 Attachments: yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, 
 yarn-1449.5.patch


 As described in YARN-1197, we need to add API/implementation changes:
 1) Add a changeContainersResources method to ContainerManagementProtocol.
 2) Report succeeded/failed increased/decreased containers in the response of 
 changeContainersResources.
 3) Add a new decreased-containers field to NodeStatus so the NM can notify 
 the RM of such changes.
 4) Add a changeContainersResources implementation in ContainerManagerImpl.
 5) Add changes in ContainersMonitorImpl to support changing the resource 
 limits of containers.
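The additions above can be sketched with simplified stand-ins; every class, method, and field name here is illustrative, not the real protobuf-backed Hadoop type:

```java
import java.util.*;

// Toy model of items 1-2 above: a changeContainersResources call that
// reports which containers succeeded or failed. Real YARN uses protobuf
// records in ContainerManagementProtocol; these are stand-ins.
public class ChangeContainersResourcesSketch {
    static class Response {
        final List<String> succeededIncreased = new ArrayList<>();
        final List<String> failed = new ArrayList<>();
    }

    // targetMemMb: requested per-container memory; currentMemMb: the NM's view.
    static Response changeContainersResources(Map<String, Integer> targetMemMb,
                                              Map<String, Integer> currentMemMb) {
        Response r = new Response();
        for (Map.Entry<String, Integer> e : targetMemMb.entrySet()) {
            if (currentMemMb.containsKey(e.getKey())) {
                currentMemMb.put(e.getKey(), e.getValue());
                r.succeededIncreased.add(e.getKey());
            } else {
                r.failed.add(e.getKey()); // unknown container on this node
            }
        }
        return r;
    }
}
```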



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2731) RegisterApplicationMasterResponsePBImpl: not properly initialized builder

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2731:

Fix Version/s: (was: 2.6.0)
   2.7.0

 RegisterApplicationMasterResponsePBImpl: not properly initialized builder
 -

 Key: YARN-2731
 URL: https://issues.apache.org/jira/browse/YARN-2731
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.7.0

 Attachments: YARN-2731.patch


 If I am not mistaken, in RegisterApplicationMasterResponsePBImpl we fail to 
 initialize the builder in setNMTokensFromPreviousAttempts(), and we 
 initialize the builder in the wrong place in setClientToAMTokenMasterKey().
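For context, Hadoop's PBImpl classes follow a maybeInitBuilder() pattern; a simplified sketch (with stand-in types, not the real protobuf builder) of what "properly initialized" means here:

```java
// Sketch of the PBImpl pattern the report refers to: every mutator must
// call maybeInitBuilder() *before* touching state, otherwise writes can
// land on a stale or absent builder. Types are simplified stand-ins.
public class ResponsePBImplSketch {
    private StringBuilder builder;   // stand-in for the protobuf builder
    private boolean viaProto = true; // state currently backed by an immutable proto

    private void maybeInitBuilder() {
        if (viaProto || builder == null) {
            builder = new StringBuilder(); // real code: Proto.newBuilder(proto)
            viaProto = false;
        }
    }

    // Correct order: initialize the builder first, then mutate.
    public void setField(String value) {
        maybeInitBuilder();
        builder.append(value);
    }

    public String build() {
        return builder == null ? null : builder.toString();
    }
}
```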



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1142:

Fix Version/s: (was: 2.6.0)
   2.7.0

 MiniYARNCluster web ui does not work properly
 -

 Key: YARN-1142
 URL: https://issues.apache.org/jira/browse/YARN-1142
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
 Fix For: 2.7.0


 When going to the RM http port, the NM web UI is displayed. It seems there is 
 a singleton somewhere that breaks things when the RM & NMs run in the same 
 process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1514:

Fix Version/s: (was: 2.6.0)
   2.7.0

 Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
 

 Key: YARN-1514
 URL: https://issues.apache.org/jira/browse/YARN-1514
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.7.0

 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, 
 YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, 
 YARN-1514.wip-2.patch, YARN-1514.wip.patch


 ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in 
 YARN-1307, YARN-1378, and so on. In particular, ZKRMStateStore#loadState is 
 called when an RM-HA cluster does a failover; therefore, its execution time 
 impacts the failover time of RM-HA.
 We need a utility to benchmark the execution time of ZKRMStateStore#loadState 
 as a development tool.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1723:

Fix Version/s: (was: 2.6.0)
   2.7.0

 AMRMClientAsync missing blacklist addition and removal functionality
 

 Key: YARN-1723
 URL: https://issues.apache.org/jira/browse/YARN-1723
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Bikas Saha
 Fix For: 2.7.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2280:

Fix Version/s: (was: 2.6.0)
   2.7.0

 Resource manager web service fields are not accessible
 --

 Key: YARN-2280
 URL: https://issues.apache.org/jira/browse/YARN-2280
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0, 2.4.1
Reporter: Krisztian Horvath
Assignee: Krisztian Horvath
Priority: Trivial
 Fix For: 2.7.0

 Attachments: YARN-2280.patch


 Using the resource manager's REST API 
 (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some 
 REST calls return a class whose fields are not accessible after 
 unmarshalling, for example SchedulerTypeInfo -> schedulerInfo. Using the same 
 classes on the client side, these fields are only accessible via reflection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-314:
---
Fix Version/s: (was: 2.6.0)
   2.7.0

 Schedulers should allow resource requests of different sizes at the same 
 priority and location
 --

 Key: YARN-314
 URL: https://issues.apache.org/jira/browse/YARN-314
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
 Fix For: 2.7.0

 Attachments: yarn-314-prelim.patch


 Currently, resource requests for the same container and locality are expected 
 to all be the same size.
 While it doesn't look like it's needed for apps currently, and can be 
 circumvented by specifying different priorities if absolutely necessary, it 
 seems to me that the ability to request containers with different resource 
 requirements at the same priority level should be there for the future and 
 for completeness' sake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1156:

Fix Version/s: (was: 2.6.0)
   2.7.0

 Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
 -

 Key: YARN-1156
 URL: https://issues.apache.org/jira/browse/YARN-1156
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.1.0-beta
Reporter: Akira AJISAKA
Assignee: Tsuyoshi OZAWA
Priority: Minor
  Labels: metrics, newbie
 Fix For: 2.7.0

 Attachments: YARN-1156.1.patch, YARN-1156.2.patch


 AllocatedGB and AvailableGB metrics are currently of integer type. If 500 MB 
 of memory is allocated to containers four times, AllocatedGB is incremented 
 four times by {{(int) 500/1024}}, which is 0. That is, the memory actually 
 allocated is 2000 MB, but the metric shows 0 GB. Let's use float type for 
 these metrics.
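The truncation is easy to reproduce; a small sketch (the class name is illustrative) contrasting the integer and float accumulations:

```java
// Demonstrates the truncation described above: four 500 MB allocations
// tracked as integer GB sum to 0, while a float metric keeps the
// sub-GB contributions.
public class AllocatedGbDemo {
    public static int intMetric(int[] allocationsMb) {
        int gb = 0;
        for (int mb : allocationsMb) {
            gb += mb / 1024; // (int) 500/1024 == 0, so nothing accumulates
        }
        return gb;
    }

    public static float floatMetric(int[] allocationsMb) {
        float gb = 0f;
        for (int mb : allocationsMb) {
            gb += mb / 1024f; // 500/1024f == 0.48828125
        }
        return gb;
    }
}
```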



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-745:
---
Fix Version/s: (was: 2.6.0)
   2.7.0

 Move UnmanagedAMLauncher to yarn client package
 ---

 Key: YARN-745
 URL: https://issues.apache.org/jira/browse/YARN-745
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Fix For: 2.7.0


 It's currently sitting in the yarn applications project, which sounds wrong. 
 The client project sounds better since it contains the utilities/libraries 
 that clients use to write and debug yarn applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1334) YARN should give more info on errors when running failed distributed shell command

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1334:

Fix Version/s: (was: 2.6.0)
   2.7.0

 YARN should give more info on errors when running failed distributed shell 
 command
 --

 Key: YARN-1334
 URL: https://issues.apache.org/jira/browse/YARN-1334
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Affects Versions: 2.3.0
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.7.0

 Attachments: YARN-1334.1.patch


 Running an incorrect command such as:
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar <distributedshell jar> -shell_command ./test1.sh -shell_script ./
 shows a shell exit-code exception with no useful message. It should print 
 out the stdout/stderr of the containers/AM to show why it is failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-650) User guide for preemption

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-650:
---
Fix Version/s: (was: 2.6.0)
   2.7.0

 User guide for preemption
 -

 Key: YARN-650
 URL: https://issues.apache.org/jira/browse/YARN-650
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Chris Douglas
Priority: Minor
 Fix For: 2.7.0

 Attachments: Y650-0.patch


 YARN-45 added a protocol for the RM to ask for resources back. The docs on 
 writing YARN applications should include a section on how to interpret this 
 message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2113:

Fix Version/s: (was: 2.6.0)
   2.7.0

 Add cross-user preemption within CapacityScheduler's leaf-queue
 ---

 Key: YARN-2113
 URL: https://issues.apache.org/jira/browse/YARN-2113
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.7.0


 Preemption today only works across queues and moves around resources across 
 queues per demand and usage. We should also have user-level preemption within 
 a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1621:

Fix Version/s: (was: 2.6.0)
   2.7.0

 Add CLI to list rows of task attempt ID, container ID, host of container, 
 state of container
 --

 Key: YARN-1621
 URL: https://issues.apache.org/jira/browse/YARN-1621
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Tassapol Athiapinya
 Fix For: 2.7.0

 Attachments: YARN-1621.1.patch


 As more applications are moved to YARN, we need a generic CLI to list rows of 
 task attempt ID, container ID, host of container, and state of container. 
 Today, if a YARN application running in a container hangs, there is no way to 
 find out more, because the user does not know where each attempt is running.
 For each running application, it is useful to differentiate between 
 running/succeeded/failed/killed containers.
  
 {code:title=proposed yarn cli}
 $ yarn application -list-containers -applicationId appId [-containerState 
 state of container]
 where containerState is optional filter to list container in given state only.
 container state can be running/succeeded/killed/failed/all.
 A user can specify more than one container state at once e.g. KILLED,FAILED.
 task attempt ID container ID host of container state of container 
 {code}
 The CLI should work with both running and completed applications. If a 
 container runs many task attempts, all attempts should be shown. That will 
 likely be the case for Tez container-reuse applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2483) TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to incorrect AppAttempt state

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2483:

Fix Version/s: (was: 2.6.0)
   2.7.0

 TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to 
 incorrect AppAttempt state
 

 Key: YARN-2483
 URL: https://issues.apache.org/jira/browse/YARN-2483
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
 Fix For: 2.7.0


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/665/console :
 {code}
 testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart)
   Time elapsed: 49.686 sec  <<< FAILURE!
 java.lang.AssertionError: AppAttempt state is not correct (timedout) 
 expected:<ALLOCATED> but was:<SCHEDULED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:84)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:582)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:589)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForNewAMToLaunchAndRegister(MockRM.java:182)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:402)
 {code}
 TestApplicationMasterLauncher#testallocateBeforeAMRegistration fails with 
 similar cause.
 These tests failed in build #664 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1234:

Fix Version/s: (was: 2.6.0)
   2.7.0

  Container localizer logs are not created in secured cluster
 

 Key: YARN-1234
 URL: https://issues.apache.org/jira/browse/YARN-1234
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Fix For: 2.7.0


 When we run the ContainerLocalizer in a secured cluster, we potentially do 
 not create any log file to track log messages. Such a log would be helpful 
 in identifying ContainerLocalization issues in a secured cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-308) Improve documentation about what asks means in AMRMProtocol

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-308:
---
Fix Version/s: (was: 2.6.0)
   2.7.0

 Improve documentation about what asks means in AMRMProtocol
 -

 Key: YARN-308
 URL: https://issues.apache.org/jira/browse/YARN-308
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, documentation, resourcemanager
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.7.0

 Attachments: YARN-308.patch


 It's unclear to me from reading the javadoc exactly what "asks" means when 
 the AM sends a heartbeat to the RM. Is the AM supposed to send a list of all 
 resources that it is waiting for, or just inform the RM about new ones that 
 it wants?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1477) Improve AM web UI to avoid confusion about AM restart

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1477:

Fix Version/s: (was: 2.6.0)
   2.7.0

 Improve AM web UI to avoid confusion about AM restart
 -

 Key: YARN-1477
 URL: https://issues.apache.org/jira/browse/YARN-1477
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Chen He
Assignee: Chen He
  Labels: features
 Fix For: 2.7.0


 Improve the AM web UI: add a submitTime field to the AM's web services REST 
 API, improve the "Elapsed:" row's time computation, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-965) NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-965:
---
Fix Version/s: (was: 2.6.0)
   2.7.0

 NodeManager Metrics containersRunning is not correct When localizing 
 container process is failed or killed
 --

 Key: YARN-965
 URL: https://issues.apache.org/jira/browse/YARN-965
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha
 Environment: suse linux
Reporter: Li Yuan
 Fix For: 2.7.0


 When a container is successfully launched, its state goes from LOCALIZED to 
 RUNNING and containersRunning is incremented. When the state goes from 
 EXITED_WITH_FAILURE or KILLING to DONE, containersRunning is decremented.
 However, the EXITED_WITH_FAILURE or KILLING state could be reached from 
 LOCALIZING (LOCALIZED), not RUNNING, which causes containersRunning to be 
 less than the actual number. Furthermore, the metrics become inconsistent: 
 containersLaunched != containersCompleted + containersFailed + 
 containersKilled + containersRunning + containersIniting.
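A minimal model of the accounting drift described above (the class and method names are stand-ins, not NodeManager code):

```java
// Toy model of the bug: containersRunning is decremented on
// FAILURE/KILLING -> DONE even when the container never reached
// RUNNING, so the gauge drifts below the true value.
public class NmMetricsSketch {
    int running;

    // LOCALIZED -> RUNNING: the only place the gauge is incremented.
    void onLocalizedToRunning() { running++; }

    // Buggy transition handler: always decrements, even if the failure
    // happened during localization and running was never incremented.
    void onFailureToDone() { running--; }
}
```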



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-153) PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-153:
---
Fix Version/s: (was: 2.6.0)
   2.7.0

 PaaS on YARN: a YARN application to demonstrate that YARN can be used as a 
 PaaS
 

 Key: YARN-153
 URL: https://issues.apache.org/jira/browse/YARN-153
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Jacob Jaigak Song
Assignee: Jacob Jaigak Song
 Fix For: 2.7.0

 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, 
 MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, 
 MAPREDUCE4393.patch

   Original Estimate: 336h
  Time Spent: 336h
  Remaining Estimate: 0h

 This application is to demonstrate that YARN can be used for non-MapReduce 
 applications. As Hadoop has already been adopted and deployed widely, and its 
 deployment will increase further, we thought it has good potential to be used 
 as a PaaS.
 I have implemented a proof of concept to demonstrate that YARN can be used as 
 a PaaS (Platform as a Service). I have done a gap analysis against VMware's 
 Cloud Foundry and tried to achieve as many PaaS functionalities as possible 
 on YARN.
 I'd like to check in this POC as a YARN example application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1147) Add end-to-end tests for HA

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1147:

Fix Version/s: (was: 2.6.0)
   2.7.0

 Add end-to-end tests for HA
 ---

 Key: YARN-1147
 URL: https://issues.apache.org/jira/browse/YARN-1147
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.7.0


 While individual sub-tasks add tests for the code they include, it will be 
 handy to write end-to-end tests for HA including some stress testing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-160:
---
Fix Version/s: (was: 2.6.0)
   2.7.0

 nodemanagers should obtain cpu/memory values from underlying OS
 ---

 Key: YARN-160
 URL: https://issues.apache.org/jira/browse/YARN-160
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Varun Vasudev
 Fix For: 2.7.0

 Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, 
 apache-yarn-160.2.patch


 As mentioned in YARN-2
 *NM memory and CPU configs*
 Currently these values come from the NM's config; we should be able to 
 obtain them from the OS (i.e., in the case of Linux, from /proc/meminfo & 
 /proc/cpuinfo). As this is highly OS-dependent, we should have an interface 
 that obtains this information. In addition, implementations of this 
 interface should be able to specify a mem/cpu offset (the amount of mem/cpu 
 not to be made available as YARN resources); this would allow reserving 
 mem/cpu for the OS and other services outside of YARN containers.
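As a sketch of the Linux side of such an interface, parsing MemTotal out of a /proc/meminfo line (the class and method names are invented for illustration):

```java
import java.util.regex.*;

// Sketch of the OS probe described above: parse MemTotal from a
// /proc/meminfo line. Linux-only; a real implementation would sit
// behind the OS-abstraction interface the issue proposes.
public class MemInfoProbe {
    private static final Pattern MEM_TOTAL =
        Pattern.compile("^MemTotal:\\s+(\\d+)\\s+kB$");

    // Returns MemTotal in kB, or -1 if the line does not match.
    public static long parseMemTotalKb(String line) {
        Matcher m = MEM_TOTAL.matcher(line.trim());
        return m.matches() ? Long.parseLong(m.group(1)) : -1;
    }
}
```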



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-113) WebAppProxyServlet must use SSLFactory for the HttpClient connections

2014-11-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-113:
---
Fix Version/s: (was: 2.6.0)
   2.7.0

 WebAppProxyServlet must use SSLFactory for the HttpClient connections
 -

 Key: YARN-113
 URL: https://issues.apache.org/jira/browse/YARN-113
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.7.0


 The HttpClient must be configured to use the SSLFactory when the web UIs are 
 over HTTPS, otherwise the proxy servlet fails to connect to the AM because of 
 unknown (self-signed) certificates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN

2014-11-20 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220047#comment-14220047
 ] 

Arun C Murthy commented on YARN-2139:
-

Sorry, been busy with 2.6.0 - just coming up for air.

What are we modeling with vdisk again? What is the metric? Is it directly the 
blkio parameter? If so, that is my biggest concern.

 [Umbrella] Support for Disk as a Resource in YARN 
 --

 Key: YARN-2139
 URL: https://issues.apache.org/jira/browse/YARN-2139
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
 Attachments: Disk_IO_Scheduling_Design_1.pdf, 
 Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, 
 YARN-2139-prototype.patch


 YARN should consider disk as another resource for (1) scheduling tasks on 
 nodes, (2) isolation at runtime, (3) spindle locality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS

2014-11-13 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2635:

Fix Version/s: (was: 2.7.0)
   2.6.0

 TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
 --

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.6.0

 Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch, 
 yarn-2635-4.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering

2014-11-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210143#comment-14210143
 ] 

Arun C Murthy commented on YARN-2853:
-

I've merged this back into branch-2.6 for hadoop-2.6.0-rc1.

 Killing app may hang while AM is unregistering
 --

 Key: YARN-2853
 URL: https://issues.apache.org/jira/browse/YARN-2853
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2853.1.patch, YARN-2853.1.patch, YARN-2853.2.patch, 
 YARN-2853.3.patch


 When killing an app, the app first moves to the KILLING state. If the 
 RMAppAttempt receives the attempt_unregister event before the attempt_kill 
 event, it'll ignore the later attempt_kill event. Hence, the RMApp won't be 
 able to move to the KILLED state and stays in the KILLING state forever.
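The race can be modeled with a toy state machine; the states and event names below are simplified stand-ins for the RMApp/RMAppAttempt transitions:

```java
// Toy model of the race: if UNREGISTER is processed before KILL, the
// kill event is silently dropped and the app never reaches KILLED.
public class KillRaceSketch {
    enum State { RUNNING, KILLING, FINISHING, KILLED }

    static State step(State s, String event) {
        if (s == State.RUNNING && event.equals("KILL")) return State.KILLING;
        if (s == State.RUNNING && event.equals("UNREGISTER")) return State.FINISHING;
        if (s == State.KILLING && event.equals("KILL_CONFIRMED")) return State.KILLED;
        // Bug modeled here: FINISHING has no transition for KILL, so the
        // event is dropped and the state machine can never reach KILLED.
        return s;
    }
}
```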



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2853) Killing app may hang while AM is unregistering

2014-11-13 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2853:

Fix Version/s: (was: 2.7.0)
   2.6.0

 Killing app may hang while AM is unregistering
 --

 Key: YARN-2853
 URL: https://issues.apache.org/jira/browse/YARN-2853
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2853.1.patch, YARN-2853.1.patch, YARN-2853.2.patch, 
 YARN-2853.3.patch


 When killing an app, the app first moves to KILLING state. If RMAppAttempt 
 receives the attempt_unregister event before the attempt_kill event, it'll 
 ignore the later attempt_kill event. Hence, RMApp won't be able to move to 
 KILLED state and stays at KILLING state forever.





[jira] [Commented] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS

2014-11-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210142#comment-14210142
 ] 

Arun C Murthy commented on YARN-2635:
-

I've merged this back into branch-2.6 since it is safe, and is causing 
conflicts with too many cherry-picks.

 TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
 --

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.6.0

 Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch, 
 yarn-2635-4.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, 
 TestRMRestart would fail.





[jira] [Updated] (YARN-2841) RMProxy should retry EOFException

2014-11-12 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2841:

Fix Version/s: (was: 2.7.0)
   2.6.0

 RMProxy should retry EOFException 
 --

 Key: YARN-2841
 URL: https://issues.apache.org/jira/browse/YARN-2841
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jian He
Assignee: Jian He
Priority: Critical
 Fix For: 2.6.0

 Attachments: YARN-2841.1.patch








[jira] [Updated] (YARN-2843) NodeLabels manager should trim all inputs for hosts and labels

2014-11-12 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2843:

Target Version/s: 2.6.0  (was: 2.7.0)

 NodeLabels manager should trim all inputs for hosts and labels
 --

 Key: YARN-2843
 URL: https://issues.apache.org/jira/browse/YARN-2843
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sushmitha Sreenivasan
Assignee: Wangda Tan
 Fix For: 2.6.0

 Attachments: YARN-2843-1.patch, YARN-2843-2.patch


 NodeLabels manager should trim all inputs for hosts and labels





[jira] [Commented] (YARN-2841) RMProxy should retry EOFException

2014-11-12 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208441#comment-14208441
 ] 

Arun C Murthy commented on YARN-2841:
-

Merged this into branch-2.6 for hadoop-2.6.0-rc1.

 RMProxy should retry EOFException 
 --

 Key: YARN-2841
 URL: https://issues.apache.org/jira/browse/YARN-2841
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jian He
Assignee: Jian He
Priority: Critical
 Fix For: 2.6.0

 Attachments: YARN-2841.1.patch








[jira] [Updated] (YARN-2843) NodeLabels manager should trim all inputs for hosts and labels

2014-11-12 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2843:

Fix Version/s: (was: 2.7.0)
   2.6.0

 NodeLabels manager should trim all inputs for hosts and labels
 --

 Key: YARN-2843
 URL: https://issues.apache.org/jira/browse/YARN-2843
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sushmitha Sreenivasan
Assignee: Wangda Tan
 Fix For: 2.6.0

 Attachments: YARN-2843-1.patch, YARN-2843-2.patch


 NodeLabels manager should trim all inputs for hosts and labels





[jira] [Commented] (YARN-2843) NodeLabels manager should trim all inputs for hosts and labels

2014-11-12 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208440#comment-14208440
 ] 

Arun C Murthy commented on YARN-2843:
-

Merged this into branch-2.6 for hadoop-2.6.0-rc1.

 NodeLabels manager should trim all inputs for hosts and labels
 --

 Key: YARN-2843
 URL: https://issues.apache.org/jira/browse/YARN-2843
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sushmitha Sreenivasan
Assignee: Wangda Tan
 Fix For: 2.6.0

 Attachments: YARN-2843-1.patch, YARN-2843-2.patch


 NodeLabels manager should trim all inputs for hosts and labels





[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2014-11-12 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1964:

Assignee: Abin Shahab  (was: Ravi Prakash)

 Create Docker analog of the LinuxContainerExecutor in YARN
 --

 Key: YARN-1964
 URL: https://issues.apache.org/jira/browse/YARN-1964
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.2.0
Reporter: Arun C Murthy
Assignee: Abin Shahab
 Fix For: 2.6.0

 Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
 YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
 YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
 yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, 
 yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
 yarn-1964-docker.patch, yarn-1964-docker.patch


 Docker (https://www.docker.io/) is, increasingly, a very popular container 
 technology.
 In context of YARN, the support for Docker will provide a very elegant 
 solution to allow applications to *package* their software into a Docker 
 container (entire Linux file system incl. custom versions of perl, python 
 etc.) and use it as a blueprint to launch all their YARN containers with 
 requisite software environment. This provides both consistency (all YARN 
 containers will have the same software environment) and isolation (no 
 interference with whatever is installed on the physical machine).





[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2014-11-12 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208615#comment-14208615
 ] 

Arun C Murthy commented on YARN-1964:
-

[~raviprak] I'm concerned this is coming in VERY late into 2.6... we've been in 
closedown mode for a while. The only mitigating factor is that this is fairly 
isolated since it's a new {{ContainerExecutor}}, and we can label it as an 
*alpha* feature. Any other objections to pulling this into 2.6?

 Create Docker analog of the LinuxContainerExecutor in YARN
 --

 Key: YARN-1964
 URL: https://issues.apache.org/jira/browse/YARN-1964
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.2.0
Reporter: Arun C Murthy
Assignee: Abin Shahab
 Fix For: 2.6.0

 Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
 YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
 YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
 yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, 
 yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
 yarn-1964-docker.patch, yarn-1964-docker.patch


 Docker (https://www.docker.io/) is, increasingly, a very popular container 
 technology.
 In context of YARN, the support for Docker will provide a very elegant 
 solution to allow applications to *package* their software into a Docker 
 container (entire Linux file system incl. custom versions of perl, python 
 etc.) and use it as a blueprint to launch all their YARN containers with 
 requisite software environment. This provides both consistency (all YARN 
 containers will have the same software environment) and isolation (no 
 interference with whatever is installed on the physical machine).





[jira] [Commented] (YARN-2830) Add backwards compatible ContainerId.newInstance constructor for use within Tez Local Mode

2014-11-09 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204002#comment-14204002
 ] 

Arun C Murthy commented on YARN-2830:
-

Is this ready to go?

 Add backwards compatible ContainerId.newInstance constructor for use within 
 Tez Local Mode
 --

 Key: YARN-2830
 URL: https://issues.apache.org/jira/browse/YARN-2830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Blocker
 Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch, 
 YARN-2830-v3.patch, YARN-2830-v4.patch


 YARN-2229 modified the private unstable api for constructing. Tez uses this 
 api (shouldn't, but does) for use with Tez Local Mode. This causes a 
 NoSuchMethodError when using Tez compiled against pre-2.6. Instead I propose 
 we add the backwards compatible api since overflow is not a problem in tez 
 local mode.





[jira] [Updated] (YARN-2834) Resource manager crashed with Null Pointer Exception

2014-11-09 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2834:

Priority: Blocker  (was: Critical)

 Resource manager crashed with Null Pointer Exception
 

 Key: YARN-2834
 URL: https://issues.apache.org/jira/browse/YARN-2834
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2834.1.patch


 Resource manager failed after restart. 
 {noformat}
 2014-11-09 04:12:53,013 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:initializeQueues(467)) - Initialized root queue root: 
 numChildQueue= 2, capacity=1.0, absoluteCapacity=1.0, 
 usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
 2014-11-09 04:12:53,013 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:initializeQueueMappings(436)) - Initialized queue 
 mappings, override: false
 2014-11-09 04:12:53,013 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:initScheduler(305)) - Initialized CapacityScheduler 
 with calculator=class 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
 minimumAllocation=<memory:256, vCores:1>, maximumAllocation=<memory:2048, 
 vCores:32>, asynchronousScheduling=false, asyncScheduleInterval=5ms
 2014-11-09 04:12:53,015 INFO  service.AbstractService 
 (AbstractService.java:noteFailure(272)) - Service ResourceManager failed in 
 state STARTED; cause: java.lang.NullPointerException
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1041)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1005)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:821)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:843)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:826)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:701)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
 at 

[jira] [Commented] (YARN-2579) Both RM's state is Active, but 1 RM is not really active.

2014-11-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198729#comment-14198729
 ] 

Arun C Murthy commented on YARN-2579:
-

Can we please get this in today? Tx.

 Both RM's state is Active, but 1 RM is not really active.
 --

 Key: YARN-2579
 URL: https://issues.apache.org/jira/browse/YARN-2579
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: YARN-2579-20141105.1.patch, YARN-2579-20141105.patch, 
 YARN-2579.patch, YARN-2579.patch


 I encountered a situation where both RMs' web pages were accessible and 
 their state displayed as Active, but one of the RMs' ActiveServices was 
 stopped.





[jira] [Created] (YARN-2817) Disk as a resource in YARN

2014-11-05 Thread Arun C Murthy (JIRA)
Arun C Murthy created YARN-2817:
---

 Summary: Disk as a resource in YARN
 Key: YARN-2817
 URL: https://issues.apache.org/jira/browse/YARN-2817
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Arun C Murthy
Assignee: Arun C Murthy


As YARN continues to cover new ground in terms of new workloads, disk is 
becoming a very important resource to govern.

It might be prudent to start with something very simple - allow applications to 
request entire drives (e.g. 2 drives out of the 12 available on a node); we can 
then also add support for specific IOPS, bandwidth etc.





[jira] [Commented] (YARN-2817) Disk as a resource in YARN

2014-11-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199910#comment-14199910
 ] 

Arun C Murthy commented on YARN-2817:
-

Kafka on YARN (KAFKA-1754) would benefit enormously if it could reserve a 
certain number of drives on a node exclusively.

 Disk as a resource in YARN
 --

 Key: YARN-2817
 URL: https://issues.apache.org/jira/browse/YARN-2817
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 As YARN continues to cover new ground in terms of new workloads, disk is 
 becoming a very important resource to govern.
 It might be prudent to start with something very simple - allow applications 
 to request entire drives (e.g. 2 drives out of the 12 available on a node), 
 we can then also add support for specific iops, bandwidth etc.





[jira] [Commented] (YARN-2817) Disk drive as a resource in YARN

2014-11-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199926#comment-14199926
 ] 

Arun C Murthy commented on YARN-2817:
-

Sure, I'll link it to YARN-2139. Tx

 Disk drive as a resource in YARN
 

 Key: YARN-2817
 URL: https://issues.apache.org/jira/browse/YARN-2817
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 As YARN continues to cover new ground in terms of new workloads, disk is 
 becoming a very important resource to govern.
 It might be prudent to start with something very simple - allow applications 
 to request entire drives (e.g. 2 drives out of the 12 available on a node), 
 we can then also add support for specific iops, bandwidth etc.





[jira] [Updated] (YARN-2817) Disk drive as a resource in YARN

2014-11-05 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2817:

Summary: Disk drive as a resource in YARN  (was: Disk as a resource in YARN)

 Disk drive as a resource in YARN
 

 Key: YARN-2817
 URL: https://issues.apache.org/jira/browse/YARN-2817
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 As YARN continues to cover new ground in terms of new workloads, disk is 
 becoming a very important resource to govern.
 It might be prudent to start with something very simple - allow applications 
 to request entire drives (e.g. 2 drives out of the 12 available on a node), 
 we can then also add support for specific iops, bandwidth etc.





[jira] [Comment Edited] (YARN-2817) Disk drive as a resource in YARN

2014-11-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199926#comment-14199926
 ] 

Arun C Murthy edited comment on YARN-2817 at 11/6/14 7:08 AM:
--

Sure, I'll link it to YARN-2139. We can use this jira to track supporting disk 
drive as a resource. Tx


was (Author: acmurthy):
Sure, I'll link it to YARN-2139. Tx

 Disk drive as a resource in YARN
 

 Key: YARN-2817
 URL: https://issues.apache.org/jira/browse/YARN-2817
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 As YARN continues to cover new ground in terms of new workloads, disk is 
 becoming a very important resource to govern.
 It might be prudent to start with something very simple - allow applications 
 to request entire drives (e.g. 2 drives out of the 12 available on a node), 
 we can then also add support for specific iops, bandwidth etc.





[jira] [Commented] (YARN-2139) Add support for disk IO isolation/scheduling for containers

2014-11-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199957#comment-14199957
 ] 

Arun C Murthy commented on YARN-2139:
-

[~ywskycn] - thanks for the design doc, it's well put together.

Some feedback:

# We shouldn't embed Linux or blkio specific semantics such as {{proportional 
weight division}} into YARN. We need something generic such as {{bandwidth}} 
which can be understood by users, supportable on heterogeneous nodes in the same 
cluster and supportable on other platforms like Windows.
# Spindle locality or I/O parallelism is a real concern - we should probably 
support {{bandwidth}} and {{spindles}}.
# Spindle locality or I/O parallelism cannot be tied to HDFS. In fact, YARN 
should not have a dependency on HDFS at all (*smile*)! This is particularly 
important in light of developments like Kafka-on-YARN (KAFKA-1754) because 
people want to use YARN to deploy only Kafka & Storm etc. YARN-2817 helps in 
this regard.

Makes sense?

 Add support for disk IO isolation/scheduling for containers
 ---

 Key: YARN-2139
 URL: https://issues.apache.org/jira/browse/YARN-2139
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: Disk_IO_Scheduling_Design_1.pdf, 
 Disk_IO_Scheduling_Design_2.pdf








[jira] [Commented] (YARN-2481) YARN should allow defining the location of java

2014-09-26 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150049#comment-14150049
 ] 

Arun C Murthy commented on YARN-2481:
-

[~ashahab] YARN already allows JAVA_HOME to be overridden... take a look 
at {{ApplicationConstants.Environment.JAVA_HOME}} and 
{{YarnConfiguration.DEFAULT_NM_ENV_WHITELIST}} for the code-path.
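To make that concrete: whitelisted variables are inherited from the NodeManager's 
environment only when the container's launch context does not set them itself, so 
an application can point its containers at a different JDK by setting JAVA_HOME in 
its own ContainerLaunchContext environment. A sketch of the NM-side knob (the 
property name is real; the value shown is illustrative, and the default whitelist 
may differ across releases):

```xml
<!-- Illustrative yarn-site.xml fragment; the value is an example only. -->
<property>
  <!-- Env vars containers may inherit from the NM when the app doesn't set them. -->
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,HADOOP_YARN_HOME</value>
</property>
```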

 YARN should allow defining the location of java
 ---

 Key: YARN-2481
 URL: https://issues.apache.org/jira/browse/YARN-2481
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Abin Shahab

 Yarn right now uses the location of the JAVA_HOME on the host to launch 
 containers. This does not work with Docker containers which have their own 
 filesystem namespace and OS. If the location of the Java binary of the 
 container to be launched is configurable, yarn can launch containers that 
 have java in a different location than the host.





[jira] [Commented] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues

2014-08-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095825#comment-14095825
 ] 

Arun C Murthy commented on YARN-2411:
-

Looks good to me. Hopefully this should be a simple enhancement in CS.

 [Capacity Scheduler] support simple user and group mappings to queues
 -

 Key: YARN-2411
 URL: https://issues.apache.org/jira/browse/YARN-2411
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Ram Venkatesh

 YARN-2257 has a proposal to extend and share the queue placement rules for 
 the fair scheduler and the capacity scheduler. This is a good long term 
 solution to streamline queue placement of both schedulers but it has core 
 infra work that has to happen first and might require changes to current 
 features in all schedulers along with corresponding configuration changes, if 
 any. 
 I would like to propose a change with a smaller scope in the capacity 
 scheduler that addresses the core use cases for implicitly mapping jobs that 
 have the default queue or no queue specified to specific queues based on the 
 submitting user and user groups. It will be useful in a number of real-world 
 scenarios and can be migrated over to the unified scheme when YARN-2257 
 becomes available.
 The proposal is to add two new configuration options:
 yarn.scheduler.capacity.queue-mappings.enable 
 A boolean that controls if queue mappings are enabled, default is false.
 and,
 yarn.scheduler.capacity.queue-mappings
 A string that specifies a list of mappings in the following format:
 map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
 map_specifier := user (u) | group (g)
 source_attribute := user | group | %user
 queue_name := the name of the mapped queue | %user | %primary_group
 The mappings will be evaluated left to right, and the first valid mapping 
 will be used. If the mapped queue does not exist, or the current user does 
 not have permissions to submit jobs to the mapped queue, the submission will 
 fail.
 Example usages:
 1. user1 is mapped to queue1, group1 is mapped to queue2
 u:user1:queue1,g:group1:queue2
 2. To map users to queues with the same name as the user:
 u:%user:%user
 I am happy to volunteer to take this up.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues

2014-08-13 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2411:


Assignee: Ram Venkatesh

 [Capacity Scheduler] support simple user and group mappings to queues
 -

 Key: YARN-2411
 URL: https://issues.apache.org/jira/browse/YARN-2411
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Ram Venkatesh
Assignee: Ram Venkatesh

 YARN-2257 has a proposal to extend and share the queue placement rules for 
 the fair scheduler and the capacity scheduler. This is a good long term 
 solution to streamline queue placement of both schedulers but it has core 
 infra work that has to happen first and might require changes to current 
 features in all schedulers along with corresponding configuration changes, if 
 any. 
 I would like to propose a change with a smaller scope in the capacity 
 scheduler that addresses the core use cases for implicitly mapping jobs that 
 have the default queue or no queue specified to specific queues based on the 
 submitting user and user groups. It will be useful in a number of real-world 
 scenarios and can be migrated over to the unified scheme when YARN-2257 
 becomes available.
 The proposal is to add two new configuration options:
 yarn.scheduler.capacity.queue-mappings.enable 
 A boolean that controls if queue mappings are enabled, default is false.
 and,
 yarn.scheduler.capacity.queue-mappings
 A string that specifies a list of mappings in the following format:
 map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
 map_specifier := user (u) | group (g)
 source_attribute := user | group | %user
 queue_name := the name of the mapped queue | %user | %primary_group
 The mappings will be evaluated left to right, and the first valid mapping 
 will be used. If the mapped queue does not exist, or the current user does 
 not have permissions to submit jobs to the mapped queue, the submission will 
 fail.
 Example usages:
 1. user1 is mapped to queue1, group1 is mapped to queue2
 u:user1:queue1,g:group1:queue2
 2. To map users to queues with the same name as the user:
 u:%user:%user
 I am happy to volunteer to take this up.





[jira] [Assigned] (YARN-1488) Allow containers to delegate resources to another container

2014-08-06 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned YARN-1488:
---

Assignee: Arun C Murthy

 Allow containers to delegate resources to another container
 ---

 Key: YARN-1488
 URL: https://issues.apache.org/jira/browse/YARN-1488
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 We should allow containers to delegate resources to another container. This 
 would allow external frameworks to share not just YARN's resource-management 
 capabilities but also its workload-management capabilities.





[jira] [Commented] (YARN-1488) Allow containers to delegate resources to another container

2014-08-06 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088309#comment-14088309
 ] 

Arun C Murthy commented on YARN-1488:
-

I have an early patch I'll share shortly, this feature ask is coming up in a 
lot of places and has generated lots of interest.

 Allow containers to delegate resources to another container
 ---

 Key: YARN-1488
 URL: https://issues.apache.org/jira/browse/YARN-1488
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 We should allow containers to delegate resources to another container. This 
 would allow external frameworks to share not just YARN's resource-management 
 capabilities but also its workload-management capabilities.





[jira] [Updated] (YARN-1963) Support priorities across applications within the same queue

2014-04-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1963:


Assignee: Sunil G  (was: Arun C Murthy)

 Support priorities across applications within the same queue 
 -

 Key: YARN-1963
 URL: https://issues.apache.org/jira/browse/YARN-1963
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Reporter: Arun C Murthy
Assignee: Sunil G

 It will be very useful to support priorities among applications within the 
 same queue, particularly in production scenarios. It allows for finer-grained 
 controls without having to force admins to create a multitude of queues, plus 
 allows existing applications to continue using existing queues which are 
 usually part of institutional memory.





[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2014-04-30 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986163#comment-13986163
 ] 

Arun C Murthy commented on YARN-1963:
-

[~sunilg] thanks for taking this up!

As [~vinodkv] mentioned, a short writeup will help - look forward to helping 
get this in; thanks again!

 Support priorities across applications within the same queue 
 -

 Key: YARN-1963
 URL: https://issues.apache.org/jira/browse/YARN-1963
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Reporter: Arun C Murthy
Assignee: Sunil G

 It will be very useful to support priorities among applications within the 
 same queue, particularly in production scenarios. It allows for finer-grained 
 controls without having to force admins to create a multitude of queues, plus 
 allows existing applications to continue using existing queues which are 
 usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1696) Document RM HA

2014-04-23 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13978901#comment-13978901
 ] 

Arun C Murthy commented on YARN-1696:
-

[~kasha] do you think we can get this in for 2.4.1? Tx.

 Document RM HA
 --

 Key: YARN-1696
 URL: https://issues.apache.org/jira/browse/YARN-1696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: YARN-1696.2.patch, yarn-1696-1.patch


 Add documentation for RM HA. Marking this a blocker for 2.4 as this is 
 required to call RM HA Stable and ready for public consumption. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1964) Support Docker containers in YARN

2014-04-22 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1964:


Assignee: (was: Arun C Murthy)

 Support Docker containers in YARN
 -

 Key: YARN-1964
 URL: https://issues.apache.org/jira/browse/YARN-1964
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy

 Docker (https://www.docker.io/) is, increasingly, a very popular container 
 technology.
 In the context of YARN, support for Docker will provide a very elegant 
 solution to allow applications to *package* their software into a Docker 
 container (entire Linux file system incl. custom versions of perl, python 
 etc.) and use it as a blueprint to launch all their YARN containers with 
 requisite software environment. This provides both consistency (all YARN 
 containers will have the same software environment) and isolation (no 
 interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1964) Support Docker containers in YARN

2014-04-22 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1964:


Assignee: Abin Shahab

 Support Docker containers in YARN
 -

 Key: YARN-1964
 URL: https://issues.apache.org/jira/browse/YARN-1964
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Abin Shahab

 Docker (https://www.docker.io/) is, increasingly, a very popular container 
 technology.
 In the context of YARN, support for Docker will provide a very elegant 
 solution to allow applications to *package* their software into a Docker 
 container (entire Linux file system incl. custom versions of perl, python 
 etc.) and use it as a blueprint to launch all their YARN containers with 
 requisite software environment. This provides both consistency (all YARN 
 containers will have the same software environment) and isolation (no 
 interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1964) Support Docker containers in YARN

2014-04-22 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977489#comment-13977489
 ] 

Arun C Murthy commented on YARN-1964:
-

[~ashahab] - That's great to hear, awesome! Thanks for taking this up!

 Support Docker containers in YARN
 -

 Key: YARN-1964
 URL: https://issues.apache.org/jira/browse/YARN-1964
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Abin Shahab

 Docker (https://www.docker.io/) is, increasingly, a very popular container 
 technology.
 In context of YARN, the support for Docker will provide a very elegant 
 solution to allow applications to *package* their software into a Docker 
 container (entire Linux file system incl. custom versions of perl, python 
 etc.) and use it as a blueprint to launch all their YARN containers with 
 requisite software environment. This provides both consistency (all YARN 
 containers will have the same software environment) and isolation (no 
 interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1963) Support priorities across applications within the same queue

2014-04-19 Thread Arun C Murthy (JIRA)
Arun C Murthy created YARN-1963:
---

 Summary: Support priorities across applications within the same 
queue 
 Key: YARN-1963
 URL: https://issues.apache.org/jira/browse/YARN-1963
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Reporter: Arun C Murthy
Assignee: Arun C Murthy


It will be very useful to support priorities among applications within the same 
queue, particularly in production scenarios. It allows for finer-grained 
controls without having to force admins to create a multitude of queues, plus 
allows existing applications to continue using existing queues which are 
usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1964) Support Docker containers in YARN

2014-04-19 Thread Arun C Murthy (JIRA)
Arun C Murthy created YARN-1964:
---

 Summary: Support Docker containers in YARN
 Key: YARN-1964
 URL: https://issues.apache.org/jira/browse/YARN-1964
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Arun C Murthy


Docker (https://www.docker.io/) is, increasingly, a very popular container 
technology.

In the context of YARN, support for Docker will provide a very elegant solution 
to allow applications to *package* their software into a Docker container 
(entire Linux file system incl. custom versions of perl, python etc.) and use 
it as a blueprint to launch all their YARN containers with requisite software 
environment. This provides both consistency (all YARN containers will have the 
same software environment) and isolation (no interference with whatever is 
installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1932) Javascript injection on the job status page

2014-04-14 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1932:


Priority: Blocker  (was: Critical)

 Javascript injection on the job status page
 ---

 Key: YARN-1932
 URL: https://issues.apache.org/jira/browse/YARN-1932
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.9, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
Priority: Blocker
 Attachments: YARN-1932.patch


 Scripts can be injected into the job status page because the diagnostics field 
 is not sanitized. Whatever string you set there shows up on the jobs page 
 as-is; i.e., if you include any script commands, they will be executed in the 
 browser of the user who opens the page.
 We need to escape the diagnostic string so that the scripts are not run.
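
 The escaping fix can be illustrated with a small standalone sketch (the class 
 and method names here are hypothetical, not the actual patch; real Hadoop code 
 would typically call an existing escaping utility):

```java
public class DiagnosticsEscaper {

    // Replace the five HTML metacharacters so untrusted diagnostics
    // text renders as literal text instead of executing as markup.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String diag = "<script>alert(1)</script>";
        // prints &lt;script&gt;alert(1)&lt;/script&gt;
        System.out.println(escapeHtml(diag));
    }
}
```

 With this applied at render time, an injected `<script>` tag is displayed as 
 text rather than executed.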



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1701) Improve default paths of timeline store and generic history store

2014-04-13 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1701:


Priority: Blocker  (was: Major)

 Improve default paths of timeline store and generic history store
 -

 Key: YARN-1701
 URL: https://issues.apache.org/jira/browse/YARN-1701
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: YARN-1701.v01.patch, YARN-1701.v02.patch


 When I enable AHS via yarn.ahs.enabled, the app history is still not visible 
 in the AHS web UI. This is due to NullApplicationHistoryStore being the 
 yarn.resourcemanager.history-writer.class. It would be good to have just one 
 key to enable the basic functionality.
 yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a 
 local file system location. However, FileSystemApplicationHistoryStore uses 
 DFS by default.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1701) Improve default paths of timeline store and generic history store

2014-04-13 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1701:


Affects Version/s: (was: 2.4.1)
   2.4.0

 Improve default paths of timeline store and generic history store
 -

 Key: YARN-1701
 URL: https://issues.apache.org/jira/browse/YARN-1701
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: YARN-1701.v01.patch, YARN-1701.v02.patch


 When I enable AHS via yarn.ahs.enabled, the app history is still not visible 
 in the AHS web UI. This is due to NullApplicationHistoryStore being the 
 yarn.resourcemanager.history-writer.class. It would be good to have just one 
 key to enable the basic functionality.
 yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a 
 local file system location. However, FileSystemApplicationHistoryStore uses 
 DFS by default.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1701) Improve default paths of timeline store and generic history store

2014-04-13 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1701:


Target Version/s: 2.4.1

 Improve default paths of timeline store and generic history store
 -

 Key: YARN-1701
 URL: https://issues.apache.org/jira/browse/YARN-1701
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: YARN-1701.v01.patch, YARN-1701.v02.patch


 When I enable AHS via yarn.ahs.enabled, the app history is still not visible 
 in the AHS web UI. This is due to NullApplicationHistoryStore being the 
 yarn.resourcemanager.history-writer.class. It would be good to have just one 
 key to enable the basic functionality.
 yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a 
 local file system location. However, FileSystemApplicationHistoryStore uses 
 DFS by default.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1935) Security for ATS

2014-04-13 Thread Arun C Murthy (JIRA)
Arun C Murthy created YARN-1935:
---

 Summary: Security for ATS
 Key: YARN-1935
 URL: https://issues.apache.org/jira/browse/YARN-1935
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy


Jira to track work to secure the ATS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-322) Add cpu information to queue metrics

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-322:
---

Fix Version/s: (was: 2.4.0)
   2.5.0

 Add cpu information to queue metrics
 

 Key: YARN-322
 URL: https://issues.apache.org/jira/browse/YARN-322
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, scheduler
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 2.5.0


 Post YARN-2 we need to add cpu information to queue metrics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1334) YARN should give more info on errors when running failed distributed shell command

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1334:


Fix Version/s: (was: 2.4.0)
   2.5.0

 YARN should give more info on errors when running failed distributed shell 
 command
 --

 Key: YARN-1334
 URL: https://issues.apache.org/jira/browse/YARN-1334
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Affects Versions: 2.3.0
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.5.0

 Attachments: YARN-1334.1.patch


 Running an incorrect command such as:
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distributedshell jar -shell_command ./test1.sh -shell_script ./
 shows a shell exit code exception with no useful message. It should print 
 out the sysout/syserr of the containers/AM to explain why it is failing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-650) User guide for preemption

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-650:
---

Fix Version/s: (was: 2.4.0)
   2.5.0

 User guide for preemption
 -

 Key: YARN-650
 URL: https://issues.apache.org/jira/browse/YARN-650
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Chris Douglas
Priority: Minor
 Fix For: 2.5.0

 Attachments: Y650-0.patch


 YARN-45 added a protocol for the RM to ask for resources back. The docs on 
 writing YARN applications should include a section on how to interpret this 
 message.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1514:


Fix Version/s: (was: 2.4.0)
   2.5.0

 Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
 

 Key: YARN-1514
 URL: https://issues.apache.org/jira/browse/YARN-1514
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.5.0


 ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in 
 YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is 
 called when an RM-HA cluster does failover; therefore, its execution time 
 impacts the failover time of RM-HA.
 We need a utility to benchmark the execution time of ZKRMStateStore#loadState 
 as a development tool.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1722) AMRMProtocol should have a way of getting all the nodes in the cluster

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1722:


Fix Version/s: (was: 2.4.0)
   2.5.0

 AMRMProtocol should have a way of getting all the nodes in the cluster
 --

 Key: YARN-1722
 URL: https://issues.apache.org/jira/browse/YARN-1722
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Bikas Saha
 Fix For: 2.5.0


 There is no way for an AM to find out the names of all the nodes in the 
 cluster via the AMRMProtocol. An AM can, at best, ask for containers at the * 
 location. The only way to get that information is via the ClientRMProtocol, 
 but that is secured by Kerberos or an RMDelegationToken, while the AM has an 
 AMRMToken. This is a pretty important piece of missing functionality. There 
 are other jiras open about getting cluster topology etc., but they haven't 
 been addressed, perhaps due to the lack of a clear definition of cluster 
 topology. Adding a means to at least get the node information would be a good 
 first step.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-153) PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-153:
---

Fix Version/s: (was: 2.4.0)
   2.5.0

 PaaS on YARN: a YARN application to demonstrate that YARN can be used as a 
 PaaS
 

 Key: YARN-153
 URL: https://issues.apache.org/jira/browse/YARN-153
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Jacob Jaigak Song
Assignee: Jacob Jaigak Song
 Fix For: 2.5.0

 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, 
 MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, 
 MAPREDUCE4393.patch

   Original Estimate: 336h
  Time Spent: 336h
  Remaining Estimate: 0h

 This application is to demonstrate that YARN can be used for non-MapReduce 
 applications. As Hadoop has already been adopted and deployed widely, and its 
 deployment will only increase in the future, we thought it has good potential 
 to be used as a PaaS.  
 I have implemented a proof of concept to demonstrate that YARN can be used as 
 a PaaS (Platform as a Service). I have done a gap analysis against VMware's 
 Cloud Foundry and tried to achieve as many PaaS functionalities as possible 
 on YARN.
 I'd like to check in this POC as a YARN example application.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1234:


Fix Version/s: (was: 2.4.0)
   2.5.0

  Container localizer logs are not created in secured cluster
 

 Key: YARN-1234
 URL: https://issues.apache.org/jira/browse/YARN-1234
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Fix For: 2.5.0


 When we run the ContainerLocalizer in a secured cluster, we potentially do not 
 create any log file to track log messages. Having one would be helpful in 
 identifying ContainerLocalizer issues in a secured cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-314:
---

Fix Version/s: (was: 2.4.0)
   2.5.0

 Schedulers should allow resource requests of different sizes at the same 
 priority and location
 --

 Key: YARN-314
 URL: https://issues.apache.org/jira/browse/YARN-314
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.5.0


 Currently, resource requests for the same container and locality are expected 
 to all be the same size.
 While it doesn't look like it's needed for apps currently, and it can be 
 circumvented by specifying different priorities if absolutely necessary, it 
 seems to me that the ability to request containers with different resource 
 requirements at the same priority level should be there for the future and 
 for completeness' sake.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-113) WebAppProxyServlet must use SSLFactory for the HttpClient connections

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-113:
---

Fix Version/s: (was: 2.4.0)
   2.5.0

 WebAppProxyServlet must use SSLFactory for the HttpClient connections
 -

 Key: YARN-113
 URL: https://issues.apache.org/jira/browse/YARN-113
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.5.0


 The HttpClient must be configured to use the SSLFactory when the web UIs are 
 over HTTPS, otherwise the proxy servlet fails to connect to the AM because of 
 unknown (self-signed) certificates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1723:


Fix Version/s: (was: 2.4.0)
   2.5.0

 AMRMClientAsync missing blacklist addition and removal functionality
 

 Key: YARN-1723
 URL: https://issues.apache.org/jira/browse/YARN-1723
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Bikas Saha
 Fix For: 2.5.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1477) No Submit time on AM web pages

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1477:


Fix Version/s: (was: 2.4.0)
   2.5.0

 No Submit time on AM web pages
 --

 Key: YARN-1477
 URL: https://issues.apache.org/jira/browse/YARN-1477
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Chen He
Assignee: Chen He
  Labels: features
 Fix For: 2.5.0


 Similar to MAPREDUCE-5052, This is a fix on AM side. Add submitTime field to 
 the AM's web services REST API



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1147) Add end-to-end tests for HA

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1147:


Fix Version/s: (was: 2.4.0)
   2.5.0

 Add end-to-end tests for HA
 ---

 Key: YARN-1147
 URL: https://issues.apache.org/jira/browse/YARN-1147
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.5.0


 While individual sub-tasks add tests for the code they include, it will be 
 handy to write end-to-end tests for HA including some stress testing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1156:


Fix Version/s: (was: 2.4.0)
   2.5.0

 Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
 -

 Key: YARN-1156
 URL: https://issues.apache.org/jira/browse/YARN-1156
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.1.0-beta
Reporter: Akira AJISAKA
Assignee: Tsuyoshi OZAWA
Priority: Minor
  Labels: metrics, newbie
 Fix For: 2.5.0

 Attachments: YARN-1156.1.patch


 AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of 
 memory is allocated to containers four times, AllocatedGB is incremented four 
 times by {{(int)500/1024}}, which is 0. That is, the memory actually 
 allocated is 2000MB, but the metric shows 0GB. Let's use a float type for 
 these metrics.
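
 The truncation can be reproduced in a few lines (a standalone sketch with 
 hypothetical names, not the actual NodeManager metrics code):

```java
public class AllocationMetrics {
    static final int MB_PER_GB = 1024;

    // Integer-typed metric: each 500MB allocation adds (int) 500/1024 == 0,
    // so the total stays at 0 no matter how many allocations occur.
    static int allocatedGbInt(int allocations, int mbEach) {
        int total = 0;
        for (int i = 0; i < allocations; i++) {
            total += mbEach / MB_PER_GB;
        }
        return total;
    }

    // Float-typed metric keeps the fractional gigabytes per allocation.
    static float allocatedGbFloat(int allocations, int mbEach) {
        float total = 0f;
        for (int i = 0; i < allocations; i++) {
            total += mbEach / (float) MB_PER_GB;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(allocatedGbInt(4, 500));   // prints 0
        System.out.println(allocatedGbFloat(4, 500)); // prints 1.953125
    }
}
```

 Four 500MB allocations total 2000MB, so the float metric reports roughly 
 1.95GB where the integer metric reports 0.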



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-965) NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-965:
---

Fix Version/s: (was: 2.4.0)
   2.5.0

 NodeManager Metrics containersRunning is not correct When localizing 
 container process is failed or killed
 --

 Key: YARN-965
 URL: https://issues.apache.org/jira/browse/YARN-965
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha
 Environment: suse linux
Reporter: Li Yuan
 Fix For: 2.5.0


 When a container is successfully launched, its state goes from LOCALIZED to 
 RUNNING and containersRunning is incremented. When its state goes from 
 EXITED_WITH_FAILURE or KILLING to DONE, containersRunning is decremented. 
 However, EXITED_WITH_FAILURE or KILLING can also be reached from 
 LOCALIZING (LOCALIZED), not just RUNNING, which leaves containersRunning less 
 than the actual number. Furthermore, the metrics are wrong: containersLaunched 
 != containersCompleted + containersFailed + containersKilled + 
 containersRunning + containersIniting



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1142:


Fix Version/s: (was: 2.4.0)
   2.5.0

 MiniYARNCluster web ui does not work properly
 -

 Key: YARN-1142
 URL: https://issues.apache.org/jira/browse/YARN-1142
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
 Fix For: 2.5.0


 When going to the RM http port, the NM web UI is displayed. It seems there is 
 a singleton somewhere that breaks things when the RM and NMs run in the same 
 process.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-308) Improve documentation about what asks means in AMRMProtocol

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-308:
---

Fix Version/s: (was: 2.4.0)
   2.5.0

 Improve documentation about what asks means in AMRMProtocol
 -

 Key: YARN-308
 URL: https://issues.apache.org/jira/browse/YARN-308
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, documentation, resourcemanager
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.5.0

 Attachments: YARN-308.patch


 It's unclear to me from reading the javadoc exactly what "asks" means when 
 the AM sends a heartbeat to the RM.  Is the AM supposed to send a list of all 
 resources that it is waiting for?  Or just inform the RM about new ones that 
 it wants?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-614:
---

Fix Version/s: (was: 2.4.0)
   2.5.0

 Retry attempts automatically for hardware failures or YARN issues and set 
 default app retries to 1
 --

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Chris Riccomini
 Fix For: 2.5.0

 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
 YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1621) Add CLI to list states of yarn container-IDs/hosts

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1621:


Fix Version/s: (was: 2.4.0)
   2.5.0

 Add CLI to list states of yarn container-IDs/hosts
 --

 Key: YARN-1621
 URL: https://issues.apache.org/jira/browse/YARN-1621
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Tassapol Athiapinya
 Fix For: 2.5.0


 As more applications are moved to YARN, we need a generic CLI to list the 
 states of YARN containers and their hosts. Today, if a YARN application 
 running in a container hangs, there is no way to deal with it other than 
 manually killing its process.
 For each running application, it is useful to differentiate between 
 running/succeeded/failed/killed containers. 
 {code:title=proposed yarn cli}
 $ yarn application -list-containers appId status
 where status is one of running/succeeded/killed/failed/all
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1327) Fix nodemgr native compilation problems on FreeBSD9

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1327:


Fix Version/s: (was: 2.4.0)
   2.5.0

 Fix nodemgr native compilation problems on FreeBSD9
 ---

 Key: YARN-1327
 URL: https://issues.apache.org/jira/browse/YARN-1327
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Radim Kolar
Assignee: Radim Kolar
 Fix For: 3.0.0, 2.5.0

 Attachments: nodemgr-portability.txt


 There are several portability problems preventing the native component from 
 compiling on FreeBSD:
 1. libgen.h is not included. The correct function prototype is there, but 
 Linux glibc has a workaround to define it for the user if libgen.h is not 
 directly included. Include this file directly.
 2. Query the max size of the login name using sysconf; this follows the same 
 code style as the rest of the code using sysconf.
 3. cgroups are a Linux-only feature; compile them conditionally and return an 
 error if mount_cgroup is attempted on a non-Linux OS.
 4. Do not use the POSIX function setpgrp(), since it clashes with the 
 same-named function from BSD 4.2; use the equivalent function instead. After 
 inspecting glibc sources, it is just a shortcut for setpgid(0,0).
 These changes make it compile on both Linux and FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-160:
---

Fix Version/s: (was: 2.4.0)
   2.5.0

 nodemanagers should obtain cpu/memory values from underlying OS
 ---

 Key: YARN-160
 URL: https://issues.apache.org/jira/browse/YARN-160
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
 Fix For: 2.5.0


 As mentioned in YARN-2:
 *NM memory and CPU configs*
 Currently these values come from the NM's config; we should be able to obtain 
 them from the OS (i.e., in the case of Linux, from /proc/meminfo and 
 /proc/cpuinfo). As this is highly OS dependent, we should have an interface 
 that obtains this information. In addition, implementations of this interface 
 should be able to specify a mem/cpu offset (an amount of mem/cpu not to be 
 made available as YARN resources); this would allow reserving mem/cpu for 
 the OS and other services outside of YARN containers.
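
 A minimal sketch of the Linux side of such an interface, assuming 
 /proc/meminfo-style input (MemInfoProbe and its method are hypothetical names 
 for illustration, not YARN code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MemInfoProbe {
    private static final Pattern MEM_TOTAL =
        Pattern.compile("MemTotal:\\s+(\\d+) kB");

    // Parse MemTotal (in kB) out of /proc/meminfo-style text and subtract
    // an offset reserved for the OS and non-YARN daemons.
    static long availableForYarnKb(String memInfo, long reservedKb) {
        Matcher m = MEM_TOTAL.matcher(memInfo);
        if (!m.find()) {
            throw new IllegalArgumentException("no MemTotal line found");
        }
        return Math.max(0L, Long.parseLong(m.group(1)) - reservedKb);
    }

    public static void main(String[] args) {
        // In production this text would be read from /proc/meminfo.
        String sample = "MemTotal:       16326428 kB\nMemFree:  123456 kB\n";
        // Reserve 2 GiB (2097152 kB) for the OS; prints 14229276.
        System.out.println(availableForYarnKb(sample, 2L * 1024 * 1024));
    }
}
```

 A real implementation would sit behind an OS-neutral interface, with this 
 parsing logic in the Linux-specific class and the reserved offset coming from 
 configuration.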





[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-745:
---

Fix Version/s: (was: 2.4.0)
   2.5.0

 Move UnmanagedAMLauncher to yarn client package
 ---

 Key: YARN-745
 URL: https://issues.apache.org/jira/browse/YARN-745
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Fix For: 2.5.0


 It's currently sitting in the yarn applications project, which sounds wrong. 
 The client project sounds better, since it contains the utilities/libraries 
 that clients use to write and debug YARN applications.





[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-04-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965515#comment-13965515
 ] 

Arun C Murthy commented on YARN-1769:
-

Sorry guys, been slammed. I'll take a look at this presently. Tx.

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
 YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
 YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers when there might not currently be enough space available 
 on a single host.
 The current algorithm is to reserve as many containers as currently required, 
 and then start to reserve more above that after a certain number of 
 re-reservations (currently biased against larger containers). Any time it hits 
 the limit of the number reserved, it stops looking at any other nodes. This 
 results in potentially missing nodes that have enough space to fulfill the 
 request.
 The other place for improvement is that reservations currently count against 
 your queue capacity. If you have reservations, you could hit the various 
 limits, which would then stop you from looking further at that node.
 The above two cases can cause an application requesting a larger container to 
 take a long time to get its resources.
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation.
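The proposed swap can be sketched as a toy model of a node-update decision; this is an illustration of the idea only, not the actual CapacityScheduler (which is Java and considerably more involved):

```c
/* Toy model of a node update in the scheduler. */
typedef struct { long long free_mb; } node_t;
typedef struct {
  long long request_mb;        /* size of the pending container request */
  int reserved_on_other_node;  /* does the app already hold a reservation? */
} app_t;

/* On a node heartbeat: if the node can fit the request outright, release
 * any reservation held elsewhere and allocate here, instead of skipping
 * the node because the reservation limit was reached. Returns 1 if the
 * request was allocated. */
int try_allocate_or_swap(node_t *node, app_t *app) {
  if (node->free_mb >= app->request_mb) {
    if (app->reserved_on_other_node) {
      app->reserved_on_other_node = 0;  /* swap: unreserve elsewhere ... */
    }
    node->free_mb -= app->request_mb;   /* ... and allocate on this node */
    return 1;
  }
  return 0;  /* still does not fit; keep (or place) the reservation */
}
```

The key behavioral change is that a held reservation no longer stops the search: every heartbeating node remains a candidate for an outright allocation.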





[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active

2014-04-04 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960929#comment-13960929
 ] 

Arun C Murthy commented on YARN-1878:
-

[~xgong] is this ready to go? Let's get this into 2.4.1. Tx.

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that sometimes it can take up to 10s for the 
 standby RM to transition to active.





[jira] [Updated] (YARN-1878) Yarn standby RM taking long to transition to active

2014-04-04 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1878:


Target Version/s: 2.4.1

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that sometimes it can take up to 10s for the 
 standby RM to transition to active.





[jira] [Updated] (YARN-1878) Yarn standby RM taking long to transition to active

2014-04-04 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1878:


Priority: Blocker  (was: Major)

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that sometimes it can take up to 10s for the 
 standby RM to transition to active.





[jira] [Updated] (YARN-1696) Document RM HA

2014-03-31 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1696:


Target Version/s: 2.4.1  (was: 2.4.0)

 Document RM HA
 --

 Key: YARN-1696
 URL: https://issues.apache.org/jira/browse/YARN-1696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: YARN-1696.2.patch, yarn-1696-1.patch


 Add documentation for RM HA. Marking this a blocker for 2.4 as this is 
 required to call RM HA Stable and ready for public consumption. 





[jira] [Commented] (YARN-1696) Document RM HA

2014-03-31 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955015#comment-13955015
 ] 

Arun C Murthy commented on YARN-1696:
-

[~kasha] - I'm almost done with rc0, moving this to 2.4.1 - if we need to spin 
rc1 we can get this in. Else, we can manually put this doc on the site when 
ready for 2.4.0. Thanks.

 Document RM HA
 --

 Key: YARN-1696
 URL: https://issues.apache.org/jira/browse/YARN-1696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: YARN-1696.2.patch, yarn-1696-1.patch


 Add documentation for RM HA. Marking this a blocker for 2.4 as this is 
 required to call RM HA Stable and ready for public consumption. 





[jira] [Commented] (YARN-1696) Document RM HA

2014-03-29 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954411#comment-13954411
 ] 

Arun C Murthy commented on YARN-1696:
-

[~kasha] - You think you can update the doc w/ the feedback quick-ish? Thanks.

 Document RM HA
 --

 Key: YARN-1696
 URL: https://issues.apache.org/jira/browse/YARN-1696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: YARN-1696.2.patch, yarn-1696-1.patch


 Add documentation for RM HA. Marking this a blocker for 2.4 as this is 
 required to call RM HA Stable and ready for public consumption. 





[jira] [Commented] (YARN-1696) Document RM HA

2014-03-21 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943822#comment-13943822
 ] 

Arun C Murthy commented on YARN-1696:
-

Thanks [~kkambatl]. In the worst case we can put your existing docs on jira if 
we can't get it in early next week and this is the only one blocking 2.4.

 Document RM HA
 --

 Key: YARN-1696
 URL: https://issues.apache.org/jira/browse/YARN-1696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker

 Add documentation for RM HA. Marking this a blocker for 2.4 as this is 
 required to call RM HA Stable and ready for public consumption. 





[jira] [Comment Edited] (YARN-1696) Document RM HA

2014-03-21 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943822#comment-13943822
 ] 

Arun C Murthy edited comment on YARN-1696 at 3/22/14 1:15 AM:
--

Thanks [~kkambatl]. In the worst case we can put your existing docs on the wiki 
if we can't get it in early next week and this is the only one blocking 2.4.


was (Author: acmurthy):
Thanks [~kkambatl]. In the worst case we can put your existing docs on jira if 
we can't get it in early next week and this is the only one blocking 2.4.

 Document RM HA
 --

 Key: YARN-1696
 URL: https://issues.apache.org/jira/browse/YARN-1696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker

 Add documentation for RM HA. Marking this a blocker for 2.4 as this is 
 required to call RM HA Stable and ready for public consumption. 





[jira] [Commented] (YARN-1696) Document RM HA

2014-03-20 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942098#comment-13942098
 ] 

Arun C Murthy commented on YARN-1696:
-

[~kkambatl] Any update on this? Thanks.

 Document RM HA
 --

 Key: YARN-1696
 URL: https://issues.apache.org/jira/browse/YARN-1696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker

 Add documentation for RM HA. Marking this a blocker for 2.4 as this is 
 required to call RM HA Stable and ready for public consumption. 





[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2014-03-20 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942106#comment-13942106
 ] 

Arun C Murthy commented on YARN-1051:
-

Thanks [~subru], I'll take a look at the update.

One thing I've mentioned to [~curino] offline is that I think we are better off 
relying on enhancing/reducing *priorities* for applications to effect 
reservations rather than on adding/removing queues.

Priorities within the same queue are an often-requested feature anyway - that 
way we can solve multiple goals (an operational feature and reservations) with 
the same underlying mechanism, i.e., priorities.

 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: YARN-1051-design.pdf, curino_MSR-TR-2013-108.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, workflows, and helps for 
 gang scheduling.





[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-03-20 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942119#comment-13942119
 ] 

Arun C Murthy commented on YARN-1707:
-

I'm very supportive of features like removing queues (adding is already 
supported).

However, as I just commented on YARN-1051, I think we are better off relying on 
enhancing/reducing priorities rather than on adding/removing queues.

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino

 The CapacityScheduler is rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling, we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA 
 YARN-1051.
 Concretely this requires the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify the refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues. 
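The relaxed validation in the last bullet, accepting a sum of child capacities of at most 100% rather than exactly 100%, can be sketched as follows. This is an illustrative C sketch; the real check lives in the Java CapacityScheduler, and the epsilon value is an assumption:

```c
#include <stddef.h>

#define CAPACITY_EPSILON 1e-6  /* assumed float tolerance for the comparison */

/* Relaxed refreshqueue validation: child capacities must be non-negative
 * and sum to at most 100% (rather than exactly 100%), leaving headroom
 * for queues that are created or destroyed dynamically. Returns 1 if valid. */
int validate_child_capacities(const double *child_capacity, size_t n) {
  double sum = 0.0;
  for (size_t i = 0; i < n; i++) {
    if (child_capacity[i] < 0.0) return 0;  /* no negative capacity */
    sum += child_capacity[i];
  }
  return sum <= 100.0 + CAPACITY_EPSILON;   /* was effectively: sum == 100% */
}
```

Under the relaxed rule, children summing to 90% pass validation, with the remaining 10% free for a dynamically created queue.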





[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2014-03-20 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942122#comment-13942122
 ] 

Arun C Murthy commented on YARN-1051:
-

More color on why I prefer priorities for reservations rather than 
adding/removing queues...

In the vast majority of deployments, queues are an organizational/economic concept 
(e.g., per-department queues are very common), and queues (hierarchy, names, 
etc.) are quite stable, well recognized, and part of the institutional memory.

If we rely on adding/removing queues to provide reservations, I'm concerned it 
will cause some confusion among both admins and users. E.g., a user/admin 
trying to debug his application will be quite challenged to figure out the 
demand/supply of resources when he has to go back in time to reconstruct a 
programmatically generated queue hierarchy, particularly after it's long gone.

Priorities, OTOH, are quite a familiar concept to admins (think unix 'nice'), 
and more importantly are a natural fit to the problem at hand, i.e., temporally 
increasing/decreasing the priority of an application based on its reservation at 
a point in time.

Furthermore, as I said previously, priorities are an often-requested feature - 
especially by admins.

 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: YARN-1051-design.pdf, curino_MSR-TR-2013-108.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, workflows, and helps for 
 gang scheduling.





[jira] [Comment Edited] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2014-03-20 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942122#comment-13942122
 ] 

Arun C Murthy edited comment on YARN-1051 at 3/20/14 6:50 PM:
--

More color on why I prefer priorities for reservations rather than 
adding/removing queues...

In the vast majority of deployments, queues are an organizational/economic concept 
(e.g., per-department queues are very common), and queues (hierarchy, names, 
etc.) are quite stable and well recognized, to the point of being part of the 
institutional memory.

If we rely on adding/removing queues to provide reservations, I'm concerned it 
will cause some confusion among both admins and users. E.g., a user/admin 
trying to debug his application will be quite challenged to figure out the 
demand/supply of resources when he has to go back in time to reconstruct a 
programmatically generated queue hierarchy, particularly after it's long gone.

Priorities, OTOH, are quite a familiar concept to admins (think unix 'nice'), 
and more importantly are a natural fit to the problem at hand, i.e., temporally 
increasing/decreasing the priority of an application based on its reservation at 
a point in time.

Furthermore, as I said previously, priorities are an often-requested feature - 
especially by admins.


was (Author: acmurthy):
More color on why I prefer priorities for reservations rather than 
adding/removing queues...

In vast majority of deployments, queues are an organizational/economic concept 
(e.g. per-department queues are very common) and are queues (hierarchy, names 
etc.) are quite stable and well recognized and part of the institutional memory.

If we rely on adding/removing queues to provide reservations, I'm concerned it 
will cause some confusion among both admins and users. For e.g. a user/admin 
trying to debug his application will be quite challenged to figure 
demand/supply of resources when he has to go back in time to reconstruct a 
programmatically generated queue hierarchy, particularly after it's long gone.

Priorities, OTOH, is quite a familiar concept to admins (think unix 'nice'); 
and more importantly is a natural fit to the problem at hand i.e. temporally 
increase/decrease the priority of the application based on it's reservation at 
a point in time.

Furthermore, as I said previously, priorities are an often requested feature - 
especially by admins.

 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: YARN-1051-design.pdf, curino_MSR-TR-2013-108.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, workflows, and helps for 
 gang scheduling.




