[jira] [Commented] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered

2013-07-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700365#comment-13700365
 ] 

Bikas Saha commented on YARN-369:
-

Some comments:
This probably needs to change since it's also used for double registration. 
Maybe just say invalid AM request exception.
{code}
+/**
+ * The exception is thrown when an application Master call allocate without
+ * calling RegisterApplicationMaster.
+ */
{code}

Not quite sure whether this will break other tests. Do other tests that use 
this method continue to pass with this change? We could create a different 
registerAppAttempt() that does not wait for the LAUNCHED state, and the current 
registerAppAttempt() could wait and then call the new one.
{code}
   public RegisterApplicationMasterResponse registerAppAttempt() throws 
Exception {
-waitForState(RMAppAttemptState.LAUNCHED);
{code}
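
The split suggested above could look roughly like the following sketch. The class name, the `registerAppAttemptNoWait` name, and the one-shot `waitForState` check are all hypothetical stand-ins and not the real test helper; the point is only that the existing method keeps its wait and delegates to a new variant that skips it.

```java
/** Hypothetical stand-in for the test helper discussed above, not the real mock class. */
class MockAmSketch {
  enum AttemptState { ALLOCATED, LAUNCHED }

  private final AttemptState state;
  boolean registered;

  MockAmSketch(AttemptState state) {
    this.state = state;
  }

  /** Existing behavior: require LAUNCHED first, then delegate to the new variant. */
  void registerAppAttempt() {
    waitForState(AttemptState.LAUNCHED);
    registerAppAttemptNoWait();
  }

  /** New variant (hypothetical name) that registers without checking the state. */
  void registerAppAttemptNoWait() {
    registered = true;
  }

  // Simplified: a real helper would poll until a timeout; this sketch checks once.
  private void waitForState(AttemptState expected) {
    if (state != expected) {
      throw new IllegalStateException("Attempt is " + state + ", expected " + expected);
    }
  }
}
```

Tests that already call `registerAppAttempt()` keep their current behavior, while a new test can call the no-wait variant directly.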

The test has some code with the author's name in it. Please remove it.

 Handle ( or throw a proper error when receiving) status updates from 
 application masters that have not registered
 -

 Key: YARN-369
 URL: https://issues.apache.org/jira/browse/YARN-369
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, trunk-win
Reporter: Hitesh Shah
Assignee: Mayank Bansal
 Attachments: YARN-369.patch, YARN-369-trunk-1.patch, 
 YARN-369-trunk-2.patch, YARN-369-trunk-3.patch


 Currently, an allocate call from an unregistered application is allowed and 
 the status update for it throws a statemachine error that is silently dropped.
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 STATUS_UPDATE at LAUNCHED
at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588)
at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471)
at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452)
at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
at java.lang.Thread.run(Thread.java:680)
 ApplicationMasterService should likely throw an appropriate error for 
 applications' requests that should not be handled in such cases.
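
A minimal sketch of the kind of check the description asks for, assuming a single exception type covers both an allocate before registration and a double registration. The names `InvalidAmRequestException` and `AmRegistrationGuard` are hypothetical and this is not the real ApplicationMasterService.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical exception name; the thread only suggests "invalid AM request". */
class InvalidAmRequestException extends Exception {
  InvalidAmRequestException(String message) {
    super(message);
  }
}

/** Hypothetical registration check, not the real ApplicationMasterService. */
class AmRegistrationGuard {
  private final Set<String> registered = ConcurrentHashMap.newKeySet();

  void register(String attemptId) throws InvalidAmRequestException {
    // Set.add returns false when the attempt is already present.
    if (!registered.add(attemptId)) {
      throw new InvalidAmRequestException("Already registered: " + attemptId);
    }
  }

  void allocate(String attemptId) throws InvalidAmRequestException {
    if (!registered.contains(attemptId)) {
      throw new InvalidAmRequestException("Allocate before register: " + attemptId);
    }
    // A real service would forward the status update to the state machine here.
  }
}
```

Rejecting the call up front means the caller sees the error, instead of the state machine dropping it silently.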

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700371#comment-13700371
 ] 

Bikas Saha commented on YARN-353:
-

I don't think it makes sense to have a default value for this. ZK location is not 
something we control and we cannot assume it to be running on some default 
location. The commented value in the default.xml file is just a syntax 
example.
{code}
+  public static final String DEFAULT_ZK_RM_STATE_STORE_ADDRESS =
+      "127.0.0.1:2181";
{code}

Wherever we are doing multiple operations, we should probably use the ZK multi 
APIs to guarantee atomic operations.
{code}
++ latestSequenceNumber);
+try {
+  if (dtSequenceNumberPath != null) {
+    deleteWithRetries(dtSequenceNumberPath, 0);
+  }
+  createWithRetries(latestSequenceNumberPath, null, zkAcl,
+      CreateMode.PERSISTENT);
+} catch (Exception e) {
+  LOG.info("Error in storing " + dtSequenceNumberPath);
+  throw e;
+}
+dtSequenceNumberPath = latestSequenceNumberPath;
{code}
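
To illustrate the all-or-nothing behavior being asked for, here is an in-memory sketch that does not use ZooKeeper at all. `AtomicReplaceSketch` and `replaceNode` are hypothetical names. In the real store the same effect would presumably come from passing the delete and the create together to the ZooKeeper client's `multi` call as `Op.delete` and `Op.create`.

```java
import java.util.HashSet;
import java.util.Set;

/** In-memory stand-in for a node store; not a ZooKeeper client. */
class AtomicReplaceSketch {
  private Set<String> paths = new HashSet<>();

  /** Deletes oldPath (when non-null) and creates newPath, or changes nothing. */
  synchronized void replaceNode(String oldPath, String newPath) {
    Set<String> copy = new HashSet<>(paths);
    if (oldPath != null && !copy.remove(oldPath)) {
      throw new IllegalStateException("No node at " + oldPath);
    }
    if (!copy.add(newPath)) {
      throw new IllegalStateException("Node already exists at " + newPath);
    }
    paths = copy; // publish only after every step has succeeded
  }

  synchronized boolean exists(String path) {
    return paths.contains(path);
  }
}
```

With separate delete and create calls, a failure in the create would leave the old sequence-number node already gone; here a failed create leaves the old node in place.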



 Add Zookeeper-based store implementation for RMStateStore
 -

 Key: YARN-353
 URL: https://issues.apache.org/jira/browse/YARN-353
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Hitesh Shah
Assignee: Bikas Saha
 Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, 
 YARN-353.4.patch


 Add a store that writes RM state data to ZK



[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-07-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700372#comment-13700372
 ] 

Bikas Saha commented on YARN-845:
-

Looks good. +1.

 RM crash with NPE on NODE_UPDATE
 

 Key: YARN-845
 URL: https://issues.apache.org/jira/browse/YARN-845
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Arpit Gupta
Assignee: Mayank Bansal
 Attachments: rm.log, YARN-845-trunk-1.patch, 
 YARN-845-trunk-draft.patch


 the following stack trace is generated in rm
 {code}
 n, service: 68.142.246.147:45454 }, ] resource=memory:1536, vCores:1 
 queue=default: capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:44544, vCores:29usedCapacity=0.90625, 
 absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 
 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=memory:44544, 
 vCores:29 cluster=memory:49152, vCores:48
 2013-06-17 12:43:53,655 INFO  capacity.ParentQueue 
 (ParentQueue.java:completedContainer(696)) - completedContainer queue=root 
 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=memory:44544, 
 vCores:29 cluster=memory:49152, vCores:48
 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(832)) - Application 
 appattempt_1371448527090_0844_01 released container 
 container_1371448527090_0844_01_05 on node: host: hostXX:45454 
 #containers=4 available=2048 used=6144 with event: FINISHED
 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for 
 application application_1371448527090_0844 on node: hostXX:45454
 2013-06-17 12:43:53,656 INFO  fica.FiCaSchedulerApp 
 (FiCaSchedulerApp.java:unreserve(435)) - Application 
 application_1371448527090_0844 unreserved  on node host: hostXX:45454 
 #containers=4 available=2048 used=6144, currently has 4 at priority 20; 
 currentReservation memory:6144, vCores:4
 2013-06-17 12:43:53,656 INFO  scheduler.AppSchedulingInfo 
 (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for 
 deactivate...
 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager 
 (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to 
 the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
 at java.lang.Thread.run(Thread.java:662)
 2013-06-17 12:43:53,659 INFO  resourcemanager.ResourceManager 
 (ResourceManager.java:run(426)) - Exiting, bbye..
 2013-06-17 12:43:53,665 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
 SelectChannelConnector@hostXX:8088
 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager 
 (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion 
 recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep 
 interrupted
 2013-06-17 12:43:53,766 INFO  impl.MetricsSystemImpl 
 (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics 
 system...
 2013-06-17 12:43:53,767 INFO  impl.MetricsSystemImpl 
 (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
 2013-06-17 12:43:53,767 INFO  impl.MetricsSystemImpl 
 (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system 
 shutdown complete.
 2013-06-17 12:43:53,768 WARN  

[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM

2013-07-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700377#comment-13700377
 ] 

Bikas Saha commented on YARN-763:
-

By the time the callback thread handles the shutdown request, the heartbeat 
thread may have already pinged the RM multiple times, and we should ideally 
avoid that, since each time the RM will end up sending it a 
resync/shutdown or might fail it.
Ideally, the heartbeater thread should check the command and stop as needed so 
that there are no subsequent heartbeats.

It is not quite clear what the test is testing. The thing to be tested is that 
no allocate call is made by the heartbeater thread after it has been 
sent a shutdown command by the RM. I don't quite see anything that verifies this 
behavior.

Secondly, there is a lot of probably unnecessary code in the test. I don't think 
multiple responses after shutdown or mocking client.getAvailableResources are 
required.
{code}
+    final AllocateResponse response1 = createAllocateResponse(
+        new ArrayList<ContainerStatus>(), allocated1, null);
+    final AllocateResponse response2 = createAllocateResponse(completed1,
+        new ArrayList<Container>(), null);
+    final AllocateResponse shutDownResponse = createAllocateResponse(
+        new ArrayList<ContainerStatus>(), new ArrayList<Container>(), null);
+    shutDownResponse.setAMCommand(AMCommand.AM_SHUTDOWN);
+
+    TestCallbackHandler callbackHandler = new TestCallbackHandler();
+    final AMRMClient<ContainerRequest> client = mock(AMRMClientImpl.class);
+    when(client.allocate(anyFloat())).thenReturn(shutDownResponse)
+        .thenReturn(response1).thenReturn(response2);
+
+    when(client.registerApplicationMaster(anyString(), anyInt(), anyString()))
+        .thenReturn(null);
+    when(client.getAvailableResources()).thenAnswer(new Answer<Resource>() {
+      @Override
+      public Resource answer(InvocationOnMock invocation)
+          throws Throwable {
+        // take client lock to simulate behavior of real impl
+        synchronized (client) {
+          Thread.sleep(10);
+        }
+        return null;
+      }
+    });
{code}

On a different note, serviceStop() should not call join() on the heartbeater 
thread. While serviceStop() blocks on the join(), it may be holding onto 
application locks in its call tree. The callback thread might be waiting on 
those locks as it upcalls to the app code, resulting in a deadlock. However, we 
should ensure the JVM is not hung because of any issue on this thread. So we 
should mark the callback thread as a daemon so that the JVM exits even if that 
thread is running.
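
The behavior described above can be sketched with plain threads. `HeartbeatSketch`, its `Command` enum, and the `Supplier` standing in for the allocate call are hypothetical and not the AMRMClientAsync code. The loop inspects each response itself and returns on a shutdown command, so no further allocate call is made. The sketch marks its one thread as a daemon only to show the `setDaemon` call; the comment above asks for that flag on the callback thread.

```java
import java.util.function.Supplier;

/** Hypothetical heartbeat loop; not the real AMRMClientAsync implementation. */
class HeartbeatSketch {
  enum Command { NONE, SHUTDOWN }

  /** Starts a daemon thread that heartbeats until it sees SHUTDOWN. */
  static Thread start(Supplier<Command> allocate, long intervalMs) {
    Thread t = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        Command command = allocate.get(); // one heartbeat to the RM stand-in
        if (command == Command.SHUTDOWN) {
          return; // stop here, so there are no subsequent heartbeats
        }
        try {
          Thread.sleep(intervalMs);
        } catch (InterruptedException e) {
          return; // treat interruption as a request to stop
        }
      }
    }, "heartbeater");
    t.setDaemon(true); // a daemon thread does not keep the JVM alive
    t.start();
    return t;
  }
}
```

A test for the real client could follow the same shape: count the allocate calls and verify that the count does not grow after the shutdown response.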

 AMRMClientAsync should stop heartbeating after receiving shutdown from RM
 -

 Key: YARN-763
 URL: https://issues.apache.org/jira/browse/YARN-763
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-763.1.patch, YARN-763.2.patch






[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700380#comment-13700380
 ] 

Hudson commented on YARN-845:
-

Integrated in Hadoop-trunk-Commit #4043 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4043/])
YARN-845. RM crash with NPE on NODE_UPDATE (Mayank Bansal via bikas) 
(Revision 1499886)

 Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499886
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java



[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700382#comment-13700382
 ] 

Bikas Saha commented on YARN-873:
-

There is probably no concept of an error code in the ApplicationReport object. 
The only current way for the YarnClient method to show an error is via an 
exception or a null report. A null report leaves it unclear what happened.
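
A small sketch of the exception route, assuming a dedicated exception type for unknown applications. `UnknownApplicationException` and `ReportLookupSketch` are hypothetical names, and a plain string stands in for the report object.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical exception type for an unknown application id. */
class UnknownApplicationException extends Exception {
  UnknownApplicationException(String appId) {
    super("Application " + appId + " does not exist");
  }
}

/** Hypothetical lookup; a String stands in for the real report object. */
class ReportLookupSketch {
  private final Map<String, String> reports = new HashMap<>();

  void addReport(String appId, String report) {
    reports.put(appId, report);
  }

  /** Never returns null: an unknown id is reported through the exception. */
  String getApplicationReport(String appId) throws UnknownApplicationException {
    String report = reports.get(appId);
    if (report == null) {
      throw new UnknownApplicationException(appId);
    }
    return report;
  }
}
```

The exception message tells the client what went wrong, which a null return cannot do.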

 YARNClient.getApplicationReport(unknownAppId) returns a null report
 ---

 Key: YARN-873
 URL: https://issues.apache.org/jira/browse/YARN-873
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Bikas Saha
Assignee: Xuan Gong

 How can the client find out that the app does not exist?



[jira] [Commented] (YARN-808) ApplicationReport does not clearly tell that the attempt is running or not

2013-07-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700389#comment-13700389
 ] 

Bikas Saha commented on YARN-808:
-

We should probably expose the state of the app attempt. This probably needs a 
translation from the internal app attempt state so that we don't expose the 
internal state machine state. [~vinodkv] Any other ideas?
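
Such a translation could be as small as a mapping between two enums. Every state name below is hypothetical, since the thread does not list the internal or the public states; the sketch only shows the internal state machine staying hidden behind a coarser public view.

```java
/** Hypothetical state names; the thread does not list the real ones. */
class AttemptStateTranslator {
  enum InternalState {
    NEW, SUBMITTED, SCHEDULED, ALLOCATED, LAUNCHED, RUNNING, FINISHED, FAILED, KILLED
  }

  enum PublicState { PENDING, RUNNING, DONE }

  /** Collapses internal states into the coarser view shown to clients. */
  static PublicState toPublic(InternalState state) {
    switch (state) {
      case RUNNING:
        return PublicState.RUNNING;
      case FINISHED:
      case FAILED:
      case KILLED:
        return PublicState.DONE;
      default:
        return PublicState.PENDING; // every state before RUNNING
    }
  }
}
```

A client polling a retried attempt would then see `PENDING` until the new attempt is actually running.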

 ApplicationReport does not clearly tell that the attempt is running or not
 --

 Key: YARN-808
 URL: https://issues.apache.org/jira/browse/YARN-808
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Bikas Saha
Assignee: Xuan Gong

 When an app attempt fails and is being retried, ApplicationReport immediately 
 gives the new attemptId and non-null values of host etc. There is no way for 
 clients to know whether the attempt is running other than connecting to it and 
 timing out on an invalid host. A solution would be to expose the attempt state or 
 return a null value for host instead of N/A.



[jira] [Commented] (YARN-818) YARN application classpath should add $PWD/* in addition to $PWD

2013-07-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700394#comment-13700394
 ] 

Bikas Saha commented on YARN-818:
-

In the YARN application classpath we should include only the YARN client and 
API jars instead of every jar in YARN. [~vinodkv] any comments?

 YARN application classpath should add $PWD/* in addition to $PWD
 

 Key: YARN-818
 URL: https://issues.apache.org/jira/browse/YARN-818
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-818.patch






[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700395#comment-13700395
 ] 

Jian He commented on YARN-353:
--

bq. I don't think it makes sense to have a default value for this. ZK location is 
not something we control and we cannot assume it to be running on some default 
location.
Yes, we cannot assume which location ZK is running on, but I think the result 
would be the same whether we provide a default or leave it empty: both cases should 
raise a connect exception or something similar, which leads the user to configure 
the true address. One bonus of providing a default is that it might make things 
easier for users in test mode, where ZK is running on its defaults. Your opinion?




[jira] [Commented] (YARN-881) Priority#compareTo method seems to be wrong.

2013-07-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700397#comment-13700397
 ] 

Bikas Saha commented on YARN-881:
-

That internal code should probably create its own comparator. This compareTo 
method for the class is user facing and it's inconsistent for users to see the 
compareTo() method returning results that are opposite to the declared ordering 
of priorities in YARN. [~vinodkv] - what do you think?
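
A sketch of that separation, assuming (as the issue description does) that a lower int value means a higher priority. `PrioritySketch` and `BY_RAW_VALUE` are hypothetical names. The sketch uses `Integer.compare` instead of the subtraction proposed in the issue, which avoids overflow for extreme values.

```java
import java.util.Comparator;

/** Hypothetical stand-in for the Priority record, not the real class. */
class PrioritySketch implements Comparable<PrioritySketch> {
  /** Internal ordering by raw int value, ascending; kept apart from compareTo. */
  static final Comparator<PrioritySketch> BY_RAW_VALUE =
      Comparator.comparingInt(PrioritySketch::getPriority);

  private final int priority;

  PrioritySketch(int priority) {
    this.priority = priority;
  }

  int getPriority() {
    return priority;
  }

  /** User-facing ordering: a lower int is a higher priority, so it compares as greater. */
  @Override
  public int compareTo(PrioritySketch other) {
    return Integer.compare(other.priority, this.priority);
  }
}
```

Internal code that needs the raw numeric order passes `BY_RAW_VALUE` explicitly, so changing `compareTo` does not silently change its behavior.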

 Priority#compareTo method seems to be wrong.
 

 Key: YARN-881
 URL: https://issues.apache.org/jira/browse/YARN-881
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He

 If a lower int value means higher priority, shouldn't we return 
 other.getPriority() - this.getPriority()?



[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700398#comment-13700398
 ] 

Bikas Saha commented on YARN-353:
-

No. It must be required for the user to specify this. We cannot assume some 
random address if the user has not specified a value. The code should throw an 
exception if this is not specified.
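
A minimal sketch of that fail-fast check, using `java.util.Properties` in place of the real configuration class. The key `example.zk.state-store.address` and the class name are hypothetical; the point is that a missing value raises a clear error instead of falling back to a default.

```java
import java.util.Properties;

/** Hypothetical config check; Properties stands in for the real configuration. */
class ZkAddressConfigSketch {
  // Hypothetical key name, used only for this illustration.
  static final String ZK_ADDRESS_KEY = "example.zk.state-store.address";

  /** Returns the configured address, or fails when none is set. */
  static String requireZkAddress(Properties conf) {
    String address = conf.getProperty(ZK_ADDRESS_KEY);
    if (address == null || address.trim().isEmpty()) {
      throw new IllegalStateException(
          "No ZooKeeper address configured; set " + ZK_ADDRESS_KEY);
    }
    return address.trim();
  }
}
```

Calling this during startup would stop the process with a message that names the missing key, rather than attempting a connection to an address the user never chose.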




[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700422#comment-13700422
 ] 

Jian He commented on YARN-353:
--

Any downside of doing that?




[jira] [Created] (YARN-901) Active users field in Resourcemanager scheduler UI gives negative values

2013-07-04 Thread Nishan Shetty (JIRA)
Nishan Shetty created YARN-901:
--

 Summary: Active users field in Resourcemanager scheduler UI 
gives negative values
 Key: YARN-901
 URL: https://issues.apache.org/jira/browse/YARN-901
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.5-alpha
Reporter: Nishan Shetty
Priority: Minor


Active users field in Resourcemanager scheduler UI gives negative values on 
Resourcemanager restart when job is in progress



[jira] [Created] (YARN-902) Used Resources field in Resourcemanager scheduler UI not displaying any values

2013-07-04 Thread Nishan Shetty (JIRA)
Nishan Shetty created YARN-902:
--

 Summary: Used Resources field in Resourcemanager scheduler UI 
not displaying any values
 Key: YARN-902
 URL: https://issues.apache.org/jira/browse/YARN-902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.5-alpha
Reporter: Nishan Shetty
Priority: Minor


Used Resources field in Resourcemanager scheduler UI not displaying any values



[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700440#comment-13700440
 ] 

Bikas Saha commented on YARN-353:
-

Downside of doing what? Throwing a clear exception will alert the user that the 
address is not configured and so the RM will not start.




[jira] [Commented] (YARN-901) Active users field in Resourcemanager scheduler UI gives negative values

2013-07-04 Thread rohithsharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700445#comment-13700445
 ] 

rohithsharma commented on YARN-901:
---

The Active Users field shows a negative value after an RM restart. The active 
user count is calculated on the APP_ADDED event and recalculated on the 
APP_REMOVED event. If we restart the RM after submitting a job, the calculation 
leads to a negative value. The problem is that the user info at each queue is 
stored in memory and is reset during RM startup.
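
One way this could happen is sketched below with hypothetical names; it is not the scheduler's actual bookkeeping. The counter is incremented when a user's first app is added and decremented when the last one is removed, so a removal that arrives after the in-memory state was reset pushes it below zero.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-queue bookkeeping; not the real scheduler code. */
class ActiveUsersSketch {
  private final Map<String, Integer> appsPerUser = new HashMap<>();
  private int activeUsers;

  void appAdded(String user) {
    // The user becomes active with the first running app.
    if (appsPerUser.merge(user, 1, Integer::sum) == 1) {
      activeUsers++;
    }
  }

  void appRemoved(String user) {
    // Naive: assumes a matching appAdded was seen by this instance.
    int remaining = appsPerUser.merge(user, -1, Integer::sum);
    if (remaining <= 0) {
      appsPerUser.remove(user);
      activeUsers--;
    }
  }

  /** Models the loss of in-memory state on restart. */
  void resetOnRestart() {
    appsPerUser.clear();
    activeUsers = 0;
  }

  int getActiveUsers() {
    return activeUsers;
  }
}
```

Calling `appAdded("alice")`, then `resetOnRestart()`, then `appRemoved("alice")` leaves `getActiveUsers()` at -1, which matches the symptom described in the report.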

 Active users field in Resourcemanager scheduler UI gives negative values
 --

 Key: YARN-901
 URL: https://issues.apache.org/jira/browse/YARN-901
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.5-alpha
Reporter: Nishan Shetty
Priority: Minor

 Active users field in Resourcemanager scheduler UI gives negative values on 
 Resourcemanager restart when job is in progress



[jira] [Commented] (YARN-818) YARN application classpath should add $PWD/* in addition to $PWD

2013-07-04 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700444#comment-13700444
 ] 

Alejandro Abdelnur commented on YARN-818:
-

Agree with Bikas.







[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700465#comment-13700465
 ] 

Jian He commented on YARN-353:
--

Sorry, I meant the downside of giving a default ZK address. Yeah, throwing an 
exception would be clear.




[jira] [Commented] (YARN-902) Used Resources field in Resourcemanager scheduler UI not displaying any values

2013-07-04 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700466#comment-13700466
 ] 

Sandy Ryza commented on YARN-902:
-

[~nishan], which scheduler are you seeing this with?

 Used Resources field in Resourcemanager scheduler UI not displaying any 
 values
 

 Key: YARN-902
 URL: https://issues.apache.org/jira/browse/YARN-902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.5-alpha
Reporter: Nishan Shetty
Priority: Minor

 Used Resources field in Resourcemanager scheduler UI not displaying any 
 values



[jira] [Updated] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations

2013-07-04 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-521:


Attachment: YARN-521.patch

 Augment AM - RM client module to be able to request containers only at 
 specific locations
 -

 Key: YARN-521
 URL: https://issues.apache.org/jira/browse/YARN-521
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-521.patch


 When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to 
 offer an easy way to access their functionality.
