[jira] [Updated] (YARN-5579) Resourcemanager should surface failed state store operation prominently

2018-08-30 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-5579:
-
Description: 
I found the following in the Resourcemanager log while trying to figure out why an 
application got stuck in the NEW_SAVING state.

{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore (RMStateStore.java:transition(205)) - Error storing app: application_1470517915158_0001
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:201)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:183)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
    at java.lang.Thread.run(Thread.java:745)
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore (RMStateStore.java:notifyStoreOperationFailedInternal(987)) - State store operation failed
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
{code}
The Resourcemanager should surface the above error prominently.

Subsequent application submissions would likely encounter the same error.

  was:
I found the following in Resourcemanager log when I tried to figure out why 
application got stuck in NEW_SAVING state.

{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(205)) - Error storing app: 

[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-12 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510186#comment-16510186
 ] 

Ted Yu commented on YARN-8414:
--

HBaseAdmin has the following method:
{code}
  boolean isTableAvailable(TableName tableName, byte[][] splitKeys) throws
  IOException;
{code}
You can selectively use the above method to ensure that your table is 
accessible.

> Nodemanager crashes soon if ATSv2 HBase is either down or absent
> 
>
> Key: YARN-8414
> URL: https://issues.apache.org/jira/browse/YARN-8414
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 3.1.0
> Reporter: Eric Yang
> Priority: Critical
>
> A test cluster has 1000 apps running, and a user triggers capacity scheduler 
> queue changes.  This crashes all node managers.  It looks like the node managers 
> run into "too many open files" while aggregating logs for containers:
> {code}
> 2018-06-07 21:17:59,307 WARN  server.AbstractConnector (AbstractConnector.java:handleAcceptFailure(544)) -
> java.io.IOException: Too many open files
>     at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>     at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:371)
>     at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:601)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>     at java.lang.Thread.run(Thread.java:745)
> 2018-06-07 21:17:59,758 WARN  util.SysInfoLinux (SysInfoLinux.java:readProcMemInfoFile(238)) - Couldn't read /proc/meminfo; can't determine memory settings
> 2018-06-07 21:17:59,758 WARN  util.SysInfoLinux (SysInfoLinux.java:readProcMemInfoFile(238)) - Couldn't read /proc/meminfo; can't determine memory settings
> 2018-06-07 21:18:00,842 WARN  client.ConnectionUtils (ConnectionUtils.java:getStubKey(236)) - Can not resolve host12.example.com, please check your network
> java.net.UnknownHostException: host1.example.com: System error
>     at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
>     at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1192)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1126)
>     at java.net.InetAddress.getByName(InetAddress.java:1076)
>     at org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:233)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.getClient(ConnectionImplementation.java:1189)
>     at org.apache.hadoop.hbase.client.ReversedScannerCallable.prepare(ReversedScannerCallable.java:111)
>     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:399)
>     at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
>     at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> Timeline service has thousands of exceptions:
> {code}
> 2018-06-07 21:18:34,182 ERROR client.AsyncProcess (AsyncProcess.java:submit(291)) - Failed to get region location
> java.io.InterruptedIOException
>     at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:265)
>     at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
>     at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
>     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:834)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
>     at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
> at 
> 

[jira] [Comment Edited] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-12 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510186#comment-16510186
 ] 

Ted Yu edited comment on YARN-8414 at 6/12/18 8:49 PM:
---

HBaseAdmin has the following method:
{code}
  boolean isTableAvailable(TableName tableName) throws
  IOException;
{code}
You can selectively use the above method to ensure that your table is 
accessible.
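
A minimal sketch of how such an availability check could gate writes. The {{TableProbe}} interface, the {{GuardedWriter}} class and the plain-string table name are hypothetical stand-ins so the snippet compiles without HBase on the classpath; in real code the probe would presumably delegate to the {{isTableAvailable(TableName)}} call quoted above.

```java
import java.io.IOException;

/** Hypothetical stand-in for a call such as isTableAvailable(TableName). */
interface TableProbe {
    boolean isTableAvailable(String tableName) throws IOException;
}

/** Illustrative only: skips a write while the backing table is unreachable. */
final class GuardedWriter {
    private final TableProbe probe;
    private final String tableName;

    GuardedWriter(TableProbe probe, String tableName) {
        this.probe = probe;
        this.tableName = tableName;
    }

    /**
     * Runs the write only if the probe reports the table as available.
     * Returns true if the write ran, false if it was skipped.
     */
    boolean writeIfAvailable(Runnable write) {
        boolean available;
        try {
            available = probe.isTableAvailable(tableName);
        } catch (IOException e) {
            // Treat a failed probe the same as an unavailable table.
            available = false;
        }
        if (available) {
            write.run();
        }
        return available;
    }
}
```

Probing before every single write adds a round trip, so a caller would likely want to cache the result for a short interval; that is left out here to keep the sketch small.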


was (Author: yuzhih...@gmail.com):
HBaseAdmin has the following method:
{code}
  boolean isTableAvailable(TableName tableName, byte[][] splitKeys) throws
  IOException;
{code}
You can selectively use the above method to ensure that your table is 
accessible.


[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-12 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509851#comment-16509851
 ] 

Ted Yu commented on YARN-8414:
--

bq. TimelineCollector.putEntities is a synchronized method. Throttling 
might need to be implemented here to avoid excessive calls.

I think this should be done as well.
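
One way the throttling could look, as a rough sketch only: a fixed-window limiter that admits at most a given number of calls per time window. The {{CallThrottle}} class, its parameters and the injected clock are assumptions made for illustration; nothing here is taken from the actual {{TimelineCollector}} code.

```java
import java.util.function.LongSupplier;

/** Illustrative fixed-window throttle: at most maxCalls are admitted per window. */
final class CallThrottle {
    private final int maxCalls;
    private final long windowMillis;
    private final LongSupplier clock; // injected so the behaviour can be tested
    private long windowStart;
    private int callsInWindow;

    CallThrottle(int maxCalls, long windowMillis, LongSupplier clock) {
        this.maxCalls = maxCalls;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Returns true if the caller may proceed, false if the call should be dropped or deferred. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            // A new window begins: reset the counter.
            windowStart = now;
            callsInWindow = 0;
        }
        if (callsInWindow < maxCalls) {
            callsInWindow++;
            return true;
        }
        return false;
    }
}
```

A caller would check {{tryAcquire()}} before invoking the expensive method and decide what to do with rejected calls (drop, buffer, or retry later). In production the clock would typically be {{System::currentTimeMillis}}.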


[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-12 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509826#comment-16509826
 ] 

Ted Yu commented on YARN-8414:
--

In the {{ClientScanner}} constructor:
{code}
this.retries = conf.getInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER,
  HConstants.DEFAULT_HBASE_CLIENT_RETRIES_NUMBER);
{code}
The config key is "hbase.client.retries.number", with a default of 15. You can lower this 
parameter so that the client side fails earlier in this scenario.
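
To illustrate why a smaller retry count makes the caller give up sooner, here is a generic bounded-retry loop. It is a sketch under stated assumptions, not the HBase client's retry logic: {{BoundedRetry}}, {{maxAttempts}} and {{pauseMillis}} are hypothetical names, and the real client applies its own pause and backoff rules.

```java
import java.util.concurrent.Callable;

/** Illustrative bounded retry; not the HBase client's implementation. */
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Invokes op up to maxAttempts times, pausing pauseMillis between attempts.
     * Rethrows the last failure once the attempts are used up.
     */
    static <T> T call(Callable<T> op, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again if attempts remain
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pauseMillis);
            }
        }
        throw last;
    }
}
```

With a fixed pause, the worst-case time spent before failing grows linearly with the attempt count, which is why lowering the count shortens the time a caller stays blocked on an unreachable backend.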




[jira] [Updated] (YARN-5579) Resourcemanager should surface failed state store operation prominently

2018-03-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-5579:
-

[jira] [Resolved] (YARN-1869) Access to zkAcl should be synchronized in ZKRMStateStore#addStoreOrUpdateOps()

2018-02-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved YARN-1869.
--
Resolution: Won't Fix

The method is gone.

> Access to zkAcl should be synchronized in ZKRMStateStore#addStoreOrUpdateOps()
> --
>
> Key: YARN-1869
> URL: https://issues.apache.org/jira/browse/YARN-1869
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
> Attachments: yarn-1869.patch
>
>
> Here is related code:
> {code}
>   } else {
>     opList.add(Op.create(nodeCreatePath, tokenOs.toByteArray(), zkAcl,
>         CreateMode.PERSISTENT));
>   }
> {code}
> The other methods accessing zkAcl are synchronized.
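
For illustration, the general pattern being asked for (every access to a shared field going through the same lock) could be sketched as below. {{AclHolder}} is a hypothetical class and uses strings in place of ZooKeeper ACL objects; it is not code from {{ZKRMStateStore}}.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative holder: the acl field is only touched while holding the object's monitor. */
final class AclHolder {
    // Guarded by "this": every read and write goes through a synchronized method.
    private List<String> acl = new ArrayList<>();

    synchronized void setAcl(List<String> newAcl) {
        // Copy defensively so later changes to the caller's list are not visible here.
        acl = new ArrayList<>(newAcl);
    }

    synchronized List<String> snapshot() {
        // Hand out an immutable copy so readers never see a half-updated list.
        return Collections.unmodifiableList(new ArrayList<>(acl));
    }
}
```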






[jira] [Updated] (YARN-5579) Resourcemanager should surface failed state store operation prominently

2018-02-09 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-5579:
-
Description: 
I found the following in Resourcemanager log when I tried to figure out why 
application got stuck in NEW_SAVING state.
{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(205)) - Error storing app: 
application_1470517915158_0001
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:201)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:183)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore (RMStateStore.java:notifyStoreOperationFailedInternal(987)) - State store operation failed
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
{code}
Resourcemanager should surface the above error prominently.
Subsequent application submissions would likely encounter the same error.
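
One way to make such a failure visible beyond the log is to count failed store operations and keep the last error, so that a metrics or web UI layer can report it. The following is only a minimal sketch under stated assumptions: StoreFailureTracker is a hypothetical class, not an existing YARN class, and it assumes recordFailure() would be called from wherever the store gives up after its retries.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper (not part of YARN): remembers state store failures so they can be reported. */
final class StoreFailureTracker {
  private final AtomicLong failedOps = new AtomicLong();
  private final AtomicReference<String> lastFailure = new AtomicReference<>("");

  /** Assumed to be called once a store operation has failed for good (retries exhausted). */
  void recordFailure(String operation, Exception cause) {
    failedOps.incrementAndGet();
    lastFailure.set(operation + ": " + cause);
  }

  /** Number of store operations that failed since this tracker was created. */
  long failedOpCount() {
    return failedOps.get();
  }

  /** Description of the most recent failure, or an empty string if there was none. */
  String lastFailure() {
    return lastFailure.get();
  }

  /** True as long as no store operation has failed. */
  boolean isHealthy() {
    return failedOps.get() == 0;
  }
}
```

A health check or UI page could then show failedOpCount() and lastFailure() instead of requiring someone to search the log.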

  was:
I found the following in the Resourcemanager log when I tried to figure out why 
an application got stuck in NEW_SAVING state.

{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore (RMStateStore.java:transition(205)) - Error storing app: 

[jira] [Comment Edited] (YARN-7346) Fix compilation errors against hbase2 beta release

2018-01-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325683#comment-16325683
 ] 

Ted Yu edited comment on YARN-7346 at 1/23/18 4:37 PM:
---

hbase 2 beta1 has been released.
FYI


was (Author: yuzhih...@gmail.com):
New RC for hbase 2 beta1 has been posted.
FYI

> Fix compilation errors against hbase2 beta release
> --
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
>Priority: Major
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, YARN-7581.prelim.patch
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 beta release

2018-01-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325683#comment-16325683
 ] 

Ted Yu commented on YARN-7346:
--

New RC for hbase 2 beta1 has been posted.
FYI

> Fix compilation errors against hbase2 beta release
> --
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, YARN-7581.prelim.patch
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 beta release

2018-01-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312355#comment-16312355
 ] 

Ted Yu commented on YARN-7346:
--

Probably because of the following dependency:
{code}
[INFO] org.apache.hbase:hbase-hadoop-compat:jar:3.0.0-SNAPSHOT
[INFO] +- org.apache.hbase:hbase-annotations:test-jar:tests:3.0.0-SNAPSHOT:test
[INFO] +- org.apache.hbase.thirdparty:hbase-shaded-miscellaneous:jar:1.0.1:compile
[INFO] +- commons-logging:commons-logging:jar:1.2:compile
[INFO] +- org.apache.hbase:hbase-metrics-api:jar:3.0.0-SNAPSHOT:compile
{code}
Similar dependency exists for hbase-hadoop2-compat module.

> Fix compilation errors against hbase2 beta release
> --
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, YARN-7581.prelim.patch
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 beta release

2018-01-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312338#comment-16312338
 ] 

Ted Yu commented on YARN-7346:
--

First RC for beta1 is being voted upon.

After the formal release of beta1, there will be no need to include the staging repo.

> Fix compilation errors against hbase2 beta release
> --
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, YARN-7581.prelim.patch
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release

2017-12-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306611#comment-16306611
 ] 

Ted Yu commented on YARN-7346:
--

bq. Unless HBase releases beta-1

You can find maven artifacts for beta-1 RC here:
https://repository.apache.org/content/groups/staging/org/apache/hbase/hbase-client/2.0.0-beta-1/

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
> Attachments: YARN-7346.00.patch, YARN-7346.prelim1.patch, 
> YARN-7346.prelim2.patch, YARN-7581.prelim.patch
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release

2017-12-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297469#comment-16297469
 ] 

Ted Yu commented on YARN-7346:
--

HBASE-19112 has been integrated.

Please check whether a rebase is needed in order to use the new hbase API.

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
> Attachments: YARN-7346.00.patch, YARN-7346.prelim1.patch, 
> YARN-7346.prelim2.patch, YARN-7581.prelim.patch
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release

2017-12-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289446#comment-16289446
 ] 

Ted Yu commented on YARN-7346:
--

[~rohithsharma] [~vrushalic]:
Can you review the patch?

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
> Attachments: YARN-7346.00.patch, YARN-7346.prelim1.patch, 
> YARN-7346.prelim2.patch, YARN-7581.prelim.patch
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-7213) [Umbrella] Test and validate HBase-2.0.x with Atsv2

2017-11-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256478#comment-16256478
 ] 

Ted Yu commented on YARN-7213:
--

See HBASE-18368

https://issues.apache.org/jira/browse/HBASE-18368?focusedCommentId=16207257=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16207257

bq. so you have to make do with SELECT x,y WHERE y = "foo" instead

True.

> [Umbrella] Test and validate HBase-2.0.x with Atsv2
> ---
>
> Key: YARN-7213
> URL: https://issues.apache.org/jira/browse/YARN-7213
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-7213.prelim.patch, YARN-7213.wip.patch
>
>
> Hbase-2.0.x officially support hadoop-alpha compilations. And also they are 
> getting ready for Hadoop-beta release so that HBase can release their 
> versions compatible with Hadoop-beta. So, this JIRA is to keep track of 
> HBase-2.0 integration issues. 






[jira] [Commented] (YARN-7213) [Umbrella] Test and validate HBase-2.0.x with Atsv2

2017-11-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256315#comment-16256315
 ] 

Ted Yu commented on YARN-7213:
--

Haibo:
Time permitting, formulating a simplified Filter test (independent of ATS v2) 
that reproduces the failure would benefit the hbase community (it would help 
prevent regressions).

Thanks

> [Umbrella] Test and validate HBase-2.0.x with Atsv2
> ---
>
> Key: YARN-7213
> URL: https://issues.apache.org/jira/browse/YARN-7213
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-7213.prelim.patch, YARN-7213.wip.patch
>
>
> Hbase-2.0.x officially support hadoop-alpha compilations. And also they are 
> getting ready for Hadoop-beta release so that HBase can release their 
> versions compatible with Hadoop-beta. So, this JIRA is to keep track of 
> HBase-2.0 integration issues. 






[jira] [Commented] (YARN-7213) [Umbrella] Test and validate HBase-2.0.x with Atsv2

2017-11-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255988#comment-16255988
 ] 

Ted Yu commented on YARN-7213:
--

Recently there were a lot of changes for Filters, starting from (trunk):
{code}
Author: huzheng 
Date:   Sat May 27 16:58:00 2017 +0800

HBASE-17678 FilterList with MUST_PASS_ONE may lead to redundant cells returned
{code}
to:
{code}
commit 705b3fa98c97806c7eba63617a99f62d829400d1
Author: huzheng 
Date:   Tue Oct 24 15:30:55 2017 +0800

HBASE-19057 Fix other code review comments about FilterList improvement
{code}
One approach is to step back to a revision before the HBASE-17678 commit and 
then move forward progressively to find which commit causes the test to fail.
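
The search described above is essentially a bisection over the commit range (git bisect automates the same idea). Below is a minimal sketch of the logic, assuming the commits are ordered oldest first and that the test keeps failing once it starts failing. CommitBisect and firstFailing are hypothetical names used only for illustration, and testPasses stands in for "check out this commit and run the failing test".

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: finds the first commit at which a test starts to fail. */
final class CommitBisect {
  /**
   * Returns the index of the first commit for which testPasses is false,
   * or -1 if the test passes on every commit.
   * Assumes commits are ordered oldest first and that failures are "sticky":
   * every commit after the first failing one also fails.
   */
  static <C> int firstFailing(List<C> commits, Predicate<C> testPasses) {
    int lo = 0;                 // every commit before lo is known to pass
    int hi = commits.size();    // every commit from hi onward is known to fail
    while (lo < hi) {
      int mid = lo + (hi - lo) / 2;
      if (testPasses.test(commits.get(mid))) {
        lo = mid + 1;           // first failure is after mid
      } else {
        hi = mid;               // mid fails, so the first failure is mid or earlier
      }
    }
    return lo == commits.size() ? -1 : lo;
  }
}
```

With this approach the number of test runs grows with the logarithm of the number of commits between HBASE-17678 and HBASE-19057, rather than with the number of commits itself.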

> [Umbrella] Test and validate HBase-2.0.x with Atsv2
> ---
>
> Key: YARN-7213
> URL: https://issues.apache.org/jira/browse/YARN-7213
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-7213.prelim.patch, YARN-7213.wip.patch
>
>
> Hbase-2.0.x officially support hadoop-alpha compilations. And also they are 
> getting ready for Hadoop-beta release so that HBase can release their 
> versions compatible with Hadoop-beta. So, this JIRA is to keep track of 
> HBase-2.0 integration issues. 






[jira] [Commented] (YARN-7213) [Umbrella] Test and validate HBase-2.0.x with Atsv2

2017-11-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255982#comment-16255982
 ] 

Ted Yu commented on YARN-7213:
--

I took a brief look at TestTimelineReaderWebServicesHBaseStorage.java, which 
passes filter criteria through URL parameters.

If the test can be simplified (involving SingleColumnValueFilter), that would 
make debugging easier for hbase developers.

> [Umbrella] Test and validate HBase-2.0.x with Atsv2
> ---
>
> Key: YARN-7213
> URL: https://issues.apache.org/jira/browse/YARN-7213
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-7213.prelim.patch, YARN-7213.wip.patch
>
>
> Hbase-2.0.x officially support hadoop-alpha compilations. And also they are 
> getting ready for Hadoop-beta release so that HBase can release their 
> versions compatible with Hadoop-beta. So, this JIRA is to keep track of 
> HBase-2.0 integration issues. 






[jira] [Commented] (YARN-7213) [Umbrella] Test and validate HBase-2.0.x with Atsv2

2017-11-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254693#comment-16254693
 ] 

Ted Yu commented on YARN-7213:
--

[~openinx]:
You have made many changes to Filters.

Mind giving Haibo a hand ?

> [Umbrella] Test and validate HBase-2.0.x with Atsv2
> ---
>
> Key: YARN-7213
> URL: https://issues.apache.org/jira/browse/YARN-7213
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-7213.prelim.patch, YARN-7213.wip.patch
>
>
> Hbase-2.0.x officially support hadoop-alpha compilations. And also they are 
> getting ready for Hadoop-beta release so that HBase can release their 
> versions compatible with Hadoop-beta. So, this JIRA is to keep track of 
> HBase-2.0 integration issues. 






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release

2017-11-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251819#comment-16251819
 ] 

Ted Yu commented on YARN-7346:
--

[~ram_krish]:
You can find the branch used by Haibo in the comments above.

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release

2017-11-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243363#comment-16243363
 ] 

Ted Yu commented on YARN-7346:
--

I am not sure a different folder helps. As long as mapreduce.tar.gz, containing 
un-relocated hbase jars, is on the classpath for (hbase) mapreduce jobs, we may 
see some problem.
e.g. HBASE-19169
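
When two versions of the same jar may be on the classpath, it can help to check which location a class was actually loaded from. The following is a minimal sketch using only JDK APIs. ClassOrigin is a hypothetical helper name, and which class to pass in (for example one from hbase-client) is left to the reader.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper: reports where a class was loaded from. */
final class ClassOrigin {
  /**
   * Returns the jar or directory that provided the given class.
   * Classes loaded by the bootstrap loader (such as java.lang.String)
   * usually have no code source, so a placeholder is returned for them.
   */
  static String locate(Class<?> cls) {
    CodeSource src = cls.getProtectionDomain().getCodeSource();
    return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
  }
}
```

Logging ClassOrigin.locate(SomeHBaseClass.class) inside a mapreduce task would show whether the class came from the jars shipped in mapreduce.tar.gz or from the job's own dependencies.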

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release

2017-11-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243098#comment-16243098
 ] 

Ted Yu commented on YARN-7346:
--

Have ATS v2 developers considered shading hbase jars?

With shading, hbase mapreduce jobs can succeed regardless of which hbase 
version ATS v2 uses.

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release

2017-11-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238106#comment-16238106
 ] 

Ted Yu commented on YARN-7346:
--

Looking at the contents of mapreduce.tar.gz for hadoop3 beta1, I see:
{code}
-rw-r--r-- jenkins/users 1304466 2017-10-17 16:16 hadoop/share/hadoop/yarn/lib/hbase-client-1.2.6.jar
-rw-r--r-- jenkins/users 4179597 2017-10-17 16:16 hadoop/share/hadoop/yarn/lib/hbase-server-1.2.6.jar
-rw-r--r-- jenkins/users  580945 2017-10-17 16:16 hadoop/share/hadoop/yarn/lib/hbase-common-1.2.6.jar
-rw-r--r-- jenkins/users 4365774 2017-10-17 16:16 hadoop/share/hadoop/yarn/lib/hbase-protocol-1.2.6.jar
-rw-r--r-- jenkins/users  100710 2017-10-17 16:16 hadoop/share/hadoop/yarn/lib/hbase-hadoop2-compat-1.2.6.jar
{code}
The above wouldn't work for an hbase2 release.
When can hbase developers have an artifact that uses hbase2 alpha4 or later?

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
>Priority: Major
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release

2017-10-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220795#comment-16220795
 ] 

Ted Yu commented on YARN-7346:
--

Please watch HBASE-19092

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-1869) Access to zkAcl should be synchronized in ZKRMStateStore#addStoreOrUpdateOps()

2017-10-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214487#comment-16214487
 ] 

Ted Yu commented on YARN-1869:
--

Currently addStoreOrUpdateOps() has 4 arguments, instead of 5.

> Access to zkAcl should be synchronized in ZKRMStateStore#addStoreOrUpdateOps()
> --
>
> Key: YARN-1869
> URL: https://issues.apache.org/jira/browse/YARN-1869
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
> Attachments: yarn-1869.patch
>
>
> Here is related code:
> {code}
>   } else {
> opList.add(Op.create(nodeCreatePath, tokenOs.toByteArray(), zkAcl,
> CreateMode.PERSISTENT));
>   }
> {code}
> The other methods accessing zkAcl are synchronized.
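
The concern in the quoted description can be sketched as follows. This is only an illustration under stated assumptions: AclGuardedStore is a hypothetical stand-in, not the ZKRMStateStore code, with the acl field playing the role of zkAcl and addCreateOp() playing the role of the method that builds the Op list. The point is that the reader of the shared field takes the same lock as the writers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a store that shares an ACL list across threads. */
final class AclGuardedStore {
  private List<String> acl = new ArrayList<>();        // plays the role of zkAcl
  private final List<String> ops = new ArrayList<>();  // plays the role of opList

  /** Writers replace the ACL while holding the instance lock. */
  synchronized void setAcl(List<String> newAcl) {
    acl = new ArrayList<>(newAcl);
  }

  /** Readers take the same lock, so they never observe a half-updated ACL. */
  synchronized void addCreateOp(String path) {
    ops.add("create " + path + " acl=" + acl);
  }

  /** Returns a copy of the queued operations, also under the lock. */
  synchronized List<String> pendingOps() {
    return new ArrayList<>(ops);
  }
}
```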






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release

2017-10-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210290#comment-16210290
 ] 

Ted Yu commented on YARN-7346:
--

bq. few bugs causing ATSv2 unit tests failure

Please surface the bug(s) if 2.0.0-alpha4-SNAPSHOT still has them.

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release

2017-10-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208639#comment-16208639
 ] 

Ted Yu commented on YARN-7346:
--

2.0.0-alpha4 hasn't come out yet.

Please build / install hbase-2 locally.

I normally use the following command line parameters:
{code}
-Phadoop-3.0 -Dhadoop-three.version=3.0.0-beta1 -Dhadoop-two.version=3.0.0-beta1 -Djetty.version=9.3.19.v20170502
{code}

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Vrushali C
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release

2017-10-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208111#comment-16208111
 ] 

Ted Yu commented on YARN-7346:
--

Please build 2.0.0-alpha4-SNAPSHOT locally before hbase 2 alpha4 is released - 
hbase APIs are still moving.

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Vrushali C
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.






[jira] [Created] (YARN-7346) Fix compilation errors against hbase2 alpha release

2017-10-17 Thread Ted Yu (JIRA)
Ted Yu created YARN-7346:


 Summary: Fix compilation errors against hbase2 alpha release
 Key: YARN-7346
 URL: https://issues.apache.org/jira/browse/YARN-7346
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu


When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, I 
got the following errors:

https://pastebin.com/Ms4jYEVB

This issue is to fix the compilation errors.






[jira] [Commented] (YARN-6707) [ATSv2] Update HBase version to 1.2.6

2017-06-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045588#comment-16045588
 ] 

Ted Yu commented on YARN-6707:
--

It would take some time for the hbase community to agree on the next stable release.

Please go ahead with commit.

> [ATSv2] Update HBase version to 1.2.6
> -
>
> Key: YARN-6707
> URL: https://issues.apache.org/jira/browse/YARN-6707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355
>Reporter: Varun Saxena
>Assignee: Vrushali C
> Attachments: YARN-6707-YARN-5355.001.patch
>
>







[jira] [Commented] (YARN-6707) [ATSv2] Update HBase version to 1.2.6

2017-06-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045344#comment-16045344
 ] 

Ted Yu commented on YARN-6707:
--

hbase 1.3.1 has been released.

Do you want to use that?

> [ATSv2] Update HBase version to 1.2.6
> -
>
> Key: YARN-6707
> URL: https://issues.apache.org/jira/browse/YARN-6707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355
>Reporter: Varun Saxena
>Assignee: Vrushali C
> Attachments: YARN-6707-YARN-5355.001.patch
>
>







[jira] [Resolved] (YARN-1872) DistributedShell occasionally keeps running endlessly

2017-03-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved YARN-1872.
--
Resolution: Cannot Reproduce

> DistributedShell occasionally keeps running endlessly
> -
>
> Key: YARN-1872
> URL: https://issues.apache.org/jira/browse/YARN-1872
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Hong Zhiguo
> Attachments: TestDistributedShell.out, YARN-1872.patch
>
>
> From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console :
> TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and 
> TestDistributedShell#testDSShell timed out.






[jira] [Updated] (YARN-5579) Resourcemanager should surface failed state store operation prominently

2017-03-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-5579:
-
Description: 
I found the following in the Resourcemanager log when I tried to figure out why 
an application got stuck in NEW_SAVING state.

{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore (ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore (RMStateStore.java:transition(205)) - Error storing app: application_1470517915158_0001
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:201)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:183)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore (RMStateStore.java:notifyStoreOperationFailedInternal(987)) - State store operation failed
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
{code}
The ResourceManager should surface the above error prominently.
Subsequent application submissions would likely encounter the same error.
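
Until the error is surfaced by the ResourceManager itself, the failure can at least be spotted by scanning the log for the marker phrases shown in the excerpt above. The following is only a minimal sketch of that idea, not part of any proposed patch: the function name `summarize_state_store_failures` is hypothetical, the marker strings are copied from the excerpt, and it assumes each log entry sits on one physical line (unlike the wrapped text in this mail).

```python
import re
from collections import Counter

# Marker phrases copied from the log excerpt in this issue.
FAILURE_MARKERS = (
    "Maxed out ZK retries. Giving up!",
    "Error storing app:",
    "State store operation failed",
)

# Matches an exception header line such as
# org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed
EXCEPTION_LINE = re.compile(r"^(?P<cls>[\w.$]+Exception): (?P<msg>.*)$")


def summarize_state_store_failures(lines):
    """Count failure markers and exception classes in an iterable of log lines.

    Hypothetical helper: assumes one log entry per line.
    """
    markers = Counter()
    exceptions = Counter()
    for raw in lines:
        line = raw.strip()
        for marker in FAILURE_MARKERS:
            if marker in line:
                markers[marker] += 1
        match = EXCEPTION_LINE.match(line)
        if match:
            exceptions[match.group("cls")] += 1
    return markers, exceptions
```

A caller would pass an open log file (or any iterable of lines) and alert when either counter is non-empty. Counting the exception class separately from the markers keeps the root cause (here AuthFailed) visible even when the same marker is logged for different reasons.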

  was:
I found the following in the ResourceManager log when I tried to figure out why 
an application got stuck in the NEW_SAVING state.
{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(205)) - Error storing app: 

[jira] [Updated] (YARN-5579) Resourcemanager should surface failed state store operation prominently

2017-02-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-5579:
-
Description: 
I found the following in the ResourceManager log when I tried to figure out why 
an application got stuck in the NEW_SAVING state.
{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(205)) - Error storing app: 
application_1470517915158_0001
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:201)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:183)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:notifyStoreOperationFailedInternal(987)) - State store 
operation failed
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
{code}
The ResourceManager should surface the above error prominently.
Subsequent application submissions would likely encounter the same error.

  was:
I found the following in the ResourceManager log when I tried to figure out why 
an application got stuck in the NEW_SAVING state.

{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(205)) - Error storing app: 

[jira] [Updated] (YARN-5579) Resourcemanager should surface failed state store operation prominently

2017-02-15 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-5579:
-
Description: 
I found the following in the ResourceManager log when I tried to figure out why 
an application got stuck in the NEW_SAVING state.

{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(205)) - Error storing app: 
application_1470517915158_0001
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:201)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:183)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:notifyStoreOperationFailedInternal(987)) - State store 
operation failed
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
{code}
The ResourceManager should surface the above error prominently.
Subsequent application submissions would likely encounter the same error.

  was:
I found the following in the ResourceManager log when I tried to figure out why 
an application got stuck in the NEW_SAVING state.
{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(205)) - Error storing app: 

[jira] [Updated] (YARN-5579) Resourcemanager should surface failed state store operation prominently

2016-12-14 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-5579:
-
Description: 
I found the following in the ResourceManager log when I tried to figure out why 
an application got stuck in the NEW_SAVING state.
{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(205)) - Error storing app: 
application_1470517915158_0001
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:201)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:183)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:notifyStoreOperationFailedInternal(987)) - State store 
operation failed
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
{code}
The ResourceManager should surface the above error prominently.
Subsequent application submissions would likely encounter the same error.

  was:
I found the following in the ResourceManager log when I tried to figure out why 
an application got stuck in the NEW_SAVING state.
{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(205)) - Error storing app: 

[jira] [Commented] (YARN-1872) DistributedShell occasionally keeps running endlessly

2016-11-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658592#comment-15658592
 ] 

Ted Yu commented on YARN-1872:
--

Looks like this is no longer an issue.

> DistributedShell occasionally keeps running endlessly
> -
>
> Key: YARN-1872
> URL: https://issues.apache.org/jira/browse/YARN-1872
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Hong Zhiguo
> Attachments: TestDistributedShell.out, YARN-1872.patch
>
>
> From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console :
> TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and 
> TestDistributedShell#testDSShell timed out.






[jira] [Updated] (YARN-5579) Resourcemanager should surface failed state store operation prominently

2016-10-31 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-5579:
-
Description: 
I found the following in the ResourceManager log when I tried to figure out why 
an application got stuck in the NEW_SAVING state.
{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(205)) - Error storing app: 
application_1470517915158_0001
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:201)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:183)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:notifyStoreOperationFailedInternal(987)) - State store 
operation failed
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
{code}

The ResourceManager should surface the above error prominently.
Subsequent application submissions would likely encounter the same error.

  was:
I found the following in the ResourceManager log when I tried to figure out why 
an application got stuck in the NEW_SAVING state.
{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(205)) - Error storing app: 

[jira] [Updated] (YARN-5579) Resourcemanager should surface failed state store operation prominently

2016-10-17 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-5579:
-
Labels: states  (was: )

> Resourcemanager should surface failed state store operation prominently
> ---
>
> Key: YARN-5579
> URL: https://issues.apache.org/jira/browse/YARN-5579
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.7.3
>Reporter: Ted Yu
>  Labels: states
>
> I found the following in the ResourceManager log when I tried to figure out why 
> an application got stuck in the NEW_SAVING state.
> {code}
> 2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
> (ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
> 2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
> (RMStateStore.java:transition(205)) - Error storing app: 
> application_1470517915158_0001
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:201)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:183)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> 2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
> (RMStateStore.java:notifyStoreOperationFailedInternal(987)) - State store 
> operation failed
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
> at 
> 

[jira] [Updated] (YARN-5579) Resourcemanager should surface failed state store operation prominently

2016-08-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-5579:
-
Affects Version/s: 2.7.3

> Resourcemanager should surface failed state store operation prominently
> ---
>
> Key: YARN-5579
> URL: https://issues.apache.org/jira/browse/YARN-5579
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.7.3
>Reporter: Ted Yu
>
> I found the following in the ResourceManager log when I tried to figure out why 
> an application got stuck in the NEW_SAVING state.
> {code}
> 2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
> (ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
> 2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
> (RMStateStore.java:transition(205)) - Error storing app: 
> application_1470517915158_0001
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:201)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:183)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> 2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
> (RMStateStore.java:notifyStoreOperationFailedInternal(987)) - State store 
> operation failed
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
> at 
> 

[jira] [Created] (YARN-5579) Resourcemanager should surface failed state store operation prominently

2016-08-29 Thread Ted Yu (JIRA)
Ted Yu created YARN-5579:


 Summary: Resourcemanager should surface failed state store 
operation prominently
 Key: YARN-5579
 URL: https://issues.apache.org/jira/browse/YARN-5579
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Ted Yu


I found the following in the ResourceManager log while trying to figure out why 
an application got stuck in the NEW_SAVING state.
{code}
2016-08-29 18:14:23,486 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:runWithRetries(1242)) - Maxed out ZK retries. Giving up!
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:transition(205)) - Error storing app: 
application_1470517915158_0001
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:201)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:183)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 18:14:23,486 ERROR recovery.RMStateStore 
(RMStateStore.java:notifyStoreOperationFailedInternal(987)) - State store 
operation failed
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
AuthFailed
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:995)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
{code}
The ResourceManager should surface the above error prominently.
Subsequent application submissions would likely encounter the same error.
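
One possible way to make such failures harder to miss, sketched under assumptions: count failed store operations and keep the most recent error so they can be exposed through a metric or health check instead of only a log line. StoreFailureTracker and its methods are hypothetical names used for illustration; they are not part of RMStateStore.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of Hadoop: records failed state store
// operations so they can be reported somewhere more visible than a log file.
class StoreFailureTracker {
  private final AtomicLong failedOps = new AtomicLong();
  private final AtomicReference<String> lastError = new AtomicReference<>("");

  // Would be called from the code path that currently only logs the failure.
  void recordFailure(String operation, Exception cause) {
    failedOps.incrementAndGet();
    lastError.set(operation + ": " + cause.getMessage());
  }

  long getFailedOps() {
    return failedOps.get();
  }

  String getLastError() {
    return lastError.get();
  }

  // A simple health signal that a UI or monitoring endpoint could poll.
  boolean isHealthy() {
    return failedOps.get() == 0;
  }
}
```

Whether the signal belongs in the web UI, in metrics, or in a health check is a design choice this sketch leaves open.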



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-02-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172795#comment-15172795
 ] 

Ted Yu commented on YARN-4736:
--

bq. so planning to test with hbase-1.0.3 tar. 

There have been more releases since 1.0.3;
for example, you can try the 1.2.0 release.

BufferedMutatorImpl#flush() appeared in the stack trace. However, if the HBase 
cluster was shut down, the flush wouldn't succeed.

I haven't seen the above issue happen on a live 1.x cluster.

> Issues with HBaseTimelineWriterImpl
> ---
>
> Key: YARN-4736
> URL: https://issues.apache.org/jira/browse/YARN-4736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Vrushali C
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: hbaseException.log, threaddump.log
>
>
> Faced some issues while running ATSv2 on a single-node Hadoop cluster, with 
> HBase launched on the same node with embedded ZooKeeper.
> # Due to some NPE issues, I could see that the NM was trying to shut down, but 
> the NM daemon process did not exit because of locks.
> # Got some HBase-related exceptions after the application finished execution 
> successfully. 
> Will attach logs and the trace for the same.





[jira] [Resolved] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-08-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved YARN-3025.
--
Resolution: Later

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt


 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
       List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}
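
 A minimal sketch of how the two methods could fit together, assuming the blacklist is kept as a set of node names. BlacklistHolder is a hypothetical stand-in for the class that owns the blacklist; only the two method signatures come from the description above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the class that owns the blacklist.
class BlacklistHolder {
  private final Set<String> blacklist = new HashSet<>();

  public synchronized void updateBlacklist(List<String> blacklistAdditions,
      List<String> blacklistRemovals) {
    if (blacklistAdditions != null) {
      blacklist.addAll(blacklistAdditions);
    }
    if (blacklistRemovals != null) {
      blacklist.removeAll(blacklistRemovals);
    }
  }

  // Returns a copy so callers cannot mutate the internal state.
  public synchronized List<String> getBlacklistedNodes() {
    return new ArrayList<>(blacklist);
  }
}
```

 Returning a copy under the same monitor gives the caller a consistent snapshot even if updateBlacklist runs concurrently.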





[jira] [Updated] (YARN-1869) Access to zkAcl should be synchronized in ZKRMStateStore#addStoreOrUpdateOps()

2015-07-15 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-1869:
-
Description: 
Here is related code:
{code}
  } else {
    opList.add(Op.create(nodeCreatePath, tokenOs.toByteArray(), zkAcl,
        CreateMode.PERSISTENT));
  }
{code}

The other methods accessing zkAcl are synchronized.

  was:
Here is related code:
{code}
  } else {
    opList.add(Op.create(nodeCreatePath, tokenOs.toByteArray(), zkAcl,
        CreateMode.PERSISTENT));
  }
{code}
The other methods accessing zkAcl are synchronized.


 Access to zkAcl should be synchronized in ZKRMStateStore#addStoreOrUpdateOps()
 --

 Key: YARN-1869
 URL: https://issues.apache.org/jira/browse/YARN-1869
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: yarn-1869.patch


 Here is related code:
 {code}
   } else {
     opList.add(Op.create(nodeCreatePath, tokenOs.toByteArray(), zkAcl,
         CreateMode.PERSISTENT));
   }
 {code}
 The other methods accessing zkAcl are synchronized.
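
 The general pattern being asked for can be sketched as follows. This is an illustration only: AclGuardedStore, its fields, and its methods are hypothetical and are not taken from ZKRMStateStore. The point is that both the writer and the reader of the shared field hold the same monitor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; names are not taken from ZKRMStateStore.
class AclGuardedStore {
  private List<String> acl = new ArrayList<>();      // shared mutable state
  private final List<String> ops = new ArrayList<>();

  // Writers replace the ACL under the object's monitor.
  synchronized void setAcl(List<String> newAcl) {
    acl = new ArrayList<>(newAcl);
  }

  // Reading acl under the same monitor guarantees this thread sees the
  // latest value written by setAcl.
  synchronized void addCreateOp(String path) {
    ops.add("create " + path + " acl=" + acl);
  }

  synchronized List<String> getOps() {
    return new ArrayList<>(ops);
  }
}
```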





[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations

2015-06-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598781#comment-14598781
 ] 

Ted Yu commented on YARN-3815:
--

[~jrottinghuis]:
Your description makes sense.
Cell tags have been supported since HBase 0.98, so we can use them to mark completion.

 [Aggregation] Application/Flow/User/Queue Level Aggregations
 

 Key: YARN-3815
 URL: https://issues.apache.org/jira/browse/YARN-3815
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Attachments: Timeline Service Nextgen Flow, User, Queue Level 
 Aggregations (v1).pdf


 Per previous discussions in some design documents for YARN-2928, the basic 
 scenario is that queries for stats can happen at:
 - Application level, expected return: an application with aggregated stats
 - Flow level, expected return: aggregated stats for a flow_run, flow_version 
 and flow 
 - User level, expected return: aggregated stats for applications submitted by 
 the user
 - Queue level, expected return: aggregated stats for applications within the 
 queue
 Application states are the basic building block for aggregations at all other 
 levels. We can provide Flow/User/Queue level aggregated statistics based on 
 application states (this needs a dedicated table for application states, which 
 is missing from previous design documents such as the HBase/Phoenix schema 
 design). 





[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations

2015-06-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596173#comment-14596173
 ] 

Ted Yu commented on YARN-3815:
--

My comment is related to the usage of HBase.
bq. under framework_specific_metrics column family
Since the column family name appears in every KeyValue, it would be better to use 
a very short column family name, e.g. f_m for framework metrics.
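
To put a rough number on this, here is a small back-of-the-envelope calculation. The cell count is an assumed figure for illustration, and the result ignores any block encoding or compression, which would reduce the real on-disk difference. FamilyNameOverhead is a hypothetical helper, not HBase code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for a rough estimate; not part of HBase.
class FamilyNameOverhead {
  // Bytes spent on the family name alone across the given number of cells.
  static long familyBytes(String family, long cells) {
    return family.getBytes(StandardCharsets.UTF_8).length * cells;
  }

  public static void main(String[] args) {
    long cells = 1_000_000L; // assumed cell count, for illustration only
    long longName = familyBytes("framework_specific_metrics", cells);
    long shortName = familyBytes("f_m", cells);
    System.out.println("bytes saved: " + (longName - shortName));
  }
}
```

The long name is 26 bytes and f_m is 3 bytes, so the saving is 23 bytes per cell before encoding or compression.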






[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations

2015-06-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596616#comment-14596616
 ] 

Ted Yu commented on YARN-3815:
--

bq. in the spirit of readless increments as used in Tephra

The readless increment feature is implemented in cdap, where it is called delta write.
Please take a look at:
cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementHandler.java
cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementSummingScanner.java

The implementation uses an HBase coprocessor, BTW.
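
For readers unfamiliar with the idea, here is a plain in-memory sketch of a readless increment: the write path appends a delta without reading the current value, and the read path sums the deltas. This is an illustration of the concept only, with hypothetical names; it is not the cdap code and does not use any HBase API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of the idea only; not the cdap or HBase code.
class DeltaCounterStore {
  private final Map<String, List<Long>> deltas = new HashMap<>();

  // Write path: append the delta without reading the current value.
  void increment(String key, long delta) {
    deltas.computeIfAbsent(key, k -> new ArrayList<>()).add(delta);
  }

  // Read path: sum all stored deltas for the key.
  long get(String key) {
    long sum = 0;
    for (long d : deltas.getOrDefault(key, new ArrayList<>())) {
      sum += d;
    }
    return sum;
  }

  // Merge step: collapse the stored deltas into a single value, so later
  // reads have less to sum.
  void compact(String key) {
    List<Long> list = deltas.get(key);
    if (list != null && list.size() > 1) {
      long total = get(key);
      list.clear();
      list.add(total);
    }
  }

  int storedEntries(String key) {
    List<Long> list = deltas.get(key);
    return list == null ? 0 : list.size();
  }
}
```

The trade-off is cheaper writes in exchange for extra work at read or merge time.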






[jira] [Reopened] (YARN-2764) counters.LimitExceededException shouldn't abort AsyncDispatcher

2015-05-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened YARN-2764:
--

 counters.LimitExceededException shouldn't abort AsyncDispatcher
 ---

 Key: YARN-2764
 URL: https://issues.apache.org/jira/browse/YARN-2764
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Ted Yu
  Labels: counters

 I saw the following in container log:
 {code}
 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with 
 attemptattempt_1414221548789_0023_r_03_0
 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
 task_1414221548789_0023_r_03 Task Transitioned from RUNNING to SUCCEEDED
 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 24
 2014-10-25 10:28:55,053 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
 job_1414221548789_0023Job Transitioned from RUNNING to COMMITTING
 2014-10-25 10:28:55,054 INFO [CommitterEvent Processor #1] 
 org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
 the event EventType: JOB_COMMIT
 2014-10-25 10:28:55,177 FATAL [AsyncDispatcher event handler] 
 org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
 org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many 
 counters: 121 max=120
   at 
 org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101)
   at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1754)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1737)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1718)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1089)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2049)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2045)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 2014-10-25 10:28:55,185 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
 {code}
 Counter limit was exceeded when JobFinishedEvent was created.
 Better handling of LimitExceededException should be provided so that 
 AsyncDispatcher can continue functioning.
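
 One shape such handling could take, as a simplified sketch: the dispatch loop catches a failure from a single handler call, records it, and moves on to the next event. TolerantDispatcher is hypothetical and is not the AsyncDispatcher code; which exception types are safe to tolerate is an assumption left to the real fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified dispatcher; it is not the AsyncDispatcher code.
class TolerantDispatcher {
  private final Consumer<String> handler;
  private final List<String> failedEvents = new ArrayList<>();

  TolerantDispatcher(Consumer<String> handler) {
    this.handler = handler;
  }

  // Dispatch every event; a handler failure is recorded instead of ending
  // the loop.
  void dispatchAll(List<String> events) {
    for (String event : events) {
      try {
        handler.accept(event);
      } catch (RuntimeException e) {
        failedEvents.add(event + ": " + e.getMessage());
      }
    }
  }

  List<String> getFailedEvents() {
    return failedEvents;
  }
}
```

 Catching every RuntimeException is deliberately broad here; a real change would probably want to limit this to exceptions known to be recoverable.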





[jira] [Resolved] (YARN-2350) TestApplicationMasterServiceOnHA fails with InvalidToken exception

2015-03-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved YARN-2350.
--
Resolution: Cannot Reproduce

 TestApplicationMasterServiceOnHA fails with InvalidToken exception
 --

 Key: YARN-2350
 URL: https://issues.apache.org/jira/browse/YARN-2350
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu

 From https://builds.apache.org/job/Hadoop-Yarn-trunk/622 :
 {code}
 Running org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.591 sec <<< 
 FAILURE! - in org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
 testAllocateOnHA(org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA)
   Time elapsed: 8.408 sec  <<< ERROR!
 org.apache.hadoop.security.token.SecretManager$InvalidToken: Given AMRMToken 
 for application : appattempt_1000_0001_00 seems to have been generated 
 illegally.
 at org.apache.hadoop.ipc.Client.call(Client.java:1411)
 at org.apache.hadoop.ipc.Client.call(Client.java:1364)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy85.allocate(Unknown Source)
 at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
 at com.sun.proxy.$Proxy86.allocate(Unknown Source)
 at 
 org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA.testAllocateOnHA(TestApplicationMasterServiceOnHA.java:84)
 {code}
 This is reproducible locally.





[jira] [Resolved] (YARN-1296) schedulerAllocateTimer is accessed without holding samplerLock in ResourceSchedulerWrapper

2015-03-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved YARN-1296.
--
Resolution: Later

 schedulerAllocateTimer is accessed without holding samplerLock in 
 ResourceSchedulerWrapper
 --

 Key: YARN-1296
 URL: https://issues.apache.org/jira/browse/YARN-1296
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: yarn-1296-v1.patch


 Here is related code:
 {code}
   public Allocation allocate(ApplicationAttemptId attemptId,
       List<ResourceRequest> resourceRequests,
       List<ContainerId> containerIds,
       List<String> strings, List<String> strings2) {
     if (metricsON) {
       final Timer.Context context = schedulerAllocateTimer.time();
 {code}
 samplerLock should be used to guard the access.
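
 A sketch of the guarded access, assuming samplerLock is a java.util.concurrent.locks.Lock. GuardedTimerHolder is a hypothetical stand-in and a plain counter takes the place of schedulerAllocateTimer; this is not the ResourceSchedulerWrapper code.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in; names mirror the issue text but this is not the
// ResourceSchedulerWrapper code.
class GuardedTimerHolder {
  private final Lock samplerLock = new ReentrantLock();
  private long timerGeneration = 0; // stands in for schedulerAllocateTimer

  // Writer: replaces the timer while holding the lock.
  void resetTimer() {
    samplerLock.lock();
    try {
      timerGeneration++;
    } finally {
      samplerLock.unlock();
    }
  }

  // Reader: takes the same lock before touching the shared field.
  long currentTimer() {
    samplerLock.lock();
    try {
      return timerGeneration;
    } finally {
      samplerLock.unlock();
    }
  }
}
```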





[jira] [Resolved] (YARN-2178) TestApplicationMasterService sometimes fails in trunk

2015-03-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved YARN-2178.
--
Resolution: Cannot Reproduce

 TestApplicationMasterService sometimes fails in trunk
 -

 Key: YARN-2178
 URL: https://issues.apache.org/jira/browse/YARN-2178
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor
  Labels: test

 From https://builds.apache.org/job/Hadoop-Yarn-trunk/587/ :
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
 Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 55.763 sec 
 <<< FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
 testInvalidContainerReleaseRequest(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService)
   Time elapsed: 41.336 sec  <<< FAILURE!
 java.lang.AssertionError: AppAttempt state is not correct (timedout) 
 expected:<ALLOCATED> but was:<SCHEDULED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:401)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService.testInvalidContainerReleaseRequest(TestApplicationMasterService.java:143)
 {code}





[jira] [Resolved] (YARN-2764) counters.LimitExceededException shouldn't abort AsyncDispatcher

2015-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved YARN-2764.
--
Resolution: Later






[jira] [Resolved] (YARN-2133) Make entity Id specification in TestTimelineWebServices amenable for future test cases

2015-03-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved YARN-2133.
--
Resolution: Later

 Make entity Id specification in TestTimelineWebServices amenable for future 
 test cases
 --

 Key: YARN-2133
 URL: https://issues.apache.org/jira/browse/YARN-2133
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor

 Currently each test case in TestTimelineWebServices uses different entity Ids 
 / types.
 When a new test case is added, the developer has to go over the existing cases 
 and find an unused entity Id.
 Unique entity Ids can be generated by introducing an AtomicInteger field in 
 TestTimelineWebServices that is incremented at the beginning of each test.
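
 A minimal sketch of the idea. EntityIdSource and the "test_entity_" prefix are hypothetical; in the actual test class the counter would be a field of TestTimelineWebServices.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper showing the idea; not the TestTimelineWebServices code.
class EntityIdSource {
  private static final AtomicInteger NEXT_ID = new AtomicInteger();

  // Each call returns an id that no earlier call returned.
  static String nextEntityId() {
    return "test_entity_" + NEXT_ID.incrementAndGet();
  }
}
```

 Each test would call nextEntityId() at its start, so a new test case never needs to check which Ids are already taken.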





[jira] [Updated] (YARN-2764) counters.LimitExceededException shouldn't abort AsyncDispatcher

2015-03-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-2764:
-
Description: 
I saw the following in container log:
{code}
2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with 
attemptattempt_1414221548789_0023_r_03_0
2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
task_1414221548789_0023_r_03 Task Transitioned from RUNNING to SUCCEEDED
2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 24
2014-10-25 10:28:55,053 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1414221548789_0023Job 
Transitioned from RUNNING to COMMITTING
2014-10-25 10:28:55,054 INFO [CommitterEvent Processor #1] 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the 
event EventType: JOB_COMMIT
2014-10-25 10:28:55,177 FATAL [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 
121 max=120
  at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101)
  at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
  at 
org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
  at 
org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
  at 
org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106)
  at 
org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203)
  at 
org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1754)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1737)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1718)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1089)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2049)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2045)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
  at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
  at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
  at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
  at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
  at java.lang.Thread.run(Thread.java:745)
2014-10-25 10:28:55,185 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
{code}
Counter limit was exceeded when JobFinishedEvent was created.

Better handling of LimitExceededException should be provided so that 
AsyncDispatcher can continue functioning.

  was:
I saw the following in container log:
{code}
2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with 
attemptattempt_1414221548789_0023_r_03_0
2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
task_1414221548789_0023_r_03 Task Transitioned from RUNNING to SUCCEEDED
2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 24
2014-10-25 10:28:55,053 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1414221548789_0023Job 
Transitioned from RUNNING to COMMITTING
2014-10-25 10:28:55,054 INFO [CommitterEvent Processor #1] 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the 
event EventType: JOB_COMMIT
2014-10-25 10:28:55,177 FATAL 

[jira] [Commented] (YARN-2706) Math.abs() is called on random integer in DefaultContainerExecutor#getWorkingDir()

2015-02-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341605#comment-14341605
 ] 

Ted Yu commented on YARN-2706:
--

lgtm

 Math.abs() is called on random integer in 
 DefaultContainerExecutor#getWorkingDir()
 --

 Key: YARN-2706
 URL: https://issues.apache.org/jira/browse/YARN-2706
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: haosdent
Priority: Minor
 Attachments: YARN-2706.patch


 Here is the code:
 {code}
 long randomPosition = Math.abs(r.nextLong()) % totalAvailable;
 {code}
 See 
 http://stackoverflow.com/questions/7567350/findbugs-rv-absolute-value-of-random-int-warning
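
To illustrate the issue reported above: `Math.abs(Long.MIN_VALUE)` returns `Long.MIN_VALUE`, so the quoted expression can yield a negative position. The following is a minimal sketch of the pitfall and of one possible alternative based on `Math.floorMod`. The class and method names are hypothetical, and this is not the content of the attached patch.

```java
import java.util.Random;

// Hypothetical illustration only; not the code from YARN-2706.patch.
final class RandomPositionSketch {
    // Mirrors the quoted pattern: the result is negative when value == Long.MIN_VALUE,
    // because Math.abs cannot represent the positive counterpart of Long.MIN_VALUE.
    static long unsafePosition(long value, long totalAvailable) {
        return Math.abs(value) % totalAvailable;
    }

    // Math.floorMod returns a value in [0, totalAvailable) for a positive divisor.
    static long safePosition(long value, long totalAvailable) {
        return Math.floorMod(value, totalAvailable);
    }

    public static void main(String[] args) {
        long totalAvailable = 1000L;
        System.out.println(unsafePosition(Long.MIN_VALUE, totalAvailable));
        System.out.println(safePosition(Long.MIN_VALUE, totalAvailable));
        System.out.println(safePosition(new Random().nextLong(), totalAvailable));
    }
}
```

The sketch assumes `totalAvailable` is positive; with a non-positive divisor `Math.floorMod` behaves differently and would need its own guard.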



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338736#comment-14338736
 ] 

Ted Yu commented on YARN-3025:
--

Ping [~zjshen]

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt


 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}
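
As a rough sketch of how the two methods in the description could fit together, the class below keeps the blacklist in a set and returns a defensive copy from the getter. The holder class `BlacklistSketch` is hypothetical; the actual change would live in YARN's own classes, and the attached patches may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical holder class used for illustration only.
final class BlacklistSketch {
    // Insertion-ordered set so the returned list has a stable order.
    private final Set<String> blacklist = new LinkedHashSet<>();

    // Applies additions first, then removals, under the object's monitor.
    public synchronized void updateBlacklist(List<String> blacklistAdditions,
                                             List<String> blacklistRemovals) {
        if (blacklistAdditions != null) {
            blacklist.addAll(blacklistAdditions);
        }
        if (blacklistRemovals != null) {
            blacklist.removeAll(blacklistRemovals);
        }
    }

    // Returns a copy so callers cannot mutate the internal state.
    public synchronized List<String> getBlacklistedNodes() {
        return new ArrayList<>(blacklist);
    }
}
```

Returning a copy is a design choice made for this sketch: a new AM could read the list after failover without holding a reference into state that later heartbeats keep updating.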





[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338746#comment-14338746
 ] 

Ted Yu commented on YARN-2777:
--

@Varun:
{code}
out.println("End of LogType:");
out.println(fileType);
{code}
Can you put the above two onto the same line ?

Thanks

 Mark the end of individual log in aggregated log
 

 Key: YARN-2777
 URL: https://issues.apache.org/jira/browse/YARN-2777
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Varun Saxena
  Labels: log-aggregation
 Attachments: YARN-2777.001.patch


 Below is snippet of aggregated log showing hbase master log:
 {code}
 LogType: hbase-hbase-master-ip-172-31-34-167.log
 LogUploadTime: 29-Oct-2014 22:31:55
 LogLength: 24103045
 Log Contents:
 Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
 ...
   at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
   at java.lang.Thread.run(Thread.java:745)
 LogType: hbase-hbase-master-ip-172-31-34-167.out
 {code}
 Since logs from various daemons are aggregated in one log file, it would be 
 desirable to mark the end of one log before starting with the next.
 e.g. with such a line:
 {code}
 End of LogType: hbase-hbase-master-ip-172-31-34-167.log
 {code}
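
A minimal sketch of the proposed marker, assuming a simplified layout with only the `LogType` and `Log Contents` headers (the real aggregated format also carries `LogUploadTime` and `LogLength`). The class and method names are hypothetical and do not come from the attached patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical writer used for illustration; not the YARN log aggregation code.
final class LogMarkerSketch {
    // Writes one log section and closes it with a single-line end marker.
    static void writeLog(PrintStream out, String fileType, String contents) {
        out.println("LogType: " + fileType);
        out.println("Log Contents:");
        out.println(contents);
        out.println("End of LogType: " + fileType);
    }

    // Convenience helper that renders a section to a String.
    static String render(String fileType, String contents) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        writeLog(out, fileType, contents);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("hbase-hbase-master-ip-172-31-34-167.log", "..."));
    }
}
```

Emitting the marker and the file name with one `println` call keeps them on the same line, which makes the boundary easy to find with a plain text search.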





[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338975#comment-14338975
 ] 

Ted Yu commented on YARN-2777:
--

lgtm

 Mark the end of individual log in aggregated log
 

 Key: YARN-2777
 URL: https://issues.apache.org/jira/browse/YARN-2777
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Varun Saxena
  Labels: log-aggregation
 Attachments: YARN-2777.001.patch, YARN-2777.002.patch


 Below is snippet of aggregated log showing hbase master log:
 {code}
 LogType: hbase-hbase-master-ip-172-31-34-167.log
 LogUploadTime: 29-Oct-2014 22:31:55
 LogLength: 24103045
 Log Contents:
 Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
 ...
   at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
   at java.lang.Thread.run(Thread.java:745)
 LogType: hbase-hbase-master-ip-172-31-34-167.out
 {code}
 Since logs from various daemons are aggregated in one log file, it would be 
 desirable to mark the end of one log before starting with the next.
 e.g. with such a line:
 {code}
 End of LogType: hbase-hbase-master-ip-172-31-34-167.log
 {code}





[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-02-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324607#comment-14324607
 ] 

Ted Yu commented on YARN-3025:
--

I talked to Jian He; he suggested adding a field to AllocateResponse so that 
ApplicationMasterProtocol#allocate() can be enhanced to return blacklisted 
nodes.

[~zjshen]:
What do you think ?

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt


 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}





[jira] [Updated] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-02-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-3025:
-
Attachment: yarn-3025-v3.txt

work in progress: need to add the PBImpl classes.

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt


 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}





[jira] [Updated] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-02-15 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-3025:
-
Attachment: yarn-3025-v2.txt

Patch v2 does what was proposed above.

The next step is to add a getter for blacklisted nodes to ApplicationMasterProtocol.

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt


 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}





[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-02-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320885#comment-14320885
 ] 

Ted Yu commented on YARN-3025:
--

Looking into ApplicationMasterService#allocate():
{code}
  Allocation allocation =
  this.rScheduler.allocate(appAttemptId, ask, release, 
  blacklistAdditions, blacklistRemovals);
{code}
Black list information can be retrieved from YarnScheduler.
How about adding the following API to YarnScheduler ?
{code}
List<String> getBlacklistedNodes(AllocationId);
{code}

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-3025-v1.txt


 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}





[jira] [Assigned] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-02-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned YARN-3025:


Assignee: Ted Yu

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu

 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}





[jira] [Updated] (YARN-2764) counters.LimitExceededException shouldn't abort AsyncDispatcher

2015-02-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-2764:
-
Labels: counters  (was: )

 counters.LimitExceededException shouldn't abort AsyncDispatcher
 ---

 Key: YARN-2764
 URL: https://issues.apache.org/jira/browse/YARN-2764
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Ted Yu
  Labels: counters

 I saw the following in container log:
 {code}
 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with 
 attemptattempt_1414221548789_0023_r_03_0
 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
 task_1414221548789_0023_r_03 Task Transitioned from RUNNING to SUCCEEDED
 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 24
 2014-10-25 10:28:55,053 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
 job_1414221548789_0023Job Transitioned from RUNNING to COMMITTING
 2014-10-25 10:28:55,054 INFO [CommitterEvent Processor #1] 
 org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
 the event EventType: JOB_COMMIT
 2014-10-25 10:28:55,177 FATAL [AsyncDispatcher event handler] 
 org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
 org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many 
 counters: 121 max=120
   at 
 org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101)
   at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1754)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1737)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1718)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1089)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2049)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2045)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 2014-10-25 10:28:55,185 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
 {code}
 Counter limit was exceeded when JobFinishedEvent was created.
 Better handling of LimitExceededException should be provided so that 
 AsyncDispatcher can continue functioning.
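
One way to picture the requested behavior is a dispatch loop that records a limit violation and moves on to the next event instead of exiting. The sketch below uses hypothetical stand-in classes (`TolerantDispatcherSketch` and its nested `LimitExceededException`); it is not Hadoop's `AsyncDispatcher` and does not claim to be the eventual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a dispatcher loop; not Hadoop's AsyncDispatcher.
final class TolerantDispatcherSketch {
    // Hypothetical stand-in for counters.LimitExceededException.
    static final class LimitExceededException extends RuntimeException {
        LimitExceededException(String message) {
            super(message);
        }
    }

    private final List<String> failures = new ArrayList<>();

    // Dispatches each event in order. A limit violation is recorded and the
    // loop continues with the next event; other exceptions still propagate.
    int dispatchAll(List<String> events, Consumer<String> handler) {
        int handled = 0;
        for (String event : events) {
            try {
                handler.accept(event);
                handled++;
            } catch (LimitExceededException e) {
                failures.add(event + ": " + e.getMessage());
            }
        }
        return handled;
    }

    List<String> failures() {
        return failures;
    }
}
```

Only the limit exception is caught here on purpose; whether other errors should also be survivable is a separate decision that this sketch does not take.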





[jira] [Resolved] (YARN-2650) TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk

2015-02-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved YARN-2650.
--
Resolution: Cannot Reproduce

 TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk
 --

 Key: YARN-2650
 URL: https://issues.apache.org/jira/browse/YARN-2650
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor
 Attachments: TestRMRestart.tar.gz


 I got the following failure running on Linux:
 {code}
   TestRMRestart.testRMRestartGetApplicationList:952
 rMAppManager.logApplicationSummary(
 isA(org.apache.hadoop.yarn.api.records.ApplicationId)
 );
 Wanted 3 times:
 - at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:952)
 But was 2 times:
 - at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:64)
 {code}





[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-02-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306155#comment-14306155
 ] 

Ted Yu commented on YARN-3025:
--

bq. then we can see the efficient way to persist them into the state store to 
overcome RM restarting

Sounds good.

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu

 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}





[jira] [Resolved] (YARN-2858) TestRMHA#testFailoverAndTransitions fails in trunk against Java 8

2015-01-31 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved YARN-2858.
--
Resolution: Cannot Reproduce

 TestRMHA#testFailoverAndTransitions fails in trunk against Java 8
 -

 Key: YARN-2858
 URL: https://issues.apache.org/jira/browse/YARN-2858
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor

 From https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/4/console :
 {code}
 Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.034 sec 
  FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMHA
 testFailoverAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)
   Time elapsed: 30.021 sec   ERROR!
 java.lang.Exception: test timed out after 3 milliseconds
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(SocketInputStream.java:129)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:698)
   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:641)
   at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1218)
   at 
 java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
   at 
 com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:240)
   at 
 com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
   at com.sun.jersey.api.client.Client.handle(Client.java:648)
   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
   at 
 com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:503)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.checkActiveRMWebServices(TestRMHA.java:157)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.checkActiveRMFunctionality(TestRMHA.java:142)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testFailoverAndTransitions(TestRMHA.java:211)
 {code}





[jira] [Resolved] (YARN-2871) TestRMRestart#testRMRestartGetApplicationList sometime fails in trunk

2015-01-31 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved YARN-2871.
--
Resolution: Cannot Reproduce

 TestRMRestart#testRMRestartGetApplicationList sometime fails in trunk
 -

 Key: YARN-2871
 URL: https://issues.apache.org/jira/browse/YARN-2871
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor

 From trunk build #746 (https://builds.apache.org/job/Hadoop-Yarn-trunk/746):
 {code}
 Failed tests:
   TestRMRestart.testRMRestartGetApplicationList:957
 rMAppManager.logApplicationSummary(
 isA(org.apache.hadoop.yarn.api.records.ApplicationId)
 );
 Wanted 3 times:
 - at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:957)
 But was 2 times:
 - at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:66)
 {code}





[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-01-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294467#comment-14294467
 ] 

Ted Yu commented on YARN-3025:
--

bq. If we want to make sure the blacklisted nodes is recoverable after RM 
crashing

The above is desirable.

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu

 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}





[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-01-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292145#comment-14292145
 ] 

Ted Yu commented on YARN-3025:
--

The persistence of blacklisted nodes doesn't have to be 1-to-1 with each 
heartbeat from AM.
RM can decide a proper interval.
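
A small sketch of what an interval-based policy could look like, assuming the RM is told about every heartbeat but writes to the state store at most once per configured interval. The class `ThrottledPersistSketch`, its fields, and the injected clock are all hypothetical.

```java
import java.util.function.LongSupplier;

// Hypothetical throttle used for illustration; not part of the RM state store.
final class ThrottledPersistSketch {
    private final long intervalMillis;
    private final LongSupplier clock;   // injected so the policy is testable
    private long lastPersistMillis;
    private boolean persistedOnce;
    private int persistCount;

    ThrottledPersistSketch(long intervalMillis, LongSupplier clock) {
        this.intervalMillis = intervalMillis;
        this.clock = clock;
    }

    // Called on every AM heartbeat. Returns true when this heartbeat
    // triggered a (simulated) write of the blacklist to the state store.
    synchronized boolean onHeartbeat() {
        long now = clock.getAsLong();
        if (!persistedOnce || now - lastPersistMillis >= intervalMillis) {
            lastPersistMillis = now;
            persistedOnce = true;
            persistCount++;
            return true;
        }
        return false;
    }

    synchronized int persistCount() {
        return persistCount;
    }
}
```

In production code the clock would typically be `System::currentTimeMillis`; the trade-off is that blacklist changes made since the last write can be lost if the RM crashes inside the interval.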

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu

 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}





[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-01-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292981#comment-14292981
 ] 

Ted Yu commented on YARN-3025:
--

Tsuyoshi's comment makes sense.

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu

 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}





[jira] [Updated] (YARN-3081) Potential indefinite wait in ContainerManagementProtocolProxy#addProxyToCache()

2015-01-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-3081:
-
Attachment: yarn-3081-001.patch

 Potential indefinite wait in 
 ContainerManagementProtocolProxy#addProxyToCache()
 ---

 Key: YARN-3081
 URL: https://issues.apache.org/jira/browse/YARN-3081
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: yarn-3081-001.patch


 {code}
   if (!removedProxy) {
 // all of the proxies are currently in use and already scheduled
 // for removal, so we need to wait until at least one of them closes
 try {
   this.wait();
 {code}
 The above code can wait for a condition that has already been satisfied, 
 leading to indefinite wait.
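
A common way to avoid this kind of hang is a guarded wait: the condition is re-checked in a loop while holding the monitor, so the thread never waits for something that is already true. The sketch below shows the pattern on a hypothetical bounded pool (`GuardedWaitSketch`); it is not the `ContainerManagementProtocolProxy` code or the attached patch.

```java
// Hypothetical bounded pool used to illustrate a guarded wait.
final class GuardedWaitSketch {
    private int available;

    GuardedWaitSketch(int permits) {
        this.available = permits;
    }

    // Waits only while no slot is free. Because the check happens under the
    // same monitor that release() uses, a release that already happened is
    // observed here and the method returns without waiting.
    synchronized void acquire() {
        while (available == 0) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        available--;
    }

    // Frees a slot and wakes any waiters so they can re-check the condition.
    synchronized void release() {
        available++;
        notifyAll();
    }

    synchronized int available() {
        return available;
    }
}
```

The `while` loop also covers spurious wakeups, which `Object.wait()` is documented to allow.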





[jira] [Created] (YARN-3081) Potential indefinite wait in ContainerManagementProtocolProxy#addProxyToCache()

2015-01-21 Thread Ted Yu (JIRA)
Ted Yu created YARN-3081:


 Summary: Potential indefinite wait in 
ContainerManagementProtocolProxy#addProxyToCache()
 Key: YARN-3081
 URL: https://issues.apache.org/jira/browse/YARN-3081
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor


{code}
  if (!removedProxy) {
// all of the proxies are currently in use and already scheduled
// for removal, so we need to wait until at least one of them closes
try {
  this.wait();
{code}
The above code can wait for a condition that has already been satisfied, 
leading to indefinite wait.





[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping

2015-01-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284205#comment-14284205
 ] 

Ted Yu commented on YARN-3003:
--

For message LabelsToNodeIdProto, should it be named LabelsToNodeIdsProto since 
the nodeId field is repeated?



 Provide API for client to retrieve label to node mapping
 

 Key: YARN-3003
 URL: https://issues.apache.org/jira/browse/YARN-3003
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena
 Attachments: YARN-3003.001.patch


 Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to set 
 of labels associated with the node.
 Client (such as Slider) may be interested in label to node mapping - given 
 label, return the nodes with this label.
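
The requested mapping is the inverse of the existing one. As a minimal sketch, assuming node IDs are represented as plain strings rather than YARN's `NodeId` type, the inversion could look like the following. `LabelsToNodesSketch` and `invert` are hypothetical names, not the API added by the patch.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper; node IDs are plain strings here for simplicity.
final class LabelsToNodesSketch {
    // Inverts a node -> labels mapping into a label -> nodes mapping.
    static Map<String, Set<String>> invert(Map<String, Set<String>> nodeToLabels) {
        Map<String, Set<String>> labelToNodes = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : nodeToLabels.entrySet()) {
            for (String label : entry.getValue()) {
                labelToNodes
                    .computeIfAbsent(label, k -> new TreeSet<>())
                    .add(entry.getKey());
            }
        }
        return labelToNodes;
    }
}
```

Sorted maps and sets are used only to make the output order deterministic; a client that does not care about ordering could use hash-based collections instead.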





[jira] [Created] (YARN-3072) Dependency on io.netty in hadoop-nfs pom.xml can be dropped

2015-01-19 Thread Ted Yu (JIRA)
Ted Yu created YARN-3072:


 Summary: Dependency on io.netty in hadoop-nfs pom.xml can be 
dropped
 Key: YARN-3072
 URL: https://issues.apache.org/jira/browse/YARN-3072
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor


hadoop-nfs pom.xml has compile time dependency on io.netty

This dependency can be dropped.





[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-01-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282006#comment-14282006
 ] 

Ted Yu commented on YARN-3025:
--

[~bikassaha]:
Can you provide your opinion ?

Thanks

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu

 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}





[jira] [Created] (YARN-3070) TestRMAdminCLI#testHelp fails for transitionToActive command

2015-01-17 Thread Ted Yu (JIRA)
Ted Yu created YARN-3070:


 Summary: TestRMAdminCLI#testHelp fails for transitionToActive 
command
 Key: YARN-3070
 URL: https://issues.apache.org/jira/browse/YARN-3070
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


{code}
  testError(new String[] { "-help", "-transitionToActive" },
  "Usage: yarn rmadmin [-transitionToActive <serviceId>" +
  " [--forceactive]]", dataErr, 0);
{code}
fails with:
{code}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.yarn.client.cli.TestRMAdminCLI.testError(TestRMAdminCLI.java:547)
at 
org.apache.hadoop.yarn.client.cli.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:335)
{code}






[jira] [Commented] (YARN-3070) TestRMAdminCLI#testHelp fails for transitionToActive command

2015-01-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281665#comment-14281665
 ] 

Ted Yu commented on YARN-3070:
--

Thanks Junping for taking care of this.

 TestRMAdminCLI#testHelp fails for transitionToActive command
 

 Key: YARN-3070
 URL: https://issues.apache.org/jira/browse/YARN-3070
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Junping Du
Priority: Minor
 Attachments: YARN-3070.patch


 {code}
   testError(new String[] { "-help", "-transitionToActive" },
       "Usage: yarn rmadmin [-transitionToActive <serviceId>" +
       " [--forceactive]]", dataErr, 0);
 {code}
 fails with:
 {code}
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.cli.TestRMAdminCLI.testError(TestRMAdminCLI.java:547)
   at 
 org.apache.hadoop.yarn.client.cli.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:335)
 {code}





[jira] [Created] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-01-09 Thread Ted Yu (JIRA)
Ted Yu created YARN-3025:


 Summary: Provide API for retrieving blacklisted nodes
 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu


We have the following method which updates blacklist:
{code}
  public synchronized void updateBlacklist(List<String> blacklistAdditions,
      List<String> blacklistRemovals) {
{code}
Upon AM failover, there should be an API which returns the blacklisted nodes so 
that the new AM can make consistent decisions.
The new API can be:
{code}
  public synchronized List<String> getBlacklistedNodes()
{code}
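A minimal sketch of how such a getter could sit next to the updater, assuming the blacklist is kept in an in-memory set. The class name BlacklistTracker, its field, and the order in which additions and removals are applied are assumptions made for illustration, not taken from YARN's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical holder class, for illustration only; not YARN's implementation.
class BlacklistTracker {
  private final Set<String> blacklist = new LinkedHashSet<>();

  // Assumed semantics: apply additions first, then removals; null means "none".
  public synchronized void updateBlacklist(List<String> blacklistAdditions,
      List<String> blacklistRemovals) {
    if (blacklistAdditions != null) {
      blacklist.addAll(blacklistAdditions);
    }
    if (blacklistRemovals != null) {
      blacklist.removeAll(blacklistRemovals);
    }
  }

  // Proposed getter: returns a copy so callers cannot mutate internal state.
  public synchronized List<String> getBlacklistedNodes() {
    return new ArrayList<>(blacklist);
  }
}
```

Returning a copy under the same lock as the updater gives a new AM a consistent snapshot of the blacklist.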





[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-01-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271727#comment-14271727
 ] 

Ted Yu commented on YARN-3025:
--

bq. RM probably does not persist this information
Looks like RM should persist blacklisted nodes to ride over RM restart.

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu

 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
       List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}





[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG

2015-01-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267100#comment-14267100
 ] 

Ted Yu commented on YARN-2213:
--

lgtm

 Change proxy-user cookie log in AmIpFilter to DEBUG
 ---

 Key: YARN-2213
 URL: https://issues.apache.org/jira/browse/YARN-2213
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Ted Yu
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-2213.001.patch


 I saw a lot of the following lines in AppMaster log:
 {code}
 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user 
 cookie, so user will not be set
 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user 
 cookie, so user will not be set
 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user 
 cookie, so user will not be set
 {code}
 For a long-running app, this would consume considerable log space.
 Log level should be changed to DEBUG.
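A rough sketch of the kind of change being asked for, using java.util.logging so that it is self-contained; FINE stands in for DEBUG here, and the class name ProxyUserCookieCheck is hypothetical. The real filter uses its own logging framework.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the filter; java.util.logging's FINE plays the role of DEBUG.
class ProxyUserCookieCheck {
  private static final Logger LOG = Logger.getLogger("ProxyUserCookieCheck");

  // Logs the missing-cookie message at FINE, so it is dropped at the default INFO level.
  static void onMissingCookie() {
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("Could not find proxy-user cookie, so user will not be set");
    }
  }
}
```

With the default level the message is suppressed; it reappears once the logger is set to FINE for troubleshooting.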





[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping

2015-01-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266505#comment-14266505
 ] 

Ted Yu commented on YARN-3003:
--

+1 to the API Wangda described.

 Provide API for client to retrieve label to node mapping
 

 Key: YARN-3003
 URL: https://issues.apache.org/jira/browse/YARN-3003
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena

 Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to set 
 of labels associated with the node.
 Client (such as Slider) may be interested in label to node mapping - given 
 label, return the nodes with this label.





[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping

2015-01-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263262#comment-14263262
 ] 

Ted Yu commented on YARN-3003:
--

Thanks for taking this, Varun.
What do you think of the following API:
{code}
  public abstract Map<String, Set<NodeId>> getNodeToLabels(List<String> labels)
{code}
If labels parameter is null or empty, all mappings would be returned.
Otherwise only mappings for selected labels would be returned.
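To illustrate the semantics described above, here is a small sketch that inverts a node-to-labels map into a label-to-nodes map. The helper name LabelIndex is hypothetical, and node ids are plain strings rather than NodeId so that the example stands alone.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; node ids are plain strings here instead of YARN's NodeId.
class LabelIndex {
  // Inverts node -> labels into label -> nodes, optionally restricted to some labels.
  static Map<String, Set<String>> labelsToNodes(
      Map<String, Set<String>> nodeToLabels, Collection<String> labels) {
    // A null or empty filter means "return all mappings".
    boolean all = labels == null || labels.isEmpty();
    Map<String, Set<String>> result = new HashMap<>();
    for (Map.Entry<String, Set<String>> e : nodeToLabels.entrySet()) {
      for (String label : e.getValue()) {
        if (all || labels.contains(label)) {
          result.computeIfAbsent(label, k -> new HashSet<>()).add(e.getKey());
        }
      }
    }
    return result;
  }
}
```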

 Provide API for client to retrieve label to node mapping
 

 Key: YARN-3003
 URL: https://issues.apache.org/jira/browse/YARN-3003
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Varun Saxena
Priority: Minor

 Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to set 
 of labels associated with the node.
 Client (such as Slider) may be interested in label to node mapping - given 
 label, return the nodes with this label.





[jira] [Updated] (YARN-2777) Mark the end of individual log in aggregated log

2015-01-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-2777:
-
Labels: log-aggregation  (was: )

 Mark the end of individual log in aggregated log
 

 Key: YARN-2777
 URL: https://issues.apache.org/jira/browse/YARN-2777
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
  Labels: log-aggregation

 Below is a snippet of the aggregated log showing the hbase master log:
 {code}
 LogType: hbase-hbase-master-ip-172-31-34-167.log
 LogUploadTime: 29-Oct-2014 22:31:55
 LogLength: 24103045
 Log Contents:
 Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
 ...
   at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
   at java.lang.Thread.run(Thread.java:745)
 LogType: hbase-hbase-master-ip-172-31-34-167.out
 {code}
 Since logs from various daemons are aggregated in one log file, it would be 
 desirable to mark the end of one log before starting with the next.
 e.g. with such a line:
 {code}
 End of LogType: hbase-hbase-master-ip-172-31-34-167.log
 {code}
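A sketch of what emitting such a marker could look like. The writer class below is hypothetical and much simpler than the real log aggregation code; it only mirrors the field names from the snippet above, and LogLength is a character count as a simplification.

```java
import java.io.PrintWriter;

// Hypothetical writer, for illustration; field names follow the snippet above.
class AggregatedLogWriter {
  // Writes one log's header and contents, then an explicit end marker.
  static void writeLog(PrintWriter out, String logType, String contents) {
    out.print("LogType: " + logType + "\n");
    // Simplification: character count rather than the byte length on disk.
    out.print("LogLength: " + contents.length() + "\n");
    out.print("Log Contents:\n");
    out.print(contents);
    if (!contents.endsWith("\n")) {
      out.print("\n");
    }
    out.print("End of LogType: " + logType + "\n");
  }
}
```

A reader scanning the aggregated file can then tell where one daemon's log ends without relying on the next LogType header.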





[jira] [Commented] (YARN-2764) counters.LimitExceededException shouldn't abort AsyncDispatcher

2015-01-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14262748#comment-14262748
 ] 

Ted Yu commented on YARN-2764:
--

Comments on this issue are appreciated.

 counters.LimitExceededException shouldn't abort AsyncDispatcher
 ---

 Key: YARN-2764
 URL: https://issues.apache.org/jira/browse/YARN-2764
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Ted Yu

 I saw the following in container log:
 {code}
 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with 
 attemptattempt_1414221548789_0023_r_03_0
 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
 task_1414221548789_0023_r_03 Task Transitioned from RUNNING to SUCCEEDED
 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 24
 2014-10-25 10:28:55,053 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
 job_1414221548789_0023Job Transitioned from RUNNING to COMMITTING
 2014-10-25 10:28:55,054 INFO [CommitterEvent Processor #1] 
 org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
 the event EventType: JOB_COMMIT
 2014-10-25 10:28:55,177 FATAL [AsyncDispatcher event handler] 
 org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
 org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many 
 counters: 121 max=120
   at 
 org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101)
   at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1754)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1737)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1718)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1089)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2049)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2045)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 2014-10-25 10:28:55,185 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
 {code}
 Counter limit was exceeded when JobFinishedEvent was created.
 Better handling of LimitExceededException should be provided so that 
 AsyncDispatcher can continue functioning.
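One possible shape for this handling, sketched with stand-in classes: CounterLimitException and TolerantDispatcher are hypothetical names, not the Hadoop classes. The idea is that a counter-limit failure while handling one event is recorded and the loop moves on to the next event.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the counter limit exception.
class CounterLimitException extends RuntimeException {
  CounterLimitException(String msg) {
    super(msg);
  }
}

// Hypothetical dispatcher that survives a counter-limit failure in one event.
class TolerantDispatcher {
  private final List<String> errors = new ArrayList<>();

  // Dispatches every event; a counter-limit failure is recorded, not fatal.
  void dispatchAll(List<Runnable> events) {
    for (Runnable event : events) {
      try {
        event.run();
      } catch (CounterLimitException e) {
        errors.add(e.getMessage());
      }
    }
  }

  List<String> getErrors() {
    return errors;
  }
}
```

Whether the job should still be marked successful when its final counters cannot be built is a separate decision that this sketch leaves open.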





[jira] [Updated] (YARN-2988) Graph#save() may leak resource

2014-12-24 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-2988:
-
Attachment: YARN-2988-002.patch

How about this patch?

 Graph#save() may leak resource
 --

 Key: YARN-2988
 URL: https://issues.apache.org/jira/browse/YARN-2988
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: YARN-2988-001.patch, YARN-2988-002.patch


 In 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
  :
 {code}
   public void save(String filepath) throws IOException {
     OutputStreamWriter fout = new OutputStreamWriter(
         new FileOutputStream(filepath), Charset.forName("UTF-8"));
     fout.write(generateGraphViz());
     fout.close();
 {code}
 The close of fout should be enclosed in a finally clause.





[jira] [Assigned] (YARN-2988) Graph#save() may leak file descriptors

2014-12-24 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned YARN-2988:


Assignee: Ted Yu  (was: Tsuyoshi OZAWA)

 Graph#save() may leak file descriptors
 --

 Key: YARN-2988
 URL: https://issues.apache.org/jira/browse/YARN-2988
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2988-001.patch, YARN-2988-002.patch


 In 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
  :
 {code}
   public void save(String filepath) throws IOException {
     OutputStreamWriter fout = new OutputStreamWriter(
         new FileOutputStream(filepath), Charset.forName("UTF-8"));
     fout.write(generateGraphViz());
     fout.close();
 {code}
 The close of fout should be enclosed in a finally clause.





[jira] [Created] (YARN-2988) Graph#save() may leak resource

2014-12-23 Thread Ted Yu (JIRA)
Ted Yu created YARN-2988:


 Summary: Graph#save() may leak resource
 Key: YARN-2988
 URL: https://issues.apache.org/jira/browse/YARN-2988
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor


In 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
 :
{code}
  public void save(String filepath) throws IOException {
    OutputStreamWriter fout = new OutputStreamWriter(
        new FileOutputStream(filepath), Charset.forName("UTF-8"));
    fout.write(generateGraphViz());
    fout.close();
{code}
The close of fout should be enclosed in a finally clause.
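One way to get that guarantee, assuming Java 7 or later, is try-with-resources, which closes the writer even when write() throws. The class below is a hypothetical stand-in for Graph with a stubbed generateGraphViz(); the attached patch may take a different route, such as an explicit finally block.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Graph; generateGraphViz() is stubbed for illustration.
class GraphSketch {
  String generateGraphViz() {
    return "digraph G {}\n";
  }

  // try-with-resources closes the writer (and the underlying stream) on every path.
  public void save(String filepath) throws IOException {
    try (Writer fout = new OutputStreamWriter(
        new FileOutputStream(filepath), StandardCharsets.UTF_8)) {
      fout.write(generateGraphViz());
    }
  }
}
```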





[jira] [Updated] (YARN-2988) Graph#save() may leak resource

2014-12-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-2988:
-
Attachment: YARN-2988-001.patch

 Graph#save() may leak resource
 --

 Key: YARN-2988
 URL: https://issues.apache.org/jira/browse/YARN-2988
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: YARN-2988-001.patch


 In 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
  :
 {code}
   public void save(String filepath) throws IOException {
     OutputStreamWriter fout = new OutputStreamWriter(
         new FileOutputStream(filepath), Charset.forName("UTF-8"));
     fout.write(generateGraphViz());
     fout.close();
 {code}
 The close of fout should be enclosed in a finally clause.





[jira] [Created] (YARN-2930) TestRMRestart#testRMRestartRecoveringNodeLabelManager sometimes fails against Java 8

2014-12-07 Thread Ted Yu (JIRA)
Ted Yu created YARN-2930:


 Summary: TestRMRestart#testRMRestartRecoveringNodeLabelManager 
sometimes fails against Java 8
 Key: YARN-2930
 URL: https://issues.apache.org/jira/browse/YARN-2930
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/31/console :
{code}
testRMRestartRecoveringNodeLabelManager[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
  Time elapsed: 0.136 sec   FAILURE!
java.lang.AssertionError: expected:<1> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartRecoveringNodeLabelManager(TestRMRestart.java:2100)

testRMRestartRecoveringNodeLabelManager[1](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
  Time elapsed: 0.081 sec   FAILURE!
java.lang.AssertionError: expected:<1> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartRecoveringNodeLabelManager(TestRMRestart.java:2100)
{code}





[jira] [Commented] (YARN-2914) Potential race condition in SharedCacheUploaderMetrics/CleanerMetrics/ClientSCMMetrics#getInstance()

2014-12-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234734#comment-14234734
 ] 

Ted Yu commented on YARN-2914:
--

lgtm

I triggered a QA run manually.

 Potential race condition in 
 SharedCacheUploaderMetrics/CleanerMetrics/ClientSCMMetrics#getInstance()
 

 Key: YARN-2914
 URL: https://issues.apache.org/jira/browse/YARN-2914
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ted Yu
Assignee: Varun Saxena
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2914.002.patch, YARN-2914.patch


 {code}
   public static ClientSCMMetrics getInstance() {
 ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl;
 if (topMetrics == null) {
   throw new IllegalStateException(
 {code}
 getInstance() doesn't hold lock on Singleton.this
 This may result in IllegalStateException being thrown prematurely.
 [~ctrezzo] reported that SharedCacheUploaderMetrics also has the same kind of 
 race condition.





[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node

2014-12-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232512#comment-14232512
 ] 

Ted Yu commented on YARN-2604:
--

Should Fix Version be 2.7.0?

 Scheduler should consider max-allocation-* in conjunction with the largest 
 node
 ---

 Key: YARN-2604
 URL: https://issues.apache.org/jira/browse/YARN-2604
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, 
 YARN-2604.patch, YARN-2604.patch, YARN-2604.patch


 If the scheduler max-allocation-* values are larger than the resources 
 available on the largest node in the cluster, an application requesting 
 resources between the two values will be accepted by the scheduler but the 
 requests will never be satisfied. The app essentially hangs forever. 
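A simplified sketch of the idea, modeling only memory in MB: the effective maximum is the configured maximum capped by the largest node's capacity, so an oversized request can be rejected up front instead of waiting forever. The class and method names are hypothetical, and the fallback when no nodes are registered is an assumption.

```java
// Hypothetical helper; memory is in MB and only one resource dimension is modeled.
class MaxAllocationSketch {
  // Effective cap: the configured maximum, limited by the largest registered node.
  static int effectiveMaxMb(int configuredMaxMb, int[] nodeCapacitiesMb) {
    int largest = 0;
    for (int capacity : nodeCapacitiesMb) {
      largest = Math.max(largest, capacity);
    }
    // Assumption: with no nodes registered yet, fall back to the configured maximum.
    return largest == 0 ? configuredMaxMb : Math.min(configuredMaxMb, largest);
  }

  // A request above the effective cap can never be placed on any node.
  static boolean isSatisfiable(int requestMb, int configuredMaxMb,
      int[] nodeCapacitiesMb) {
    return requestMb <= effectiveMaxMb(configuredMaxMb, nodeCapacitiesMb);
  }
}
```

For example, with a configured maximum of 16384 MB and nodes of 4096 MB and 8192 MB, a 12288 MB request passes the configured check but is not satisfiable under the effective cap of 8192 MB.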





[jira] [Created] (YARN-2914) Potential race condition in ClientSCMMetrics#getInstance()

2014-12-01 Thread Ted Yu (JIRA)
Ted Yu created YARN-2914:


 Summary: Potential race condition in ClientSCMMetrics#getInstance()
 Key: YARN-2914
 URL: https://issues.apache.org/jira/browse/YARN-2914
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
  public static ClientSCMMetrics getInstance() {
ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl;
if (topMetrics == null) {
  throw new IllegalStateException(
{code}
getInstance() doesn't hold lock on Singleton.this
This may result in IllegalStateException being thrown prematurely.
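A sketch of one way to close the gap: read impl under the same lock that the initializer holds, so a reader never misses a completed write. MetricsSketch and its method names are hypothetical; the real classes also register with the metrics system, which is omitted here.

```java
// Hypothetical sketch of the pattern; the real classes register with a metrics system.
class MetricsSketch {
  private enum Singleton {
    INSTANCE;

    private MetricsSketch impl;

    // Creates the instance once, under the enum constant's monitor.
    synchronized MetricsSketch init() {
      if (impl == null) {
        impl = new MetricsSketch();
      }
      return impl;
    }

    // Reading under the same lock as init() makes the write visible to readers.
    synchronized MetricsSketch get() {
      return impl;
    }
  }

  static MetricsSketch initSingleton() {
    return Singleton.INSTANCE.init();
  }

  static MetricsSketch getInstance() {
    MetricsSketch topMetrics = Singleton.INSTANCE.get();
    if (topMetrics == null) {
      throw new IllegalStateException("metrics have not been initialized");
    }
    return topMetrics;
  }
}
```

Declaring impl volatile would be an alternative for visibility alone; the synchronized accessor is used here because it matches the lock the issue refers to.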





[jira] [Updated] (YARN-2871) TestRMRestart#testRMRestartGetApplicationList sometime fails in trunk

2014-11-17 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-2871:
-
Description: 
From trunk build #746 (https://builds.apache.org/job/Hadoop-Yarn-trunk/746):
{code}
Failed tests:
  TestRMRestart.testRMRestartGetApplicationList:957
rMAppManager.logApplicationSummary(
isA(org.apache.hadoop.yarn.api.records.ApplicationId)
);
Wanted 3 times:
- at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:957)
But was 2 times:
- at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:66)
{code}

  was:
From trunk build #746:
{code}
Failed tests:
  TestRMRestart.testRMRestartGetApplicationList:957
rMAppManager.logApplicationSummary(
isA(org.apache.hadoop.yarn.api.records.ApplicationId)
);
Wanted 3 times:
- at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:957)
But was 2 times:
- at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:66)
{code}


 TestRMRestart#testRMRestartGetApplicationList sometime fails in trunk
 -

 Key: YARN-2871
 URL: https://issues.apache.org/jira/browse/YARN-2871
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor

 From trunk build #746 (https://builds.apache.org/job/Hadoop-Yarn-trunk/746):
 {code}
 Failed tests:
   TestRMRestart.testRMRestartGetApplicationList:957
 rMAppManager.logApplicationSummary(
 isA(org.apache.hadoop.yarn.api.records.ApplicationId)
 );
 Wanted 3 times:
 - at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:957)
 But was 2 times:
 - at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:66)
 {code}





[jira] [Created] (YARN-2871) TestRMRestart#testRMRestartGetApplicationList sometime fails in trunk

2014-11-17 Thread Ted Yu (JIRA)
Ted Yu created YARN-2871:


 Summary: TestRMRestart#testRMRestartGetApplicationList sometime 
fails in trunk
 Key: YARN-2871
 URL: https://issues.apache.org/jira/browse/YARN-2871
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From trunk build #746:
{code}
Failed tests:
  TestRMRestart.testRMRestartGetApplicationList:957
rMAppManager.logApplicationSummary(
isA(org.apache.hadoop.yarn.api.records.ApplicationId)
);
Wanted 3 times:
- at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:957)
But was 2 times:
- at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:66)
{code}





[jira] [Created] (YARN-2864) TestRMWebServicesAppsModification fails in trunk

2014-11-14 Thread Ted Yu (JIRA)
Ted Yu created YARN-2864:


 Summary: TestRMWebServicesAppsModification fails in trunk
 Key: YARN-2864
 URL: https://issues.apache.org/jira/browse/YARN-2864
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/5/console :
{code}
Tests run: 32, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 151.14 sec  
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
testGetNewApplicationAndSubmit[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
  Time elapsed: 0.276 sec   ERROR!
java.lang.NoClassDefFoundError: org/apache/hadoop/io/FastByteComparisons
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at 
org.apache.hadoop.io.WritableComparator.compareBytes(WritableComparator.java:187)
at 
org.apache.hadoop.io.BinaryComparable.compareTo(BinaryComparable.java:50)
at 
org.apache.hadoop.io.BinaryComparable.equals(BinaryComparable.java:72)
at org.apache.hadoop.io.Text.equals(Text.java:348)
at java.util.ArrayList.indexOf(ArrayList.java:216)
at java.util.ArrayList.contains(ArrayList.java:199)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testAppSubmit(TestRMWebServicesAppsModification.java:844)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testGetNewApplicationAndSubmit(TestRMWebServicesAppsModification.java:726)

testGetNewApplicationAndSubmit[3](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
  Time elapsed: 0.225 sec   ERROR!
java.lang.NoClassDefFoundError: org/apache/hadoop/io/FastByteComparisons
at 
org.apache.hadoop.io.WritableComparator.compareBytes(WritableComparator.java:187)
at 
org.apache.hadoop.io.BinaryComparable.compareTo(BinaryComparable.java:50)
at 
org.apache.hadoop.io.BinaryComparable.equals(BinaryComparable.java:72)
at org.apache.hadoop.io.Text.equals(Text.java:348)
at java.util.ArrayList.indexOf(ArrayList.java:216)
at java.util.ArrayList.contains(ArrayList.java:199)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testAppSubmit(TestRMWebServicesAppsModification.java:844)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testGetNewApplicationAndSubmit(TestRMWebServicesAppsModification.java:726)
{code}
Running on a MacBook with Java 1.7.0_60, I got:
{code}
Running 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
Tests run: 32, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 146.749 sec 
 FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
testGetNewApplicationAndSubmit[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
  Time elapsed: 0.185 sec   FAILURE!
java.lang.AssertionError: expected:<Accepted> but was:<Internal Server Error>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testAppSubmit(TestRMWebServicesAppsModification.java:799)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testGetNewApplicationAndSubmit(TestRMWebServicesAppsModification.java:726)
{code}





[jira] [Created] (YARN-2858) TestRMHA#testFailoverAndTransitions fails in trunk against Java 8

2014-11-13 Thread Ted Yu (JIRA)
Ted Yu created YARN-2858:


 Summary: TestRMHA#testFailoverAndTransitions fails in trunk 
against Java 8
 Key: YARN-2858
 URL: https://issues.apache.org/jira/browse/YARN-2858
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/4/console :
{code}
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.034 sec  
FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMHA
testFailoverAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)
  Time elapsed: 30.021 sec   ERROR!
java.lang.Exception: test timed out after 30000 milliseconds
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:698)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:641)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1218)
at 
java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:240)
at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
at com.sun.jersey.api.client.Client.handle(Client.java:648)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at 
com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:503)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.checkActiveRMWebServices(TestRMHA.java:157)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.checkActiveRMFunctionality(TestRMHA.java:142)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testFailoverAndTransitions(TestRMHA.java:211)
{code}





[jira] [Created] (YARN-2842) TestApplicationClientProtocolOnHA fails against Java 8

2014-11-10 Thread Ted Yu (JIRA)
Ted Yu created YARN-2842:


 Summary: TestApplicationClientProtocolOnHA fails against Java 8
 Key: YARN-2842
 URL: https://issues.apache.org/jira/browse/YARN-2842
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/1/consoleFull :
{code}
testGetNewApplicationOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA)  Time elapsed: 8.959 sec   ERROR!
java.net.ConnectException: Call From asf908.gq1.ygridcore.net/67.195.81.152 to asf908.gq1.ygridcore.net:28032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy17.getNewApplication(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:217)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
    at com.sun.proxy.$Proxy18.getNewApplication(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:206)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:214)
    at org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetNewApplicationOnHA(TestApplicationClientProtocolOnHA.java:76)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2842) TestApplicationClientProtocolOnHA fails against Java 8

2014-11-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved YARN-2842.
--
Resolution: Duplicate

Should have searched :-)





