Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Yes, sure. I'll do another RC next week. Thank you all for working on this!

On Thu, Jul 9, 2020 at 8:20 AM Masatake Iwasaki wrote:
>
> Hi Gabor Bota,
>
> I committed the fix of YARN-10347 to branch-3.1.
> I think this should be a blocker for 3.1.4.
> Could you cherry-pick it to branch-3.1.4 and cut a new RC?
>
> Thanks,
> Masatake Iwasaki
Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Hi Gabor Bota,

I committed the fix of YARN-10347 to branch-3.1.
I think this should be a blocker for 3.1.4.
Could you cherry-pick it to branch-3.1.4 and cut a new RC?

Thanks,
Masatake Iwasaki

On 2020/07/08 23:31, Masatake Iwasaki wrote:
> Thanks Steve and Prabhu for the information.
>
> The cause turned out to be locking in CapacityScheduler#reinitialize.
> I think the method is called after transitioning to active state if
> RM-HA is enabled.
>
> I filed YARN-10347 and created a PR.
>
> Masatake Iwasaki
Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Thanks Steve and Prabhu for the information.

The cause turned out to be locking in CapacityScheduler#reinitialize.
I think the method is called after transitioning to active state if RM-HA is enabled.

I filed YARN-10347 and created a PR.

Masatake Iwasaki

On 2020/07/08 16:33, Prabhu Joseph wrote:
> Hi Masatake,
>
> The thread is waiting for a ReadLock; we need to check what the other
> thread holding the WriteLock is blocked on.
> Can you get three consecutive complete jstacks of the ResourceManager
> during the issue?
>
> > I got no issue if RM-HA is disabled.
> Looks like RM is not able to access the Zookeeper state store. Can you
> check if there is any connectivity issue between RM and Zookeeper.
>
> Thanks,
> Prabhu Joseph
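The failure mode described above — a writer holding the scheduler's ReentrantReadWriteLock indefinitely, so the submitApplication handler parks forever on the read lock — can be reproduced with a minimal self-contained sketch. This is illustrative model code only, not the actual Hadoop classes; the class and method names here are invented.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal model: reinitializeAndStall() plays the role of a
// CapacityScheduler#reinitialize that takes the write lock and never
// releases it; tryReadLocked() plays the role of the read-lock
// acquisition in checkAndGetApplicationPriority, but uses tryLock with
// a timeout so we can observe the blockage instead of hanging.
public class LockModel {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Writer grabs the write lock and then stalls, holding it forever.
    void reinitializeAndStall() {
        lock.writeLock().lock();
        try {
            Thread.sleep(Long.MAX_VALUE); // simulates the stuck writer
        } catch (InterruptedException ignored) {
        }
    }

    // Returns true if the read lock could be acquired within 200 ms.
    boolean tryReadLocked() throws InterruptedException {
        if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
            lock.readLock().unlock();
            return true;
        }
        return false; // a writer holds the lock: readers are blocked
    }

    public static void main(String[] args) throws Exception {
        LockModel m = new LockModel();
        System.out.println("before stall: " + m.tryReadLocked()); // true
        Thread writer = new Thread(m::reinitializeAndStall);
        writer.setDaemon(true);
        writer.start();
        writer.join(500); // give the writer time to grab the lock
        System.out.println("after stall: " + m.tryReadLocked()); // false
    }
}
```

This also matches the jstack above: the blocked handler thread parks on a ReentrantReadWriteLock$NonfairSync in `doAcquireShared`, which is exactly what a reader waiting behind a held write lock looks like.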
Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Hi Masatake,

The thread is waiting for a ReadLock; we need to check what the other thread holding the WriteLock is blocked on.
Can you get three consecutive complete jstacks of the ResourceManager during the issue?

>> I got no issue if RM-HA is disabled.
Looks like RM is not able to access the Zookeeper state store. Can you check if there is any connectivity issue between RM and Zookeeper.

Thanks,
Prabhu Joseph

On Mon, Jul 6, 2020 at 2:44 AM Masatake Iwasaki wrote:

> Thanks for putting this up, Gabor Bota.
>
> I'm testing the RC2 on a 3-node docker cluster with NN-HA and RM-HA enabled.
> ResourceManager reproducibly blocks on submitApplication while launching
> example MR jobs.
> Does anyone run into the same issue?
>
> The same configuration worked for 3.1.3.
> I got no issue if RM-HA is disabled.
>
> Masatake Iwasaki
Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Hmm, YARN-9341 went through all of the YARN lock code; it's in 3.3 but not in 3.1. And we do not want to attempt to backport 175KB of lock acquire/release code, do we?

Anyone in yarn-dev got any thoughts here?

On Sun, 5 Jul 2020 at 22:14, Masatake Iwasaki wrote:

> Thanks for putting this up, Gabor Bota.
>
> I'm testing the RC2 on a 3-node docker cluster with NN-HA and RM-HA enabled.
> ResourceManager reproducibly blocks on submitApplication while launching
> example MR jobs.
> Does anyone run into the same issue?
>
> The same configuration worked for 3.1.3.
> I got no issue if RM-HA is disabled.
>
> Masatake Iwasaki
Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Thanks for putting this up, Gabor Bota.

I'm testing the RC2 on a 3-node docker cluster with NN-HA and RM-HA enabled.
ResourceManager reproducibly blocks on submitApplication while launching example MR jobs.
Does anyone run into the same issue?

The same configuration worked for 3.1.3.
I got no issue if RM-HA is disabled.

"IPC Server handler 1 on default port 8032" #167 daemon prio=5 os_prio=0 tid=0x7fe91821ec50 nid=0x3b9 waiting on condition [0x7fe901bac000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x85d37a40> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342)
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943)

Masatake Iwasaki

On 2020/06/26 22:51, Gabor Bota wrote:
> Hi folks,
>
> I have put together a release candidate (RC2) for Hadoop 3.1.4.
>
> The RC is available at: http://people.apache.org/~gabota/hadoop-3.1.4-RC2/
> The RC tag in git is here:
> https://github.com/apache/hadoop/releases/tag/release-3.1.4-RC2
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1269/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> and http://keys.gnupg.net/pks/lookup?op=get&search=0xB86249D83539B38C
>
> Please try the release and vote. The vote will run for 5 weekdays,
> until July 6, 2020, 23:00 CET.
>
> The release includes the revert of HDFS-14941, as it caused HDFS-15421,
> "IBR leak causes standby NN to be stuck in safe mode"
> (https://issues.apache.org/jira/browse/HDFS-15421).
> The release includes HDFS-15323, as requested
> (https://issues.apache.org/jira/browse/HDFS-15323).
>
> Thanks,
> Gabor
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
+1, with the instruction "warn everyone about the guava update possibly breaking things at run time".

The key issues:

* Code compiled with the new guava release will not link against the older releases, even without any changes in the source files.
* This includes hadoop-common.

Applications which exclude the guava dependency published by hadoop- artifacts to use their own must set guava.version=27.0-jre or guava.version=27.0 to be consistent with that of this release.

My tests were all done using the artifacts downstream via maven; I trust others to look at the big tarball release.

*Project 1: cloudstore*

This is my extra diagnostics and cloud utils module:
https://github.com/steveloughran/cloudstore

All compiled fine, but the tests failed on guava linkage:

testNoOverwriteDest(org.apache.hadoop.tools.cloudup.TestLocalCloudup)
Time elapsed: 0.012 sec <<< ERROR!
java.lang.NoSuchMethodError: 'void com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, java.lang.Object, java.lang.Object)'
        at org.apache.hadoop.fs.tools.cloudup.Cloudup.run(Cloudup.java:177)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hadoop.tools.store.StoreTestUtils.exec(StoreTestUtils.java:4

Note: that app is designed to run against hadoop branch-2 and other branches, so I ended up reimplementing the checkArgument and checkState calls so that I can have a binary which links everywhere. My code, nothing serious.

*Project 2: Spark*

Apache Spark main branch built with maven (not tried the SBT build):

mvn -T 1 -Phadoop-3.2 -Dhadoop.version=3.1.4 -Psnapshots-and-staging -Phadoop-cloud,yarn,kinesis-asl -DskipTests clean package

All good.

Then I ran the committer unit test suite:

mvn -T 1 -Phadoop-3.2 -Dhadoop.version=3.1.4 -Phadoop-cloud,yarn,kinesis-asl -Psnapshots-and-staging --pl hadoop-cloud test

CommitterBindingSuite:
*** RUN ABORTED ***
java.lang.NoSuchMethodError: 'void com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, java.lang.Object)'
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
        at org.apache.spark.internal.io.cloud.CommitterBindingSuite.newJob(CommitterBindingSuite.scala:89)
        at org.apache.spark.internal.io.cloud.CommitterBindingSuite.$anonfun$new$1(CommitterBindingSuite.scala:55)
        at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
        at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
        at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
        at org.scalatest.Transformer.apply(Transformer.scala:22)
        at org.scalatest.Transformer.apply(Transformer.scala:20)
        at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
        ...

Fix: again, tell the build this is a later version of Guava:

mvn -T 1 -Phadoop-3.2 -Dhadoop.version=3.1.4 -Phadoop-cloud,yarn,kinesis-asl -Psnapshots-and-staging --pl hadoop-cloud -Dguava.version=27.0-jre test

The mismatch doesn't break Spark internally (they shade their stuff anyway); the guava.version here is actually the one which hadoop is to be linked with.

Outcome: tests work.

[INFO] --- scalatest-maven-plugin:2.0.0:test (test) @ spark-hadoop-cloud_2.12 ---
Discovery starting.
Discovery completed in 438 milliseconds.
Run starting. Expected test count is: 4
CommitterBindingSuite:
- BindingParquetOutputCommitter binds to the inner committer
- committer protocol can be serialized and deserialized
- local filesystem instantiation
- reject dynamic partitioning
Run completed in 1 second, 411 milliseconds.
Total number of tests run: 4
Suites: completed 2, aborted 0
Tests: succeeded 4, failed 0, canceled 0, ignored 0, pending 0

This is a real PITA, and it's invariably those checkArgument calls, because the later guava versions added some overloaded methods. Compile existing source with a later guava version and the .class no longer binds to the older guava version, even though no new guava APIs have been adopted.

I am really tempted to go through src/**/*.java and replace all Guava checkArgument/checkState with our own implementation in hadoop.common, at least for any which uses the vararg variant. But it'd be a big change and there may be related issues elsewhere. At least now things fail fast.

*Project 3: spark cloud integration*

https://github.com/hortonworks-spark/cloud-integration

This is where the functional tests for the s3a committer through spark live:

-Dhadoop.version=3.1.2 -Dspark.version=3.1.0-SNAPSHOT -Psnapshots-and-staging

and a full test run:

mvn test -Dcloud.test.configuration.file=../test-configs/s3a.xml --pl cloud-examples -Dhadoop.version=3.1.2 -Dspark.version=3.1.0-SNAPSHOT -Psnapshots-and-staging

All good. A couple of test failures, but that was because one of my test datasets is not on any bucket I have... will have to fix that.

To conclude: the artefacts are all there, existing code c
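The "reimplement checkArgument/checkState locally" workaround mentioned above can be sketched roughly as follows. This is a hypothetical `Checks` class, not actual Hadoop or cloudstore code, and its %s handling only approximates Guava's lenient formatting; the point is that bytecode compiled against it has no dependency on which Preconditions overloads a given Guava version ships.

```java
// Hypothetical local stand-in for the Guava Preconditions overloads whose
// absence in older Guava releases causes the NoSuchMethodError shown above.
// Compiling against this class keeps binaries linkable on any Guava version.
public final class Checks {

    private Checks() {
    }

    // Same contract as Guava's checkArgument(boolean, String, Object...):
    // %s placeholders in the template, filled only on failure.
    public static void checkArgument(boolean expression,
                                     String template,
                                     Object... args) {
        if (!expression) {
            throw new IllegalArgumentException(format(template, args));
        }
    }

    public static void checkState(boolean expression,
                                  String template,
                                  Object... args) {
        if (!expression) {
            throw new IllegalStateException(format(template, args));
        }
    }

    // Minimal %s substitution; surplus arguments are appended in brackets.
    private static String format(String template, Object... args) {
        StringBuilder sb = new StringBuilder();
        int start = 0;
        int i = 0;
        while (i < args.length) {
            int p = template.indexOf("%s", start);
            if (p < 0) {
                break;
            }
            sb.append(template, start, p).append(args[i++]);
            start = p + 2;
        }
        sb.append(template.substring(start));
        while (i < args.length) {
            sb.append(" [").append(args[i++]).append(']');
        }
        return sb.toString();
    }
}
```

With a shim like this, `Checks.checkArgument(fs != null, "no filesystem for %s", path)` compiles to a call into your own jar, so the three-arg vs vararg Preconditions overload question never reaches the linker.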
Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Mukund, thank you for running these tests. Both of them are things we've fixed, and in both cases the problems are in the tests, not the production code.

On Wed, 1 Jul 2020 at 14:22, Mukund Madhav Thakur wrote:

> Compile the distribution using mvn package -Pdist -DskipTests
> -Dmaven.javadoc.skip=true -DskipShade and run some hadoop fs commands.
> All good there.
>
> Then I ran the hadoop-aws tests and saw the following failures:
>
> [*ERROR*] *Failures: *
> [*ERROR*] ITestS3AMiscOperations.testEmptyFileChecksums:147->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88
> checksums expected: but was:
> [*ERROR*] ITestS3AMiscOperations.testNonEmptyFileChecksumsUnencrypted:199->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88
> checksums expected: but was:

You've got a bucket encrypting things, so checksums come back different. We've tweaked those tests: on 3.3 we look at the bucket and skip the test if there's any default encryption policy. https://issues.apache.org/jira/browse/HADOOP-16319

> These were the same failures which I saw in RC0 as well. I think these
> are known failures.
>
> Apart from that, all of my AssumedRole tests are failing with an
> AccessDenied exception like:
>
> [*ERROR*] testPartialDeleteSingleDelete(org.apache.hadoop.fs.s3a.auth.ITestAssumeRole)
> Time elapsed: 3.359 s <<< ERROR!
> org.apache.hadoop.fs.s3a.AWSServiceIOException: initTable on mthakur-data:
> com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: User:
> arn:aws:sts::152813717728:assumed-role/mthakur-assumed-role/valid is not
> authorized to perform: dynamodb:DescribeTable on resource:
> arn:aws:dynamodb:ap-south-1:152813717728:table/mthakur-data (Service:
> AmazonDynamoDBv2; Status Code: 400; Error Code: AccessDeniedException;
> Request ID: UJLKVGJ9I1S9TQF3AEPHVGENVJVV4KQNSO5AEMVJF66Q9ASUAAJG): User:
> arn:aws:sts::152813717728:assumed-role/mthakur-assumed-role/valid is not
> authorized to perform: dynamodb:DescribeTable on resource:
> arn:aws:dynamodb:ap-south-1:152813717728:table/mthakur-data (Service:
> AmazonDynamoDBv2; Status Code: 400; Error Code: AccessDeniedException;
> Request ID: UJLKVGJ9I1S9TQF3AEPHVGENVJVV4KQNSO5AEMVJF66Q9ASUAAJG)
> at org.apache.hadoop.fs.s3a.auth.ITestAssumeRole.executePartialDelete(ITestAssumeRole.java:759)
> at org.apache.hadoop.fs.s3a.auth.ITestAssumeRole.testPartialDeleteSingleDelete(ITestAssumeRole.java:735)
>
> I checked my policy and could verify that dynamodb:DescribeTable access
> is present there.
>
> So just to cross check, I ran the AssumedRole test with the same configs
> for apache/trunk and it succeeded. Not sure if this is a false alarm, but
> I think it would be better if someone else ran these AssumedRole tests
> as well to verify.

That's https://issues.apache.org/jira/browse/HADOOP-15583; nothing to worry about.
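For reference, the permission named in the AccessDenied message above would be granted by an IAM policy statement along these lines. This is an illustrative fragment built from the action and resource ARN in the log, not the actual policy attached to the assumed role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDescribeS3GuardTable",
      "Effect": "Allow",
      "Action": ["dynamodb:DescribeTable"],
      "Resource": "arn:aws:dynamodb:ap-south-1:152813717728:table/mthakur-data"
    }
  ]
}
```

Note the statement must be present in (or allowed by) the policy of the *assumed* role itself; a grant on the base user does not carry over once the role is assumed.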
Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Compile the distribution using mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true -DskipShade and run some hadoop fs commands. All good there.

Then I ran the hadoop-aws tests and saw the following failures:

[*ERROR*] *Failures: *
[*ERROR*] ITestS3AMiscOperations.testEmptyFileChecksums:147->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88 checksums expected: but was:
[*ERROR*] ITestS3AMiscOperations.testNonEmptyFileChecksumsUnencrypted:199->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88 checksums expected: but was:

These were the same failures which I saw in RC0 as well. I think these are known failures.

Apart from that, all of my AssumedRole tests are failing with an AccessDenied exception like:

[*ERROR*] testPartialDeleteSingleDelete(org.apache.hadoop.fs.s3a.auth.ITestAssumeRole) Time elapsed: 3.359 s <<< ERROR!
org.apache.hadoop.fs.s3a.AWSServiceIOException: initTable on mthakur-data: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: User: arn:aws:sts::152813717728:assumed-role/mthakur-assumed-role/valid is not authorized to perform: dynamodb:DescribeTable on resource: arn:aws:dynamodb:ap-south-1:152813717728:table/mthakur-data (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: AccessDeniedException; Request ID: UJLKVGJ9I1S9TQF3AEPHVGENVJVV4KQNSO5AEMVJF66Q9ASUAAJG): User: arn:aws:sts::152813717728:assumed-role/mthakur-assumed-role/valid is not authorized to perform: dynamodb:DescribeTable on resource: arn:aws:dynamodb:ap-south-1:152813717728:table/mthakur-data (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: AccessDeniedException; Request ID: UJLKVGJ9I1S9TQF3AEPHVGENVJVV4KQNSO5AEMVJF66Q9ASUAAJG)
at org.apache.hadoop.fs.s3a.auth.ITestAssumeRole.executePartialDelete(ITestAssumeRole.java:759)
at org.apache.hadoop.fs.s3a.auth.ITestAssumeRole.testPartialDeleteSingleDelete(ITestAssumeRole.java:735)

I checked my policy and could verify that dynamodb:DescribeTable access is present there.

So just to cross check, I ran the AssumedRole test with the same configs for apache/trunk and it succeeded. Not sure if this is a false alarm, but I think it would be better if someone else ran these AssumedRole tests as well to verify.

Thanks,
Mukund

On Fri, Jun 26, 2020 at 7:21 PM Gabor Bota wrote:

> Hi folks,
>
> I have put together a release candidate (RC2) for Hadoop 3.1.4.
>
> The RC is available at: http://people.apache.org/~gabota/hadoop-3.1.4-RC2/
> The RC tag in git is here:
> https://github.com/apache/hadoop/releases/tag/release-3.1.4-RC2
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1269/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> and http://keys.gnupg.net/pks/lookup?op=get&search=0xB86249D83539B38C
>
> Please try the release and vote. The vote will run for 5 weekdays,
> until July 6, 2020, 23:00 CET.
>
> The release includes the revert of HDFS-14941, as it caused
> HDFS-15421, "IBR leak causes standby NN to be stuck in safe mode"
> (https://issues.apache.org/jira/browse/HDFS-15421).
> The release includes HDFS-15323, as requested
> (https://issues.apache.org/jira/browse/HDFS-15323).
>
> Thanks,
> Gabor
[VOTE] Release Apache Hadoop 3.1.4 (RC2)
Hi folks,

I have put together a release candidate (RC2) for Hadoop 3.1.4.

The RC is available at: http://people.apache.org/~gabota/hadoop-3.1.4-RC2/
The RC tag in git is here: https://github.com/apache/hadoop/releases/tag/release-3.1.4-RC2
The maven artifacts are staged at https://repository.apache.org/content/repositories/orgapachehadoop-1269/

You can find my public key at: https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
and http://keys.gnupg.net/pks/lookup?op=get&search=0xB86249D83539B38C

Please try the release and vote. The vote will run for 5 weekdays, until July 6, 2020, 23:00 CET.

The release includes the revert of HDFS-14941, as it caused HDFS-15421, "IBR leak causes standby NN to be stuck in safe mode" (https://issues.apache.org/jira/browse/HDFS-15421).
The release includes HDFS-15323, as requested (https://issues.apache.org/jira/browse/HDFS-15323).

Thanks,
Gabor