[jira] [Resolved] (HBASE-28448) CompressionTest hangs when run over an Ozone ofs path
[ https://issues.apache.org/jira/browse/HBASE-28448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang resolved HBASE-28448.
-------------------------------------
    Resolution: Fixed

> CompressionTest hangs when run over an Ozone ofs path
> -----------------------------------------------------
>
>                 Key: HBASE-28448
>                 URL: https://issues.apache.org/jira/browse/HBASE-28448
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Pratyush Bhatt
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>              Labels: ozone, pull-request-available
>             Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.1
>
>         Attachments: hbase_ozone_compression.jstack
>
>
> If we run the CompressionTest over an HDFS path, it works fine:
> {code:java}
> hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://ns1/tmp/dir1/dir2/test_file.txt snappy
> 24/03/20 06:08:43 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 24/03/20 06:08:43 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 24/03/20 06:08:43 INFO impl.MetricsSystemImpl: HBase metrics system started
> 24/03/20 06:08:43 INFO metrics.MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
> 24/03/20 06:08:43 INFO compress.CodecPool: Got brand-new compressor [.snappy]
> 24/03/20 06:08:43 INFO compress.CodecPool: Got brand-new compressor [.snappy]
> 24/03/20 06:08:44 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
> SUCCESS {code}
> The command exits. But when the same test is run over an ofs path, the command hangs:
> {code:java}
> hbase org.apache.hadoop.hbase.util.CompressionTest ofs://ozone1710862004/test-222compression-vol/compression-buck2/test_file.txt snappy
> 24/03/20 06:05:19 INFO protocolPB.OmTransportFactory: Loading OM transport implementation org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransportFactory as specified by configuration.
> 24/03/20 06:05:20 INFO client.ClientTrustManager: Loading certificates for client.
> 24/03/20 06:05:20 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 24/03/20 06:05:20 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 24/03/20 06:05:20 INFO impl.MetricsSystemImpl: HBase metrics system started
> 24/03/20 06:05:20 INFO metrics.MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
> 24/03/20 06:05:20 INFO rpc.RpcClient: Creating Volume: test-222compression-vol, with om as owner and space quota set to -1 bytes, counts quota set to -1
> 24/03/20 06:05:20 INFO rpc.RpcClient: Creating Bucket: test-222compression-vol/compression-buck2, with bucket layout FILE_SYSTEM_OPTIMIZED, om as owner, Versioning false, Storage Type set to DISK and Encryption set to false, Replication Type set to server-side default replication type, Namespace Quota set to -1, Space Quota set to -1
> 24/03/20 06:05:21 INFO compress.CodecPool: Got brand-new compressor [.snappy]
> 24/03/20 06:05:21 INFO compress.CodecPool: Got brand-new compressor [.snappy]
> 24/03/20 06:05:21 WARN impl.MetricsSystemImpl: HBase metrics system already initialized!
> 24/03/20 06:05:21 INFO metrics.MetricRegistries: Loaded MetricRegistries class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl
> 24/03/20 06:05:22 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
> SUCCESS
> .
> .
> .{code}
> The command doesn't exit.
> Attaching the jstack of the process below:
> [^hbase_ozone_compression.jstack]
> cc: [~weichiu]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
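The hang pattern above — the tool prints SUCCESS but the process never returns to the shell — is the classic signature of a JVM kept alive by a lingering non-daemon thread, which is what a jstack like the attached one typically reveals. A minimal sketch of the mechanism (illustration only, not HBase or Ozone code):

```java
// Sketch: a JVM exits only once all *non-daemon* threads have finished.
// A client library that spawns a non-daemon worker and is never shut
// down keeps the process alive even after main() returns.
public class NonDaemonHang {
    public static void main(String[] args) {
        Thread clientThread = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE); // stand-in for an idle client event loop
            } catch (InterruptedException ignored) {
            }
        }, "idle-client-thread");

        // Flip this to false to reproduce the hang: main() returns,
        // but the JVM waits on the non-daemon thread forever.
        clientThread.setDaemon(true);
        clientThread.start();

        System.out.println("SUCCESS");
        // With a daemon thread the JVM exits here; with a non-daemon
        // thread it would hang, as in the ofs:// run above.
    }
}
```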
[jira] [Created] (HBASE-28584) RS SIGSEGV under heavy replication load
Whitney Jackson created HBASE-28584:
---------------------------------------
             Summary: RS SIGSEGV under heavy replication load
                 Key: HBASE-28584
                 URL: https://issues.apache.org/jira/browse/HBASE-28584
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 2.5.6
         Environment: RHEL 7.9
JDK 11.0.23
Hadoop 3.2.4
HBase 2.5.6
            Reporter: Whitney Jackson

I'm observing RS crashes under heavy replication load:
{code:java}
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x7f7546873b69, pid=29890, tid=36828
#
# JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.23+7) (build 11.0.23+7-LTS-222)
# Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.23+7-LTS-222, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# J 24625 c2 org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209]
{code}
The heavier load comes when a replication peer has been disabled for several hours, e.g. for patching. When the peer is re-enabled, the replication load is high until the peer is all caught up. The crashes happen on the cluster receiving the replication edits.

I believe this problem started after upgrading from 2.4.x to 2.5.x.

One possibly relevant non-standard config I run with:
{code:java}
hbase.region.store.parallel.put.limit = 100
{code}
It was added after seeing "failed to accept edits" replication errors in the destination region servers, indicating this limit was being exceeded while trying to process replication edits.

I understand from other Jiras that the problem is likely around direct memory usage by Netty. I haven't yet tried switching the Netty allocator to {{unpooled}} or {{heap}}. I also haven't yet tried any of the {{io.netty.allocator.*}} options. {{MaxDirectMemorySize}} is set to 26g.
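For reference, the allocator experiment mentioned above is a configuration change, not a code change. The property name below is an assumption on my part (it is believed to be the 2.5.x server-side knob, but it is not confirmed anywhere in this report), so treat this as a hypothetical hbase-site.xml fragment:

```xml
<!-- Hypothetical fragment; property name assumed, not confirmed here. -->
<property>
  <name>hbase.netty.rpcserver.allocator</name>
  <!-- "heap" or "unpooled" to take Netty's pooled direct memory
       out of the picture; the usual default is "pooled". -->
  <value>heap</value>
</property>
```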
Here's the full stack for the relevant thread:
{code:java}
Stack: [0x7f72e2e5f000,0x7f72e2f6], sp=0x7f72e2f5e450, free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 24625 c2 org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209]
J 26253 c2 org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I (21 bytes) @ 0x7f7545af2d84 [0x7f7545af2d20+0x0064]
J 22971 c2 org.apache.hadoop.hbase.codec.KeyValueCodecWithTags$KeyValueEncoder.write(Lorg/apache/hadoop/hbase/Cell;)V (27 bytes) @ 0x7f754663f700 [0x7f754663f4c0+0x0240]
J 25251 c2 org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (90 bytes) @ 0x7f7546a53038 [0x7f7546a50e60+0x21d8]
J 21182 c2 org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (73 bytes) @ 0x7f7545f4d90c [0x7f7545f4d3a0+0x056c]
J 21181 c2 org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (149 bytes) @ 0x7f7545fd680c [0x7f7545fd65e0+0x022c]
J 25389 c2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$$Lambda$247.run()V (16 bytes) @ 0x7f7546ade660 [0x7f7546ade140+0x0520]
J 24098 c2 org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z (109 bytes) @ 0x7f754678fbb8 [0x7f754678f8e0+0x02d8]
J 27297% c2 org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run()V (603 bytes) @ 0x7f75466c4d48 [0x7f75466c4c80+0x00c8]
j org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run()V+44
j org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run()V+11
j org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run()V+4
J 12278 c1 java.lang.Thread.run()V java.base@11.0.23 (17 bytes) @ 0x7f753e11f084 [0x7f753e11ef40+0x0144]
v ~StubRoutines::call_stub
V [libjvm.so+0x85574a] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x27a
V [libjvm.so+0x853d2e] JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x19e
V [libjvm.so+0x8ffddf] thread_entry(JavaThread*, Thread*)+0x9f
V [libjvm.so+0xdb68d1] JavaThread::thread_main_inner()+0x131
V [libjvm.so+0xdb2c4c] Thread::call_run()+0x13c
V [libjvm.so+0xc1f2e6]
{code}
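The topmost Java frame, {{ByteBufferUtils.copyBufferToStream}}, copies a region of a ByteBuffer into the outbound Netty stream. As a hedged illustration only (a simplified sketch of the copy pattern, not the actual HBase implementation): if the source is a *direct* buffer whose native memory has already been released back to a pool, the absolute reads below touch freed memory, which in native code manifests as a SIGSEGV rather than a Java exception — consistent with the crash signature above and the suspicion about Netty direct memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Sketch: copy a [offset, offset+length) region of a ByteBuffer to an
// OutputStream without disturbing the buffer's position/limit.
public class CopyBufferSketch {
    static void copyBufferToStream(OutputStream out, ByteBuffer buf,
                                   int offset, int length) throws IOException {
        if (buf.hasArray()) {
            // Heap buffer: hand the backing array to the stream directly.
            out.write(buf.array(), buf.arrayOffset() + offset, length);
        } else {
            // Direct buffer: absolute gets, one byte at a time. If the
            // native memory behind buf has been freed (use-after-free),
            // these reads crash the JVM instead of throwing.
            for (int i = 0; i < length; i++) {
                out.write(buf.get(offset + i));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer direct = ByteBuffer.allocateDirect(8);
        for (byte b = 0; b < 8; b++) direct.put(b);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyBufferToStream(out, direct, 2, 4);
        System.out.println(java.util.Arrays.toString(out.toByteArray())); // prints [2, 3, 4, 5]
    }
}
```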
[jira] [Created] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema
Ke Han created HBASE-28583:
-------------------------------
             Summary: Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema
                 Key: HBASE-28583
                 URL: https://issues.apache.org/jira/browse/HBASE-28583
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 2.5.8, 3.0.0
            Reporter: Ke Han
         Attachments: commands.txt, hbase--master-cc13b0df0f3a.log, persistent.tar.gz

When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 2 HDFS), I hit the following exception and the upgrade failed.
{code:java}
2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: old_table_schema
	at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
	at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
	at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
	at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
	at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
	at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
	at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-09T20:16:20,639 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: * ABORTING master hmaster,16000,1715285771112: Unhandled exception. Starting shutdown. *
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing
{code}
[jira] [Created] (HBASE-28582) ModifyTableProcedure should not reset TRSP on region node when closing unused region replicas
Duo Zhang created HBASE-28582:
---------------------------------
             Summary: ModifyTableProcedure should not reset TRSP on region node when closing unused region replicas
                 Key: HBASE-28582
                 URL: https://issues.apache.org/jira/browse/HBASE-28582
             Project: HBase
          Issue Type: Bug
          Components: proc-v2
            Reporter: Duo Zhang
            Assignee: Duo Zhang

Found this while digging into HBASE-28522.

First, this is not safe, as MTP is not like DTP, where we hold the exclusive lock the whole time. Second, even if we held the exclusive lock the whole time, as shown in HBASE-28522, we may still hang there forever because SCP will not interrupt the TRSP.