To add a bit more, here is the log from when the service crashed:

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fa6a8cdd075, pid=2785, tid=2880
#
# JRE version: OpenJDK Runtime Environment (11.0.2+9) (build 11.0.2+9)
# Java VM: OpenJDK 64-Bit Server VM (11.0.2+9, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# J 7392 c2 org.apache.rocketmq.store.CommitLog.checkMessageAndReturnSize(Ljava/nio/ByteBuffer;ZZ)Lorg/apache/rocketmq/store/DispatchRequest; (727 bytes) @ 0x00007fa6a8cdd075 [0x00007fa6a8cdcfe0+0x0000000000000095]
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
--------------- S U M M A R Y ------------

Command Line: -Xms2g -Xmx2g -XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -XX:SoftRefLRUPolicyMSPerMB=0 -Xlog:gc*:file=/dev/shm/rmq_srv_gc_%p_%t.log:time,tags:filecount=5,filesize=30M -XX:-OmitStackTraceInFastThrow -XX:+AlwaysPreTouch -XX:MaxDirectMemorySize=15g -XX:-UseLargePages -XX:-UseBiasedLocking --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED org.apache.rocketmq.broker.BrokerStartup -c ./conf/broker_m.conf

Host: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz, 4 cores, 7G, CentOS Linux release 7.6.1810 (Core)
Time: Fri Sep 9 12:17:22 2022 CST elapsed time: 243041 seconds (2d 19h 30m 41s)

--------------- T H R E A D ---------------

Current thread (0x00007fa6b8a115f0): JavaThread "ReputMessageService" [_thread_in_Java, id=2880, stack(0x00007fa5a91b5000,0x00007fa5a92b6000)]

Stack: [0x00007fa5a91b5000,0x00007fa5a92b6000], sp=0x00007fa5a92b47c0, free space=1021k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 7392 c2 org.apache.rocketmq.store.CommitLog.checkMessageAndReturnSize(Ljava/nio/ByteBuffer;ZZ)Lorg/apache/rocketmq/store/DispatchRequest; (727 bytes) @ 0x00007fa6a8cdd075 [0x00007fa6a8cdcfe0+0x0000000000000095]
J 4056 c2 org.apache.rocketmq.store.DefaultMessageStore$ReputMessageService.doReput()V (551 bytes) @ 0x00007fa6a89cbef0 [0x00007fa6a89cbba0+0x0000000000000350]
J 3827% c2 org.apache.rocketmq.store.DefaultMessageStore$ReputMessageService.run()V (114 bytes) @ 0x00007fa6a89a7b9c [0x00007fa6a89a7a20+0x000000000000017c]
j java.lang.Thread.run()V+11 java.base@11.0.2
v ~StubRoutines::call_stub
V [libjvm.so+0x8847e9] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x3b9
V [libjvm.so+0x88279d] JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x1ed
V [libjvm.so+0x92c85c] thread_entry(JavaThread*, Thread*)+0x6c
V [libjvm.so+0xdc2b0d] JavaThread::thread_main_inner()+0x21d
V [libjvm.so+0xdc2eb7] JavaThread::run()+0x377
V [libjvm.so+0xc076a0] thread_native_entry(Thread*)+0xf0

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007fa3ec812ae7

Register to memory mapping:

RAX=0x0000000000000034 is an unknown value
RBX=0x00000000095be787 is an unknown value
RCX=0x00007fa3e3254364 is an unknown value
RDX=0x0000000082ecfd50 is an oop: java.nio.DirectByteBuffer {0x0000000082ecfd50} - klass: 'java/nio/DirectByteBuffer'
RSP=0x00007fa5a92b47c0 is pointing into the stack for thread: 0x00007fa6b8a115f0
RBP=0x0000000000000001 is an unknown value
RSI=0x0000000080000810 is an oop: org.apache.rocketmq.store.CommitLog {0x0000000080000810} - klass: 'org/apache/rocketmq/store/CommitLog'
RDI=0x00000000240562c7 is an unknown value
R8 =0x00000000095be783 is an unknown value
R9 =0x000000001aa97b44 is an unknown value
R10=0x00000000095be783 is an unknown value
R11=0x0000000082ecfd50 is an oop: java.nio.DirectByteBuffer {0x0000000082ecfd50} - klass: 'java/nio/DirectByteBuffer'
R12=0x0 is NULL
R13=0x00000000d457c7a0 is an oop: java.util.HashMap {0x00000000d457c7a0} - klass: 'java/util/HashMap'
R14=0x00007fa6bf9e1000 points into unknown readable memory: 00 00 00 00 00 00 00 00
R15=0x00007fa6b8a115f0 is a thread

Can anyone help figure out what caused the service to crash?
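The crashing frame is CommitLog.checkMessageAndReturnSize, called from ReputMessageService.doReput, which parses message records out of a DirectByteBuffer over the commitlog. To check offline whether a commitlog file is still structurally intact after the crash, one option is to walk its record headers. The sketch below assumes the RocketMQ 4.x on-disk layout (each record begins with a 4-byte TOTALSIZE and a 4-byte MAGICCODE, and the unused tail of a full file is covered by one blank record) and the 4.x magic-code values. Other versions may define additional codes, so verify the constants against CommitLog in your broker's source. CommitLogScan, scan and Result are hypothetical names, not RocketMQ APIs. Run it against a copy of the file while the broker is stopped.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical offline checker, not part of RocketMQ.
// ASSUMPTION: RocketMQ 4.x commitlog layout. Every record starts with a 4-byte
// TOTALSIZE and a 4-byte MAGICCODE (big-endian), and the unused tail of a full
// file is covered by a single "blank" record.
public class CommitLogScan {
    // Assumed 4.x constants; verify them against CommitLog in your broker version.
    static final int MESSAGE_MAGIC_CODE = 0xDAA320A7; // -626843481
    static final int BLANK_MAGIC_CODE = 0xCBD43194;   // -875286124

    static final class Result {
        final int messages; // records carrying the message magic code
        final int stopPos;  // byte position where the walk stopped
        final String reason;

        Result(int messages, int stopPos, String reason) {
            this.messages = messages;
            this.stopPos = stopPos;
            this.reason = reason;
        }

        @Override
        public String toString() {
            return messages + " messages, stopped at " + stopPos + ": " + reason;
        }
    }

    /** Walks record headers from position 0 using absolute reads, so the buffer position is untouched. */
    static Result scan(ByteBuffer buf) {
        int pos = 0;
        int messages = 0;
        while (pos + 8 <= buf.limit()) {
            int totalSize = buf.getInt(pos);
            int magic = buf.getInt(pos + 4);
            if (totalSize == 0 && magic == 0) {
                // Heuristic: zero bytes are treated as the not-yet-written area of the file.
                return new Result(messages, pos, "end of written data");
            }
            if (magic == BLANK_MAGIC_CODE) {
                return new Result(messages, pos, "blank record (end of file)");
            }
            if (magic != MESSAGE_MAGIC_CODE) {
                return new Result(messages, pos, "unknown magic code 0x" + Integer.toHexString(magic));
            }
            if (totalSize < 8 || totalSize > buf.limit() - pos) {
                return new Result(messages, pos, "implausible total size " + totalSize);
            }
            messages++;
            pos += totalSize;
        }
        return new Result(messages, pos, "reached end of buffer");
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of a COPY of one commitlog file.
        Path file = Paths.get(args[0]);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            System.out.println(file.getFileName() + ": " + scan(buf));
        }
    }
}
```

Under these assumptions, an older, full file should stop at a blank record and the newest file at the end of written data. A stop with an unknown magic code in the middle of a file would suggest on-disk damage at that byte position.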
On Fri, Sep 9, 2022 at 23:35, kai wang <yiduwang...@gmail.com> wrote:
> Scenario: We were running MQ performance load tests in our test environment. The service crashed unexpectedly, and after a restart it is still unusable.
> Symptoms: The service is unavailable, and the console shows no broker task information.
> Logs: storeerror.log is flooded with entries like the following:
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911520 currentLogicOffset: 1581160 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911540 currentLogicOffset: 1581180 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911560 currentLogicOffset: 1581200 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911580 currentLogicOffset: 1581220 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911600 currentLogicOffset: 1581240 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911620 currentLogicOffset: 1581260 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911640 currentLogicOffset: 1581280 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911660 currentLogicOffset: 1581300 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911680 currentLogicOffset: 1581320 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911700 currentLogicOffset: 1581340 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911720 currentLogicOffset: 1581360 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911740 currentLogicOffset: 1581380 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911760 currentLogicOffset: 1581400 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911780 currentLogicOffset: 1581420 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911800 currentLogicOffset: 1581440 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1911820 currentLogicOffset: 1581460 Topic: BenchmarkTest QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, expectLogicOffset: 1934440 currentLogicOffset: 1608560 Topic: BenchmarkTest QID: 228 Diff: 325880
>
> store.log:
> 2022-09-09 21:43:11 INFO main - recover next physics file, /alidata1/admin/rmq/rmq-m/commitlog/00000000021474836480
> 2022-09-09 21:48:51 INFO main - recover next physics file, /alidata1/admin/rmq/rmq-m/commitlog/00000000022548578304
> 2022-09-09 22:07:02 INFO main - recover next physics file, /alidata1/admin/rmq/rmq-m/commitlog/00000000023622320128
> 2022-09-09 22:48:00 INFO main - recover next physics file, /alidata1/admin/rmq/rmq-m/commitlog/00000000024696061952
> 2022-09-09 22:48:18 INFO FlushIndexFileThread - flush index file elapsed time(ms) 1431
> 2022-09-09 22:51:32 INFO main - recover next physics file, /alidata1/admin/rmq/rmq-m/commitlog/00000000025769803776
> 2022-09-09 22:54:58 INFO main - recover next physics file, /alidata1/admin/rmq/rmq-m/commitlog/00000000026843545600
> 2022-09-09 22:58:41 INFO main - recover next physics file, /alidata1/admin/rmq/rmq-m/commitlog/00000000027917287424
> 2022-09-09 23:04:27 INFO main - recover next physics file, /alidata1/admin/rmq/rmq-m/commitlog/00000000028991029248
> 2022-09-09 23:20:00 INFO main - recover next physics file, /alidata1/admin/rmq/rmq-m/commitlog/00000000030064771072
>
> What we have tried:
> The topic named in the logs is BenchmarkTest. To get the cluster working again, we:
> 1. Stopped the load test and cut off all external connections.
> 2. Deleted the topic:
> sh mqadmin deleteTopic -c brokerIp -n nameserverIp -t BenchmarkTest
> 3. Restarted the cluster several times.
>
> The cluster is still unavailable, and storeerror.log keeps scrolling.
> The queue directories under BenchmarkTest in consumequeue keep being created and deleted over and over.
>
> Could anyone advise how to resolve this?
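Two pieces of arithmetic may help when reading the logs above. In storeerror.log, consecutive warnings advance expectLogicOffset and currentLogicOffset by 20. That step matches the default 20-byte consume queue unit (8-byte commitlog offset, 4-byte message size, 8-byte tag hash code). Assuming that unit, Diff is a byte distance, so the messages being dispatched for QID 720 carry queue offsets about 16518 entries beyond the current end of that consume queue. In store.log, each commitlog file is named after its starting physical offset zero-padded to 20 digits, and consecutive names differ by 1073741824 (1 GiB, the default file size), so the recovery excerpt runs from file 20 to file 28. The sketch below assumes both defaults; a custom mappedFileSizeCommitLog in broker_m.conf would change the file-size figure. StoreOffsets and its methods are hypothetical helpers, not RocketMQ APIs.

```java
import java.util.Locale;

// Hypothetical helpers for reading storeerror.log / store.log; not RocketMQ APIs.
// ASSUMPTIONS: 20-byte consume queue units and 1 GiB commitlog files (the defaults).
public class StoreOffsets {
    // 8-byte commitlog offset + 4-byte message size + 8-byte tag hash code
    static final int CQ_UNIT_SIZE = 20;
    // Assumed default commitlog file size: 1 GiB
    static final long COMMITLOG_FILE_SIZE = 1024L * 1024 * 1024;

    /** Converts the byte Diff of a "[BUG]logic queue order maybe wrong" line into consume queue entries. */
    static long missingEntries(long expectLogicOffset, long currentLogicOffset) {
        return (expectLogicOffset - currentLogicOffset) / CQ_UNIT_SIZE;
    }

    /** Commitlog file name for a starting physical offset, zero-padded to 20 digits. */
    static String commitLogFileName(long fileFromOffset) {
        return String.format(Locale.ROOT, "%020d", fileFromOffset);
    }

    /** Sequence number of a commitlog file, assuming every file has the same size. */
    static long fileIndex(String fileName) {
        return Long.parseLong(fileName) / COMMITLOG_FILE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(missingEntries(1911520, 1581160));             // QID 720: 16518
        System.out.println(missingEntries(1934440, 1608560));             // QID 228: 16294
        System.out.println(fileIndex("00000000021474836480"));            // 20
        System.out.println(fileIndex("00000000030064771072"));            // 28
        System.out.println(commitLogFileName(21 * COMMITLOG_FILE_SIZE));  // 00000000022548578304
    }
}
```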