One more addition: the log from when the service crashed is as follows:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fa6a8cdd075, pid=2785, tid=2880
#
# JRE version: OpenJDK Runtime Environment (11.0.2+9) (build 11.0.2+9)
# Java VM: OpenJDK 64-Bit Server VM (11.0.2+9, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# J 7392 c2 org.apache.rocketmq.store.CommitLog.checkMessageAndReturnSize(Ljava/nio/ByteBuffer;ZZ)Lorg/apache/rocketmq/store/DispatchRequest; (727 bytes) @ 0x00007fa6a8cdd075 [0x00007fa6a8cdcfe0+0x0000000000000095]
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -Xms2g -Xmx2g -XX:+UseG1GC -XX:G1HeapRegionSize=16m
-XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30
-XX:SoftRefLRUPolicyMSPerMB=0 -XX:+UseG1GC -XX:G1HeapRegionSize=16m
-XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30
-XX:SoftRefLRUPolicyMSPerMB=0
-Xlog:gc*:file=/dev/shm/rmq_srv_gc_%p_%t.log:time,tags:filecount=5,filesize=30M
-XX:-OmitStackTraceInFastThrow -XX:+AlwaysPreTouch
-XX:MaxDirectMemorySize=15g -XX:-UseLargePages -XX:-UseBiasedLocking
--add-exports=java.base/jdk.internal.ref=ALL-UNNAMED
org.apache.rocketmq.broker.BrokerStartup -c ./conf/broker_m.conf

Host: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz, 4 cores, 7G, CentOS Linux release 7.6.1810 (Core)
Time: Fri Sep  9 12:17:22 2022 CST elapsed time: 243041 seconds (2d 19h 30m 41s)

---------------  T H R E A D  ---------------

Current thread (0x00007fa6b8a115f0):  JavaThread "ReputMessageService"
[_thread_in_Java, id=2880, stack(0x00007fa5a91b5000,0x00007fa5a92b6000)]

Stack: [0x00007fa5a91b5000,0x00007fa5a92b6000],  sp=0x00007fa5a92b47c0,  free space=1021k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 7392 c2 org.apache.rocketmq.store.CommitLog.checkMessageAndReturnSize(Ljava/nio/ByteBuffer;ZZ)Lorg/apache/rocketmq/store/DispatchRequest; (727 bytes) @ 0x00007fa6a8cdd075 [0x00007fa6a8cdcfe0+0x0000000000000095]
J 4056 c2 org.apache.rocketmq.store.DefaultMessageStore$ReputMessageService.doReput()V (551 bytes) @ 0x00007fa6a89cbef0 [0x00007fa6a89cbba0+0x0000000000000350]
J 3827% c2 org.apache.rocketmq.store.DefaultMessageStore$ReputMessageService.run()V (114 bytes) @ 0x00007fa6a89a7b9c [0x00007fa6a89a7a20+0x000000000000017c]
j  java.lang.Thread.run()V+11 java.base@11.0.2
v  ~StubRoutines::call_stub
V  [libjvm.so+0x8847e9]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x3b9
V  [libjvm.so+0x88279d]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x1ed
V  [libjvm.so+0x92c85c]  thread_entry(JavaThread*, Thread*)+0x6c
V  [libjvm.so+0xdc2b0d]  JavaThread::thread_main_inner()+0x21d
V  [libjvm.so+0xdc2eb7]  JavaThread::run()+0x377
V  [libjvm.so+0xc076a0]  thread_native_entry(Thread*)+0xf0


siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr:
0x00007fa3ec812ae7

Register to memory mapping:

RAX=0x0000000000000034 is an unknown value
RBX=0x00000000095be787 is an unknown value
RCX=0x00007fa3e3254364 is an unknown value
RDX=0x0000000082ecfd50 is an oop: java.nio.DirectByteBuffer
{0x0000000082ecfd50} - klass: 'java/nio/DirectByteBuffer'
RSP=0x00007fa5a92b47c0 is pointing into the stack for thread:
0x00007fa6b8a115f0
RBP=0x0000000000000001 is an unknown value
RSI=0x0000000080000810 is an oop: org.apache.rocketmq.store.CommitLog
{0x0000000080000810} - klass: 'org/apache/rocketmq/store/CommitLog'
RDI=0x00000000240562c7 is an unknown value
R8 =0x00000000095be783 is an unknown value
R9 =0x000000001aa97b44 is an unknown value
R10=0x00000000095be783 is an unknown value
R11=0x0000000082ecfd50 is an oop: java.nio.DirectByteBuffer
{0x0000000082ecfd50} - klass: 'java/nio/DirectByteBuffer'
R12=0x0 is NULL
R13=0x00000000d457c7a0 is an oop: java.util.HashMap
{0x00000000d457c7a0} - klass: 'java/util/HashMap'
R14=0x00007fa6bf9e1000 points into unknown readable memory: 00 00 00 00 00
00 00 00
R15=0x00007fa6b8a115f0 is a thread

Could someone help figure out what caused the service to crash?

On Fri, Sep 9, 2022 at 23:35, kai wang <yiduwang...@gmail.com> wrote:

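> Scenario: we were running an MQ performance stress test in our test environment. The service crashed unexpectedly and was unusable after a restart.
> Symptom: the service is unavailable, and the console shows no task information for the broker.
> Logs: storeerror.log is being flooded with entries like the following: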
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911520 currentLogicOffset: 1581160 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911540 currentLogicOffset: 1581180 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911560 currentLogicOffset: 1581200 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911580 currentLogicOffset: 1581220 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911600 currentLogicOffset: 1581240 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911620 currentLogicOffset: 1581260 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911640 currentLogicOffset: 1581280 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911660 currentLogicOffset: 1581300 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911680 currentLogicOffset: 1581320 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911700 currentLogicOffset: 1581340 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911720 currentLogicOffset: 1581360 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911740 currentLogicOffset: 1581380 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911760 currentLogicOffset: 1581400 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911780 currentLogicOffset: 1581420 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911800 currentLogicOffset: 1581440 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1911820 currentLogicOffset: 1581460 Topic: BenchmarkTest
> QID: 720 Diff: 330360
> 2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong,
> expectLogicOffset: 1934440 currentLogicOffset: 1608560 Topic: BenchmarkTest
> QID: 228 Diff: 325880
>
> store.log
> 2022-09-09 21:43:11 INFO main - recover next physics file,
> /alidata1/admin/rmq/rmq-m/commitlog/00000000021474836480
> 2022-09-09 21:48:51 INFO main - recover next physics file,
> /alidata1/admin/rmq/rmq-m/commitlog/00000000022548578304
> 2022-09-09 22:07:02 INFO main - recover next physics file,
> /alidata1/admin/rmq/rmq-m/commitlog/00000000023622320128
> 2022-09-09 22:48:00 INFO main - recover next physics file,
> /alidata1/admin/rmq/rmq-m/commitlog/00000000024696061952
> 2022-09-09 22:48:18 INFO FlushIndexFileThread - flush index file elapsed
> time(ms) 1431
> 2022-09-09 22:51:32 INFO main - recover next physics file,
> /alidata1/admin/rmq/rmq-m/commitlog/00000000025769803776
> 2022-09-09 22:54:58 INFO main - recover next physics file,
> /alidata1/admin/rmq/rmq-m/commitlog/00000000026843545600
> 2022-09-09 22:58:41 INFO main - recover next physics file,
> /alidata1/admin/rmq/rmq-m/commitlog/00000000027917287424
> 2022-09-09 23:04:27 INFO main - recover next physics file,
> /alidata1/admin/rmq/rmq-m/commitlog/00000000028991029248
> 2022-09-09 23:20:00 INFO main - recover next physics file,
> /alidata1/admin/rmq/rmq-m/commitlog/00000000030064771072
>
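> Actions taken:
> The topic mentioned in the logs is BenchmarkTest.
> To try to recover the cluster:
> 1. Stopped the stress test and disconnected all external connections
> 2. Deleted the topic:
> sh mqadmin deleteTopic -c brokerIp -n nameserverIp -t BenchmarkTest
> 3. Restarted the cluster several times
>
> The cluster is still unavailable, and storeerror.log keeps rolling.
> Under consumequeue, the queues in the BenchmarkTest directory are being created and deleted over and over.
>
> How can this be resolved? Any help is appreciated.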
>
>
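
As a sanity check on the storeerror.log warnings quoted above, here is a minimal Python sketch that parses one warning and confirms that the logged Diff equals expectLogicOffset minus currentLogicOffset. The function name `summarize` and the constant `CQ_UNIT_SIZE` are hypothetical names chosen for illustration. The value 20 is an assumption about the size of one consume queue entry in bytes, suggested by the fact that the logged offsets advance in steps of 20.

```python
import re

# Matches the fields of a "[BUG]logic queue order maybe wrong" warning.
PATTERN = re.compile(
    r"expectLogicOffset: (\d+) currentLogicOffset: (\d+) "
    r"Topic: (\S+) QID: (\d+) Diff: (\d+)"
)

# Assumption: one consume queue entry occupies 20 bytes
# (the logged offsets increase by 20 from line to line).
CQ_UNIT_SIZE = 20


def summarize(line: str) -> dict:
    """Parse one warning and recompute the gap between the two offsets."""
    # Collapse whitespace first so that mail-wrapped lines still match.
    match = PATTERN.search(" ".join(line.split()))
    if match is None:
        raise ValueError("not a logic-queue-order warning")
    expect, current, topic, qid, diff = match.groups()
    gap = int(expect) - int(current)
    return {
        "topic": topic,
        "qid": int(qid),
        "gap_bytes": gap,
        "matches_logged_diff": gap == int(diff),
        "gap_entries": gap // CQ_UNIT_SIZE,
    }


sample = (
    "2022-09-09 23:28:15 WARN main - [BUG]logic queue order maybe wrong, "
    "expectLogicOffset: 1911520 currentLogicOffset: 1581160 "
    "Topic: BenchmarkTest QID: 720 Diff: 330360"
)
result = summarize(sample)
print(result)
```

Under that assumption, the constant Diff of 330360 for QID 720 would correspond to 16518 queue entries, meaning the queue on disk trails the expected position by a fixed amount rather than drifting.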
