Re: RocketMQ 3.5.8 flow control problem

zacXu Fri, 09 Feb 2018 21:10:11 -0800

English translation:

Hi,


I have run into a flow control problem. Here is the configuration:
Version: RocketMQ 3.5.8
Broker: 3m-3s-async
flushDiskType: ASYNC_FLUSH
JVM: -Xms4g -Xmx4g -Xmn2g -XX:PermSize=128m -XX:MaxPermSize=320m
-XX:ParallelGCThreads=2 -XX:ConcGCThreads=1
Env: Docker, no CPU limit, 16G memory (the host has 40 CPUs and runs about
20-30 Docker instances (not MQ); none of the instances has a CPU limit)
sendMessageThreadPoolNums=28
pullMessageThreadPoolNums=22

When the TPS is only 100-200, some masters start flow control. Most of the
logs on the producer look like this:
[TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while, period in queue: 201ms, size of queue: 35
Some logs are:
[REJECTREQUEST]system busy, start flow control for a while
[PCBUSY_CLEAN_QUEUE]broker busy, start flow control for a while, period in queue: 3ms, size of queue: 66
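
As a rough aid for anyone triaging logs like these, here is a minimal Python sketch that tallies the producer-side messages by their bracketed tag and collects the reported queue wait times. The function name `tally_flow_control` and the regular expression are illustrative assumptions based only on the three sample lines above; they are not part of RocketMQ, and other log variants may not match.

```python
import re
from collections import defaultdict

# Assumed pattern, derived only from the three sample lines quoted above.
# Captures the bracketed tag, plus queue wait and size when they are present.
FLOW_CONTROL = re.compile(
    r"\[(?P<tag>[A-Z_]+)\].*?start flow control for a while"
    r"(?:, period in queue: (?P<wait_ms>\d+)ms, size of queue: (?P<size>\d+))?"
)


def tally_flow_control(lines):
    """Group flow-control messages by tag and collect queue wait times (ms)."""
    stats = defaultdict(lambda: {"count": 0, "wait_ms": []})
    for line in lines:
        match = FLOW_CONTROL.search(line)
        if match is None:
            continue  # not a flow-control line
        entry = stats[match.group("tag")]
        entry["count"] += 1
        if match.group("wait_ms") is not None:
            entry["wait_ms"].append(int(match.group("wait_ms")))
    return dict(stats)


sample = [
    "[TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while, "
    "period in queue: 201ms, size of queue: 35",
    "[REJECTREQUEST]system busy, start flow control for a while",
    "[PCBUSY_CLEAN_QUEUE]broker busy, start flow control for a while, "
    "period in queue: 3ms, size of queue: 66",
]

for tag, entry in tally_flow_control(sample).items():
    print(tag, entry["count"], entry["wait_ms"])
```

Run per broker, a tally like this would show whether one master is dominated by a single tag, which might narrow down where to look first.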

The master sometimes logs warnings like:
[NOTIFYME]putMessage in lock cost time(ms)=1557, bodyLength=108 AppendMessageResult=AppendMessageResult{status=PUT_OK,...pagecacheRT=1557}
putMessage not in lock eclipse time(ms)=1557, bodyLength=110
Sometimes pagecacheRT=0.

Sometimes only the "not in lock" line is logged:
putMessage not in lock eclipse time(ms)=1440, bodyLength=108
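
To see how often each kind of warning appears on a given master, the two messages can be separated with a small script. This is only a sketch: the patterns are copied from the lines quoted above (including the log's own spelling "eclipse"), the helper name `split_slow_puts` is hypothetical, and what the split means for the root cause is left open.

```python
import re

# Patterns copied from the broker warning lines quoted above.
IN_LOCK = re.compile(r"putMessage in lock cost time\(ms\)=(\d+)")
NOT_IN_LOCK = re.compile(r"putMessage not in lock eclipse time\(ms\)=(\d+)")


def split_slow_puts(lines):
    """Return (in_lock_ms, not_in_lock_ms) lists extracted from broker log lines."""
    in_lock, not_in_lock = [], []
    for line in lines:
        match = IN_LOCK.search(line)
        if match:
            in_lock.append(int(match.group(1)))
            continue
        match = NOT_IN_LOCK.search(line)
        if match:
            not_in_lock.append(int(match.group(1)))
    return in_lock, not_in_lock


sample = [
    "[NOTIFYME]putMessage in lock cost time(ms)=1557, bodyLength=108",
    "putMessage not in lock eclipse time(ms)=1557, bodyLength=110",
    "putMessage not in lock eclipse time(ms)=1440, bodyLength=108",
]

in_lock, not_in_lock = split_slow_puts(sample)
print(in_lock, not_in_lock)
```

Comparing the two lists over the same time window shows how many slow puts were reported without a matching "in lock" warning.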

Flow control happens mostly on one broker; the others are fine.
GC time is about 100ms, and flow control mostly does not happen during GC.

In another cluster, 1m-1s-async with this MQ config, TPS can reach 10000
before flow control happens.

Does anyone have an idea about this problem? Where is the bottleneck likely
to be: memory, disk, or CPU?

On 2018/02/09 05:56:29, "752832634" <[email protected]> wrote: 
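> Hi,
>
> We deployed RocketMQ 3.5.8 on Docker in production, using JDK 1.7: three
> broker groups (1m1s each), async flush, async-replication masters, plus two
> namesrv instances. The JVM is configured as recommended, with a 4g heap.
> We have now hit a problem: flow control is triggered when a single broker's
> TPS is only a little over 100. A small share of the cases are put message
> taking more than 1000ms; most are requests dropped after waiting in the
> queue for more than 200ms. The problem occurs mainly on one master, with a
> few cases on the others. In the test environment we set up one broker
> group, and the queue-timeout flow control only appeared once the load test
> reached 10000 TPS. What might be wrong? In which direction should we
> investigate?
>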
