Translation in English

Hi,

I ran into a flow control problem. Here is the config:
Version: rocketmq 3.5.8
Broker: 3m-3s-async
flushDiskType:ASYNC_FLUSH
JVM: -Xms4g -Xmx4g -Xmn2g -XX:PermSize=128m -XX:MaxPermSize=320m 
-XX:ParallelGCThreads=2 -XX:ConcGCThreads=1
Env: Docker, no CPU limit, 16 GB memory (the host has 40 CPUs and runs about 
20-30 Docker instances, none of them mq, and none of them has a CPU limit)
sendMessageThreadPoolNums=28
pullMessageThreadPoolNums=22

When the TPS is only 100-200, some masters start flow control. Most of the 
logs on the producer look like this:
[TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while, period in 
queue: 201ms, size of queue: 35
Some logs are:
[REJECTREQUEST]system busy, start flow control for a while
[PCBUSY_CLEAN_QUEUE]broker busy, start flow control for a while, period in 
queue: 3ms, size of queue: 66
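
To see which kind of flow control dominates, the producer log lines above could be bucketed by their bracketed tag. The following is a minimal sketch, not part of RocketMQ: the class name `FlowControlLine` and its fields are hypothetical, and the patterns assume the messages have exactly the wording quoted above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of RocketMQ): classifies one producer log line.
final class FlowControlLine {
    // The three tags that appear in the producer logs quoted above.
    private static final Pattern TAG =
        Pattern.compile("\\[(TIMEOUT_CLEAN_QUEUE|REJECTREQUEST|PCBUSY_CLEAN_QUEUE)\\]");
    // e.g. "period in queue: 201ms, size of queue: 35"
    private static final Pattern QUEUE =
        Pattern.compile("period in\\s+queue:\\s*(\\d+)ms,\\s*size of queue:\\s*(\\d+)");

    final String tag;
    final long periodInQueueMs; // -1 when the line carries no queue details
    final int queueSize;        // -1 when the line carries no queue details

    private FlowControlLine(String tag, long periodInQueueMs, int queueSize) {
        this.tag = tag;
        this.periodInQueueMs = periodInQueueMs;
        this.queueSize = queueSize;
    }

    // Returns empty when the line is not one of the flow-control messages.
    static Optional<FlowControlLine> parse(String line) {
        Matcher tagMatcher = TAG.matcher(line);
        if (!tagMatcher.find()) {
            return Optional.empty();
        }
        Matcher queueMatcher = QUEUE.matcher(line);
        if (queueMatcher.find()) {
            return Optional.of(new FlowControlLine(
                tagMatcher.group(1),
                Long.parseLong(queueMatcher.group(1)),
                Integer.parseInt(queueMatcher.group(2))));
        }
        return Optional.of(new FlowControlLine(tagMatcher.group(1), -1L, -1));
    }

    public static void main(String[] args) {
        String sample = "[TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while, "
            + "period in queue: 201ms, size of queue: 35";
        FlowControlLine parsed = parse(sample).get();
        System.out.println(parsed.tag + " " + parsed.periodInQueueMs + "ms " + parsed.queueSize);
    }
}
```

Counting the parsed lines per tag and per broker should show whether the queue-timeout case really accounts for most rejections and whether they cluster on one master.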

The master sometimes logs warnings like:
[NOTIFYME]putMessage in lock cost time(ms)=1557, bodyLength=108 
AppendMessageResult=AppendMessageResult{status=PUT_OK,...pagecacheRT=1557}
putMessage not in lock eclipse time(ms)=1557, bodyLength=110
Sometimes pagecacheRT=0.

Sometimes there is only the "not in lock" log:
putMessage not in lock eclipse time(ms)=1440, bodyLength=108
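
Comparing the two timings in the broker warnings may help narrow down where the time goes. The sketch below is a hypothetical helper, not a RocketMQ API: the class name `PutMessageCost` is made up, and the patterns assume the log wording quoted above (including the "eclipse" spelling). A slow "not in lock" entry with no matching slow "in lock" entry would suggest, under that assumption, that the delay occurred outside the locked append.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of RocketMQ): extracts timings from broker warnings.
final class PutMessageCost {
    // Wording copied from the broker warnings quoted above.
    private static final Pattern IN_LOCK =
        Pattern.compile("putMessage in lock cost time\\(ms\\)=(\\d+)");
    private static final Pattern NOT_IN_LOCK =
        Pattern.compile("putMessage not in lock eclipse time\\(ms\\)=(\\d+)");

    // Milliseconds reported by an "in lock cost time" warning, if the line is one.
    static OptionalLong inLockMs(String line) {
        return firstGroup(IN_LOCK, line);
    }

    // Milliseconds reported by a "not in lock eclipse time" warning, if the line is one.
    static OptionalLong notInLockMs(String line) {
        return firstGroup(NOT_IN_LOCK, line);
    }

    private static OptionalLong firstGroup(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find()
            ? OptionalLong.of(Long.parseLong(m.group(1)))
            : OptionalLong.empty();
    }

    public static void main(String[] args) {
        String line = "putMessage not in lock eclipse time(ms)=1440, bodyLength=108";
        System.out.println("in lock: " + inLockMs(line) + ", not in lock: " + notInLockMs(line));
    }
}
```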

The flow control happens mostly on one broker, and the others are fine.
GC time is about 100ms, and the flow control mostly does not happen during a GC period.

In another cluster, 1m-1s-async with the mq config, the TPS can reach 10000 
before flow control happens.

Does anyone have an idea about this problem? Where is the bottleneck most 
likely to be: memory, disk, or CPU?

On 2018/02/09 05:56:29, "752832634" <x...@foxmail.com> wrote: 
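> Hi,
> 
> We set up mq 3.5.8 on Docker in our production environment, using JDK 1.7: 
> 3 broker groups in total, each 1m1s, with async flush and async replication 
> on the master, plus two namesrv. The JVM is configured as recommended, with 
> a 4g heap. We have now run into a problem: flow control is triggered when a 
> single broker's TPS is only a little over 100. A small part of it is put 
> message taking more than 1000ms; most of it is requests being discarded 
> after waiting in the queue for more than 200ms. The problem occurs mainly on 
> one particular master, with a few occurrences on the others. We set up one 
> broker group in our test environment, and under load testing the 
> queue-timeout flow control only appears at 10000 TPS. What could be wrong 
> here, and in which direction should we investigate?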
>  
