[ https://issues.apache.org/jira/browse/ROCKETMQ-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394750#comment-16394750 ]

Shilin Lu commented on ROCKETMQ-384:
------------------------------------

I don't think the NTP offset is the root cause. An offset of about 2 s can exist
between different machines, but in this case everything happens on the same
machine, so I think it can be ignored. In this case, I think GC may be the root
cause.
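The isOSPageCacheBusy() check quoted in the issue below only measures how long the current writer has been inside the put-message lock, so it cannot tell a genuine stall (such as a GC pause while the lock is held) from a forward step of the wall clock. The following is a minimal, hypothetical Java model of that check, not RocketMQ source: the class WallClockBusyCheck, its enterLock/exitLock methods and the injectable LongSupplier clock are illustrative assumptions, and the 1000 ms timeout merely stands in for osPageCacheBusyTimeOutMills.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical model of the quoted check (not RocketMQ source). A writer
 * records beginTimeInLock when it takes the put-message lock and resets it
 * to 0 on release; isOSPageCacheBusy() compares "now" against it.
 */
public class WallClockBusyCheck {
    private final LongSupplier millisClock;    // stands in for SystemClock.now(), i.e. System.currentTimeMillis()
    private final long busyTimeoutMillis;      // stands in for osPageCacheBusyTimeOutMills
    private volatile long beginTimeInLock = 0; // 0 means no writer holds the lock

    public WallClockBusyCheck(LongSupplier millisClock, long busyTimeoutMillis) {
        this.millisClock = millisClock;
        this.busyTimeoutMillis = busyTimeoutMillis;
    }

    public void enterLock() {
        beginTimeInLock = millisClock.getAsLong();
    }

    public void exitLock() {
        beginTimeInLock = 0;
    }

    public boolean isOSPageCacheBusy() {
        long diff = millisClock.getAsLong() - beginTimeInLock;
        // Same condition as the quoted code; the upper bound rules out the
        // idle case, where begin == 0 makes diff roughly "ms since 1970".
        return diff < 10_000_000 && diff > busyTimeoutMillis;
    }

    public static void main(String[] args) {
        long[] now = {1_520_000_000_000L}; // fake wall clock, epoch milliseconds
        WallClockBusyCheck check = new WallClockBusyCheck(() -> now[0], 1000);

        // Case 1: the writer really stalls 1.2 s inside the lock (e.g. a GC pause).
        check.enterLock();
        now[0] += 1_200;
        System.out.println(check.isOSPageCacheBusy()); // true
        check.exitLock();

        // Case 2: no real stall, but the wall clock is stepped forward by 2 s.
        check.enterLock();
        now[0] += 2_000;
        System.out.println(check.isOSPageCacheBusy()); // true
        check.exitLock();
        System.out.println(check.isOSPageCacheBusy()); // false
    }
}
```

Both cases report busy, so the rejection alone does not show whether a GC pause or a clock step triggered it.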

> broker flow control is abnormal when the machine's physical time drift
> ----------------------------------------------------------------------
>
>                 Key: ROCKETMQ-384
>                 URL: https://issues.apache.org/jira/browse/ROCKETMQ-384
>             Project: Apache RocketMQ
>          Issue Type: Improvement
>          Components: rocketmq-broker
>    Affects Versions: 4.2.0
>         Environment: operating system: CentOS6.0
> hardware: 8C8G VM
> version: rocketmq-4.2.0
> broker config: default 2m-2s-async config
> producer qps: 4000
> message size: 10 bytes
>            Reporter: liyuzhou
>            Assignee: yukon
>            Priority: Major
>
>     When I ran a performance test against the broker, I saw a large number of
> exceptions like this:
>  send error com.alibaba.rocketmq.client.exception.MQBrokerException: CODE: 2 
> DESC: [REJECTREQUEST]system busy, start flow control for a while
>     After reading the related source code, I think this exception occurs
> when the broker considers its OS page cache busy, in which case it rejects
> further requests. But the server's monitoring shows that JVM GC is normal
> (usually about 10 ms, at most 50 ms, roughly once every 10 s), and CPU and
> disk I/O are also healthy. However, the server's NTP offset occasionally
> drifts by more than 2 s, so I think the physical clock drift caused the flow
> control.
>  related code:
> {code:java}
> // CommitLog.java
> public PutMessageResult putMessage(final MessageExtBrokerInner msg) {
>     // ...
>     // now() is System.currentTimeMillis()
>     long beginLockTimestamp = this.defaultMessageStore.getSystemClock().now();
>     this.beginTimeInLock = beginLockTimestamp;
>     // ...
> }
>
> // DefaultMessageStore.java
> public boolean isOSPageCacheBusy() {
>     long begin = this.getCommitLog().getBeginTimeInLock();
>     long diff = this.systemClock.now() - begin;
>     // the upper bound excludes the idle case, where begin is 0
>     return diff < 10000000
>         && diff > this.messageStoreConfig.getOsPageCacheBusyTimeOutMills();
> }
> {code}
>     Suppose the first request enters CommitLog.putMessage when the physical
> clock reads 0 s, and the server's clock then drifts forward to 2 s. When a
> second request calls isOSPageCacheBusy to check whether the system is busy,
> diff is about 2000 ms; if that exceeds osPageCacheBusyTimeOutMills, the broker
> rejects the second request purely because of the clock drift.
>     So should we replace System.currentTimeMillis() with System.nanoTime() to
> reduce these spurious rejections?
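A minimal sketch of what the proposed change could look like, assuming the lock-hold time is measured with System.nanoTime(), which the JDK documents as usable only for elapsed-time measurement and unrelated to wall-clock time, so NTP steps do not move it. The class MonotonicBusyCheck and its methods are hypothetical illustrations, not RocketMQ code or a proposed patch.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not RocketMQ code): times how long the current writer
 * has held the put-message lock using a monotonic nanosecond clock.
 */
public class MonotonicBusyCheck {
    private final LongSupplier nanoClock; // System::nanoTime in production
    private final long busyTimeoutNanos;
    // null = no writer in the lock. nanoTime() has an arbitrary origin and may
    // be zero or negative, so 0 cannot serve as the "idle" sentinel here.
    private volatile Long beginNanosInLock = null;

    public MonotonicBusyCheck(LongSupplier nanoClock, long busyTimeoutMillis) {
        this.nanoClock = nanoClock;
        this.busyTimeoutNanos = TimeUnit.MILLISECONDS.toNanos(busyTimeoutMillis);
    }

    public MonotonicBusyCheck(long busyTimeoutMillis) {
        this(System::nanoTime, busyTimeoutMillis);
    }

    public void enterLock() {
        beginNanosInLock = nanoClock.getAsLong();
    }

    public void exitLock() {
        beginNanosInLock = null;
    }

    public boolean isBusy() {
        Long begin = beginNanosInLock; // read the volatile field once
        if (begin == null) {
            return false;
        }
        // Compare differences, never raw values: t1 - t0 stays correct even
        // if nanoTime() overflows.
        long elapsedNanos = nanoClock.getAsLong() - begin;
        return elapsedNanos > busyTimeoutNanos;
    }

    public static void main(String[] args) {
        long[] nanos = {42L}; // fake monotonic clock
        MonotonicBusyCheck check = new MonotonicBusyCheck(() -> nanos[0], 1000);
        check.enterLock();
        // 5 ms of real time pass; an NTP step of the wall clock would not affect this clock
        nanos[0] += TimeUnit.MILLISECONDS.toNanos(5);
        System.out.println(check.isBusy()); // false
        // a genuine 2 s stall inside the lock
        nanos[0] += TimeUnit.SECONDS.toNanos(2);
        System.out.println(check.isBusy()); // true
        check.exitLock();
    }
}
```

This only removes sensitivity to wall-clock steps; a genuine stall inside the lock, such as a long GC pause, would still trip the check.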



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
