Yu Kaiyuan created ROCKETMQ-272: ----------------------------------- Summary: The config `syncFlushTimeout` doesn't work for SYNC_MASTER Key: ROCKETMQ-272 URL: https://issues.apache.org/jira/browse/ROCKETMQ-272 Project: Apache RocketMQ Issue Type: Bug Components: rocketmq-broker Affects Versions: 4.1.0-incubating Reporter: Yu Kaiyuan Assignee: yukon
It's quite frequent to get result as `sendStatus=FLUSH_SLAVE_TIMEOUT` when sending big messages(>500k) in SYNC_MASTER/SLAVE scenario. The timeout value used by the sync process currently as I found, is the config `syncFlushTimeout`. And its default value is 5000 milliseconds. But it shows that producer get the result as `FLUSH_SLAVE_TIMEOUT` less than 1 second. So why does the config not work as expected? Relevant code: {code:java} // CommitLog.java public PutMessageResult putMessage(final MessageExtBrokerInner msg) { { // Synchronous write double if (BrokerRole.SYNC_MASTER == this.defaultMessageStore.getMessageStoreConfig().getBrokerRole()) { HAService service = this.defaultMessageStore.getHaService(); if (msg.isWaitStoreMsgOK()) { // Determine whether to wait if (service.isSlaveOK(result.getWroteOffset() + result.getWroteBytes())) { if (null == request) { request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes()); } service.putRequest(request); service.getWaitNotifyObject().wakeupAll(); boolean flushOK = // TODO request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout()); if (!flushOK) { log.error("do sync transfer other node, wait return, but failed, topic: " + msg.getTopic() + " tags: " + msg.getTags() + " client address: " + msg.getBornHostString()); putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_SLAVE_TIMEOUT); } } // Slave problem else { // Tell the producer, slave not available putMessageResult.setPutMessageStatus(PutMessageStatus.SLAVE_NOT_AVAILABLE); } } } return putMessageResult; } {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)