Hi,

Batch messages are designed for streaming scenarios, where batching can achieve high throughput.
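
For illustration only, here is a minimal sketch of what batching means on the producer side: grouping many small message bodies into a few larger units, with a cap on how many bytes go into each one. It uses plain byte arrays rather than RocketMQ client types. `BatchSplitter`, `split`, and the 16 KB limit in the usage note are hypothetical names and values chosen for the example, not RocketMQ APIs or recommended settings.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    // Groups message bodies into batches whose total body size stays within
    // maxBatchBytes. A single body larger than the limit gets a batch of its own.
    public static List<List<byte[]>> split(List<byte[]> bodies, int maxBatchBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentBytes = 0;
        for (byte[] body : bodies) {
            // Close the current batch if adding this body would exceed the limit.
            if (!current.isEmpty() && currentBytes + body.length > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(body);
            currentBytes += body.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With 70 bodies of 700 bytes each and a hypothetical 16 KB cap, `BatchSplitter.split(bodies, 16 * 1024)` yields four batches (23, 23, 23, and 1 bodies). Each batch would then be handed to the producer as one send.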

Best Regards,

> On June 15, 2018, at 09:39, Jason Joo <[email protected]> wrote:
>
> Hello,
>
> We have a cluster with a 2m-2s architecture. TPS is between 1000 and 6000,
> and we typically handle about 200 million messages in 24 hours.
> The commit log size is about 200GB and can be cached fully in memory.
>
> Our servers (one RocketMQ instance each, 4 servers in this cluster):
> 6 core Xeon-E5-2620 x 2 (24 threads total)
> 256G ddr3 1600MHz
> 240G Intel SSD 530 (SSDSC2BW240H6) x 6 (Raid 1)
>
> To "IMPROVE" producing performance (mainly to reduce network packets) we
> started to use the batch sending feature, and we began to receive a lot of
> "slow log" entries after that (batch size is 50~70, 100~1000B per message body).
>
> After some tuning (osPageCacheBusyTimeOutMills, waitTimeMillsInSendQueue to
> 500ms, etc.) we only increased the "time cost" printed in the slow log.
>
> The pagecache RT in the log before using batch sending:
>
> [PAGECACHERT] TotalPut 50650, PutMessageDistributeTime [<=0ms]:44250
> [0~10ms]:5865 [10~50ms]:514 [50~100ms]:10 [100~200ms]:11 [200~500ms]:0
> [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
>
> After:
>
> [PAGECACHERT] TotalPut 1970, PutMessageDistributeTime [<=0ms]:1428
> [0~10ms]:459 [10~50ms]:4 [50~100ms]:2 [100~200ms]:5 [200~500ms]:10
> [500ms~1s]:10 [1~2s]:25 [2~3s]:24 [3~4s]:3 [4~5s]:0 [5~10s]:0 [10s~]:0
>
> After reading CommitLog I found that the whole batch message is written
> under a single lock. So it makes sense that the average execution time of
> PUT would increase. But we also receive some "in lock slow log" entries,
> which trigger flow control for a while:
>
> 2018-06-14 21:02:42 WARN SendMessageThread_102 - [NOTIFYME]putMessages in
> lock cost time(ms)=1373, bodyLength=49380
> AppendMessageResult=AppendMessageResult{status=PUT_OK,
> wroteOffset=68238809866237, wroteBytes=54420,
> msgIdstoreTimestamp=1528981361218, logicsOffset=5784993480, pagecacheRT=1373,
> msgNum=70}
> 2018-06-14 21:02:42 WARN SendMessageThread_102 - not in lock eclipse time(ms)=1374, bodyLength=49380
> 2018-06-14 21:02:42 WARN SendMessageThread_79 - not in lock eclipse time(ms)=688, bodyLength=274
> 2018-06-14 21:02:42 WARN SendMessageThread_84 - not in lock eclipse time(ms)=562, bodyLength=48242
> 2018-06-14 21:02:42 WARN SendMessageThread_107 - not in lock eclipse time(ms)=839, bodyLength=274
> 2018-06-14 21:02:42 WARN SendMessageThread_97 - not in lock eclipse time(ms)=1199, bodyLength=49469
> 2018-06-14 21:02:42 WARN SendMessageThread_111 - not in lock eclipse time(ms)=1307, bodyLength=47329
> 2018-06-14 21:02:42 WARN SendMessageThread_91 - not in lock eclipse time(ms)=786, bodyLength=275
> 2018-06-14 21:02:42 WARN SendMessageThread_67 - not in lock eclipse time(ms)=807, bodyLength=575
> 2018-06-14 21:02:42 WARN SendMessageThread_65 - not in lock eclipse time(ms)=1326, bodyLength=47294
> 2018-06-14 21:02:43 INFO FlushRealTimeService - Flush data to disk costs 2271 ms
> 2018-06-14 21:02:43 WARN SendMessageThread_116 - not in lock eclipse time(ms)=1542, bodyLength=50039
> 2018-06-14 21:02:43 WARN SendMessageThread_119 - not in lock eclipse time(ms)=1202, bodyLength=50024
> 2018-06-14 21:02:43 WARN SendMessageThread_55 - not in lock eclipse time(ms)=1436, bodyLength=907
>
> So the questions are:
> Is batch sending recommended?
> It may be "slower" compared to simple sending; should I increase the
> in-lock-timeout?
> How can we solve the performance problem (many REJECTREQUEST will be
> generated in producers)? Add a new Master to the cluster?
>
>
> My question
>
> best regards,
>
> Jason
>
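
As a rough aid for comparing the two [PAGECACHERT] lines quoted above, the following sketch parses the bucket counts out of such a line and sums the puts that landed in the `500ms~1s` bucket or higher. The bucket label format is taken only from the quoted log lines; `PageCacheRt`, `parseBuckets`, `totalPuts`, and `slowPuts` are hypothetical helper names, not part of RocketMQ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageCacheRt {
    // Matches tokens such as "[0~10ms]:459"; "[PAGECACHERT]" has no ":<digits>" and is skipped.
    private static final Pattern BUCKET = Pattern.compile("\\[([^\\]]+)\\]:(\\d+)");

    // Bucket labels (as printed in the quoted log) treated as "slow" in this sketch.
    private static final List<String> SLOW_LABELS = Arrays.asList(
        "500ms~1s", "1~2s", "2~3s", "3~4s", "4~5s", "5~10s", "10s~");

    // Returns bucket label -> count, in the order the buckets appear in the line.
    public static Map<String, Long> parseBuckets(String line) {
        Map<String, Long> buckets = new LinkedHashMap<>();
        Matcher m = BUCKET.matcher(line);
        while (m.find()) {
            buckets.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return buckets;
    }

    // Sum of all bucket counts.
    public static long totalPuts(Map<String, Long> buckets) {
        long total = 0;
        for (long count : buckets.values()) {
            total += count;
        }
        return total;
    }

    // Sum of the counts in the buckets listed in SLOW_LABELS.
    public static long slowPuts(Map<String, Long> buckets) {
        long slow = 0;
        for (String label : SLOW_LABELS) {
            slow += buckets.getOrDefault(label, 0L);
        }
        return slow;
    }

    public static void main(String[] args) {
        String after = "[PAGECACHERT] TotalPut 1970, PutMessageDistributeTime [<=0ms]:1428 "
            + "[0~10ms]:459 [10~50ms]:4 [50~100ms]:2 [100~200ms]:5 [200~500ms]:10 "
            + "[500ms~1s]:10 [1~2s]:25 [2~3s]:24 [3~4s]:3 [4~5s]:0 [5~10s]:0 [10s~]:0";
        Map<String, Long> buckets = parseBuckets(after);
        System.out.println(slowPuts(buckets) + " of " + totalPuts(buckets)
            + " puts fell in the 500ms~1s bucket or above");
    }
}
```

For the "after" line this gives 62 of 1970 puts in those buckets, against 0 of 50650 for the "before" line. Since each batched put carries 50~70 messages according to the quoted message, the two TotalPut figures count different amounts of work and are not directly comparable.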
