[jira] [Created] (ROCKETMQ-288) mq broker process mutex refactor
yubaofu created ROCKETMQ-288:
--------------------------------

             Summary: mq broker process mutex refactor
                 Key: ROCKETMQ-288
                 URL: https://issues.apache.org/jira/browse/ROCKETMQ-288
             Project: Apache RocketMQ
          Issue Type: Improvement
          Components: rocketmq-broker, rocketmq-store
    Affects Versions: 4.1.0-incubating
            Reporter: yubaofu
            Assignee: yukon
             Fix For: 4.2.0-incubating

The broker's process mutex logic could be abstracted into an independent service.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ROCKETMQ-193) Develop rocketmq-redis-replicator component
[ https://issues.apache.org/jira/browse/ROCKETMQ-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160316#comment-16160316 ]

ASF GitHub Bot commented on ROCKETMQ-193:
-----------------------------------------

Github user leonchen83 commented on the issue:

    https://github.com/apache/incubator-rocketmq-externals/pull/29

    @vongosling you can see the integration-test results at [incubator-rocketmq-externals](https://travis-ci.org/leonchen83/incubator-rocketmq-externals/branches)

> Develop rocketmq-redis-replicator component
> -------------------------------------------
>
>                 Key: ROCKETMQ-193
>                 URL: https://issues.apache.org/jira/browse/ROCKETMQ-193
>             Project: Apache RocketMQ
>          Issue Type: Task
>            Reporter: Rich Zhang
>            Assignee: Rich Zhang
>            Priority: Minor
>             Fix For: 4.2.0-incubating
>
>
> Design:
> Redis provides an official replication mechanism in which a slave
> communicates with its master over the RESP protocol. A natural design for
> the rocketmq-redis-replicator component is therefore to have it act as a
> slave: it sends commands to the master, receives data from the master in a
> timely manner, and forwards that data to the RocketMQ broker.
> If you are not familiar with the Redis replication mechanism, please read
> this section first [1]. Some key points follow.
> 1. So that a slave can resume from where it left off when it reconnects,
> slave and master must agree on a master runId and a replication offset.
> The slave acknowledges this offset to the master periodically. As a result,
> the slave may receive duplicate commands, and the rocketmq-redis-replicator
> component may in turn send duplicate messages. A good way to narrow the
> duplicate window is to reduce the "ack period", for example to 100ms.
> 2. If the slave stays offline for some time, it is easy to exhaust the
> backlog, whose default size is just 1M, especially on a high-traffic Redis
> instance. Unfortunately, if the slave's replication offset has already been
> overwritten in the master's backlog, a full synchronization has to run,
> which is unacceptable for the rocketmq-redis-replicator component because a
> large number of messages would be sent out in a burst.
> 3. During a full synchronization, the master generates a new rdb file (the
> rdb file format is described in [2]); the slave receives this file, stores
> it on disk, and finally loads it into memory. This strategy brings the slave
> to a state consistent with the master as quickly as possible and rarely
> fails. For the rocketmq-redis-replicator component it is also a good way to
> keep the initial rdb synchronization from failing halfway.
> There is already an open source project [3] that focuses on replicating
> Redis data and provides an API for handling the received data [4]. Its main
> ideas are to act as a slave, follow the official replication procedure,
> communicate with the master over RESP, and acknowledge the replication
> offset to the master. Building on this project is a good idea, although
> some aspects should be enhanced and made more robust. Here are some points:
> [High Availability]
>    Keeping the replication component highly available is not difficult, but
> it is important, and not only for providing an uninterrupted service: if the
> component is offline for some time, an unacceptable full synchronization may
> be triggered.
>    High availability is also easy to achieve: adopt a master/slave mode, use
> ZooKeeper to coordinate and switch between master and slave, and store state
> in ZooKeeper to keep the component stateless.
> [Data Loss]
>    Data loss should be avoided as far as possible. The key point is that the
> slave acknowledges a replication offset to the master only after the
> corresponding command has been sent to the RocketMQ broker successfully.
> [Data Stale]
>    Stale data can also appear when the slave reconnects. Consider the case
> below:
>    `time1`      `time2`      `time3`
>    set k=a      set k=b      set k=c
>    If the slave went offline at time3 but the latest replication offset
> reported to the master is only at time1, then when the slave reconnects it
> re-applies the commands “set k=b… set k=c”. For a short window, “k” equals
> the stale value “b” until the “set k=c” command is applied. So the shorter
> the slave's offline time, the better.
> [Message Order]
>    Redis uses a single-threaded model to keep commands executing in order,
> which is feasible because of its high performance. Replicating data with a
> single thread in the slave is also fine, since that too is purely an
> in-memory operation. But is sending all data to RocketMQ in a global order a
> good choice? The producer should have no performance issue, but the consumer
> may not be able to keep up with the messages, especially when Redis is under
> high load.
>    Hashing “KEY” to different RocketMQ queues is a good strategy. It
> guarantees that operations on the same key are routed to a single queue,
> which keeps partial ordering, and the downstream consumer could
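
The key-hashing idea under [Message Order] in the quoted design can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the pull request: `KeyQueueRouter` and `queueFor` are hypothetical names, and hashing the key's UTF-8 bytes is an arbitrary choice. The only property that matters is that the same Redis key always maps to the same queue index.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: maps a Redis key to a queue index so that all
// commands touching the same key land in the same queue (partial order).
class KeyQueueRouter {

    // Returns an index in [0, queueCount). The hash function is an assumption
    // made for this sketch; any stable hash of the key would do.
    static int queueFor(String redisKey, int queueCount) {
        if (queueCount <= 0) {
            throw new IllegalArgumentException("queueCount must be positive");
        }
        int hash = Arrays.hashCode(redisKey.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(hash, queueCount);
    }

    public static void main(String[] args) {
        // Commands on the same key "k" are routed to the same queue index.
        for (String key : new String[] {"k", "k", "user:42"}) {
            System.out.println(key + " -> queue " + queueFor(key, 8));
        }
    }
}
```

In a RocketMQ producer, an index like this would typically be applied inside a `MessageQueueSelector`; that wiring is left out of the sketch.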
[jira] [Commented] (ROCKETMQ-193) Develop rocketmq-redis-replicator component
[ https://issues.apache.org/jira/browse/ROCKETMQ-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160315#comment-16160315 ]

ASF GitHub Bot commented on ROCKETMQ-193:
-----------------------------------------

Github user leonchen83 commented on the issue:

    https://github.com/apache/incubator-rocketmq-externals/pull/29

    @vongosling I have added an integration test for rocketmq-redis-replicator, based on travis-ci. Please help review this commit.
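
The [Data Loss] rule in the ROCKETMQ-193 design (acknowledge a replication offset to the master only after the command has reached the RocketMQ broker) can be sketched as below. All names here are hypothetical: `BrokerSender` stands in for a real producer, and `AckAfterSend` is not a class from the pull request. The sketch only shows the ordering of "send" and "advance the ack offset".

```java
// Hypothetical sketch of "ack only after a successful send".
class AckAfterSend {

    // Stand-in for a RocketMQ producer; returns true if the broker accepted
    // the command. This interface is an assumption made for the sketch.
    interface BrokerSender {
        boolean send(String command);
    }

    private final BrokerSender sender;
    private long ackOffset; // highest offset that is safe to report to the master

    AckAfterSend(BrokerSender sender, long startOffset) {
        this.sender = sender;
        this.ackOffset = startOffset;
    }

    // Forwards one replicated command. The ack offset moves forward only when
    // the send succeeded, so a failed send is replayed after a reconnect
    // instead of being lost.
    boolean onCommand(String command, long offsetAfterCommand) {
        if (!sender.send(command)) {
            return false; // keep the old offset; the master still holds this command
        }
        ackOffset = offsetAfterCommand;
        return true;
    }

    // Offset to report in the periodic ack to the master.
    long ackOffset() {
        return ackOffset;
    }
}
```

The cost of this ordering is the duplicate window described in point 1 of the design: a command that was sent but not yet acknowledged may be delivered again after a reconnect.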
[jira] [Updated] (ROCKETMQ-287) RouteInfoManager#createAndUpdateQueueData not use break;
[ https://issues.apache.org/jira/browse/ROCKETMQ-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingwei li updated ROCKETMQ-287:
--------------------------------
    Affects Version/s:     (was: 4.1.0-incubating)

> RouteInfoManager#createAndUpdateQueueData not use break;
> --------------------------------------------------------
>
>                 Key: ROCKETMQ-287
>                 URL: https://issues.apache.org/jira/browse/ROCKETMQ-287
>             Project: Apache RocketMQ
>          Issue Type: Improvement
>          Components: rocketmq-namesrv
>    Affects Versions: 4.2.0-incubating
>            Reporter: Jingwei li
>            Assignee: vongosling
>              Labels: easyfix
>         Attachments: screenshot-1.png
>
>
> I think RouteInfoManager#createAndUpdateQueueData could break out of the
> loop once the brokerName matches, but it does not.
> Or may the List contain several items with the same brokerName?
> Details can be seen in the attached png.
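
The change proposed in ROCKETMQ-287 can be illustrated with a simplified loop. This is a sketch, not the RocketMQ source: `Entry` and `createOrUpdate` are hypothetical stand-ins, and the early `break` is only correct under the assumption that the list holds at most one entry per brokerName, which is exactly the question the reporter raises.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of a per-topic queue-data list.
class QueueDataUpdateSketch {

    // Simplified entry: a broker name plus one setting to compare.
    static final class Entry {
        final String brokerName;
        final int readQueueNums;

        Entry(String brokerName, int readQueueNums) {
            this.brokerName = brokerName;
            this.readQueueNums = readQueueNums;
        }
    }

    // Replaces the entry for incoming.brokerName if its settings changed,
    // leaves the list untouched if they are equal, and appends otherwise.
    static void createOrUpdate(List<Entry> entries, Entry incoming) {
        boolean addNewOne = true;
        Iterator<Entry> it = entries.iterator();
        while (it.hasNext()) {
            Entry existing = it.next();
            if (existing.brokerName.equals(incoming.brokerName)) {
                if (existing.readQueueNums == incoming.readQueueNums) {
                    addNewOne = false; // identical entry already present
                } else {
                    it.remove();       // outdated entry, replaced below
                }
                // ASSUMPTION: broker names are unique in the list, so no
                // later element can match and scanning can stop here.
                break;
            }
        }
        if (addNewOne) {
            entries.add(incoming);
        }
    }
}
```

If the list could contain several entries with the same brokerName, the `break` would leave stale duplicates behind, so the uniqueness assumption would need to be confirmed before applying the change.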
[jira] [Updated] (ROCKETMQ-287) RouteInfoManager#createAndUpdateQueueData not use break;
[ https://issues.apache.org/jira/browse/ROCKETMQ-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingwei li updated ROCKETMQ-287:
--------------------------------
    Affects Version/s: 4.2.0-incubating

> RouteInfoManager#createAndUpdateQueueData not use break;
> --------------------------------------------------------
>
>                 Key: ROCKETMQ-287
>                 URL: https://issues.apache.org/jira/browse/ROCKETMQ-287
>             Project: Apache RocketMQ
>          Issue Type: Improvement
>          Components: rocketmq-namesrv
>    Affects Versions: 4.1.0-incubating, 4.2.0-incubating
>            Reporter: Jingwei li
>            Assignee: vongosling
>              Labels: easyfix
>         Attachments: screenshot-1.png
>
>
> I think RouteInfoManager#createAndUpdateQueueData could break out of the
> loop once the brokerName matches, but it does not.
> Or may the List contain several items with the same brokerName?
> Details can be seen in the attached png.