RIP 19 Streaming Tiered Storage Optimize

I noticed that the community started designing the next-generation architecture a long time ago. After talking with several community members, I hope to evolve the design of this architecture and implement it. Thanks to vongosling for the discussion and guidance. Anyone interested is welcome to help break down the work and complete the architecture evolution for the cloud-native era.

Status
- Current State: Proposed
- Authors: [email protected]
- Shepherds: [email protected]
- Mailing List Discussion: [email protected]
- Pull Request: #PR_NUMBER
- Released: <released_version>
- Correlations: RIP-18 Metadata management architecture upgrade; Thought of The Evolution of The Next Decade Architecture for RocketMQ

Background & Motivation

User traffic can grow tenfold at peak times, such as a breaking news event or a shopping festival, and fall back to the regular level afterwards. This leads to several issues:

- We have to add brokers when storage capacity is the bottleneck.
- We have to add storage capacity when broker CPU is the bottleneck.
- After expanding brokers or storage, we cannot shrink these resources, which wastes 9 times the regular resources once the peak has passed.
- In a benchmark of a DLedger broker cluster, when the leader reaches its CPU bottleneck, the follower uses only 10% of its CPU.
- If one topic needs to keep messages for 1 year, all topics have to keep messages for 1 year.
- If a topic keeps messages for 1 year, local storage is expensive.
- If one queue is consumed 100 times, the leader node becomes the bottleneck while the other nodes sit idle.
- When new queues are created, they may be placed on the busiest broker.

Goals

What problem is this proposal designed to solve?

Implement the next-generation RocketMQ architecture: make the boundary between the compute layer and the storage layer explicit, and support streaming and tiered storage.

Non-Goals

What problem is this proposal NOT designed to solve?

We will not directly implement a multi-raft protocol on RocketMQ.

Are there any limits of this proposal?

Nothing specific.

Changes

Architecture

New architecture graph: [figure]

broker
- Make the interface between the compute and storage layers clear, and focus on compute logic such as transaction, delay, and filter
- Handle the produce/consume/admin (rocketmq_protocol) APIs
- Stateless
- Use the state-of-the-art Angelia RPC to call the storage read/write API
- Support dynamically adding and removing broker nodes
- Support customized read replicas to solve the hot-queue read imbalance problem

storage cluster
- Implement the storage API for compute nodes
- Multi-raft implementation
- Support HDFS/S3/GCP
- Support dynamically adding and removing storage nodes
- Support multi-tenant storage, such as 1,000,000 queues
- Support topic-level retention time

nameserver cluster
- Hash(queue) to an assigned raft group, from which the broker determines which queue-leader node to call
- Detect storage node failures and notify brokers to update the queueToStorageNode map
- Implement an intelligent module that uses machine learning to decide which node a new queue is assigned to and to detect unhealthy nodes

Angelia
- Support multiple communication protocols, such as TCP and HTTP/2
- Support multiple serialization protocols, such as JSON and Avro

compute and store decouple

compute layer

    SendMessageProcessor#asyncSendMessage
        this.brokerController.getMessageStore().asyncPutMessage(msgInner);

storage layer

    DefaultMessageStore#putMessage

1) commitLog storage

    CommitLog#asyncPutMessage
        mappedFile.appendMessage(msg, this.appendMessageCallback);

2) Dledger storage

    DLedgerCommitLog#asyncPutMessage
        # io.openmessaging.storage.dledger.DLedgerServer
        dLedgerServer.handleAppend(request);
        dLedgerStore.appendAsLeader(dLedgerEntry);
        return dLedgerEntryPusher.waitAck(resEntry, false);

3) 5.0 separate storage

    SeperateStorage#asyncPutMessage
        client:appendMessage()  # wait for the leader and followers to complete the write on half of the nodes

Message Store

    public interface MessageStore {
        // basic api
        putMessage(final MessageExtBrokerInner msg)
        flush()
        getMessage(final long offset, final int size)
        ...
        // startup and cache for storage node
        load()
        ...
        // failover
        recover(long maxPhyOffsetOfConsumeQueue)
        ...
        // manage api
        getMaxOffset()
        ...
    }
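
The nameserver responsibilities listed under Architecture, hashing a queue to a raft group and updating the queue-to-storage-node mapping after a failure, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class `QueueRouter`, its method names, the `topic#queueId` hash key, and the node addresses are hypothetical and are not part of RocketMQ or of this proposal.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from RocketMQ.
final class QueueRouter {
    private final int raftGroupCount;
    // raft group id -> address of the group's current leader storage node
    private final Map<Integer, String> groupLeaders = new ConcurrentHashMap<>();

    QueueRouter(int raftGroupCount) {
        if (raftGroupCount <= 0) {
            throw new IllegalArgumentException("raftGroupCount must be positive");
        }
        this.raftGroupCount = raftGroupCount;
    }

    // Hash(queue) -> raft group id. floorMod keeps the result in
    // [0, raftGroupCount) even when hashCode() is negative.
    int groupOf(String topic, int queueId) {
        return Math.floorMod((topic + "#" + queueId).hashCode(), raftGroupCount);
    }

    // Nameserver side: record the leader of a group, e.g. after a failover.
    void updateLeader(int groupId, String nodeAddress) {
        groupLeaders.put(groupId, nodeAddress);
    }

    // Broker side: find the storage node to call for a queue, if one is known.
    Optional<String> leaderFor(String topic, int queueId) {
        return Optional.ofNullable(groupLeaders.get(groupOf(topic, queueId)));
    }

    public static void main(String[] args) {
        QueueRouter router = new QueueRouter(4);
        int group = router.groupOf("TopicA", 0);
        router.updateLeader(group, "storage-1:9000");
        System.out.println(router.leaderFor("TopicA", 0));
        // Simulated failover: the same queue now resolves to the new leader.
        router.updateLeader(group, "storage-2:9000");
        System.out.println(router.leaderFor("TopicA", 0));
    }
}
```

A fixed modulo like this remaps most queues whenever the number of raft groups changes, so a design that supports dynamically adding and removing storage nodes would likely need consistent hashing or an explicit assignment table instead.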

Compatibility, Deprecation, and Migration Plan

Are backward and forward compatibility taken into consideration?

The existing producer/consumer/admin protocol does not change and remains backward compatible.

Are there deprecated APIs?

Remove the deprecated pull consumer APIs.

Rejected Alternatives

How do alternatives solve the issue you proposed?

Use BookKeeper as the stream storage engine.

Pros and Cons of alternatives

The advantages and disadvantages of using BookKeeper are as follows:

- Advantage: simple implementation and a short development cycle.
- Disadvantage: BookKeeper uses ZooKeeper to store ledger metadata. Introducing a third-party component means maintaining two systems, which adds maintenance complexity.

Why should we reject above alternatives?

RocketMQ always avoids introducing third-party components, so the proposal to use BookKeeper (which depends on ZooKeeper for metadata storage) is not adopted.
