+1. Thanks, veritas, for the follow-up. The community and I have talked about Streaming Tiered Storage Optimize, which is one of the subparts of RocketMQ 5. Undoubtedly, achieving this will be a very challenging task, and I hope the whole R&D process will be more open. If you're interested, please don't hesitate to support it and join in. I believe veritas has a clear implementation plan and is waiting for more folks to join this improvement program.
veritas <[email protected]> 于2021年4月23日周五 下午12:31写道: > RIP 20 Streaming Tiered Storage Optimize > I noticed that the community was designing the next-generation > architecutre a log time ago.After chatting with some guys in the community, > I hope to evolve the design of this architecture and implement it. Thanks > for the communication and guidance given by vongosling. Interested guys are > welcome to bread down and complete architecture evolution in the cloud > native era. For detail, > https://github.com/apache/rocketmq/wiki/RIP-20-Streaming-Tiered-Storage-Optimize > Status > > Current State: Proposed > > Authors: [email protected] > > Shepherds: [email protected] > > Mailing List Discussion: [email protected] > > Pull Request: #PR_NUMBER > > Released: <released_version> > > Correlations: RIP-18 Metadata management architecture upgrade,Thought of > The Evolution of The Next Decade Architecture for RocketMQ > > Background & Motivation > > As user business increase ten times at peek times, such as breaking event > happens/shopping festival, and decrease after the peek time to regular, > which may lead to some issues > > We have to expand brokers resource in case of storage capacity bottleneck. > We have to expand storage capacity in case of broker cpu bottleneck. > After expand brokers/storage, we cant shrink these resources, which lead 9 > times resource waste. > Benchmark test dledger broker-cluster, when leader reach cpu bottleneck, > follower only utilize 10% of cpu resource. > If store one topic 1 year messages, all topics store 1 year > If store one topic 1 year, local storage is expensive > If 1 queue consume 100 times, lead to the leader node bottleneck, while > other node idle > When create new queues, it may choose the busiest broker > Goals > What problem is this proposal designed to solve? > > Implement the next generation rocketmq architechture, make the boundaries > of compute and storage layer specific, support stream and tiered storage. > > Non-Goals > What problem is this proposal NOT designed to solve? > > We will not directly implement multi-raft protocol on RocketMQ. > > Are there any limits of this proposal? > > Nothing specific. > > Changes > Architecture > > New architecture graph: > > broker > Make clear compute and storage layer interface, focus on compute logic, > such as transaction, delay, filter > Handle produce/consume/admin(rocketmq_protocol) api > Stateless > Use the state-of-the-art Angelia rpc call storage read/write api > Support dynamic add and shrink broker node > Support read replica customized, crack hot queue read imblance problem > storage cluster > Implement storage api for compute node > Multi-raft implement > Support hdfs/S3/GCP > Support dynamic add and shrink storage node > Storage support multi-tenant, like 100w queue > Support topic-level retention time > nameserver cluster > Hash(queue) to a assigned raft group, which broker determine to call the > queue-leader node > Detect storage node failure, notify to broker update queueTostorageNode map > Implement a intellgent module, which use machine learning to decide new > queue assign to which node, and detect unhealthy node. 
> Angelia
> Support multiple communication protocols, such as TCP and HTTP/2
> Support multiple serialization protocols, such as JSON and Avro
>
> compute and storage decoupling
>
> compute layer
> SendMessageProcessor#asyncSendMessage
> this.brokerController.getMessageStore().asyncPutMessage(msgInner);
>
> storage layer
> DefaultMessageStore#putMessage
>
> 1) commitLog storage
> CommitLog#asyncPutMessage
> mappedFile.appendMessage(msg, this.appendMessageCallback);
>
> 2) DLedger storage
> DLedgerCommitLog#asyncPutMessage
> # io.openmessaging.storage.dledger.DLedgerServer
> dLedgerServer.handleAppend(request);
> dLedgerStore.appendAsLeader(dLedgerEntry);
> return dLedgerEntryPusher.waitAck(resEntry, false);
>
> 3) 5.0 separate storage
> SeperateStorage#asyncPutMessage
> client: appendMessage() // wait until more than half of the nodes
> (leader plus followers) have written
>
> Message Store
>
> public interface MessageStore {
>
>     // basic api
>     putMessage(final MessageExtBrokerInner msg)
>     flush()
>     getMessage(final long offset, final int size)
>     ...
>
>     // startup and cache for storage node
>     load()
>     ...
>
>     // failover
>     recover(long maxPhyOffsetOfConsumeQueue)
>     ...
>
>     // manage api
>     getMaxOffset()
>     ...
> }
>
> Compatibility, Deprecation, and Migration Plan
>
> Are backward and forward compatibility taken into consideration?
> The old producer/consumer/admin protocols do not change and remain
> backward compatible.
>
> Are there deprecated APIs?
> Yes, the deprecated pull consumer APIs will be removed.
>
> Rejected Alternatives
>
> How do the alternatives solve the issue you proposed?
>
> Use BookKeeper as the stream storage engine.
>
> Pros and Cons of alternatives
>
> The advantages and disadvantages of using BookKeeper are as follows:
>
> Advantage: simple implementation and a short development cycle.
>
> Disadvantage: BookKeeper uses ZooKeeper to store ledger metadata.
> Introducing third-party components requires maintaining two extra
> systems, which adds maintenance complexity.
>
> Why should we reject the above alternatives?
>
> RocketMQ has always avoided introducing third-party components, so the
> proposal of using BookKeeper (which depends on ZooKeeper) as metadata
> storage is not adopted.

--
Best Regards :-)
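P.S. To make the "more than half of the nodes" ack in
SeperateStorage#asyncPutMessage concrete, here is a minimal sketch of a
majority-quorum append. QuorumAppender and the per-replica futures are
hypothetical illustrations, not existing RocketMQ or DLedger APIs, and
timeouts and failure handling are omitted:

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical sketch: ack the producer once a majority of replicas
    // (leader plus followers) have written the message to their logs.
    public final class QuorumAppender {

        // One future per replica; each completes when that node has
        // durably written the message.
        public CompletableFuture<Void> appendQuorum(
                List<CompletableFuture<Void>> replicaAcks) {
            int quorum = replicaAcks.size() / 2 + 1; // e.g. 2 of 3 nodes
            CompletableFuture<Void> result = new CompletableFuture<>();
            AtomicInteger acked = new AtomicInteger();
            for (CompletableFuture<Void> ack : replicaAcks) {
                ack.thenRun(() -> {
                    if (acked.incrementAndGet() == quorum) {
                        result.complete(null); // majority reached
                    }
                });
            }
            return result;
        }
    }

This is roughly what dLedgerEntryPusher.waitAck(resEntry, false) achieves
in the DLedger path quoted above.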
