+1. Thanks, veritas, for the follow-up. The community and I have talked about Streaming Tiered Storage Optimize, which is one of the subparts of RocketMQ 5. Undoubtedly, achieving this will be a very challenging task, and I hope the whole R&D process will be more open. If you're interested, please don't hesitate to support it and join in. I believe veritas has a clear implementation plan and is waiting for more folks to join this improvement program.
veritas <[email protected]> 于2021年4月23日周五 下午12:31写道: > RIP 20 Streaming Tiered Storage Optimize > I noticed that the community was designing the next-generation > architecutre a log time ago.After chatting with some guys in the community, > I hope to evolve the design of this architecture and implement it. Thanks > for the communication and guidance given by vongosling. Interested guys are > welcome to bread down and complete architecture evolution in the cloud > native era. For detail, > https://github.com/apache/rocketmq/wiki/RIP-20-Streaming-Tiered-Storage-Optimize > Status > > Current State: Proposed > > Authors: [email protected] > > Shepherds: [email protected] > > Mailing List Discussion: [email protected] > > Pull Request: #PR_NUMBER > > Released: <released_version> > > Correlations: RIP-18 Metadata management architecture upgrade,Thought of > The Evolution of The Next Decade Architecture for RocketMQ > > Background & Motivation > > As user business increase ten times at peek times, such as breaking event > happens/shopping festival, and decrease after the peek time to regular, > which may lead to some issues > > We have to expand brokers resource in case of storage capacity bottleneck. > We have to expand storage capacity in case of broker cpu bottleneck. > After expand brokers/storage, we cant shrink these resources, which lead 9 > times resource waste. > Benchmark test dledger broker-cluster, when leader reach cpu bottleneck, > follower only utilize 10% of cpu resource. > If store one topic 1 year messages, all topics store 1 year > If store one topic 1 year, local storage is expensive > If 1 queue consume 100 times, lead to the leader node bottleneck, while > other node idle > When create new queues, it may choose the busiest broker > Goals > What problem is this proposal designed to solve? > > Implement the next generation rocketmq architechture, make the boundaries > of compute and storage layer specific, support stream and tiered storage. > > Non-Goals > What problem is this proposal NOT designed to solve? > > We will not directly implement multi-raft protocol on RocketMQ. > > Are there any limits of this proposal? > > Nothing specific. > > Changes > Architecture > > New architecture graph: > > broker > Make clear compute and storage layer interface, focus on compute logic, > such as transaction, delay, filter > Handle produce/consume/admin(rocketmq_protocol) api > Stateless > Use the state-of-the-art Angelia rpc call storage read/write api > Support dynamic add and shrink broker node > Support read replica customized, crack hot queue read imblance problem > storage cluster > Implement storage api for compute node > Multi-raft implement > Support hdfs/S3/GCP > Support dynamic add and shrink storage node > Storage support multi-tenant, like 100w queue > Support topic-level retention time > nameserver cluster > Hash(queue) to a assigned raft group, which broker determine to call the > queue-leader node > Detect storage node failure, notify to broker update queueTostorageNode map > Implement a intellgent module, which use machine learning to decide new > queue assign to which node, and detect unhealthy node. 
> Angelia
> Support multiple communication protocols, such as TCP and HTTP/2
> Support multiple serialization protocols, such as JSON and Avro
>
> compute and storage decoupling
>
> compute layer
> SendMessageProcessor#asyncSendMessage
> this.brokerController.getMessageStore().asyncPutMessage(msgInner);
>
> storage layer
> DefaultMessageStore#putMessage
>
> 1) commitLog storage
> CommitLog#asyncPutMessage
> mappedFile.appendMessage(msg, this.appendMessageCallback);
>
> 2) DLedger storage
> DLedgerCommitLog#asyncPutMessage
> # io.openmessaging.storage.dledger.DLedgerServer
> dLedgerServer.handleAppend(request);
> dLedgerStore.appendAsLeader(dLedgerEntry);
> return dLedgerEntryPusher.waitAck(resEntry, false);
>
> 3) 5.0 separate storage
> SeperateStorage#asyncPutMessage
> client: appendMessage() // wait until more than half of the nodes
> (leader plus followers) have written
>
> Message Store
>
> public interface MessageStore {
>
>     // basic api
>     putMessage(final MessageExtBrokerInner msg)
>     flush()
>     getMessage(final long offset, final int size)
>     ...
>
>     // startup and cache for storage node
>     load()
>     ...
>
>     // failover
>     recover(long maxPhyOffsetOfConsumeQueue)
>     ...
>
>     // manage api
>     getMaxOffset()
>     ...
> }
>
> Compatibility, Deprecation, and Migration Plan
>
> Are backward and forward compatibility taken into consideration?
> The old producer/consumer/admin protocols do not change and remain
> backward compatible.
>
> Are there deprecated APIs?
> Yes, the deprecated pull consumer APIs will be removed.
>
> Rejected Alternatives
>
> How do the alternatives solve the issue you proposed?
>
> Use BookKeeper as the stream storage engine.
>
> Pros and Cons of alternatives
>
> The advantages and disadvantages of using BookKeeper are as follows:
>
> Advantage: simple implementation and a short development cycle.
>
> Disadvantage: BookKeeper uses ZooKeeper to store ledger metadata.
> Introducing third-party components requires maintaining two extra
> systems, which adds maintenance complexity.
>
> Why should we reject the above alternatives?
>
> RocketMQ has always avoided introducing third-party components, so the
> proposal of using BookKeeper (which depends on ZooKeeper) as metadata
> storage is not adopted.

--
Best Regards :-)
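P.S. To make the "more than half of the nodes" ack in
SeperateStorage#asyncPutMessage concrete, here is a minimal sketch of a
majority-quorum append. QuorumAppender and the per-replica futures are
hypothetical illustrations, not existing RocketMQ or DLedger APIs, and
timeouts and failure handling are omitted:

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical sketch: ack the producer once a majority of replicas
    // (leader plus followers) have written the message to their logs.
    public final class QuorumAppender {

        // One future per replica; each completes when that node has
        // durably written the message.
        public CompletableFuture<Void> appendQuorum(
                List<CompletableFuture<Void>> replicaAcks) {
            int quorum = replicaAcks.size() / 2 + 1; // e.g. 2 of 3 nodes
            CompletableFuture<Void> result = new CompletableFuture<>();
            AtomicInteger acked = new AtomicInteger();
            for (CompletableFuture<Void> ack : replicaAcks) {
                ack.thenRun(() -> {
                    if (acked.incrementAndGet() == quorum) {
                        result.complete(null); // majority reached
                    }
                });
            }
            return result;
        }
    }

This is roughly what dLedgerEntryPusher.waitAck(resEntry, false) achieves
in the DLedger path quoted above.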
