[ https://issues.apache.org/jira/browse/KYLIN-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nichunen reopened KYLIN-4167:
-----------------------------

> Refactor streaming coordinator
> ------------------------------
>
>                 Key: KYLIN-4167
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4167
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Real-time Streaming
>            Reporter: Xiaoxiang Yu
>            Assignee: Xiaoxiang Yu
>            Priority: Major
>             Fix For: v3.0.0
>
>
> h2. Summary
> # Currently, the *coordinator* has too many responsibilities, which violates the
> single responsibility principle and makes it hard to extend. A clean separation of
> responsibilities is the recommended way.
> # Some cluster-level operations have no atomicity guarantee; we should
> implement them in an idempotent way to achieve eventual consistency.
> # Resubmit the build job when it was discarded.
> # Clarify the overall design for realtime OLAP.
>
> h4. StreamingCoordinator
> Facade of the coordinator. It controls BuildJobSubmitter/ReceiverClusterManager
> and delegates operations to them.
> h4. BuildJobSubmitter
> The main responsibilities of BuildJobSubmitter are:
> 1. Find candidate segments that are ready to have a build job submitted
> 2. Track the status of each candidate segment's build job and promote the segment
> once it has met the requirements
> h4. ReceiverClusterManager
> This class manages operations that involve multiple streaming receivers. These
> operations are often not atomic, though they may be idempotent.
> h4. ClusterStateChecker
> Basic steps of this class:
> 1. stop/pause the coordinator to avoid underlying concurrency issues
> 2. check all receiver clusters for inconsistent state
> 3. send a summary by mail to the Kylin admin
> 4. if needed, call ClusterDoctor to repair the inconsistencies
> h4. ClusterDoctor
> Repairs inconsistent state according to the result of ClusterStateChecker.
>
> ----
> h3. Candidate Segment
> Candidate segments are the segments that can be seen/perceived by the
> streaming coordinator.
> A candidate segment is in one of the following states/queues:
> 1. segment whose data is uploaded *PARTLY*
> 2. segment whose data is uploaded completely and is *WAITING* to be built
> 3. segment in the *BUILDING* state; the job's state should be one of
> (NEW/RUNNING/ERROR/DISCARD)
> 4. segment whose build *succeeded* and which waits to be delivered to the historical
> part (and to be deleted from the realtime part)
> 5. segment which is *in the historical part* (HBase Ready Segment)
>
> By design, a segment should move to the next queue sequentially (it shouldn't
> jump the queue); do not break this.
> h3. Atomicity
> In a multi-step transaction, the following aspects should be thought through carefully:
> 1. should it *fail fast* or continue when an exception is thrown
> 2. should the API (remote call) be *synchronous* or asynchronous
> 3. when the transaction fails, can the *rollback* always succeed
> 4. the transaction should be *idempotent*, so that when it fails it can be fixed by
> a retry
>
> How can we ensure that the whole cluster operates smoothly without blocking problems? I
> divided all multi-step transactions into three kinds:
> NotAtomicIdempotent
> NotAtomicAndNotIdempotent
> NonSideEffect

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
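
To make the "no queue jumping" rule from the Candidate Segment section concrete, here is a minimal Java sketch. The names SegmentQueue and CandidateSegment are hypothetical and chosen only for illustration; they are not taken from the Kylin code base, and the actual coordinator may track these states quite differently.

```java
/** Hypothetical model of the five candidate-segment queues described in the issue. */
enum SegmentQueue {
    PARTLY_UPLOADED, // 1. data uploaded partly
    WAITING,         // 2. data uploaded completely, waiting to be built
    BUILDING,        // 3. build job is NEW/RUNNING/ERROR/DISCARD
    BUILT,           // 4. build succeeded, waiting for delivery to the historical part
    HISTORICAL;      // 5. in the historical part (HBase ready segment)

    /** A segment may only move to the queue directly after its current one. */
    boolean canMoveTo(SegmentQueue target) {
        return target.ordinal() == this.ordinal() + 1;
    }
}

/** Hypothetical segment holder that enforces sequential transitions. */
final class CandidateSegment {
    private final String name;
    private SegmentQueue queue = SegmentQueue.PARTLY_UPLOADED;

    CandidateSegment(String name) {
        this.name = name;
    }

    SegmentQueue queue() {
        return queue;
    }

    /** Moves forward by exactly one queue; throws if a queue would be skipped or revisited. */
    void moveTo(SegmentQueue target) {
        if (!queue.canMoveTo(target)) {
            throw new IllegalStateException(
                    name + ": illegal transition " + queue + " -> " + target);
        }
        queue = target;
    }
}
```

With this sketch, moving a fresh segment to WAITING succeeds, while trying to move it straight to HISTORICAL throws an IllegalStateException. Rejecting the jump at one central point is one way to keep the invariant from being broken by accident.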
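
Point 4 of the Atomicity section (a failed idempotent transaction can be fixed by a retry) could look roughly like the Java sketch below. IdempotentRetry and ReceiverAssignment are hypothetical names invented for this example, and the simulated failure stands in for a remote call that fails after making partial progress; none of this is the actual Kylin implementation.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical retry helper; only safe for steps that are idempotent. */
final class IdempotentRetry {
    private IdempotentRetry() {
    }

    /** Runs the step until it succeeds or maxAttempts is used up; returns the attempts used. */
    static int runWithRetry(Runnable step, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                step.run();
                return attempt;
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }
}

/** Toy stand-in for a cluster-level step: assigning a receiver twice leaves the same state. */
final class ReceiverAssignment {
    private final Set<String> assigned = new HashSet<>();
    private int failuresLeft;

    ReceiverAssignment(int simulatedFailures) {
        this.failuresLeft = simulatedFailures;
    }

    /** Idempotent: Set.add is a no-op when the receiver is already present. */
    void assign(String receiver) {
        assigned.add(receiver);
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new IllegalStateException("simulated failure after partial progress");
        }
    }

    Set<String> assigned() {
        return assigned;
    }
}
```

For example, `IdempotentRetry.runWithRetry(() -> assignment.assign("receiver-1"), 5)` on a `ReceiverAssignment` built with two simulated failures succeeds on the third attempt and still leaves exactly one assigned receiver. The same loop would be unsafe for an operation of the NotAtomicAndNotIdempotent kind, since each retry could then apply its side effect again.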