[jira] [Reopened] (KYLIN-4167) Refactor streaming coordinator

nichunen (Jira) Tue, 19 Nov 2019 00:21:19 -0800


     [ https://issues.apache.org/jira/browse/KYLIN-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


nichunen reopened KYLIN-4167:
-----------------------------

> Refactor streaming coordinator
> ------------------------------
>
>                 Key: KYLIN-4167
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4167
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Real-time Streaming
>            Reporter: Xiaoxiang Yu
>            Assignee: Xiaoxiang Yu
>            Priority: Major
>             Fix For: v3.0.0
>
>
> h2. Summary
>  # Currently, the *coordinator* has too many responsibilities. This violates the single 
> responsibility principle and makes it hard to extend. A clear separation of 
> responsibilities is the recommended approach.
>  # Some cluster-level operations have no atomicity guarantee. We should 
> implement them in an idempotent way to achieve eventual consistency.
>  # Resubmit the build job when it is discarded.
>  # Clarify the overall design for real-time OLAP.
>  
> h4. StreamingCoordinator
> Facade of the coordinator. It controls BuildJobSubmitter/ReceiverClusterManager 
> and delegates operations to them.
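A minimal sketch of the facade idea follows. The two collaborator names come from the issue, but the method signatures and the pause flag are hypothetical. The flag was added because ClusterStateChecker pauses the coordinator. The facade keeps no business logic of its own. It only decides whether a call may proceed and then hands it to the responsible collaborator.

```java
import java.util.List;

// Standalone sketch: interface names come from the issue,
// method signatures are hypothetical.
interface BuildJobSubmitter {
    void submitReadySegments(String cube);
}

interface ReceiverClusterManager {
    void reassign(String cube, List<Integer> replicaSetIds);
}

// Facade: no business logic, only gating and delegation.
class StreamingCoordinator {
    private final BuildJobSubmitter jobSubmitter;
    private final ReceiverClusterManager clusterManager;
    // Hypothetical pause flag, e.g. set while a cluster state check runs.
    private volatile boolean paused = false;

    StreamingCoordinator(BuildJobSubmitter jobSubmitter, ReceiverClusterManager clusterManager) {
        this.jobSubmitter = jobSubmitter;
        this.clusterManager = clusterManager;
    }

    void pause() { paused = true; }

    void resume() { paused = false; }

    /** Delegates to BuildJobSubmitter; returns false (and does nothing) while paused. */
    boolean onSegmentUploaded(String cube) {
        if (paused) {
            return false;
        }
        jobSubmitter.submitReadySegments(cube);
        return true;
    }

    /** Delegates to ReceiverClusterManager; returns false (and does nothing) while paused. */
    boolean reassignCube(String cube, List<Integer> replicaSetIds) {
        if (paused) {
            return false;
        }
        clusterManager.reassign(cube, replicaSetIds);
        return true;
    }
}
```

Because the collaborators are interfaces, each one can be replaced or tested in isolation. That is the point of the separation of responsibilities described in the summary.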
> h4. BuildJobSubmitter
> The main responsibilities of BuildJobSubmitter include:
> 1. Finding candidate segments that are ready for a build job to be submitted
> 2. Tracing the status of each candidate segment's build job and promoting the segment 
> once it meets the requirements
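The two duties can be sketched as below, under simplifying assumptions. Job ids are plain counters. "Upload complete" is a boolean per segment. A discarded job is resubmitted with a fresh id, following the "resubmit when discarded" goal in the summary. The class body, method names and the SUCCEED state are hypothetical; a real submitter would query the job engine.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// NEW/RUNNING/ERROR/DISCARD come from the issue; SUCCEED is an assumption.
enum JobStatus { NEW, RUNNING, ERROR, DISCARD, SUCCEED }

// Standalone sketch of the two BuildJobSubmitter duties (hypothetical API).
class BuildJobSubmitter {
    private final Map<String, Integer> building = new LinkedHashMap<>(); // segment -> build job id
    private final List<String> promoted = new ArrayList<>();
    private int nextJobId = 1;

    /** Duty 1: submit a build job for every segment whose data upload is complete. */
    List<String> submitReadySegments(Map<String, Boolean> uploadComplete) {
        List<String> submitted = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : uploadComplete.entrySet()) {
            String segment = e.getKey();
            boolean alreadyHandled = building.containsKey(segment) || promoted.contains(segment);
            if (e.getValue() && !alreadyHandled) {
                building.put(segment, nextJobId++);
                submitted.add(segment);
            }
        }
        return submitted;
    }

    /** Duty 2: promote segments whose job succeeded; resubmit discarded jobs. */
    List<String> traceJobs(Map<Integer, JobStatus> jobStatus) {
        List<String> newlyPromoted = new ArrayList<>();
        for (String segment : new ArrayList<>(building.keySet())) {
            JobStatus status = jobStatus.getOrDefault(building.get(segment), JobStatus.NEW);
            if (status == JobStatus.SUCCEED) {
                building.remove(segment);
                promoted.add(segment);
                newlyPromoted.add(segment);
            } else if (status == JobStatus.DISCARD) {
                building.put(segment, nextJobId++); // resubmit with a fresh job id
            }
            // NEW / RUNNING / ERROR: leave the segment in BUILDING for now
        }
        return newlyPromoted;
    }

    Integer jobIdOf(String segment) {
        return building.get(segment);
    }
}
```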
> h4. ReceiverClusterManager
> This class manages operations that involve multiple streaming receivers. Such 
> operations are often not atomic but may be idempotent.
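The sketch below shows what "not atomic but idempotent" means in practice. The receiver is an in-memory stand-in with simulated failures, and all names are hypothetical; a real receiver would be reached through a remote call. If one receiver fails, the earlier ones stay updated. Because each step is idempotent, simply running the whole operation again repairs the partial state.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory receiver standing in for a remote streaming receiver.
class Receiver {
    final String id;
    final Set<String> assignedCubes = new HashSet<>();
    private int failuresLeft; // simulated transient failures, for illustration only

    Receiver(String id, int failuresLeft) {
        this.id = id;
        this.failuresLeft = failuresLeft;
    }

    /** Idempotent: assigning a cube that is already assigned changes nothing. */
    void assign(String cube) {
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new IllegalStateException("receiver " + id + " is unavailable");
        }
        assignedCubes.add(cube);
    }
}

class ReceiverClusterManager {
    /**
     * Not atomic: if a receiver fails, the receivers before it have already been
     * updated and nothing is rolled back. Since every step is idempotent, the
     * caller can repair the partial state by running the whole operation again.
     */
    void assignCube(List<Receiver> receivers, String cube) {
        for (Receiver receiver : receivers) {
            receiver.assign(cube);
        }
    }
}
```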
> h4. ClusterStateChecker
> Basic steps of this class:
> 1. Stop/pause the coordinator to avoid underlying concurrency issues
> 2. Check all receiver clusters for inconsistent state
> 3. Send a summary to the Kylin admin by mail
> 4. If needed, call ClusterDoctor to repair the inconsistencies
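The four steps map onto a single method fairly directly. In this sketch the coordinator, the mail sender and ClusterDoctor are replaced by hypothetical callback hooks. The cluster state is reduced to "receiver id -> cube", a deliberate simplification. Resuming the coordinator in a `finally` block is an assumption; the issue only mentions stopping/pausing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Standalone sketch of the ClusterStateChecker steps; all hooks are hypothetical.
class ClusterStateChecker {
    private final Runnable pauseCoordinator;
    private final Runnable resumeCoordinator;
    private final Consumer<String> mailToAdmin;
    private final Consumer<List<String>> clusterDoctor;

    ClusterStateChecker(Runnable pauseCoordinator, Runnable resumeCoordinator,
                        Consumer<String> mailToAdmin, Consumer<List<String>> clusterDoctor) {
        this.pauseCoordinator = pauseCoordinator;
        this.resumeCoordinator = resumeCoordinator;
        this.mailToAdmin = mailToAdmin;
        this.clusterDoctor = clusterDoctor;
    }

    /** Compares expected vs. actual "receiver id -> cube" state; returns the inconsistent receivers. */
    List<String> check(Map<String, String> expected, Map<String, String> actual, boolean autoRepair) {
        pauseCoordinator.run();                                   // 1. avoid concurrent changes
        try {
            List<String> inconsistent = new ArrayList<>();
            for (Map.Entry<String, String> e : expected.entrySet()) { // 2. find inconsistencies
                if (!e.getValue().equals(actual.get(e.getKey()))) {
                    inconsistent.add(e.getKey());
                }
            }
            mailToAdmin.accept(inconsistent.size() + " inconsistent receiver(s): " + inconsistent); // 3. report
            if (autoRepair && !inconsistent.isEmpty()) {
                clusterDoctor.accept(inconsistent);               // 4. repair if needed
            }
            return inconsistent;
        } finally {
            resumeCoordinator.run();                              // assumption: resume afterwards
        }
    }
}
```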
> h4. ClusterDoctor
> Repairs inconsistent state according to the result of ClusterStateChecker
>  
> ----
> h3. Candidate Segment
> The candidate segments are those segments that can be seen/perceived by the 
> streaming coordinator.
> A candidate segment is in one of the following states/queues:
> 1. segments whose data are *PARTLY* uploaded
> 2. segments whose data are completely uploaded and *WAITING* to be built
> 3. segments in the *BUILDING* state; the job's state should be one of 
> (NEW/RUNNING/ERROR/DISCARD)
> 4. segments that were built *successfully* and are waiting to be delivered to the historical part 
> (and to be deleted from the realtime part)
> 5. segments that are *in the historical part* (HBase Ready Segment)
>  
> By design, a segment must move to the next queue sequentially (it must not 
> jump the queue); do not break this rule.
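The "no queue jumping" rule can be enforced with a small state machine. In this sketch the enum constants mirror the five queues listed above, but their names and the `CandidateSegment` class are hypothetical. A transition is accepted only if the target is the immediate successor of the current state. Anything else is rejected and leaves the state unchanged.

```java
// The five candidate-segment queues, in their required order (names are hypothetical).
enum SegmentState {
    PARTLY_UPLOADED, WAITING_TO_BUILD, BUILDING, BUILT, IN_HISTORICAL;

    /** Returns the only legal next state, or throws if this is the final one. */
    SegmentState next() {
        if (this == IN_HISTORICAL) {
            throw new IllegalStateException("IN_HISTORICAL is the final state");
        }
        return values()[ordinal() + 1];
    }
}

class CandidateSegment {
    final String name;
    private SegmentState state = SegmentState.PARTLY_UPLOADED;

    CandidateSegment(String name) {
        this.name = name;
    }

    SegmentState state() {
        return state;
    }

    /** Moves to target only if it is the immediate successor; queue jumping is rejected. */
    void transitionTo(SegmentState target) {
        if (target != state.next()) {
            throw new IllegalStateException(name + ": cannot move from " + state + " to " + target);
        }
        state = target;
    }
}
```

Note that a discarded build job does not move the segment backwards in this sketch. The segment stays in BUILDING while its job is resubmitted.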
> h3. Atomicity
> In a multi-step transaction, the following aspects need careful thought:
> 1. should it *fail fast* or continue when an exception is thrown?
> 2. should the API (remote call) be *synchronous* or asynchronous?
> 3. when the transaction fails, can the *rollback* always succeed?
> 4. the transaction should be *idempotent*, so that when it fails, it can be fixed by 
> retrying
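One way to combine points 1, 3 and 4 is shown below. The operation fails fast on the first exception, skips the rollback entirely, and relies on idempotency to recover by retrying the whole operation from the start. All names are hypothetical, and the fixed attempt limit (with no backoff) is a simplifying assumption.

```java
import java.util.List;

// Standalone sketch (hypothetical names): fail fast, then retry the whole
// multi-step operation, relying on every step being idempotent.
class IdempotentTransaction {
    /** One step of the transaction; assumed safe to run more than once. */
    interface Step {
        void run() throws Exception;
    }

    /** Returns the attempt number that succeeded, or rethrows the last failure. */
    static int runWithRetry(List<Step> steps, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                for (Step step : steps) {
                    step.run(); // fail fast: an exception skips the remaining steps
                }
                return attempt;
            } catch (Exception e) {
                lastFailure = e; // no rollback: idempotent steps are simply re-run
            }
        }
        throw lastFailure;
    }
}
```

The trade-off: this avoids the question of whether a rollback can always succeed (point 3), but it only holds for steps that really are idempotent.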
>  
> How can we keep whole-cluster operations running smoothly without blocking problems? I 
> divided all multi-step transactions into three kinds:
> NotAtomicIdempotent
> NotAtomicAndNotIdempotent
> NonSideEffect
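One possible way to make this classification machine-checkable is to mark each cluster operation with a runtime annotation. Code that handles failures can then look up whether an automatic retry is safe. The annotation definitions, the retry policy and the example operations below are all assumptions made for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical marker annotations named after the three kinds above.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NotAtomicIdempotent {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NotAtomicAndNotIdempotent {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NonSideEffect {}

class OperationPolicy {
    /**
     * Assumed policy: an operation may be retried automatically only if it has no
     * side effect or is idempotent. Unannotated methods are treated as not retryable.
     */
    static boolean isRetryable(Method operation) {
        return operation.isAnnotationPresent(NonSideEffect.class)
                || operation.isAnnotationPresent(NotAtomicIdempotent.class);
    }
}

// Hypothetical cluster operations, used only to exercise the policy.
class ExampleClusterOperations {
    @NonSideEffect
    void listReceivers() {}

    @NotAtomicIdempotent
    void assignCube(String cube) {}

    @NotAtomicAndNotIdempotent
    void moveCube(String cube) {}
}
```

Operations of the NotAtomicAndNotIdempotent kind would then need another recovery path, such as a repair by ClusterDoctor or manual intervention, instead of a blind retry.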



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
