Re: About alpha-fsm progress

2019-06-28 Thread Willem Jiang
I just did a quick search on Akka Persistence[1]; it looks like
we could leverage it to implement event sourcing without a WAL.

I agree we could use Akka Cluster to provide the HA solution for
Alpha, but we need to do a POC for it first.

[1] https://doc.akka.io/docs/akka/current/persistence.html
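
As a starting point for the POC, the event-sourcing idea can be sketched
in plain Java without touching the Akka API: each event is appended to a
journal before it changes the state, and after a restart the state is
rebuilt by replaying the journal. The class name, the state names, and the
SagaStartedEvent/SagaEndedEvent names are assumptions for illustration, and
the in-memory list only stands in for a durable journal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of event sourcing; this is NOT the Akka Persistence API.
class EventSourcedSaga {
  // Stand-in for a durable journal (assumption: an in-memory list).
  private final List<String> journal;
  private String state = "IDLE";

  EventSourcedSaga(List<String> journal) {
    this.journal = journal;
    // Recovery: rebuild the state by replaying every journaled event in order.
    for (String event : journal) {
      apply(event);
    }
  }

  // Append the event to the journal first, then apply it to the state.
  void handle(String event) {
    journal.add(event);
    apply(event);
  }

  String state() {
    return state;
  }

  private void apply(String event) {
    switch (event) {
      case "SagaStartedEvent":
        state = "ACTIVE";
        break;
      case "SagaEndedEvent":
        state = "COMMITTED";
        break;
      case "SagaAbortedEvent":
        state = "COMPENSATED";
        break;
      default:
        break; // unknown events leave the state unchanged in this sketch
    }
  }

  public static void main(String[] args) {
    List<String> journal = new ArrayList<>();
    EventSourcedSaga saga = new EventSourcedSaga(journal);
    saga.handle("SagaStartedEvent");
    // Simulate a restart: a new instance recovers from the same journal.
    EventSourcedSaga recovered = new EventSourcedSaga(journal);
    System.out.println(recovered.state());
  }
}
```

Because the journal is the only source of truth here, no separate
write-ahead log is needed; whether Akka Persistence gives us the same
guarantee in practice is what the POC would have to confirm.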


Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Fri, Jun 28, 2019 at 7:49 PM Zheng Feng  wrote:
>
> Thanks Zhang Lei,
>
> On Fri, Jun 28, 2019 at 5:50 PM, Zhang Lei wrote:
>
> > Hi, All
> >
> > alpha-fsm has been pushed to the branch SCB-1321
> >
> > Completed:
> > 1. State machine design document[1]
> > 2. State machine prototype
> > 3. State machine test case
> > 4. Receive saga events using the internal message bus
> >
> > Key emphasis of next stage in work:
> > In order to carry out the feasibility verification as soon as possible, I
> > will not consider the reliability issue for the time being.
> > 1. Refactor Omega components, add SagaAbortedEvent, SagaTimeoutEvent,
> > TxComponsitedEvent
> > 2. Save compensation method parameters in Actor and trigger compensation
> > in Actor
> > 3. Do not use Kafka and only verify single node alpha, The Alpha server
> > receives the saga event and puts it into the internal message bus.
> >
> It looks good to me !
>
> >
> > Planning:
> > 1. Persist actor data to the database when it terminates
> >
> What are the actor data ? they are all the events ?
>
> > 2. Integration Kafka
> >
> So we can use the Kafka as a message broker and the invokings between the
> Omega and the Alpha will become async ?
>
> > 3. Support WAL[2] recovery mode
> >
> Well, it looks interesting and does it support by the akka ?
>
> > 4. Verify Akka cluster reliability
> >
> > [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm
> > [2] https://en.wikipedia.org/wiki/Write-ahead_logging
> >
> > if you have other comments, please let us know.
> >
> Good luck !
>
> >
> > Thanks,
> > Lei Zhang
> >
> > > On Jun 27, 2019, at 9:50 AM, Willem Jiang wrote:
> > >
> > > We just leverage the message broker to make sure Alpha get the
> > > transaction event from Omega.
> > > In most cases Alpha don't need to talk back  to Omega, we just need to
> > > make sure all the transaction message are stored (Alpha can process it
> > > later).
> > >
> > > If Omega cannot talk the message broker, Omega should abort the
> > > transaction processing with transport exception.
> > >
> > > Willem Jiang
> > >
> > > Twitter: willemjiang
> > > Weibo: 姜宁willem
> > >
> > > On Tue, Jun 25, 2019 at 8:42 AM Zhang Lei  wrote:
> > >>
> > >> Hi, Zhang jun
> > >>
> > >>> I just cared about the recovery scan thread design.
> > >>> Kafka can ensure event message can be consumed by alpha exactly, but
> > recovery need know all the participated transaction response to decide
> > rollback or commit, so I think scan thread is also necessary.
> > >>
> > >> I am not sure, but I think Akka's persistence can solve this problem
> > you care about.
> > >> Of course, this ability needs to be verified
> > >>
> > >> Thanks,
> > >> Zhang Lei
> > >>
> > >>> On Jun 24, 2019, at 10:46 AM, 赵俊 wrote:
> > >>>
> > >>> Hi, Zhang Lei
> > >>>
> > >>>> A1: I think we only need to ensure that the message can be reliably
> > >>>> delivered to the state machine. The state machine only records state
> > >>>> transitions synchronously when the transaction executes normally. At
> > >>>> present, the compensation method based on table scan is also
> > >>>> asynchronous. I am not sure if I have answered your question; if not,
> > >>>> please give me more information.
> > >>>
> > >>> If we have a mechanism that ensure main service can collect all the
> > participated transaction response from alpha correctly before
> > commit/rollback, it is OK.
> > >>>
> > >>>> Q2: We should also consider recovery; it seems that recovery is the
> > >>>> same as before, based on the database.
> > >>>> A2: I think the question you care about is how to recover when Alpha
> > >>>> is down. This is a little different from the current version.
> > >>>> 1. We can rely on Kafka's reliability and control the offset of the
> > >>>> topic, consuming one message at a time.
> > >>>> 2. Of course, we can also do some extra design for it, such as writing
> > >>>> the data to a local log file after receiving the Kafka message, and
> > >>>> resuming by reading that log file when the Alpha machine restarts.
> > >>>
> > >>> I just cared about the recovery scan thread design.
> > >>> Kafka can ensure event message can be consumed by alpha exactly, but
> > recovery need know all the participated transaction response to decide
> > rollback or commit, so I think scan thread is also necessary.
> > >>>
> > >>>
> > >>>
> > >>>> On Jun 23, 2019, at 1:04 PM, Zhang Lei wrote:
> > >>>>
> > >>>> Hi, Zhao Jun
> > >>>>
> > >>>> Thank you for your reply!
> > >>>>
> > >>>> This design document does not elaborate on reliability

Re: About alpha-fsm progress

2019-06-28 Thread Zhang Lei
Hi, Feng Zheng

Thank you for the reply :)

> On Jun 28, 2019, at 7:49 PM, Zheng Feng wrote:
> 
> Thanks Zhang Lei,
> 
> On Fri, Jun 28, 2019 at 5:50 PM, Zhang Lei <zhang_...@boco.com.cn> wrote:
> 
>> Hi, All
>> 
>> alpha-fsm has been pushed to the branch SCB-1321
>> 
>> Completed:
>> 1. State machine design document[1]
>> 2. State machine prototype
>> 3. State machine test case
>> 4. Receive saga events using the internal message bus
>> 
>> Key emphasis of next stage in work:
>> In order to carry out the feasibility verification as soon as possible, I
>> will not consider the reliability issue for the time being.
>> 1. Refactor Omega components, add SagaAbortedEvent, SagaTimeoutEvent,
>> TxComponsitedEvent
>> 2. Save compensation method parameters in Actor and trigger compensation
>> in Actor
>> 3. Do not use Kafka and only verify single node alpha, The Alpha server
>> receives the saga event and puts it into the internal message bus.
>> 
> It looks good to me !
> 
>> 
>> Planning:
>> 1. Persist actor data to the database when it terminates
>> 
> What are the actor data ? they are all the events ?

The data contains the details of a global transaction and its
sub-transactions. It consists of three parts:
1. Global transaction: id, start time, end time, status.
2. Sub-transactions: id, start time, end time, status.
3. All events received by the actor.
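
As a rough illustration of the three parts above, the actor data could be
modelled like this. All class, field, and status names are hypothetical and
are not taken from the SCB-1321 branch; events are kept as plain strings to
keep the sketch short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the data held by one saga actor.
class SagaData {
  // Part 2: details of a single sub-transaction.
  static class TxEntity {
    final String localTxId;
    final long startTime;
    long endTime; // 0 while the sub-transaction is still running
    String status = "ACTIVE";

    TxEntity(String localTxId, long startTime) {
      this.localTxId = localTxId;
      this.startTime = startTime;
    }
  }

  // Part 1: details of the global transaction.
  final String globalTxId;
  final long startTime;
  long endTime; // 0 while the global transaction is still running
  String status = "ACTIVE";

  // Sub-transactions keyed by id, in the order they were started.
  final Map<String, TxEntity> subTransactions = new LinkedHashMap<>();
  // Part 3: every event received by the actor, in arrival order.
  final List<String> events = new ArrayList<>();

  SagaData(String globalTxId, long startTime) {
    this.globalTxId = globalTxId;
    this.startTime = startTime;
    events.add("SagaStartedEvent");
  }

  void txStarted(String localTxId, long now) {
    subTransactions.put(localTxId, new TxEntity(localTxId, now));
    events.add("TxStartedEvent:" + localTxId);
  }

  void txEnded(String localTxId, long now) {
    TxEntity tx = subTransactions.get(localTxId);
    tx.endTime = now;
    tx.status = "COMMITTED";
    events.add("TxEndedEvent:" + localTxId);
  }

  void sagaEnded(long now) {
    endTime = now;
    status = "COMMITTED";
    events.add("SagaEndedEvent");
  }

  public static void main(String[] args) {
    SagaData data = new SagaData("global-1", 100L);
    data.txStarted("tx-1", 110L);
    data.txEnded("tx-1", 120L);
    data.sagaEnded(130L);
    System.out.println(data.status + " " + data.events);
  }
}
```

Persisting this object when the actor terminates would then store both the
final status and the full event history in one place.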

> 
>> 2. Integration Kafka
>> 
> So we can use the Kafka as a message broker and the invokings between the
> Omega and the Alpha will become async ?

This part needs to be discussed. My idea is not to change the communication
protocol between Omega and Alpha. After an Alpha receives an event, it only
distributes the event to the other Alphas through Kafka, so that Alpha can be
scaled out dynamically. This architecture diagram gives an intuitive
description:
https://github.com/apache/servicecomb-pack/raw/SCB-1321/docs/fsm/assets/fsm.png

If we use a message broker between Omega and Alpha, then we also need to
consider how Alpha would call Omega's compensation.

> 
>> 3. Support WAL[2] recovery mode
>> 
> Well, it looks interesting and does it support by the akka ?
> 

No, it has nothing to do with Akka.
To avoid losing events when the system crashes, Alpha first persists an event
to disk after receiving it, then sends the message to the Actor and records
the message pointer. When Alpha restarts, it first reads the event pointer
from disk and resends the events to the Actor starting from the pointer
position.
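
A minimal sketch of that recovery flow, assuming a log file with one event
per line and a separate pointer file that counts how many events have
already been delivered. The file names, the class name, and the `Consumer`
callback standing in for the Actor are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical WAL sketch; events must not contain line breaks.
class WalSketch {
  private final Path log;
  private final Path pointer;
  private final Consumer<String> actor; // stand-in for the Actor

  WalSketch(Path dir, Consumer<String> actor) {
    this.log = dir.resolve("events.log");
    this.pointer = dir.resolve("events.ptr");
    this.actor = actor;
  }

  // Step 1: append the event to the log before anything else happens.
  // A real WAL would also force the write to disk (fsync); omitted here.
  void persist(String event) {
    try {
      Files.write(log, Collections.singletonList(event), StandardCharsets.UTF_8,
          StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Steps 2 and 3: hand the event to the actor, then advance the pointer.
  private void deliver(String event) {
    actor.accept(event);
    writePointer(readPointer() + 1);
  }

  void receive(String event) {
    persist(event);
    deliver(event);
  }

  // On restart: resend every logged event from the pointer position onward.
  int recover() {
    List<String> events = readLog();
    int from = readPointer();
    for (int i = from; i < events.size(); i++) {
      deliver(events.get(i));
    }
    return Math.max(0, events.size() - from);
  }

  private List<String> readLog() {
    try {
      if (!Files.exists(log)) return new ArrayList<>();
      return Files.readAllLines(log, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private int readPointer() {
    try {
      if (!Files.exists(pointer)) return 0;
      byte[] raw = Files.readAllBytes(pointer);
      return Integer.parseInt(new String(raw, StandardCharsets.UTF_8).trim());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private void writePointer(int value) {
    try {
      Files.write(pointer, Integer.toString(value).getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Helper for demos and tests: a fresh temporary directory.
  static Path tempDir() {
    try {
      return Files.createTempDirectory("wal-sketch");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    List<String> delivered = new ArrayList<>();
    Path dir = tempDir();
    WalSketch wal = new WalSketch(dir, delivered::add);
    wal.receive("SagaStartedEvent");
    wal.persist("TxStartedEvent"); // simulate a crash before delivery
    new WalSketch(dir, delivered::add).recover();
    System.out.println(delivered);
  }
}
```

Note that in this sketch the pointer is advanced only after the actor has
accepted the event, so a crash between those two steps causes the event to
be resent on recovery. Delivery is therefore at-least-once, and the Actor
would need to tolerate duplicates.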

>> 4. Verify Akka cluster reliability
>> 
>> [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm
>> [2] https://en.wikipedia.org/wiki/Write-ahead_logging
>> 
>> if you have other comments, please let us know.
>> 
> Good luck !
> 
>> 
>> Thanks,
>> Lei Zhang
>> 
>>> On Jun 27, 2019, at 9:50 AM, Willem Jiang wrote:
>>> 
>>> We just leverage the message broker to make sure Alpha get the
>>> transaction event from Omega.
>>> In most cases Alpha don't need to talk back  to Omega, we just need to
>>> make sure all the transaction message are stored (Alpha can process it
>>> later).
>>> 
>>> If Omega cannot talk the message broker, Omega should abort the
>>> transaction processing with transport exception.
>>> 
>>> Willem Jiang
>>> 
>>> Twitter: willemjiang
>>> Weibo: 姜宁willem
>>> 
>>> On Tue, Jun 25, 2019 at 8:42 AM Zhang Lei wrote:
>>>>
>>>> Hi, Zhang jun
>>>>
>>>>> I just cared about the recovery scan thread design.
>>>>> Kafka can ensure event message can be consumed by alpha exactly, but
>>>>> recovery need know all the participated transaction response to decide
>>>>> rollback or commit, so I think scan thread is also necessary.
>>>>
>>>> I am not sure, but I think Akka's persistence can solve this problem
>>>> you care about.
>>>> Of course, this ability needs to be verified
>>>>
>>>> Thanks,
>>>> Zhang Lei
>>>>
>>>>> On Jun 24, 2019, at 10:46 AM, 赵俊 <zhaoju...@jd.com> wrote:
>>>>>
>>>>> Hi, Zhang Lei
>>>>>
>>>>>> A1: I think we only need to ensure that the message can be reliably
>>>>>> delivered to the state machine. The state machine only records state
>>>>>> transitions synchronously when the transaction executes normally. At
>>>>>> present, the compensation method based on table scan is also
>>>>>> asynchronous. I am not sure if I have answered your question; if not,
>>>>>> please give me more information.
>>>>>
>>>>> If we have a mechanism that ensure main service can collect all the
>>>>> participated transaction response from alpha correctly before

Re: About alpha-fsm progress

2019-06-28 Thread Zheng Feng
Thanks Zhang Lei,

On Fri, Jun 28, 2019 at 5:50 PM, Zhang Lei wrote:

> Hi, All
>
> alpha-fsm has been pushed to the branch SCB-1321
>
> Completed:
> 1. State machine design document[1]
> 2. State machine prototype
> 3. State machine test case
> 4. Receive saga events using the internal message bus
>
> Key emphasis of next stage in work:
> In order to carry out the feasibility verification as soon as possible, I
> will not consider the reliability issue for the time being.
> 1. Refactor Omega components, add SagaAbortedEvent, SagaTimeoutEvent,
> TxComponsitedEvent
> 2. Save compensation method parameters in Actor and trigger compensation
> in Actor
> 3. Do not use Kafka and only verify single node alpha, The Alpha server
> receives the saga event and puts it into the internal message bus.
>
It looks good to me!

>
> Planning:
> 1. Persist actor data to the database when it terminates
>
What is the actor data? Is it all of the events?

> 2. Integration Kafka
>
So we can use Kafka as a message broker, and the invocations between
Omega and Alpha will become async?

> 3. Support WAL[2] recovery mode
>
Well, it looks interesting. Is it supported by Akka?

> 4. Verify Akka cluster reliability
>
> [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm
> [2] https://en.wikipedia.org/wiki/Write-ahead_logging
>
> if you have other comments, please let us know.
>
Good luck!

>
> Thanks,
> Lei Zhang
>
> > On Jun 27, 2019, at 9:50 AM, Willem Jiang wrote:
> >
> > We just leverage the message broker to make sure Alpha get the
> > transaction event from Omega.
> > In most cases Alpha don't need to talk back  to Omega, we just need to
> > make sure all the transaction message are stored (Alpha can process it
> > later).
> >
> > If Omega cannot talk the message broker, Omega should abort the
> > transaction processing with transport exception.
> >
> > Willem Jiang
> >
> > Twitter: willemjiang
> > Weibo: 姜宁willem
> >
> > On Tue, Jun 25, 2019 at 8:42 AM Zhang Lei  wrote:
> >>
> >> Hi, Zhang jun
> >>
> >>> I just cared about the recovery scan thread design.
> >>> Kafka can ensure event message can be consumed by alpha exactly, but
> recovery need know all the participated transaction response to decide
> rollback or commit, so I think scan thread is also necessary.
> >>
> >> I am not sure, but I think Akka's persistence can solve this problem
> you care about.
> >> Of course, this ability needs to be verified
> >>
> >> Thanks,
> >> Zhang Lei
> >>
> >>> On Jun 24, 2019, at 10:46 AM, 赵俊 wrote:
> >>>
> >>> Hi, Zhang Lei
> >>>
> >>>> A1: I think we only need to ensure that the message can be reliably
> >>>> delivered to the state machine. The state machine only records state
> >>>> transitions synchronously when the transaction executes normally. At
> >>>> present, the compensation method based on table scan is also
> >>>> asynchronous. I am not sure if I have answered your question; if not,
> >>>> please give me more information.
> >>>
> >>> If we have a mechanism that ensure main service can collect all the
> participated transaction response from alpha correctly before
> commit/rollback, it is OK.
> >>>
> >>>> Q2: We should also consider recovery; it seems that recovery is the
> >>>> same as before, based on the database.
> >>>> A2: I think the question you care about is how to recover when Alpha
> >>>> is down. This is a little different from the current version.
> >>>> 1. We can rely on Kafka's reliability and control the offset of the
> >>>> topic, consuming one message at a time.
> >>>> 2. Of course, we can also do some extra design for it, such as writing
> >>>> the data to a local log file after receiving the Kafka message, and
> >>>> resuming by reading that log file when the Alpha machine restarts.
> >>>
> >>> I just cared about the recovery scan thread design.
> >>> Kafka can ensure event message can be consumed by alpha exactly, but
> recovery need know all the participated transaction response to decide
> rollback or commit, so I think scan thread is also necessary.
> >>>
> >>>
> >>>
> >>>> On Jun 23, 2019, at 1:04 PM, Zhang Lei wrote:
> >>>>
> >>>> Hi, Zhao Jun
> >>>>
> >>>> Thank you for your reply!
> >>>>
> >>>> This design document does not elaborate on reliability aspects.
> >>>>
> >>>> My initial thought is this:
> >>>>
> >>>> Q1: It seems that omega should hold on after consuming the event
> >>>> message from Kafka instead of completing pushing message
> >>>> A1: I think we only need to ensure that the message can be reliably
> >>>> delivered to the state machine. The state machine only records state
> >>>> transitions synchronously when the transaction executes normally. At
> >>>> present, the compensation method based on table scan is also
> >>>> asynchronous. I am not sure if I have answered your question; if not,
> >>>> please give me more information.
> >>>>
> >>>> Q2: We should also consider recovery; it seems that recovery
> >>>> is

About alpha-fsm progress

2019-06-28 Thread Zhang Lei
Hi, All

alpha-fsm has been pushed to the branch SCB-1321

Completed:
1. State machine design document[1]
2. State machine prototype
3. State machine test case
4. Receive saga events using the internal message bus

Focus of the next stage:
To verify feasibility as soon as possible, I will not consider reliability
for the time being.
1. Refactor the Omega components and add SagaAbortedEvent, SagaTimeoutEvent,
and TxComponsitedEvent.
2. Save the compensation method parameters in the Actor and trigger
compensation from the Actor.
3. Do not use Kafka; verify only a single-node Alpha. The Alpha server
receives the saga event and puts it onto the internal message bus.
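
To make these items concrete, here is a minimal single-node sketch: saga
events go onto an in-process queue standing in for the internal message
bus, and a small state machine consumes them, keeping the compensation
parameters so that it can trigger compensation on abort. Only the event
names SagaAbortedEvent, SagaTimeoutEvent, and TxComponsitedEvent come from
the list above; the other event names, the states, and all class names are
assumptions, and no Akka API is used.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical single-node sketch; not the alpha-fsm implementation.
class SagaFsmSketch {
  enum State { IDLE, ACTIVE, COMPENSATING, COMPENSATED, COMMITTED, SUSPENDED }

  static class Event {
    final String type;      // e.g. "SagaAbortedEvent"
    final String localTxId; // null for saga-level events
    final String args;      // compensation parameters, if any

    Event(String type, String localTxId, String args) {
      this.type = type;
      this.localTxId = localTxId;
      this.args = args;
    }
  }

  // In-process queue standing in for the internal message bus.
  private final Queue<Event> bus = new ArrayDeque<>();
  // Compensation parameters saved per finished sub-transaction.
  private final Map<String, String> pending = new LinkedHashMap<>();
  // Records the compensation calls that would be sent to Omega.
  final List<String> compensationCalls = new ArrayList<>();
  State state = State.IDLE;

  void publish(Event event) {
    bus.add(event);
  }

  // Deliver every queued event to the state machine, in arrival order.
  void drain() {
    Event event;
    while ((event = bus.poll()) != null) {
      onEvent(event);
    }
  }

  private void onEvent(Event e) {
    switch (e.type) {
      case "SagaStartedEvent":
        state = State.ACTIVE;
        break;
      case "TxEndedEvent":
        // Keep the parameters so compensation can be triggered later.
        pending.put(e.localTxId, e.args);
        break;
      case "SagaEndedEvent":
        state = State.COMMITTED;
        break;
      case "SagaTimeoutEvent":
        state = State.SUSPENDED;
        break;
      case "SagaAbortedEvent":
        // Trigger compensation for every finished sub-transaction.
        for (Map.Entry<String, String> tx : pending.entrySet()) {
          compensationCalls.add(tx.getKey() + "(" + tx.getValue() + ")");
        }
        state = pending.isEmpty() ? State.COMPENSATED : State.COMPENSATING;
        break;
      case "TxComponsitedEvent":
        pending.remove(e.localTxId);
        if (state == State.COMPENSATING && pending.isEmpty()) {
          state = State.COMPENSATED;
        }
        break;
      default:
        break; // unknown events are ignored in this sketch
    }
  }

  public static void main(String[] args) {
    SagaFsmSketch fsm = new SagaFsmSketch();
    fsm.publish(new Event("SagaStartedEvent", null, null));
    fsm.publish(new Event("TxEndedEvent", "tx-1", "orderId=42"));
    fsm.publish(new Event("SagaAbortedEvent", null, null));
    fsm.drain();
    System.out.println(fsm.state + " " + fsm.compensationCalls);
    fsm.publish(new Event("TxComponsitedEvent", "tx-1", null));
    fsm.drain();
    System.out.println(fsm.state);
  }
}
```

Everything lives in memory here, which matches the decision to leave
reliability out of this stage.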

Planning:
1. Persist actor data to the database when it terminates
2. Integrate Kafka
3. Support WAL[2] recovery mode
4. Verify Akka cluster reliability

[1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm 

[2] https://en.wikipedia.org/wiki/Write-ahead_logging 


If you have any other comments, please let us know.

Thanks,
Lei Zhang

> On Jun 27, 2019, at 9:50 AM, Willem Jiang wrote:
> 
> We just leverage the message broker to make sure Alpha get the
> transaction event from Omega.
> In most cases Alpha don't need to talk back  to Omega, we just need to
> make sure all the transaction message are stored (Alpha can process it
> later).
> 
> If Omega cannot talk the message broker, Omega should abort the
> transaction processing with transport exception.
> 
> Willem Jiang
> 
> Twitter: willemjiang
> Weibo: 姜宁willem
> 
> On Tue, Jun 25, 2019 at 8:42 AM Zhang Lei  wrote:
>> 
>> Hi, Zhang jun
>> 
>>> I just cared about the recovery scan thread design.
>>> Kafka can ensure event message can be consumed by alpha exactly, but 
>>> recovery need know all the participated transaction response to decide 
>>> rollback or commit, so I think scan thread is also necessary.
>> 
>> I am not sure, but I think Akka's persistence can solve this problem you 
>> care about.
>> Of course, this ability needs to be verified
>> 
>> Thanks,
>> Zhang Lei
>> 
>>> On Jun 24, 2019, at 10:46 AM, 赵俊 wrote:
>>> 
>>> Hi, Zhang Lei
>>> 
>>>> A1: I think we only need to ensure that the message can be reliably
>>>> delivered to the state machine. The state machine only records state
>>>> transitions synchronously when the transaction executes normally. At
>>>> present, the compensation method based on table scan is also
>>>> asynchronous. I am not sure if I have answered your question; if not,
>>>> please give me more information.
>>> 
>>> If we have a mechanism that ensure main service can collect all the 
>>> participated transaction response from alpha correctly before 
>>> commit/rollback, it is OK.
>>> 
>>>> Q2: We should also consider recovery; it seems that recovery is the
>>>> same as before, based on the database.
>>>> A2: I think the question you care about is how to recover when Alpha
>>>> is down. This is a little different from the current version.
>>>> 1. We can rely on Kafka's reliability and control the offset of the
>>>> topic, consuming one message at a time.
>>>> 2. Of course, we can also do some extra design for it, such as writing
>>>> the data to a local log file after receiving the Kafka message, and
>>>> resuming by reading that log file when the Alpha machine restarts.
>>> 
>>> I just cared about the recovery scan thread design.
>>> Kafka can ensure event message can be consumed by alpha exactly, but 
>>> recovery need know all the participated transaction response to decide 
>>> rollback or commit, so I think scan thread is also necessary.
>>> 
>>> 
>>> 
>>>> On Jun 23, 2019, at 1:04 PM, Zhang Lei wrote:
>>>>
>>>> Hi, Zhao Jun
>>>>
>>>> Thank you for your reply!
>>>>
>>>> This design document does not elaborate on reliability aspects.
>>>>
>>>> My initial thought is this:
>>>>
>>>> Q1: It seems that omega should hold on after consuming the event message
>>>> from Kafka instead of completing pushing message
>>>> A1: I think we only need to ensure that the message can be reliably
>>>> delivered to the state machine. The state machine only records state
>>>> transitions synchronously when the transaction executes normally. At
>>>> present, the compensation method based on table scan is also
>>>> asynchronous. I am not sure if I have answered your question; if not,
>>>> please give me more information.
>>>>
>>>> Q2: We should also consider recovery; it seems that recovery is the
>>>> same as before, based on the database.
>>>> A2: I think the question you care about is how to recover when Alpha
>>>> is down. This is a little different from the current version.
>>>> 1. We can rely on Kafka's reliability and control the offset of the
>>>> topic, consuming one message at a time.
>>>> 2. Of course, we can also do some extra design for it, such as logging the
>>>> data log file locally