Hello Honza and others,

It seems Corosync is not reliable under network partition. Here is the test I ran:
Process P1 on node N1
Process P2 on node N2
Process P3 on node N2 again

Once all the processes have joined the cluster, this is what happens:

1. P1, in one of its threads, continuously (while(1)) multicasts messages.
2. All the processes, i.e. P1, P2 and P3:
   - have a separate listening thread calling cpg_dispatch()
   - inside cpg_deliver_fn_t, printf a counter representing how many messages have been received
   - inside cpg_confchg_fn_t, put the process to sleep whenever any process leaves the cluster

First test case: P2 was stopped forcefully.
- P1 and P3 received the same number of messages before the configuration change message
- PASS: expected result, configuration message order is maintained

Second test case: manually pulled the cable connecting node N1 and N2.
- P1 received more messages than P2 and P3 before the configuration change message was delivered
- FAIL: the configuration message was not delivered in order

In case of network partition, configuration messages are not ordered with regard to the 'extended virtual synchrony' property. I believe a CPG_TYPE_SAFE implementation is required not only to guarantee that messages are received at all the processes, but also to guarantee the ordering of configuration messages under network partition.

--
Satish

On Mon, Jun 6, 2016 at 9:50 PM, satish kumar <satish.kr2...@gmail.com> wrote:
> Thanks, really appreciate your help.
>
> On Mon, Jun 6, 2016 at 9:17 PM, Jan Friesse <jfrie...@redhat.com> wrote:
>
>>> But C1 is *guaranteed* to deliver *before* m(k)?
>>
>> Yes
>>
>>> No case where C1 is delivered after m(k)?
>>
>> Nope.
>>
>>> Regards,
>>> Satish
>>>
>>> On Mon, Jun 6, 2016 at 8:10 PM, Jan Friesse <jfrie...@redhat.com> wrote:
>>>
>>>> satish kumar napsal(a):
>>>>
>>>>> Hello honza, thanks for the response !
>>>>>
>>>>> With state sync, I simply mean that the 'k-1' messages were delivered
>>>>> to N1, N2 and N3, and they have applied these messages to change their
>>>>> program state.
>>>>> N1.state = apply(m(k-1))
>>>>> N2.state = apply(m(k-1))
>>>>> N3.state = apply(m(k-1))
>>>>>
>>>>> The document you shared cleared many doubts. However, I still need
>>>>> one clarification.
>>>>>
>>>>> According to the document:
>>>>> "The configuration change messages warn the application that a
>>>>> membership change has occurred, so that the application program can
>>>>> take appropriate action based on the membership change. Extended
>>>>> virtual synchrony guarantees a consistent order of message delivery
>>>>> across a partition, which is essential if the application programs
>>>>> are to be able to reconcile their states following repair of a failed
>>>>> processor or reemergence of the partitioned network."
>>>>>
>>>>> I just want to confirm that this property is not something related to
>>>>> CPG_TYPE_SAFE, which is still not implemented.
>>>>> Please consider this scenario:
>>>>> 0. N1, N2 and N3 have received the message m(k-1).
>>>>> 1. N1 mcast(CPG_TYPE_AGREED)s the message m(k).
>>>>> 2. As it is not CPG_TYPE_SAFE, m(k) is delivered to N1 but not yet
>>>>> delivered to N2 and N3.
>>>>> 3. A network partition separates N1 from N2 and N3. N2 and N3 can
>>>>> never see m(k).
>>>>> 4. The configuration change message is now delivered to N1, N2 and N3.
>>>>>
>>>>> Here, N1 will change its state to N1.state = apply(m(k)), thinking
>>>>> everyone in the current configuration has received the message.
>>>>>
>>>>> According to your reply it looks like N1 will not receive m(k). So
>>>>> this is what each node will see:
>>>>> N1 will see: m(k-1) -> C1 (config change)
>>>>> N2 will see: m(k-1) -> C1 (config change)
>>>>> N3 will see: m(k-1) -> C1 (config change)
>>>>
>>>> For N2 and N3, it's not the same C1. So let's call it C2. Because C1
>>>> for N1 is (N2 and N3 left) and C2 for N2 and N3 is (N1 left).
>>>>> Message m(k) will be discarded, and will not be delivered to N1 even
>>>>> if it was sent by N1 before the network partition.
>>>>
>>>> No. m(k) will be delivered to the app running on N1. So N1 will see
>>>> m(k-1), C1, m(k). So the application knows exactly which nodes got
>>>> message m(k).
>>>>
>>>> Regards,
>>>> Honza
>>>>
>>>>> This is the expected behavior with CPG_TYPE_AGREED?
>>>>>
>>>>> Regards,
>>>>> Satish
>>>>>
>>>>> On Mon, Jun 6, 2016 at 4:15 PM, Jan Friesse <jfrie...@redhat.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Virtual Synchrony Property - messages are delivered in agreed order,
>>>>>>> and configuration changes are delivered in agreed order relative to
>>>>>>> messages.
>>>>>>>
>>>>>>> What happens to this property when the network partitions the
>>>>>>> cluster into two? Consider the following scenario (which I took from
>>>>>>> a previous query by Andrei Elkin):
>>>>>>>
>>>>>>> * N1, N2 and N3 are in state sync with m(k-1) messages delivered.
>>>>>>
>>>>>> What exactly do you mean by "state sync"?
>>>>>>
>>>>>>> * N1 sends m(k) and just then a network partition separates node N1
>>>>>>> from N2 and N3.
>>>>>>> Does CPG_TYPE_AGREED guarantee that virtual synchrony is held?
>>>>>>
>>>>>> Yes it does (actually a higher level of VS called EVS).
>>>>>>
>>>>>>> When the property holds, the configuration change message C1 is
>>>>>>> guaranteed to be delivered before m(k) to N1.
>>>>>>> N1 will see: m(k-1) C1 m(k)
>>>>>>> N2 and N3 will see: m(k-1) C1
>>>>>>>
>>>>>>> But if this property is violated:
>>>>>>> N1 will see: m(k-1) m(k) C1
>>>>>>> N2 and N3 will see: m(k-1) C1
>>>>>>>
>>>>>>> A violation would screw up any user application running on the
>>>>>>> cluster.
>>>>>>>
>>>>>>> Could someone please explain what the behavior of Corosync is in
>>>>>>> this scenario with CPG_TYPE_AGREED ordering?
>>>>>> For a description of how exactly totem synchronization works, take a
>>>>>> look at
>>>>>> http://corosync.github.com/corosync/doc/DAAgarwal.thesis.ps.gz
>>>>>>
>>>>>> After totem is synchronized, there is another level of
>>>>>> synchronization of services (not described in the above doc). All
>>>>>> services synchronize in a very similar way, so you can take CPG as an
>>>>>> example. Basically the only state held by CPG is connected clients,
>>>>>> so every node sends its connected-clients list to every other node.
>>>>>> If sync is aborted (by a change of membership), it's restarted. These
>>>>>> sync messages have priority over user messages (actually it's not
>>>>>> possible to send messages during sync). A user app can be sure that a
>>>>>> message was delivered only after it gets its own message back. The
>>>>>> app also gets the configuration change message, so it knows who got
>>>>>> the message.
>>>>>>
>>>>>> Regards,
>>>>>> Honza
>>>>>>
>>>>>>> Regards,
>>>>>>> Satish
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org