[Openais] WORKAROUND Assert at totemsrp.c:1194 after FAILED TO RECEIVE

Steven Dake Wed, 09 Feb 2011 12:55:21 -0800

Good news on this issue.  Honza has managed to create a reproducer for
this issue using iptables.  His efforts will eventually lead to a fix
for the assertion.


As a good workaround for switches that are not entirely broken, add the
following line to the config totem section:

fail_recv_const: 5000

The current default (50) is too low for most modern switch hardware.

This abort triggered because the aru doesn't increase for 50 token
rotations combined with a defect in how failed to recv conditions are
handled.  By increasing this tunable, the condition should no longer
trigger the errant code.  We will also address the assertion, since it
shouldn't assert...

Big thanks to Honza for helping sort out a reproducer for this issue.

Regards
-steve


On 02/08/2011 05:54 PM, [email protected] wrote:
> Hi Steven,
> 
>> Have a try of the patch i have sent to this ml.  If the issue persists,
>> we can look at more options.
> 
> Thank you for comment.
> 
> Is your patch 2 of the next?
> 
>  * [Openais] [PATCH] When a failed to recv state happens,stop forwarding the 
> token
> 
>  * [Openais] [PATCH] When a failed to recv state happens,stop forwarding the 
> token(take 2)
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> 
> --- Steven Dake <[email protected]> wrote:
> 
>> On 02/07/2011 11:11 PM, [email protected] wrote:
>>> Hi Steven,
>>>
>>> I understood your opinion by mistake.
>>>
>>> We do not have simple test case.
>>>
>>> The phenomenon generated in our environment is the following thing.
>>>
>>> Step 1) corosync constitutes a cluster in 12 nodes.
>>>  * begin communication in TOKEN
>>>
>>> Step 2) One node raises [FAILED TO RECEIVE].
>>>
>>> Step 3) 12 nodes begin the reconfiguration of the cluster again.
>>>
>>> Step 4) The node that occurred fails([FAILED TO RECEIVE]) in an consensus 
>>> of the JOIN
>> communication.
>>>  * Because the node failed in an consensus, node make contents of faildlist 
>>> and proclist same.
>>>  * And this node compares faildlist with proclist and assert-fail happened.
>>>
>>>
>>> When the node that made a cluster stood alone, I think that assert() is 
>>> unnecessary.
>>>
>>> Because the reason is because there is the next processing.
>>>
>>>
>>
>> Have a try of the patch i have sent to this ml.  If the issue persists,
>> we can look at more options.
>>
>> Thanks!
>> -steve
>>
>>
>>>
>>> static void memb_join_process (
>>>     struct totemsrp_instance *instance,
>>>     const struct memb_join *memb_join)
>>> {
>>>     struct srp_addr *proc_list;
>>>     struct srp_addr *failed_list;
>>> (snip)
>>>                             instance->failed_to_recv = 0;
>>>                             srp_addr_copy (&instance->my_proc_list[0],
>>>                                     &instance->my_id);
>>>                             instance->my_proc_list_entries = 1;
>>>                             instance->my_failed_list_entries = 0;
>>>
>>>                             memb_state_commit_token_create (instance);
>>>
>>>                             memb_state_commit_enter (instance);
>>>                             return;
>>>
>>> (snip)
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>>
>>> --- [email protected] wrote:
>>>
>>>> Hi Steven,
>>>>
>>>>> Hideo,
>>>>>
>>>>> If you have a test case, I can make a patch for you to try.
>>>>>
>>>>
>>>> All right.
>>>>
>>>> We use corosync.1.3.0.
>>>>
>>>> Please send me patch.
>>>>
>>>> Best Regards,
>>>> Hideo Yamauchi.
>>>>
>>>> --- Steven Dake <[email protected]> wrote:
>>>>
>>>>> On 02/06/2011 09:16 PM, [email protected] wrote:
>>>>>> Hi Steven,
>>>>>> Hi Dejan,
>>>>>>
>>>>>>>>>> This code never got a chance to run because on failed_to_recv
>>>>>>>>>> the two sets (my_process_list and my_failed_list) are equal which
>>>>>>>>>> makes the assert fail in memb_consensus_agreed():
>>>>>>
>>>>>> The same problem occurs, and we are troubled, too. 
>>>>>>
>>>>>> How did this argument turn out?
>>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>
>>>>> Hideo,
>>>>>
>>>>> If you have a test case, I can make a patch for you to try.
>>>>>
>>>>> Regards
>>>>> -steve
>>>>>
>>>>>>
>>>>>> --- Dejan Muhamedagic <[email protected]> wrote:
>>>>>>
>>>>>>> nudge, nudge
>>>>>>>
>>>>>>> On Wed, Jan 05, 2011 at 02:05:55PM +0100, Dejan Muhamedagic wrote:
>>>>>>>> On Tue, Jan 04, 2011 at 01:53:00PM -0700, Steven Dake wrote:
>>>>>>>>> On 12/23/2010 06:14 AM, Dejan Muhamedagic wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On Wed, Dec 01, 2010 at 05:30:44PM +0200, Vladislav Bogdanov wrote:
>>>>>>>>>>> 01.12.2010 16:32, Dejan Muhamedagic wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Nov 23, 2010 at 12:53:42PM +0200, Vladislav Bogdanov wrote:
>>>>>>>>>>>>> Hi Steven, hi all.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I often see this assert on one of nodes after I stop corosync on 
>>>>>>>>>>>>> some
>>>>>>>>>>>>> another node in newly-setup 4-node cluster.
>>>>>>>>>>>>
>>>>>>>>>>>> Does the assert happen on a node lost event? Or once new
>>>>>>>>>>>> partition is formed?
>>>>>>>>>>>
>>>>>>>>>>> I first noticed it when I rebooted another node, just after console 
>>>>>>>>>>> said
>>>>>>>>>>> that OpenAIS is stopped.
>>>>>>>>>>>
>>>>>>>>>>> Can't say right now, what exactly event did it follow, I'm actually
>>>>>>>>>>> fighting with several problems with corosync, pacemaker, NFS4 and
>>>>>>>>>>> phantom uncorrectable ECC errors simultaneously and I'm a bit lost 
>>>>>>>>>>> with
>>>>>>>>>>> all of them.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> #0  0x00007f51953e49a5 in raise () from /lib64/libc.so.6
>>>>>>>>>>>>> #1  0x00007f51953e6185 in abort () from /lib64/libc.so.6
>>>>>>>>>>>>> #2  0x00007f51953dd935 in __assert_fail () from /lib64/libc.so.6
>>>>>>>>>>>>> #3  0x00007f5196176406 in memb_consensus_agreed
>>>>>>>>>>>>> (instance=0x7f5196554010) at totemsrp.c:1194
>>>>>>>>>>>>> #4  0x00007f519617b2f3 in memb_join_process 
>>>>>>>>>>>>> (instance=0x7f5196554010,
>>>>>>>>>>>>> memb_join=0x262f628) at totemsrp.c:3918
>>>>>>>>>>>>> #5  0x00007f519617b619 in message_handler_memb_join
>>>>>>>>>>>>> (instance=0x7f5196554010, msg=<value optimized out>, 
>>>>>>>>>>>>> msg_len=<value
>>>>>>>>>>>>> optimized out>, endian_conversion_needed=<value optimized out>)
>>>>>>>>>>>>>     at totemsrp.c:4161
>>>>>>>>>>>>> #6  0x00007f5196173ba7 in passive_mcast_recv 
>>>>>>>>>>>>> (rrp_instance=0x2603030,
>>>>>>>>>>>>> iface_no=0, context=<value optimized out>, msg=<value optimized 
>>>>>>>>>>>>> out>,
>>>>>>>>>>>>> msg_len=<value optimized out>) at totemrrp.c:720
>>>>>>>>>>>>> #7  0x00007f5196172b44 in rrp_deliver_fn (context=<value 
>>>>>>>>>>>>> optimized out>,
>>>>>>>>>>>>> msg=0x262f628, msg_len=420) at totemrrp.c:1404
>>>>>>>>>>>>> #8  0x00007f5196171a76 in net_deliver_fn (handle=<value optimized 
>>>>>>>>>>>>> out>,
>>>>>>>>>>>>> fd=<value optimized out>, revents=<value optimized out>, 
>>>>>>>>>>>>> data=0x262ef80)
>>>>>>>>>>>>> at totemudp.c:1244
>>>>>>>>>>>>> #9  0x00007f519616d7f2 in poll_run (handle=4858364909567606784) at
>>>>>>>>>>>>> coropoll.c:510
>>>>>>>>>>>>> #10 0x0000000000406add in main (argc=<value optimized out>, 
>>>>>>>>>>>>> argv=<value
>>>>>>>>>>>>> optimized out>, envp=<value optimized out>) at main.c:1680
>>>>>>>>>>>>>
>>>>>>>>>>>>> Last fplay lines are:
>>>>>>>>>>>>>
>>>>>>>>>>>>> rec=[36124] Log Message=Delivering MCAST message with seq 1366 to
>>>>>>>>>>>>> pending delivery queue
>>>>>>>>>>>>> rec=[36125] Log Message=Delivering MCAST message with seq 1367 to
>>>>>>>>>>>>> pending delivery queue
>>>>>>>>>>>>> rec=[36126] Log Message=Received ringid(10.5.4.52:12660) seq 1366
>>>>>>>>>>>>> rec=[36127] Log Message=Received ringid(10.5.4.52:12660) seq 1367
>>>>>>>>>>>>> rec=[36128] Log Message=Received ringid(10.5.4.52:12660) seq 1366
>>>>>>>>>>>>> rec=[36129] Log Message=Received ringid(10.5.4.52:12660) seq 1367
>>>>>>>>>>>>> rec=[36130] Log Message=releasing messages up to and including 
>>>>>>>>>>>>> 1367
>>>>>>>>>>>>> rec=[36131] Log Message=FAILED TO RECEIVE
>>>>>>>>>>>>> rec=[36132] Log Message=entering GATHER state from 6.
>>>>>>>>>>>>> rec=[36133] Log Message=entering GATHER state from 0.
>>>>>>>>>>>>> Finishing replay: records found [33993]
>>>>>>>>>>>>>
>>>>>>>>>>>>> What could be the reason for this? Bug, switches, memory errors?
>>>>>>>>>>>>
>>>>>>>>>>>> The assertion fails because corosync finds out that
>>>>>>>>>>>> instance->my_proc_list and instance->my_failed_list are
>>>>>>>>>>>> equal. That happens immediately after the "FAILED TO RECEIVE"
>>>>>>>>>>>> message which is issued when fail_recv_const token rotations
>>>>>>>>>>>> happened without any multicast packet received (defaults to 50).
>>>>>>>>>>
>>>>>>>>>> I took a look at the code and the protocol specification again
>>>>>>>>>> and it seems like that assert is not valid since Steve patched
>>>>>>>>>> the part dealing with the "FAILED TO RECEIVE" condition. The
>>>>>>>>>> patch is from 2010-06-03 posted to the list here
>>>>>>>>>> http://marc.info/?l=openais&m=127559807608484&w=2
>>>>>>>>>>
>>>>>>>>>> The last hunk of the patch contains this code (exec/totemsrp.c):
>>>>>>>>>>
>>>>>>>>>> 3933         if (memb_consensus_agreed (instance) && 
>>>>>>>>>> instance->failed_to_recv == 1) {  
>>
>>>>  
>>>>>>>
>>>>>>>>>> 3934                 instance->failed_to_recv = 0;
>>>>>>>>>> 3935                 srp_addr_copy (&instance->my_proc_list[0],
>>>>>>>>>> 3936                     &instance->my_id);
>>>>>>>>>> 3937                 instance->my_proc_list_entries = 1;
>>>>>>>>>> 3938                 instance->my_failed_list_entries = 0;
>>>>>>>>>> 3939            
>>>>>>>>>> 3940                 memb_state_commit_token_create (instance);
>>>>>>>>>> 3941            
>>>>>>>>>> 3942                 memb_state_commit_enter (instance);
>>>>>>>>>> 3943                 return;
>>>>>>>>>> 3944         }
>>>>>>>>>>
>>>>>>>>>> This code never got a chance to run because on failed_to_recv
>>>>>>>>>> the two sets (my_process_list and my_failed_list) are equal which
>>>>>>>>>> makes the assert fail in memb_consensus_agreed():
>>>>>>>>>>
>>>>>>>>>> 1185     memb_set_subtract (token_memb, &token_memb_entries,
>>>>>>>>>> 1186         instance->my_proc_list, instance->my_proc_list_entries,
>>>>>>>>>> 1187         instance->my_failed_list, 
>>>>>>>>>> instance->my_failed_list_entries);
>>>>>>>>>> ...
>>>>>>>>>> 1195     assert (token_memb_entries >= 1);
>>>>>>>>>>
>>>>>>>>>> In other words, it's something like this:
>>>>>>>>>>
>>>>>>>>>>      if A:
>>>>>>>>>>              if memb_consensus_agreed() and failed_to_recv:
>>>>>>>>>>                      form a single node ring and try to recover
>>>>>>>>>>
>>>>>>>>>>      memb_consensus_agreed():
>>>>>>>>>>              assert(!A)
>>>>>>>>>>
>>>>>>>>>> Steve, can you take a look and confirm that this holds.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dejan,
>>>>>>>>>
>>>>>>>>> sorry for delay in response - big backlog which is mostly cleared out 
>>>>>>>>> :)
>>>>>>>>
>>>>>>>> No problem.
>>>>>>>>
>>>>>>>>> The assert definitely isn't correct, but removing it without 
>>>>>>>>> addressing
>>>>>>>>> the contents of the proc and fail lists is also not right.  That would
>>>>>>>>> cause the logic in the if statement at line 3933 not to be executed
>>>>>>>>> (because the first part of the if would evaluate to false)
>>>>>>>>
>>>>>>>> Actually it wouldn't. The agreed variable is set to 1 and it
>>>>>>>> is going to be returned unchanged.
>>>>>>>>
>>>>>>>>> I believe
>>>>>>>>> what we should do is check the "failed_to_recv" value in
>>>>>>>>> memb_consensus_agreed instead of at line 3933.
>>>>>>>>>
>>>>>>>>> The issue with this is memb_state_consensus_timeout_expired which also
>>>>>>>>> executes some 'then' logic where we may not want to execute the
>>>>>>>>> failed_to_recv logic.
>>>>>>>>
>>>>>>>> Perhaps we should just
>>>>>>>>
>>>>>>>> 3933         if (instance->failed_to_recv == 1) {
>>>>>>>>
>>>>>>>> ? In case failed_to_recv both proc and fail lists are equal so
>>>>>>>> checking for memb_consensus_agreed won't make sense, right?
>>>>>>>>
>>>>>>>>> If anyone has a reliable reproducer and can forward to me, I'll test 
>>>>>>>>> out
>>>>>>>>> a change to address this problem.  Really hesitant to change anything 
>>>>>>>>> in
>>>>>>>>> totemsrp without a test case for this problem - its almost perfect ;-)
>>>>>>>>
>>>>>>>> Since the tester upgraded the switch firmware they couldn't
>>>>>>>> reproduce it anymore.
>>>>>>>>
>>>>>>>> Would compiling with these help?
>>>>>>>>
>>>>>>>> /*
>>>>>>>>  * These can be used to test the error recovery algorithms
>>>>>>>>  * #define TEST_DROP_ORF_TOKEN_PERCENTAGE 30
>>>>>>>>  * #define TEST_DROP_COMMIT_TOKEN_PERCENTAGE 30
>>>>>>>>  * #define TEST_DROP_MCAST_PERCENTAGE 50
>>>>>>>>  * #define TEST_RECOVERY_MSG_COUNT 300
>>>>>>>>  */
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Dejan
>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> -steve
>>>>>>>>>
>>>>>>>>>> Dejan
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Openais mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Openais mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>>>>>>> _______________________________________________
>>>>>>>> Openais mailing list
>>>>>>>> [email protected]
>>>>>>>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>>>>>> _______________________________________________
>>>>>>> Openais mailing list
>>>>>>> [email protected]
>>>>>>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Openais mailing list
>>>>>> [email protected]
>>>>>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Openais mailing list
>>>> [email protected]
>>>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>>>
>>>
>>> _______________________________________________
>>> Openais mailing list
>>> [email protected]
>>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>
>>
> 
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais

[Openais] WORKAROUND Assert at totemsrp.c:1194 after FAILED TO RECEIVE

Reply via email to