On 07/06/2011 05:24 PM, Tim Beale wrote:
> Hi,
> 
> We've hit a problem in the recovery code and I'm struggling to understand why
> we do the following:
> 
>       /*
>        * The recovery sort queue now becomes the regular
>        * sort queue.  It is necessary to copy the state
>        * into the regular sort queue.
>        */
>       sq_copy (&instance->regular_sort_queue, &instance->recovery_sort_queue);
> 
> The problem we're seeing is sometimes we get an encapsulated message from the
> recovery queue copied onto the regular queue, and corosync then crashes trying
> to process the message. (When it strips off the totemsrp header it gets another
> totemsrp header rather than the totempg header it expects.)
> 
> The problem seems to happen when we only do the sq_items_release() for a subset
> of the recovery messages, e.g. there are 12 messages on the recovery queue and
> we only free/release 5 of them. The remaining encapsulated recovery messages
> get left on the regular queue and corosync crashes trying to deliver them.
> 
> It looks to me like deliver_messages_from_recovery_to_regular() handles the
> encapsulation correctly, stripping the extra header and adding the recovery
> messages to the regular queue. But then the sq_copy() just seems to overwrite
> the regular queue.
> 
> We've avoided the crash in the past by just reiniting both queues, but I don't
> think this is the best solution.
> 

I would expect that workaround (reinitializing both queues) to lead to
message loss or a lockup of the protocol.
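
To make the failure mode described above concrete, here is a toy model in C.
It is only a sketch under loud assumptions: every toy_* name, the one-byte
"headers", and the fixed-size queue are invented for illustration and are
not corosync's sq.h or totemsrp data structures. It shows only why a
wholesale queue copy can leave a still-encapsulated message where delivery
expects an unwrapped one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-byte header tags; real totemsrp/totempg headers are richer. */
enum toy_hdr { TOY_HDR_SRP = 1, TOY_HDR_PG = 2 };

/* Toy message: a few bytes, where each leading byte is one "header". */
struct toy_msg {
    unsigned char bytes[8];
    size_t len;
};

#define TOY_QUEUE_SLOTS 16

/* Toy sort queue: one slot per sequence number. */
struct toy_queue {
    struct toy_msg items[TOY_QUEUE_SLOTS];
    int in_use[TOY_QUEUE_SLOTS];
};

static void toy_queue_init(struct toy_queue *q)
{
    memset(q, 0, sizeof(*q));
}

static void toy_queue_put(struct toy_queue *q, size_t seq,
                          const struct toy_msg *m)
{
    assert(seq < TOY_QUEUE_SLOTS);
    q->items[seq] = *m;
    q->in_use[seq] = 1;
}

/* Wholesale copy: everything in dst is overwritten by src. */
static void toy_queue_copy(struct toy_queue *dst, const struct toy_queue *src)
{
    *dst = *src;
}

/* A regular message: SRP header, then PG header, then one payload byte. */
static struct toy_msg toy_regular_msg(void)
{
    struct toy_msg m = { { TOY_HDR_SRP, TOY_HDR_PG, 'x' }, 3 };
    return m;
}

/* Encapsulation for recovery: prepend one more SRP header. */
static struct toy_msg toy_encapsulate(const struct toy_msg *m)
{
    struct toy_msg out = { { 0 }, 0 };
    assert(m->len + 1 <= sizeof(out.bytes));
    out.bytes[0] = TOY_HDR_SRP;
    memcpy(out.bytes + 1, m->bytes, m->len);
    out.len = m->len + 1;
    return out;
}

/* Strip the outer SRP header; return the tag of the header now in front. */
static enum toy_hdr toy_strip_srp(const struct toy_msg *m,
                                  struct toy_msg *inner)
{
    assert(m->len >= 2 && m->bytes[0] == TOY_HDR_SRP);
    memset(inner, 0, sizeof(*inner));
    inner->len = m->len - 1;
    memcpy(inner->bytes, m->bytes + 1, inner->len);
    return (enum toy_hdr)inner->bytes[0];
}

/* What delivery finds after stripping the outer header of slot seq. */
static enum toy_hdr toy_deliver_hdr(const struct toy_queue *q, size_t seq)
{
    struct toy_msg inner;
    assert(seq < TOY_QUEUE_SLOTS && q->in_use[seq]);
    return toy_strip_srp(&q->items[seq], &inner);
}
```

In this model, a message that is unwrapped before being placed on the
regular queue yields TOY_HDR_PG at delivery. If the recovery queue is then
copied over the regular queue, the same slot holds the encapsulated form
again and delivery finds TOY_HDR_SRP instead, which corresponds to the
"another totemsrp header" symptom in the report.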

> Any advice would be appreciated.
> 
> Thanks,
> Tim

A proper fix should be in the following commits:
master:
7d5e588931e4393c06790995a995ea69e6724c54
flatiron-1.3:
8603ff6e9a270ecec194f4e13780927ebeb9f5b2

A new flatiron-1.3 release is in the works.  There are other totem bugs
you may wish to backport in the meantime.

Let us know if that commit fixes the problem you encountered.

Regards
-steve

> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
