Re: [Sequoia] Not The Last Man Down

Gérard BUNEL Wed, 17 Oct 2007 06:19:08 -0700

Olivier Fambon wrote:

Gérard BUNEL [17/10/07 12:01]:
Hello,
Hello Gérard,
We have 2 controllers, each with 2 backends. The VDB is in autoload configuration.
We stop one controller, then restart it. Using the console we can see that the VDB is not mounted, and the log is shown below. As the controller has just been shut down and then restarted, it was effectively the last man down. So why is Sequoia complaining about not being the last man down?
If I understand you correctly, you keep the other controller up, so doing down-up on one controller does not make it the last man down: there remains one man up! Maybe the term "last man down" is misleading. It is meant to identify the last vdb part that went down in a full (all parts) vdb shutdown.

So when only one controller goes down while the second stays up, this check should not occur.

The log states that there is an error while trying to join the group. So maybe an Appia problem?
First, a side note: the error shows at group-join time, but this is really a by-product. It actually occurs at vdb.xml load time, when we check for the last-man-down condition IF we are the first vdb part in the group.
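As a rough illustration of the check just described, here is a minimal sketch. It is not Sequoia's actual code: the function and argument names are hypothetical, and the behaviour when other members are already in the group is an assumption.

```python
def may_load_vdb_part(first_in_group: bool, was_last_man_down: bool) -> bool:
    """Illustrative decision made when a vdb part is loaded (hypothetical names)."""
    if not first_in_group:
        # Assumption: other members are already up, so the
        # last-man-down check is not applied to this part.
        return True
    # First (and only) member in the group: loading is accepted only if
    # this part was the last one to go down in a full vdb shutdown.
    return was_last_man_down


if __name__ == "__main__":
    # A restarted controller that cannot see its peer believes it is first
    # in the group, but it was not the last man down, so the load is refused.
    print(may_load_vdb_part(first_in_group=True, was_last_man_down=False))
```

Under this reading, a controller that is restarted while its peer stays up should never reach the last-man-down check, unless it fails to see the peer.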
Now for your case: according to your logs, the vdb part you are attempting to load is actually the first member in the group:

It is because it didn't find the other controller. It should, as that one remains up. We use a static view in our Appia configuration, so multicast is not used to discover the other members of the group:

<channel name="TCP SEQ Channel" template="tcp_sequencer" initialized="yes">
       <memorymanagement size="40000000" up_threshold="15000000" down_threshold="7000000" />
       <chsession name="hederalayer">
               <parameter name="base_view">ptrimd01:27752,btrimd01:27752</parameter>
               <parameter name="base_endpoints">ptrimd01,btrimd01</parameter>
               <parameter name="initial_endpoints">ptrimd01</parameter>
               <parameter name="local_address">ptrimd01:27752</parameter>
               <parameter name="local_endpoint">ptrimd01</parameter>
       </chsession>
       <chsession name="suspectl">
               <parameter name="suspect_sweep">10000</parameter>
               <parameter name="suspect_time">30000</parameter>
       </chsession>
</channel>

But currently we do not understand the full meaning of this part of the configuration. In particular, the 2 parameters suspect_sweep and suspect_time are a bit obscure to us (these configuration elements were provided by another Sequoia user who encountered the same problem as us when using multicast discovery). These 2 parameters seem to be important, as I have also encountered a loss of sync between the controllers when multiplying these values by 2.

When we restart the whole platform by first recreating the recoveryLog on each controller, we do not have any problem. Each VDB is mounted correctly and each controller is added to the group.

2007-10-17 11:56:06,201 INFO controller.virtualdatabase.MATISSEDB First controller in group MATISSEDB
So either the other controller is down - and we are not in the case you describe - or the controller you are restarting cannot see the other one - and you have a connectivity issue.
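One quick way to tell these two cases apart is to test plain TCP reachability of each member listed in base_view from the controller being restarted. The following is only a sketch: the helper names are hypothetical, and a successful TCP connection shows that the port is open, not that the Appia group layer is healthy.

```python
import socket


def parse_view(view):
    """Split a 'host:port,host:port' string into (host, port) pairs."""
    peers = []
    for entry in view.split(","):
        host, _, port = entry.strip().rpartition(":")
        peers.append((host, int(port)))
    return peers


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Value copied from the base_view parameter of the configuration.
    for host, port in parse_view("ptrimd01:27752,btrimd01:27752"):
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

If the peer that stayed up shows as not reachable from the restarted host, the problem is on the network or name-resolution side rather than in the last-man-down logic.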
Hope this helps.

A+O.
_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia



--

Gérard BUNEL
Project Manager
____________________________________________________________________


Technopôle Brest Iroise
Site du Vernis – CS 23866
29238 Brest Cedex 3
Tél : + 33 2 98 05 43 21
Fax : + 33 2 98 05 20 34
www.altran.com <http://www.altran.com>
