Olivier Fambon wrote:
Gérard BUNEL [17/10/07 12:01]:
Hello,
Héllo Gérard,
We have 2 controllers, each with 2 backends. The VDB is in autoload config.
We stop one controller, then restart it. Using the console we can see
that the VDB is not mounted, and the log is as shown below.
As the controller has just been shut down and then restarted, it was
effectively the last man down. So why is Sequoia complaining about
not being the last man down?
If I understand you correctly, you keep the other controller up, so
doing down-up on one controller does not make it the last man down:
there remains one man up! Maybe the term "last man down" is
misleading. It is meant to identify the last vdb part that went
down in a full (all parts) vdb shutdown.
So if only one controller went down while the second stayed up, this
check should not occur.
The log states that there is an error while trying to join the group.
So maybe an Appia problem?
First, a side note: the error shows at group-join time, but this is
really a by-product. It actually happens at vdb.xml load time, which is
when we check for the last-man-down condition, IF we are the first vdb
part in the group.
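To make the rule above concrete, here is a minimal toy model of it in Python. It is only a sketch of the logic as described in this thread, not Sequoia's actual code: the `VdbPart` class, the `was_last_man_down` flag and the `may_load` function are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VdbPart:
    """One controller's part of a virtual database (toy model)."""
    controller: str
    was_last_man_down: bool  # hypothetical flag, assumed recorded at shutdown


def may_load(part: VdbPart, members_already_in_group: Sequence[str]) -> bool:
    """Return True if this vdb part may be loaded.

    The last-man-down check only applies when the part is the first
    member of the group; if a peer is already there, the check is skipped.
    """
    first_in_group = len(members_already_in_group) == 0
    if not first_in_group:
        return True
    return part.was_last_man_down


# A controller restarted while its peer stayed up.
restarted = VdbPart("ptrimd01", was_last_man_down=False)
print(may_load(restarted, ["btrimd01"]))  # peer visible: check is skipped
print(may_load(restarted, []))            # peer not visible: check applies
```

In this model, a restarted controller that cannot see its peer believes it is first in the group, so the check applies and fails, which matches the symptom reported here.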
Now for your case: according to your logs, the vdb part you are
attempting to load is actually the first member in the group:
It is because it didn't find the other controller. It should, since it
remains up. We use a static view in our Appia configuration, so multicast
is not used to discover other members of the group:
<channel name="TCP SEQ Channel" template="tcp_sequencer" initialized="yes">
  <memorymanagement size="40000000" up_threshold="15000000"
                    down_threshold="7000000" />
  <chsession name="hederalayer">
    <parameter name="base_view">ptrimd01:27752,btrimd01:27752</parameter>
    <parameter name="base_endpoints">ptrimd01,btrimd01</parameter>
    <parameter name="initial_endpoints">ptrimd01</parameter>
    <parameter name="local_address">ptrimd01:27752</parameter>
    <parameter name="local_endpoint">ptrimd01</parameter>
  </chsession>
  <chsession name="suspectl">
    <parameter name="suspect_sweep">10000</parameter>
    <parameter name="suspect_time">30000</parameter>
  </chsession>
</channel>
But currently we do not fully understand this part of the
configuration.
In particular, the 2 parameters suspect_sweep and suspect_time are a bit
obscure to us (the configuration elements were provided by another Sequoia
user who encountered the same problem as us when using multicast discovery).
These 2 parameters seem to be important, as I also encountered a
loss of sync between the controllers when multiplying these values by 2.
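For what it is worth, here is a small Python sketch of how a generic timeout-based failure detector with these two knobs would behave. This is an assumption based only on what the parameter names suggest (a periodic sweep, and a silence threshold after which a peer is suspected, both in milliseconds); it is not taken from the Appia documentation or source, and `suspected_peers` is a hypothetical helper.

```python
# Values copied from the configuration above; units assumed to be milliseconds.
SUSPECT_SWEEP_MS = 10000  # assumed: how often the detector checks its peers
SUSPECT_TIME_MS = 30000   # assumed: silence after which a peer is suspected


def suspected_peers(last_heard, now, suspect_time_ms):
    """Return the peers not heard from for longer than suspect_time_ms."""
    return sorted(
        peer for peer, t in last_heard.items() if now - t > suspect_time_ms
    )


# The peer was last heard at t=0 and stays silent; sweep every SUSPECT_SWEEP_MS.
last_heard = {"btrimd01": 0}
for now in range(0, 50001, SUSPECT_SWEEP_MS):
    print(now, suspected_peers(last_heard, now, SUSPECT_TIME_MS))
```

Under this model a silent peer is flagged at the first sweep after suspect_time has elapsed, so doubling both values would roughly double the time before a dead or unreachable controller is noticed.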
When we restart the whole platform after first recreating the recoveryLog
on each controller, we do not have any problem. Each VDB is mounted
correctly and each controller is added to the group.
2007-10-17 11:56:06,201 INFO controller.virtualdatabase.MATISSEDB
First controller in group MATISSEDB
So either the other controller is down - and we are not in the case
you describe - or the controller you are restarting can not see the
other one - and you have a connectivity issue.
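A quick way to test the connectivity hypothesis is to try opening a TCP connection to each address listed in base_view, from the controller being restarted. The sketch below uses only the Python standard library; `can_connect` and `check_peers` are hypothetical helper names. A successful connection only shows that the port is reachable, not that group communication itself works.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_peers(peers):
    """Map each 'host:port' string to whether it is reachable over TCP."""
    return {f"{host}:{port}": can_connect(host, port) for host, port in peers}
```

For the setup in this thread, the call would be `check_peers([("ptrimd01", 27752), ("btrimd01", 27752)])`, run on each machine in turn. If the peer's entry comes back False while its controller is up, a firewall, name resolution or routing problem would be the first thing to look at.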
Hope this helps.
A+O.
_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia
--
Gérard BUNEL
Project Manager
____________________________________________________________________
Technopôle Brest Iroise
Site du Vernis – CS 23866
29238 Brest Cedex 3
Tél : + 33 2 98 05 43 21
Fax : + 33 2 98 05 20 34
www.altran.com <http://www.altran.com>