If I understand correctly, you have a cluster configured for 2PBE where
only one SC is available. That means the cluster is by default not
persistent-writable: it is in the twoSafe2PBE state, which requires both PBEs
to be available. CCB operations will be rejected with TRY_AGAIN in this state.
It is possible, in a 2PBE system with only one SC currently available,
to enter the oneSafe2PBE state. This allows persistent writes using only
one SC. Such writes will then of course only reach one of the two PBE files.
This should be avoided unless the CCB is urgent, or necessary as part of
the repair of the other SC. See the section titled 'oneSafe2PBE' in the
README.2PBE file.
In this particular case, the CCB is quite special. You are updating the
repositoryInitMode, which controls whether the PBE service itself is enabled.
If the current value of repositoryInitMode is SA_IMM_INIT_FROM_FILE,
there should be no problem in applying the CCB. Even if 2PBE is
configured and only one SC is available, when PBE is disabled the
cluster should be persistent-writable even though the PBE is not available
(it is not expected to be running).
But if the current value of repositoryInitMode is SA_IMM_KEEP_REPOSITORY,
then oneSafe2PBE needs to be toggled on to allow persistent writes in
2PBE with one SC.
There is also an "escape admin-op" that allows PBE in general to be disabled,
i.e. repositoryInitMode to be changed to FROM_FILE, when PBE is currently not
persistent-writable. This is relevant for both 1PBE and 2PBE in cases where
the PBE(s) is/are "permanently" hung or congested. That admin-op is
documented in the original section for PBE (1PBE).
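
The rules above for a 2PBE cluster can be summarised as a small decision
function. This is only an illustrative sketch in Python, not OpenSAF code: the
function name `ccb_outcome` and its parameters are hypothetical, and it does
not model the hung or congested PBE case handled by the escape admin-op. The
numeric values 1 and 2 are the SaImmRepositoryInitModeT values that appear as
saImmRepositoryInit in the ticket below.

```python
from enum import IntEnum


class RepositoryInitMode(IntEnum):
    """Values of saImmRepositoryInit (SaImmRepositoryInitModeT)."""
    SA_IMM_KEEP_REPOSITORY = 1  # PBE enabled
    SA_IMM_INIT_FROM_FILE = 2   # PBE disabled


def ccb_outcome(mode, available_scs, one_safe_enabled=False):
    """Hypothetical model of whether a CCB is accepted in a 2PBE cluster.

    Returns "OK" if persistent writes are accepted, else "TRY_AGAIN".
    """
    if mode == RepositoryInitMode.SA_IMM_INIT_FROM_FILE:
        # PBE is disabled, so no PBE is expected to be running.
        return "OK"
    if available_scs >= 2:
        # twoSafe2PBE: both PBEs are available.
        return "OK"
    if available_scs == 1 and one_safe_enabled:
        # oneSafe2PBE: writes reach only one of the two PBE files.
        return "OK"
    return "TRY_AGAIN"


if __name__ == "__main__":
    keep = RepositoryInitMode.SA_IMM_KEEP_REPOSITORY
    # The situation in the ticket: PBE enabled, only SC-1 left.
    print(ccb_outcome(keep, available_scs=1))
    # The same situation after oneSafe2PBE has been toggled on.
    print(ccb_outcome(keep, available_scs=1, one_safe_enabled=True))
```

The first call corresponds to the reported situation (saImmRepositoryInit is 1
and only SC-1 remains), which is why the CCB gets TRY_AGAIN until oneSafe2PBE
is toggled on.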
---
**[tickets:#709] saImmRepositoryInit attrib modification returns TRY_AGAIN in 2PBE scenario**
**Status:** assigned
**Created:** Tue Jan 07, 2014 10:23 AM UTC by surender khetavath
**Last Updated:** Tue Jan 07, 2014 11:31 AM UTC
**Owner:** Anders Bjornerstedt
changeset : 4733
setup: 5nodes
Test:
2PBE is configured as per README.2PBE.
SC-1 was active and SC-2 standby.
OpenSAF is started on all 3 payloads.
Then OpenSAF is stopped in this order: PL-5, PL-4, PL-3, SC-2.
Now modification of the attribute saImmRepositoryInit returns ERR_TRY_AGAIN:
immcfg -a saImmRepositoryInit=2 safRdn=immManagement,safApp=safImmService
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_TRY_AGAIN (6)
Jan 7 15:42:51 SC-1 osafimmnd[2283]: NO Precheck of fevs message of type <33>
failed with ERROR:18
error - immcfg command timed out (alarm)
SC-1:/etc/opensaf # immcfg -a saImmRepositoryInit=2
safRdn=immManagement,safApp=safImmService
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_TRY_AGAIN (6)
Jan 7 15:43:57 SC-1 osafimmnd[2283]: NO Precheck of fevs message of type <33>
failed with ERROR:18
error - immcfg command timed out (alarm)
SC-1:/etc/opensaf # ls -l /var/crash/opensaf/
total 0
SC-1:/etc/opensaf # immlist safRdn=immManagement,safApp=safImmService
Name Type Value(s)
========================================================================
safRdn SA_STRING_T
safRdn=immManagement
saImmRepositoryInit SA_UINT32_T 1 (0x1)
saImmOiTimeout SA_TIME_T <Empty>
saImmNumOis SA_UINT32_T <Empty>
saImmNumInitializedCcbs SA_UINT32_T <Empty>
saImmNumAdminOwnedObjects SA_UINT32_T <Empty>
saImmLastUpdate SA_TIME_T <Empty>
saImmExportFileUri SA_STRING_T <Empty>
SaImmAttrImplementerName SA_STRING_T <Empty>
SaImmAttrClassName SA_STRING_T SaImmMngt
SaImmAttrAdminOwnerName SA_STRING_T <Empty>
SC-1:/etc/opensaf # immcfg -a saImmRepositoryInit=2
safRdn=immManagement,safApp=safImmService
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_TRY_AGAIN (6)
Jan 7 15:46:26 SC-1 osafimmnd[2283]: NO Precheck of fevs message of type <33>
failed with ERROR:18
error - immcfg command timed out (alarm)
/var/log/messages:
Jan 7 15:40:38 SC-1 osafamfd[2338]: NO Node 'SC-2' left the cluster
Jan 7 15:40:38 SC-1 osafimmnd[2283]: NO Implementer connected: 69
(MsgQueueService131599) <691, 2010f>
Jan 7 15:40:38 SC-1 osafimmnd[2283]: NO Implementer locally disconnected.
Marking it as doomed 69 <691, 2010f> (MsgQueueService131599)
Jan 7 15:40:38 SC-1 opensaf_reboot: Rebooting remote node in the absence of
PLM is outside the scope of OpenSAF
Jan 7 15:40:38 SC-1 osafimmnd[2283]: NO Implementer disconnected 69 <691,
2010f> (MsgQueueService131599)
Jan 7 15:40:38 SC-1 osafclmd[2319]: ER Node 131855 doesn't exist
Jan 7 15:40:38 SC-1 kernel: [ 457.764164] TIPC: Resetting link
<1.1.1:eth3-1.1.3:eth2>, peer not responding
Jan 7 15:40:38 SC-1 kernel: [ 457.764177] TIPC: Lost link
<1.1.1:eth3-1.1.3:eth2> on network plane A
Jan 7 15:40:38 SC-1 kernel: [ 457.764186] TIPC: Lost contact with <1.1.3>
logs attached.