If I understand correctly, you have a cluster configured for 2PBE where
only one SC is available. By default, such a cluster is not
persistent-writable: the default twoSafe2PBE state requires both PBEs
to be available. CCB operations are rejected with TRY_AGAIN in this state.

It is possible, in a 2PBE system with only one SC currently available,
to enter the oneSafe2PBE state. This allows persistent writes using only
one SC. Such writes will then of course only reach one of the two PBE
files. This should be avoided unless the CCB is urgent, or necessary as
part of the repair of the other SC. See the section titled
'oneSafe2PBE' in the README.2PBE file.

In this particular case, the CCB is quite special. You are updating the
repositoryInitMode, which controls the enablement of the PBE service itself.
If the current state of the repositoryInitMode is SA_IMM_INIT_FROM_FILE
then there should be no problem in applying the CCB. Even if 2PBE is
configured and only one SC is available, when PBE is disabled the
cluster should be persistent-writable even though the PBE is not available
(it is not expected to be running).

But if the current state of the repositoryInitMode is SA_IMM_KEEP_REPOSITORY
then oneSafe2PBE needs to be toggled on to allow persistent writes in
2PBE with one SC.
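
The rule described above can be sketched as a small shell helper. This is
only an illustration of the reasoning, not part of OpenSAF: the function
names are hypothetical, the numeric values assume the standard IMM encoding
(1 = SA_IMM_KEEP_REPOSITORY, 2 = SA_IMM_INIT_FROM_FILE), and the parsing
assumes immlist output in the layout shown in the ticket below. It also
assumes that any PBE that is running is healthy.

```shell
#!/bin/sh
# Hypothetical helper: read `immlist` output on stdin and print the
# current numeric value of saImmRepositoryInit.
current_repository_init() {
    awk '$1 == "saImmRepositoryInit" { print $3 }'
}

# Hypothetical helper: predict whether a CCB is expected to be accepted
# in a cluster configured for 2PBE.
#   $1 = current saImmRepositoryInit (1 = KEEP_REPOSITORY, 2 = INIT_FROM_FILE)
#   $2 = number of SCs currently available (1 or 2)
#   $3 = 1 if oneSafe2PBE has been toggled on, else 0
ccb_expected_in_2pbe() {
    init_mode=$1
    scs_up=$2
    one_safe=$3

    # PBE disabled: no PBE is expected to run, so writes are not blocked.
    if [ "$init_mode" -eq 2 ]; then
        echo accepted
        return
    fi
    # PBE enabled and both SCs available: the normal twoSafe2PBE case.
    if [ "$scs_up" -ge 2 ]; then
        echo accepted
        return
    fi
    # PBE enabled with one SC: writable only when oneSafe2PBE is on.
    if [ "$one_safe" -eq 1 ]; then
        echo accepted
    else
        echo try_again
    fi
}
```

For the situation in the ticket (saImmRepositoryInit is 1, one SC available,
oneSafe2PBE not toggled on), `ccb_expected_in_2pbe 1 1 0` prints `try_again`.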

There is also an "escape admin-op" allowing PBE in general to be disabled,
i.e. to change repositoryInitMode to FROM_FILE, when PBE is currently not
persistent-writable. This is relevant for both 1PBE and 2PBE in cases where
the PBE(s) is/are "permanently" hung or congested. That admin-op is
documented in the original section for PBE (1PBE).



---

**[tickets:#709] saImmRepositoryInit attrib modification returns TRY_AGAIN in 2PBE scenario**

**Status:** assigned
**Created:** Tue Jan 07, 2014 10:23 AM UTC by surender khetavath
**Last Updated:** Tue Jan 07, 2014 11:31 AM UTC
**Owner:** Anders Bjornerstedt

changeset : 4733
setup: 5nodes

Test:
2PBE is configured as per README.2PBE.
SC-1 was active and SC-2 standby.
OpenSAF was started on all 3 payloads.
Then OpenSAF was stopped in this order: PL-5, PL-4, PL-3, SC-2.
Now modification of the attribute saImmRepositoryInit returns ERR_TRY_AGAIN.


immcfg -a saImmRepositoryInit=2 safRdn=immManagement,safApp=safImmService
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_TRY_AGAIN (6)
Jan  7 15:42:51 SC-1 osafimmnd[2283]: NO Precheck of fevs message of type <33> failed with ERROR:18
error - immcfg command timed out (alarm)
SC-1:/etc/opensaf # immcfg -a saImmRepositoryInit=2 safRdn=immManagement,safApp=safImmService
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_TRY_AGAIN (6)
Jan  7 15:43:57 SC-1 osafimmnd[2283]: NO Precheck of fevs message of type <33> failed with ERROR:18
error - immcfg command timed out (alarm)
SC-1:/etc/opensaf # ls -l /var/crash/opensaf/
total 0
SC-1:/etc/opensaf # immlist safRdn=immManagement,safApp=safImmService
Name                                               Type         Value(s)
========================================================================
safRdn                                             SA_STRING_T  safRdn=immManagement
saImmRepositoryInit                                SA_UINT32_T  1 (0x1)
saImmOiTimeout                                     SA_TIME_T    <Empty>
saImmNumOis                                        SA_UINT32_T  <Empty>
saImmNumInitializedCcbs                            SA_UINT32_T  <Empty>
saImmNumAdminOwnedObjects                          SA_UINT32_T  <Empty>
saImmLastUpdate                                    SA_TIME_T    <Empty>
saImmExportFileUri                                 SA_STRING_T  <Empty>
SaImmAttrImplementerName                           SA_STRING_T  <Empty>
SaImmAttrClassName                                 SA_STRING_T  SaImmMngt 
SaImmAttrAdminOwnerName                            SA_STRING_T  <Empty>

SC-1:/etc/opensaf # immcfg -a saImmRepositoryInit=2 safRdn=immManagement,safApp=safImmService
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_TRY_AGAIN (6)
Jan  7 15:46:26 SC-1 osafimmnd[2283]: NO Precheck of fevs message of type <33> failed with ERROR:18
error - immcfg command timed out (alarm)

/var/log/messages:
Jan  7 15:40:38 SC-1 osafamfd[2338]: NO Node 'SC-2' left the cluster
Jan  7 15:40:38 SC-1 osafimmnd[2283]: NO Implementer connected: 69 (MsgQueueService131599) <691, 2010f>
Jan  7 15:40:38 SC-1 osafimmnd[2283]: NO Implementer locally disconnected. Marking it as doomed 69 <691, 2010f> (MsgQueueService131599)
Jan  7 15:40:38 SC-1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Jan  7 15:40:38 SC-1 osafimmnd[2283]: NO Implementer disconnected 69 <691, 2010f> (MsgQueueService131599)
Jan  7 15:40:38 SC-1 osafclmd[2319]: ER Node 131855 doesn't exist
Jan  7 15:40:38 SC-1 kernel: [  457.764164] TIPC: Resetting link <1.1.1:eth3-1.1.3:eth2>, peer not responding
Jan  7 15:40:38 SC-1 kernel: [  457.764177] TIPC: Lost link <1.1.1:eth3-1.1.3:eth2> on network plane A
Jan  7 15:40:38 SC-1 kernel: [  457.764186] TIPC: Lost contact with <1.1.3>


logs attached. 

