[tickets] [opensaf:tickets] #709 saImmRepositoryInit attrib modification returns TRY_AGAIN in 2PBE scenario

2014-01-08 Thread Anders Bjornerstedt
- **status**: assigned --> duplicate
- **assigned_to**: Anders Bjornerstedt --> nobody
- **Milestone**: 4.4.0 --> never



---

**[tickets:#709] saImmRepositoryInit attrib modification returns TRY_AGAIN in 2PBE scenario**

**Status:** duplicate
**Created:** Tue Jan 07, 2014 10:23 AM UTC by surender khetavath
**Last Updated:** Tue Jan 07, 2014 04:38 PM UTC
**Owner:** nobody

changeset: 4733
setup: 5 nodes

Test:
2PBE is configured as per README.2PBE.
SC-1 was active and SC-2 standby.
OpenSAF was started on all 3 payloads.
OpenSAF was then stopped in this order: PL-5, PL-4, PL-3, SC-2.
Modifying the attribute saImmRepositoryInit now returns ERR_TRY_AGAIN:


immcfg -a saImmRepositoryInit=2 safRdn=immManagement,safApp=safImmService
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_TRY_AGAIN (6)
Jan  7 15:42:51 SC-1 osafimmnd[2283]: NO Precheck of fevs message of type 33 failed with ERROR:18
error - immcfg command timed out (alarm)
SC-1:/etc/opensaf # immcfg -a saImmRepositoryInit=2 safRdn=immManagement,safApp=safImmService
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_TRY_AGAIN (6)
Jan  7 15:43:57 SC-1 osafimmnd[2283]: NO Precheck of fevs message of type 33 failed with ERROR:18
error - immcfg command timed out (alarm)
SC-1:/etc/opensaf # ls -l /var/crash/opensaf/
total 0
SC-1:/etc/opensaf # immlist safRdn=immManagement,safApp=safImmService
Name                        Type         Value(s)

safRdn                      SA_STRING_T  safRdn=immManagement
saImmRepositoryInit         SA_UINT32_T  1 (0x1)
saImmOiTimeout              SA_TIME_T    Empty
saImmNumOis                 SA_UINT32_T  Empty
saImmNumInitializedCcbs     SA_UINT32_T  Empty
saImmNumAdminOwnedObjects   SA_UINT32_T  Empty
saImmLastUpdate             SA_TIME_T    Empty
saImmExportFileUri          SA_STRING_T  Empty
SaImmAttrImplementerName    SA_STRING_T  Empty
SaImmAttrClassName          SA_STRING_T  SaImmMngt
SaImmAttrAdminOwnerName     SA_STRING_T  Empty

SC-1:/etc/opensaf # immcfg -a saImmRepositoryInit=2 safRdn=immManagement,safApp=safImmService
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_TRY_AGAIN (6)
Jan  7 15:46:26 SC-1 osafimmnd[2283]: NO Precheck of fevs message of type 33 failed with ERROR:18
error - immcfg command timed out (alarm)

/var/log/messages:
Jan  7 15:40:38 SC-1 osafamfd[2338]: NO Node 'SC-2' left the cluster
Jan  7 15:40:38 SC-1 osafimmnd[2283]: NO Implementer connected: 69 (MsgQueueService131599) 691, 2010f
Jan  7 15:40:38 SC-1 osafimmnd[2283]: NO Implementer locally disconnected. Marking it as doomed 69 691, 2010f (MsgQueueService131599)
Jan  7 15:40:38 SC-1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Jan  7 15:40:38 SC-1 osafimmnd[2283]: NO Implementer disconnected 69 691, 2010f (MsgQueueService131599)
Jan  7 15:40:38 SC-1 osafclmd[2319]: ER Node 131855 doesn't exist
Jan  7 15:40:38 SC-1 kernel: [  457.764164] TIPC: Resetting link 1.1.1:eth3-1.1.3:eth2, peer not responding
Jan  7 15:40:38 SC-1 kernel: [  457.764177] TIPC: Lost link 1.1.1:eth3-1.1.3:eth2 on network plane A
Jan  7 15:40:38 SC-1 kernel: [  457.764186] TIPC: Lost contact with 1.1.3


logs attached. 


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #709 saImmRepositoryInit attrib modification returns TRY_AGAIN in 2PBE scenario

2014-01-07 Thread surender khetavath



---

**[tickets:#709] saImmRepositoryInit attrib modification returns TRY_AGAIN in 2PBE scenario**

**Status:** unassigned
**Created:** Tue Jan 07, 2014 10:23 AM UTC by surender khetavath
**Last Updated:** Tue Jan 07, 2014 10:23 AM UTC
**Owner:** nobody



[tickets] [opensaf:tickets] #709 saImmRepositoryInit attrib modification returns TRY_AGAIN in 2PBE scenario

2014-01-07 Thread Anders Bjornerstedt
- **status**: unassigned --> assigned
- **assigned_to**: Anders Bjornerstedt
- **Milestone**: future --> 4.4.0



---

**[tickets:#709] saImmRepositoryInit attrib modification returns TRY_AGAIN in 2PBE scenario**

**Status:** assigned
**Created:** Tue Jan 07, 2014 10:23 AM UTC by surender khetavath
**Last Updated:** Tue Jan 07, 2014 10:23 AM UTC
**Owner:** Anders Bjornerstedt



[tickets] [opensaf:tickets] #709 saImmRepositoryInit attrib modification returns TRY_AGAIN in 2PBE scenario

2014-01-07 Thread Anders Bjornerstedt
If I understand correctly, you have a cluster configured for 2PBE where
only one SC is available. That means the cluster is by default not persistent
writable: it is in the twoSafe2PBE state, which requires both PBEs to be
available. CCB operations will be rejected with TRY_AGAIN in this state.

It is possible, in a 2PBE system with only one SC currently available,
to enter the oneSafe2PBE state. This allows persistent writes using only
one SC. Such writes will then of course only reach one of the two PBE
files. This should be avoided unless the CCB is urgent, or necessary as
part of the repair of the other SC. See the section titled 'oneSafe2PBE'
in the README.2PBE file.

In this particular case, the CCB is quite special. You are updating the
repositoryInitMode, which controls the enablement of the PBE service itself.
If the current state of the repositoryInitMode is SA_IMM_INIT_FROM_FILE
then there should be no problem in applying the CCB. Even if 2PBE is
configured and only one SC is available, when PBE is disabled the
cluster should be persistentWritable even if the PBE is not available
(it is not expected to be running).

But if the current state of the repositoryInitMode is SA_IMM_KEEP_REPOSITORY
then oneSafe2PBE needs to be toggled on to allow persistent writes in
2PBE with one SC.

There is also an escape admin-op that allows PBE in general to be disabled,
i.e. to change repositoryInitMode to FROM_FILE, when PBE is currently not
persistent-writable. This is relevant for both 1PBE and 2PBE in cases where
the PBE(s) is/are permanently hung or congested. That admin-op is
documented in the original section for PBE (1PBE).
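
The rules described above can be summarised as a small decision function. This is only an illustrative sketch of the reasoning in this reply, not the IMM service's actual code: the names `RepositoryInit`, `ccb_outcome`, `pbes_available` and `one_safe_2pbe` are hypothetical, and the numeric enum values are assumed to match SaImmRepositoryInitModeT in the SAF IMM headers (which would be consistent with the immlist output showing saImmRepositoryInit = 1 while the reporter tries to set it to 2).

```python
from enum import IntEnum


class RepositoryInit(IntEnum):
    # Assumed to mirror SaImmRepositoryInitModeT; check saImm.h before relying on it.
    SA_IMM_KEEP_REPOSITORY = 1  # PBE enabled
    SA_IMM_INIT_FROM_FILE = 2   # PBE disabled


def ccb_outcome(mode, pbes_available, one_safe_2pbe=False):
    """Model whether a CCB is accepted in a cluster configured for 2PBE.

    mode           -- current repositoryInitMode (hypothetical parameter name)
    pbes_available -- number of PBEs currently reachable (0, 1 or 2)
    one_safe_2pbe  -- True if the oneSafe2PBE state has been toggled on
    """
    if mode == RepositoryInit.SA_IMM_INIT_FROM_FILE:
        # PBE is disabled, so no PBE is expected to be running.
        return "OK"
    if pbes_available >= 2:
        # twoSafe2PBE: both PBEs are available.
        return "OK"
    if pbes_available == 1 and one_safe_2pbe:
        # oneSafe2PBE: the write reaches only one of the two PBE files.
        return "OK"
    # Not persistent-writable: the CCB is rejected.
    return "TRY_AGAIN"


if __name__ == "__main__":
    # The situation reported in this ticket: KEEP_REPOSITORY, only SC-1 up.
    print(ccb_outcome(RepositoryInit.SA_IMM_KEEP_REPOSITORY, pbes_available=1))
```

Under these assumptions the reported situation (mode 1, one SC, oneSafe2PBE not toggled on) falls through to the TRY_AGAIN branch, which matches what immcfg printed.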



---

**[tickets:#709] saImmRepositoryInit attrib modification returns TRY_AGAIN in 2PBE scenario**

**Status:** assigned
**Created:** Tue Jan 07, 2014 10:23 AM UTC by surender khetavath
**Last Updated:** Tue Jan 07, 2014 11:31 AM UTC
**Owner:** Anders Bjornerstedt
