[tickets] [opensaf:tickets] #1055 dependent si assignments are not removed

2014-09-10 Thread Praveen
- **status**: unassigned -- assigned
- **assigned_to**: Praveen
- **Milestone**: 4.3.3 -- 4.4.1



---

** [tickets:#1055] dependent si assignments are not removed**

**Status:** assigned
**Milestone:** 4.4.1
**Created:** Tue Sep 09, 2014 09:59 AM UTC by surender khetavath
**Last Updated:** Wed Sep 10, 2014 06:44 AM UTC
**Owner:** Praveen

changeset : 5697
model : 2N
configuration : 1 App, 1 SG, 5 SUs with 3 comps each, 5 SIs with 3 CSIs each
SI-SI dependencies configured as SI1-SI2-SI3-SI4.
SU1 is active, SU2 is standby.
SU1 is mapped to SC-1, SU2 to SC-2, SU3 to PL-3, and SU4 and SU5 to PL-4.
saAmfSGAutoRepair=1 (True)
SuFailover=1 (True)

Test:
1. Lock the sponsor SI.
2. When the CSI remove callback is received, report an error on self through the errorReport API (see the sketch below).
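
A minimal sketch of that error-report step, assuming a C component using the SAF AMF API. The globals amf_hdl and comp_name are hypothetical and assumed to be filled in during saAmfInitialize()/saAmfComponentNameGet(); the recommended recovery value is only illustrative.

    /* Hedged sketch of the test's fault-injection step: on the CSI remove
     * callback, report an error on the component itself. Initialization and
     * error handling are omitted. */
    #include <saAmf.h>

    static SaAmfHandleT amf_hdl;   /* from saAmfInitialize(), assumed */
    static SaNameT comp_name;      /* from saAmfComponentNameGet(), assumed */

    static void csi_remove_cb(SaInvocationT invocation, const SaNameT *comp,
                              const SaNameT *csi, SaAmfCSIFlagsT csi_flags)
    {
        /* Report an error on ourselves; the recovery recommendation is illustrative. */
        (void)saAmfComponentErrorReport(amf_hdl, &comp_name,
                                        0 /* errorDetectionTime */,
                                        SA_AMF_COMPONENT_FAILOVER,
                                        0 /* ntfIdentifier */);

        /* Acknowledge the callback so AMF does not time it out. */
        (void)saAmfResponse(amf_hdl, invocation, SA_AIS_OK);
    }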

SI1 is seen in the LOCKED state, but the dependent SI assignments are not removed.


safSi=TWONSI1,safApp=TWONAPP
saAmfSIAdminState=LOCKED(2)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=TWONSI2,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=TWONSI3,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=TWONSI5,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=TWONSI4,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)

safSu=SU1,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)
safSu=SU2,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)
safSu=SU3,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU4,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU5,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)

safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI5,safApp=TWONAPP
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI4,safApp=TWONAPP
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI3,safApp=TWONAPP
saAmfSISUHAState=STANDBY(2)






[tickets] [opensaf:tickets] #1055 dependent si assignments are not removed

2014-09-10 Thread Praveen
- **status**: assigned -- duplicate
- **Milestone**: 4.4.1 -- never
- **Comment**:

The issue is reproducible and #1015 solves it. Hence marking it as a duplicate of #1015.



---

** [tickets:#1055] dependent si assignments are not removed**

**Status:** duplicate
**Milestone:** never
**Created:** Tue Sep 09, 2014 09:59 AM UTC by surender khetavath
**Last Updated:** Wed Sep 10, 2014 06:51 AM UTC
**Owner:** Praveen

changeset : 5697
model : 2N
configuration : 1 App, 1 SG, 5 SUs with 3 comps each, 5 SIs with 3 CSIs each
SI-SI dependencies configured as SI1-SI2-SI3-SI4.
SU1 is active, SU2 is standby.
SU1 is mapped to SC-1, SU2 to SC-2, SU3 to PL-3, and SU4 and SU5 to PL-4.
saAmfSGAutoRepair=1 (True)
SuFailover=1 (True)

Test:
1. Lock the sponsor SI.
2. When the CSI remove callback is received, report an error on self through the errorReport API.

SI1 is seen in the LOCKED state, but the dependent SI assignments are not removed.


safSi=TWONSI1,safApp=TWONAPP
saAmfSIAdminState=LOCKED(2)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=TWONSI2,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=TWONSI3,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=TWONSI5,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=TWONSI4,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)

safSu=SU1,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)
safSu=SU2,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)
safSu=SU3,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU4,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU5,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)

safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI5,safApp=TWONAPP
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI4,safApp=TWONAPP
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI3,safApp=TWONAPP
saAmfSISUHAState=STANDBY(2)






[tickets] [opensaf:tickets] #1057 (2PBE) Slave PBE restarts multiple times

2014-09-10 Thread Sirisha Alla



---

** [tickets:#1057] (2PBE) Slave PBE restarts multiple times**

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Wed Sep 10, 2014 07:00 AM UTC by Sirisha Alla
**Last Updated:** Wed Sep 10, 2014 07:00 AM UTC
**Owner:** nobody

The issue is seen on SLES X86. OpenSAF is up with changeset 5697, and 2PBE is
loaded with 50k objects.

While CCBs are in progress, PRTO object creation is attempted. The following is
the syslog of SLOT2 when the slave PBE restarted:

Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Ccb 644 COMMITTED (exowner)
Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmnd[6344]: WA Create of PERSISTENT 
runtime object 'DistObj3=DistRunTime,DistObj1=DistRunTime' REVERTED. PBE rc:20
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Implementer disconnected 
579 0, 2040f (implementertestMA_verifyObjImplReleaseModCallbackNode_101_133)
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Implementer disconnected 
581 0, 2040f (@applier1testMA_verifyObjImplReleaseModCallbackNode_101_133)
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: IN Delaying class delete at slave 
PBE due to ongoing commit of ccb:284/644
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: NO Slave PBE time-out in waiting 
on porepare for PRTO create ccb:105a5 
dn:DistObj3=DistRunTime,DistObj1=DistRunTime
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: NO 2PBE Error (21) in PRTO create 
(ccbId:105a5)
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: ER PBE-B got completed callback 
for Ccb:284/644 before prepare from PBE-A
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: WA PBE slave exiting in prepare 
for ccb 284/644, file should be regenerated.


Sep  9 15:55:42 SLES-64BIT-SLOT2 osafimmnd[6344]: WA SLAVE PBE process has 
apparently died at non coord
Sep  9 15:55:42 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Delete of class 
testMA_verifyObjImplReleaseModCallbackNode_101_133 is PERSISTENT.
Sep  9 15:55:42 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Implementer disconnected 
580 0, 2030f (@applier2testMA_verifyObjImplReleaseModCallbackNode_101_133)
Sep  9 15:55:43 SLES-64BIT-SLOT2 osafimmnd[6344]: NO STARTING SLAVE PBE process.
Sep  9 15:55:43 SLES-64BIT-SLOT2 osafimmnd[6344]: NO 
pbe-db-file-path:/home/sirisha/immsv/immpbe//imm.db.2020f VETERAN:1 B:1
Sep  9 15:55:43 SLES-64BIT-SLOT2 osafimmpbed: logtrace: trace enabled to file 
/var/log/opensaf/osafimmnd, mask=0x
Sep  9 15:55:43 SLES-64BIT-SLOT2 osafimmpbed: IN arg[0] == 

[tickets] [opensaf:tickets] #1057 (2PBE) Slave PBE restarts multiple times

2014-09-10 Thread Sirisha Alla
SC-2 logs


Attachment: SLOT2.tar.bz2 (28.3 MB; application/x-bzip) 


---

** [tickets:#1057] (2PBE) Slave PBE restarts multiple times**

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Wed Sep 10, 2014 07:00 AM UTC by Sirisha Alla
**Last Updated:** Wed Sep 10, 2014 07:00 AM UTC
**Owner:** nobody

The issue is seen on SLES X86. OpenSAF is up with changeset 5697, and 2PBE is
loaded with 50k objects.

While CCBs are in progress, PRTO object creation is attempted. The following is
the syslog of SLOT2 when the slave PBE restarted:

Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Ccb 644 COMMITTED (exowner)
Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmnd[6344]: WA Create of PERSISTENT 
runtime object 'DistObj3=DistRunTime,DistObj1=DistRunTime' REVERTED. PBE rc:20
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Implementer disconnected 
579 0, 2040f (implementertestMA_verifyObjImplReleaseModCallbackNode_101_133)
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Implementer disconnected 
581 0, 2040f (@applier1testMA_verifyObjImplReleaseModCallbackNode_101_133)
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: IN Delaying class delete at slave 
PBE due to ongoing commit of ccb:284/644
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: NO Slave PBE time-out in waiting 
on porepare for PRTO create ccb:105a5 
dn:DistObj3=DistRunTime,DistObj1=DistRunTime
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: NO 2PBE Error (21) in PRTO create 
(ccbId:105a5)
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: ER PBE-B got completed callback 
for Ccb:284/644 before prepare from PBE-A
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: WA PBE slave exiting in prepare 
for ccb 284/644, file should be regenerated.


Sep  9 15:55:42 SLES-64BIT-SLOT2 osafimmnd[6344]: WA SLAVE PBE process has 
apparently died at non coord
Sep  9 15:55:42 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Delete of class 
testMA_verifyObjImplReleaseModCallbackNode_101_133 is PERSISTENT.
Sep  9 15:55:42 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Implementer disconnected 
580 0, 2030f (@applier2testMA_verifyObjImplReleaseModCallbackNode_101_133)
Sep  9 15:55:43 SLES-64BIT-SLOT2 osafimmnd[6344]: NO STARTING SLAVE PBE process.
Sep  9 15:55:43 SLES-64BIT-SLOT2 osafimmnd[6344]: NO 
pbe-db-file-path:/home/sirisha/immsv/immpbe//imm.db.2020f VETERAN:1 B:1
Sep  9 15:55:43 SLES-64BIT-SLOT2 osafimmpbed: logtrace: trace enabled to file 
/var/log/opensaf/osafimmnd, mask=0x
Sep  9 

[tickets] [opensaf:tickets] #1057 (2PBE) Slave PBE restarts multiple times

2014-09-10 Thread Sirisha Alla
SC-1 logs. Since the logs are very huge I tried to trim the immnd traces to the 
timing of the issue


Attachment: SLOT1.tar.bz2 (1.6 MB; application/x-bzip) 


---

** [tickets:#1057] (2PBE) Slave PBE restarts multiple times**

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Wed Sep 10, 2014 07:00 AM UTC by Sirisha Alla
**Last Updated:** Wed Sep 10, 2014 07:06 AM UTC
**Owner:** nobody

The issue is seen on SLES X86. OpenSAF is up with changeset 5697, and 2PBE is
loaded with 50k objects.

While CCBs are in progress, PRTO object creation is attempted. The following is
the syslog of SLOT2 when the slave PBE restarted:

Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Ccb 644 COMMITTED (exowner)
Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmnd[6344]: WA Create of PERSISTENT 
runtime object 'DistObj3=DistRunTime,DistObj1=DistRunTime' REVERTED. PBE rc:20
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Implementer disconnected 
579 0, 2040f (implementertestMA_verifyObjImplReleaseModCallbackNode_101_133)
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Implementer disconnected 
581 0, 2040f (@applier1testMA_verifyObjImplReleaseModCallbackNode_101_133)
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: IN Delaying class delete at slave 
PBE due to ongoing commit of ccb:284/644
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: NO Slave PBE time-out in waiting 
on porepare for PRTO create ccb:105a5 
dn:DistObj3=DistRunTime,DistObj1=DistRunTime
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: NO 2PBE Error (21) in PRTO create 
(ccbId:105a5)
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: ER PBE-B got completed callback 
for Ccb:284/644 before prepare from PBE-A
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: WA PBE slave exiting in prepare 
for ccb 284/644, file should be regenerated.


Sep  9 15:55:42 SLES-64BIT-SLOT2 osafimmnd[6344]: WA SLAVE PBE process has 
apparently died at non coord
Sep  9 15:55:42 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Delete of class 
testMA_verifyObjImplReleaseModCallbackNode_101_133 is PERSISTENT.
Sep  9 15:55:42 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Implementer disconnected 
580 0, 2030f (@applier2testMA_verifyObjImplReleaseModCallbackNode_101_133)
Sep  9 15:55:43 SLES-64BIT-SLOT2 osafimmnd[6344]: NO STARTING SLAVE PBE process.
Sep  9 15:55:43 SLES-64BIT-SLOT2 osafimmnd[6344]: NO 
pbe-db-file-path:/home/sirisha/immsv/immpbe//imm.db.2020f VETERAN:1 B:1
Sep  9 15:55:43 SLES-64BIT-SLOT2 

[tickets] [opensaf:tickets] #1057 (2PBE) Slave PBE restarts multiple times

2014-09-10 Thread Anders Bjornerstedt
- **status**: unassigned -- accepted
- **assigned_to**: Anders Bjornerstedt
- **Milestone**: 4.3.3 -- 4.4.1
- **Comment**:

Is the test that provokes this problem a new test (recently introduced),
or is it an old test that has previously not provoked these problems?

2PBE was introduced in OpenSAF 4.4, so the test could in principle have
been created in relation to 4.4.





---

** [tickets:#1057] (2PBE) Slave PBE restarts multiple times**

**Status:** accepted
**Milestone:** 4.4.1
**Created:** Wed Sep 10, 2014 07:00 AM UTC by Sirisha Alla
**Last Updated:** Wed Sep 10, 2014 07:23 AM UTC
**Owner:** Anders Bjornerstedt

The issue is seen on SLES X86. OpenSAF is up with changeset 5697, and 2PBE is
loaded with 50k objects.

While CCBs are in progress, PRTO object creation is attempted. The following is
the syslog of SLOT2 when the slave PBE restarted:

Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Ccb 644 COMMITTED (exowner)
Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:37 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:38 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:39 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: IN ccb-prepare received at PBE 
slave ccbId:105a5/4294968741 numOps:1
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: NO Prepare 
ccb:105a5/4294968741 received at Pbe slave when Prior Ccb 644 still 
processing
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmnd[6344]: WA Create of PERSISTENT 
runtime object 'DistObj3=DistRunTime,DistObj1=DistRunTime' REVERTED. PBE rc:20
Sep  9 15:55:40 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare 
from primary on PRTO create ccb:105a5
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Implementer disconnected 
579 0, 2040f (implementertestMA_verifyObjImplReleaseModCallbackNode_101_133)
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Implementer disconnected 
581 0, 2040f (@applier1testMA_verifyObjImplReleaseModCallbackNode_101_133)
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: IN Delaying class delete at slave 
PBE due to ongoing commit of ccb:284/644
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: NO Slave PBE time-out in waiting 
on porepare for PRTO create ccb:105a5 
dn:DistObj3=DistRunTime,DistObj1=DistRunTime
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: NO 2PBE Error (21) in PRTO create 
(ccbId:105a5)
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: ER PBE-B got completed callback 
for Ccb:284/644 before prepare from PBE-A
Sep  9 15:55:41 SLES-64BIT-SLOT2 osafimmpbed: WA PBE slave exiting in prepare 
for ccb 284/644, file should be regenerated.


Sep  9 15:55:42 SLES-64BIT-SLOT2 osafimmnd[6344]: WA SLAVE PBE process has 
apparently died at non coord
Sep  9 15:55:42 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Delete of class 
testMA_verifyObjImplReleaseModCallbackNode_101_133 is PERSISTENT.
Sep  9 15:55:42 SLES-64BIT-SLOT2 osafimmnd[6344]: NO Implementer disconnected 
580 0, 2030f (@applier2testMA_verifyObjImplReleaseModCallbackNode_101_133)
Sep  9 15:55:43 

[tickets] [opensaf:tickets] #1058 AMF: error report notifications sent with alarm event type

2014-09-10 Thread Hans Feldt



---

** [tickets:#1058] AMF: error report notifications sent with alarm event type**

**Status:** unassigned
**Milestone:** 4.5.0
**Created:** Wed Sep 10, 2014 08:02 AM UTC by Hans Feldt
**Last Updated:** Wed Sep 10, 2014 08:02 AM UTC
**Owner:** nobody

https://sourceforge.net/p/opensaf/tickets/106/

AMF cannot send out notifications as alarms. We need to fix this properly or 
stop sending these.




[tickets] [opensaf:tickets] #1050 amfnd sometimes fails to start due to ERR_LIBRARY from saImmOmInitialize

2014-09-10 Thread Hans Feldt
- **status**: accepted -- review



---

** [tickets:#1050] amfnd sometimes fails to start due to ERR_LIBRARY from 
saImmOmInitialize**

**Status:** review
**Milestone:** 4.5.0
**Created:** Tue Sep 09, 2014 07:08 AM UTC by Hans Feldt
**Last Updated:** Tue Sep 09, 2014 07:08 AM UTC
**Owner:** Hans Feldt

With MDS/TIPC, amfnd randomly fails to start, causing the OpenSAF start to fail.

osafimmnd logs the infamous "immnd_evt_proc_imm_init: ... MDS problem?".

The reason is a random timing variation of the TIPC topology DOWN event. This
sometimes causes the DOWN event to wrongly delete a newly added process_info
entry.

The trigger for this problem is that some IMM clients in OpenSAF, such as amfnd,
do not reuse IMM handles but initialize/finalize in a far from optimal way.
This should also be fixed.

The solution under test consists of two parts (sketched below):
1) The MDS down event just starts a timer in MDS; when the timeout event
happens, the process_info entry is deleted.

2) A new explicit disconnect() is added to the MDS API, which is used by the
IMMA library when it is about to close down the whole core library.
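
The deferred deletion in part 1) and the explicit disconnect in part 2) can be sketched roughly as below. This is not the real MDS code: the structure and helper names (process_info, start_timer, process_info_delete, DISCONNECT_TMR_MS) are hypothetical and only show the shape of the change.

    /* Hypothetical sketch of the two-part fix -- not the actual MDS code. */
    struct process_info {
        unsigned int node_id;
        unsigned int pid;
        int down_tmr_running;   /* non-zero while the DOWN grace timer is armed */
    };

    #define DISCONNECT_TMR_MS 5000   /* grace period; the value is illustrative */

    /* Helpers assumed to exist for the purpose of this sketch. */
    extern void start_timer(unsigned int ms, void (*cb)(void *arg), void *arg);
    extern void process_info_delete(struct process_info *pi);

    /* Part 1: the DOWN event only arms a timer; the process_info entry is
     * deleted when the timer expires. */
    static void down_tmr_expiry(void *arg)
    {
        struct process_info *pi = arg;
        if (pi->down_tmr_running)
            process_info_delete(pi);
    }

    static void on_mds_down_event(struct process_info *pi)
    {
        pi->down_tmr_running = 1;
        start_timer(DISCONNECT_TMR_MS, down_tmr_expiry, pi);
    }

    /* Part 2: explicit disconnect used by the IMMA library when it closes the
     * whole core library -- the entry is removed immediately, no timer involved. */
    void mds_disconnect(struct process_info *pi)
    {
        pi->down_tmr_running = 0;
        process_info_delete(pi);
    }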





[tickets] [opensaf:tickets] #1059 2PBE: cluster reset observed during switchovers

2014-09-10 Thread Sirisha Alla



---

** [tickets:#1059] 2PBE: cluster reset observed during switchovers**

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Wed Sep 10, 2014 09:57 AM UTC by Sirisha Alla
**Last Updated:** Wed Sep 10, 2014 09:57 AM UTC
**Owner:** nobody

The issue is seen on SLES X86. OpenSAF is running with changeset 5697, with 2PBE
and 50k application objects.

Switchovers, with an IMM application running, are in progress when the issue is
observed.

Syslog on SC-1:

Sep 10 14:56:47 SLES-64BIT-SLOT1 osafamfnd[7540]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
182 0, 2020f (@OpenSafImmReplicatorB)
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for PRTA update Ccb:10063/4294967395
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Slave PBE replied with OK on 
attempt to start prepare of ccb:10063/4294967395
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for Ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer (applier) 
connected: 193 (@OpenSafImmReplicatorB) 0, 2020f
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: IN Slave PBE replied with OK on 
attempt to start prepare of ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for PRTA update Ccb:10064/4294967396
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Ccb 6 COMMITTED (SetUp_Ccb)
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:49 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:49 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:50 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: WA Start prepare for ccb: 
10064/4294967396 towards slave PBE returned: '6' from sttandby PBE
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: WA PBE-A failed to prepare PRTA 
update Ccb:10064/4294967396 towards PBE-B
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO 2PBE Error (20) in PRTA update 
(ccbId:10064)
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: WA update of PERSISTENT 
runtime attributes in object 'safNode=PL-4,safCluster=myClmCluster' REVERTED. 
PBE rc:20
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer locally 
disconnected. Marking it as doomed 190 17, 2010f (safClmService)
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafclmd[7511]: ER saImmOiImplementerSet 
failed rc:14, exiting
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: NO 
'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: ER 
safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131343, SupervisionTime = 60
Sep 10 14:56:51 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; 
timeout=60
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
190 17, 2010f (safClmService)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: ER PBE PRTAttrs Update 
continuation missing! invoc:100
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
16 0, 2020f (@OpenSafImmPBE)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
17 0, 2020f (OsafImmPbeRt_B)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: WA Timeout on syncronous 
admin operation 108
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmpbed: WA Failed to delete class towards 
slave PBE. Library or immsv replied Rc:5 - ignoring

Syslog on SC-2

Sep 10 14:56:57 SLES-64BIT-SLOT2 osafamfd[6896]: ER Failed to stop cluster 
tracking 5
Sep 10 14:56:57 SLES-64BIT-SLOT2 osafamfd[6896]: ER ClmTrack stop failed
Sep 10 14:56:57 SLES-64BIT-SLOT2 osaffmd[6816]: NO Current role: ACTIVE
Sep 10 14:56:57 SLES-64BIT-SLOT2 osaffmd[6816]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 
131599, SupervisionTime = 60
Sep 10 14:56:57 SLES-64BIT-SLOT2 osafimmd[6826]: 

[tickets] [opensaf:tickets] #1059 2PBE: cluster reset observed during switchovers

2014-09-10 Thread Sirisha Alla
SC-2 logs


Attachment: SLOT2.tar.bz2 (8.9 MB; application/x-bzip) 


---

** [tickets:#1059] 2PBE: cluster reset observed during switchovers**

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Wed Sep 10, 2014 09:57 AM UTC by Sirisha Alla
**Last Updated:** Wed Sep 10, 2014 09:57 AM UTC
**Owner:** nobody

The issue is seen on SLES X86. OpenSAF is running with changeset 5697, with 2PBE
and 50k application objects.

Switchovers, with an IMM application running, are in progress when the issue is
observed.

Syslog on SC-1:

Sep 10 14:56:47 SLES-64BIT-SLOT1 osafamfnd[7540]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
182 0, 2020f (@OpenSafImmReplicatorB)
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for PRTA update Ccb:10063/4294967395
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Slave PBE replied with OK on 
attempt to start prepare of ccb:10063/4294967395
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for Ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer (applier) 
connected: 193 (@OpenSafImmReplicatorB) 0, 2020f
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: IN Slave PBE replied with OK on 
attempt to start prepare of ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for PRTA update Ccb:10064/4294967396
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Ccb 6 COMMITTED (SetUp_Ccb)
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:49 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:49 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:50 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: WA Start prepare for ccb: 
10064/4294967396 towards slave PBE returned: '6' from sttandby PBE
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: WA PBE-A failed to prepare PRTA 
update Ccb:10064/4294967396 towards PBE-B
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO 2PBE Error (20) in PRTA update 
(ccbId:10064)
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: WA update of PERSISTENT 
runtime attributes in object 'safNode=PL-4,safCluster=myClmCluster' REVERTED. 
PBE rc:20
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer locally 
disconnected. Marking it as doomed 190 17, 2010f (safClmService)
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafclmd[7511]: ER saImmOiImplementerSet 
failed rc:14, exiting
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: NO 
'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: ER 
safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131343, SupervisionTime = 60
Sep 10 14:56:51 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; 
timeout=60
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
190 17, 2010f (safClmService)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: ER PBE PRTAttrs Update 
continuation missing! invoc:100
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
16 0, 2020f (@OpenSafImmPBE)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
17 0, 2020f (OsafImmPbeRt_B)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: WA Timeout on syncronous 
admin operation 108
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmpbed: WA Failed to delete class towards 
slave PBE. Library or immsv replied Rc:5 - ignoring

Syslog on SC-2

Sep 10 14:56:57 SLES-64BIT-SLOT2 osafamfd[6896]: ER Failed to stop cluster 
tracking 5
Sep 10 14:56:57 SLES-64BIT-SLOT2 osafamfd[6896]: ER ClmTrack stop failed
Sep 10 14:56:57 SLES-64BIT-SLOT2 osaffmd[6816]: NO Current role: ACTIVE
Sep 10 14:56:57 SLES-64BIT-SLOT2 osaffmd[6816]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 
131599, 

[tickets] [opensaf:tickets] #1060 AMF: reset of cluster startup timer does not happen (#76)

2014-09-10 Thread Hans Feldt



---

** [tickets:#1060] AMF: reset of cluster startup timer does not happen (#76)**

**Status:** unassigned
**Milestone:** future
**Created:** Wed Sep 10, 2014 10:18 AM UTC by Hans Feldt
**Last Updated:** Wed Sep 10, 2014 10:18 AM UTC
**Owner:** nobody


When all SUs are ENABLED and INSTANTIATED, the intent of #76 was that
assignments should be made immediately instead of waiting for the cluster
startup timeout. This does not work. It is not a functional problem, since
assignments do happen once the timer expires, but it wastes time in testing...

Analysis:

The trigger is the state change from DISABLED to ENABLED in sgproc.cc and the
call to cluster_su_instantiation_done().

cluster_su_instantiation_done() checks the SU's presence state for INSTANTIATED.
Since that is not yet updated, it returns false. When the SU later goes to
INSTANTIATED, no new check is done (see the sketch below).
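
A schematic sketch of the missing re-check, using simplified and partly hypothetical names (the real logic lives in sgproc.cc; struct avd_su, the simplified cluster_su_instantiation_done() signature and start_assignments_and_stop_cluster_timer() are stand-ins). The point is only that the same check would also need to run on the presence-state transition to INSTANTIATED, not just on the oper-state transition to ENABLED.

    /* Schematic sketch, not the actual sgproc.cc code. */
    #include <saAmf.h>

    struct avd_su {                    /* hypothetical, trimmed-down SU record */
        SaAmfOperationalStateT oper_state;
        SaAmfPresenceStateT pres_state;
    };

    extern int cluster_su_instantiation_done(const struct avd_su *su); /* simplified signature */
    extern void start_assignments_and_stop_cluster_timer(void);        /* hypothetical */

    /* Today: the check runs only when the oper state goes DISABLED -> ENABLED. */
    void su_oper_state_to_enabled(const struct avd_su *su)
    {
        if (cluster_su_instantiation_done(su))
            start_assignments_and_stop_cluster_timer();
    }

    /* Missing piece: repeat the same check when the presence state reaches
     * INSTANTIATED, so the last SU to instantiate also triggers assignments. */
    void su_presence_state_to_instantiated(const struct avd_su *su)
    {
        if (su->oper_state == SA_AMF_OPERATIONAL_ENABLED &&
            cluster_su_instantiation_done(su))
            start_assignments_and_stop_cluster_timer();
    }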

Logs:

15 12:03:00 09/10/2014 NO safApp=safAmfService Starting cluster 
startup timer
16 12:03:01 09/10/2014 NO safApp=safAmfService 
safSu=PL-4,safSg=NoRed,safApp=OpenSAF PresenceState INSTANTIATING = 
INSTANTIATED
17 12:03:01 09/10/2014 NO safApp=safAmfService 
safSu=PL-4,safSg=NoRed,safApp=OpenSAF OperState DISABLED = ENABLED
18 12:03:01 09/10/2014 NO safApp=safAmfService 
safSu=PL-4,safSg=NoRed,safApp=OpenSAF ReadinessState OUT_OF_SERVICE = 
IN_SERVICE
19 12:03:01 09/10/2014 NO safApp=safAmfService 
safSi=NoRed1,safApp=OpenSAF assigned to safSu=PL-4,safSg=NoRed,safApp=OpenSAF 
HA State 'ACTIVE'
20 12:03:01 09/10/2014 NO safApp=safAmfService 
safSu=PL-5,safSg=NoRed,safApp=OpenSAF PresenceState INSTANTIATING = 
INSTANTIATED
21 12:03:02 09/10/2014 NO safApp=safAmfService 
safSu=PL-5,safSg=NoRed,safApp=OpenSAF OperState DISABLED = ENABLED
22 12:03:02 09/10/2014 NO safApp=safAmfService 
safSu=PL-5,safSg=NoRed,safApp=OpenSAF ReadinessState OUT_OF_SERVICE = 
IN_SERVICE
23 12:03:02 09/10/2014 NO safApp=safAmfService 
safSu=PL-3,safSg=NoRed,safApp=OpenSAF PresenceState INSTANTIATING = 
INSTANTIATED
24 12:03:02 09/10/2014 NO safApp=safAmfService 
safSu=PL-3,safSg=NoRed,safApp=OpenSAF OperState DISABLED = ENABLED
25 12:03:02 09/10/2014 NO safApp=safAmfService 
safSu=PL-3,safSg=NoRed,safApp=OpenSAF ReadinessState OUT_OF_SERVICE = 
IN_SERVICE
26 12:03:02 09/10/2014 NO safApp=safAmfService 
safSi=NoRed3,safApp=OpenSAF assigned to safSu=PL-5,safSg=NoRed,safApp=OpenSAF 
HA State 'ACTIVE'
27 12:03:02 09/10/2014 NO safApp=safAmfService 
safSi=NoRed5,safApp=OpenSAF assigned to safSu=PL-3,safSg=NoRed,safApp=OpenSAF 
HA State 'ACTIVE'
28 12:03:02 09/10/2014 NO safApp=safAmfService 
safAmfNode=PL-4,safAmfCluster=myAmfCluster OperState DISABLED = ENABLED
29 12:03:02 09/10/2014 NO safApp=safAmfService 
safSu=2,safSg=1,safApp=osaftest OperState DISABLED = ENABLED
30 12:03:02 09/10/2014 NO safApp=safAmfService 
safAmfNode=PL-5,safAmfCluster=myAmfCluster OperState DISABLED = ENABLED
31 12:03:02 09/10/2014 NO safApp=safAmfService 
safAmfNode=PL-3,safAmfCluster=myAmfCluster OperState DISABLED = ENABLED
32 12:03:02 09/10/2014 NO safApp=safAmfService 
safSu=1,safSg=1,safApp=osaftest OperState DISABLED = ENABLED
33 12:03:03 09/10/2014 NO safApp=safAmfService 
safSu=SC-2,safSg=NoRed,safApp=OpenSAF PresenceState INSTANTIATING = 
INSTANTIATED
34 12:03:03 09/10/2014 NO safApp=safAmfService 
safSu=SC-2,safSg=NoRed,safApp=OpenSAF OperState DISABLED = ENABLED
35 12:03:03 09/10/2014 NO safApp=safAmfService 
safSu=SC-2,safSg=NoRed,safApp=OpenSAF ReadinessState OUT_OF_SERVICE = 
IN_SERVICE
36 12:03:03 09/10/2014 NO safApp=safAmfService 
safSi=NoRed4,safApp=OpenSAF assigned to safSu=SC-2,safSg=NoRed,safApp=OpenSAF 
HA State 'ACTIVE'
37 12:03:03 09/10/2014 NO safApp=safAmfService 
safSu=SC-2,safSg=2N,safApp=OpenSAF PresenceState INSTANTIATING = INSTANTIATED
38 12:03:03 09/10/2014 NO safApp=safAmfService 
safSu=SC-2,safSg=2N,safApp=OpenSAF OperState DISABLED = ENABLED
39 12:03:03 09/10/2014 NO safApp=safAmfService 
safSu=SC-2,safSg=2N,safApp=OpenSAF ReadinessState OUT_OF_SERVICE = IN_SERVICE
40 12:03:03 09/10/2014 NO safApp=safAmfService 
safSi=SC-2N,safApp=OpenSAF assigned to safSu=SC-2,safSg=2N,safApp=OpenSAF HA 
State 'STANDBY'
41 12:03:03 09/10/2014 NO safApp=safAmfService 
safAmfNode=SC-2,safAmfCluster=myAmfCluster OperState DISABLED = ENABLED
42 12:03:05 09/10/2014 NO safApp=safAmfService 
safSu=1,safSg=1,safApp=osaftest PresenceState INSTANTIATING = INSTANTIATED
43 12:03:05 09/10/2014 NO safApp=safAmfService 
safSu=1,safSg=1,safApp=osaftest ReadinessState OUT_OF_SERVICE = IN_SERVICE
44 12:03:05 09/10/2014 NO safApp=safAmfService 
safSu=2,safSg=1,safApp=osaftest PresenceState INSTANTIATING = INSTANTIATED
 Here the last SU is ENABLED and INSTANTIATED but nothing happens...
45 12:03:05 09/10/2014 NO safApp=safAmfService 
safSu=2,safSg=1,safApp=osaftest ReadinessState 

[tickets] [opensaf:tickets] #1059 2PBE: cluster reset observed during switchovers

2014-09-10 Thread Anders Bjornerstedt
- **Component**: unknown -- clm
- **Comment**:

The direct cause of the cluster reset is that CLM exits on receiving
ERR_EXIST on implementerSet. This could be a case of #946
(fixed in 5722:d353ca39b3d9).

If not, then someone needs to analyze what CLM is doing (it is detaching as the
main OI and then failing to attach as a new OI/Applier).
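
For context, rc 14 in the SC-1 syslog corresponds to SA_AIS_ERR_EXIST. Below is a hedged sketch of a defensive retry around saImmOiImplementerSet(); it is not the actual CLM code or the #946 patch, it only illustrates retrying for a bounded time instead of exiting on the first ERR_EXIST.

    /* Illustrative sketch only -- not the actual CLM code or the #946 patch.
     * Retry implementer-set for a bounded time when the previous registration
     * may still be lingering in IMMND, instead of exiting on the first error. */
    #include <saImmOi.h>
    #include <unistd.h>

    SaAisErrorT implementer_set_with_retry(SaImmOiHandleT oi_hdl, const char *name)
    {
        SaAisErrorT rc;
        int retries = 60;   /* roughly 6 seconds in total; the bound is illustrative */

        do {
            rc = saImmOiImplementerSet(oi_hdl, (SaImmOiImplementerNameT)name);
            if (rc == SA_AIS_ERR_TRY_AGAIN || rc == SA_AIS_ERR_EXIST)
                usleep(100 * 1000);   /* wait 100 ms before retrying */
        } while ((rc == SA_AIS_ERR_TRY_AGAIN || rc == SA_AIS_ERR_EXIST) && --retries > 0);

        return rc;
    }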







---

** [tickets:#1059] 2PBE: cluster reset observed during switchovers**

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Wed Sep 10, 2014 09:57 AM UTC by Sirisha Alla
**Last Updated:** Wed Sep 10, 2014 09:57 AM UTC
**Owner:** nobody

The issue is seen on SLES X86. OpenSAF is running with changeset 5697, with 2PBE
and 50k application objects.

Switchovers, with an IMM application running, are in progress when the issue is
observed.

Syslog on SC-1:

Sep 10 14:56:47 SLES-64BIT-SLOT1 osafamfnd[7540]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
182 0, 2020f (@OpenSafImmReplicatorB)
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for PRTA update Ccb:10063/4294967395
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Slave PBE replied with OK on 
attempt to start prepare of ccb:10063/4294967395
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for Ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer (applier) 
connected: 193 (@OpenSafImmReplicatorB) 0, 2020f
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: IN Slave PBE replied with OK on 
attempt to start prepare of ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for PRTA update Ccb:10064/4294967396
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Ccb 6 COMMITTED (SetUp_Ccb)
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:49 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:49 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:50 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: WA Start prepare for ccb: 
10064/4294967396 towards slave PBE returned: '6' from sttandby PBE
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: WA PBE-A failed to prepare PRTA 
update Ccb:10064/4294967396 towards PBE-B
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO 2PBE Error (20) in PRTA update 
(ccbId:10064)
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: WA update of PERSISTENT 
runtime attributes in object 'safNode=PL-4,safCluster=myClmCluster' REVERTED. 
PBE rc:20
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer locally 
disconnected. Marking it as doomed 190 17, 2010f (safClmService)
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafclmd[7511]: ER saImmOiImplementerSet 
failed rc:14, exiting
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: NO 
'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: ER 
safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131343, SupervisionTime = 60
Sep 10 14:56:51 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; 
timeout=60
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
190 17, 2010f (safClmService)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: ER PBE PRTAttrs Update 
continuation missing! invoc:100
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
16 0, 2020f (@OpenSafImmPBE)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
17 0, 2020f (OsafImmPbeRt_B)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: WA Timeout on syncronous 
admin operation 108
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmpbed: WA Failed to delete class towards 
slave PBE. Library or immsv replied Rc:5 - ignoring

Syslog on SC-2

Sep 10 14:56:57 SLES-64BIT-SLOT2 osafamfd[6896]: ER Failed to stop cluster 
tracking 5
Sep 10 14:56:57 SLES-64BIT-SLOT2 

[tickets] [opensaf:tickets] #1059 2PBE: cluster reset observed during switchovers

2014-09-10 Thread Anders Bjornerstedt
The CLM problem actually only explains why this SLOT2 went down. 
This was a switchover, not a failover, so the other SC should have
reverted the switchover.



---

** [tickets:#1059] 2PBE: cluster reset observed during switchovers**

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Wed Sep 10, 2014 09:57 AM UTC by Sirisha Alla
**Last Updated:** Wed Sep 10, 2014 10:19 AM UTC
**Owner:** nobody

The issue is seen on SLES X86. OpenSAF is running with changeset 5697, with 2PBE
and 50k application objects.

Switchovers, with an IMM application running, are in progress when the issue is
observed.

Syslog on SC-1:

Sep 10 14:56:47 SLES-64BIT-SLOT1 osafamfnd[7540]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
182 0, 2020f (@OpenSafImmReplicatorB)
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for PRTA update Ccb:10063/4294967395
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Slave PBE replied with OK on 
attempt to start prepare of ccb:10063/4294967395
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for Ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer (applier) 
connected: 193 (@OpenSafImmReplicatorB) 0, 2020f
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: IN Slave PBE replied with OK on 
attempt to start prepare of ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for PRTA update Ccb:10064/4294967396
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Ccb 6 COMMITTED (SetUp_Ccb)
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:49 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:49 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:50 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: WA Start prepare for ccb: 
10064/4294967396 towards slave PBE returned: '6' from sttandby PBE
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: WA PBE-A failed to prepare PRTA 
update Ccb:10064/4294967396 towards PBE-B
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO 2PBE Error (20) in PRTA update 
(ccbId:10064)
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: WA update of PERSISTENT 
runtime attributes in object 'safNode=PL-4,safCluster=myClmCluster' REVERTED. 
PBE rc:20
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer locally 
disconnected. Marking it as doomed 190 17, 2010f (safClmService)
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafclmd[7511]: ER saImmOiImplementerSet 
failed rc:14, exiting
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: NO 
'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: ER 
safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131343, SupervisionTime = 60
Sep 10 14:56:51 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; 
timeout=60
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
190 17, 2010f (safClmService)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: ER PBE PRTAttrs Update 
continuation missing! invoc:100
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
16 0, 2020f (@OpenSafImmPBE)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
17 0, 2020f (OsafImmPbeRt_B)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: WA Timeout on syncronous 
admin operation 108
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmpbed: WA Failed to delete class towards 
slave PBE. Library or immsv replied Rc:5 - ignoring

Syslog on SC-2

Sep 10 14:56:57 SLES-64BIT-SLOT2 osafamfd[6896]: ER Failed to stop cluster 
tracking 5
Sep 10 14:56:57 SLES-64BIT-SLOT2 osafamfd[6896]: ER ClmTrack stop failed
Sep 10 14:56:57 SLES-64BIT-SLOT2 osaffmd[6816]: NO Current role: ACTIVE
Sep 10 14:56:57 SLES-64BIT-SLOT2 osaffmd[6816]: Rebooting OpenSAF NodeId = 

[tickets] [opensaf:tickets] #1043 immnd crashes (segfault) while bringing up the SC-1 controller .

2014-09-10 Thread A V Mahesh (AVM)
- **status**: review -- fixed
- **Comment**:

changeset:   5758:52212ecd4267
user:A V Mahesh mahesh.va...@oracle.com
date:Wed Sep 10 15:22:56 2014 +0530
summary: mds: send full encode message for Mcast message type [#1043]
 
changeset:   5759:383f556c0330
branch:  opensaf-4.5.x
tag: tip
parent:  5756:813b6cc1d054
user:A V Mahesh mahesh.va...@oracle.com
date:Wed Sep 10 15:23:23 2014 +0530
summary: mds: send full encode message for Mcast message type [#10



---

** [tickets:#1043] immnd crashes (segfault) while bringing up the SC-1 
controller .**

**Status:** fixed
**Milestone:** 4.5.FC
**Created:** Thu Sep 04, 2014 05:22 AM UTC by manu
**Last Updated:** Wed Sep 10, 2014 04:48 AM UTC
**Owner:** A V Mahesh (AVM)

Changeset : 5608
Scenario: SC-2 is already up and running; trying to bring up the SC-1
controller on node 1.

Rarely reproducible.

Syslog:-

Sep  3 17:09:32 OpenSAF-SLOT1 opensafd: Starting OpenSAF Services
Starting OpenSAF Services: Sep  3 17:09:32 OpenSAF-SLOT1 kernel: [ 9769.989840] 
TIPC: Activated (version 2.0.0)
Sep  3 17:09:32 OpenSAF-SLOT1 kernel: [ 9769.989994] NET: Registered protocol 
family 30
Sep  3 17:09:32 OpenSAF-SLOT1 kernel: [ 9769.989995] TIPC: Started in single 
node mode
Sep  3 17:09:32 OpenSAF-SLOT1 kernel: [ 9769.993128] TIPC: Started in network 
mode
Sep  3 17:09:32 OpenSAF-SLOT1 kernel: [ 9769.993132] TIPC: Own node address 
1.1.1, network identity 1234
Sep  3 17:09:32 OpenSAF-SLOT1 kernel: [ 9769.995619] TIPC: Enabled bearer 
eth:eth0, discovery domain 1.1.0, priority 10
Sep  3 17:09:32 OpenSAF-SLOT1 kernel: [ 9769.996695] TIPC: Established link 
1.1.1:eth0-1.1.2:eth0 on network plane A
Sep  3 17:09:33 OpenSAF-SLOT1 osafrded[10151]: Started
Sep  3 17:09:33 OpenSAF-SLOT1 osafrded[10151]: NO Peer rde@2020f has active 
state = Assigning Standby role to this node
Sep  3 17:09:33 OpenSAF-SLOT1 osaffmd[10160]: Started
Sep  3 17:09:33 OpenSAF-SLOT1 osafimmd[10170]: Started
Sep  3 17:09:33 OpenSAF-SLOT1 osafimmnd[10180]: Started
Sep  3 17:09:33 OpenSAF-SLOT1 osafimmnd[10180]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Sep  3 17:09:33 OpenSAF-SLOT1 osafimmnd[10180]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS -- IMM_SERVER_CLUSTER_WAITING
Sep  3 17:09:33 OpenSAF-SLOT1 osafimmnd[10180]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING -- IMM_SERVER_LOADING_PENDING
Sep  3 17:09:33 OpenSAF-SLOT1 osafimmnd[10180]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING -- IMM_SERVER_SYNC_PENDING
Sep  3 17:09:33 OpenSAF-SLOT1 osafimmnd[10180]: NO NODE STATE- 
IMM_NODE_ISOLATED
Sep  3 17:09:33 OpenSAF-SLOT1 osafimmd[10170]: NO SBY: Ruling epoch noted as:742
Sep  3 17:09:33 OpenSAF-SLOT1 osafimmd[10170]: NO IMMND coord at 2020f
Sep  3 17:09:33 OpenSAF-SLOT1 osafimmd[10170]: NO SBY: SaImmRepositoryInitModeT 
changed and noted as 'SA_IMM_KEEP_REPOSITORY'
Sep  3 17:09:33 OpenSAF-SLOT1 kernel: [ 9770.652257] osafimmnd[10183]: segfault 
at 0 ip 7f7e7a9f2472 sp 7f7e7c11eea8 error 4 in 
libc-2.11.3.so[7f7e7a96f000+16b000]
Sep  3 17:09:45 OpenSAF-SLOT1 
osafimmd[10170]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 
741  new epoch:742
Sep  3 17:09:45 OpenSAF-SLOT1 osafimmd[10170]: NO IMMND coord at 2020f

=

gdb output has been attached.










[tickets] [opensaf:tickets] #1059 2PBE: cluster reset observed during switchovers

2014-09-10 Thread Anders Bjornerstedt
And one more comment:
I don't think that this incident is logically related to 2PBE.
A PRTO fails to get created.
That is possibly more likely to happen with 2PBE than with 1PBE or 0PBE,
but logically it can also happen with 1PBE.

 


---

** [tickets:#1059] 2PBE: cluster reset observed during switchovers**

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Wed Sep 10, 2014 09:57 AM UTC by Sirisha Alla
**Last Updated:** Wed Sep 10, 2014 10:22 AM UTC
**Owner:** nobody

The issue is seen on SLES X86. OpenSAF is running with changeset 5697, with
2PBE and 50k application objects.

Switchovers are in progress, with an IMM application running, when the issue
is observed.

Syslog on SC-1:

Sep 10 14:56:47 SLES-64BIT-SLOT1 osafamfnd[7540]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
182 0, 2020f (@OpenSafImmReplicatorB)
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for PRTA update Ccb:10063/4294967395
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Slave PBE replied with OK on 
attempt to start prepare of ccb:10063/4294967395
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for Ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer (applier) 
connected: 193 (@OpenSafImmReplicatorB) 0, 2020f
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: IN Slave PBE replied with OK on 
attempt to start prepare of ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for PRTA update Ccb:10064/4294967396
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Ccb 6 COMMITTED (SetUp_Ccb)
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:49 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:49 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:50 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:10064/4294967396
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: WA Start prepare for ccb: 
10064/4294967396 towards slave PBE returned: '6' from sttandby PBE
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: WA PBE-A failed to prepare PRTA 
update Ccb:10064/4294967396 towards PBE-B
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmpbed: NO 2PBE Error (20) in PRTA update 
(ccbId:10064)
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: WA update of PERSISTENT 
runtime attributes in object 'safNode=PL-4,safCluster=myClmCluster' REVERTED. 
PBE rc:20
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer locally 
disconnected. Marking it as doomed 190 17, 2010f (safClmService)
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafclmd[7511]: ER saImmOiImplementerSet 
failed rc:14, exiting
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: NO 
'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: ER 
safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafamfnd[7540]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131343, SupervisionTime = 60
Sep 10 14:56:51 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; 
timeout=60
Sep 10 14:56:51 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
190 17, 2010f (safClmService)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: ER PBE PRTAttrs Update 
continuation missing! invoc:100
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
16 0, 2020f (@OpenSafImmPBE)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
17 0, 2020f (OsafImmPbeRt_B)
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmnd[7448]: WA Timeout on syncronous 
admin operation 108
Sep 10 14:56:53 SLES-64BIT-SLOT1 osafimmpbed: WA Failed to delete class towards 
slave PBE. Library or immsv replied Rc:5 - ignoring

Syslog on SC-2

Sep 10 14:56:57 SLES-64BIT-SLOT2 osafamfd[6896]: ER Failed to stop cluster 
tracking 5
Sep 10 14:56:57 SLES-64BIT-SLOT2 osafamfd[6896]: ER ClmTrack stop failed
Sep 10 14:56:57 SLES-64BIT-SLOT2 osaffmd[6816]: NO Current role: ACTIVE

[tickets] [opensaf:tickets] #981 smf: set DN maxlength depending on if extended DNs is enabled globally or not

2014-09-10 Thread Ingvar Bergström
- **status**: review -- fixed
- **Comment**:

changeset:   5760:8f303de00e27
branch:  opensaf-4.5.x
user:Robert Apanowicz robert.apanow...@ericsson.com
date:Tue Sep 02 09:42:03 2014 +0200
summary: smf: set max DN length in SMF based on long DN config in IMM [#981]

changeset:   5761:c031c26df205
tag: tip
parent:  5758:52212ecd4267
user:Robert Apanowicz robert.apanow...@ericsson.com
date:Tue Sep 02 09:42:03 2014 +0200
summary: smf: set max DN length in SMF based on long DN config in IMM [#981]




---

** [tickets:#981] smf: set DN maxlength depending on if extended DNs is enabled 
globally or not**

**Status:** fixed
**Milestone:** 4.5.0
**Created:** Mon Aug 11, 2014 11:37 AM UTC by Ingvar Bergström
**Last Updated:** Fri Aug 29, 2014 07:47 AM UTC
**Owner:** Robert Apanowicz

Instead of using kMaxDnLength directly, we should have a variable that is
initialized to either kMaxDnLength or 255, depending on whether extended DNs
are enabled globally or not (which can be checked by reading the attribute
longDnsAllowed in opensafImm=opensafImm,safApp=safImmService).
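
A minimal sketch of the intended selection (illustrative only, not committed
SMF code): read_long_dns_allowed() is a hypothetical accessor for the
longDnsAllowed attribute, and K_MAX_DN_LENGTH stands in for whatever constant
SMF uses for kMaxDnLength today.

#include <saAis.h>

/* Hypothetical accessor for longDnsAllowed in
 * opensafImm=opensafImm,safApp=safImmService. */
extern SaBoolT read_long_dns_allowed(void);

#define K_MAX_DN_LENGTH 2048    /* stand-in for kMaxDnLength */

/* Effective max DN length: the extended limit only when long DNs are
 * enabled globally, otherwise the classic 255-byte SAF limit. */
static SaUint32T smf_max_dn_length(void)
{
    return read_long_dns_allowed() ? K_MAX_DN_LENGTH : 255;
}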





[tickets] [opensaf:tickets] #1061 imm: memory leak in dumping resources in PBE

2014-09-10 Thread Zoran Milinkovic



---

** [tickets:#1061] imm: memory leak in dumping resources in PBE**

**Status:** unassigned
**Milestone:** 4.5.0
**Created:** Wed Sep 10, 2014 12:35 PM UTC by Zoran Milinkovic
**Last Updated:** Wed Sep 10, 2014 12:35 PM UTC
**Owner:** Zoran Milinkovic

When dumping resources (in PBE), data are collected and sent back as a result,
but the allocated memory (rparams) is not freed:

SaImmAdminOperationParamsT_2 **rparams;
rparams = (SaImmAdminOperationParamsT_2 **) realloc(NULL,
        sizeof(SaImmAdminOperationParamsT_2 *));
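
A minimal sketch of the missing cleanup (illustrative only, not the actual PBE
code): it assumes the result array is grown with realloc and that each element
was allocated separately; free_rparams() and numParams are hypothetical names.

#include <stdlib.h>
#include <saImm.h>

/* Release the admin-operation result parameters. Depending on how each
 * element was filled in, its paramName/paramBuffer members may need to be
 * freed as well before the element itself. */
static void free_rparams(SaImmAdminOperationParamsT_2 **rparams,
                         size_t numParams)
{
    size_t i;

    if (rparams == NULL)
        return;
    for (i = 0; i < numParams; i++)
        free(rparams[i]);   /* each element allocated separately */
    free(rparams);          /* the realloc'ed pointer array itself */
}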




[tickets] [opensaf:tickets] #1062 imm: immnd may crash in resourceDisplay

2014-09-10 Thread Zoran Milinkovic



---

** [tickets:#1062] imm: immnd may crash in resourceDisplay**

**Status:** unassigned
**Milestone:** 4.5.0
**Created:** Wed Sep 10, 2014 02:10 PM UTC by Zoran Milinkovic
**Last Updated:** Wed Sep 10, 2014 02:10 PM UTC
**Owner:** Zoran Milinkovic

immnd may crash in immModel_resourceDisplay and ImmModel::resourceDisplay if
the operation name is not provided.
In this case, opName is NULL, and the strcmp() calls may crash.
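
A minimal sketch of the kind of guard that would avoid the crash (illustrative
only, not the actual ImmModel code; op_name_matches() is a hypothetical
helper):

#include <stddef.h>
#include <string.h>

/* Compare an operation name against an expected value, treating a missing
 * (NULL) name as "no match" instead of passing NULL to strcmp(). */
static int op_name_matches(const char *opName, const char *expected)
{
    if (opName == NULL)
        return 0;
    return strcmp(opName, expected) == 0;
}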




[tickets] [opensaf:tickets] #1059 2PBE: cluster reset observed during switchovers

2014-09-10 Thread Neelakanta Reddy
A.  SLOT1 node went down:

1. CLM got BAD_HANDLE and finalizes the handle

Sep 10 14:56:51.543332 osafclmd [7511:imma_oi_api.c:0622]  saImmOiFinalize
Sep 10 14:56:51.543370 osafclmd [7511:imma_oi_api.c:0626] T2 ERR_BAD_HANDLE: No 
initialized handle exists!

2. Discard implementer is called

Sep 10 14:56:51.538179 osafimmnd [7448:immsv_evt.c:5363] T8 Sending:  
IMMD_EVT_ND2D_DISCARD_IMPL to 0
Sep 10 14:56:51.539878 osafimmnd [7448:ImmModel.cc:11474]  discardImplementer
Sep 10 14:56:51.539994 osafimmnd [7448:ImmModel.cc:11510] NO Implementer 
locally disconnected. Marking it as doomed 190 17, 2010f (safClmService)
Sep 10 14:56:51.540181 osafimmnd [7448:ImmModel.cc:11534]  discardImplementer

3. But the implementer actually got disconnected at:

Sep 10 14:56:51.580449 osafimmnd [7448:immnd_evt.c:8588] T2 Global discard 
implementer for id:190
Sep 10 14:56:51.580462 osafimmnd [7448:ImmModel.cc:11474]  discardImplementer
Sep 10 14:56:51.580496 osafimmnd [7448:ImmModel.cc:11481] NO Implementer 
disconnected 190 17, 2010f (safClmService)
Sep 10 14:56:51.580518 osafimmnd [7448:ImmModel.cc:11534]  discardImplementer

4. CLM tries to re-initialize and receives ERR_EXIST


Sep 10 14:56:51.551900 osafclmd [7511:imma_oi_api.c:0440]  
saImmOiSelectionObjectGet
Sep 10 14:56:51.551942 osafclmd [7511:clms_imm.c:2286] ER saImmOiImplementerSet 
failed rc:14, exiting
Sep 10 14:59:51.245982 osafclmd [2538:clms_main.c:0267]  clms_init


Sep 10 14:56:51.548981 osafimmnd [7448:immsv_evt.c:5382] T8 Received: 
IMMND_EVT_A2ND_OI_IMPL_SET (40) from 2010f
Sep 10 14:56:51.549023 osafimmnd [7448:immnd_evt.c:2471] T2 SENDRSP FAIL 14


#946 fixes the above problem in CLM


B. Slot2 node went down (Quiesced -> Active)

1. Sep 10 14:56:57.681152 osafamfd [6896:role.cc:0375] NO FAILOVER Quiesced -- 
Active

2. saImmOiRtObjectUpdate_2 got BAD_HANDLE, so AMFD tries to re-initialize with
IMM and calls avd_imm_reinit_bg

Sep 10 14:56:57.701333 osafamfd [6896:imma_oi_api.c:2279]  
saImmOiRtObjectUpdate_2
Sep 10 14:56:57.701344 osafamfd [6896:imma_oi_api.c:2345] T2 ERR_BAD_HANDLE: 
The SaImmOiHandleT is not associated with any implementer name
Sep 10 14:56:57.701353 osafamfd [6896:imma_oi_api.c:2554]  
saImmOiRtObjectUpdate_2
Sep 10 14:56:57.701362 osafamfd [6896:imm.cc:0164] TR BADHANDLE
Sep 10 14:56:57.701370 osafamfd [6896:imm.cc:1660]  avd_imm_reinit_bg
Sep 10 14:56:57.701406 osafamfd [6896:imm.cc:1662] NO Re-initializing with IMM
Sep 10 14:56:57.701420 osafamfd [6896:imma_oi_api.c:0622]  saImmOiFinalize

3. Before the finalize has completed clearing the OI handle, impl_set is
called by AMFD in the function avd_role_failover_qsd_actv (calling
avd_imm_impl_set_task_create). Because of this, amfd exited.

Sep 10 14:56:57.701178 osafamfd [6896:role.cc:0498]  
avd_role_failover_qsd_actv


Sep 10 14:56:57.702256 osafamfd [6896:imm.cc:1215]  avd_imm_impl_set
Sep 10 14:56:57.702273 osafamfd [6896:imma_oi_api.c:1281] T4 ERR_LIBRARY: 
Overlapping use of IMM OI handle by multiple threads
Sep 10 14:56:57.703683 osafamfd [6896:imm.cc:1218] ER saImmOiImplementerSet 
failed 2
Sep 10 14:56:57.703788 osafamfd [6896:imm.cc:1288] ER exiting since 
avd_imm_impl_set failed


Because the OI handle is shared across multiple threads in AMFD,
saImmOiImplementerSet failed with ERR_LIBRARY.
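
As an illustration of the kind of serialization that would prevent the overlap
(a sketch only, not the actual AMFD code; the lock and wrapper names are
hypothetical), all use of the shared OI handle could be funneled through one
mutex so implementer-set cannot run while the background finalize/re-initialize
is still in progress:

#include <pthread.h>
#include <saImmOi.h>

static pthread_mutex_t oi_handle_lock = PTHREAD_MUTEX_INITIALIZER;
static SaImmOiHandleT shared_oi_handle;   /* shared across AMFD threads */

/* Serialize implementer-set against any concurrent finalize/re-init that
 * also takes oi_handle_lock before touching shared_oi_handle. */
static SaAisErrorT oi_implementer_set_locked(const SaImmOiImplementerNameT name)
{
    SaAisErrorT rc;

    pthread_mutex_lock(&oi_handle_lock);
    rc = saImmOiImplementerSet(shared_oi_handle, name);
    pthread_mutex_unlock(&oi_handle_lock);
    return rc;
}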



---

** [tickets:#1059] 2PBE: cluster reset observed during switchovers**

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Wed Sep 10, 2014 09:57 AM UTC by Sirisha Alla
**Last Updated:** Wed Sep 10, 2014 10:29 AM UTC
**Owner:** nobody

The issue is seen on SLES X86. OpenSAF is running with changeset 5697, with
2PBE and 50k application objects.

Switchovers are in progress, with an IMM application running, when the issue
is observed.

Syslog on SC-1:

Sep 10 14:56:47 SLES-64BIT-SLOT1 osafamfnd[7540]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer disconnected 
182 0, 2020f (@OpenSafImmReplicatorB)
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for PRTA update Ccb:10063/4294967395
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Slave PBE replied with OK on 
attempt to start prepare of ccb:10063/4294967395
Sep 10 14:56:47 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for Ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Implementer (applier) 
connected: 193 (@OpenSafImmReplicatorB) 0, 2020f
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 1 or Immsv (6) 
replied with transient error on prepare for ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: IN Slave PBE replied with OK on 
attempt to start prepare of ccb:6/6
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: IN Starting distributed PBE 
commit for PRTA update Ccb:10064/4294967396
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmnd[7448]: NO Ccb 6 COMMITTED (SetUp_Ccb)
Sep 10 14:56:48 SLES-64BIT-SLOT1 osafimmpbed: NO Slave PBE 

Re: [tickets] [opensaf:tickets] #1059 2PBE: cluster reset observed during switchovers

2014-09-10 Thread Anders Björnerstedt
Good analysis.

We can add that the reason AMFD got BAD_HANDLE when attempting to do an
RtObjectUpdate is that, although the OI handle is valid, it was not associated
with any implementer-name at that time. So this must be a pure and plain bug
in the AMFD, most likely recently introduced, since otherwise we should have
seen this before.

The AMFD interprets the BAD_HANDLE as the only expected reason for BAD_HANDLE:
that the handle is invalid due to the IMMND having restarted. By expected I
don't mean common; I mean the only reason the AMFD programmer has to
re-initialize the handle: a restart of the local IMMND. But that is not what
is happening here.

The SAF spec actually explicitly says that BAD_HANDLE is to be used for this
particular case (implementer-name not set when trying to perform an OI
operation). While this is not wrong, it would be better in this case to use
ERR_BAD_OPERATION, since that is unambiguously a client error and the spec
text on that error code for this downcall also fits the case:

ERR_BAD_OPERATION - The targeted object is not implemented by the invoking
process.

So I think we should write an enhancement on the immsv to change the error
code for this case. It will also be a backwards compatible change. We are
talking about an interface violation here, and that should be handled by the
process aborting.
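
A minimal sketch of the mapping being proposed (illustrative only, not the
current immsv code; the function and parameter names are hypothetical):

#include <stddef.h>
#include <saAis.h>

/* Keep BAD_HANDLE for a truly stale handle (e.g. local IMMND restart) and
 * return BAD_OPERATION when the handle is valid but no implementer-name has
 * been set before an OI downcall. */
static SaAisErrorT oi_precheck(int handle_valid, const char *implementer_name)
{
    if (!handle_valid)
        return SA_AIS_ERR_BAD_HANDLE;
    if (implementer_name == NULL)
        return SA_AIS_ERR_BAD_OPERATION;
    return SA_AIS_OK;
}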

We should also write a defect ticket on AMFD, or use this ticket, to track a
fix for the AMFD bug - premature use of an OI handle. This ticket also points
to an independent CLM bug. So we should probably have two tickets.

/AndersBj





[tickets] [opensaf:tickets] Re: #1059 2PBE: cluster reset observed during switchovers

2014-09-10 Thread Anders Bjornerstedt
Neel correctly points out that #946 may fix the CLM problem.
This is true if the ERR_EXIST on implementer-set is due to the prior OI having
detached *locally* at this node but not yet been confirmed globally over fevs.
But since this is a switchover, it is more likely that the OI detached on the
other SC and that the AMF is faster in processing the quiesced-ack from the
old active CLMD and ordering the CLMD at this node to become the new active
(and thus allocate the OI).

So I think getting ERR_EXIST at the new active's OI implementer-set may
unfortunately be a fact of life for the switch-over case. The only fix I see
here is that the new active knows that this is (or could be) a switchover,
since it has just been ordered to become active. It could in this context
interpret ERR_EXIST from implementer-set as effectively TRY_AGAIN.

Perhaps even simpler: director services could always interpret ERR_EXIST on
implementer-set as TRY_AGAIN. As always, a TRY_AGAIN loop must be finite. And
implementer-set is not blocked by imm-sync, so we are not talking 60 seconds
here. At MOST we are talking sub-second, the fevs turnaround latency.
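
A sketch of such a bounded loop (illustrative only, not committed code; the
retry count and sleep interval are assumptions):

#include <unistd.h>
#include <saImmOi.h>

/* Treat ERR_EXIST from implementer-set like TRY_AGAIN in a short, finite
 * loop: the old OI's detach only needs one fevs turnaround to become
 * globally visible, so the total wait stays well under a second. */
static SaAisErrorT implementer_set_with_retry(SaImmOiHandleT oiHandle,
                                              const SaImmOiImplementerNameT name)
{
    SaAisErrorT rc = SA_AIS_ERR_EXIST;
    int attempt;

    for (attempt = 0; attempt < 10; attempt++) {
        rc = saImmOiImplementerSet(oiHandle, name);
        if (rc != SA_AIS_ERR_EXIST && rc != SA_AIS_ERR_TRY_AGAIN)
            break;
        usleep(50 * 1000);   /* 50 ms between attempts */
    }
    return rc;
}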

/AndersBj

