[tickets] [opensaf:tickets] #2103 IMM: Performance: saImmOmSearchInitialize_2 API failed with BAD_HANDLE

2016-11-07 Thread Neelakanta Reddy
- **status**: unassigned --> needinfo
- **Comment**:

The agent traces:

Oct  7 20:45:52.274583 imma [5241:imma_proc.c:1478] >> 
imma_proc_resurrect_client
Oct  7 20:45:52.274607 imma [5241:imma_proc.c:1523] T1 Resurrect message for 
immHandle: 370002040f isOm: 1
Oct  7 20:45:54.382140 imma [5241:imma_proc.c:1560] T3 Recieved TRY_AGAIN while 
resurrecting
Oct  7 20:45:54.382177 imma [5241:imma_proc.c:1622] << 
imma_proc_resurrect_client
Oct  7 20:45:54.382185 imma [5241:imma_om_api.c:9259] T3 Failed to resurrect OM 
handle 

However, the shared syslog and immnd traces are from SC-1.

Please provide the syslog and immnd traces from the node where the problem is 
observed, along with the time at which the problem was seen.




---

** [tickets:#2103] IMM: Performance: saImmOmSearchInitialize_2 API failed with 
BAD_HANDLE**

**Status:** needinfo
**Milestone:** 5.2.FC
**Created:** Fri Oct 07, 2016 03:36 PM UTC by Chani Srivastava
**Last Updated:** Fri Oct 07, 2016 03:37 PM UTC
**Owner:** nobody
**Attachments:**

- 
[perfSearchInit.zip](https://sourceforge.net/p/opensaf/tickets/2103/attachment/perfSearchInit.zip)
 (25.9 MB; application/zip)


OS : Suse 64bit
Changeset : 8190 
Setup : 4 nodes 1 PBE enabled with around 70k objects

Steps:
1. Bring up OpenSAF on a four-node cluster with 1 PBE enabled
2. Create a load of around 70k objects
3. Set IMMA_MAX_OPEN_SEARCHES_PER_HANDLE=200
4. Call saImmOmSearchInitialize_2() in a loop

After around 65 calls, saImmOmSearchInitialize_2() fails with ERR_BAD_HANDLE.
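
The reproduction loop above can be sketched as a small simulation. This is not the actual test program: `make_fake_search_initialize` is a hypothetical stand-in for saImmOmSearchInitialize_2(), and the retry helper is only an assumption about how a client might treat TRY_AGAIN (retry) differently from BAD_HANDLE (give up and re-initialize the handle). The numeric values follow SaAisErrorT.

```python
import time

# SaAisErrorT values. 1 and 9 appear in the logs of this thread;
# 6 (TRY_AGAIN) is taken from the SA Forum AIS headers.
SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6
SA_AIS_ERR_BAD_HANDLE = 9


def call_with_retry(api_call, max_tries=10, delay_s=0.0):
    """Call api_call() until it stops returning TRY_AGAIN or tries run out.

    Returns (rc, tries_used). BAD_HANDLE is passed back unchanged, because
    a bad handle has to be re-initialized rather than retried.
    """
    rc = SA_AIS_ERR_TRY_AGAIN
    tries = 0
    while tries < max_tries:
        tries += 1
        rc = api_call()
        if rc != SA_AIS_ERR_TRY_AGAIN:
            break
        time.sleep(delay_s)
    return rc, tries


def make_fake_search_initialize(fail_after):
    """Hypothetical stub: succeeds fail_after times, then returns BAD_HANDLE."""
    state = {"calls": 0}

    def fake():
        state["calls"] += 1
        if state["calls"] <= fail_after:
            return SA_AIS_OK
        return SA_AIS_ERR_BAD_HANDLE

    return fake


def count_successful_calls(api_call, limit):
    """Call the API in a loop and count calls until the first non-OK result."""
    ok = 0
    for _ in range(limit):
        rc, _tries = call_with_retry(api_call)
        if rc != SA_AIS_OK:
            return ok, rc
        ok += 1
    return ok, SA_AIS_OK
```

Recording the call count and the first non-OK return code in this way makes it easier to correlate the failure with the timestamps in the agent and immnd traces.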

Syslog, immnd trace and agent traces attached


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #1889 Immnd crashed on Payload during headless operation

2016-11-07 Thread Neelakanta Reddy
- **status**: unassigned --> needinfo
- **Comment**:

Please provide the following logs:
1. syslog from all nodes
2. immnd traces from the controllers and from the payload where IMMND hit the assertion



---

** [tickets:#1889] Immnd crashed on Payload during headless operation**

**Status:** needinfo
**Milestone:** 5.0.2
**Created:** Tue Jun 21, 2016 09:50 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 20, 2016 05:42 PM UTC
**Owner:** nobody
**Attachments:**

- 
[SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1889/attachment/SC-1.tar.bz2)
 (7.6 MB; application/x-bzip)
- 
[SCALE_SLOT-75.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1889/attachment/SCALE_SLOT-75.tar.bz2)
 (4.8 MB; application/x-bzip)


Setup:
Version - OpenSAF 5.0.GA
6-node cluster (SC-1: Active, SC-2: Standby, SC-3: Spare, PL-4, PL-5: Payloads)

* Issue observed:
Immnd crashed on payload during headless operation

* Steps performed: 
(1). Invoked headless 
(2). Created logsv application stream after headless
(3). Closed the stream after performing a write operation
(4). While reverting to the default configuration, one of the CCB operations 
failed

>> SCALE_SLOT-75 osafimmnd[18906]: WA ERR_FAILED_OPERATION: ccb 1 is not in an 
>> expected state: 11 rejecting ccbObjectModify operation
>>
immcfg -a saLogStreamLogFullAction=3 
safLgStrCfg=saLogNotification,safApp=safLogService
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: IMM: Resource abort: CCB is not in an expected state
error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21)

(5). On invoking headless a second time, immnd crashed on both payloads

>>
Jun 21 14:27:53 SCALE_SLOT-75 osafimmnd[18906]: ImmModel.cc:648: 
immModel_abortNonCriticalCcbs: **Assertion 'immModel_ccbAbort(cb, (*i3)->mId, 
, , , , , )' 
failed**.
Jun 21 14:27:53 SCALE_SLOT-75 osafamfnd[18925]: NO 
'safSu=PL-5,safSg=NoRed,safApp=OpenSAF' component restart probation timer 
started (timeout: 600 ns)
Jun 21 14:27:53 SCALE_SLOT-75 osafamfnd[18925]: NO Restarting a component of 
'safSu=PL-5,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Jun 21 14:27:53 SCALE_SLOT-75 osafamfnd[18925]: NO 
'**safComp=IMMND,safSu=PL-5,safSg=NoRed,safApp=OpenSAF' faulted** due to 
'avaDown' : Recovery is 'componentRestart'
Jun 21 14:27:53 SCALE_SLOT-75 osafimmnd[19167]: Started


* Syslog and immnd trace files are attached




[tickets] [opensaf:tickets] #2158 AMF: IMMND dies at Opensaf start up phase causes AMFD heartbeat timeout

2016-11-07 Thread Praveen
Hi Minh,

Is this issue observed in a normal cluster (SC absence and spare SC features 
disabled)?
In a normal cluster, in a failover situation, FM makes the standby SC active. 
Since the other controller is still not standby, FM should reboot it, as it did 
in #1334.

Also, even if AMFND on the standby SC receives the su_pres message from AMFD, it 
will not be able to read the comp configuration because IMMND is down. 

Thanks,
Praveen


---

** [tickets:#2158] AMF: IMMND dies at Opensaf start up phase causes AMFD 
heartbeat timeout**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Wed Nov 02, 2016 05:20 AM UTC by Minh Hon Chau
**Last Updated:** Wed Nov 02, 2016 05:20 AM UTC
**Owner:** nobody
**Attachments:**

- 
[osafamfnd_sc2](https://sourceforge.net/p/opensaf/tickets/2158/attachment/osafamfnd_sc2)
 (264.2 kB; application/octet-stream)


If IMMND dies during the OpenSAF startup phase, IMMND is not restarted by AMF. 
The issue has been observed in the following situation:
- Restart the cluster
- While the active controller starts up, a critical component dies, which 
causes a node failfast
Oct 25 12:51:21 SC-1 osafamfnd[7642]: ER 
safComp=ABC,safSu=1,safSg=2N,safApp=ABC Faulted due to:csiSetcallbackTimeout 
Recovery is:nodeFailfast
Oct 25 12:51:21 SC-1 osafamfnd[7642]: Rebooting OpenSAF NodeId = 131343 EE Name 
= , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, 
SupervisionTime = 60
- In the meantime, the standby controller is requested to become active
Oct 25 12:51:27 SC-2 tipclog[16221]: Lost link <1.1.2:eth0-1.1.1:eth0> on 
network plane A
Oct 25 12:51:27 SC-2 osafclmna[4336]: NO Starting to promote this node to a 
system controller
Oct 25 12:51:27 SC-2 osafrded[4387]: NO Requesting ACTIVE role
- IMMND also dies a bit later
Oct 25 12:51:29 SC-2 osafimmnd[4536]: ER MESSAGE:44816 OUT OF ORDER my highest 
processed:44814 - exiting
Oct 25 12:51:29 SC-2 osafamfnd[7414]: NO saClmDispatch BAD_HANDLE
- Services fail to initialize with other services since IMMND is dead
Oct 25 12:51:39 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:51:39 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:51:39 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize( 
returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:51:39 SC-2 osafclmd[7386]: WA saImmOiImplementerSet returned 9
Oct 25 12:51:39 SC-2 osafntfd[7372]: WA saLogInitialize returns try again, 
retries...
Oct 25 12:51:39 SC-2 osaflogd[7358]: WA saImmOiImplementerSet returned 
SA_AIS_ERR_BAD_HANDLE (9)
Oct 25 12:51:39 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5

Oct 25 12:51:49 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:51:50 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:51:50 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5

Oct 25 12:52:00 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:52:00 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:52:00 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5

Oct 25 12:52:20 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:52:20 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:52:20 SC-2 osafimmd[4489]: NO Extended intro from node 2210f

- In the end, the AMFD heartbeat times out
Oct 25 12:53:57 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize( 
returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:54:01 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:54:01 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:54:01 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:54:07 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize( 
returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:54:11 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:54:11 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:54:11 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:54:15 SC-2 osafamfnd[7414]: ER AMF director heart beat timeout, 
generating core for amfd

According to the AMFND trace on SC-2, AMFND did not receive su_pres from AMFD 
and therefore could not initiate the middleware components (including IMMND). 
As a result, AMFND was not aware of IMMND's death and could not restart it. The 
problem here is slightly different from #1828, which happened on a newly 
promoted SC (with the roamingSC feature) where AMFND had IMMND registered.




[tickets] [opensaf:tickets] #2178 MDS: mdstest 13 14 failed

2016-11-07 Thread Quyen Dao



---

** [tickets:#2178] MDS: mdstest 13 14 failed**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Tue Nov 08, 2016 06:58 AM UTC by Quyen Dao
**Last Updated:** Tue Nov 08, 2016 06:58 AM UTC
**Owner:** nobody
**Attachments:**

- 
[mdstest_13_14_mds.log](https://sourceforge.net/p/opensaf/tickets/2178/attachment/mdstest_13_14_mds.log)
 (65.8 kB; application/octet-stream)


Changeset: 8287:bcc7af78a5a7
OS: Ubuntu 16.04
MDS transport: TCP

root@SC-1:~# export MDS_LOG_LEVEL=5
root@SC-1:~# mdstest 13 14

Suite 13: Direct Just Send test cases
/ntet_initialise_setup: Get an ADEST handle,Create PWE=2 on ADEST,Install 
EXTMIN and INTMIN svc on ADEST,Install INTMIN,EXTMIN services on ADEST's PWE=2,
Create VDEST 100 and VDEST 200,Change the role of VDEST 200 to ACTIVE,
Install EXTMIN  service on VDEST 100,Install INTMIN, EXTMIN services on 
VDEST 200

ADEST <2010f021f > : GET_HDLS is SUCCESSFUL
 100 : VDEST_CREATE is SUCCESSFUL
 200 : VDEST_CREATE is SUCCESSFUL
VDEST_CHANGE ROLE to 1 is SUCCESSFULL
PWE_CREATE is SUCCESSFUL : PWE = 2
 256 : SERVICE INSTALL is SUCCESSFULL
 512 : SERVICE INSTALL is SUCCESSFULL
 256 : SERVICE INSTALL is SUCCESSFULL
 512 : SERVICE INSTALL is SUCCESSFULL
 512 : SERVICE INSTALL is SUCCESSFULL
 256 : SERVICE INSTALL is SUCCESSFULL
 512 : SERVICE INSTALL is SUCCESSFULL
 MDS SERVICE SUBSCRIBE is SUCCESSFULL
The Subscriber Service id = 512 is on ADEST
UP: Subscribed Svc = 512 with svc pvt ver = 1 is UP on dest=  anchor= <0> 
role= 1 with PWE id = 1 on node = 2010f

The Subscriber Service id = 512 is on ADEST
UP: Subscribed Svc = 512 with svc pvt ver = 3 is UP on dest= <2010f021f> 
anchor= <0> role= 1 with PWE id = 1 on node = 2010f

 MDS RETRIEVE is SUCCESSFULL
Test Case 14: Not able to send a message of size >(MDS_DIRECT_BUF_MAXSIZE) to 
2000

Request to ncsmds_api: MDS DIRECT SEND is SUCCESSFULL
Fail

Cancel subscription

 MDS CANCEL SUBSCRIBE is SUCCESSFULLUninstalling the services on both 
VDESTs and ADEST

 UnInstalling the Services on both the VDESTs

 MDS RETRIEVE is SUCCESSFULL
 512 : SERVICE UNINSTALL is SUCCESSFULL
 MDS RETRIEVE is SUCCESSFULL
 256 : SERVICE UNINSTALL is SUCCESSFULL
 MDS RETRIEVE is SUCCESSFULL
 512 : SERVICE UNINSTALL is SUCCESSFULL
Destroying the VDESTS
Destroying both the VDESTs and PWE=2 on ADEST

VDEST_CHANGE ROLE to 2 is SUCCESSFULL
 200 : VDEST_DESTROY is SUCCESSFULL
VDEST_CHANGE ROLE to 2 is SUCCESSFULL
 100 : VDEST_DESTROY is SUCCESSFULL
Direct Receive callback
 The Sender service is = 512 is on <2010f021f> destination with anchor = 
<2010f021f> on Node = 2010f with msg fmt ver=1

The Receiver service is = 512 is on <2010f021f> destination

Received Message len = 8002 and the message 
is=sss
 
s
 

[tickets] [opensaf:tickets] #2177 MDS: mdstest 10 1 failed

2016-11-07 Thread Quyen Dao



---

** [tickets:#2177] MDS: mdstest 10 1 failed**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Tue Nov 08, 2016 06:48 AM UTC by Quyen Dao
**Last Updated:** Tue Nov 08, 2016 06:48 AM UTC
**Owner:** nobody
**Attachments:**

- 
[mdstest_10_1_mds.log](https://sourceforge.net/p/opensaf/tickets/2177/attachment/mdstest_10_1_mds.log)
 (121.2 kB; application/octet-stream)


Changeset: 8287:bcc7af78a5a7
OS: Ubuntu 16.04
MDS transport: TCP

root@SC-1:~# export MDS_LOG_LEVEL=5
root@SC-1:~# mdstest 10 1

Suite 10: Send All test cases

Test Case 1: Sender service installed with i_fail_no_active_sends = true and 
there is no-active instance of the receiver service
Setting up the setup
/ntet_initialise_setup: Get an ADEST handle,Create PWE=2 on ADEST,Install 
EXTMIN and INTMIN svc on ADEST,Install INTMIN,EXTMIN services on ADEST's PWE=2,
Create VDEST 100 and VDEST 200,Change the role of VDEST 200 to ACTIVE,
Install EXTMIN  service on VDEST 100,Install INTMIN, EXTMIN services on 
VDEST 200

ADEST <2010f020a > : GET_HDLS is SUCCESSFUL
 100 : VDEST_CREATE is SUCCESSFUL
 200 : VDEST_CREATE is SUCCESSFUL
VDEST_CHANGE ROLE to 1 is SUCCESSFULL
PWE_CREATE is SUCCESSFUL : PWE = 2
 256 : SERVICE INSTALL is SUCCESSFULL
 512 : SERVICE INSTALL is SUCCESSFULL
 256 : SERVICE INSTALL is SUCCESSFULL
 512 : SERVICE INSTALL is SUCCESSFULL
 512 : SERVICE INSTALL is SUCCESSFULL
 256 : SERVICE INSTALL is SUCCESSFULL
 512 : SERVICE INSTALL is SUCCESSFULL
 MDS SERVICE SUBSCRIBE is SUCCESSFULL
VDEST_CHANGE ROLE to 2 is SUCCESSFULL
 MDS RETRIEVE is SUCCESSFULL Sending the message to no active instance
Encoding the message sent Sender svc = 512 with msg fmt ver =0
Successfully encoded message for Receiver svc = 512

MDS SEND is SUCCESSFULL

Fail

Sendack to the no active instance

MDS SEND ACK has failed as there is no active instance

Success

Send response to the no active instance

Request to ncsmds_api: MDS SEND RESPONSE has no active instance

Change role to active

VDEST_CHANGE ROLE to 1 is SUCCESSFULL
The Subscriber Service id = 512 is on ADEST
UP: Subscribed Svc = 512 with svc pvt ver = 1 is UP on dest=  anchor= <0> 
role= 1 with PWE id = 1 on node = 2010f

The Subscriber Service id = 512 is on ADEST
UP: Subscribed Svc = 512 with svc pvt ver = 3 is UP on dest= <2010f020a> 
anchor= <0> role= 1 with PWE id = 1 on node = 2010f

The Subscriber Service id = 512 is on ADEST
NO ACTIVE: Received NO ACTIVE Event
 In the system no active instance of Subscribed srv= 512 with svc pvt ver = 1 
on dest=  found

The Subscriber Service id = 512 is on ADEST
NEW ACTIVE: Received NEW_ACTIVE Event
 In the system atleast one active instance of Subscribed service = 512 with svc 
pvt ver = 1  on destinatin =  found

 MDS RETRIEVE is SUCCESSFULL
Task has been Created

Inside Receiver Thread

The service which is sending the message is = 512
The service to which the message needs to be delivered = 512
 Got the message: trying to retreive it

The Sender service = 512 is on destination = with anchor = <2010f020a> 
Node 2010f and msg fmt ver = 3
The Receiver service = 512 is on destination =<2010f020a>

Received Message len = 30
The message is= Hi Receiver! Are you there?
 MDS RETRIEVE is SUCCESSFULL
VDEST_CHANGE ROLE to 2 is SUCCESSFULL
The service which is sending the message is = 512
The service to which the message needs to be delivered = 512
 MDS RESPONSE is SUCCESSFULL

 MDS SEND RESPONSE is SUCCESSFULL
The response got from the receiver is :
 message length = 33
 message =  Hi Sender! My Name is RECEIVER
Success

TASK is released

 MDS CANCEL SUBSCRIBE is SUCCESSFULLUninstalling the services on both 
VDESTs and ADEST

 UnInstalling the Services on both the VDESTs

 MDS RETRIEVE is SUCCESSFULL
 512 : SERVICE UNINSTALL is SUCCESSFULL
 MDS RETRIEVE is SUCCESSFULL
 256 : SERVICE UNINSTALL is SUCCESSFULL
 MDS RETRIEVE is SUCCESSFULL
 512 : SERVICE UNINSTALL is SUCCESSFULL
Destroying the VDESTS
Destroying both the VDESTs and PWE=2 on ADEST

VDEST_CHANGE ROLE to 2 is SUCCESSFULL
 200 : VDEST_DESTROY is SUCCESSFULL
VDEST_CHANGE ROLE to 2 is SUCCESSFULL
 100 : VDEST_DESTROY is SUCCESSFULL
The Subscriber Service id = 512 is on ADEST
NO ACTIVE: Received NO ACTIVE Event
 In the system no active instance of Subscribed srv= 512 with svc pvt ver = 1 
on dest=  found

 MDS RETRIEVE is SUCCESSFULL
 512 : SERVICE UNINSTALL is SUCCESSFULL
 MDS RETRIEVE is SUCCESSFULL
 256 : SERVICE UNINSTALL is SUCCESSFULL
 ADEST : PWE 2 : Uninstalling Services 2000/INTMIN

 MDS RETRIEVE is SUCCESSFULL
 512 : SERVICE UNINSTALL is SUCCESSFULL
 MDS RETRIEVE is SUCCESSFULL
 256 : SERVICE UNINSTALL is SUCCESSFULL
ADEST PWE2 Destroyed

ADEST: PWE_DESTROY is SUCCESSFUL1  FAILED   Sender service installed with 
i_fail_no_active_sends = true and there is no-active instance of the receiver 
service (expected OUT_OF_RANGE, got SA_AIS_OK (1));

=


[tickets] [opensaf:tickets] #2175 amfd: null SU during CCB modify apply

2016-11-07 Thread Gary Lee
- **Milestone**: 5.2.FC --> 5.1.1



---

** [tickets:#2175] amfd: null SU during CCB modify apply**

**Status:** unassigned
**Milestone:** 5.1.1
**Created:** Tue Nov 08, 2016 06:37 AM UTC by Gary Lee
**Last Updated:** Tue Nov 08, 2016 06:37 AM UTC
**Owner:** nobody


su is NULL and subsequently causes a segfault.

line 1833 corresponds to su->saAmfSUMaintenanceCampaign = "";

Cause is probably the same as #1932?
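
The crash pattern can be illustrated with a guarded lookup. This is a sketch under stated assumptions, not the amfd code: `SU`, `su_db` and `apply_modify` are hypothetical names, and whether skipping the update is the right fix depends on the root cause, which may be shared with #1932.

```python
class SU:
    """Hypothetical stand-in for an AMF service unit object."""

    def __init__(self, name, campaign):
        self.name = name
        self.saAmfSUMaintenanceCampaign = campaign


def apply_modify(su_db, su_name):
    """Clear the maintenance campaign of the named SU.

    Returns False instead of touching a missing SU, which is the
    situation reported here (su is NULL at su.cc line 1833).
    """
    su = su_db.get(su_name)  # None if the SU is not in the database
    if su is None:
        return False
    su.saAmfSUMaintenanceCampaign = ""
    return True
```

A guard of this kind only avoids the segfault; it does not explain why the SU lookup came back empty during the CCB apply.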

~~~
Full backtrace:
#0 0x7f5ee5adc036 in std::string::assign(char const*, unsigned long) () 
from /usr/lib64/libstdc++.so.6
No symbol table info available.
#1 0x00495ab9 in assign (__s=0x4c8b33 "", this=0x28) at 
/usr/include/c++/4.8/bits/basic_string.h:1131
No locals.
#2 operator= (__s=0x4c8b33 "", this=0x28) at 
/usr/include/c++/4.8/bits/basic_string.h:555
No locals.
#3 su_ccb_apply_modify_hdlr (opdata=opdata@entry=0x2580cf4) at 
../../../../../../../opensaf/osaf/services/saf/amf/amfd/su.cc:1833
attr_mod = 0x2580f48
i = 
su = 0x0
value_is_deleted = true
_FUNCTION_ = "su_ccb_apply_modify_hdlr"
#4 0x00498d78 in su_ccb_apply_cb (opdata=0x2580cf4) at 
../../../../../../../opensaf/osaf/services/saf/amf/amfd/su.cc:1985
su = 
_FUNCTION_ = "su_ccb_apply_cb"
#5 0x00439fa6 in ccb_apply_cb (immoi_handle=, 
ccb_id=218) at 
../../../../../../../opensaf/osaf/services/saf/amf/amfd/imm.cc:1226
ccb_util_ccb_data = 
type = 
temp = 
_FUNCTION_ = "ccb_apply_cb"
opdata = 0x0
next = 0x2517540
#6 0x7f5ee6818329 in imma_process_callback_info (cb=cb@entry=0x7f5ee6a373a0 
, cl_node=0x2517cc0, callback=callback@entry=0x7f5ed8004b60, 
immHandle=901943263503) at 
../../../../../../../opensaf/osaf/libs/agents/saf/imma/imma_proc.c:2245
ccbid = 218
privateAugOmHandle = 0
_FUNCTION_ = "imma_process_callback_info"
clientCapable = true
isPbeOp = false
isExtendedNameValid = false
isAttrExtendedName = false
#7 0x7f5ee681aec9 in imma_hdl_callbk_dispatch_all (cb=0x7f5ee6a373a0 
, immHandle=901943263503) at 
../../../../../../../opensaf/osaf/libs/agents/saf/imma/imma_proc.c:1732
callback = 0x7f5ed8004b60
cl_node = 0x2517cc0
#8 0x7f5ee680efc4 in saImmOiDispatch (immOiHandle=901943263503, 
dispatchFlags=SA_DISPATCH_ALL) at 
../../../../../../../opensaf/osaf/libs/agents/saf/imma/imma_oi_api.c:609
rc = SA_AIS_OK
cl_node = 0x0
locked = false
pend_fin = 0
pend_dis = 0
_FUNCTION_ = "saImmOiDispatch"
#9 0x00407b90 in main_loop () at 
../../../../../../../opensaf/osaf/services/saf/amf/amfd/main.cc:722
pollretval = 
evt = 
polltmo = 
term_fd = 14
cb = 0x6e8900 <_control_block>
error = 
#10 main (argc=, argv=) at 
../../../../../../../opensaf/osaf/services/saf/amf/amfd/main.cc:848
~~~




[tickets] [opensaf:tickets] #2176 MDS: mdstest 5 9 failed

2016-11-07 Thread Quyen Dao



---

** [tickets:#2176] MDS: mdstest 5 9 failed**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Tue Nov 08, 2016 06:38 AM UTC by Quyen Dao
**Last Updated:** Tue Nov 08, 2016 06:38 AM UTC
**Owner:** nobody
**Attachments:**

- 
[mdstest_5_9_mds.log](https://sourceforge.net/p/opensaf/tickets/2176/attachment/mdstest_5_9_mds.log)
 (29.8 kB; application/octet-stream)


Changeset: 8287:bcc7af78a5a7
OS: Ubuntu 16.04
MDS transport: TCP

root@SC-1:~# export MDS_LOG_LEVEL=5
root@SC-1:~# mdstest 5 9

Suite 5: Subscribe ADEST

Getting an ADEST handle

ADEST <2010f0205 > : GET_HDLS is SUCCESSFUL
Installing the services 500,600,700 with CHASSIS scope

 500 : SERVICE INSTALL is SUCCESSFULL
 600 : SERVICE INSTALL is SUCCESSFULL
 700 : SERVICE INSTALL is SUCCESSFULL
Test Case 9: 500 Subscription to:600,700 in two seperate Subscription calls but 
Cancels both in a single cancellation call

Action: Subscribe 500 to 600

 MDS SERVICE SUBSCRIBE is SUCCESSFULL
Action: Subscribe 500 to 700

 MDS SERVICE SUBSCRIBE is SUCCESSFULL
Action: Retreive three times, third shall fail

Request to ncsmds_api: MDS RETRIEVE has FAILED

Fail mds_service_retrieve

Request to ncsmds_api: MDS RETRIEVE has FAILED

Fail mds_service_retrieve

Request to ncsmds_api: MDS RETRIEVE has FAILED

Action: Cancel subscription

 MDS CANCEL SUBSCRIBE is SUCCESSFULL
Success

Uninstalling all the services on this ADESt

 700 : SERVICE UNINSTALL is SUCCESSFULL
 600 : SERVICE UNINSTALL is SUCCESSFULL
 500 : SERVICE UNINSTALL is SUCCESSFULL9  FAILED500 Subscription 
to:600,700 in two seperate Subscription calls but Cancels both in a single 
cancellation call (expected OUT_OF_RANGE, got SA_AIS_OK (1));

=

   Test Result:
  Total:  1
  Passed: 0
  Failed: 1
root@SC-1:~#





[tickets] [opensaf:tickets] #2174 MDS: mdstest 5 1 failed

2016-11-07 Thread Quyen Dao



---

** [tickets:#2174] MDS: mdstest 5 1 failed**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Tue Nov 08, 2016 06:29 AM UTC by Quyen Dao
**Last Updated:** Tue Nov 08, 2016 06:29 AM UTC
**Owner:** nobody
**Attachments:**

- 
[mdstest_5_1_mds.log](https://sourceforge.net/p/opensaf/tickets/2174/attachment/mdstest_5_1_mds.log)
 (29.2 kB; application/octet-stream)


Changeset: 8287:bcc7af78a5a7
OS: Ubuntu 16.04
MDS transport: TCP

root@SC-1:~# export MDS_LOG_LEVEL=5
root@SC-1:~# mdstest 5 1

Suite 5: Subscribe ADEST
Test Case 1: 500 Subscription to:600,700 where Install scope = Subscription 
scope

Getting an ADEST handle

ADEST <2010f01f3 > : GET_HDLS is SUCCESSFUL
Installing the services 500,600,700 with CHASSIS scope

 500 : SERVICE INSTALL is SUCCESSFULL
 600 : SERVICE INSTALL is SUCCESSFULL
 700 : SERVICE INSTALL is SUCCESSFULL
Action: Retrieve only ONE event

 MDS SERVICE SUBSCRIBE is SUCCESSFULL
Action: Retrieve only ONE event

Request to ncsmds_api: MDS RETRIEVE has FAILED
Fail, retrieve ONE

Action: Retrieve ALL event

The Subscriber Service id = 500 is on ADEST
UP: Subscribed Svc = 600 with svc pvt ver = 1 is UP on dest= <2010f01f3> 
anchor= <0> role= 1 with PWE id = 1 on node = 2010f

The Subscriber Service id = 500 is on ADEST
UP: Subscribed Svc = 700 with svc pvt ver = 1 is UP on dest= <2010f01f3> 
anchor= <0> role= 1 with PWE id = 1 on node = 2010f

 MDS RETRIEVE is SUCCESSFULL
Success

Action: Cancel subscription 500

 MDS CANCEL SUBSCRIBE is SUCCESSFULL
Success

Uninstalling all the services on this ADESt

 700 : SERVICE UNINSTALL is SUCCESSFULL
 600 : SERVICE UNINSTALL is SUCCESSFULL
 500 : SERVICE UNINSTALL is SUCCESSFULL1  FAILEDIn the NO_ACTIVE event 
notification, the remote service subpart version is set to the last active 
instance.s remote-service sub-part version (expected OUT_OF_RANGE, got 
SA_AIS_OK (1));

=

   Test Result:
  Total:  1
  Passed: 0
  Failed: 1
root@SC-1:~#




[tickets] [opensaf:tickets] #2134 AMF: Update RTA saAmfSISUHAState to IMM

2016-11-07 Thread Praveen
Hi Minh,

si_rt_attr_cb() is invoked in AMFD context, so I think this function can take 
the fsm state into account and count only AVD_SU_SI_STATE_ASGND when updating 
saAmfSINumCurrActiveAssignments and saAmfSINumCurrStandbyAssignments.

For the quiescing state, when AMFND gets Response() from the component for the 
quiescing state (before quiescing complete), AMFND can send a data update 
message to AMFD. Based on this data update, AMFD can update IMM for the 
quiescing state. After quiescing complete, AMFD gets the response from AMFND 
for the assignment, and AMFD can then update IMM for the quiesced state.

Although saAmfSIAssignmentState is a runtime attribute, it is coupled with the 
configurable attributes saAmfSIPrefActive/StandbyAssignments. So I think this 
attribute only reflects whether an SI is assigned at full capacity or not. It 
does not indicate whether an assignment is in progress, because in the N-Way 
and N-Way active models an SI can go to many SUs, so its 
saAmfSIAssignmentState may remain in the PARTIALLY_ASSIGNED state for a long 
time.

I think the root cause of most issues is that AMFD unnecessarily plays with 
the "Curr" type attributes by incrementing and decrementing them for SU and 
SI. For these attributes in SI and SU, all calculations should be done 
dynamically inside the callbacks su_rt_attr_cb() and si_rt_attr_cb(). If this 
gets fixed, I think what remains is updating the HA state in SUSI/COMPCSI, 
updating saAmfSIAssignmentState in SI, and notifications. Can we evaluate 
getting rid of updating the "Curr" attributes and working without them? In 
some redundancy models these "Curr" attributes are used in the assignment 
algorithm. Those uses can be replaced with their function equivalents, like 
su->get_saAmfSUNumCurrActiveSIs(). The same functions can also be called from 
the callbacks, and they can take fsm states, list_of_sisu etc. into account.
Will this help and make things consistent? We will have to evaluate this. 
But that will be more like an enhancement, and #1354 can be made an 
enhancement to bring such consistency.
Can we fix #2134 and #2133 in the existing way and document it, if required, 
in the PR?
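
A minimal sketch of the "compute it dynamically" idea, assuming simplified stand-in types: SuSiFsm, HaState, SuSi and count_assigned() are hypothetical names for illustration, not the actual AMFD classes. The counter is derived on demand from the assignment list, and only assignments whose fsm state is "assigned" are counted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the AMFD types; only ASGND mirrors the
// AVD_SU_SI_STATE_ASGND state mentioned in the mail.
enum class SuSiFsm { ASGN, ASGND, MODIFY, UNASGN };
enum class HaState { ACTIVE, STANDBY, QUIESCED, QUIESCING };

struct SuSi {
  HaState ha;   // HA state of this SU-SI assignment
  SuSiFsm fsm;  // progress of the assignment
};

// Derive a "Curr" counter on demand instead of incrementing and
// decrementing a stored value: count only fully assigned SUSIs that have
// the requested HA state.
std::size_t count_assigned(const std::vector<SuSi>& list_of_sisu, HaState ha) {
  std::size_t n = 0;
  for (const SuSi& susi : list_of_sisu) {
    if (susi.fsm == SuSiFsm::ASGND && susi.ha == ha) ++n;
  }
  assert(n <= list_of_sisu.size());  // a count can never exceed the list
  return n;
}
```

Because the value is recomputed from the list each time, the same helper could serve both a runtime attribute callback and an assignment algorithm without a separately maintained counter drifting out of sync.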


Thanks,
Praveen




---

** [tickets:#2134] AMF: Update RTA saAmfSISUHAState to IMM**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Thu Oct 20, 2016 07:58 PM UTC by Minh Hon Chau
**Last Updated:** Mon Oct 24, 2016 01:23 PM UTC
**Owner:** nobody


In the 2N SI-swap scenario, when AMFD sends a QUIESCED su_si assignment msg 
(for example) to AMFND that changes the HA state of a SUSI assignment, AMFD 
updates its local state AVD_SU_SI_REL::state and checkpoints this change to 
the standby AMFD. However, AMFD does not update saAmfSISUHAState until it 
receives the su_si assignment response. Question:
(1). Should AMFD update the runtime attribute saAmfSISUHAState in IMM as soon 
as the local @state gets updated in the implementer, so that IMM, the active 
AMFD and the standby AMFD are all in sync?
(2). Or should AMFD update saAmfSISUHAState in IMM only when AMFD receives the 
su_si assignment response from AMFND, as is currently implemented for some 
reason (to not expose the change of saAmfSISUHAState to the user too early?)

Grepping for "avd_susi_update", which updates saAmfSISUHAState to IMM, also 
shows an inconsistency in usage: avd_susi_mod_send() sends the su_si msg and 
also updates saAmfSISUHAState immediately, while avd_sg_su_si_mod_snd does 
otherwise. 

Headless recovery relies on IMM to restore the state. If saAmfSISUHAState is 
not updated promptly and the node is rebooted during the headless stage, then 
after headless the saAmfSISUHAState read from IMM does not fit with many other 
states (SG fsm, SUSI fsm, saAmfSISUHAState of the other SUSIs).

My question is whether doing (1) will cause any problem for a normal cluster. 
The pending patches for #1725 part 2 currently implement (1).



---



[tickets] [opensaf:tickets] #2169 amf: su containing proxy comps is not working properly after a su restart recovery

2016-11-07 Thread Long HB Nguyen
Hi Nagu,

The changeset is 8272. I used the proxy demo and the config file (.xml) in 
samples/amf/proxy.


---

** [tickets:#2169] amf: su containing proxy comps is not working properly after 
a su restart recovery**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Fri Nov 04, 2016 09:45 AM UTC by Long HB Nguyen
**Last Updated:** Tue Nov 08, 2016 06:05 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- 
[osaftestLog-2016-11-04_14-55-08.tgz](https://sourceforge.net/p/opensaf/tickets/2169/attachment/osaftestLog-2016-11-04_14-55-08.tgz)
 (1.2 MB; application/x-gzip-compressed)


- Description:
When a proxy component restart is escalated to a su restart, then the SU is not 
working properly after that (e.g. lock failed).

- Reproduction:
1) Use the proxy demo in amf samples.
2) Unlock-in/unlock proxy SU, proxied SU.
3) Kill the proxy process several times to escalate the recovery from comp 
restart to su restart.
4) Lock the proxy SU => timeout.


---



[tickets] [opensaf:tickets] #2169 amf: su containing proxy comps is not working properly after a su restart recovery

2016-11-07 Thread Nagendra Kumar
Also, please share the config file(.xml) or configuration details.


---

** [tickets:#2169] amf: su containing proxy comps is not working properly after 
a su restart recovery**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Fri Nov 04, 2016 09:45 AM UTC by Long HB Nguyen
**Last Updated:** Tue Nov 08, 2016 06:02 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- 
[osaftestLog-2016-11-04_14-55-08.tgz](https://sourceforge.net/p/opensaf/tickets/2169/attachment/osaftestLog-2016-11-04_14-55-08.tgz)
 (1.2 MB; application/x-gzip-compressed)


- Description:
When a proxy component restart is escalated to a su restart, then the SU is not 
working properly after that (e.g. lock failed).

- Reproduction:
1) Use the proxy demo in amf samples.
2) Unlock-in/unlock proxy SU, proxied SU.
3) Kill the proxy process several times to escalate the recovery from comp 
restart to su restart.
4) Lock the proxy SU => timeout.


---



[tickets] [opensaf:tickets] #2169 amf: su containing proxy comps is not working properly after a su restart recovery

2016-11-07 Thread Nagendra Kumar
Hi Long HB Nguyen, can you please provide the version or changeset?


---

** [tickets:#2169] amf: su containing proxy comps is not working properly after 
a su restart recovery**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Fri Nov 04, 2016 09:45 AM UTC by Long HB Nguyen
**Last Updated:** Mon Nov 07, 2016 05:12 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- 
[osaftestLog-2016-11-04_14-55-08.tgz](https://sourceforge.net/p/opensaf/tickets/2169/attachment/osaftestLog-2016-11-04_14-55-08.tgz)
 (1.2 MB; application/x-gzip-compressed)


- Description:
When a proxy component restart is escalated to a su restart, then the SU is not 
working properly after that (e.g. lock failed).

- Reproduction:
1) Use the proxy demo in amf samples.
2) Unlock-in/unlock proxy SU, proxied SU.
3) Kill the proxy process several times to escalate the recovery from comp 
restart to su restart.
4) Lock the proxy SU => timeout.


---



[tickets] [opensaf:tickets] #171 saAmfComponentUnregister should be unexposed to handle obtained from B 4 1 version

2016-11-07 Thread Nagendra Kumar
- **status**: accepted --> review



---

** [tickets:#171] saAmfComponentUnregister should be unexposed to handle 
obtained from B 4 1 version**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Tue May 14, 2013 06:01 AM UTC by Nagendra Kumar
**Last Updated:** Mon Nov 07, 2016 11:47 AM UTC
**Owner:** Nagendra Kumar


Migrated from http://devel.opensaf.org/ticket/2019

saAmfComponentUnregister api should return SA_AIS_ERR_VERSION when called with 
handle obtained from B 4 1 version.
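
A minimal sketch of the requested check, under stated assumptions: Rc, Version, ClientHandle and component_unregister() are hypothetical stand-ins, not the AMF agent's real types, and the sketch assumes the library records the version a handle was initialized with. How versions later than B.4.1 should be treated is not stated in the ticket and is left out.

```cpp
#include <cassert>

// Local stand-ins for the AIS return codes (illustrative names only).
enum class Rc { kOk, kErrVersion };

// Version a handle was initialized with, e.g. {'B', 4, 1}.
struct Version {
  char release_code;
  int major_version;
  int minor_version;
};

// Hypothetical per-handle record kept by the agent library.
struct ClientHandle {
  Version version;
};

// True only for handles obtained with version B.4.1.
bool is_b41(const Version& v) {
  return v.release_code == 'B' && v.major_version == 4 &&
         v.minor_version == 1;
}

// Reject the call for B.4.1 handles before doing any unregistration work.
Rc component_unregister(const ClientHandle* hdl) {
  assert(hdl != nullptr);  // handle lookup/validation is out of scope here
  if (is_b41(hdl->version)) return Rc::kErrVersion;
  // ... unregistration for older versions would go here ...
  return Rc::kOk;
}
```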





---



[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync

2016-11-07 Thread Minh Hon Chau
- **status**: assigned --> review



---

** [tickets:#2162] AMF: Headless recovery failed if SC failover during headless 
sync**

**Status:** review
**Milestone:** 5.2.FC
**Labels:** headless recovery 
**Created:** Thu Nov 03, 2016 11:01 AM UTC by Minh Hon Chau
**Last Updated:** Mon Nov 07, 2016 05:11 AM UTC
**Owner:** Minh Hon Chau
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2162/attachment/log.tgz) 
(1.4 MB; application/x-compressed)


Test steps:
- Set up 2N assignment: PL4 hosts SU4 (active assignment), PL5 hosts SU5 
(standby assignment)
- Stop SCs
- Stop PL4
- Restart SC1
- Restart SC2
- Since PL4 is stopped, headless sync will time out in 10 secs. During these 
10 secs, reboot SC1 to trigger SC failover
Observation: SC2 becomes active controller, cold sync completes, but SU5 still 
has standby assignment.

When SC2 becomes active controller, the part of code that performs headless 
recovery is not executed (function failover_absent_assignment()). Therefore, 
the transient assignments remain after SC failover.

Log/trace are attached.


---



[tickets] [opensaf:tickets] #1902 AMF: Extend escalation support during headless

2016-11-07 Thread Minh Hon Chau
- **status**: unassigned --> assigned
- **Comment**:

Escalation support during headless has been implemented and pushed under 
changesets

changeset:   8276:9c4b2ab58ad2
changeset:   8275:af4e966fa808

Marking this ticket as Assigned to continue testing componentFailover.



---

** [tickets:#1902] AMF: Extend escalation support during headless**

**Status:** assigned
**Milestone:** 5.2.FC
**Created:** Wed Jun 29, 2016 12:02 PM UTC by Minh Hon Chau
**Last Updated:** Mon Aug 29, 2016 12:41 PM UTC
**Owner:** Minh Hon Chau


If a comp/su failover occurs during headless, amfnd will escalate to reboot. 
This unexpectedly impacts other comps/SUs that are up and running if no node 
failover escalation is configured on the faulty comp/su.

2016-06-29 21:30:07 PL-4 osafamfnd[429]: NO 
'safComp=AmfDemo2,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' faulted due 
to 'avaDown' : Recovery is 'suFailover'
2016-06-29 21:30:07 PL-4 osafamfnd[429]: NO Terminating components of 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon'(abruptly & unordered)
2016-06-29 21:30:07 PL-4 osafamfnd[429]: NO 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State INSTANTIATED => 
TERMINATING
2016-06-29 21:30:07 PL-4 osafamfnd[429]: NO 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State TERMINATING => 
TERMINATING
2016-06-29 21:30:07 PL-4 osafamfnd[429]: NO 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State TERMINATING => 
TERMINATING
2016-06-29 21:30:07 PL-4 osafamfnd[429]: Rebooting OpenSAF NodeId = 132111 EE 
Name = , Reason: Can't perform recovery while controllers are down. Recovery is 
node failfast., OwnNodeId = 132111, SupervisionTime = 60
2016-06-29 21:30:07 PL-4 opensaf_reboot: Rebooting local node; timeout=60

This ticket will remove the unexpected reboot due to failover during headless, 
which is mentioned as a limitation in the OpenSAF AMF documentation.



---



[tickets] [opensaf:tickets] #2139 smf: use IMM appler for the change in longDnsAllowed attributes

2016-11-07 Thread elunlen
- **status**: unassigned --> accepted
- **assigned_to**: elunlen
- **Milestone**: future --> 5.1.1
- **Comment**:

Found yet another problem where this setting is not read. If the long DN 
setting is made after SMF is started and some objects, e.g. bundle objects, 
are created using long DNs, it will fail because SMF will not detect the new 
DN setting until after execution of a campaign is started (or a node restart).



---

** [tickets:#2139] smf: use IMM appler for the change in longDnsAllowed 
attributes**

**Status:** accepted
**Milestone:** 5.1.1
**Created:** Tue Oct 25, 2016 10:14 AM UTC by Neelakanta Reddy
**Last Updated:** Tue Oct 25, 2016 10:14 AM UTC
**Owner:** elunlen


SMF reads the IMM attribute longDnsAllowed to check whether long DNs are 
enabled or disabled.

There is redundant code: read_IMM_long_DN_config_and_set_control_block is 
called each time to update the corresponding long DN values in SMF.

The alternative solution is to have an applier in SMFD for the OpensafImm 
class, running in a separate thread, so that any modification is propagated to 
the SMF global variables.
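
A minimal sketch of the applier idea, with assumptions labelled: LongDnConfig and on_attr_modified() are hypothetical names, and the real IMM applier callbacks and their registration are not shown. The point is only that one thread writes a cached value when it is told about a modification, while every other thread reads the cache instead of querying IMM each time.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical cache of the long DN setting inside SMF. The applier thread
// writes it; all other threads only read it.
class LongDnConfig {
 public:
  // Cheap read used wherever SMF needs to know whether long DNs are allowed.
  bool allowed() const { return allowed_.load(std::memory_order_acquire); }

  // Would be called from the (hypothetical) applier callback whenever an
  // attribute of the watched object is modified. Returns true if the cached
  // value was updated, false if the attribute is not the one we track.
  bool on_attr_modified(const std::string& attr_name, unsigned new_value) {
    if (attr_name != "longDnsAllowed") return false;
    allowed_.store(new_value != 0, std::memory_order_release);
    return true;
  }

 private:
  std::atomic<bool> allowed_{false};
};
```

An atomic flag is enough here because the cached state is a single boolean; a larger set of global variables would need a mutex or a similar mechanism instead.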


---



[tickets] [opensaf:tickets] #1835 Imm: Immd helathcheck callback got timed-out on active controller when starting opensaf on PL-4 and stopping opensaf on PL-3 simultaneously.

2016-11-07 Thread Neelakanta Reddy
- **status**: unassigned --> invalid
- **Comment**:

Ticket #1219 is closed as invalid.
Re-open this ticket if the problem is observed again.



---

** [tickets:#1835] Imm: Immd helathcheck callback got timed-out on active 
controller when starting opensaf on PL-4 and stopping opensaf on PL-3 
simultaneously.**

**Status:** invalid
**Milestone:** 5.0.2
**Created:** Tue May 17, 2016 10:27 AM UTC by Madhurika Koppula
**Last Updated:** Tue Sep 20, 2016 06:04 PM UTC
**Owner:** nobody
**Attachments:**

- 
[messages_SC-1](https://sourceforge.net/p/opensaf/tickets/1835/attachment/messages_SC-1)
 (794.5 kB; application/octet-stream)


Setup:
Changeset- 7613
Version - opensaf 5.0
4 nodes cluster with single PBE.

Reproducible steps:

1) Bring up the active controller, the standby controller and any payload, 
PL-3.
2) Now bring up payload PL-4 and stop opensaf on payload PL-3 during the IMMND 
start-up sync of PL-4.

Below is the snippet of the IMMD healthcheck callback time-out on the active 
controller SC-1.


May 17 15:00:25 REG-S1 osafmsgd[11279]: ER saImmOiImplementerSet failed with 
return value=6
May 17 15:01:35 REG-S1 osafimmloadd: ER Too many TRY_AGAIN on saImmOmSearchNext 
- aborting
May 17 15:01:35 REG-S1 osafimmnd[11165]: ER SYNC APPARENTLY FAILED status:1
May 17 15:01:35 REG-S1 osafimmnd[11165]: NO -SERVER STATE: 
IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
May 17 15:01:35 REG-S1 osafimmnd[11165]: NO NODE STATE-> 
IMM_NODE_FULLY_AVAILABLE (2761)
May 17 15:01:35 REG-S1 osafimmnd[11165]: NO Epoch set to 8 in ImmModel
May 17 15:01:35 REG-S1 osafimmnd[11165]: NO Coord broadcasting ABORT_SYNC, 
epoch:8

May 17 15:05:13 REG-S1 osafamfnd[11227]: NO SU failover probation timer started 
(timeout: 12000 ns)
May 17 15:05:13 REG-S1 osafamfnd[11227]: NO Performing failover of 
'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)

**May 17 15:05:13 REG-S1 osafamfnd[11227]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated 
from 'componentFailover' to 'suFailover'
May 17 15:05:13 REG-S1 osafamfnd[11227]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 
'healthCheckcallbackTimeout' : Recovery is 'suFailover'
May 17 15:05:13 REG-S1 osafamfnd[11227]: ER 
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due 
to:healthCheckcallbackTimeout Recovery is:suFailover**

May 17 15:05:13 REG-S1 osafamfnd[11227]: Rebooting OpenSAF NodeId = 131343 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131343, SupervisionTime = 60
May 17 15:05:13 REG-S1 opensaf_reboot: Rebooting local node; timeout=60
May 17 15:05:17 REG-S1 kernel: [21682.049674] md: stopping all md devices.

Attaching the syslog of the active controller.
The immnd traces are too large to attach.


---



[tickets] [opensaf:tickets] #56 IMM: IMMsv cannot be gracefully stopped

2016-11-07 Thread Neelakanta Reddy
Defect #1747 was opened for the inconsistency of IMM in the stopping sequence.


---

** [tickets:#56] IMM: IMMsv cannot be gracefully stopped**

**Status:** unassigned
**Milestone:** future
**Created:** Wed May 08, 2013 08:27 AM UTC by Anders Bjornerstedt
**Last Updated:** Wed May 08, 2013 08:27 AM UTC
**Owner:** nobody


Migrated from:
http://devel.opensaf.org/ticket/2099

Sep 23 10:29:28 Vostro opensafd: Stopping OpenSAF Services
Sep 23 10:29:28 Vostro osafamfnd[15127]: Terminating all AMF components
Sep 23 10:29:28 Vostro osafimmd[15064]: IMMND coordinator at 2010f apparently 
crashed => electing new coord
Sep 23 10:29:28 Vostro osafimmd[15064]: Failed to find candidate for new IMMND 
coordinator
Sep 23 10:29:28 Vostro osafimmd[15064]: Active IMMD has to restart the IMMSv. 
All IMMNDs will restart
Sep 23 10:29:28 Vostro osafimmd[15064]: IMM RELOAD with NO persistent back end 
=> ensure cluster restart by IMMD exit at both SCs.
Sep 23 10:29:28 Vostro osafamfnd[15127]: All AMF components terminated 
successfully, exiting

---

Can I assume that the immd gets the quiescing callback from the AMF
before the AMF terminates the IMMNDs ?

 http://list.opensaf.org/pipermail/devel/2011-September/017881.html
-
This ticket needs a clearer definition of what needs to be done.
Or we should close it.
-
Reclassifying this one as enhancement before migrating to SourceForge



---



[tickets] [opensaf:tickets] #1747 IMMND trying to start PBE process while stopping OpenSAF services

2016-11-07 Thread Neelakanta Reddy
- **status**: unassigned --> duplicate
- **Comment**:

The shutdown sequence of IMM has to be graceful. 
The problem reported in this ticket will be fixed as part of enhancement #56.




---

** [tickets:#1747] IMMND trying to start PBE process while stopping OpenSAF 
services**

**Status:** duplicate
**Milestone:** 5.0.2
**Created:** Mon Apr 11, 2016 10:30 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 20, 2016 05:46 PM UTC
**Owner:** nobody


Setup:
Changeset- 7436
Version - opensaf 5.0
1-PBE enabled
The issue is not always observed.

Apr 11 13:32:52 OSAF-SC1 opensafd: Stopping OpenSAF Services
Apr 11 13:32:52 OSAF-SC1 osafamfnd[29960]: NO Shutdown initiated
Apr 11 13:32:52 OSAF-SC1 osafamfnd[29960]: NO Terminating all AMF components
Apr 11 13:32:52 OSAF-SC1 osafimmpbed: NO IMM PBE received SIG_TERM, closing db 
handle
Apr 11 13:32:52 OSAF-SC1 osafimmpbed: IN IMM PBE process EXITING...
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. 
Marking it as doomed 18 <545, 2010f> (OpenSafImmPBE)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer disconnected 18 <545, 
2010f> (OpenSafImmPBE)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: WA Persistent back-end process has 
apparently died.
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO STARTING PBE process.
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO 
pbe-db-file-path:/home/chani/immPBE/imm.db VETERAN:1 B:0
Apr 11 13:32:53 OSAF-SC1 osafckptnd[30049]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafsmfd[29976]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osaflckd[30057]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. 
Marking it as doomed 2412 <321, 2010f> (safLckService)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer disconnected 2412 
<321, 2010f> (safLckService)
Apr 11 13:32:53 OSAF-SC1 osaflcknd[30032]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafclmna[29860]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmd[29888]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osaffmd[29878]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafrded[29869]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafevtd[30088]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. 
Marking it as doomed 2413 <315, 2010f> (safEvtService)
Apr 11 13:32:53 OSAF-SC1 osafckptd[30097]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. 
Marking it as doomed 2411 <330, 2010f> (safCheckPointService)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafmsgd[30011]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafmsgnd[29995]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafsmfnd[29978]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osaflogd[29914]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafntfimcnd[5780]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Apr 11 13:32:53 OSAF-SC1 osafclmd[29940]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[0] == 
'/usr/lib64/opensaf/osafimmpbed'
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[1] == '--recover'
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[2] == '--pbe'
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[3] == '/home/chani/immPBE/imm.db'
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: ER osafimmpbe is not started by osafimmnd



---



[tickets] [opensaf:tickets] #2173 base: Add support classes for pthread mutexes and condition variables

2016-11-07 Thread Anders Widell



---

** [tickets:#2173] base: Add support classes for pthread mutexes and condition 
variables**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Mon Nov 07, 2016 11:59 AM UTC by Anders Widell
**Last Updated:** Mon Nov 07, 2016 11:59 AM UTC
**Owner:** Anders Widell


The header files <mutex> and <condition_variable> are unapproved by the Google 
C++ style guide. We can implement our own versions of these. An added benefit 
is that we can select our own options when creating the mutexes, and we could 
use error checking mutexes in OpenSAF "debug builds" - i.e. when a special 
preprocessor macro is defined.
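
A minimal sketch of what such a support class could look like. The class name Mutex and the macro ENABLE_DEBUG_MUTEX are assumptions made for illustration, not names taken from OpenSAF; only the pthread calls are real. A condition variable wrapper would follow the same pattern and is omitted here.

```cpp
#include <pthread.h>

#include <cassert>

// Hypothetical macro name: stands in for whatever preprocessor macro marks
// an OpenSAF "debug build".
#ifdef ENABLE_DEBUG_MUTEX
constexpr bool kErrorCheckByDefault = true;
#else
constexpr bool kErrorCheckByDefault = false;
#endif

// Thin wrapper around pthread_mutex_t (class name is illustrative).
class Mutex {
 public:
  explicit Mutex(bool error_check = kErrorCheckByDefault) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    // Error-checking mutexes report relocking and foreign unlocking as
    // errors instead of deadlocking or invoking undefined behaviour.
    rc = pthread_mutexattr_settype(
        &attr, error_check ? PTHREAD_MUTEX_ERRORCHECK : PTHREAD_MUTEX_NORMAL);
    assert(rc == 0);
    rc = pthread_mutex_init(&mutex_, &attr);
    assert(rc == 0);
    rc = pthread_mutexattr_destroy(&attr);
    assert(rc == 0);
    (void)rc;  // silence "unused" warnings when NDEBUG disables assert
  }
  ~Mutex() { pthread_mutex_destroy(&mutex_); }
  Mutex(const Mutex&) = delete;
  Mutex& operator=(const Mutex&) = delete;

  // Both return 0 on success or an errno-style code on failure.
  int Lock() { return pthread_mutex_lock(&mutex_); }
  int Unlock() { return pthread_mutex_unlock(&mutex_); }

 private:
  pthread_mutex_t mutex_;
};
```

With error checking enabled, a second Lock() from the owning thread, or an Unlock() of a mutex the thread does not hold, returns a non-zero error code, which makes such bugs visible in debug builds.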


---



[tickets] [opensaf:tickets] #1526 imm: 1PBE can see db as locked

2016-11-07 Thread Neelakanta Reddy
- **status**: assigned --> not-reproducible
- **Comment**:

Since the problem is not reproducible, closing the defect.



---

** [tickets:#1526] imm: 1PBE can see db as locked**

**Status:** not-reproducible
**Milestone:** 5.0.2
**Created:** Wed Oct 07, 2015 09:43 AM UTC by Neelakanta Reddy
**Last Updated:** Tue Sep 20, 2016 06:04 PM UTC
**Owner:** Neelakanta Reddy


When the disk is full, sqlite returns an error.

Sep 18 13:42:02 SC-2 osafimmpbed: ER SQL statement ('COMMIT TRANSACTION') 
failed because:  disk I/O error
Sep 18 13:42:02 SC-2 osafimmnd[13067]: NO Invalid error reported implementer 
'OpenSafImmPBE', Ccb 321 will be aborted
Sep 18 13:42:02 SC-2 osafimmnd[13067]: NO Ccb 321 ABORTED (TraceC)
Sep 18 13:42:02 SC-2 osafimmpbed: WA Failed to find CCB object for 141/321


Due to continuous CCB operations (even though the disk is full), the 1PBE logs 
the following messages for more than 3 hours:

messages:Sep 18 17:58:46 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages:Sep 18 17:58:46 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages:Sep 18 17:58:47 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages:Sep 18 17:58:47 SC-2 osafimmpbed: WA Sqlite db locked by other thread.



messages.7:Sep 18 14:22:22 SC-2 osafimmpbed: WA Sqlite db locked by other 
thread.
messages.7:Sep 18 14:22:23 SC-2 osafimmpbed: WA Sqlite db locked by other 
thread.
messages.7:Sep 18 14:22:23 SC-2 osafimmpbed: WA Sqlite db locked by other 
thread.
messages.7:Sep 18 14:22:24 SC-2 osafimmpbed: WA Sqlite db locked by other thread

Even after space was freed, the PBE stayed stuck on "Sqlite db locked by 
other thread", which prevents any further operations. 
Once the PBE is killed, imm.db is regenerated and the CCB operations are 
applied.

Solution(1PBE):

For the 1PBE case, which is not multi-threaded, if the sqlite db locked case is 
reached, abort the PBE and let it be regenerated (instead of blocking the 
PBE process).
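
A minimal sketch of the proposed behaviour, under stated assumptions: DbResult and locked_after_retries() are hypothetical, the database step is passed in as a callable instead of calling sqlite directly, and the bounded retry count is an addition of this sketch (the ticket only says to abort once the locked case is reached, which corresponds to max_retries = 0).

```cpp
#include <cassert>
#include <functional>

// Hypothetical result codes; a real implementation would map the sqlite
// return codes onto something like this.
enum class DbResult { kOk, kLocked, kError };

// Run a database step up to (max_retries + 1) times while it reports
// "locked". Returns true if it was still locked on the last attempt, in
// which case the caller would abort the PBE so that it gets regenerated
// instead of blocking. Other results are left to the normal error handling.
bool locked_after_retries(const std::function<DbResult()>& step,
                          int max_retries) {
  assert(max_retries >= 0);
  for (int attempt = 0; attempt <= max_retries; ++attempt) {
    if (step() != DbResult::kLocked) return false;
  }
  return true;
}
```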

 



---



[tickets] [opensaf:tickets] #171 saAmfComponentUnregister should be unexposed to handle obtained from B 4 1 version

2016-11-07 Thread Nagendra Kumar
- **status**: unassigned --> accepted
- **assigned_to**: Nagendra Kumar
- **Part**: - --> lib
- **Milestone**: future --> 5.2.FC



---

** [tickets:#171] saAmfComponentUnregister should be unexposed to handle 
obtained from B 4 1 version**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Tue May 14, 2013 06:01 AM UTC by Nagendra Kumar
**Last Updated:** Thu Aug 06, 2015 11:22 AM UTC
**Owner:** Nagendra Kumar


Migrated from http://devel.opensaf.org/ticket/2019

saAmfComponentUnregister api should return SA_AIS_ERR_VERSION when called with 
handle obtained from B 4 1 version.





---



[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster

2016-11-07 Thread Srikanth R
There are two scenarios where "opensafd stop" is invoked on an OpenSAF 
controller.

SCENARIO-1) The /etc/init.d/opensafd script is invoked manually from the 
command prompt while the system is up and running.
SCENARIO-2) Software on a controller (other than opensafd) invokes "reboot", 
for which opensafd stop is invoked in run level 3 or higher.

With the patch submitted for #2160:

a) In scenario-1, the node will reboot if the administrator doesn't invoke the 
CLM admin operation. This is fine.

b) In scenario-2, the run level services will not be stopped gracefully, 
because the node will be rebooted abruptly after opensafd stop, as the admin 
did not invoke the CLM admin operation. So, as HA software, will opensafd not 
support graceful reboot of the standby controller with the #2160 fix?


---

** [tickets:#2094] Standby controller goes for reboot on stopping openSaf with 
STONITH enabled cluster**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Wed Oct 05, 2016 07:28 AM UTC by Chani Srivastava
**Last Updated:** Wed Nov 02, 2016 11:40 AM UTC
**Owner:** nobody


OS : Ubuntu 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 2-node cluster (both controllers) Remote fencing enabled

Steps:
1. Bring up OpenSaf on two nodes 
2. Enable STONITH
3. Stop opensaf on Standby

Active controller triggers reboot of standby

SC-1 Syslog

Oct  5 13:01:23 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:4, 
dest:565215202263055)
Oct  5 13:01:23 SC-1 osafimmnd[5545]: NO Global discard node received for 
nodeId:2020f pid:3579
Oct  5 13:01:23 SC-1 osafimmnd[5545]: NO Implementer disconnected 14 <0, 
2020f(down)> (@safAmfService2020f)
Oct  5 13:01:24 SC-1 osafamfd[5592]: **NO Node 'SC-2' left the cluster**
Oct  5 13:01:24 SC-1 osaffmd[5526]: NO Node Down event for node id 2020f:
Oct  5 13:01:24 SC-1 osaffmd[5526]: NO Current role: ACTIVE
Oct  5 13:01:24 SC-1 osaffmd[5526]: **Rebooting OpenSAF NodeId = 131599 EE Name 
= SC-2, Reason: Received Node Down for peer controller, OwnNodeId = 131343, 
SupervisionTime = 60
Oct  5 13:01:25 SC-1 external/libvirt[5893]: [5906]: notice: Domain SC-2 was 
stopped**
Oct  5 13:01:27 SC-1 kernel: [ 5355.132093] tipc: Resetting link 
<1.1.1:eth0-1.1.2:eth0>, peer not responding
Oct  5 13:01:27 SC-1 kernel: [ 5355.132123] tipc: Lost link 
<1.1.1:eth0-1.1.2:eth0> on network plane A
Oct  5 13:01:27 SC-1 kernel: [ 5355.132126] tipc: Lost contact with <1.1.2>
Oct  5 13:01:27 SC-1 external/libvirt[5893]: [5915]: notice: Domain SC-2 was 
started
Oct  5 13:01:42 SC-1 kernel: [ 5370.557180] tipc: Established link 
<1.1.1:eth0-1.1.2:eth0> on network plane A
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:3, 
dest:565217457979407)
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO New IMMND process is on STANDBY 
Controller at 2020f
Oct  5 13:01:42 SC-1 osafimmd[5535]: WA IMMND on controller (not currently 
coord) requests sync
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO Node 2020f request sync sync-pid:1176 
epoch:0
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Announce sync, epoch:4
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO Successfully announced sync. New ruling 
epoch:4
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_READY --> 
IMM_SERVER_SYNC_SERVER
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Oct  5 13:01:43 SC-1 osafimmloadd: NO Sync starting
Oct  5 13:01:43 SC-1 osafimmloadd: IN Synced 346 objects in total
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 
18430
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Epoch set to 4 in ImmModel
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at 
node 2010f old epoch: 3  new epoch:4
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at 
node 2020f old epoch: 0  new epoch:4
Oct  5 13:01:43 SC-1 osafimmloadd: NO Sync ending normally
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER 
--> IMM_SERVER_READY
Oct  5 13:01:43 SC-1 osafamfd[5592]: NO Received node_up from 2020f: msg_id 1
Oct  5 13:01:43 SC-1 osafamfd[5592]: NO Node 'SC-2' joined the cluster
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer connected: 16 
(MsgQueueService131599) <467, 2010f>
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer locally disconnected. 
Marking it as doomed 16 <467, 2010f> (MsgQueueService131599)
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer disconnected 16 <467, 
2010f> (MsgQueueService131599)
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Peer up on node 0x2020f
Oct  5 13:01:44 SC-1 osaffmd[5526]: NO clm init OK
Oct  5 13:01:44 SC-1 osafimmd[5535]: NO MDS event from svc_id 24 (change:5, 
dest:13)
Oct  5 13:01:44 SC-1 osaffmd[5526]: NO Peer clm node name: SC-2
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Got peer info request from node 0x2020f 
with role STANDBY
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Got peer info response from node 
0x2020f with role STANDBY




---

Sent from sourceforge.net because