date:20170315

[tickets] [opensaf:tickets] #2377 AMF: SG in unstable state after couple of admin operations during headless scenario

2017-03-15 Thread Nagendra Kumar

As per safLog, the issue occurred on Mar 14:
 11139 18:08:34 03/14/2017 NO safApp=safAmfService "Admin op invocation: 5471788335253, err: 'SG not in STABLE state (safSg=TestApp_SG1,safApp=TestApp_TwoN)'"

The Amfd trace is not available for this time. It starts from Mar 15:
Mar 15  7:03:12.095487 osafamfd [3250:src/amf/amfd/main.cc:0502] >> initialize 

Please upload Amfd traces from on or before Mar 14 18:08.


---

** [tickets:#2377] AMF: SG in unstable state after couple of admin operations 
during headless scenario**

**Status:** assigned
**Milestone:** 5.2.RC2
**Created:** Wed Mar 15, 2017 04:54 AM UTC by Srikanth R
**Last Updated:** Thu Mar 16, 2017 05:28 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- 
[logs.tgz](https://sourceforge.net/p/opensaf/tickets/2377/attachment/logs.tgz) 
(7.6 MB; application/x-compressed)


Changeset : 8634 5.2.FC
Setup : 2 controllers with 3 payloads (headless feature enabled)
AMF application : 2N application, 2 SUs, 4 SIs (si-si deps disabled)

Steps performed :

-> Initially brought up 5 nodes.

-> Deployed the attached configuration.

-> Performed admin operations on the SG coupled with 2 headless operations.

-> Later performed a shutdown operation on the SG, which resulted in the unstable state.

Attached logs :

-> syslog, amfd and amfnd traces of both controllers and PL-3.

-> AMF application


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #2377 AMF: SG in unstable state after couple of admin operations during headless scenario

2017-03-15 Thread Nagendra Kumar

- **status**: unassigned --> assigned
- **assigned_to**: Nagendra Kumar




[tickets] [opensaf:tickets] #2278 mds: Blocking send causes AMF health check time-out

2017-03-15 Thread A V Mahesh (AVM)

- **Comment**:

Follow up by Hans Nordebäck :
/=/=/=

On 3/15/2017 1:55 PM, A V Mahesh wrote:
> Hi Hans N,
>
> Try adding some debugging logs in LEAP as well and try to reproduce.
>
> My guess is that the specific system may be running out of FDs,
> or the system libraries are not compatible with our LEAP system API calls.
>
> -AVM
>
> On 3/15/2017 1:48 PM, Hans Nordebäck wrote:
>> Hi Mahesh,
>>
>> ok, I understand the logic now, I'll check your patch but it only checks 
>> NCS_TMR_START? Perhaps we should trace
>> the complete flow to conclude why the descriptor is not signaled, e.g. 
>> mds_tmr_mailbox_processing and so on?
>> /Thanks HansN
>>
>> -----Original Message-----
>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>> Sent: den 15 mars 2017 08:04
>> To: Hans Nordebäck 
>> Cc: Anders Widell ; Nagendra Kumar 
>> 
>> Subject: Re: MDS question #2278
>>
>> Hi Hans N
>>
>>   >> mbcsv_mds_send_msg is 0 and therefore set to -1 when calling poll, 
>> normally it should be 10 seconds or 1 second.
>>
>> No.
>>
>> It is right: mbcsv_mds_send_msg() --> mds_mcm_time_wait() --> 
>> osaf_poll_one_fd() --> osaf_poll --> osaf_poll_no_timeout(-1). If MDS 
>> doesn't find the subscription to that peer, MDS manually adds a 
>> subscription entry for it, starts a discovery_tmr (with a timeout of 5 sec) 
>> and creates a sel_obj. That sel_obj is added to the await_disc_queue list 
>> of the subscription, and the same sel_obj is used for mds_mcm_time_wait() 
>> --> osaf_poll_one_fd() --> osaf_poll --> osaf_poll_no_timeout(). If the 
>> subscription doesn't arrive within 500 * 10 ms, the 
>> subscription_tmr_expiry() function does SEL_OBJ_IND on disc_queue->sel_obj, 
>> so that osaf_poll_no_timeout() comes out of poll.
>> So the only theoretically possible code bug in MDS is in
>> mds_mcm_subtn_add()-->mds_subtn_tbl_add()-->m_NCS_TMR_START(subtn_info->discovery_tmr).
>>
>> Apply my debug patch and check for m_NCS_TMR_START errors.
>>
>> -AVM
>>
>>
>>
>> On 3/15/2017 12:04 PM, Hans Nordebäck wrote:
>>> Hi Mahesh,
>>>
>>> Thanks, I'll read the private mail, but with the debug output below 
>>> it seems quite clear why the 'amfd heart beat timeout' happens: the timeout 
>>> value to mbcsv_mds_send_msg is 0 and therefore set to -1 when calling poll. 
>>> Normally it should be 10 seconds or 1 second.
>>>
>>> We managed to get a call chain at the time of the fault:
>>>
>>> vas-1:/root # /gstack 2578
>>> Thread 4 (Thread 0x7f5173128b00 (LWP 2584)):
>>> 0  0x7f5170edbbfd in poll () from /lib64/libc.so.6
>>> 1  0x7f5172d45261 in osaf_ppoll () from /usr/lib64/libopensaf_core.so.0
>>> 2  0x7f5172d453fb in osaf_poll () from /usr/lib64/libopensaf_core.so.0
>>> 3  0x7f5172d45445 in osaf_poll_one_fd () from /usr/lib64/libopensaf_core.so.0
>>> 4  0x7f5172d760d7 in rda_read_msg(int, char*, int) [clone .constprop.2] () from /usr/lib64/libopensaf_core.so.0
>>> 5  0x7f5172d76364 in rda_callback_task(RDA_CALLBACK_CB*) () from /usr/lib64/libopensaf_core.so.0
>>> 6  0x7f51720600a4 in start_thread () from /lib64/libpthread.so.0
>>> 7  0x7f5170ee402d in clone () from /lib64/libc.so.6
>>> Thread 3 (Thread 0x7f5173148b00 (LWP 2583)):
>>> 0  0x7f5170edbbfd in poll () from /lib64/libc.so.6
>>> 1  0x7f5172d6d920 in mdtm_process_recv_events_tcp () from /usr/lib64/libopensaf_core.so.0
>>> 2  0x7f51720600a4 in start_thread () from /lib64/libpthread.so.0
>>> 3  0x7f5170ee402d in clone () from /lib64/libc.so.6
>>> Thread 2 (Thread 0x7f5173188b00 (LWP 2581)):
>>> 0  0x7f5170edbbfd in poll () from /lib64/libc.so.6
>>> 1  0x7f5172d45261 in osaf_ppoll () from /usr/lib64/libopensaf_core.so.0
>>> 2  0x7f5172d4c3df in ncs_tmr_wait () from /usr/lib64/libopensaf_core.so.0
>>> 3  0x7f51720600a4 in start_thread () from /lib64/libpthread.so.0
>>> 4  0x7f5170ee402d in clone () from /lib64/libc.so.6
>>> Thread 1 (Thread 0x7f51731ab740 (LWP 2578)):
>>> 0  0x7f5170edbbfd in poll () from /lib64/libc.so.6
>>> 1  0x7f5172d45261 in osaf_ppoll () from /usr/lib64/libopensaf_core.so.0
>>> 2  0x7f5172d453fb in osaf_poll () from /usr/lib64/libopensaf_core.so.0
>>> 3  0x7f5172d45445 in osaf_poll_one_fd () from /usr/lib64/libopensaf_core.so.0
>>> 4  0x7f5172d62457 in mds_mcm_time_wait () from /usr/lib64/libopensaf_core.so.0
>>> 5  0x7f5172d626bc in mds_subtn_tbl_add_disc_queue.isra () from /usr/lib64/libopensaf_core.so.0
>>> 6  0x7f5172d628c3 in mds_mcm_process_disc_queue_checks_redundant () from /usr/lib64/libopensaf_core.so.0
>>> 7  0x7f5172d632e5 in mcm_pvt_red_snd_process_common () from /usr/lib64/libopensaf_core.so.0
>>> 8  0x7f5172d663ad in mds_send () from /usr/lib64/libopensaf_core.so.0
>>> 9  0x7f5172d708b0 in ncsmds_api () from /usr/lib64/libopensaf_core.so.0
>>> 10
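
The wait pattern discussed in the quoted exchange above (a poll with no timeout that relies on a timer to signal the descriptor) can be sketched roughly as follows. This is a minimal illustration in Python, not MDS code: the pipe stands in for the sel_obj, `threading.Timer` stands in for discovery_tmr, and the function name `wait_for_discovery` is hypothetical.

```python
import os
import select
import threading
import time


def wait_for_discovery(timer_s):
    """Block in poll() without a timeout until a timer signals the descriptor."""
    read_fd, write_fd = os.pipe()  # hypothetical stand-in for the sel_obj
    # Stand-in for discovery_tmr: on expiry it writes one byte to the pipe,
    # which plays the role of SEL_OBJ_IND on the queued sel_obj.
    timer = threading.Timer(timer_s, os.write, args=(write_fd, b"x"))
    started = time.monotonic()
    timer.start()

    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    events = poller.poll(-1)  # negative timeout: wait until an event arrives
    waited = time.monotonic() - started

    timer.join()
    os.close(read_fd)
    os.close(write_fd)
    signalled = any(fd == read_fd and (mask & select.POLLIN) for fd, mask in events)
    return signalled, waited
```

In this sketch the poll returns only because the timer fires. If the timer were never started, the call would block indefinitely, which is the scenario the quoted message singles out as the only theoretical failure path.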

[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

2017-03-15 Thread A V Mahesh (AVM)

In normal conditions we are not able to reproduce the problem by doing 
`/etc/init.d/opensafd restart`,
so can you please provide the following information to reproduce the problem:

1) Can you please share or elaborate on what the "./opensaf nodestop" and 
"./opensaf nodestart" scripts do apart from `/etc/init.d/opensafd stop` and 
`/etc/init.d/opensafd restart`?

2) Is there any other non-OpenSAF application using the MDS/TCP library?
   If so, is it stopped cleanly before `/etc/init.d/opensafd stop`?





---

** [tickets:#2030] dtm: "Node already exit in the cluster with smiler 
configuration"**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Mon Sep 26, 2016 02:26 PM UTC
**Owner:** A V Mahesh (AVM)


osafdtm does not handle rapid consecutive node reboots properly. I got the 
following errors in syslog:

~~~
Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in 
the cluster with smiler configuration , correct the other joining Node 
configuration 
Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed 
.node_ip: 192.168.0.1, node_id: 0
Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed 
.node_ip: 192.168.0.1, node_id: 0
~~~

Here are the steps to reproduce this problem in UML:

./opensaf start
(wait until the cluster comes up)
./opensaf nodestop 2
(wait a few seconds)
./opensaf nodestart 2
./opensaf nodestart 2

The last two commands should be executed quickly after each other, maybe with 
a one second delay between them.

It seems that osafdtmd asserts and dies when this happens. Here is the result 
from a second run of the above test:

~~~
Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in 
the cluster with smiler configuration , correct the other joining Node 
configuration 
Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: dtm_node.c:109: 
dtm_process_node_info: Assertion '0' failed.
Sep 13 14:25:58 SC-2 local0.err osafamfd[478]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafclmna[468]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafclmd[458]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafntfd[448]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osaflogd[437]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafimmnd[426]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafimmd[415]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osaffmd[405]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafrded[392]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'SC-1'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'PL-4'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'PL-5'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'PL-3'
Sep 13 14:25:59 SC-2 user.notice osafdtmd: osafdtmd Process down, Rebooting the 
node
Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; 
timeout=60

~~~

Update: it seems I forgot to do "./opensaf nodestop" between the two "./opensaf 
nodestart" above. Thus, there are probably two SC-2 nodes at the same time, and 
the error message "Node already exit in the cluster with smiler configuration" 
should be interpreted as "duplicate node detected in the network". Reducing the 
priority of this defect to "minor". Still, two problems ought to be fixed: the 
error message should be changed so that it is clear what it means, and osafdtmd 
should not assert (it could call opensaf_reboot() if there is a configuration 
problem, but asserting indicates a software problem).
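
The two requested changes (a clearer message, and an error path instead of an assert) might look roughly like the sketch below. It is written in Python for brevity and is not the osafdtmd code: `process_node_info`, the `Action` enum and the rule that a duplicate is a joining node whose node_id is already known are all assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    REJECT_DUPLICATE = "reject_duplicate"


def process_node_info(known_nodes, node_id, node_ip):
    """Return (action, message) for a joining node instead of asserting."""
    if node_id in known_nodes:
        # Configuration problem, not a software bug: report it and let the
        # caller decide what to do (for example, trigger a reboot).
        message = (
            "duplicate node detected in the network: node_id %#x is already "
            "in the cluster (existing ip %s, joining ip %s)"
            % (node_id, known_nodes[node_id], node_ip)
        )
        return Action.REJECT_DUPLICATE, message
    known_nodes[node_id] = node_ip
    return Action.ACCEPT, "node %#x joined" % node_id
```

Returning an action rather than aborting keeps the decision (log and ignore, or reboot) with the caller, which is the distinction the update draws between a configuration problem and a software problem.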





[tickets] [opensaf:tickets] #2376 AMFD: IMM Jobs are not executed in ordered cluster shutdown

2017-03-15 Thread Minh Hon Chau

- **status**: unassigned --> accepted
- **assigned_to**: Minh Hon Chau



---

** [tickets:#2376] AMFD: IMM Jobs are not executed in ordered cluster shutdown 
**

**Status:** accepted
**Milestone:** 5.2.RC2
**Created:** Wed Mar 15, 2017 03:36 AM UTC by Minh Hon Chau
**Last Updated:** Wed Mar 15, 2017 03:52 AM UTC
**Owner:** Minh Hon Chau


In the ordered cluster shutdown scenario ("/opensafd stop"), opensafd on the 
active controller currently orders AMFND to terminate AMF components, which 
include IMMND, before opensafd stops AMFD. If AMFD still has jobs to update IMM 
at this time, IMMND termination will cause AMFD's jobs not to be executed. 
Sometimes AMFD also tries to reinitialize with IMMND during shutdown, which is 
unnecessary because the IMM service is not available.
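
The ordering problem described above can be illustrated with a small simulation. This is a hedged sketch in Python, not AMFD code: `ImmStub`, `shutdown` and the `drain_first` flag are hypothetical names, and draining the job queue before IMMND goes away is only one possible ordering, not necessarily the fix chosen for this ticket.

```python
from collections import deque


class ImmStub:
    """Hypothetical stand-in for the IMM service seen by AMFD."""

    def __init__(self):
        self.up = True
        self.applied = []

    def update(self, job):
        if not self.up:
            raise RuntimeError("IMM service not available")
        self.applied.append(job)


def shutdown(jobs, imm, drain_first):
    """Run an ordered shutdown and return the jobs that could not be executed."""
    lost = []
    if not drain_first:
        imm.up = False  # IMMND is terminated while AMFD still has queued jobs
    while jobs:
        job = jobs.popleft()
        try:
            imm.update(job)
        except RuntimeError:
            # During shutdown there is no point in reinitializing with IMM.
            lost.append(job)
    imm.up = False  # IMMND terminated (at the latest) once the queue is empty
    return lost
```

With `drain_first=True` every queued job reaches the stub before it is taken down. With `drain_first=False` all of them are reported as lost, which mirrors the behaviour in the ticket.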




[tickets] [opensaf:tickets] #2375 build: Development libraries contain dependencies to internal libraries

2017-03-15 Thread Anders Widell

- **status**: accepted --> review



---

** [tickets:#2375] build: Development libraries contain dependencies to 
internal libraries**

**Status:** review
**Milestone:** 5.2.RC2
**Created:** Tue Mar 14, 2017 03:25 PM UTC by Anders Widell
**Last Updated:** Tue Mar 14, 2017 03:25 PM UTC
**Owner:** Anders Widell


The AIS development libraries (libSaXXX.so) contain dependencies to internal 
OpenSAF libraries. If an application links with the AIS libraries, the 
application may also get a dependency towards these internal OpenSAF libraries 
(see ticket [#2298]). This can have the effect that the application will not 
run with other (older and/or newer) versions of OpenSAF.

The solution is to create special development libraries without these 
dependencies.



[tickets] [opensaf:tickets] #2149 log: Create a C++ wrapper for handling IMM api

2017-03-15 Thread Vu Minh Nguyen

- **status**: unassigned --> assigned
- **assigned_to**: Vu Minh Nguyen



---

** [tickets:#2149] log: Create a C++ wrapper for handling IMM api**

**Status:** assigned
**Milestone:** future
**Created:** Fri Oct 28, 2016 12:25 PM UTC by elunlen
**Last Updated:** Fri Oct 28, 2016 12:25 PM UTC
**Owner:** Vu Minh Nguyen


Create a wrapper for IMM API handling and replace all usage of the current immutil.c.
This wrapper must not contain anything that is directly related to the log 
service. The purpose is to simplify handling of IMM APIs, e.g. simplify the 
complicated void pointer handling used with the IMM C APIs.
This wrapper can be limited to implementing only what’s needed for the log 
service, but the design shall make it possible to add more functionality. The 
goal is to create a generic C++ immutil to be used with all services, and this 
log service immutil could be the start of that.




[tickets] [opensaf:tickets] #2380 log: change LOG_ER incase rename file failed

2017-03-15 Thread Canh Truong

- Description has changed:

Diff:



--- old
+++ new
@@ -1,5 +1,5 @@
 LOG_ER("Could not rename log file: %s", strerror(errno));
 
-After main thread send request to file handle thread to rename log file, the 
main thread may get some failed result. Maybe it's not error case. e.g. file is 
not exist incase deleting manually, or take long time to rename, 
+After main thread send request to file handle thread to rename log file, the 
main thread may get some failed result. Maybe it's not error case. e.g. file is 
not exist incase deleting manually, 
 
 Convert ER to WA.



- **status**: assigned --> accepted



---

** [tickets:#2380] log: change LOG_ER incase rename file failed**

**Status:** accepted
**Milestone:** 5.2.RC2
**Created:** Wed Mar 15, 2017 01:44 PM UTC by Canh Truong
**Last Updated:** Wed Mar 15, 2017 01:44 PM UTC
**Owner:** Canh Truong


LOG_ER("Could not rename log file: %s", strerror(errno));

After the main thread sends a request to the file handling thread to rename the 
log file, the main thread may get a failed result. This is not necessarily an 
error case, e.g. the file does not exist because it was deleted manually.

Convert ER to WA.
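
As a rough illustration of the requested change, the sketch below reports a failed rename at warning level instead of error level. It is written in Python rather than the C/C++ of the log service, and `rename_log_file` and the logger name are hypothetical. Only the message text is taken from the ticket.

```python
import logging
import os

log = logging.getLogger("log_rename_sketch")


def rename_log_file(old_path, new_path):
    """Rename a log file; report a failure as a warning (WA), not an error (ER)."""
    try:
        os.rename(old_path, new_path)
    except OSError as exc:
        # A missing file (for example, one deleted manually) is not necessarily a fault.
        log.warning("Could not rename log file: %s", os.strerror(exc.errno))
        return exc.errno
    return 0
```

The function returns the errno value so that a caller can still distinguish a missing file from other failures if it needs to.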



[tickets] [opensaf:tickets] #2380 log: change LOG_ER incase rename file failed

2017-03-15 Thread Canh Truong




---

** [tickets:#2380] log: change LOG_ER incase rename file failed**

**Status:** assigned
**Milestone:** 5.2.RC2
**Created:** Wed Mar 15, 2017 01:44 PM UTC by Canh Truong
**Last Updated:** Wed Mar 15, 2017 01:44 PM UTC
**Owner:** Canh Truong


LOG_ER("Could not rename log file: %s", strerror(errno));

After the main thread sends a request to the file handling thread to rename the 
log file, the main thread may get a failed result. This is not necessarily an 
error case, e.g. the file does not exist because it was deleted manually, or 
the rename takes a long time.

Convert ER to WA.



[tickets] [opensaf:tickets] #2379 smf:PR documentation for 5.2 release

2017-03-15 Thread Neelakanta Reddy




---

** [tickets:#2379] smf:PR documentation for 5.2 release**

**Status:** accepted
**Milestone:** 5.2.RC2
**Created:** Wed Mar 15, 2017 10:03 AM UTC by Neelakanta Reddy
**Last Updated:** Wed Mar 15, 2017 10:03 AM UTC
**Owner:** Neelakanta Reddy


Update the PR document with the 5.2 enhancements:

smf: add support for asynchronous detection of failed AMF entities [#2145]



[tickets] [opensaf:tickets] #2340 immnd : restarts if larg db

2017-03-15 Thread Neelakanta Reddy

- **status**: assigned --> wontfix
- **Comment**:

The problem reported in the ticket may be observed when the disk is full. IMM 
is tested for 300k objects.
When the test is performed in my setup, IMMND is not restarted for 500k objects 
(when immomtest 3 10 is performed). But IMMND is restarted for the same 
test when traces are enabled and the disk is full.

Please share the backtrace (bt).



---

** [tickets:#2340] immnd : restarts if larg db**

**Status:** wontfix
**Milestone:** 5.2.RC2
**Created:** Fri Mar 03, 2017 06:18 AM UTC by A V Mahesh (AVM)
**Last Updated:** Tue Mar 14, 2017 06:36 AM UTC
**Owner:** Neelakanta Reddy


1) Configure opensaf with  --enable-ntf-imcn

/# ./bootstrap.sh ;./configure --enable-imm-pbe --enable-tests --enable-tipc 
--enable-ntf-imcn; make rpm 

2) Have a large XML database of objects (70k objects)

3) Run 

/# /usr/bin/immomtest 3 10 

error: in src/imm/apitest/management/test_saImmOmSearchInitialize_2.c at 171: 
SA_AIS_ERR_TIMEOUT (5), expected SA_AIS_OK (1) - exiting


4) Immnd restarts 

=

Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO ERR_INVALID_PARAM: Problem with new 
class 'saImmOmClassCreate_2_10'
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO ERR_INVALID_PARAM: Attribute 
'SaImmAttrImplementerName' is neither SA_IMM_ATTR_CONFIG nor SA_IMM_ATTR_RUNTIME
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO ERR_INVALID_PARAM: Attribute 
'SaImmAttrAdminOwnerName' is neither SA_IMM_ATTR_CONFIG nor SA_IMM_ATTR_RUNTIME
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO ERR_INVALID_PARAM: Attribute 
'SaImmAttrClassName' is neither SA_IMM_ATTR_CONFIG nor SA_IMM_ATTR_RUNTIME
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO ERR_INVALID_PARAM: Problem with new 
class 'saImmOmClassCreate_2_11'
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO opensafImmNostdFlags changed to: 0x1f6
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO opensafImmNostdFlags changed to: 0x1f7
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO Class 
'saImmOmClassCreate_SchemaChange_2_17' exist - check implied schema upgrade
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO Allowed upgrade, attribute 
saImmOmClassCreate_SchemaChange_2_17:attr adds flag SA_IMM_ATTR_STRONG_DEFAULT
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO Schema change for class 
saImmOmClassCreate_SchemaChange_2_17 ACCEPTED. Adding 0 and changing 1 
attribute defs
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO No instances to migrate - schema 
change could have been avoided
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO Schema change completed for class 
saImmOmClassCreate_SchemaChange_2_17
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO opensafImmNostdFlags changed to: 0x1f6
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO opensafImmNostdFlags changed to: 0x1f7
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO Class 
'saImmOmClassCreate_SchemaChange_2_18' exist - check implied schema upgrade
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO Allowed upgrade, attribute 
saImmOmClassCreate_SchemaChange_2_18:attr removes flag 
SA_IMM_ATTR_STRONG_DEFAULT
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO Schema change for class 
saImmOmClassCreate_SchemaChange_2_18 ACCEPTED. Adding 0 and changing 1 
attribute defs
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO No instances to migrate - schema 
change could have been avoided
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO Schema change completed for class 
saImmOmClassCreate_SchemaChange_2_18
Mar  3 11:40:27 SC-1 osafimmnd[32348]: NO opensafImmNostdFlags changed to: 0x1f6
Mar  3 11:40:57 SC-1 osafimmd[32335]: NO MDS event from svc_id 25 (change:4, 
dest:564114323931152)
Mar  3 11:40:57 SC-1 osafsmfd[32430]: WA DispatchOiCallback: saImmOiDispatch() 
Fail 'SA_AIS_ERR_BAD_HANDLE (9)'
Mar  3 11:40:57 SC-1 osafntfimcnd[32381]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Mar  3 11:40:58 SC-1 osafamfnd[32412]: NO 
'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' component restart probation timer 
started (timeout: 600 ns)
Mar  3 11:40:58 SC-1 osafamfnd[32412]: NO Restarting a component of 
'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Mar  3 11:40:58 SC-1 osafamfnd[32412]: NO 
'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'
Mar  3 11:40:58 SC-1 osafimmd[32335]: WA IMMND coordinator at 2010f apparently 
crashed => electing new coord
Mar  3 11:40:58 SC-1 osafimmd[32335]: NO New coord elected, resides at 2020f





[tickets] [opensaf:tickets] #2372 amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.

2017-03-15 Thread Srikanth R

From the start of the CLM implementation, the service has not supported admin 
operations on more than one node simultaneously. There was a discussion (or 
ticket) in the earlier Trac ticket system stating that CLM doesn't support 
operations on two entities simultaneously.


Below is the simple scenario to reproduce.

-> Bring up CLM agent, and subscribe to the track callback. Do not respond to 
the START callback.

-> Now perform CLM lock operation on the two payloads in two different 
terminals.

-> In the CLM application, respond to the callbacks only after invoking both 
admin operations.

-> Both admin operations result in the SA_AIS_ERR_REPAIR_PENDING return code. 
Judging from the syslog output below, it seems that CLM doesn't store the 
invocation id for the initial admin op.

Mar 15 11:54:20 SLES-1 osafamfd[3276]: NO Pending Response sent for CLM track 
callback::OK '7'


---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING 
for first node.**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Tue Mar 14, 2017 09:29 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) 
(3.4 MB; application/octet-stream)
- 
[osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) 
(860.9 kB; application/octet-stream)


Steps to reproduce:
1) Bring up a 4-node cluster.
2) Deploy the AMF demo on PL-3 and PL-4.
3) LOCK amfd nodes PL-3 and PL-4.
4) Make arrangements so that termination of amf_demo on PL-3 takes more time 
than on PL-4.
5) From one terminal issue a CLM lock of PL-3 first and, in no time, issue a 
CLM lock of PL-4.

CLM and AMF traces are attached.  
Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo 
on PL-3. While termination of amf_demo is still going on, AMF gets another 
track callback with rootCauseEntity as PL-4. However, this callback also 
contains information about PL-3. AMFD starts terminating amf_demo on PL-4, but 
at the same time it responds for PL-3 with the invocation id of the PL-4 
callback. CLM assumes that the PL-4 change_started step has completed and sends 
the completion callback for PL-4. In this callback, AMF clears the internal 
flags which monitor the graceful removal of nodes. Since AMF never responded to 
the PL-3 callback, the callback timer expires in CLMD and it sends the 
completed callback to AMF. AMF thinks this is a case of node failover and tries 
to fail over PL-3.

Note: In all these stages, CLM sends the track callback with information about 
all the nodes. The flags AMF registers with are:
 
SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.
I am still evaluating whether the issue is in CLM or AMF. Since AMF registers 
for **|SA_TRACK_CHANGES_ONLY|**, should CLM give information about all the 
nodes in all subsequent callbacks?
Also, AMF should respond to the callback when it has completed termination of 
the comps.
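
One way to picture the bookkeeping that the analysis above implies is a table of pending invocations keyed by root-cause node, so that each response carries the invocation id of its own callback. The sketch below is a hypothetical Python illustration, not AMFD code. The class and method names are invented, and the invocation values are arbitrary.

```python
class PendingTrackResponses:
    """Remember which invocation belongs to which node until termination completes."""

    def __init__(self):
        self._invocation_by_node = {}

    def on_start_callback(self, root_cause_node, invocation):
        # Record the invocation only for the node that caused this callback,
        # even if the callback also lists other nodes.
        self._invocation_by_node[root_cause_node] = invocation

    def on_termination_done(self, node):
        # Return the invocation to respond with once the node's components
        # have been terminated. Raises KeyError if nothing is pending.
        return self._invocation_by_node.pop(node)

    def pending_nodes(self):
        return sorted(self._invocation_by_node)
```

With this structure, finishing PL-4 first yields only the PL-4 invocation, and PL-3 stays pending until its own termination completes.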


