[tickets] [opensaf:tickets] #2377 AMF: SG in unstable state after couple of admin operations during headless scenario
As per safLog, the issue occurred on Mar 14:

11139 18:08:34 03/14/2017 NO safApp=safAmfService "Admin op invocation: 5471788335253, err: 'SG not in STABLE state (safSg=TestApp_SG1,safApp=TestApp_TwoN)'"

The amfd trace is not available for this time. The amfd trace starts on Mar 15:

Mar 15 7:03:12.095487 osafamfd [3250:src/amf/amfd/main.cc:0502] >> initialize

Please upload the amfd traces from on or before Mar 14 18:08.

---

** [tickets:#2377] AMF: SG in unstable state after couple of admin operations during headless scenario**

**Status:** assigned
**Milestone:** 5.2.RC2
**Created:** Wed Mar 15, 2017 04:54 AM UTC by Srikanth R
**Last Updated:** Thu Mar 16, 2017 05:28 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2377/attachment/logs.tgz) (7.6 MB; application/x-compressed)

Changeset : 8634 5.2.FC
Setup : 2 controllers with 3 payloads (Headless feature enabled)
AMF application : 2n application, 2 SUs, 4 SIs (si-si deps disabled)

Steps performed :
-> Initially brought up 5 nodes.
-> Deployed the attached configuration.
-> Performed admin operations on SG coupled with 2 headless operations.
-> Later performed shutdown operation of SG, which resulted in unstable state.

Attached logs :
-> syslog, amfd and amfnd traces of both controllers and PL-3.
-> AMF application

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #2377 AMF: SG in unstable state after couple of admin operations during headless scenario
- **status**: unassigned --> assigned
- **assigned_to**: Nagendra Kumar

---

** [tickets:#2377] AMF: SG in unstable state after couple of admin operations during headless scenario**

**Status:** assigned
**Milestone:** 5.2.RC2
**Created:** Wed Mar 15, 2017 04:54 AM UTC by Srikanth R
**Last Updated:** Wed Mar 15, 2017 04:54 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2377/attachment/logs.tgz) (7.6 MB; application/x-compressed)

Changeset : 8634 5.2.FC
Setup : 2 controllers with 3 payloads (Headless feature enabled)
AMF application : 2n application, 2 SUs, 4 SIs (si-si deps disabled)

Steps performed :
-> Initially brought up 5 nodes.
-> Deployed the attached configuration.
-> Performed admin operations on SG coupled with 2 headless operations.
-> Later performed shutdown operation of SG, which resulted in unstable state.

Attached logs :
-> syslog, amfd and amfnd traces of both controllers and PL-3.
-> AMF application
[tickets] [opensaf:tickets] #2278 mds: Blocking send causes AMF health check time-out
- **Comment**:

Follow up by Hans Nordebäck:

/=/=/=

On 3/15/2017 1:55 PM, A V Mahesh wrote:
> Hi Hans N,
>
> Try adding some debugging logs in LEAP as well and try to reproduce.
>
> My guess is that the specific system may be running out of FDs,
> or the system libraries are not compatible with our LEAP system API calls.
>
> -AVM
>
> On 3/15/2017 1:48 PM, Hans Nordebäck wrote:
>> Hi Mahesh,
>>
>> OK, I understand the logic now. I'll check your patch, but it only checks
>> NCS_TMR_START? Perhaps we should trace
>> the complete flow to conclude why the descriptor is not signaled, e.g.
>> mds_tmr_mailbox_processing and so on?
>> /Thanks HansN
>>
>> -Original Message-
>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>> Sent: 15 March 2017 08:04
>> To: Hans Nordebäck
>> Cc: Anders Widell ; Nagendra Kumar
>> Subject: Re: MDS question #2278
>>
>> Hi Hans N
>>
>> >> mbcsv_mds_send_msg is 0 and therefore set to -1 when calling poll,
>> normally it should be 10 seconds or 1 second.
>>
>> No.
>>
>> It is right: mbcsv_mds_send_msg() --> mds_mcm_time_wait() -->
>> osaf_poll_one_fd() --> osaf_poll --> osaf_poll_no_timeout(-1). If MDS
>> did not find the subscription to that peer, MDS manually adds a subscription
>> entry for it, starts a discovery_tmr (with a timeout of 5 sec) and
>> creates a sel_obj. The same sel_obj is added to the await_disc_queue list of that
>> subscription, and the same sel_obj is used for mds_mcm_time_wait() -->
>> osaf_poll_one_fd() --> osaf_poll --> osaf_poll_no_timeout(). If the
>> subscription doesn't arrive within 500 * 10 ms, then the
>> subscription_tmr_expiry() function does SEL_OBJ_IND on disc_queue->sel_obj, so
>> that osaf_poll_no_timeout() will come out of poll.
>> So the only possible theoretical code bug in MDS is
>> mds_mcm_subtn_add()-->mds_subtn_tbl_add()-->m_NCS_TMR_START(subtn_info->discovery_tmr).
>>
>> Apply my debug patch and see m_NCS_TMR_START errors.
>>
>> -AVM
>>
>> On 3/15/2017 12:04 PM, Hans Nordebäck wrote:
>>> Hi Mahesh,
>>>
>>> Thanks, I'll read the private mail, but with the debug output below,
>>> it seems quite clear why the 'amfd heart beat timeout' happens: the timeout
>>> value to mbcsv_mds_send_msg is 0 and therefore set to -1 when calling poll;
>>> normally it should be 10 seconds or 1 second.
>>>
>>> We managed to get a call chain at the time of the fault:
>>>
>>> vas-1:/root # /gstack 2578
>>> Thread 4 (Thread 0x7f5173128b00 (LWP 2584)):
>>> 0 0x7f5170edbbfd in poll () from /lib64/libc.so.6
>>> 1 0x7f5172d45261 in osaf_ppoll () from /usr/lib64/libopensaf_core.so.0
>>> 2 0x7f5172d453fb in osaf_poll () from /usr/lib64/libopensaf_core.so.0
>>> 3 0x7f5172d45445 in osaf_poll_one_fd () from /usr/lib64/libopensaf_core.so.0
>>> 4 0x7f5172d760d7 in rda_read_msg(int, char*, int) [clone .constprop.2] () from /usr/lib64/libopensaf_core.so.0
>>> 5 0x7f5172d76364 in rda_callback_task(RDA_CALLBACK_CB*) () from /usr/lib64/libopensaf_core.so.0
>>> 6 0x7f51720600a4 in start_thread () from /lib64/libpthread.so.0
>>> 7 0x7f5170ee402d in clone () from /lib64/libc.so.6
>>> Thread 3 (Thread 0x7f5173148b00 (LWP 2583)):
>>> 0 0x7f5170edbbfd in poll () from /lib64/libc.so.6
>>> 1 0x7f5172d6d920 in mdtm_process_recv_events_tcp () from /usr/lib64/libopensaf_core.so.0
>>> 2 0x7f51720600a4 in start_thread () from /lib64/libpthread.so.0
>>> 3 0x7f5170ee402d in clone () from /lib64/libc.so.6
>>> Thread 2 (Thread 0x7f5173188b00 (LWP 2581)):
>>> 0 0x7f5170edbbfd in poll () from /lib64/libc.so.6
>>> 1 0x7f5172d45261 in osaf_ppoll () from /usr/lib64/libopensaf_core.so.0
>>> 2 0x7f5172d4c3df in ncs_tmr_wait () from /usr/lib64/libopensaf_core.so.0
>>> 3 0x7f51720600a4 in start_thread () from /lib64/libpthread.so.0
>>> 4 0x7f5170ee402d in clone () from /lib64/libc.so.6
>>> Thread 1 (Thread 0x7f51731ab740 (LWP 2578)):
>>> 0 0x7f5170edbbfd in poll () from /lib64/libc.so.6
>>> 1 0x7f5172d45261 in osaf_ppoll () from /usr/lib64/libopensaf_core.so.0
>>> 2 0x7f5172d453fb in osaf_poll () from /usr/lib64/libopensaf_core.so.0
>>> 3 0x7f5172d45445 in osaf_poll_one_fd () from /usr/lib64/libopensaf_core.so.0
>>> 4 0x7f5172d62457 in mds_mcm_time_wait () from /usr/lib64/libopensaf_core.so.0
>>> 5 0x7f5172d626bc in mds_subtn_tbl_add_disc_queue.isra () from /usr/lib64/libopensaf_core.so.0
>>> 6 0x7f5172d628c3 in mds_mcm_process_disc_queue_checks_redundant () from /usr/lib64/libopensaf_core.so.0
>>> 7 0x7f5172d632e5 in mcm_pvt_red_snd_process_common () from /usr/lib64/libopensaf_core.so.0
>>> 8 0x7f5172d663ad in mds_send () from /usr/lib64/libopensaf_core.so.0
>>> 9 0x7f5172d708b0 in ncsmds_api () from /usr/lib64/libopensaf_core.so.0
>>> 10
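The dependency described in this thread (a wait with timeout -1 that only returns if a separately started timer signals the descriptor) can be illustrated in isolation. The following is a minimal sketch and not MDS code: it assumes Linux and uses a `timerfd` as a stand-in for the discovery timer plus sel_obj, and the function name `WaitOnTimerFd` is made up for the example.

```cpp
#include <poll.h>
#include <sys/timerfd.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <ctime>

// Hypothetical helper, Linux only. Creates a timer descriptor, optionally
// arms it to fire once after timer_ms (must be > 0), then waits on it with
// poll(). Returns poll()'s result: 1 if the descriptor was signaled, 0 if
// poll() itself timed out, -1 on error.
int WaitOnTimerFd(bool arm_timer, int timer_ms, int poll_timeout_ms) {
  int fd = timerfd_create(CLOCK_MONOTONIC, 0);
  if (fd == -1) return -1;

  if (arm_timer) {
    struct itimerspec spec = {};
    spec.it_value.tv_sec = timer_ms / 1000;
    spec.it_value.tv_nsec = (timer_ms % 1000) * 1000000L;
    if (timerfd_settime(fd, 0, &spec, nullptr) == -1) {
      close(fd);
      return -1;
    }
  }

  struct pollfd pfd = {};
  pfd.fd = fd;
  pfd.events = POLLIN;

  int rc;
  do {
    // A timeout of -1 means: block until the descriptor is signaled.
    rc = poll(&pfd, 1, poll_timeout_ms);
  } while (rc == -1 && errno == EINTR);

  close(fd);
  return rc;
}
```

With the timer armed, `WaitOnTimerFd(true, 50, -1)` returns once the timer fires. With the timer not armed, only a finite poll timeout lets the call return; passing -1 in that case would block indefinitely, which matches the symptom seen in Thread 1 of the call chain.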
[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"
Under normal conditions we are not able to reproduce the problem by doing `/etc/init.d/opensafd restart`, so can you please provide the following information to help reproduce it:

1) Can you please share or elaborate on what the "./opensaf nodestop" and "./opensaf nodestart" scripts do apart from `/etc/init.d/opensafd stop` and `/etc/init.d/opensafd restart`?

2) Is there any other non-OpenSAF application using the MDS/TCP library? If so, is it stopped cleanly before `/etc/init.d/opensafd stop`?

---

** [tickets:#2030] dtm: "Node already exit in the cluster with smiler configuration"**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Mon Sep 26, 2016 02:26 PM UTC
**Owner:** A V Mahesh (AVM)

osafdtm does not handle rapid consecutive node reboots properly. I got the following errors in syslog:

~~~
Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration
Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
~~~

Here are the steps to reproduce this problem in UML:

./opensaf start
(wait until the cluster comes up)
./opensaf nodestop 2
(wait a few seconds)
./opensaf nodestart 2
./opensaf nodestart 2

The last two commands should be executed quickly after each other, maybe with a one-second delay between them.

It seems that osafdtmd asserts and dies when this happens. Here is the result from a second run of the above test:

~~~
Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration
Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: dtm_node.c:109: dtm_process_node_info: Assertion '0' failed.
Sep 13 14:25:58 SC-2 local0.err osafamfd[478]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafclmna[468]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafclmd[458]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafntfd[448]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osaflogd[437]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafimmnd[426]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafimmd[415]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osaffmd[405]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafrded[392]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'SC-1'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-4'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-5'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-3'
Sep 13 14:25:59 SC-2 user.notice osafdtmd: osafdtmd Process down, Rebooting the node
Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; timeout=60
~~~

Update: it seems I forgot to do "./opensaf nodestop" between the two "./opensaf nodestart" commands above. Thus, there are probably two SC-2 nodes at the same time, and the error message "Node already exit in the cluster with smiler configuration" should be interpreted as "duplicate node detected in the network". Reducing the priority of this defect to "minor". Two problems still ought to be fixed: the error message should be changed so that it is clear what it means, and osafdtmd should not assert (it could call opensaf_reboot() if there is a configuration problem, but asserting indicates a software problem).
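The two requested fixes (a clear message, and an error return instead of an assertion) could look roughly like the sketch below. This is not osafdtmd code: `NodeRegistry`, `AddNodeResult` and `DescribeDuplicate` are hypothetical names, and what the caller does on a duplicate (log, reject the connection, or call opensaf_reboot()) is left open.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical result codes; not taken from the osafdtmd sources.
enum class AddNodeResult { kAdded, kDuplicateNode };

// Hypothetical registry of the nodes currently known to the cluster.
class NodeRegistry {
 public:
  // Registers a node. A second node with the same id is reported to the
  // caller as a configuration problem instead of triggering an assertion.
  AddNodeResult Add(uint32_t node_id, const std::string& node_ip) {
    auto result = nodes_.emplace(node_id, node_ip);
    return result.second ? AddNodeResult::kAdded
                         : AddNodeResult::kDuplicateNode;
  }

  void Remove(uint32_t node_id) { nodes_.erase(node_id); }

 private:
  std::map<uint32_t, std::string> nodes_;
};

// Builds a message that states what was actually detected.
std::string DescribeDuplicate(uint32_t node_id, const std::string& node_ip) {
  return "Duplicate node detected in the network: node_id " +
         std::to_string(node_id) + ", node_ip " + node_ip;
}
```

Returning a result code keeps the decision with the caller, so a duplicate joining node is handled as a configuration problem rather than being treated as a software fault.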
[tickets] [opensaf:tickets] #2376 AMFD: IMM Jobs are not executed in ordered cluster shutdown
- **status**: unassigned --> accepted
- **assigned_to**: Minh Hon Chau

---

** [tickets:#2376] AMFD: IMM Jobs are not executed in ordered cluster shutdown **

**Status:** accepted
**Milestone:** 5.2.RC2
**Created:** Wed Mar 15, 2017 03:36 AM UTC by Minh Hon Chau
**Last Updated:** Wed Mar 15, 2017 03:52 AM UTC
**Owner:** Minh Hon Chau

In the scenario of an ordered cluster shutdown ("/opensafd stop"), opensafd on the active controller currently orders AMFND to terminate the AMF components, which include IMMND, before opensafd stops AMFD. If AMFD still has jobs to update IMM at this time, the IMMND termination will cause AMFD's jobs not to be executed. Sometimes AMFD also tries to reinitialize with IMMND during shutdown, which is not necessary because the IMM service is not available.
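One way to picture the ordering problem is a queue of pending writes that must be drained before the service they write to is stopped. The sketch below only illustrates that ordering. `PendingJobs` and `OrderedStop` are hypothetical and do not reflect how AMFD's job queue or opensafd's shutdown sequence are actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical queue of pending updates. Each job returns true on success.
class PendingJobs {
 public:
  void Push(std::function<bool()> job) { jobs_.push_back(std::move(job)); }

  // Runs the queued jobs in order. Stops at the first failing job and keeps
  // it queued so that it can be retried. Returns true if the queue is empty.
  bool Drain() {
    while (!jobs_.empty()) {
      if (!jobs_.front()()) return false;
      jobs_.pop_front();
    }
    return true;
  }

  std::size_t size() const { return jobs_.size(); }

 private:
  std::deque<std::function<bool()>> jobs_;
};

// Hypothetical shutdown step: the backend that the jobs depend on is stopped
// only after every pending job has been executed.
bool OrderedStop(PendingJobs& jobs, const std::function<void()>& stop_backend) {
  if (!jobs.Drain()) return false;
  stop_backend();
  return true;
}
```

In this sketch a failed drain leaves the backend running, so the caller can decide whether to retry or to give up and stop anyway.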
[tickets] [opensaf:tickets] #2375 build: Development libraries contain dependencies to internal libraries
- **status**: accepted --> review

---

** [tickets:#2375] build: Development libraries contain dependencies to internal libraries**

**Status:** review
**Milestone:** 5.2.RC2
**Created:** Tue Mar 14, 2017 03:25 PM UTC by Anders Widell
**Last Updated:** Tue Mar 14, 2017 03:25 PM UTC
**Owner:** Anders Widell

The AIS development libraries (libSaXXX.so) contain dependencies to internal OpenSAF libraries. If an application links with the AIS libraries, the application may also get a dependency towards these internal OpenSAF libraries (see ticket [#2298]). This can have the effect that the application will not run with other (older and/or newer) versions of OpenSAF. The solution is to create special development libraries without these dependencies.
[tickets] [opensaf:tickets] #2149 log: Create a C++ wrapper for handling IMM api
- **status**: unassigned --> assigned
- **assigned_to**: Vu Minh Nguyen

---

** [tickets:#2149] log: Create a C++ wrapper for handling IMM api**

**Status:** assigned
**Milestone:** future
**Created:** Fri Oct 28, 2016 12:25 PM UTC by elunlen
**Last Updated:** Fri Oct 28, 2016 12:25 PM UTC
**Owner:** Vu Minh Nguyen

Create a wrapper for IMM API handling and replace all usage of the current immutil.c. This wrapper must not contain anything that is directly related to the log service. The purpose is to simplify handling of the IMM APIs, e.g. to simplify the complicated void pointer handling used with the IMM C APIs.

This wrapper can be limited to implementing only what is needed for the log service, but the design shall make it possible to add more functionality. The goal is to create a generic C++ immutil to be used with all services, and this log service immutil could be the start of that.
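As an illustration of what "simplify the void pointer handling" could mean, here is a minimal sketch of a typed value holder that produces the untyped pointer array a C API typically expects. It makes no claim about the design that was eventually chosen: `RawAttribute` is a hypothetical stand-in for a C-style attribute record (it is not an IMM type), and `AttributeValues` is a made-up class name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical C-style attribute record: the values are passed as an array
// of untyped pointers, which is the pattern the ticket calls complicated.
struct RawAttribute {
  const char* name;
  std::size_t value_count;
  void** values;
};

// Owns typed values and exposes them in the untyped layout on demand, so the
// caller never builds the void* array by hand.
template <typename T>
class AttributeValues {
 public:
  AttributeValues(std::string name, std::vector<T> values)
      : name_(std::move(name)), values_(std::move(values)) {
    pointers_.reserve(values_.size());
    for (T& value : values_) pointers_.push_back(&value);
  }

  // Copying would leave pointers_ referring to the source object's values.
  AttributeValues(const AttributeValues&) = delete;
  AttributeValues& operator=(const AttributeValues&) = delete;

  // The returned record is valid only while this object is alive.
  RawAttribute raw() {
    return RawAttribute{name_.c_str(), pointers_.size(), pointers_.data()};
  }

 private:
  std::string name_;
  std::vector<T> values_;
  std::vector<void*> pointers_;
};
```

Because the holder owns both the values and the pointer array, their lifetimes stay tied together. That is one of the things that is easy to get wrong when the array is assembled manually.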
[tickets] [opensaf:tickets] #2380 log: change LOG_ER incase rename file failed
- Description has changed:

Diff:

--- old
+++ new
@@ -1,5 +1,5 @@
 LOG_ER("Could not rename log file: %s", strerror(errno));
-After main thread send request to file handle thread to rename log file, the main thread may get some failed result. Maybe it's not error case. e.g. file is not exist incase deleting manually, or take long time to rename,
+After main thread send request to file handle thread to rename log file, the main thread may get some failed result. Maybe it's not error case. e.g. file is not exist incase deleting manually,
 Convert ER to WA.

- **status**: assigned --> accepted

---

** [tickets:#2380] log: change LOG_ER incase rename file failed**

**Status:** accepted
**Milestone:** 5.2.RC2
**Created:** Wed Mar 15, 2017 01:44 PM UTC by Canh Truong
**Last Updated:** Wed Mar 15, 2017 01:44 PM UTC
**Owner:** Canh Truong

LOG_ER("Could not rename log file: %s", strerror(errno));

After the main thread sends a request to the file handle thread to rename the log file, the main thread may get a failed result. This may not be an error case, e.g. the file does not exist because it was deleted manually.

Convert ER to WA.
[tickets] [opensaf:tickets] #2380 log: change LOG_ER incase rename file failed
---

** [tickets:#2380] log: change LOG_ER incase rename file failed**

**Status:** assigned
**Milestone:** 5.2.RC2
**Created:** Wed Mar 15, 2017 01:44 PM UTC by Canh Truong
**Last Updated:** Wed Mar 15, 2017 01:44 PM UTC
**Owner:** Canh Truong

LOG_ER("Could not rename log file: %s", strerror(errno));

After the main thread sends a request to the file handle thread to rename the log file, the main thread may get a failed result. This may not be an error case, e.g. the file does not exist because it was deleted manually, or the rename takes a long time.

Convert ER to WA.
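The ticket asks for the message to be downgraded from error to warning. A slightly finer-grained variant is sketched below. It goes beyond what the ticket requests: only the "file does not exist" case (ENOENT) is treated as a warning, and that choice is an assumption made for the example. It uses plain `syslog()` priorities rather than the OpenSAF LOG_ER/LOG_WA macros, and both function names are hypothetical.

```cpp
#include <syslog.h>

#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Assumption for this sketch: a missing source file (ENOENT), e.g. because
// it was deleted manually, is reported as a warning. Any other failure is
// still reported as an error.
int RenameFailureSeverity(int err) {
  return err == ENOENT ? LOG_WARNING : LOG_ERR;
}

// Renames a file and reports a failure with the severity chosen above.
// Returns 0 on success, otherwise the errno value of the failed rename.
int RenameLogFile(const char* old_path, const char* new_path) {
  if (std::rename(old_path, new_path) == 0) return 0;
  const int err = errno;
  syslog(RenameFailureSeverity(err), "Could not rename log file: %s",
         std::strerror(err));
  return err;
}
```

If every rename failure should be a warning, as the ticket text literally says, `RenameFailureSeverity` reduces to returning `LOG_WARNING` unconditionally.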
[tickets] [opensaf:tickets] #2379 smf:PR documentation for 5.2 release
---

** [tickets:#2379] smf:PR documentation for 5.2 release**

**Status:** accepted
**Milestone:** 5.2.RC2
**Created:** Wed Mar 15, 2017 10:03 AM UTC by Neelakanta Reddy
**Last Updated:** Wed Mar 15, 2017 10:03 AM UTC
**Owner:** Neelakanta Reddy

Update the PR document with the 5.2 enhancements:

smf: add support for asynchronous detection of failed AMF entities [#2145]
[tickets] [opensaf:tickets] #2340 immnd : restarts if larg db
- **status**: assigned --> wontfix
- **Comment**:

The problem reported in the ticket may be observed when the disk is full. IMM is tested for 300k objects. When the test is performed in my setup, IMMND is not restarted with 500k objects (when immomtest 3 10 is performed). But IMMND is restarted in the same test when traces are enabled and the disk is full. Please share the backtrace (bt).

---

** [tickets:#2340] immnd : restarts if larg db**

**Status:** wontfix
**Milestone:** 5.2.RC2
**Created:** Fri Mar 03, 2017 06:18 AM UTC by A V Mahesh (AVM)
**Last Updated:** Tue Mar 14, 2017 06:36 AM UTC
**Owner:** Neelakanta Reddy

1) Configure opensaf with --enable-ntf-imcn

/# ./bootstrap.sh ;./configure --enable-imm-pbe --enable-tests --enable-tipc --enable-ntf-imcn; make rpm

2) Have a large XML database of objects (70 k objects)

3) Run

/# /usr/bin/immomtest 3 10
error: in src/imm/apitest/management/test_saImmOmSearchInitialize_2.c at 171: SA_AIS_ERR_TIMEOUT (5), expected SA_AIS_OK (1) - exiting

4) Immnd restarts

Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO ERR_INVALID_PARAM: Problem with new class 'saImmOmClassCreate_2_10'
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO ERR_INVALID_PARAM: Attribute 'SaImmAttrImplementerName' is neither SA_IMM_ATTR_CONFIG nor SA_IMM_ATTR_RUNTIME
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO ERR_INVALID_PARAM: Attribute 'SaImmAttrAdminOwnerName' is neither SA_IMM_ATTR_CONFIG nor SA_IMM_ATTR_RUNTIME
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO ERR_INVALID_PARAM: Attribute 'SaImmAttrClassName' is neither SA_IMM_ATTR_CONFIG nor SA_IMM_ATTR_RUNTIME
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO ERR_INVALID_PARAM: Problem with new class 'saImmOmClassCreate_2_11'
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO opensafImmNostdFlags changed to: 0x1f6
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO opensafImmNostdFlags changed to: 0x1f7
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO Class 'saImmOmClassCreate_SchemaChange_2_17' exist - check implied schema upgrade
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO Allowed upgrade, attribute saImmOmClassCreate_SchemaChange_2_17:attr adds flag SA_IMM_ATTR_STRONG_DEFAULT
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO Schema change for class saImmOmClassCreate_SchemaChange_2_17 ACCEPTED. Adding 0 and changing 1 attribute defs
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO No instances to migrate - schema change could have been avoided
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO Schema change completed for class saImmOmClassCreate_SchemaChange_2_17
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO opensafImmNostdFlags changed to: 0x1f6
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO opensafImmNostdFlags changed to: 0x1f7
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO Class 'saImmOmClassCreate_SchemaChange_2_18' exist - check implied schema upgrade
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO Allowed upgrade, attribute saImmOmClassCreate_SchemaChange_2_18:attr removes flag SA_IMM_ATTR_STRONG_DEFAULT
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO Schema change for class saImmOmClassCreate_SchemaChange_2_18 ACCEPTED. Adding 0 and changing 1 attribute defs
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO No instances to migrate - schema change could have been avoided
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO Schema change completed for class saImmOmClassCreate_SchemaChange_2_18
Mar 3 11:40:27 SC-1 osafimmnd[32348]: NO opensafImmNostdFlags changed to: 0x1f6
Mar 3 11:40:57 SC-1 osafimmd[32335]: NO MDS event from svc_id 25 (change:4, dest:564114323931152)
Mar 3 11:40:57 SC-1 osafsmfd[32430]: WA DispatchOiCallback: saImmOiDispatch() Fail 'SA_AIS_ERR_BAD_HANDLE (9)'
Mar 3 11:40:57 SC-1 osafntfimcnd[32381]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
Mar 3 11:40:58 SC-1 osafamfnd[32412]: NO 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 600 ns)
Mar 3 11:40:58 SC-1 osafamfnd[32412]: NO Restarting a component of 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Mar 3 11:40:58 SC-1 osafamfnd[32412]: NO 'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Mar 3 11:40:58 SC-1 osafimmd[32335]: WA IMMND coordinator at 2010f apparently crashed => electing new coord
Mar 3 11:40:58 SC-1 osafimmd[32335]: NO New coord elected, resides at 2020f
[tickets] [opensaf:tickets] #2372 amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.
Since the start of the CLM implementation, the service has not supported admin operations on more than one node simultaneously. There was a discussion (or ticket) in the earlier Trac ticket system that CLM doesn't support operations on two entities simultaneously.

Below is a simple scenario to reproduce:

-> Bring up a CLM agent and subscribe to the track callback. Do not respond to the START callback.
-> Now perform the CLM lock operation on the two payloads in two different terminals.
-> In the CLM application, respond to the callbacks only after invoking both admin operations.
-> Both admin operations shall result in the SA_AIS_ERR_REPAIR_PENDING return code.

Judging from the syslog output below, it seems that CLM doesn't store the invocation id for the initial admin op.

Mar 15 11:54:20 SLES-1 osafamfd[3276]: NO Pending Response sent for CLM track callback::OK '7'

---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Tue Mar 14, 2017 09:29 AM UTC
**Owner:** Praveen
**Attachments:**

- [osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) (3.4 MB; application/octet-stream)
- [osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) (860.9 kB; application/octet-stream)

Steps to reproduce:

1) Bring a 4-node cluster up.
2) Deploy the AMF demo on PL-3 and PL-4.
3) LOCK amfd nodes PL-3 and PL-4.
4) Make arrangements so that termination of amf_demo on PL-3 takes more time compared to PL-4.
5) From one terminal, issue CLM lock of PL-3 first and, with no delay, issue CLM lock of PL-4.

CLM and AMF traces are attached.

Analysis:

When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo on PL-3. While the termination of amf_demo is still going on, AMF gets another track callback with PL-4 as the root cause entity. However, the callback also contains information about PL-3. AMFD starts terminating amf_demo on PL-4, but at the same time it responds for PL-3 with the invocation id of the PL-4 callback. CLM assumes that the PL-4 change_started step has completed and sends the completion callback for PL-4. In this callback, AMF clears the internal flags which monitor the graceful removal of nodes. Since AMF never responded to the PL-3 callback, the callback timer expires in CLMD and it sends the completed callback to AMF. AMF thinks this is a node failover case and tries to fail over PL-3.

Note: In all these stages, CLM sends the track callback with information about all the nodes. The params AMF registers with are: SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.

I am still evaluating whether the issue is in CLM or AMF. Since AMF registers for **|SA_TRACK_CHANGES_ONLY|**, should CLM give information about all the nodes in all subsequent callbacks? Also, AMF should respond to the callback only when it has completed the termination of the comps.
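The mismatch in the analysis (PL-3 being answered with the invocation id of the PL-4 callback) comes down to which invocation id is remembered for which node. The sketch below shows one possible bookkeeping scheme under the assumption that each node should be answered with the id of the first callback that requested its removal, and only after its components have terminated. `PendingTrackResponses` and its methods are hypothetical and are not taken from AMFD or CLM.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical bookkeeping of outstanding track callback responses, keyed by
// node name.
class PendingTrackResponses {
 public:
  // Remembers which callback invocation asked for this node's removal. If
  // the node is listed again in a later callback, the first id is kept.
  void OnStartStep(const std::string& node, uint64_t invocation) {
    pending_.emplace(node, invocation);
  }

  // Called when all components on `node` have terminated. Returns true and
  // stores the id to respond with in *invocation. Returns false if no
  // response is pending for that node.
  bool OnTerminationDone(const std::string& node, uint64_t* invocation) {
    auto it = pending_.find(node);
    if (it == pending_.end()) return false;
    *invocation = it->second;
    pending_.erase(it);
    return true;
  }

 private:
  std::map<std::string, uint64_t> pending_;
};
```

With invocation 7 recorded for PL-3 and invocation 8 for PL-4, a later callback that lists PL-3 again under invocation 8 does not overwrite the stored id, so PL-3 is still answered with 7 once its termination completes.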