[tickets] [opensaf:tickets] #3204 amf: support of node repair feature
Hi Thang,

Thanks, got it. Please check the return codes defined by the AMF spec. I think TRY_AGAIN is more suitable than BAD_OP:

SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The client may retry later. This error generally should be returned when the requested action is valid but not currently possible, probably because another operation is acting upon the logical entity on which the administrative operation is invoked. Such an operation can be another administrative operation or an error recovery initiated by the Availability Management Framework.

SA_AIS_ERR_BAD_OPERATION - The operation could not ensure that the presence states of the relevant service units and components are either instantiated or uninstantiated.

Please let me know.

Thanks
Anand

---

** [tickets:#3204] amf: support of node repair feature**

**Status:** review
**Milestone:** 5.20.08
**Created:** Mon Jul 20, 2020 11:57 PM UTC by Anand Sundararaj
**Last Updated:** Tue Jul 28, 2020 01:49 AM UTC
**Owner:** Anand Sundararaj

Support of Administrative operation "SA_AMF_ADMIN_REPAIRED" on Amf Node
Amf Specs B.4.1 section: 9.4.10 SA_AMF_ADMIN_REPAIRED

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #3206 clm: memory leak in valgrind report
- **status**: accepted --> review

---

** [tickets:#3206] clm: memory leak in valgrind report**

**Status:** review
**Milestone:** 5.20.08
**Created:** Mon Jul 27, 2020 09:24 AM UTC by Huynh Minh Thien
**Last Updated:** Mon Jul 27, 2020 09:24 AM UTC
**Owner:** Huynh Minh Thien

~~~
==257== 136 bytes in 1 blocks are definitely lost in loss record 76 of 147
==257==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==257==    by 0x5AFA7B7: mbcsv_mds_dec (mbcsv_mds.c:751)
==257==    by 0x5B0A575: mds_mcm_do_decode_full_or_flat.isra.0 (mds_c_sndrcv.c:5623)
==257==    by 0x5B0C2C7: mds_mcm_process_recv_snd_msg_common.part.5 (mds_c_sndrcv.c:4915)
==257==    by 0x5B0C78A: mcm_recv_red_bcast (mds_c_sndrcv.c:5144)
==257==    by 0x5B0C78A: mds_mcm_ll_data_rcv (mds_c_sndrcv.c:4808)
==257==    by 0x5B12060: mdtm_process_recv_message_common (mds_dt_common.c:572)
==257==    by 0x5B12431: mdtm_process_recv_data (mds_dt_common.c:1122)
==257==    by 0x5B1AA7E: mdtm_process_recv_events (mds_dt_tipc.c:1146)
==257==    by 0x62F76DA: start_thread (pthread_create.c:463)
==257==    by 0x663088E: clone (clone.S:95)
==257==
==257== 2,136 bytes in 3 blocks are definitely lost in loss record 137 of 147
==257==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==257==    by 0x1247F3: clms_mds_svc_event(ncsmds_callback_info*) (clms_mds.cc:1220)
==257==    by 0x5B0266D: mds_mcm_user_event_callback (mds_c_api.c:4667)
==257==    by 0x5B04CA5: mds_mcm_svc_down (mds_c_api.c:3685)
==257==    by 0x5B193E4: mdtm_process_discovery_events (mds_dt_tipc.c:1433)
==257==    by 0x5B1A818: mdtm_process_recv_events (mds_dt_tipc.c:945)
==257==    by 0x62F76DA: start_thread (pthread_create.c:463)
==257==    by 0x663088E: clone (clone.S:95)
==257==
==257== 16,144 (112 direct, 16,032 indirect) bytes in 2 blocks are definitely lost in loss record 144 of 147
==257==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==257==    by 0x5AF16DB: sysf_alloc_pkt (sysf_mem.c:429)
==257==    by 0x5AE069F: ncs_enc_init_space_pp (hj_ubaid.c:144)
==257==    by 0x5B10CEF: mdtm_fill_data (mds_dt_common.c:1459)
==257==    by 0x5B12092: mdtm_process_recv_message_common (mds_dt_common.c:541)
==257==    by 0x5B12431: mdtm_process_recv_data (mds_dt_common.c:1122)
==257==    by 0x5B1AA7E: mdtm_process_recv_events (mds_dt_tipc.c:1146)
==257==    by 0x62F76DA: start_thread (pthread_create.c:463)
==257==    by 0x663088E: clone (clone.S:95)
==280== 2,848 bytes in 4 blocks are definitely lost in loss record 145 of 151
==280==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==280==    by 0x123706: clms_mds_dec(ncsmds_callback_info*) (clms_mds.cc:864)
==280==    by 0x5B0A575: mds_mcm_do_decode_full_or_flat.isra.0 (mds_c_sndrcv.c:5623)
==280==    by 0x5B0C2C7: mds_mcm_process_recv_snd_msg_common.part.5 (mds_c_sndrcv.c:4915)
==280==    by 0x5B0C78A: mcm_recv_red_bcast (mds_c_sndrcv.c:5144)
==280==    by 0x5B0C78A: mds_mcm_ll_data_rcv (mds_c_sndrcv.c:4808)
==280==    by 0x5B12060: mdtm_process_recv_message_common (mds_dt_common.c:572)
==280==    by 0x5B12431: mdtm_process_recv_data (mds_dt_common.c:1122)
==280==    by 0x5B1AA7E: mdtm_process_recv_events (mds_dt_tipc.c:1146)
==280==    by 0x62F76DA: start_thread (pthread_create.c:463)
==280==    by 0x663088E: clone (clone.S:95)
==280==
~~~
[tickets] [opensaf:tickets] #3204 amf: support of node repair feature
You will get the following log immediately:

osafamfd[3437]: NO 'safAmfNode=SC-2,safAmfCluster=myAmfCluster' ADMIN_REPAIRED: CLM node is not member

---

** [tickets:#3204] amf: support of node repair feature**

**Status:** review
**Milestone:** 5.20.08
**Created:** Mon Jul 20, 2020 11:57 PM UTC by Anand Sundararaj
**Last Updated:** Mon Jul 27, 2020 03:15 PM UTC
**Owner:** Anand Sundararaj

Support of Administrative operation "SA_AMF_ADMIN_REPAIRED" on Amf Node
Amf Specs B.4.1 section: 9.4.10 SA_AMF_ADMIN_REPAIRED
[tickets] [opensaf:tickets] #3204 amf: support of node repair feature
Hi Thang,

AMF returns TRY_AGAIN immediately to the admin client, which here is amf-adm. You can check the syslog to validate this. The patch code is:

    case SA_AMF_ADMIN_REPAIRED:
      if (node->saAmfNodeOperState == SA_AMF_OPERATIONAL_ENABLED) {
        report_admin_op_error(
            immOiHandle, invocation, SA_AIS_ERR_NO_OP, nullptr,
            "Admin repair request for '%s', op state already enabled",
            node->name.c_str());
        goto done;
      }
      if (node->node_info.member == false) {
        LOG_NO("'%s' ADMIN_REPAIRED: CLM node is not member",
               node->name.c_str());
        avd_saImmOiAdminOperationResult(immOiHandle, invocation,
                                        SA_AIS_ERR_TRY_AGAIN);  // <= Here
        goto done;
      }

The behaviour you see comes from the implementation of amf-adm/immadm: when it gets TRY_AGAIN from AMF, it keeps retrying the same admin operation until it succeeds or the amf-adm CLI command times out. If you write a C program that uses the IMM OM API to perform the admin operation, you get TRY_AGAIN immediately.

Please confirm, Thang?

---

** [tickets:#3204] amf: support of node repair feature**

**Status:** review
**Milestone:** 5.20.08
**Created:** Mon Jul 20, 2020 11:57 PM UTC by Anand Sundararaj
**Last Updated:** Mon Jul 27, 2020 09:35 AM UTC
**Owner:** Anand Sundararaj

Support of Administrative operation "SA_AMF_ADMIN_REPAIRED" on Amf Node
Amf Specs B.4.1 section: 9.4.10 SA_AMF_ADMIN_REPAIRED
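The difference between what AMF returns and what the CLI user sees can be illustrated with a small sketch. It is only a model under stated assumptions: `AdminRc`, `RetryResult`, `InvokeWithRetry` and `AdminRepaired` are hypothetical names invented here, the enum is a local stand-in for the SaAisErrorT codes named in this thread, the success branch of `AdminRepaired` is a placeholder, and the attempt budget stands in for the wall-clock timeout a real command-line tool would use.

```cpp
#include <cassert>
#include <functional>

// Local stand-ins for the return codes discussed in this thread.
// Hypothetical enum, not the real SAF header.
enum class AdminRc { kOk, kTryAgain, kNoOp, kBadOperation };

struct RetryResult {
  AdminRc last_rc;  // last code returned by the operation
  int attempts;     // how many times the operation was invoked
  bool timed_out;   // true if the retry budget ran out on TRY_AGAIN
};

// Invoke `op` until it returns something other than kTryAgain or until
// `max_attempts` is used up. An attempt count replaces a real deadline so
// that the sketch stays deterministic.
RetryResult InvokeWithRetry(const std::function<AdminRc()>& op,
                            int max_attempts) {
  assert(max_attempts > 0);
  RetryResult result{AdminRc::kTryAgain, 0, false};
  while (result.attempts < max_attempts) {
    result.last_rc = op();
    ++result.attempts;
    if (result.last_rc != AdminRc::kTryAgain) return result;
  }
  result.timed_out = true;  // every attempt answered TRY_AGAIN
  return result;
}

// Hypothetical model of the two checks in the quoted patch. The final
// kOk branch is a placeholder; the rest of the patch is not shown here.
AdminRc AdminRepaired(bool oper_enabled, bool clm_member) {
  if (oper_enabled) return AdminRc::kNoOp;     // op state already enabled
  if (!clm_member) return AdminRc::kTryAgain;  // CLM node is not member
  return AdminRc::kOk;
}
```

In this model a direct caller of `AdminRepaired(false, false)` sees the TRY_AGAIN stand-in at once, while a caller going through `InvokeWithRetry` only sees the budget run out, which matches the "command timed out" symptom reported for amf-adm.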
[tickets] [opensaf:tickets] #3204 amf: support of node repair feature
Hi Nagu,

I think in this case the admin operation should return as soon as possible if the CLM node is not a member.

B.R/Thang

---

** [tickets:#3204] amf: support of node repair feature**

**Status:** review
**Milestone:** 5.20.08
**Created:** Mon Jul 20, 2020 11:57 PM UTC by Anand Sundararaj
**Last Updated:** Mon Jul 27, 2020 08:42 AM UTC
**Owner:** Anand Sundararaj

Support of Administrative operation "SA_AMF_ADMIN_REPAIRED" on Amf Node
Amf Specs B.4.1 section: 9.4.10 SA_AMF_ADMIN_REPAIRED
[tickets] [opensaf:tickets] #3204 amf: support of node repair feature
Hi Thang,

I just tested this scenario. It returns TRY_AGAIN and writes a syslog entry saying that SC-2 is not part of the CLM cluster. The syslog entry is there because, to repair a disabled node, AMF has to rely on CLM to tell whether the node is down or faulty.

When you test with amf-adm or immadm and AMF (or some other service) returns TRY_AGAIN, the tool retries and does not report TRY_AGAIN to you until it times out.

Thanks
-Nagu

---

** [tickets:#3204] amf: support of node repair feature**

**Status:** review
**Milestone:** 5.20.08
**Created:** Mon Jul 20, 2020 11:57 PM UTC by Anand Sundararaj
**Last Updated:** Mon Jul 27, 2020 07:43 AM UTC
**Owner:** Anand Sundararaj

Support of Administrative operation "SA_AMF_ADMIN_REPAIRED" on Amf Node
Amf Specs B.4.1 section: 9.4.10 SA_AMF_ADMIN_REPAIRED
[tickets] [opensaf:tickets] #3204 amf: support of node repair feature
Hi Sundararaj,

Could you describe a use case where this operation applies? In the scenario below, amf-adm repaired on a node does not work.

1. Stop OpenSAF on SC-2:

        /etc/init.d/opensafd stop

2. Node SC-2 is then in the following state:

        safAmfNode=SC-2,safAmfCluster=myAmfCluster
            saAmfNodeAdminState=UNLOCKED(1)
            saAmfNodeOperState=DISABLED(2)

3. The admin repaired operation has no effect:

        SC-2-1:~ # amf-adm repaired safAmfNode=SC-2,safAmfCluster=myAmfCluster
        error - command timed out (alarm)

B.R/Thang

---

** [tickets:#3204] amf: support of node repair feature**

**Status:** review
**Milestone:** 5.20.08
**Created:** Mon Jul 20, 2020 11:57 PM UTC by Anand Sundararaj
**Last Updated:** Thu Jul 23, 2020 11:05 PM UTC
**Owner:** Anand Sundararaj

Support of Administrative operation "SA_AMF_ADMIN_REPAIRED" on Amf Node
Amf Specs B.4.1 section: 9.4.10 SA_AMF_ADMIN_REPAIRED
[tickets] [opensaf:tickets] Re: #3205 amf: remove hard-coding in amfnd
That's right, Thang.

Thanks
-Nagu

---

** [tickets:#3205] amf: remove hard-coding in amfnd**

**Status:** review
**Milestone:** 5.20.08
**Created:** Tue Jul 21, 2020 12:35 AM UTC by Anand Sundararaj
**Last Updated:** Mon Jul 27, 2020 06:30 AM UTC
**Owner:** Anand Sundararaj
**Attachments:**

- [amfnd_non_root_default.patch](https://sourceforge.net/p/opensaf/tickets/3205/attachment/amfnd_non_root_default.patch) (1.2 kB; application/octet-stream)

Amfnd is hard-coded to run as root:

"src/amf/amfnd/main.cc":
daemonize_as_user("root", argc, argv);

This needs to be removed.

This is with reference to User Query and the patch(attached) was provided by Praveen:

On 13-Apr-17 7:27 PM, Carroll, James R wrote:
> Hi,
>
> I am using openSAF 5.0, and it appears that some of the openSAF (amfnd)
> daemons are hard-coded to run as root.
> Is there any way to disable this feature, so that I do not have to run the
> daemon as root?
>
> I see the following note in the README documentation:
> Only two processes are running as root, amfnd and smfnd. Reason is
> that amfnd need todo that for backwards compatible reasons and the programs
> it starts might be designed to require root access.
>
> We are trying to run all of our programs as non-root. Regarding the
> documentation noted above, if we can start all our programs as non-root, then
> we would not need to run the opensaf as root.

As of now, it is hard-coded in amfnd to run as root. Attached are patches on default and 5.0 branch to enable amfnd to start as non-root. After installation of OpenSAF, uncomment "#AMFND_NON_ROOT=1" line in amfnd.conf to enable amfnd to run as a user as mentioned in amfnd.conf. By default it will run as root.

Thanks
Praveen
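The opt-in behaviour described in the ticket (root by default, another user only when AMFND_NON_ROOT=1 is uncommented) could look roughly like the sketch below. This is not the attached patch, whose contents are not shown in this thread: `SelectDaemonUser` is a hypothetical helper, and both the way the flag reaches the process and the way the user name is configured in amfnd.conf are assumptions.

```cpp
#include <string>

// Hypothetical helper: picks the user name that would be passed to
// daemonize_as_user(). `non_root_flag` is the value of AMFND_NON_ROOT
// (nullptr when the line is still commented out); `configured_user` is the
// user name taken from amfnd.conf (how it is read is an assumption).
std::string SelectDaemonUser(const char* non_root_flag,
                             const std::string& configured_user) {
  const bool non_root =
      non_root_flag != nullptr && std::string(non_root_flag) == "1";
  // Fall back to root unless the flag is set and a user is configured,
  // which keeps the default behaviour described in the ticket.
  if (non_root && !configured_user.empty()) return configured_user;
  return "root";
}
```

A caller might obtain the flag with `std::getenv("AMFND_NON_ROOT")` from `<cstdlib>` and pass the result's `c_str()` on in place of the hard-coded "root"; keeping root as the fallback means existing installations behave as before.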
[tickets] [opensaf:tickets] #3205 amf: remove hard-coding in amfnd
Hi Nagu,

OK. So this configuration will be handled by the user if he/she wants to enable it.

B.R/Thang

---

** [tickets:#3205] amf: remove hard-coding in amfnd**

**Status:** review
**Milestone:** 5.20.08
**Created:** Tue Jul 21, 2020 12:35 AM UTC by Anand Sundararaj
**Last Updated:** Mon Jul 27, 2020 05:26 AM UTC
**Owner:** Anand Sundararaj
**Attachments:**

- [amfnd_non_root_default.patch](https://sourceforge.net/p/opensaf/tickets/3205/attachment/amfnd_non_root_default.patch) (1.2 kB; application/octet-stream)