Re: [devel] [PATCH 1/1] osaf: perform handshake in tcp_server in new thread [#3099]
Hi Gary,

ack, code review only. A question: with these changes it looks as if an arbitrary client can just connect to the TCP server and, e.g., monitor the "connect state" of the TCP server, but to exchange any data an SSL session has to be established after the TCP connect. If so, I think this change looks good.

/BR Hans

-----Original Message-----
From: Gary Lee
Sent: 11 October 2019 05:22
To: Hans Nordebäck; Minh Hon Chau; Thuan Tran
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee
Subject: [PATCH 1/1] osaf: perform handshake in tcp_server in new thread [#3099]

---
 src/osaf/consensus/plugins/tcp/tcp_server.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/osaf/consensus/plugins/tcp/tcp_server.py b/src/osaf/consensus/plugins/tcp/tcp_server.py
index a7f22f2..c10859c 100755
--- a/src/osaf/consensus/plugins/tcp/tcp_server.py
+++ b/src/osaf/consensus/plugins/tcp/tcp_server.py
@@ -73,10 +73,15 @@ class ThreadedRPCServer(ThreadingMixIn,
             certfile=CERTFILE,
             keyfile=KEYFILE,
             cert_reqs=ssl.CERT_NONE,
-            ssl_version=ssl.PROTOCOL_TLSv1_2)
+            ssl_version=ssl.PROTOCOL_TLSv1_2,
+            do_handshake_on_connect=False)
         self.server_bind()
         self.server_activate()
 
+    def finish_request(self, request, client_address):
+        request.do_handshake()
+        return SimpleXMLRPCServer.finish_request(self, request, client_address)
+
 
 class Arbitrator(object):
     """ Implementation of a simple arbitrator """
-- 
2.7.4
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] osaf: add tcp arbitrator [#3064]
Hi Gary,

ack, review only. One comment/suggestion: can we provide a small script that generates the x509 certificate (using e.g. openssl x509 ...) instead of including a self-signed cert?

/BR Hans

On Tue, 2019-10-01 at 12:53 +1000, Gary Lee wrote:
> ---
>  src/osaf/consensus/plugins/tcp/README          |  41 ++
>  src/osaf/consensus/plugins/tcp/certificate.pem |  20 +
>  src/osaf/consensus/plugins/tcp/key.pem         |  28 ++
>  src/osaf/consensus/plugins/tcp/tcp.plugin      | 520 +
>  src/osaf/consensus/plugins/tcp/tcp_server.py   | 157
>  5 files changed, 766 insertions(+)
>  create mode 100644 src/osaf/consensus/plugins/tcp/README
>  create mode 100644 src/osaf/consensus/plugins/tcp/certificate.pem
>  create mode 100644 src/osaf/consensus/plugins/tcp/key.pem
>  create mode 100755 src/osaf/consensus/plugins/tcp/tcp.plugin
>  create mode 100755 src/osaf/consensus/plugins/tcp/tcp_server.py
>
> diff --git a/src/osaf/consensus/plugins/tcp/README b/src/osaf/consensus/plugins/tcp/README
> new file mode 100644
> index 0000000..6f739e8
> --- /dev/null
> +++ b/src/osaf/consensus/plugins/tcp/README
> @@ -0,0 +1,41 @@
> +TCP arbitrator
> +
> +The TCP arbitrator may be useful for deployments where deploying etcd is not
> +feasible. An example arbitrator is provided to help prevent split brain in
> +clusters that contain up to 2 system controllers.
> +
> +The example arbitrator is a simple Python-based program that can be deployed on
> +a single payload or a node external to the cluster.
> +
> +Two main pieces of information are stored on the arbitrator: the hostname of the
> +current active controller and a heartbeat timestamp.
> +
> +An active controller sends a heartbeat to the arbitrator every 100ms using TLS
> +over a persistent TCP connection. It should self-fence if it is unable to
> +heartbeat, as it is likely to be separated from the arbitrator.
> +
> +A candidate active controller must check the existing controller is not
> +heartbeating before promoting itself active. On a cluster using TIPC,
> +the timeout value is the TIPC link tolerance timeout. On a TCP-based cluster,
> +the timeout is calculated from FMS_TAKEOVER_REQUEST_VALID_TIME.
> +
> +Suggested fmd.conf configuration:
> +
> +export FMS_SPLIT_BRAIN_PREVENTION=1
> +export FMS_KEYVALUE_STORE_PLUGIN_CMD=/full/path/to/tcp.plugin
> +export FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE=0 (any other setting is ignored)
> +export FMS_RELAXED_NODE_PROMOTION=1
> +
> +The above settings will allow a controller to be elected active during
> +cluster startup, even if the arbitrator is not yet running.
> +If the arbitrator becomes temporarily unavailable, the controllers will
> +remain running if they can see each other. If an active controller becomes
> +isolated from the standby *and* the arbitrator, it will self-fence and the
> +standby will become active (if located in the same network partition as
> +the arbitrator).
> +
> +The provided self-signed certificate is an example only, and was generated using:
> +
> +openssl req -newkey rsa:2048 -nodes -keyout key.pem -x509 -days 100000 -out certificate.pem
> +
> +It must be replaced in an actual deployment!!
> diff --git a/src/osaf/consensus/plugins/tcp/certificate.pem > b/src/osaf/consensus/plugins/tcp/certificate.pem > new file mode 100644 > index 000..e0b4993 > --- /dev/null > +++ b/src/osaf/consensus/plugins/tcp/certificate.pem > @@ -0,0 +1,20 @@ > +-BEGIN CERTIFICATE- > +MIIDUTCCAjmgAwIBAgIJANrPYThNMllvMA0GCSqGSIb3DQEBCwUAMD4xCzAJBgNV > +BAYTAkFVMQ4wDAYDVQQIDAVTdGF0ZTENMAsGA1UEBwwEQ2l0eTEQMA4GA1UECgwH > +T3BlblNBRjAgFw0xOTA5MzAwMDMxNTRaGA8yMjkzMDcxNTAwMzE1NFowPjELMAkG > +A1UEBhMCQVUxDjAMBgNVBAgMBVN0YXRlMQ0wCwYDVQQHDARDaXR5MRAwDgYDVQQK > +DAdPcGVuU0FGMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA5pCFKYnS > ++pi0gzrRWPRYg1sak9VpNK+MkKbj+m0bptRt/8JvosV62js4q5Da3ldq2AAcEJyf > +gd02YZ4HUDdCMgMtlWT1CAx89rNpozRwyj5g+4cfmOqiz7ApeZ9yqltInjG720DT > +lam2/R4/00zmFGAqD2ZGPiOY93bjYx+GhtiHcDvpJuZS2Z2vQ/Dd09v6Omhus0rZ > +WMrENyfavc7HwFv2z/qi4Hsb/Aa9ZuAXUKp1Q2cvC0XWdRJMdZaZfGUlTfY6X8ar > +hSnswHJJKIjBq/0jYpztntOubceOuGVyezxPVXPw5qiBLO7ZyYNgN9IMoF6Rbu9y > +K1O1MvPw3ShlDQIDAQABo1AwTjAdBgNVHQ4EFgQU7UCcR6MgV5c5JXjCHpwcUC+9 > +HIAwHwYDVR0jBBgwFoAU7UCcR6MgV5c5JXjCHpwcUC+9HIAwDAYDVR0TBAUwAwEB > +/zANBgkqhkiG9w0BAQsFAAOCAQEAAOP3iMgjCx8JNKevOSq24mGcWAqlX0iHP0/1 > +hl7Dd/xRQywM90NfrMmiNTgO9Yyw1rOEKoeM4BFM/qs854iEHpAa7vlcW1ZidvHz > +eMQZA2Y6+AZ9zyt41bRJGqkqW7YdKVl9yuqWHcFBqBKf1pUsvt0bkab5EZFOBPuB > +tmKsODrU7cN1qeA1wjINZiOa88Kkh2YxkRoi7tL8NIMp2E40NLS3M5+xLEE8LKTH > +ouhReM4eEfGfzE171NPe/kzRRp+ujNZwmyQ8xmWp6jPjfD7Mfqdf1WYjspiGzziQ > +R/cdEHHAWq+wZrfG1aB5/yU4iA0h8xR8PNfVHjAjuUn4N6tSFg== > +-END CERTIFICATE- > diff --git a/src/osaf/consensus/plugins/tcp/key.pem > b/src/osaf/consensus/plugins/tcp/key.pem > new file mode 100644 > index 000..66b8bcb > --- /dev/null > +++ b/src/osaf/consensus/plugins/tcp/key.pem > @@ -0,0 +1,28 @@ > +-BEGIN PRIVATE
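The README's promotion rule can be sketched as a small predicate. This is a C++ illustration only. It assumes the arbitrator record holds the two fields the README lists: the active controller's hostname and the time of its last heartbeat. `ArbitratorRecord` and `may_promote` are hypothetical names and do not come from tcp.plugin or tcp_server.py. `timeout` stands in for the TIPC link tolerance, or for the value derived from FMS_TAKEOVER_REQUEST_VALID_TIME on TCP clusters.

```cpp
#include <chrono>
#include <string>

// Hypothetical view of what the README says the arbitrator stores.
struct ArbitratorRecord {
  std::string active_hostname;  // empty if no controller holds the active role
  std::chrono::steady_clock::time_point last_heartbeat;
};

// Returns true if `candidate` may promote itself to active (sketch only).
bool may_promote(const ArbitratorRecord& rec, const std::string& candidate,
                 std::chrono::steady_clock::time_point now,
                 std::chrono::milliseconds timeout) {
  // No other controller is recorded as active: nothing to wait for.
  if (rec.active_hostname.empty() || rec.active_hostname == candidate)
    return true;
  // Another controller is recorded as active. Only take over once its
  // heartbeats (sent roughly every 100 ms) have been absent for longer
  // than the timeout.
  return now - rec.last_heartbeat > timeout;
}
```

Because the timeout is several heartbeat periods long, a few lost heartbeats do not by themselves allow a takeover.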
Re: [devel] [PATCH 3/9] mds: Add implementation for TIPC buffer overflow solution [#1960]
Hi Minh, see one comment below. /Thanks Hans On 2019-08-23 03:48, Minh Hon Chau wrote: > Hi Hans, > > Thanks for your time to review the patch, please see my replies below > your comments. > > Regards, > > Minh > > On 22/8/19 11:07 pm, Hans Nordebäck wrote: >> Hi Minh, >> >> it is a large patch so i have to review parts of it, below are my >> comments, marked with [HansN], for files: >> >> src/mds/Makefile.am >> src/mds/mds_dt.h >> src/mds/mds_dt_tipc.c >> >> I'll continue with the rest of the files a bit later. /Thanks Hans >> >> On 2019-08-14 08:38, Minh Chau wrote: >>> This is a collaborative patch of two participants:Thuan, Minh. >>> >>> Main changes: >>> - Add mds_tipc_fctrl_intf.h, mds_tipc_fctrl_intf.cc: These two files >>> introduce new functions which are called in mds_dt_tipc.c if the flow >>> control is enabled >>> - Add mds_tipc_fctrl_portid.h, mds_tipc_fctrl_portid.cc: These files >>> implements the tipc portid instance, which supports the sliding window, >>> mds msg queue >>> - Add mds_tipc_fctrl_msg.h, mds_tipc_fctrl_msg.cc: These files define >>> the event and messages which are used for this solution. 
>>> --- >>> src/mds/Makefile.am | 10 +- >>> src/mds/mds_dt.h | 8 +- >>> src/mds/mds_dt_tipc.c | 188 +--- >>> src/mds/mds_tipc_fctrl_intf.cc | 376 >>> +++ >>> src/mds/mds_tipc_fctrl_intf.h | 47 + >>> src/mds/mds_tipc_fctrl_msg.cc | 142 +++ >>> src/mds/mds_tipc_fctrl_msg.h | 129 ++ >>> src/mds/mds_tipc_fctrl_portid.cc | 261 +++ >>> src/mds/mds_tipc_fctrl_portid.h | 87 + >>> 9 files changed, 1184 insertions(+), 64 deletions(-) >>> create mode 100644 src/mds/mds_tipc_fctrl_intf.cc >>> create mode 100644 src/mds/mds_tipc_fctrl_intf.h >>> create mode 100644 src/mds/mds_tipc_fctrl_msg.cc >>> create mode 100644 src/mds/mds_tipc_fctrl_msg.h >>> create mode 100644 src/mds/mds_tipc_fctrl_portid.cc >>> create mode 100644 src/mds/mds_tipc_fctrl_portid.h >>> >>> diff --git a/src/mds/Makefile.am b/src/mds/Makefile.am >>> index 2d7b652..d849e8f 100644 >>> --- a/src/mds/Makefile.am >>> +++ b/src/mds/Makefile.am >>> @@ -48,10 +48,16 @@ lib_libopensaf_core_la_SOURCES += \ >>> if ENABLE_TIPC_TRANSPORT >>> noinst_HEADERS += src/mds/mds_dt_tipc.h \ >>> src/mds/mds_tipc_recvq_stats.h \ >>> - src/mds/mds_tipc_recvq_stats_impl.h >>> + src/mds/mds_tipc_recvq_stats_impl.h \ >>> + src/mds/mds_tipc_fctrl_intf.h \ >>> + src/mds/mds_tipc_fctrl_portid.h \ >>> + src/mds/mds_tipc_fctrl_msg.h >>> lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c \ >>> src/mds/mds_tipc_recvq_stats.cc \ >>> - src/mds/mds_tipc_recvq_stats_impl.cc >>> + src/mds/mds_tipc_recvq_stats_impl.cc \ >>> + src/mds/mds_tipc_fctrl_intf.cc \ >>> + src/mds/mds_tipc_fctrl_portid.cc \ >>> + src/mds/mds_tipc_fctrl_msg.cc >>> endif >>> if ENABLE_TESTS >>> diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h >>> index b645bb4..d9e8633 100644 >>> --- a/src/mds/mds_dt.h >>> +++ b/src/mds/mds_dt.h >>> @@ -162,7 +162,7 @@ uint32_t mdtm_del_from_ref_tbl(MDS_SUBTN_REF_VAL >>> ref); >>> uint32_t mds_tmr_mailbox_processing(void); >>> uint32_t mdtm_get_from_ref_tbl(MDS_SUBTN_REF_VAL ref, MDS_SVC_HDL >>> *svc_hdl); >>> uint32_t 
mdtm_add_frag_hdr(uint8_t *buf_ptr, uint16_t len, >>> uint32_t seq_num, >>> - uint16_t frag_byte); >>> + uint16_t frag_byte, uint16_t >>> fctrl_seq_num); >>> uint32_t mdtm_free_reassem_msg_mem(MDS_ENCODED_MSG *msg); >>> uint32_t mdtm_process_recv_data(uint8_t *buf, uint16_t len, >>> uint64_t tipc_id, >>> uint32_t *buff_dump); >>> @@ -240,9 +240,13 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, >>> NCSCONTEXT msg); >>> #define MDS_PROT 0xA0 >>> #define MDS_VERSION 0x08 >>> -#define MDS_PROT_VER_MASK (MDS_PROT | MDS_VERSION) >>> +#define MDS_PROT_VER_MASK 0xFC >>> #define MDTM_PRI_MASK 0x3 >>> +/* MDS protocol/version for flow control */ >>> +#de
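The patch description says mds_tipc_fctrl_portid implements a per-portid sliding window with an MDS message queue. The README in patch 1/9 adds two points: receivers send cumulative ACKs, and senders retransmit queued messages that TIPC returns as undelivered. Below is a minimal C++ sketch of the sender side only. It assumes a 16-bit flow-control sequence number per destination port id, matching the two header octets the README reuses. `PortIdSender` and its methods are hypothetical and are not the patch's classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical sender-side window for one destination TIPC port id.
// The window is assumed to be much smaller than 32768, so 16-bit
// wraparound distances are unambiguous.
class PortIdSender {
 public:
  explicit PortIdSender(uint16_t window) : window_(window) {}

  // Returns false when the window is full; the caller must hold the
  // message back until an ACK frees space.
  bool try_send(const std::string& payload, uint16_t* seq_out) {
    if (unacked_.size() >= static_cast<std::size_t>(window_)) return false;
    const uint16_t seq = next_seq_++;  // wraps at 65536, like a 2-octet field
    unacked_.push_back(Msg{seq, payload});
    *seq_out = seq;
    return true;
  }

  // Cumulative ACK: everything up to and including `acked` was delivered,
  // so it can be dropped from the retransmit queue.
  void on_ack(uint16_t acked) {
    while (!unacked_.empty() &&
           static_cast<uint16_t>(acked - unacked_.front().seq) < window_) {
      unacked_.pop_front();
    }
  }

  // Look up a queued message, e.g. after TIPC returns it as undelivered.
  const std::string* find_for_retransmit(uint16_t seq) const {
    for (const Msg& m : unacked_)
      if (m.seq == seq) return &m.payload;
    return nullptr;
  }

  std::size_t in_flight() const { return unacked_.size(); }

 private:
  struct Msg {
    uint16_t seq;
    std::string payload;
  };
  uint16_t window_;
  uint16_t next_seq_ = 0;
  std::deque<Msg> unacked_;
};
```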
Re: [devel] [PATCH 3/9] mds: Add implementation for TIPC buffer overflow solution [#1960]
Hi Minh, it is a large patch so i have to review parts of it, below are my comments, marked with [HansN], for files: src/mds/Makefile.am src/mds/mds_dt.h src/mds/mds_dt_tipc.c I'll continue with the rest of the files a bit later. /Thanks Hans On 2019-08-14 08:38, Minh Chau wrote: > This is a collaborative patch of two participants:Thuan, Minh. > > Main changes: > - Add mds_tipc_fctrl_intf.h, mds_tipc_fctrl_intf.cc: These two files > introduce new functions which are called in mds_dt_tipc.c if the flow > control is enabled > - Add mds_tipc_fctrl_portid.h, mds_tipc_fctrl_portid.cc: These files > implements the tipc portid instance, which supports the sliding window, > mds msg queue > - Add mds_tipc_fctrl_msg.h, mds_tipc_fctrl_msg.cc: These files define > the event and messages which are used for this solution. > --- > src/mds/Makefile.am | 10 +- > src/mds/mds_dt.h | 8 +- > src/mds/mds_dt_tipc.c| 188 +--- > src/mds/mds_tipc_fctrl_intf.cc | 376 > +++ > src/mds/mds_tipc_fctrl_intf.h| 47 + > src/mds/mds_tipc_fctrl_msg.cc| 142 +++ > src/mds/mds_tipc_fctrl_msg.h | 129 ++ > src/mds/mds_tipc_fctrl_portid.cc | 261 +++ > src/mds/mds_tipc_fctrl_portid.h | 87 + > 9 files changed, 1184 insertions(+), 64 deletions(-) > create mode 100644 src/mds/mds_tipc_fctrl_intf.cc > create mode 100644 src/mds/mds_tipc_fctrl_intf.h > create mode 100644 src/mds/mds_tipc_fctrl_msg.cc > create mode 100644 src/mds/mds_tipc_fctrl_msg.h > create mode 100644 src/mds/mds_tipc_fctrl_portid.cc > create mode 100644 src/mds/mds_tipc_fctrl_portid.h > > diff --git a/src/mds/Makefile.am b/src/mds/Makefile.am > index 2d7b652..d849e8f 100644 > --- a/src/mds/Makefile.am > +++ b/src/mds/Makefile.am > @@ -48,10 +48,16 @@ lib_libopensaf_core_la_SOURCES += \ > if ENABLE_TIPC_TRANSPORT > noinst_HEADERS += src/mds/mds_dt_tipc.h \ > src/mds/mds_tipc_recvq_stats.h \ > - src/mds/mds_tipc_recvq_stats_impl.h > + src/mds/mds_tipc_recvq_stats_impl.h \ > + src/mds/mds_tipc_fctrl_intf.h \ > + src/mds/mds_tipc_fctrl_portid.h \ 
> + src/mds/mds_tipc_fctrl_msg.h > lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c \ > src/mds/mds_tipc_recvq_stats.cc \ > - src/mds/mds_tipc_recvq_stats_impl.cc > + src/mds/mds_tipc_recvq_stats_impl.cc \ > + src/mds/mds_tipc_fctrl_intf.cc \ > + src/mds/mds_tipc_fctrl_portid.cc \ > + src/mds/mds_tipc_fctrl_msg.cc > endif > > if ENABLE_TESTS > diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h > index b645bb4..d9e8633 100644 > --- a/src/mds/mds_dt.h > +++ b/src/mds/mds_dt.h > @@ -162,7 +162,7 @@ uint32_t mdtm_del_from_ref_tbl(MDS_SUBTN_REF_VAL ref); > uint32_t mds_tmr_mailbox_processing(void); > uint32_t mdtm_get_from_ref_tbl(MDS_SUBTN_REF_VAL ref, MDS_SVC_HDL *svc_hdl); > uint32_t mdtm_add_frag_hdr(uint8_t *buf_ptr, uint16_t len, uint32_t seq_num, > - uint16_t frag_byte); > + uint16_t frag_byte, uint16_t fctrl_seq_num); > uint32_t mdtm_free_reassem_msg_mem(MDS_ENCODED_MSG *msg); > uint32_t mdtm_process_recv_data(uint8_t *buf, uint16_t len, uint64_t > tipc_id, > uint32_t *buff_dump); > @@ -240,9 +240,13 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT > msg); > > #define MDS_PROT 0xA0 > #define MDS_VERSION 0x08 > -#define MDS_PROT_VER_MASK (MDS_PROT | MDS_VERSION) > +#define MDS_PROT_VER_MASK 0xFC > #define MDTM_PRI_MASK 0x3 > > +/* MDS protocol/version for flow control */ > +#define MDS_PROT_FCTRL (0xB0 | MDS_VERSION) > +#define MDS_PROT_FCTRL_ID 0x00AC13F5 > + > /* Added for the subscription changes */ > #define MDS_NCS_CHASSIS_ID (m_NCS_GET_NODE_ID & 0x00ff) > #define MDS_TIPC_COMMON_ID 0x01001000 > diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c > index 86b52bb..fef1c50 100644 > --- a/src/mds/mds_dt_tipc.c > +++ b/src/mds/mds_dt_tipc.c > @@ -47,6 +47,7 @@ > #include "mds_dt_tipc.h" > #include "mds_dt_tcp_disc.h" > #include "mds_core.h" > +#include "mds_tipc_fctrl_intf.h" > #include "mds_tipc_recvq_stats.h" > #include "base/osaf_utility.h" > #include "base/osaf_poll.h" > @@ -165,20 +166,22 @@ NCS_PATRICIA_TREE mdtm_reassembly_list; > 
uint32_t mdtm_global_frag_num; > > const unsigned int MAX_RECV_THRESHOLD = 30; > +uint8_t gl_mds_pro_ver = MDS_PROT | MDS_VERSION; > > -static bool get_tipc_port_id(int sock, uint32_t* port_id) { > +static bool get_tipc_port_id(int sock, struct tipc_portid* port_id) { > struct sockaddr_tipc addr; > socklen_t sz = sizeof(addr); > > memset(, 0, sizeof(addr)); > - *port_id = 0; > + port_id->node = 0; > + port_id->ref = 0; > if (0 > getsockname(sock, (struct sockaddr *), )) { > syslog(LOG_ERR, "MDTM:TIPC Failed to get socket
[devel] [PATCH 0/1] Review Request for util: Fenced should only write a log record when two acitve controllers is seen [#3073]
Summary: util: Fenced should only write a log record when two acitve controllers is seen [#3073]
Review request for Ticket(s): 3073
Peer Reviewer(s): Gary, Duc
Pull request to:
Affected branch(es): develop
Development branch: ticket-3073
Base revision: 729f71fbfff0eea6d4a6a394780142b87a9fb472
Personal repository: git://git.code.sf.net/u/hansnordeback/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   y

Comments (indicate scope for each "y" above):
---------------------------------------------
 *** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 81ec4d662ecdcf6b147e2376697ff423625463d4
Author: Hans Nordeback
Date:   Thu, 22 Aug 2019 09:14:12 +0200

    util: Fenced should only write a log record when two acitve controllers is seen [#3073]

Complete diffstat:
------------------
 tools/devel/fenced/node_state_hdlr_pl.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Testing Commands:
-----------------
 *** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***

Testing, Expected Results:
--------------------------
 *** PASTE COMMAND OUTPUTS / TEST RESULTS ***

Conditions of Submission:
-------------------------
 *** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          n
powerpc     n          n
powerpc64   n          n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc)
___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc)
___ You have giant attachments which should never have been sent; instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication of what has changed between each re-send.
___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)
___ Your computer has a badly configured date and time, confusing the threaded patch review.
___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test.
___ Your changes affect user manual and documentation, your patch series does not contain the patch that updates the Doxygen manual.
[devel] [PATCH 1/1] util: Fenced should only write a log record when two acitve controllers is seen [#3073]
---
 tools/devel/fenced/node_state_hdlr_pl.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/devel/fenced/node_state_hdlr_pl.cc b/tools/devel/fenced/node_state_hdlr_pl.cc
index c74fe72b9..6bf032e5a 100644
--- a/tools/devel/fenced/node_state_hdlr_pl.cc
+++ b/tools/devel/fenced/node_state_hdlr_pl.cc
@@ -169,8 +169,8 @@ void NodeStateHdlrPl::check_isolation() {
       isolated_ = NodeIsolationState::kNotIsolated;
       syslog(LOG_NOTICE, "one active controller detected");
     } else {
-      isolated_ = NodeIsolationState::kIsolated;
-      syslog(LOG_NOTICE, "%d active controllers detected, split brain", no_of_active);
+      isolated_ = NodeIsolationState::kNotIsolated;
+      syslog(LOG_NOTICE, "%d active controllers detected", no_of_active);
     }
   }
 notify:
-- 
2.17.1
Re: [devel] [PATCH 2/9] mds: Resolve c/c++ linking issue [#1960]
Hi Minh, ack, code review only/Thanks HansN On 2019-08-14 08:38, Minh Chau wrote: > (Sending on behalf of Thuan) > This patch solves the linking issue if mds_dt.h or mds_core.h > is included in c++ sources. > --- > src/mds/mds_core.h| 74 > +++ > src/mds/mds_dt.h | 4 +-- > src/mds/mds_dt2c.h| 67 -- > src/mds/mds_dt_tcp.c | 2 ++ > src/mds/mds_dt_tcp.h | 1 - > src/mds/mds_dt_tipc.c | 2 ++ > 6 files changed, 80 insertions(+), 70 deletions(-) > > diff --git a/src/mds/mds_core.h b/src/mds/mds_core.h > index 37696d4..c09b428 100644 > --- a/src/mds/mds_core.h > +++ b/src/mds/mds_core.h > @@ -573,6 +573,80 @@ extern uint32_t > mds_mcm_free_msg_uba_start(MDS_ENCODED_MSG msg); > extern void get_adest_details(MDS_DEST adest, char *adest_details); > extern void get_subtn_adest_details(MDS_PWE_HDL pwe_hdl, MDS_SVC_ID svc_id, > MDS_DEST adest, char *adest_details); > +#ifdef __cplusplus > +extern "C" { > +#endif > +/* */ > +/* */ > +/*MCM to MDTM */ > +/* */ > +/* */ > + > +/* Initialization of MDTM Module */ > +uint32_t (*mds_mdtm_init)(NODE_ID node_id, uint32_t *mds_tipc_ref); > + > +/* Destroying the MDTM Module*/ > +uint32_t (*mds_mdtm_destroy)(void); > + > +uint32_t (*mds_mdtm_send)(MDTM_SEND_REQ *req); > + > +/* SVC Install */ > +uint32_t (*mds_mdtm_svc_install)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id, > + NCSMDS_SCOPE_TYPE install_scope, > + V_DEST_RL role, MDS_VDEST_ID vdest_id, > + NCS_VDEST_TYPE vdest_policy, > + MDS_SVC_PVT_SUB_PART_VER mds_svc_pvt_ver); > + > +/* SVC Uninstall */ > +uint32_t (*mds_mdtm_svc_uninstall)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id, > + NCSMDS_SCOPE_TYPE install_scope, > + V_DEST_RL role, MDS_VDEST_ID vdest_id, > + NCS_VDEST_TYPE vdest_policy, > + MDS_SVC_PVT_SUB_PART_VER mds_svc_pvt_ver); > + > +/* SVC Subscribe */ > +uint32_t (*mds_mdtm_svc_subscribe)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id, > + NCSMDS_SCOPE_TYPE subscribe_scope, > + MDS_SVC_HDL local_svc_hdl, > + MDS_SUBTN_REF_VAL *subtn_ref_val); > + > +/* added svc_hdl */ > +/* SVC Unsubscribe */ > 
+uint32_t (*mds_mdtm_svc_unsubscribe)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id, > + NCSMDS_SCOPE_TYPE subscribe_scope, > + MDS_SUBTN_REF_VAL subtn_ref_val); > + > +/* VDEST Install */ > +uint32_t (*mds_mdtm_vdest_install)(MDS_VDEST_ID vdest_id); > + > +/* VDEST Uninstall */ > +uint32_t (*mds_mdtm_vdest_uninstall)(MDS_VDEST_ID vdest_id); > + > +/* VDEST Subscribe */ > +uint32_t (*mds_mdtm_vdest_subscribe)(MDS_VDEST_ID vdest_id, > + MDS_SUBTN_REF_VAL *subtn_ref_val); > + > +/* VDEST Unsubscribe */ > +uint32_t (*mds_mdtm_vdest_unsubscribe)(MDS_VDEST_ID vdest_id, > + MDS_SUBTN_REF_VAL subtn_ref_val); > + > +/* Tx Register (For incrementing the use count) */ > +uint32_t (*mds_mdtm_tx_hdl_register)(MDS_DEST adest); > + > +/* Tx Unregister (For decrementing the use count) */ > +uint32_t (*mds_mdtm_tx_hdl_unregister)(MDS_DEST adest); > + > +/* Node subscription */ > +uint32_t (*mds_mdtm_node_subscribe)(MDS_SVC_HDL svc_hdl, > +MDS_SUBTN_REF_VAL *subtn_ref_val); > + > +/* Node unsubscription */ > +uint32_t (*mds_mdtm_node_unsubscribe)(MDS_SUBTN_REF_VAL subtn_ref_val); > + > +#ifdef __cplusplus > +} > +#endif > + > /* */ > /* */ > /* MMGR Macros */ > diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h > index a6e2801..b645bb4 100644 > --- a/src/mds/mds_dt.h > +++ b/src/mds/mds_dt.h > @@ -214,10 +214,10 @@ typedef struct mdtm_ref_hdl_list { > MDS_SVC_HDL svc_hdl; > } MDTM_REF_HDL_LIST; > > -MDTM_REF_HDL_LIST *mdtm_ref_hdl_list_hdr; > +extern MDTM_REF_HDL_LIST *mdtm_ref_hdl_list_hdr; > +extern NCS_PATRICIA_TREE mdtm_reassembly_list; > uint32_t mdtm_attach_mbx(SYSF_MBX mbx); > void mds_buff_dump(uint8_t *buff, uint32_t len, uint32_t max); > -NCS_PATRICIA_TREE mdtm_reassembly_list; > > uint32_t mdtm_set_transport(MDTM_TX_TYPE transport); > bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT msg); > diff --git a/src/mds/mds_dt2c.h b/src/mds/mds_dt2c.h > index 012999c..c92fecb 100644 > ---
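The mds_dt.h hunk turns `mdtm_ref_hdl_list_hdr` and `mdtm_reassembly_list` into `extern` declarations. The commit message says this solves linking when mds_dt.h or mds_core.h is included from C++ sources. The single-file C++ sketch below shows the general pattern: the header part only declares, inside `extern "C"` guards, and exactly one source file defines. It assumes nothing about which .c file the patch puts the real definitions in, and all `demo_*` names are hypothetical.

```cpp
#include <stdint.h>

// ---- what a shared header (included from C and C++) would contain ----
#ifdef __cplusplus
extern "C" {
#endif
// Declarations only: `extern` allocates no storage, so many translation
// units can include this. Inside extern "C" { }, omitting `extern` would
// turn these lines into definitions in every including C++ file.
extern uint32_t (*demo_mdtm_init)(uint32_t node_id);
extern int demo_ref_count;
#ifdef __cplusplus
}
#endif

// ---- what exactly one source file would contain ----
extern "C" {
// Single definitions, with C linkage so C and C++ objects share the symbols.
uint32_t (*demo_mdtm_init)(uint32_t node_id) = nullptr;
int demo_ref_count = 0;

// A transport-specific init function that could be installed in the pointer.
uint32_t demo_tipc_init(uint32_t node_id) { return node_id != 0 ? 1u : 0u; }
}
```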
Re: [devel] [PATCH 1/9] mds: Add README for solution of TIPC buffer overflow at MDS [#1960]
Hi Minh, ack, some minor comments below/Thanks Hans On 2019-08-14 08:38, Minh Chau wrote: > --- > src/mds/README | 221 > + > 1 file changed, 221 insertions(+) > create mode 100644 src/mds/README > > diff --git a/src/mds/README b/src/mds/README > new file mode 100644 > index 000..1b94632 > --- /dev/null > +++ b/src/mds/README > @@ -0,0 +1,221 @@ > +/* -*- OpenSAF -*- > + * > + * (C) Copyright 2019 The OpenSAF Foundation > + * > + * This program is distributed in the hope that it will be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY > + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed > + * under the GNU Lesser General Public License Version 2.1, February 1999. > + * The complete license can be accessed from the following location: > + * http://opensource.org/licenses/lgpl-license.php > + * See the Copying file included with the OpenSAF distribution for full > + * licensing terms. > + * > + * Author(s): Ericsson AB > + * > + */ > +Background > +== > +If OpenSAF configures TIPC as transport, the MDS library today will use > +TIPC SOCK_RDM socket for message distribution in the cluster. The SOCK_RDM > +datagram socket possibly encounters buffer overflow at receiver ends which > +has been documented in tipc.io[1]. A temporary solution for this buffer > +overflow issue is that the socket buffer size can be increased to a larger > +number. However, if the cluster continues either scaling out or adding more > +components, the system will be under dimensioned, thus the TIPC buffer > +overflow can occur again. > + > +MDS's solution for TIPC buffer overflow > +=== > +If MDS disables TIPC_DEST_DROPPABLE, TIPC will return the ancillary message > +when the original message is failed to deliver. By this event, if the message > +has been saved in queue, MDS at sender sides can search and retransmit this > +message to the receivers. 
> +Once the messages in the sender's queue has been delivered successfully, MDS > +needs to remove them. MDS introduces its internal ACK message as an > +acknowledgment from receivers so that the senders can remove the messages > +out of the queue. > +Also, as such situation of buffer overflow at receivers, the retransmission > may > +not succeed or even become worse at receiver ends (the more retransmission, > +the more overflow to occur). MDS imitates the sliding window in TCP[2] to > +control the flow of data message towards the receivers. > + > +Legacy MDS data message, new (data + ACK) MDS message, and upgradability > + > +Below is the MDS legacy message format that has been used till OpenSAF > 5.19.07 > + > +oct 0 message length > +oct 1 > +-- > +oct 2 sequence number: incremented for every message sent out to all > destined > +... tipc portid. > +oct 5 > +-- > +oct 6 fragment number: a message with same sequence number can be > fragmented, > +oct 7 identified by this fragment number. > +-- > +oct 8 length check: cross check with message length(oct0,1), NOT USED. > +oct 9 > +-- > +oct 10 protocol version: (MDS_PROT:0xA0 | MDS_VERSION:0x08) = 0xA8, NOT USED > +-- > +oct 11 mds length: length of mds header and mds data, starting from oct13 > +oct 12 > +-- > +oct 13 mds header and data > +... > +-- > + > +The current sequence number/fragment number are being used in MDS for all > +messages sent to all discovered tipc portid(s), meaning that every message > is sent > +to any tipc portid, the sequence/fragment number is increased. The flow > control > +needs its own sequence number sliding between two tipc porid(s) so that > receivers > +can detect message drop due to buffer overload. Therefore, the oct8 and oct9 > are > +now reused as flow control sequence number. The oct10, protocol version, has > new > +value of 0xB8. The format of new data message as below: > + > +oct 0 same > +... 
> +oct 7 > +-- > +oct 8 flow control sequence number > +oct 9 > +-- > +oct 10 protocol version: (MDS_PROT_TIPC_FCTRL:0xB0 | MDS_VERSION:0x08) = 0xB8 > +-- > +oct 11 same > +... > +-- > + > +The ACK message is introduced to acknowledge one data message or a chunk of > +accumulative data message. The ACK message format: > + > +oct 0 message length > +oct 1 > +-- > +oct 2 8 bytes, NOT USED > + > +oct 9 > +-- > +oct 10
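The README's legacy and new data-message layouts differ only in octets 8-9 (now the flow-control sequence number) and octet 10 (0xA8 vs 0xB8). Below is a C++ sketch that encodes and decodes octets 0-12. It makes two assumptions: multi-octet fields are big-endian, and octet 10 is compared after masking with MDS_PROT_VER_MASK (0xFC), as in the mds_dt.h hunk. `MdtmHdr`, `encode`, `decode` and `is_fctrl_data` are hypothetical names, not MDS functions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Constants as defined in the mds_dt.h hunk.
constexpr uint8_t kMdsProt = 0xA0;
constexpr uint8_t kMdsVersion = 0x08;
constexpr uint8_t kMdsProtVerMask = 0xFC;
constexpr uint8_t kMdsProtFctrl = 0xB0 | kMdsVersion;  // 0xB8
static_assert((kMdsProt | kMdsVersion) == 0xA8, "legacy protocol octet");

constexpr std::size_t kMdtmHdrLen = 13;  // octets 0..12

// Hypothetical in-memory view of the README's header octets.
struct MdtmHdr {
  uint16_t msg_len;    // oct 0-1
  uint32_t seq_num;    // oct 2-5
  uint16_t frag_num;   // oct 6-7
  uint16_t fctrl_seq;  // oct 8-9 (legacy: unused length check)
  uint8_t prot_ver;    // oct 10
  uint16_t mds_len;    // oct 11-12
};

// Big-endian helpers (byte order is an assumption of this sketch).
static void put16(uint8_t* p, uint16_t v) { p[0] = v >> 8; p[1] = v & 0xFF; }
static void put32(uint8_t* p, uint32_t v) { put16(p, v >> 16); put16(p + 2, v & 0xFFFF); }
static uint16_t get16(const uint8_t* p) { return static_cast<uint16_t>((p[0] << 8) | p[1]); }
static uint32_t get32(const uint8_t* p) {
  return (static_cast<uint32_t>(get16(p)) << 16) | get16(p + 2);
}

std::array<uint8_t, kMdtmHdrLen> encode(const MdtmHdr& h) {
  std::array<uint8_t, kMdtmHdrLen> b{};
  put16(&b[0], h.msg_len);
  put32(&b[2], h.seq_num);
  put16(&b[6], h.frag_num);
  put16(&b[8], h.fctrl_seq);
  b[10] = h.prot_ver;
  put16(&b[11], h.mds_len);
  return b;
}

MdtmHdr decode(const std::array<uint8_t, kMdtmHdrLen>& b) {
  return MdtmHdr{get16(&b[0]), get32(&b[2]), get16(&b[6]),
                 get16(&b[8]), b[10],        get16(&b[11])};
}

// True for the flow-control data format, false for the legacy 0xA8 format.
bool is_fctrl_data(uint8_t prot_ver) {
  return (prot_ver & kMdsProtVerMask) == kMdsProtFctrl;
}
```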
Re: [devel] [PATCH 1/1] amfd: add support for dynamically changing saAmfRank of SaAmfSIRankedSU [#3058]
Hi Alex, ack code review only, a few minor comments below/Thanks HansN On 2019-07-18 21:04, Jones, Alex wrote: Allow saAmfRank of SaAmfSIRankedSU to be changed at runtime --- src/amf/amfd/si.cc | 103 + src/amf/amfd/si.h | 3 ++ src/amf/amfd/siass.cc | 38 ++ src/amf/amfd/sirankedsu.cc | 73 +- src/amf/amfd/util.cc | 30 ++- 5 files changed, 243 insertions(+), 4 deletions(-) diff --git a/src/amf/amfd/si.cc b/src/amf/amfd/si.cc index b308e14a9..3f0a8bf51 100644 --- a/src/amf/amfd/si.cc +++ b/src/amf/amfd/si.cc @@ -255,6 +255,109 @@ void AVD_SI::remove_rankedsu(const std::string ) { TRACE_LEAVE(); } +/** + * @brief Update order of sisu list with new rank. + * + * @param suname + * @param saAmfRank + */ +void AVD_SI::update_sisu_rank(const std::string& suname, uint32_t newRank) { + TRACE_ENTER(); + + do { + // if there is only one entry nothing really to do + if (!list_of_sisu || !list_of_sisu->si_next) + break; + + // first find the su, and remove it from the linked list [HansN] AVD_SU_SI_REL *matched_susi{nullptr}; (also some more places below) + AVD_SU_SI_REL *matched_susi(0); + + for (AVD_SU_SI_REL *susi(list_of_sisu), *prev(0); [HansN] perhaps, for (AVD_SU_SI_REL *susi = list_of_sisu, *prev = nullptr; or for (AVD_SU_SI_REL *susi{list_of_sisu}, *prev{nullptr} ? 
+ susi; + prev = susi, susi = susi->si_next) { + if (suname == susi->su->name) { + matched_susi = susi; + if (prev) + prev->si_next = susi->si_next; + else + list_of_sisu = susi->si_next; + break; + } + } + + osafassert(matched_susi); + + // now reinsert it at the correct place + AVD_SU_SI_REL *prev(nullptr); + + for (AVD_SU_SI_REL *curr_susi(list_of_sisu); + curr_susi; + prev = curr_susi, curr_susi = curr_susi->si_next) { + if (curr_susi->is_per_si == true) { + if (false == matched_susi->is_per_si) continue; + + AVD_SUS_PER_SI_RANK *i_su_rank_rec(0); + + /* determine the su_rank rec for this rec */ + for (const auto : *sirankedsu_db) { + i_su_rank_rec = value.second; + if (i_su_rank_rec->indx.si_name.compare(name) != 0) continue; + AVD_SU *curr_su(su_db->find(i_su_rank_rec->su_name)); + if (curr_su == curr_susi->su) break; + } + + osafassert(i_su_rank_rec); + + if (newRank <= i_su_rank_rec->indx.su_rank) break; + } else { + if (true == matched_susi->is_per_si) break; + + if (newRank <= curr_susi->su->saAmfSURank) break; + } + } + + if (prev) { + matched_susi->si_next = prev->si_next; + prev->si_next = matched_susi; + } else { + matched_susi->si_next = list_of_sisu; + list_of_sisu = matched_susi; + } + + // update PG rank + for (AVD_SU_SI_REL *curr_susi(matched_susi->si->list_of_sisu); + curr_susi; + curr_susi = curr_susi->si_next) { + if (curr_susi->state == SA_AMF_HA_STANDBY) + avd_pg_susi_chg_prc(avd_cb, curr_susi); + } + } while (false); + + TRACE_LEAVE(); +} + +uint32_t AVD_SI::get_sisu_rank(const std::string& suname) const { + uint32_t rank(0); + + TRACE_ENTER2("%s", suname.c_str()); + + for (const AVD_SU_SI_REL *susi(list_of_sisu); susi; susi = susi->si_next) { + TRACE("su: %s si: %s state: %i", + susi->su->name.c_str(), + susi->si->name.c_str(), + susi->state); + if (susi->state == SA_AMF_HA_STANDBY) + rank++; + + if (suname == susi->su->name) + break; + } + + TRACE_LEAVE(); + + return rank; +} + void AVD_SI::remove_csi(AVD_CSI *csi) { osafassert(csi->si == 
this); /* remove CSI from the SI */ diff --git a/src/amf/amfd/si.h b/src/amf/amfd/si.h index 3b93e56b1..0db8dde13 100644 --- a/src/amf/amfd/si.h +++ b/src/amf/amfd/si.h @@ -128,6 +128,9 @@ class AVD_SI { void add_rankedsu(const std::string , uint32_t saAmfRank); void remove_rankedsu(const std::string ); + void update_sisu_rank(const std::string& suname, uint32_t saAmfRank); + + uint32_t get_sisu_rank(const std::string& suname) const; void set_si_switch(AVD_CL_CB *cb, const SaToggleState state); diff --git a/src/amf/amfd/siass.cc b/src/amf/amfd/siass.cc index 8a2d2175e..8895af3b4 100644 --- a/src/amf/amfd/siass.cc +++ b/src/amf/amfd/siass.cc @@ -616,6 +616,21 @@ done: avd_gen_su_ha_state_changed_ntf(cb, su_si); } + /* + * If we are adding an entry which is not the last we need to send PG updates + * for rank. + */ + if (su_si->si_next) { + for (AVD_SU_SI_REL *curr_susi(si->list_of_sisu); + curr_susi; + curr_susi = curr_susi->si_next) { + // skip this one since a pg update will sent for it already + if (su_si == curr_susi || curr_susi->state != SA_AMF_HA_STANDBY) continue; + + avd_pg_susi_chg_prc(avd_cb, curr_susi); + } + } + TRACE_LEAVE(); return su_si; } @@ -742,13 +757,36 @@ uint32_t avd_susi_delete(AVD_CL_CB *cb, AVD_SU_SI_REL *susi, bool ckpt) { susi->su_next = nullptr; } + bool sendPgUpdate(false); + /* now delete it from the SI list */ if (p_si_su == nullptr) { susi->si->list_of_sisu = susi->si_next; susi->si_next = nullptr; + + if (susi->si->list_of_sisu) + sendPgUpdate = true; } else { p_si_su->si_next = susi->si_next; susi->si_next = nullptr; + + if (p_si_su->si_next) + sendPgUpdate = true; +
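`update_sisu_rank()` reorders the SI's singly linked SU-SI list in two passes:

1. Unlink the matching entry.
2. Walk to the first entry whose rank is not lower than the new rank, and relink the entry there.

The C++ sketch below isolates that list manipulation and makes three simplifications:

- The rank is stored in the node itself. In amfd it comes from `sirankedsu_db` or `saAmfSURank`.
- The `is_per_si` split is omitted.
- The PG updates are omitted.

`SusiNode` and `update_rank` are hypothetical names. The sketch uses the `{nullptr}` brace initialization suggested in the [HansN] comments.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for an SU-SI list entry, ordered by ascending rank.
struct SusiNode {
  std::string su_name;
  uint32_t rank;
  SusiNode* next;
};

// Moves the node for `su_name` to its position for `new_rank`.
// Returns the (possibly new) list head.
SusiNode* update_rank(SusiNode* head, const std::string& su_name,
                      uint32_t new_rank) {
  // Pass 1: find and unlink the matching node.
  SusiNode* matched{nullptr};
  for (SusiNode *cur{head}, *prev{nullptr}; cur != nullptr;
       prev = cur, cur = cur->next) {
    if (cur->su_name == su_name) {
      matched = cur;
      if (prev != nullptr)
        prev->next = cur->next;
      else
        head = cur->next;
      break;
    }
  }
  if (matched == nullptr) return head;  // amfd asserts here instead
  matched->rank = new_rank;

  // Pass 2: stop at the first node whose rank is >= new_rank,
  // remembering its predecessor.
  SusiNode* prev{nullptr};
  for (SusiNode* cur{head}; cur != nullptr; prev = cur, cur = cur->next) {
    if (new_rank <= cur->rank) break;
  }

  // Relink after the predecessor, or at the head of the list.
  if (prev != nullptr) {
    matched->next = prev->next;
    prev->next = matched;
  } else {
    matched->next = head;
    head = matched;
  }
  return head;
}
```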
Re: [devel] [PATCH 1/1] amfd: prevent infinite loop [#3050]
Hi Gary, ack, code review only/Thanks HansN

On 2019-06-20 04:13, Gary Lee wrote:
> In handle_event_in_failover_state(), we iterate through
> queue_evt in a while loop, but process_event() can insert
> items into the queue inside the loop, and we may end
> up never exiting the while loop.
> ---
>  src/amf/amfd/main.cc | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/src/amf/amfd/main.cc b/src/amf/amfd/main.cc
> index 50daa59..e3d0957 100644
> --- a/src/amf/amfd/main.cc
> +++ b/src/amf/amfd/main.cc
> @@ -406,12 +406,18 @@ static void handle_event_in_failover_state(AVD_EVT *evt) {
>
>      /* Dequeue, all the messages from the queue
>         and process them now */
> -
> -    while (!cb->evt_queue.empty()) {
> +    auto size_before_loop = cb->evt_queue.size();
> +    std::queue<AVD_EVT_QUEUE *>::size_type count = 0;
> +    while (count < size_before_loop) {
> +      // note: process_event() may insert items into
> +      // the queue, so terminate loop when we have
> +      // processed all the original elements
> +      // to avoid infinite loop
>        AVD_EVT_QUEUE *queue_evt = cb->evt_queue.front();
>        cb->evt_queue.pop();
>        process_event(cb, queue_evt->evt);
>        delete queue_evt;
> +      ++count;
>      }
>
>      /* Walk through all the nodes to check if any of the nodes state is
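The fix bounds the loop by the number of events that were queued when draining started. Any events that process_event() enqueues while the loop runs are left for a later pass. The sketch below shows the same pattern in generic form. It assumes a std::queue and a caller-supplied handler; drain_snapshot is a hypothetical helper name, not part of OpenSAF.

```cpp
#include <cstddef>
#include <queue>

// Process exactly the events that were queued when draining started.
// handle(q, item) may push new events onto q; those stay in q for a later
// pass, so the loop always terminates. Returns the number processed.
template <typename T, typename Handler>
std::size_t drain_snapshot(std::queue<T> &q, Handler handle) {
  const std::size_t size_before_loop = q.size();
  std::size_t count = 0;
  while (count < size_before_loop) {
    T item = q.front();
    q.pop();
    handle(q, item);  // may enqueue more work
    ++count;
  }
  return count;
}
```

An alternative is to swap the queue into a local one before draining. Both approaches leave newly enqueued events on the original queue, in arrival order.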
[devel] [PATCH 0/1] Review Request for utils: Use a fence daemon as an alternative to payload reboot fencing [#3048]
Summary: utils: Use a fence daemon as an alternative to payload reboot fencing [#3048]
Review request for Ticket(s): 3048
Peer Reviewer(s): Duc, Gary, Anders
Pull request to:
Affected branch(es): develop
Development branch: ticket-3048
Base revision: 3895c7a88bdb3c6f86da1083ea0fd9e2cd642d01
Personal repository: git://git.code.sf.net/u/hansnordeback/review

Impacted area          Impact y/n
---------------------------------
Docs                       n
Build system               n
RPM/packaging              n
Configuration files        n
Startup scripts            n
SAF services               n
OpenSAF services           n
Core libraries             n
Samples                    n
Tests                      n
Other                      y

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
---------------------------------------------
revision 810b5dc5ce2f1e8830c157b09f2649e47a8ea070
Author: Hans Nordeback
Date: Wed, 5 Jun 2019 10:22:30 +0200

utils: Use a fence daemon as an alternative to payload reboot fencing [#3048]

Added Files:
src/fm/fmd/tipc_server.cc
src/fm/fmd/tipc_server.h
tools/devel/fenced/command.cc
tools/devel/fenced/command.h
tools/devel/fenced/cpp_macros.h
tools/devel/fenced/fenced.conf
tools/devel/fenced/fenced_main.cc
tools/devel/fenced/Makefile
tools/devel/fenced/node_state_file.cc
tools/devel/fenced/node_state_file.h
tools/devel/fenced/node_state_hdlr.cc
tools/devel/fenced/node_state_hdlr_factory.cc
tools/devel/fenced/node_state_hdlr_factory.h
tools/devel/fenced/node_state_hdlr.h
tools/devel/fenced/node_state_hdlr_pl.cc
tools/devel/fenced/node_state_hdlr_pl.h
tools/devel/fenced/node_state_hdlr_sc.cc
tools/devel/fenced/node_state_hdlr_sc.h
tools/devel/fenced/osaffenced.service
tools/devel/fenced/README_TOOLS
tools/devel/fenced/service.cc
tools/devel/fenced/service.h
tools/devel/fenced/timer.cc
tools/devel/fenced/timer.h
tools/devel/fenced/watchdog.cc
tools/devel/fenced/watchdog.h

Complete diffstat:
------------------
src/fm/Makefile.am                            |   6 +-
src/fm/fmd/fm_amf.cc                          |  14 ++
src/fm/fmd/tipc_server.cc                     |  93
src/fm/fmd/tipc_server.h                      |  45
tools/devel/fenced/Makefile                   |  63 ++
tools/devel/fenced/README_TOOLS               |  15 ++
tools/devel/fenced/command.cc                 | 134
tools/devel/fenced/command.h                  |  43
tools/devel/fenced/cpp_macros.h               |  33 +++
tools/devel/fenced/fenced.conf                |  17 ++
tools/devel/fenced/fenced_main.cc             | 179
tools/devel/fenced/node_state_file.cc         |  87
tools/devel/fenced/node_state_file.h          |  41
tools/devel/fenced/node_state_hdlr.cc         |  54 +
tools/devel/fenced/node_state_hdlr.h          |  45
tools/devel/fenced/node_state_hdlr_factory.cc |  66 ++
tools/devel/fenced/node_state_hdlr_factory.h  |  35 +++
tools/devel/fenced/node_state_hdlr_pl.cc      | 292 ++
tools/devel/fenced/node_state_hdlr_pl.h       |  60 ++
tools/devel/fenced/node_state_hdlr_sc.cc      |  42
tools/devel/fenced/node_state_hdlr_sc.h       |  41
tools/devel/fenced/osaffenced.service         |  14 ++
tools/devel/fenced/service.cc                 |  53 +
tools/devel/fenced/service.h                  |  42
tools/devel/fenced/timer.cc                   |  62 ++
tools/devel/fenced/timer.h                    |  53 +
tools/devel/fenced/watchdog.cc                |  37
tools/devel/fenced/watchdog.h                 |  39
28 files changed, 1703 insertions(+), 2 deletions(-)

Testing Commands:
-----------------
In tools/devel/fenced, run 'make; make install' to build and install fenced on a payload. Enable 'headless' mode by setting export IMMSV_SC_ABSENCE_ALLOWED=900 in immd.conf. osaffenced on the payload stops OpenSAF if the node becomes 'isolated', i.e. there is no active SC, and starts OpenSAF again when the node is 'not isolated', i.e. there is an active SC. osaffenced also writes a file that exposes the node's isolated/not isolated state.

Testing, Expected Results:
--------------------------
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***

Conditions of Submission:
-------------------------
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries that need
[devel] [PATCH 1/1] utils: Use a fence daemon as an alternative to payload reboot fencing [#3048]
--- src/fm/Makefile.am| 6 +- src/fm/fmd/fm_amf.cc | 14 + src/fm/fmd/tipc_server.cc | 93 ++ src/fm/fmd/tipc_server.h | 45 +++ tools/devel/fenced/Makefile | 63 tools/devel/fenced/README_TOOLS | 15 + tools/devel/fenced/command.cc | 134 tools/devel/fenced/command.h | 43 +++ tools/devel/fenced/cpp_macros.h | 33 ++ tools/devel/fenced/fenced.conf| 17 + tools/devel/fenced/fenced_main.cc | 179 +++ tools/devel/fenced/node_state_file.cc | 87 ++ tools/devel/fenced/node_state_file.h | 41 +++ tools/devel/fenced/node_state_hdlr.cc | 54 tools/devel/fenced/node_state_hdlr.h | 45 +++ tools/devel/fenced/node_state_hdlr_factory.cc | 66 tools/devel/fenced/node_state_hdlr_factory.h | 35 +++ tools/devel/fenced/node_state_hdlr_pl.cc | 292 ++ tools/devel/fenced/node_state_hdlr_pl.h | 60 tools/devel/fenced/node_state_hdlr_sc.cc | 42 +++ tools/devel/fenced/node_state_hdlr_sc.h | 41 +++ tools/devel/fenced/osaffenced.service | 14 + tools/devel/fenced/service.cc | 53 tools/devel/fenced/service.h | 42 +++ tools/devel/fenced/timer.cc | 62 tools/devel/fenced/timer.h| 53 tools/devel/fenced/watchdog.cc| 37 +++ tools/devel/fenced/watchdog.h | 39 +++ 28 files changed, 1703 insertions(+), 2 deletions(-) create mode 100644 src/fm/fmd/tipc_server.cc create mode 100644 src/fm/fmd/tipc_server.h create mode 100755 tools/devel/fenced/Makefile create mode 100644 tools/devel/fenced/README_TOOLS create mode 100644 tools/devel/fenced/command.cc create mode 100644 tools/devel/fenced/command.h create mode 100644 tools/devel/fenced/cpp_macros.h create mode 100644 tools/devel/fenced/fenced.conf create mode 100644 tools/devel/fenced/fenced_main.cc create mode 100644 tools/devel/fenced/node_state_file.cc create mode 100644 tools/devel/fenced/node_state_file.h create mode 100644 tools/devel/fenced/node_state_hdlr.cc create mode 100644 tools/devel/fenced/node_state_hdlr.h create mode 100644 tools/devel/fenced/node_state_hdlr_factory.cc create mode 100644 tools/devel/fenced/node_state_hdlr_factory.h create mode 100644 
tools/devel/fenced/node_state_hdlr_pl.cc create mode 100644 tools/devel/fenced/node_state_hdlr_pl.h create mode 100644 tools/devel/fenced/node_state_hdlr_sc.cc create mode 100644 tools/devel/fenced/node_state_hdlr_sc.h create mode 100644 tools/devel/fenced/osaffenced.service create mode 100644 tools/devel/fenced/service.cc create mode 100644 tools/devel/fenced/service.h create mode 100644 tools/devel/fenced/timer.cc create mode 100644 tools/devel/fenced/timer.h create mode 100644 tools/devel/fenced/watchdog.cc create mode 100644 tools/devel/fenced/watchdog.h diff --git a/src/fm/Makefile.am b/src/fm/Makefile.am index 0f254b94f..325847ae9 100644 --- a/src/fm/Makefile.am +++ b/src/fm/Makefile.am @@ -20,7 +20,8 @@ noinst_HEADERS += \ src/fm/fmd/fm_cb.h \ src/fm/fmd/fm_evt.h \ src/fm/fmd/fm_mds.h \ - src/fm/fmd/fm_mem.h + src/fm/fmd/fm_mem.h \ + src/fm/fmd/tipc_server.h osaf_execbin_PROGRAMS += bin/osaffmd nodist_pkgclccli_SCRIPTS += \ @@ -44,7 +45,8 @@ bin_osaffmd_SOURCES = \ src/fm/fmd/fm_amf.cc \ src/fm/fmd/fm_main.cc \ src/fm/fmd/fm_mds.cc \ - src/fm/fmd/fm_rda.cc + src/fm/fmd/fm_rda.cc \ + src/fm/fmd/tipc_server.cc bin_osaffmd_LDADD = \ lib/libSaAmf.la \ diff --git a/src/fm/fmd/fm_amf.cc b/src/fm/fmd/fm_amf.cc index e99f3ba7e..8cf284f97 100644 --- a/src/fm/fmd/fm_amf.cc +++ b/src/fm/fmd/fm_amf.cc @@ -34,6 +34,12 @@ **/ #include "fm.h" +#include "tipc_server.h" + +namespace { +TIPCServer tipc_srv; +} + extern uint32_t gl_fm_hdl; uint32_t fm_amf_init(FM_AMF_CB *fm_amf_cb); @@ -151,6 +157,11 @@ void fm_saf_CSI_set_callback(SaInvocationT invocation, const SaNameT *compName, } else { fm_cb->amf_state = new_haState; fm_cb->csi_assigned = true; + if (new_haState == SA_AMF_HA_ACTIVE) { +tipc_srv.publish(); + } else { +tipc_srv.unpublish(); + } } error = saAmfResponse(fm_amf_cb->amf_hdl, invocation, error); } @@ -300,6 +311,9 @@ uint32_t fm_amf_init(FM_AMF_CB *fm_amf_cb) { SaNameT sname; uint32_t rc = NCSCC_RC_SUCCESS; TRACE_ENTER(); + + tipc_srv.init(); + memset(, 0, 
sizeof(SaAmfCallbacksT)); if (fm_amf_cb->nid_started && amf_comp_name_get_set_from_file("FM_COMP_NAME_FILE", ) != diff --git a/src/fm/fmd/tipc_server.cc b/src/fm/fmd/tipc_server.cc new file mode 100644 index
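The fm_amf.cc hunk publishes the FM TIPC server when the CSI set callback makes this node active, and unpublishes it otherwise. The body of tipc_server.cc is cut off above. As a hedged sketch of how a TIPC service name can be published and withdrawn on Linux, the code below builds a name-sequence address in cluster scope and passes it to bind(). On Linux, binding with the negated scope withdraws a previously published name. The service type and instance are made-up values, and make_service_addr and set_published are hypothetical helpers, not the patch's TIPCServer API.

```cpp
#include <linux/tipc.h>
#include <sys/socket.h>
#include <cstring>

// Assumed values for illustration only; the service type/instance used by
// the patch's TIPCServer are not visible in the truncated diff.
constexpr __u32 kHypotheticalServiceType = 0x12345;
constexpr __u32 kHypotheticalInstance = 1;

// Build a name-sequence address for the service. A positive scope publishes
// the name; on Linux, binding with the negated scope withdraws it.
sockaddr_tipc make_service_addr(bool publish) {
  sockaddr_tipc addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.family = AF_TIPC;
  addr.addrtype = TIPC_ADDR_NAMESEQ;
  const int scope = TIPC_CLUSTER_SCOPE;
  addr.scope = static_cast<signed char>(publish ? scope : -scope);
  addr.addr.nameseq.type = kHypotheticalServiceType;
  addr.addr.nameseq.lower = kHypotheticalInstance;
  addr.addr.nameseq.upper = kHypotheticalInstance;
  return addr;
}

// Publish (publish == true) or withdraw the name on an existing AF_TIPC
// socket. Returns 0 on success, -1 with errno set on failure.
int set_published(int sd, bool publish) {
  sockaddr_tipc addr = make_service_addr(publish);
  return bind(sd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr));
}
```

Keeping the socket open and only toggling the published name lets clients find the server through TIPC name lookup only while FM holds the active role.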
Re: [devel] [PATCH 1/1] rded: improve self-fencing response time [#3039]
Hi Gary, ack, review only/Thanks HansN On 5/27/19 2:09 AM, Gary Lee wrote: > When connectivity to consensus service is lost, it is recorded > in a state variable. When all RDE peers are lost, the node will > now self-fence immediately. > --- > src/rde/rded/rde_cb.h| 5 + > src/rde/rded/rde_main.cc | 18 -- > src/rde/rded/role.cc | 24 > src/rde/rded/role.h | 3 +++ > 4 files changed, 48 insertions(+), 2 deletions(-) > > diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h > index 9a0919c..e35fdab 100644 > --- a/src/rde/rded/rde_cb.h > +++ b/src/rde/rded/rde_cb.h > @@ -18,6 +18,7 @@ > #ifndef RDE_RDED_RDE_CB_H_ > #define RDE_RDED_RDE_CB_H_ > > +#include <atomic> > #include > #include "base/osaf_utility.h" > @@ -37,6 +38,8 @@ > enum class State {kNotActive = 0, kNotActiveSeenPeer, kActiveElected, > kActiveElectedSeenPeer, kActiveFailover}; > > +enum class ConsensusState {kUnknown = 0, kConnected, kDisconnected}; > + > struct RDE_CONTROL_BLOCK { > SYSF_MBX mbx; > NCSCONTEXT task_handle; > @@ -49,6 +52,8 @@ struct RDE_CONTROL_BLOCK { > // used for discovering peer controllers, regardless of their role > std::set peer_controllers{}; > State state{State::kNotActive}; > + std::atomic<ConsensusState> > consensus_service_state{ConsensusState::kUnknown}; > + std::atomic<bool> state_refresh_thread_started{false}; // consensus service > }; > > enum RDE_MSG_TYPE { > diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc > index 456d2ce..1a7e587 100644 > --- a/src/rde/rded/rde_main.cc > +++ b/src/rde/rded/rde_main.cc > @@ -178,6 +178,19 @@ static void handle_mbx_event() { > case RDE_MSG_CONTROLLER_DOWN: > rde_cb->peer_controllers.erase(msg->fr_node_id); > TRACE("peer_controllers: size %zu", rde_cb->peer_controllers.size()); > + if (role->role() == PCS_RDA_ACTIVE) { > +Consensus consensus_service; > +if (consensus_service.IsEnabled() == true && > +rde_cb->consensus_service_state == ConsensusState::kDisconnected > && > +consensus_service.IsRelaxedNodePromotionEnabled() == true && >
+role->IsPeerPresent() == false) { > +LOG_NO("Lost connectivity to consensus service. No peer > present"); > +if (consensus_service.IsRemoteFencingEnabled() == false) { > +opensaf_quick_reboot("Lost connectivity to consensus > service. " > + "Rebooting this node"); > +} > +} > + } > break; > case RDE_MSG_TAKEOVER_REQUEST_CALLBACK: { > rde_cb->monitor_takeover_req_thread_running = false; > @@ -214,7 +227,7 @@ static void handle_mbx_event() { > if (consensus_service.IsRelaxedNodePromotionEnabled() == true) { > if (rde_cb->state == State::kActiveElected) { > TRACE("Relaxed mode is enabled"); > -TRACE(" No peer SC yet seen, ignore consensus service > failure"); > +TRACE("No peer SC yet seen, ignore consensus service > failure"); > // if relaxed node promotion is enabled, and we have yet to > see > // a peer SC after being promoted, tolerate consensus > service > // not working > @@ -227,13 +240,14 @@ static void handle_mbx_event() { > // we have seen the peer, and peer is still connected, > tolerate > // consensus service not working > fencing_required = false; > +rde_cb->consensus_service_state = > ConsensusState::kDisconnected; > } > } > if (fencing_required == true) { > LOG_NO("Lost connectivity to consensus service"); > if (consensus_service.IsRemoteFencingEnabled() == false) { > opensaf_quick_reboot("Lost connectivity to consensus > service. 
" > - "Rebooting this node"); > + "Rebooting this node"); > } > } > } > diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc > index 3effc25..b8c8157 100644 > --- a/src/rde/rded/role.cc > +++ b/src/rde/rded/role.cc > @@ -215,6 +215,18 @@ timespec* Role::Poll(timespec* ts) { > is_candidate).detach(); > } > } > + } else if (role_ == PCS_RDA_ACTIVE) { > +RDE_CONTROL_BLOCK* cb = rde_get_control_block(); > +if (cb->consensus_service_state == ConsensusState::kUnknown || > +cb->consensus_service_state == ConsensusState::kDisconnected) { > + // consensus service was previously disconnected, refresh state > + Consensus consensus_service; > + if (consensus_service.IsEnabled() == true && > +cb->state_refresh_thread_started == false) { > +cb->state_refresh_thread_started = true; > +
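The new RDE_MSG_CONTROLLER_DOWN branch self-fences only when several conditions hold at the same time. The sketch below writes that decision as a pure function. It assumes a hypothetical FenceInputs struct that snapshots the values the patch reads from rde_cb, role and Consensus. FenceInputs, FenceAction and on_controller_down are illustrative names; only ConsensusState mirrors the enum the patch adds.

```cpp
// Mirrors the enum added to rde_cb.h by the patch.
enum class ConsensusState { kUnknown = 0, kConnected, kDisconnected };

// Hypothetical snapshot of the values the patch reads in the
// RDE_MSG_CONTROLLER_DOWN handler.
struct FenceInputs {
  bool is_active;                  // role->role() == PCS_RDA_ACTIVE
  bool consensus_enabled;          // consensus_service.IsEnabled()
  ConsensusState consensus_state;  // rde_cb->consensus_service_state
  bool relaxed_promotion_enabled;  // IsRelaxedNodePromotionEnabled()
  bool peer_present;               // role->IsPeerPresent()
  bool remote_fencing_enabled;     // IsRemoteFencingEnabled()
};

// kLogOnly: the patch logs the lost connectivity but leaves fencing to the
// remote fencing mechanism. kReboot: opensaf_quick_reboot() is called.
enum class FenceAction { kNone, kLogOnly, kReboot };

FenceAction on_controller_down(const FenceInputs &in) {
  if (!in.is_active || !in.consensus_enabled) return FenceAction::kNone;
  if (in.consensus_state != ConsensusState::kDisconnected) return FenceAction::kNone;
  if (!in.relaxed_promotion_enabled || in.peer_present) return FenceAction::kNone;
  // Isolated: no consensus service and no peer SC left.
  return in.remote_fencing_enabled ? FenceAction::kLogOnly : FenceAction::kReboot;
}
```

Separating the decision from the side effects makes each combination of inputs easy to exercise in a unit test, without a running RDE.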
Re: [devel] [PATCH 1/1] uml: Update to Linux 4.18.20, iproute2 5.1.0 and busybox 1.30.1 [#3042]
Hi Anders, ack, (I think CONFIG_KCOV=n should be added to the config file). /BR HansN On 2019-05-20 14:38, Anders Widell wrote: > Uplift the Linux kernel version for UML to 4.18.20, to make it possible to > build > UML with newer glibc version (e.g. on Ubuntu 18.04). > --- > tools/cluster_sim_uml/README | 2 + > tools/cluster_sim_uml/uml/bin/uml_start | 2 +- > tools/cluster_sim_uml/uml/build_uml | 21 +- > .../config/{busybox-1.27.2 => busybox-1.30.1} | 120 +++--- > .../{linux-4.13.3-i686 => linux-4.18.20-i686} | 225 -- > ...nux-4.13.3-x86_64 => linux-4.18.20-x86_64} | 221 - > 6 files changed, 291 insertions(+), 300 deletions(-) > rename tools/cluster_sim_uml/uml/config/{busybox-1.27.2 => busybox-1.30.1} > (93%) > rename tools/cluster_sim_uml/uml/config/{linux-4.13.3-i686 => > linux-4.18.20-i686} (88%) > rename tools/cluster_sim_uml/uml/config/{linux-4.13.3-x86_64 => > linux-4.18.20-x86_64} (88%) > > diff --git a/tools/cluster_sim_uml/README b/tools/cluster_sim_uml/README > index 1d5912156..e747786ae 100644 > --- a/tools/cluster_sim_uml/README > +++ b/tools/cluster_sim_uml/README > @@ -202,6 +202,8 @@ The following Debian/Ubuntu packages are known to work. > Also make sure that you > have installed the corresponding development packages for these libraries. 
> > - bash 4.3 > +- bison 3.0.4 > +- flex 2.6.0 > - libc6 2.23 > - libgcc1 6.0.1 > - libmnl0 1.0.3 > diff --git a/tools/cluster_sim_uml/uml/bin/uml_start > b/tools/cluster_sim_uml/uml/bin/uml_start > index de4cb289e..501bf73f8 100755 > --- a/tools/cluster_sim_uml/uml/bin/uml_start > +++ b/tools/cluster_sim_uml/uml/bin/uml_start > @@ -36,7 +36,7 @@ uid=$(id -u) > byte1=2 > byte2=0 > byte3=0 > -if [ "$OSAF_UML_DYNAMIC_MAC" -eq "1" ]; then > +if [ "$OSAF_UML_DYNAMIC_MAC" = "1" ]; then > byte4=$(echo $(od -N1 -An -tx1 /dev/urandom)) > else > byte4=1 > diff --git a/tools/cluster_sim_uml/uml/build_uml > b/tools/cluster_sim_uml/uml/build_uml > index ac7246058..df59b0caf 100755 > --- a/tools/cluster_sim_uml/uml/build_uml > +++ b/tools/cluster_sim_uml/uml/build_uml > @@ -65,28 +65,32 @@ help() { > exit 0 > } > test -n "$1" || help > + > +type -t bison > /dev/null || die "Missing the tool 'bison'" > +type -t flex > /dev/null || die "Missing the tool 'flex'" > + > cd "$dir" > archive=${OSAF_UML_ARCHIVE:-$dir/archive} > build=${OSAF_UML_BUILD:-$dir} > configd=${OSAF_UML_CONFIGD:-$dir/config} > > -kver=${OSAF_UML_KVER:-4.13.3} > +kver=${OSAF_UML_KVER:-4.18.20} > kbasedir=$(echo "$kver" | cut -d. 
-f1).x > #kurlbase=${KURLBASE:-"https://www.kernel.org/pub/linux/kernel/v$kbasedir"} > > kurlbase=${KURLBASE:-"http://ftp.funet.fi/pub/mirrors/ftp.kernel.org/pub/linux/kernel/v$kbasedir"} > > #kurlbase=${KURLBASE:-"http://ftp.funet.fi/pub/mirrors/ftp.kernel.org/pub/linux/kernel/v$kbasedir/testing"} > kernel_decompress=xz > kurl="$kurlbase/linux-$kver.tar.xz" > -kernel_sha256sum='03d22c74a102b66341b6f52e72142f0544cea3b413ca78bffe7d2a09e288caab > linux-4.13.3.tar.xz' > +kernel_sha256sum='68ac319e0fb7edd6b6051541d9cf112cd4f77a29e16a69ae1e133ff51117f653 > linux-4.18.20.tar.xz' > > -iproute2ver=${OSAF_UML_IPRVER:-4.13.0} > +iproute2ver=${OSAF_UML_IPRVER:-5.1.0} > > iproute2url="https://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-$iproute2ver.tar.xz" > > #iproute2url="http://ftp.funet.fi/pub/mirrors/ftp.kernel.org/pub/linux/utils/net/iproute2/iproute2-$iproute2ver.tar.xz" > -iproute2_sha256sum='9cfb81edf8c8509e03daa77cf62aead01c4a827132f6c506578f94cc19415c50 > iproute2-4.13.0.tar.xz' > +iproute2_sha256sum='dc5a980873eabf6b00c0be976b6e5562b1400d47d1d07d2ac35d5e5acbcf7bcf > iproute2-5.1.0.tar.xz' > > -bbver=${OSAF_UML_BBVER:-1.27.2} > +bbver=${OSAF_UML_BBVER:-1.30.1} > bburl="http://busybox.net/downloads/busybox-$bbver.tar.bz2" > -bb_sha256sum='9d4be516b61e6480f156b11eb42577a13529f75d3383850bb75c50c285de63df > busybox-1.27.2.tar.bz2' > +bb_sha256sum='3d1d04a4dbd34048f4794815a5c48ebb9eb53c5277e09c060323b95dfbdc > busybox-1.30.1.tar.bz2' > > umlutilsver=20070815 > > umlutilsurl="http://user-mode-linux.sourceforge.net/uml_utilities_$umlutilsver.tar.bz2" > @@ -375,10 +379,7 @@ cmd_build_iproute2() > test -d bin || mkdir -p bin > cd iproute2-$iproute2ver > ./configure > -cd tipc > -mkdir linux > -cp $build/linux-$kver/include/uapi/linux/tipc*.h linux > -make -j$no_of_processors CFLAGS="-s -pipe -O2 -I." 
tipc > +make -j$no_of_processors > cd $build > cp iproute2-$iproute2ver/tipc/tipc bin || die "Could not build tipc" > } > diff --git a/tools/cluster_sim_uml/uml/config/busybox-1.27.2 > b/tools/cluster_sim_uml/uml/config/busybox-1.30.1 > similarity index 93% > rename from tools/cluster_sim_uml/uml/config/busybox-1.27.2 > rename to
[devel] [PATCH 0/1] Review Request for mds: use new TIPC getsockopt to log receive buffer utilization [#3038]
Summary: mds: use new TIPC getsockopt to log receive queue utilization [#3038]
Review request for Ticket(s): 3038
Peer Reviewer(s): AndersW, Lennart, Gary
Pull request to:
Affected branch(es): develop
Development branch: ticket-3038
Base revision: 3b124020051730287ace8bd9ab28a8fa431fc85a
Personal repository: git://git.code.sf.net/u/hansnordeback/review

Impacted area          Impact y/n
---------------------------------
Docs                       n
Build system               n
RPM/packaging              n
Configuration files        n
Startup scripts            n
SAF services               n
OpenSAF services           n
Core libraries             y
Samples                    n
Tests                      n
Other                      n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
---------------------------------------------
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 7d1b566f8167f9fbe1512a78f0bcf4fb1c58f449
Author: Hans Nordeback
Date: Mon, 20 May 2019 13:47:28 +0200

mds: use new TIPC getsockopt to log receive queue utilization [#3038]

Added Files:
src/base/statistics.h
src/mds/mds_tipc_recvq_stats.cc
src/mds/mds_tipc_recvq_stats.h
src/mds/mds_tipc_recvq_stats_impl.cc
src/mds/mds_tipc_recvq_stats_impl.h

Complete diffstat:
------------------
00-README.conf                       |  14 +++
src/base/Makefile.am                 |   1 +
src/base/statistics.h                |  88 +
src/mds/Makefile.am                  |   8 +-
src/mds/mds_dt_tipc.c                |   3 +
src/mds/mds_tipc_recvq_stats.cc      |  29 ++
src/mds/mds_tipc_recvq_stats.h       |  32 +++
src/mds/mds_tipc_recvq_stats_impl.cc | 178 +++
src/mds/mds_tipc_recvq_stats_impl.h  |  39
9 files changed, 390 insertions(+), 2 deletions(-)

Testing Commands:
-----------------
Please see 00-README.conf

Testing, Expected Results:
--------------------------
Please see 00-README.conf

Conditions of Submission:
-------------------------
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long headers.
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put a proper Trac ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files (e.g. internal bug tracking tool IDs, product names etc).
___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc).
___ You have giant attachments which should never have been sent; instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication of what has changed between each re-send.
___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.gitconfig file (e.g. user.name, user.email etc).
___ Your computer has a badly configured date and time, confusing the threaded patch review.
___ Your changes affect the IPC mechanism, and you don't present any results from an in-service upgradability test.
___ Your changes affect the user manual and documentation, but your patch series does not contain the patch that updates the Doxygen manual.
[devel] [PATCH 1/1] mds: use new TIPC getsockopt to log receive queue utilization [#3038]
--- 00-README.conf | 14 +++ src/base/Makefile.am | 1 + src/base/statistics.h| 88 + src/mds/Makefile.am | 8 +- src/mds/mds_dt_tipc.c| 3 + src/mds/mds_tipc_recvq_stats.cc | 29 + src/mds/mds_tipc_recvq_stats.h | 32 + src/mds/mds_tipc_recvq_stats_impl.cc | 178 +++ src/mds/mds_tipc_recvq_stats_impl.h | 39 ++ 9 files changed, 390 insertions(+), 2 deletions(-) create mode 100644 src/base/statistics.h create mode 100644 src/mds/mds_tipc_recvq_stats.cc create mode 100644 src/mds/mds_tipc_recvq_stats.h create mode 100644 src/mds/mds_tipc_recvq_stats_impl.cc create mode 100644 src/mds/mds_tipc_recvq_stats_impl.h diff --git a/00-README.conf b/00-README.conf index 8f20e5209..da1825f06 100644 --- a/00-README.conf +++ b/00-README.conf @@ -737,3 +737,17 @@ initiate a 'self-fencing' by rebooting the node, if it determines the node should no longer be active according to the consensus service, to prevent a split-brain situation. +TIPC receive queue utilization +== + +Setting the environment variable MDS_RECVQ_STATS_LOG_FREQ_SEC in a service config +file enables TIPC receive queue utilization statistics. The value is the interval, in +seconds, at which the statistics are written to syslog.
+ +Example amfd.conf: + +export MDS_RECVQ_STATS_LOG_FREQ_SEC=5 + +then every 5 seconds a log record is written: + +May 20 12:23:30 SC-1 local0.notice osafamfd[545]: NO TIPC receive queue utilization (in %): min: 3.86 max: 4.38 mean: 4.15 std dev: 0.18 diff --git a/src/base/Makefile.am b/src/base/Makefile.am index ce93562e5..025fb86a2 100644 --- a/src/base/Makefile.am +++ b/src/base/Makefile.am @@ -157,6 +157,7 @@ noinst_HEADERS += \ src/base/saf_error.h \ src/base/saf_mem.h \ src/base/sprr_dl_api.h \ + src/base/statistics.h \ src/base/string_parse.h \ src/base/sysf_exc_scr.h \ src/base/sysf_ipc.h \ diff --git a/src/base/statistics.h b/src/base/statistics.h new file mode 100644 index 0..9ce980fc1 --- /dev/null +++ b/src/base/statistics.h @@ -0,0 +1,88 @@ +/* -*- OpenSAF -*- + * + * (C) Copyright 2019 The OpenSAF Foundation + * Copyright Ericsson AB 2019 - All Rights Reserved. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed + * under the GNU Lesser General Public License Version 2.1, February 1999. + * The complete license can be accessed from the following location: + * http://opensource.org/licenses/lgpl-license.php + * See the Copying file included with the OpenSAF distribution for full + * licensing terms. + * + * Author(s): Ericsson AB + * + */ + +#ifndef STATISTICS_H_ +#define STATISTICS_H_ + +#include + +namespace base { + +class Statistics { + public: + void clear() { +n_ = 0; + } + + void push(double x) { +n_++; + +// See Knuth, Art Of Computer Programming, Volume 2. The Seminumerical Algorithms, 4.2.2. 
Accuracy of Floating Point Arithmetic, +// using the recurrence formulas: +// M1 = x1, Mk = Mk-1 + (xk - Mk-1) / k (15) +// S1 = 0, Sk = Sk-1 + (xk - Mk-1) * (xk - Mk) (16) +// for 2 <= k <= n, sqrt(Sn/(n-1) +if (n_ == 1) { + prev_m_ = current_m_ = x; + prev_s_ = 0; + min_ = x; + max_ = x; +} else { + current_m_ = prev_m_ + (x - prev_m_) / n_; + current_s_ = prev_s_ + (x - prev_m_) * (x - current_m_); + + if (x > max_) max_ = x; + if (x < min_) min_ = x; + prev_m_ = current_m_; + prev_s_ = current_s_; +} + } + + double mean() const { +return (n_ > 0) ? current_m_ : 0; + } + + double variance() const { +return (n_ > 1) ? current_s_ / (n_ - 1) : 0; + } + + double std_dev() const { +return sqrt(variance()); + } + + double min() const { +return min_; + } + double max() const { +return max_; + } + + private: + int n_{0}; + double prev_m_{0}; + double current_m_{0}; + double prev_s_{0}; + double current_s_{0}; + double min_{0}; + double max_{0}; +}; + +} // namespace base + +#endif // STATISTICS_H_ + diff --git a/src/mds/Makefile.am b/src/mds/Makefile.am index 3724d2ea8..2d7b652e9 100644 --- a/src/mds/Makefile.am +++ b/src/mds/Makefile.am @@ -46,8 +46,12 @@ lib_libopensaf_core_la_SOURCES += \ src/mds/ncs_vda.c if ENABLE_TIPC_TRANSPORT -noinst_HEADERS += src/mds/mds_dt_tipc.h -lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c +noinst_HEADERS += src/mds/mds_dt_tipc.h \ + src/mds/mds_tipc_recvq_stats.h \ + src/mds/mds_tipc_recvq_stats_impl.h +lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c \ + src/mds/mds_tipc_recvq_stats.cc \ + src/mds/mds_tipc_recvq_stats_impl.cc endif if ENABLE_TESTS diff --git
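The Statistics class relies on the one-pass recurrence quoted in its comment, Knuth's form of Welford's method. The sketch below is independent of the patch: it applies the same recurrence and cross-checks the result against a two-pass computation. The names Running and two_pass_variance are illustrative only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One-pass mean/variance using the recurrence quoted in statistics.h
// (Knuth, TAOCP Vol. 2, 4.2.2):
//   M1 = x1, Mk = Mk-1 + (xk - Mk-1) / k
//   S1 = 0,  Sk = Sk-1 + (xk - Mk-1) * (xk - Mk)
// Starting from m = s = 0, the k = 1 step yields M1 = x1 and S1 = 0.
struct Running {
  std::size_t n = 0;
  double m = 0.0;
  double s = 0.0;

  void push(double x) {
    ++n;
    const double prev_m = m;
    m += (x - prev_m) / static_cast<double>(n);
    s += (x - prev_m) * (x - m);
  }
  // Sample variance Sn / (n - 1); the standard deviation is its square root.
  double variance() const { return n > 1 ? s / static_cast<double>(n - 1) : 0.0; }
  double std_dev() const { return std::sqrt(variance()); }
};

// Two-pass reference: compute the mean first, then the squared deviations.
double two_pass_variance(const std::vector<double> &xs) {
  if (xs.size() < 2) return 0.0;
  double mean = 0.0;
  for (double x : xs) mean += x;
  mean /= static_cast<double>(xs.size());
  double ss = 0.0;
  for (double x : xs) ss += (x - mean) * (x - mean);
  return ss / static_cast<double>(xs.size() - 1);
}
```

The one-pass form needs no sample storage, which suits periodic logging of receive queue utilization. It also avoids the cancellation error of the naive sum-of-squares formula.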
Re: [devel] [PATCH 1/1] base: strip leading and trailing quotes [#3041]
Hi Gary, ack, code review only/Thanks HansN

On 2019-05-17 14:47, Gary Lee wrote:
> ConfigFileReader enables runtime 'reload' of .conf files.
> However, if the environment variable is surrounded by quotes,
> it adds the quotes to the value which is not the expected behaviour.
>
> export FOO="foo"
>
> FOO should contain just foo, not "foo".
> ---
>  src/base/config_file_reader.cc  | 15 +++++++++++++++
>  src/osaf/consensus/consensus.cc |  1 +
>  2 files changed, 16 insertions(+)
>
> diff --git a/src/base/config_file_reader.cc b/src/base/config_file_reader.cc
> index 63cad7d..0132547 100644
> --- a/src/base/config_file_reader.cc
> +++ b/src/base/config_file_reader.cc
> @@ -36,6 +36,18 @@ static void trim(std::string& str) {
>    right_trim(str);
>  }
>
> +static void strip_quotes(std::string& str) {
> +  // trim leading and trailing quotes
> +  if (str.front() == '"' ||
> +      str.front() == '\'') {
> +    str.erase(0, 1);  // delete first char
> +  }
> +  if (str.back() == '"' ||
> +      str.back() == '\'') {
> +    str.pop_back();  // delete last char
> +  }
> +}
> +
>  ConfigFileReader::SettingsMap ConfigFileReader::ParseFile(
>      const std::string& filename) {
>    const std::string prefix("export");
> @@ -80,6 +92,9 @@ ConfigFileReader::SettingsMap ConfigFileReader::ParseFile(
>      std::string value = line.substr(equal + 1);
>      trim(value);
>
> +    strip_quotes(key);
> +    strip_quotes(value);
> +
>      map[key] = value;
>    }
>    file.close();
> diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
> index 480f7d2..0bebab2 100644
> --- a/src/osaf/consensus/consensus.cc
> +++ b/src/osaf/consensus/consensus.cc
> @@ -295,6 +295,7 @@ bool Consensus::ReloadConfiguration() {
>        continue;
>      }
>      int rc;
> +    TRACE("Setting '%s' to '%s'", kv.first.c_str(), kv.second.c_str());
>      rc = setenv(kv.first.c_str(), kv.second.c_str(), 1);
>      osafassert(rc == 0);
>    }
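The patch's strip_quotes() calls str.front() and str.back() without checking the length. Both are undefined behaviour on an empty std::string. That case arises if ParseFile() ever passes an empty key or value, and also for a value consisting of a single quote character, which is empty by the time back() is called. The function also strips unbalanced quotes such as "foo'. The sketch below is a hedged alternative that removes only a matching pair. strip_matching_quotes is a hypothetical name, not the patch's function.

```cpp
#include <string>

// Remove one pair of matching surrounding quotes, "..." or '...'.
// The length check comes first because front()/back() on an empty
// std::string is undefined behaviour; unbalanced input such as "foo'
// is left unchanged.
void strip_matching_quotes(std::string &str) {
  if (str.size() < 2) return;
  const char first = str.front();
  if ((first == '"' || first == '\'') && str.back() == first) {
    str.pop_back();   // closing quote
    str.erase(0, 1);  // opening quote
  }
}
```

Requiring the opening and closing characters to match follows shell quoting more closely. A value like export FOO="it's" keeps its inner apostrophe.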
Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3
Hi Thuan, Ack, code review only. A few comments below: - Agree, but shouldn't the description in the ticket be something like "mds: At MDS broadcast use TIPC multicast for fragmented messages instead of unicasting to only one destination"? - And adding the test cases would be good. /BR HansN On 2019-05-03 11:03, Tran Thuan wrote: Hi Hans, static uint32_t mdtm_sendto(uint8_t *buffer, uint16_t buff_len, struct tipc_portid tipc_id); static uint32_t mdtm_mcast_sendto(void *buffer, size_t size, const MDTM_SEND_REQ *req); Before the fix, a fragmented package is sent via mdtm_sendto(), which is designed (MDS design) to send to one destination. After the fix, a fragmented package is sent via mdtm_mcast_sendto(), which is designed (MDS design) to send to all destinations. Both functions call TIPC sendto(), just with different parameters. Best Regards, ThuanTr
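A rough way to see the ordering problem discussed in this thread is to model each receiver as having two inbound channels, unicast and multicast, each FIFO but with no ordering guarantee relative to the other. The sketch below is a toy C++ model, not MDS code: `Receiver`, `OldBroadcast` and `NewBroadcast` are hypothetical names, a whole fragmented message is represented by the single string "BIG", and `Deliver()` picks one legal worst-case interleaving by draining the multicast channel first.

```cpp
#include <deque>
#include <string>
#include <vector>

// Toy model of one receiver: FIFO within each channel, but no ordering
// guarantee between the unicast and the multicast channel.
struct Receiver {
  std::deque<std::string> unicast_q;
  std::deque<std::string> mcast_q;

  // One legal interleaving: everything on the multicast channel arrives
  // before anything on the unicast channel.
  std::vector<std::string> Deliver() const {
    std::vector<std::string> out(mcast_q.begin(), mcast_q.end());
    out.insert(out.end(), unicast_q.begin(), unicast_q.end());
    return out;
  }
};

// Pre-patch behaviour as described in the thread: the fragments of a big
// broadcast are unicast destination by destination (mdtm_sendto), while a
// following small broadcast goes out once via multicast.
void OldBroadcast(std::vector<Receiver>& receivers) {
  for (Receiver& r : receivers) r.unicast_q.push_back("BIG");
  for (Receiver& r : receivers) r.mcast_q.push_back("small");
}

// Post-patch behaviour: the fragments are multicast too (mdtm_mcast_sendto),
// so both broadcasts share one channel and keep their send order.
void NewBroadcast(std::vector<Receiver>& receivers) {
  for (Receiver& r : receivers) r.mcast_q.push_back("BIG");
  for (Receiver& r : receivers) r.mcast_q.push_back("small");
}
```

With two receivers, `OldBroadcast` followed by `Deliver()` gives `{"small", "BIG"}` for each receiver, while `NewBroadcast` gives `{"BIG", "small"}`: once everything travels on one channel, the send order survives.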
Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3
Hi Thuan, a question: the old code uses unicast for the fragments, now multicast is used. But from the 'sendto' TIPC documentation: "If the destination is a service range, the message is a multicast to all matching sockets." So before, when unicast was used for the fragments, did TIPC multicast the fragments? What problem does this patch solve, can you clarify? /Thanks HansN On 2019-05-03 10:20, Tran Thuan wrote: Hi Hans, Yes, we tried that kind of basic test; IMMD can deliver a big message via multicast. Best Regards, ThuanTr
Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3
Hi Thuan, that sounds good, is this how you test now? When looking at the MDS code before this change, it looks like large multicast messages are fragmented and only sent to one receiver using unicast, but with this change the fragments are multicast to all receivers, which seems more correct. /Thanks HansN On 2019-05-03 10:04, Tran Thuan wrote: > Hi Hans, > > The current MDS apitest only runs as a binary on one node. > It would be easier to create an IMM test case that makes IMMD broadcast a big message. > I think we can create a new ticket for this additional test. > > Best Regards, > ThuanTr
Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3
Hi Thuan, I'm reviewing the patch now. I haven't checked yet, but do you know if the mds apitests cover this case, i.e. sending large multicast messages? /Thanks HansN -----Original Message----- From: Tran Thuan Sent: 2 May 2019 05:56 To: Minh Hon Chau; Vu Minh Nguyen; Hans Nordebäck Cc: opensaf-devel@lists.sourceforge.net Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3 Hi Hans, Do you have any further comments? Can we push the patch? Best Regards, ThuanTr -----Original Message----- From: Minh Hon Chau Sent: Friday, April 26, 2019 4:11 PM To: Vu Minh Nguyen; 'Hans Nordebäck'; 'Thuan Tran' Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3 Hi, ack from me (code review) Thanks Minh
Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3
Hi Thuan, OK. If we can add additional tests to the MDS API test suite, that would be good. /Thanks HansN -----Original Message----- From: Tran Thuan Sent: 3 May 2019 09:41 To: Hans Nordebäck; Minh Hon Chau; Vu Minh Nguyen Cc: opensaf-devel@lists.sourceforge.net Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3 Hi Hans, I don't see this kind of test in the mds apitests. Best Regards, ThuanTr
Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3
Hi Vu and Thuan, a few questions: is the text in the ticket description correct? E.g. it says unicast is used if a multicast message is fragmented (I think multicast is still used to send the fragments); is this what you mean by 2 different channels? (Only one socket is used, BSRsock.) The problem stated is sending one large multicast message and then several smaller multicast messages; have you checked the fragment re-assembly part of the common code? /BR Hans On 2019-04-24 13:06, thuan.tran wrote: > Summary: mds: support multicast fragmented messages [#3033] > Review request for Ticket(s): 3033 > Peer Reviewer(s): Hans, Minh, Vu > Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** > Affected branch(es): develop > Development branch: ticket-3033 > Base revision: 7916ac316e86478c621c8359cf2aca4886288a38 > Personal repository: git://git.code.sf.net/u/thuantr/review > > > Impacted area Impact y/n > > Docs n > Build system n > RPM/packaging n > Configuration files n > Startup scripts n > SAF services y > OpenSAF services n > Core libraries n > Samples n > Tests n > Other n > > NOTE: Patch(es) contain lines longer than 80 characters > > Comments (indicate scope for each "y" above): > - > N/A > > revision 568f09774f936506f5e05e03813fa572af0fe0d3 > Author: thuan.tran > Date: Wed, 24 Apr 2019 17:54:25 +0700 > > mds: support multicast fragmented messages [#3033] > > - A sender may broadcast big messages (> 65K) and then small messages (< 65K). > Current MDS just loops over all destinations, unicasting all fragmented messages > to the destinations one by one, but sends non-fragmented messages via multicast to all > destinations. Therefore, receivers may get messages in the wrong order: > non-fragmented messages may arrive before fragmented messages. > For example, it may lead to OUT OF ORDER for IMMNDs during IMMD sync. > - Solution: send each fragmented message via multicast to avoid > out-of-order arrival of broadcast messages.
> > > > Complete diffstat: > -- > src/mds/mds_c_sndrcv.c | 3 +- > src/mds/mds_dt_tipc.c | 104 > +++-- > 2 files changed, 40 insertions(+), 67 deletions(-) > > > Testing Commands: > - > N/A > > Testing, Expected Results: > -- > N/A > > Conditions of Submission: > - > N/A > > Arch Built Started Linux distro > --- > mips n n > mips64 n n > x86 n n > x86_64 y y > powerpc n n > powerpc64 n n > > > Reviewer Checklist: > --- > [Submitters: make sure that your review doesn't trigger any checkmarks!] > > > Your checkin has not passed review because (see checked entries): > > ___ Your RR template is generally incomplete; it has too many blank entries > that need proper data filled in. > > ___ You have failed to nominate the proper persons for review and push. > > ___ Your patches do not have proper short+long header > > ___ You have grammar/spelling in your header that is unacceptable. > > ___ You have exceeded a sensible line length in your headers/comments/text. > > ___ You have failed to put in a proper Trac Ticket # into your commits. > > ___ You have incorrectly put/left internal data in your comments/files > (i.e. internal bug tracking tool IDs, product names etc) > > ___ You have not given any evidence of testing beyond basic build tests. > Demonstrate some level of runtime or other sanity testing. > > ___ You have ^M present in some of your files. These have to be removed. > > ___ You have needlessly changed whitespace or added whitespace crimes > like trailing spaces, or spaces before tabs. > > ___ You have mixed real technical changes with whitespace and other > cosmetic code cleanup changes. These have to be separate commits. > > ___ You need to refactor your submission into logical chunks; there is > too much content into a single commit. > > ___ You have extraneous garbage in your review (merge commits etc) > > ___ You have giant attachments which should never have been sent; > Instead you should place your content in a public tree to be pulled.
> > ___ You have too many commits attached to an e-mail; resend as threaded > commits, or place in a public tree for a pull. > > ___ You have resent this content multiple times without a clear indication > of what has changed between each re-send. > > ___ You have failed to adequately and individually address all of the > comments and change requests that were proposed in the initial review. > > ___ You have a misconfigured
Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3
Hi Vu, you are right; my concern was the description of the problem, and it looks OK with your explanation. /Thanks Hans On 2019-04-25 13:33, Vu Minh Nguyen wrote: > Hi Hans, > > Probably you were looking at code that already included Thuan's patch. > > In the legacy code, only mdtm_sendto() is called inside the function > mdtm_frag_and_send(). > > Regards, Vu
Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3
Hi Vu, it seems mdtm_mcast_sendto is used in mdtm_frag_and_send, at MDS_SENDTYPE_BCAST. /BR Hans -----Original Message----- From: Vu Minh Nguyen Sent: 25 April 2019 12:20 To: Hans Nordebäck; Thuan Tran; Minh Hon Chau Cc: opensaf-devel@lists.sourceforge.net Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3 Hi Hans, See my responses inline. Regards, Vu > -----Original Message----- > From: Hans Nordebäck > Sent: Thursday, April 25, 2019 4:28 PM > To: Thuan Tran; Vu Minh Nguyen; Minh Hon Chau > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 0/1] Review Request for mds: support multicast > fragmented messages [#3033] V3 > > Hi Vu and Thuan, > > a few question, is the text in the ticket description correct? E.g it > says unicast is used if a multicast message is fragmented, (I think > multicast still is used > > to send the fragments), this is what you mean with 2 different channels? > (only one socket is used, BSRsock), [Vu] Yes. Unicast is used to send fragmented messages. Here is the current logic when sending a large package: Iterate over destinations { // mcm_pvt_process_svc_bcast_common() @ mds_c_sndrcv.c 1) Fragment the package // mdtm_frag_and_send() @ mds_dt_tipc.c 2) Unicast to a specific adest // mdtm_sendto() @ mds_dt_tipc.c 3) Continue with next adest } > > The problem stated is sending one large multicast message and then > several smaller multicast messages, have you checked the > > fragment re-assembly part of the common code? [Vu] Yes. On the receive side, if a msg is fragmented, MDS will not forward it to the upper layer until all fragments are collected. If the message is not fragmented, MDS transfers the msg to the upper layer right away. I checked with the TIPC guys here, and they said that TIPC does not guarantee the order if we send msgs over different channels (unicast vs mcast). > > /BR Hans
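Vu's description of the receive side (fragments are held until the message is complete, unfragmented messages are passed up immediately) can be sketched as follows. This is a hypothetical C++ illustration, not the MDS implementation: the `Packet` fields and the `Reassembler` class are invented for the example, and it assumes the fragments of one message arrive in order. It shows why a small unfragmented message that lands between fragments reaches the upper layer before the large message that was sent first.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical packet header; the real MDS fragment header is not shown here.
struct Packet {
  uint32_t msg_id;      // identifies the original message
  bool fragmented;      // false: the whole message fits in this packet
  bool last;            // true on the final fragment of a message
  std::string payload;
};

class Reassembler {
 public:
  // Returns whatever becomes deliverable to the upper layer on this packet.
  std::vector<std::string> OnReceive(const Packet& p) {
    if (!p.fragmented) return {p.payload};  // pass up right away
    std::string& buf = partial_[p.msg_id];  // hold until complete
    buf += p.payload;
    if (!p.last) return {};
    std::string whole = std::move(buf);
    partial_.erase(p.msg_id);
    return {whole};
  }

 private:
  std::map<uint32_t, std::string> partial_;  // msg_id -> bytes so far
};
```

Feeding `{1, true, false, "BI"}`, then `{2, false, false, "small"}`, then `{1, true, true, "G"}` returns nothing, then `{"small"}`, then `{"BIG"}`: the upper layer sees the small message first even though it was sent second.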
Re: [devel] [PATCH 1/1] osaf: ensure an error is returned if takeover_request fails [#3023]
ack, review only /Thanks HansN -----Original Message----- From: Gary Lee Sent: 26 March 2019 02:05 To: Minh Hon Chau; Hans Nordebäck Cc: opensaf-devel@lists.sourceforge.net; Gary Lee Subject: [PATCH 1/1] osaf: ensure an error is returned if takeover_request fails [#3023] if we cannot read the result of a takeover_request, ensure we return an error --- src/osaf/consensus/consensus.cc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc index cf307b3..480f7d2 100644 --- a/src/osaf/consensus/consensus.cc +++ b/src/osaf/consensus/consensus.cc @@ -433,6 +433,8 @@ SaAisErrorT Consensus::CreateTakeoverRequest(const std::string& current_owner, return rc; } + // in case takeover request cannot be read + rc = SA_AIS_ERR_FAILED_OPERATION; // wait up to max_takeover_retry seconds for request to be answered retries = 0; while (retries < max_takeover_retry_) { -- 2.7.4
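The fix follows a common pattern: initialise the result to a failure code before the retry loop, so that if the takeover request can never be read, the caller gets an error rather than a stale value left in `rc`. A simplified C++ sketch, assuming a hypothetical `reader` callback and hypothetical "ACCEPTED"/"REJECTED" answer strings; the real `Consensus::CreateTakeoverRequest` also waits between retries and returns `SaAisErrorT`, both omitted here:

```cpp
#include <functional>
#include <optional>
#include <string>

// Simplified stand-in for the SaAisErrorT values used in consensus.cc.
enum class Result { kOk, kFailedOperation };

// reader: hypothetical callback returning the answer to the takeover request,
// or std::nullopt if the request cannot be read (e.g. KV store unreachable).
Result WaitForTakeoverAnswer(
    const std::function<std::optional<std::string>()>& reader,
    int max_retries) {
  // Default to failure: if no read ever succeeds, report an error.
  Result rc = Result::kFailedOperation;
  for (int retries = 0; retries < max_retries; ++retries) {
    std::optional<std::string> answer = reader();
    if (!answer) continue;  // unreadable this time; retry
    if (*answer == "ACCEPTED") {
      rc = Result::kOk;
      break;
    }
    if (*answer == "REJECTED") break;  // rc stays kFailedOperation
    // any other answer: not decided yet, keep waiting
  }
  return rc;
}
```

Without the initial assignment, a reader that always fails would leave `rc` at whatever the earlier step set it to, which is exactly the case the patch closes.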
Re: [devel] [PATCH 1/1] osaf: improve response time in etcd3.plugin [#3016]
Hi Gary, ack, review only. /BR HansN On 3/12/19 01:32, Gary Lee wrote: > if the initial call to watch takeover request in etcd3.plugin > is made when etcd has already been shutdown (for example, > when etcd is running locally and the node is being shutdown), > the plugin should return 0 with a fake takeover request to ensure > rded shuts down promptly. Otherwise, it will keep calling > watch, delaying node shutdown. > --- > src/osaf/consensus/plugins/etcd3.plugin | 11 +-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/src/osaf/consensus/plugins/etcd3.plugin > b/src/osaf/consensus/plugins/etcd3.plugin > index acccd98..d926885 100644 > --- a/src/osaf/consensus/plugins/etcd3.plugin > +++ b/src/osaf/consensus/plugins/etcd3.plugin > @@ -357,9 +357,16 @@ watch() { > return 0 > fi > done > + else > +# etcd down? > +if [ "$watch_key" == "$takeover_request" ]; then > + hostname=`cat $node_name_file` > + echo "$hostname SC-0 1000 UNDEFINED" > + return 0 > +else > + return 1 > +fi > fi > - > - return 1 > } > > # argument parsing ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
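When etcd is already down and watch is called on the takeover request key, the patched plugin prints a synthetic request ("$hostname SC-0 1000 UNDEFINED") and returns 0. A sketch of how a consumer might split such a line, assuming the four whitespace-separated fields are current owner, proposed owner, network size and state; the struct and field names are hypothetical, inferred only from the echo line:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical field names; the order follows the fake request the plugin
// echoes when etcd is down: "$hostname SC-0 1000 UNDEFINED".
struct TakeoverRequest {
  std::string current_owner;
  std::string proposed_owner;
  unsigned long network_size = 0;
  std::string state;
};

// Parse one whitespace-separated takeover request line.
// Returns false if a field is missing, malformed, or extra tokens follow.
bool ParseTakeoverRequest(const std::string& line, TakeoverRequest* out) {
  std::istringstream in(line);
  TakeoverRequest req;
  if (!(in >> req.current_owner >> req.proposed_owner >> req.network_size >>
        req.state)) {
    return false;
  }
  std::string extra;
  if (in >> extra) return false;  // reject trailing tokens
  *out = req;
  return true;
}
```

Returning a well-formed line with state UNDEFINED, rather than exit status 1, gives the caller a definite answer to act on instead of a reason to call watch again.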
Re: [devel] [PATCH 1/1] dtm: Fix dtm close socket due to duplication of adding node IP info [#2984]
Hi Canh, ack, review only. I think it would be good to move the refactoring part into a separate ticket though. /BR Hans On 12/18/18 08:25, Canh Van Truong wrote: > During cluster start, one node (node 1) broadcasts an up message to the other nodes. The remote node (node 2) receives this message and connects to node 1 (connect()). Similarly, node 1 connects to node 2 after node 2 broadcasts its up message. Besides connecting to node 1, node 2 also adds the IP and ID info of node 1 to its database. But before that, node 2 may also accept the connection coming from node 1, and accepting it also adds the node ID of node 1. So the node ID info of node 1 is added twice to the database on node 2. This causes the socket connection to be closed and the node to restart. > > The patch changes the connection processing to retrieve the node from the database by node IP instead of node ID. This rejects the duplicate connection establishment between the 2 nodes as well as the duplicate addition of the node IP to the database.
> --- > src/dtm/dtmnd/dtm.h | 11 -- > src/dtm/dtmnd/dtm_inter_trans.cc | 3 +- > src/dtm/dtmnd/dtm_node.cc | 2 +- > src/dtm/dtmnd/dtm_node_db.cc | 79 > --- > src/dtm/dtmnd/dtm_node_sockets.cc | 20 ++ > 5 files changed, 72 insertions(+), 43 deletions(-) > > diff --git a/src/dtm/dtmnd/dtm.h b/src/dtm/dtmnd/dtm.h > index 28c811e65..a06b8f503 100644 > --- a/src/dtm/dtmnd/dtm.h > +++ b/src/dtm/dtmnd/dtm.h > @@ -45,6 +45,11 @@ typedef enum { > DTM_MBX_MSG_TYPE = 5, > } MBX_POST_TYPES; > > +typedef enum { > + DTM_NODE_ID_KEY_TYPE = 0, > + DTM_NODE_IP_KEY_TYPE = 2, > +} KEY_TYPES; > + > typedef struct dtm_rcv_msg_elem { > void *next; > MBX_POST_TYPES type; > @@ -99,10 +104,10 @@ typedef struct dtm_snd_msg_elem { > > extern void node_discovery_process(void *arg); > extern uint32_t dtm_cb_init(DTM_INTERNODE_CB *dtms_cb); > -extern DTM_NODE_DB *dtm_node_get_by_id(uint32_t nodeid); > +extern DTM_NODE_DB *dtm_node_get(uint8_t *key, KEY_TYPES type); > extern DTM_NODE_DB *dtm_node_getnext_by_id(uint32_t node_id); > -extern uint32_t dtm_node_add(DTM_NODE_DB *node, int i); > -extern uint32_t dtm_node_delete(DTM_NODE_DB *nnode, int i); > +extern uint32_t dtm_node_add(DTM_NODE_DB *node, KEY_TYPES type); > +extern uint32_t dtm_node_delete(DTM_NODE_DB *nnode, KEY_TYPES type); > extern DTM_NODE_DB *dtm_node_new(const DTM_NODE_DB *new_node); > extern void dtm_print_config(DTM_INTERNODE_CB *config); > extern int dtm_read_config(DTM_INTERNODE_CB *config, > diff --git a/src/dtm/dtmnd/dtm_inter_trans.cc > b/src/dtm/dtmnd/dtm_inter_trans.cc > index 9d8335466..9b4194614 100644 > --- a/src/dtm/dtmnd/dtm_inter_trans.cc > +++ b/src/dtm/dtmnd/dtm_inter_trans.cc > @@ -235,9 +235,10 @@ static uint32_t dtm_internode_snd_msg_common(DTM_NODE_DB > *node, uint8_t *buffer, > uint32_t dtm_internode_snd_msg_to_node(uint8_t *buffer, uint16_t len, > NODE_ID node_id) { > DTM_NODE_DB *node = nullptr; > + uint8_t *key = reinterpret_cast(_id); > > TRACE_ENTER(); > - node = dtm_node_get_by_id(node_id); > + node = 
dtm_node_get(key, DTM_NODE_ID_KEY_TYPE); > > if (nullptr != node) { > if (NCSCC_RC_SUCCESS != dtm_internode_snd_msg_common(node, buffer, > len)) { > diff --git a/src/dtm/dtmnd/dtm_node.cc b/src/dtm/dtmnd/dtm_node.cc > index de2f94738..72506f262 100644 > --- a/src/dtm/dtmnd/dtm_node.cc > +++ b/src/dtm/dtmnd/dtm_node.cc > @@ -125,7 +125,7 @@ uint32_t dtm_process_node_info(DTM_INTERNODE_CB *dtms_cb, > DTM_NODE_DB *node, > memcpy(node->node_name, data, nodename_len); > node->node_name[nodename_len] = '\0'; > node->comm_status = true; > - if (dtm_node_add(node, 0) != NCSCC_RC_SUCCESS) { > + if (dtm_node_add(node, DTM_NODE_ID_KEY_TYPE) != NCSCC_RC_SUCCESS) { > LOG_ER( > "DTM: A node already exists in the cluster with similar " > "configuration (possible duplicate IP address and/or node id), > please " > diff --git a/src/dtm/dtmnd/dtm_node_db.cc b/src/dtm/dtmnd/dtm_node_db.cc > index 1c9da4dac..1038f0918 100644 > --- a/src/dtm/dtmnd/dtm_node_db.cc > +++ b/src/dtm/dtmnd/dtm_node_db.cc > @@ -123,24 +123,49 @@ uint32_t dtm_cb_init(DTM_INTERNODE_CB *dtms_cb) { > } > > /** > - * Retrieve node from node db by nodeid > + * Retrieve node from node db >* > - * @param nodeid > + * @param key > + * @param i >* > - * @return NCSCC_RC_SUCCESS > - * @return NCSCC_RC_FAILURE > + * @return node >* >*/ > -DTM_NODE_DB *dtm_node_get_by_id(uint32_t nodeid) { > +DTM_NODE_DB *dtm_node_get(uint8_t *key, KEY_TYPES type) { > TRACE_ENTER(); > DTM_INTERNODE_CB *dtms_cb = dtms_gl_cb; > + DTM_NODE_DB *node = nullptr; > > - DTM_NODE_DB *node = reinterpret_cast(ncs_patricia_tree_get( > - _cb->nodeid_tree, reinterpret_cast())); > - if (node != nullptr)
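The idea in Canh's fix is that both the connect() path and the accept() path look the peer up by IP before adding it, so the second path finds the existing entry and is rejected instead of inserting a duplicate. A simplified C++ sketch of that idea; Node, NodeDb and OnPeerConnection are hypothetical names, and std::map stands in for the patricia trees the real dtmnd code uses:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified node record and database; the real dtmnd code
// stores DTM_NODE_DB entries in patricia trees keyed by node ID or IP.
struct Node {
  uint32_t node_id;
  std::string ip;
};

enum class KeyType { kNodeId, kNodeIp };  // mirrors the KEY_TYPES idea

class NodeDb {
 public:
  // Returns false if an entry with the same key already exists.
  bool Add(const Node& node, KeyType type) {
    if (type == KeyType::kNodeId) {
      return by_id_.emplace(node.node_id, node).second;
    }
    return by_ip_.emplace(node.ip, node).second;
  }
  const Node* GetByIp(const std::string& ip) const {
    auto it = by_ip_.find(ip);
    return it == by_ip_.end() ? nullptr : &it->second;
  }

 private:
  std::map<uint32_t, Node> by_id_;
  std::map<std::string, Node> by_ip_;
};

// Called for both the outgoing connect() and the accepted connection.
// Looking the peer up by IP first means the second path is rejected
// instead of adding the same node twice.
bool OnPeerConnection(NodeDb& db, const Node& peer) {
  if (db.GetByIp(peer.ip) != nullptr) return false;  // duplicate connection
  return db.Add(peer, KeyType::kNodeIp);
}
```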
Re: [devel] [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]
Hi Vu, ack, with one comment below. /BR HansN On 3/1/19 10:07, Vu Minh Nguyen wrote: > There is a dependency b/w svc_monitor_thread and spawn_services. > The coredump happens when spawn_services is executed while > the thread has not yet started. In this case, data is sent to the > pipe but no one consumed it. When it comes to consume the data, > will get unexpected data and crash the program. > > This patch ensures the things will happen in the right order: > svc_monitor_thread must be in ready state before spawn_services() > is executed. > --- > src/nid/nodeinit.cc | 34 +++--- > 1 file changed, 23 insertions(+), 11 deletions(-) > > diff --git a/src/nid/nodeinit.cc b/src/nid/nodeinit.cc > index 5f15916b4..2e6a5cd05 100644 > --- a/src/nid/nodeinit.cc > +++ b/src/nid/nodeinit.cc > @@ -47,6 +47,8 @@ >*any notification. * >/ > > +#include "nid/nodeinit.h" > + > #include > #include > #include > @@ -61,20 +63,18 @@ > #include > #include > > -#include "osaf/configmake.h" > -#include "rde/agent/rda_papi.h" > -#include "base/logtrace.h" > - > +#include > #include > #include > #include > #include > > +#include "osaf/configmake.h" > +#include "rde/agent/rda_papi.h" > +#include "base/logtrace.h" > #include "base/conf.h" > #include "base/osaf_poll.h" > #include "base/osaf_time.h" > - > -#include "nid/nodeinit.h" > #include "base/file_notify.h" > > #define SETSIG(sa, sig, fun, flags) \ > @@ -134,6 +134,7 @@ static int start_monitor_svc(const char *svc); > /* Data declarations for service monitoring */ > static int svc_mon_fd = -1; > static int next_svc_fds_slot = 0; > +static std::atomic svc_monitor_thread_ready{false}; > > struct SAFServices { > const std::string fifo_dir = PKGLOCALSTATEDIR; > @@ -712,9 +713,9 @@ int32_t fork_daemon(NID_SPAWN_INFO *service, char *app, > char *args[], > > tmp_pid = getpid(); > while (write(filedes[1], _pid, sizeof(int)) < 0) { > - if (errno == EINTR) > + if (errno == EINTR) { > continue; > - else if (errno == EPIPE) { > + } else if (errno == 
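Hans's inline comment separates the two errno cases: EINTR should retry the recv(), while EAGAIN/EWOULDBLOCK means the queue is empty and the loop should exit rather than spin. A minimal sketch of such a drain loop, using a hypothetical DrainDatagrams helper (not part of nodeinit.cc) on a SOCK_DGRAM socketpair, where each recv() returns exactly one message:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>
#include <sys/socket.h>
#include <sys/types.h>

// Read every datagram currently queued on fd without blocking.
// On a SOCK_DGRAM socketpair each recv() returns exactly one message, so
// names written by separate send() calls are never merged.
// *err is 0 on a normal return, or the errno of an unexpected failure.
std::vector<std::string> DrainDatagrams(int fd, std::size_t max_len, int* err) {
  std::vector<std::string> msgs;
  std::vector<char> buf(max_len);
  *err = 0;
  while (true) {
    ssize_t n = recv(fd, buf.data(), buf.size(), MSG_DONTWAIT);
    if (n == -1) {
      if (errno == EINTR) continue;                        // interrupted: retry
      if (errno == EAGAIN || errno == EWOULDBLOCK) break;  // queue drained
      *err = errno;                                        // real error
      break;
    }
    msgs.emplace_back(buf.data(), static_cast<std::size_t>(n));
  }
  return msgs;
}
```

With SOCK_STREAM the same two sends could come back from a single read() as one concatenated buffer, which is the behaviour the SOCK_DGRAM change avoids. max_len should be at least the largest name sent, since a longer datagram is truncated.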
EPIPE) { > LOG_ER("Reader not available to return my PID"); > } else { > LOG_ER("Problem writing to pipe, err=%s", strerror(errno)); > @@ -1517,6 +1518,7 @@ void *svc_monitor_thread(void *fd) { > next_svc_fds_slot++; > > while (true) { > +svc_monitor_thread_ready = true; > unsigned rc = osaf_poll(fds, next_svc_fds_slot, -1); > if (rc > 0) { > // check if any monitored service has exit > @@ -1529,9 +1531,9 @@ void *svc_monitor_thread(void *fd) { > > if (fds[FD_SVC_MON_THR].revents & POLLIN) { > while (true) { > - read_rc = read(svc_mon_thr_fd, nid_name, NID_MAXSNAME); > + read_rc = recv(svc_mon_thr_fd, nid_name, NID_MAXSNAME, > MSG_DONTWAIT); > if (read_rc == -1) { > -if (errno == EINTR) { > +if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) { [HansN] should be if (errno == EINTR) continue; if (errno == EAGAIN || errno == EWOULDBLOCK) break; > continue; > } else { > LOG_ER("Failed to read on socketpair descriptor: %s", > @@ -1574,7 +1576,7 @@ uint32_t create_svc_monitor_thread(void) { > > TRACE_ENTER(); > > - if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, s_pair) == -1) { > + if (socketpair(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0, s_pair) == -1) { > LOG_ER("socketpair FAILED: %s", strerror(errno)); > return NCSCC_RC_FAILURE; > } > @@ -1655,6 +1657,16 @@ int main(int argc, char *argv[]) { > exit(EXIT_FAILURE); > } > > + // Waiting until svc_monitor_thread is up and in ready state. > + unsigned no_repeat = 0; > + while (svc_monitor_thread_ready == false && no_repeat < 100) { > +osaf_nanosleep(); > +no_repeat++; > + } > + > + osafassert(svc_monitor_thread_ready); > + LOG_NO("svc_monitor_thread is up and in ready state"); > + > if (parse_nodeinit_conf(sbuf) != NCSCC_RC_SUCCESS) { > LOG_ER("Failed to parse file %s. Exiting", sbuf); > exit(EXIT_FAILURE); ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]
Hi Vu, fine, perhaps also changing the static bool svc_monitor_thread_running = false to std::atomic?/BR Hans From: Vu Minh Nguyen Sent: den 28 februari 2019 09:30 To: Hans Nordebäck ; Gary Lee Cc: opensaf-devel@lists.sourceforge.net Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] Thanks Hans. I will send the V2 for these updates. Regards, Vu From: Hans Nordebäck mailto:hans.nordeb...@ericsson.com>> Sent: Thursday, February 28, 2019 2:16 PM To: Vu Minh Nguyen mailto:vu.m.ngu...@dektech.com.au>>; Gary Lee mailto:gary@dektech.com.au>> Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net> Subject: Re: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] Hi Vu, you can keep your patch for the ready state, but also change SOCK_STREAM to SOCK_DGRAM and change the read(svc_mon_thr_fd, nid_name, NID_MAXSNAME) in svc_monitor_thread to recv(svc_mon_thr_fd, nid_name, NID_MAXSNAME, MSG_DONTWAIT) and also handle EAGAIN and EWOULDBLOCK. Then only one nid_name per read/recv will be given instead of several nid_names as in the SOCK_STREAM case. /BR Hans On 2/28/19 05:30, Vu Minh Nguyen wrote: Hi Hans, Thanks for your comment. But I have a concern that the service-monitoring function may not fully work if a service crashes before the svc_monitor_thread enters the ready state. Is it mandatory for the monitoring thread to enter the ready state before spawning SAF services?
Regards, Vu -Original Message----- From: Hans Nordebäck <mailto:hans.nordeb...@ericsson.com> Sent: Wednesday, February 27, 2019 8:23 PM To: Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au>; Gary Lee <mailto:gary@dektech.com.au> Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>; Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au> Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] Hi Vu, I discussed a bit with Anders, likely it should work if the socketpair is changed to socketpair(AF_UNIX, SOCK_DGRAM .. from SOCK_STREAM. /BR Hans -Original Message- From: Hans Nordebäck Sent: den 27 februari 2019 11:55 To: 'Vu Minh Nguyen' <mailto:vu.m.ngu...@dektech.com.au>; Gary Lee <mailto:gary@dektech.com.au> Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>; Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au> Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] Hi Vu, ack, code review only/Thanks HansN -Original Message- From: Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au> Sent: den 27 februari 2019 11:48 To: Hans Nordebäck <mailto:hans.nordeb...@ericsson.com>; Gary Lee <mailto:gary@dektech.com.au> Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>; Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au> Subject: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] There is a dependency b/w svc_monitor_thread and spawn_services. The coredump happens when spawn_services is executed while the thread has not yet started. In this case, data is sent to the pipe but no one consumed it. Later on, reading data from the pipe, will get unexpected data and crash the program. This patch ensures the order: svc_monitor_thread must be in ready state before spawn_services() is executed. 
--- src/nid/nodeinit.cc | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/nid/nodeinit.cc b/src/nid/nodeinit.cc index 5f15916b4..b4945b05c 100644 --- a/src/nid/nodeinit.cc +++ b/src/nid/nodeinit.cc @@ -134,6 +134,7 @@ static int start_monitor_svc(const char *svc); /* Data declarations for service monitoring */ static int svc_mon_fd = -1; static int next_svc_fds_slot = 0; +static bool svc_monitor_thread_running = false; struct SAFServices { const std::string fifo_dir = PKGLOCALSTATEDIR; @@ -1517,6 +1518,7 @@ void *svc_monitor_thread(void *fd) { next_svc_fds_slot++; while (true) { +svc_monitor_thread_running = true; unsigned rc = osaf_poll(fds, next_svc_fds_slot, -1); if (rc > 0) { // check if any monitored service has exit @@ -1655,6 +1657,15 @@ int main(int argc, char *argv[]) { exit(EXIT_FAILURE); } + // Waiting until svc_monitor_thread is up and in ready state. + // If spawn_services runs before the thread is in ready state, // + receive side of the pipe s_pair will get unexpected data and // may + crash the process. + while (svc_monitor_thread_running == false) { +usleep(100); + } + + LOG_NO("svc_monitor_thread is up and in ready state"); if (parse_nodeinit_conf(sbuf) != NCSCC_RC_SUCCESS) { LOG_ER("Failed to parse file %s. Exiting", sbuf); exit(EXIT_FAILURE); -- 2.19.2 ___
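Hans's suggestion to make the ready flag atomic matters because a plain bool written by one thread and polled by another is a data race, which is undefined behaviour in C++; the compiler may even hoist the load out of the wait loop. A minimal sketch of the hand-off with std::atomic<bool> and a bounded wait; monitor_ready, MonitorThread and WaitUntilReady are hypothetical names, not the nodeinit.cc code:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for svc_monitor_thread_running/_ready.
std::atomic<bool> monitor_ready{false};

void MonitorThread() {
  // ... set up the poll descriptors here ...
  monitor_ready.store(true);  // publish readiness before the poll loop
  // The real thread would now loop on poll(); this sketch just returns.
}

// Check the flag up to max_tries times, sleeping `interval` between checks.
bool WaitUntilReady(unsigned max_tries, std::chrono::milliseconds interval) {
  for (unsigned i = 0; i < max_tries; ++i) {
    if (monitor_ready.load()) return true;
    std::this_thread::sleep_for(interval);
  }
  return monitor_ready.load();
}
```

Bounding the wait, as the v2 patch does with 100 iterations followed by osafassert, turns a monitor thread that never starts into a clear failure instead of a silent hang. A std::condition_variable would avoid polling altogether, at the cost of a mutex.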
Re: [devel] [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]
Hi Vu, you can keep your patch for the ready state, but also change SOCK_STREAM to SOCK_DGRAM and change the read(svc_mon_thr_fd, nid_name, NID_MAXSNAME) in svc_monitor_thread to recv(svc_mon_thr_fd, nid_name, NID_MAXSNAME, MSG_DONTWAIT) and also handle EAGAIN and EWOULDBLOCK. Then only one nid_name per read/recv will be given instead of several nid_names as in the SOCK_STREAM case. /BR Hans On 2/28/19 05:30, Vu Minh Nguyen wrote: Hi Hans, Thanks for your comment. But I have a concern that the service-monitoring function may not fully work if a service crashes before the svc_monitor_thread enters the ready state. Is it mandatory for the monitoring thread to enter the ready state before spawning SAF services? Regards, Vu -Original Message- From: Hans Nordebäck <mailto:hans.nordeb...@ericsson.com> Sent: Wednesday, February 27, 2019 8:23 PM To: Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au>; Gary Lee <mailto:gary@dektech.com.au> Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>; Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au> Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] Hi Vu, I discussed a bit with Anders, likely it should work if the socketpair is changed to socketpair(AF_UNIX, SOCK_DGRAM .. from SOCK_STREAM.
/BR Hans -----Original Message- From: Hans Nordebäck Sent: den 27 februari 2019 11:55 To: 'Vu Minh Nguyen' <mailto:vu.m.ngu...@dektech.com.au>; Gary Lee <mailto:gary@dektech.com.au> Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>; Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au> Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] Hi Vu, ack, code review only/Thanks HansN -Original Message- From: Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au> Sent: den 27 februari 2019 11:48 To: Hans Nordebäck <mailto:hans.nordeb...@ericsson.com>; Gary Lee <mailto:gary@dektech.com.au> Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>; Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au> Subject: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] There is a dependency b/w svc_monitor_thread and spawn_services. The coredump happens when spawn_services is executed while the thread has not yet started. In this case, data is sent to the pipe but no one consumed it. Later on, reading data from the pipe, will get unexpected data and crash the program. This patch ensures the order: svc_monitor_thread must be in ready state before spawn_services() is executed. 
--- src/nid/nodeinit.cc | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/nid/nodeinit.cc b/src/nid/nodeinit.cc index 5f15916b4..b4945b05c 100644 --- a/src/nid/nodeinit.cc +++ b/src/nid/nodeinit.cc @@ -134,6 +134,7 @@ static int start_monitor_svc(const char *svc); /* Data declarations for service monitoring */ static int svc_mon_fd = -1; static int next_svc_fds_slot = 0; +static bool svc_monitor_thread_running = false; struct SAFServices { const std::string fifo_dir = PKGLOCALSTATEDIR; @@ -1517,6 +1518,7 @@ void *svc_monitor_thread(void *fd) { next_svc_fds_slot++; while (true) { +svc_monitor_thread_running = true; unsigned rc = osaf_poll(fds, next_svc_fds_slot, -1); if (rc > 0) { // check if any monitored service has exit @@ -1655,6 +1657,15 @@ int main(int argc, char *argv[]) { exit(EXIT_FAILURE); } + // Waiting until svc_monitor_thread is up and in ready state. + // If spawn_services runs before the thread is in ready state, // + receive side of the pipe s_pair will get unexpected data and // may + crash the process. + while (svc_monitor_thread_running == false) { +usleep(100); + } + + LOG_NO("svc_monitor_thread is up and in ready state"); if (parse_nodeinit_conf(sbuf) != NCSCC_RC_SUCCESS) { LOG_ER("Failed to parse file %s. Exiting", sbuf); exit(EXIT_FAILURE); -- 2.19.2 ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]
Hi Vu, I discussed a bit with Anders, likely it should work if the socketpair is changed to socketpair(AF_UNIX, SOCK_DGRAM .. from SOCK_STREAM. /BR Hans -Original Message- From: Hans Nordebäck Sent: den 27 februari 2019 11:55 To: 'Vu Minh Nguyen' ; Gary Lee Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] Hi Vu, ack, code review only/Thanks HansN -Original Message- From: Vu Minh Nguyen Sent: den 27 februari 2019 11:48 To: Hans Nordebäck ; Gary Lee Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen Subject: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] There is a dependency b/w svc_monitor_thread and spawn_services. The coredump happens when spawn_services is executed while the thread has not yet started. In this case, data is sent to the pipe but no one consumed it. Later on, reading data from the pipe, will get unexpected data and crash the program. This patch ensures the order: svc_monitor_thread must be in ready state before spawn_services() is executed. --- src/nid/nodeinit.cc | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/nid/nodeinit.cc b/src/nid/nodeinit.cc index 5f15916b4..b4945b05c 100644 --- a/src/nid/nodeinit.cc +++ b/src/nid/nodeinit.cc @@ -134,6 +134,7 @@ static int start_monitor_svc(const char *svc); /* Data declarations for service monitoring */ static int svc_mon_fd = -1; static int next_svc_fds_slot = 0; +static bool svc_monitor_thread_running = false; struct SAFServices { const std::string fifo_dir = PKGLOCALSTATEDIR; @@ -1517,6 +1518,7 @@ void *svc_monitor_thread(void *fd) { next_svc_fds_slot++; while (true) { +svc_monitor_thread_running = true; unsigned rc = osaf_poll(fds, next_svc_fds_slot, -1); if (rc > 0) { // check if any monitored service has exit @@ -1655,6 +1657,15 @@ int main(int argc, char *argv[]) { exit(EXIT_FAILURE); } + // Waiting until svc_monitor_thread is up and in ready state. 
+ // If spawn_services runs before the thread is in ready state, // + receive side of the pipe s_pair will get unexpected data and // may + crash the process. + while (svc_monitor_thread_running == false) { +usleep(100); + } + + LOG_NO("svc_monitor_thread is up and in ready state"); if (parse_nodeinit_conf(sbuf) != NCSCC_RC_SUCCESS) { LOG_ER("Failed to parse file %s. Exiting", sbuf); exit(EXIT_FAILURE); -- 2.19.2 ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]
Hi Vu, ack, code review only/Thanks HansN -Original Message- From: Vu Minh Nguyen Sent: den 27 februari 2019 11:48 To: Hans Nordebäck ; Gary Lee Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen Subject: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] There is a dependency b/w svc_monitor_thread and spawn_services. The coredump happens when spawn_services is executed while the thread has not yet started. In this case, data is sent to the pipe but no one consumed it. Later on, reading data from the pipe, will get unexpected data and crash the program. This patch ensures the order: svc_monitor_thread must be in ready state before spawn_services() is executed. --- src/nid/nodeinit.cc | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/nid/nodeinit.cc b/src/nid/nodeinit.cc index 5f15916b4..b4945b05c 100644 --- a/src/nid/nodeinit.cc +++ b/src/nid/nodeinit.cc @@ -134,6 +134,7 @@ static int start_monitor_svc(const char *svc); /* Data declarations for service monitoring */ static int svc_mon_fd = -1; static int next_svc_fds_slot = 0; +static bool svc_monitor_thread_running = false; struct SAFServices { const std::string fifo_dir = PKGLOCALSTATEDIR; @@ -1517,6 +1518,7 @@ void *svc_monitor_thread(void *fd) { next_svc_fds_slot++; while (true) { +svc_monitor_thread_running = true; unsigned rc = osaf_poll(fds, next_svc_fds_slot, -1); if (rc > 0) { // check if any monitored service has exit @@ -1655,6 +1657,15 @@ int main(int argc, char *argv[]) { exit(EXIT_FAILURE); } + // Waiting until svc_monitor_thread is up and in ready state. + // If spawn_services runs before the thread is in ready state, // + receive side of the pipe s_pair will get unexpected data and // may + crash the process. + while (svc_monitor_thread_running == false) { +usleep(100); + } + + LOG_NO("svc_monitor_thread is up and in ready state"); if (parse_nodeinit_conf(sbuf) != NCSCC_RC_SUCCESS) { LOG_ER("Failed to parse file %s. 
Exiting", sbuf); exit(EXIT_FAILURE); -- 2.19.2 ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] base: fix warnings [#3006]
Hi Gary, ack, review only/Thanks HansN On 2/9/19 04:11, Gary Lee wrote: > fix warnings about unused variables and add SA_RESTART > --- > src/base/daemon.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/src/base/daemon.c b/src/base/daemon.c > index 2f7f37f..e24eaaa 100644 > --- a/src/base/daemon.c > +++ b/src/base/daemon.c > @@ -539,6 +539,9 @@ static void sigterm_handler(int signum, siginfo_t *info, > void *ptr) >*/ > static void sighup_handler(int signum, siginfo_t *info, void *ptr) > { > + (void)signum; > + (void)info; > + (void)ptr; > ncs_sel_obj_ind(_sel_obj); > } > > @@ -605,7 +608,7 @@ NCS_SEL_OBJ* daemon_sighup_install(int *hangup_fd) > > sigemptyset(_mask); > act.sa_sigaction = sighup_handler; > - act.sa_flags = SA_SIGINFO; > + act.sa_flags = SA_RESTART | SA_SIGINFO; > if (sigaction(SIGHUP, , NULL) < 0) { > syslog(LOG_ERR, "sigaction HUP failed: %s", strerror(errno)); > exit(EXIT_FAILURE); ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
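The daemon.c change combines two things: (void) casts to silence unused-parameter warnings in a three-argument SA_SIGINFO handler, and SA_RESTART so that many system calls interrupted by SIGHUP are restarted instead of failing with EINTR. A self-contained sketch of the same installation pattern; hangup_seen, SighupHandler and InstallSighupHandler are hypothetical names standing in for the ncs_sel_obj_ind() call in the real handler:

```cpp
#include <cassert>
#include <signal.h>

// Written only from the handler; volatile sig_atomic_t is the
// async-signal-safe type for such a flag.
static volatile sig_atomic_t hangup_seen = 0;

// Three-argument handler required by SA_SIGINFO. The (void) casts silence
// unused-parameter warnings, as in the daemon.c patch.
static void SighupHandler(int signum, siginfo_t* info, void* ptr) {
  (void)signum;
  (void)info;
  (void)ptr;
  hangup_seen = 1;
}

bool InstallSighupHandler() {
  struct sigaction act = {};
  sigemptyset(&act.sa_mask);
  act.sa_sigaction = SighupHandler;
  // SA_SIGINFO selects sa_sigaction; SA_RESTART asks the kernel to restart
  // many interrupted system calls (e.g. read) instead of failing with EINTR.
  act.sa_flags = SA_RESTART | SA_SIGINFO;
  return sigaction(SIGHUP, &act, nullptr) == 0;
}
```

SA_RESTART does not cover every call: on Linux, poll() and select() still fail with EINTR regardless of the flag, so loops around them keep their own EINTR handling.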
Re: [devel] [PATCH 1/2] fmd: improve failover response time [#3008]
Hi Gary, ack, review only/BR HansN On 2/19/19 05:10, Gary Lee wrote: > Improve failover response time if split brain prevention is enabled > but FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 0. > > Also, return immediately if node promotion fails to avoid > sending active role to RDA. > --- > src/fm/fmd/fm_rda.cc | 14 +- > 1 file changed, 9 insertions(+), 5 deletions(-) > > diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc > index 504757c..d3063ba 100644 > --- a/src/fm/fmd/fm_rda.cc > +++ b/src/fm/fmd/fm_rda.cc > @@ -88,17 +88,20 @@ uint32_t fm_rda_set_role(FM_CB *fm_cb, PCS_RDA_ROLE role) > { > > Consensus consensus_service; > if (consensus_service.IsEnabled() == true) { > -// Allow topology events to be processed first. The MDS thread may > -// be processing MDS down events and updating cluster_size concurrently. > -// We need cluster_size to be as accurate as possible, without waiting > -// too long for node down events. > -std::this_thread::sleep_for(std::chrono::seconds(4)); > +if (consensus_service.PrioritisePartitionSize() == true) { > + // Allow topology events to be processed first. The MDS thread may > + // be processing MDS down events and updating cluster_size > concurrently. > + // We need cluster_size to be as accurate as possible, without waiting > + // too long for node down events. > + std::this_thread::sleep_for(std::chrono::seconds(4)); > +} > > rc = consensus_service.PromoteThisNode(true, fm_cb->cluster_size); > if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) { > LOG_ER("Unable to set active controller in consensus service"); > opensaf_quick_reboot("Unable to set active controller " > "in consensus service"); > + return NCSCC_RC_FAILURE; > } else if (rc == SA_AIS_ERR_EXIST) { > // @todo if we don't reboot, we don't seem to recover from this. Can > we > // improve? > @@ -107,6 +110,7 @@ uint32_t fm_rda_set_role(FM_CB *fm_cb, PCS_RDA_ROLE role) > { > "cluster?"); > opensaf_quick_reboot("A controller is already active. 
We were > separated " > "from the cluster?"); > + return NCSCC_RC_FAILURE; > } > } > ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 2/2] rded: do not send SUCCESS to main thread [#3008]
Hi Gary, a question, why were the returns added? /BR HansN On 2/19/19 05:10, Gary Lee wrote: > do not send RDE_MSG_ACTIVE_PROMOTION_SUCCESS to > main thread if lock cannot be obtained > --- > src/rde/rded/role.cc | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc > index 06e93c6..3effc25 100644 > --- a/src/rde/rded/role.cc > +++ b/src/rde/rded/role.cc > @@ -114,6 +114,7 @@ void Role::PromoteNode(const uint64_t cluster_size, > LOG_ER("Unable to set active controller in consensus service"); > opensaf_quick_reboot("Unable to set active controller " > "in consensus service"); > +return; > } > > RDE_CONTROL_BLOCK* cb = rde_get_control_block(); > @@ -135,6 +136,7 @@ void Role::PromoteNode(const uint64_t cluster_size, > LOG_ER("Unable to set active controller in consensus service"); > opensaf_quick_reboot("Unable to set active controller in " > "consensus service"); > +return; > } > std::this_thread::sleep_for(std::chrono::seconds(1)); > } ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] clm: Incorrect encode/decode time_super [#3007]
Hi Thanh, ack, review only/Thanks HansN On 2/20/19 06:19, Thanh Nguyen wrote: > Change the encoding of time_super to use 64 bits instead of 32 bits. > --- > src/clm/clmd/clms_mds.cc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/clm/clmd/clms_mds.cc b/src/clm/clmd/clms_mds.cc > index 833d18c..5a77885 100644 > --- a/src/clm/clmd/clms_mds.cc > +++ b/src/clm/clmd/clms_mds.cc > @@ -542,7 +542,7 @@ static uint32_t clms_enc_track_cbk_msg(NCS_UBAID *uba, > CLMSV_MSG *msg) { > TRACE("p8 nullptr!!!"); > return 0; > } > - ncs_encode_32bit(, track->time_super); > + ncs_encode_64bit(, track->time_super); > ncs_enc_claim_space(uba, 8); > total_bytes += 8; > ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
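Before the fix, 8 bytes were claimed for time_super but only 32 bits were written. SaTimeT is a 64-bit nanosecond count, so any value above 2^32 ns (about 4.3 s) cannot survive a 32-bit encoding. A small sketch of the mismatch, assuming big-endian byte order and a receiver that reads the field back as 64 bits; Encode32, Encode64 and Decode64 are hypothetical helpers in the spirit of ncs_encode_32bit/ncs_encode_64bit, not the OpenSAF functions themselves:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical big-endian helpers; the real OpenSAF functions write into
// NCS_UBAID buffers.
void Encode32(uint8_t* p, uint32_t v) {
  for (int i = 3; i >= 0; --i) {
    p[i] = static_cast<uint8_t>(v & 0xff);
    v >>= 8;
  }
}

void Encode64(uint8_t* p, uint64_t v) {
  for (int i = 7; i >= 0; --i) {
    p[i] = static_cast<uint8_t>(v & 0xff);
    v >>= 8;
  }
}

// The receiver reads all 8 claimed bytes back as one 64-bit value.
uint64_t Decode64(const uint8_t* p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i) v = (v << 8) | p[i];
  return v;
}
```

Encoding 5 s (5000000000 ns) with Encode64 round-trips exactly; encoding it with Encode32 into the same 8-byte slot keeps only the low 32 bits, in the wrong half of the field, so the decoded value is wrong.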
Re: [devel] [PATCH 1/1] osaf: Call opensaf_quick_reboot if failed to set active role in consensus [#3001]
Hi Minh, ack, review only/Thanks HansN On 2/15/19 10:51, Minh Chau wrote: > --- > src/fm/fmd/fm_rda.cc | 4 ++-- > src/rde/rded/rde_main.cc | 8 +++- > src/rde/rded/role.cc | 8 > 3 files changed, 9 insertions(+), 11 deletions(-) > > diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc > index 028bfa3..0aa5a3d 100644 > --- a/src/fm/fmd/fm_rda.cc > +++ b/src/fm/fmd/fm_rda.cc > @@ -97,8 +97,8 @@ uint32_t fm_rda_set_role(FM_CB *fm_cb, PCS_RDA_ROLE role) { > rc = consensus_service.PromoteThisNode(true, fm_cb->cluster_size); > if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) { > LOG_ER("Unable to set active controller in consensus service"); > - opensaf_reboot(0, nullptr, > - "Unable to set active controller in consensus service"); > + opensaf_quick_reboot("Unable to set active controller" > + "in consensus service"); > } else if (rc == SA_AIS_ERR_EXIST) { > // @todo if we don't reboot, we don't seem to recover from this. Can > we > // improve? > diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc > index bb17133..3487f0b 100644 > --- a/src/rde/rded/rde_main.cc > +++ b/src/rde/rded/rde_main.cc > @@ -203,9 +203,8 @@ static void handle_mbx_event() { > if (state == Consensus::TakeoverState::ACCEPTED) { > LOG_NO("Accepted takeover request"); > if (consensus_service.IsRemoteFencingEnabled() == false) { > -opensaf_reboot(0, nullptr, > - "Another controller is taking over the active > role. " > - "Rebooting this node"); > +opensaf_quick_reboot("Another controller is taking over" > +"the active role. Rebooting this node"); > } > } else if (state == Consensus::TakeoverState::UNDEFINED) { > bool fencing_required = true; > @@ -233,8 +232,7 @@ static void handle_mbx_event() { > if (fencing_required == true) { > LOG_NO("Lost connectivity to consensus service"); > if (consensus_service.IsRemoteFencingEnabled() == false) { > -opensaf_reboot(0, nullptr, > - "Lost connectivity to consensus service. " > +opensaf_quick_reboot("Lost connectivity to consensus > service. 
" > "Rebooting this node"); > } > } > diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc > index 499f7c8..b2b9b49 100644 > --- a/src/rde/rded/role.cc > +++ b/src/rde/rded/role.cc > @@ -112,8 +112,8 @@ void Role::PromoteNode(const uint64_t cluster_size, > promotion_pending = true; > } else if (rc != SA_AIS_OK) { > LOG_ER("Unable to set active controller in consensus service"); > -opensaf_reboot(0, nullptr, > - "Unable to set active controller in consensus service"); > +opensaf_quick_reboot("Unable to set active controller" > +"in consensus service"); > } > > RDE_CONTROL_BLOCK* cb = rde_get_control_block(); > @@ -133,8 +133,8 @@ void Role::PromoteNode(const uint64_t cluster_size, > rc = consensus_service.PromoteThisNode(true, cluster_size); > if (rc == SA_AIS_ERR_EXIST) { > LOG_ER("Unable to set active controller in consensus service"); > -opensaf_reboot(0, nullptr, > - "Unable to set active controller in consensus > service"); > +opensaf_quick_reboot("Unable to set active controller in" > +"consensus service"); > } > std::this_thread::sleep_for(std::chrono::seconds(1)); > } ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
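Several of the new opensaf_quick_reboot() calls split their message across two adjacent string literals without a separating space, e.g. "Unable to set active controller" "in consensus service" and "Another controller is taking over" "the active role. ...". C and C++ join adjacent literals with nothing in between, so the logged reason reads "controllerin" and "overthe". The #3008 hunks in this archive use a trailing space instead ("Unable to set active controller " "in consensus service"). A minimal illustration:

```cpp
#include <cassert>
#include <cstring>

// Adjacent string literals are concatenated at compile time with nothing
// inserted between them, so a split message needs an explicit space.
const char kNoSpace[] = "Unable to set active controller"
                        "in consensus service";
const char kWithSpace[] = "Unable to set active controller "
                          "in consensus service";
```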
Re: [devel] [PATCH 1/1] osaf: do quick local node reboot when split network [#3001]
Hi Vu, see my comment below/Hans On 1/28/19 10:59, Vu Minh Nguyen wrote: > Hi Hans, > > Thanks for your comments. See my comment inline. Thanks > > Regards, Vu > >> -Original Message- >> From: Hans Nordebäck >> Sent: Monday, January 28, 2019 4:37 PM >> To: Hans Nordebäck ; Vu Minh Nguyen >> ; Gary Lee ; >> Minh Hon Chau >> Cc: opensaf-devel@lists.sourceforge.net >> Subject: Re: [devel] [PATCH 1/1] osaf: do quick local node reboot when > split >> network [#3001] >> >> Hi Vu, >> See one more comment below/Thanks HansN >> >> -Original Message- >> From: Hans Nordebäck >> Sent: den 28 januari 2019 10:15 >> To: Vu Minh Nguyen ; Gary Lee >> ; Minh Hon Chau >> Cc: opensaf-devel@lists.sourceforge.net >> Subject: Re: [devel] [PATCH 1/1] osaf: do quick local node reboot when > split >> network [#3001] >> >> Hi Vu, ack review only. Two comments below/Thanks HansN >> >> On 1/25/19 12:34, Vu Minh Nguyen wrote: >>> --- >>>scripts/opensaf_reboot | 33 +++-- >>>src/amf/amfd/ndproc.cc | 4 ++-- >>>src/base/ncssysf_def.h | 6 ++ >>>src/base/sysf_def.c | 10 ++ >>>src/fm/fmd/fm_main.cc| 6 +++--- >>>src/fm/fmd/fm_rda.cc | 5 ++--- >>>src/rde/rded/rde_main.cc | 6 ++ >>>7 files changed, 52 insertions(+), 18 deletions(-) >>> >>> diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot index >>> 727272e1d..2f7a7daeb 100644 >>> --- a/scripts/opensaf_reboot >>> +++ b/scripts/opensaf_reboot >>> @@ -31,7 +31,7 @@ export >> LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH >>># Node fencing: OpenSAF cannot reboot a node when there's no CLM >> node to >>># PLM EE mapping in the information model. In such cases rebooting >>> would be done -# through proprietary mechanisms, i.e. not through PLM. >>> Node_id is (the only >>> +# through proprietary mechanisms, i.e. not through PLM. Node_id is >>> +(the only >>># entity) at the disposal of such a mechanism. 
>>> >>>if [ -f "$pkgsysconfdir/fmd.conf" ]; then @@ -81,7 +81,6 @@ >>> opensaf_reboot_with_remote_fencing() >>>#if plm exists in the system,then the reboot is performed using the >> eename. >>>opensaf_reboot_with_plm() >>>{ >>> - >>>immadm -o 7 $ee_name >>>retval=$? >>>if [ $retval != 0 ]; then >>> @@ -96,12 +95,29 @@ opensaf_reboot_with_plm() >>>logger -t "opensaf_reboot" "abrupt restart failed for $ee_name: unable > to >> restart remote node" >>>exit 1 >>>fi >>> -fi >>> +fi >>>fi >>>#Note: Operation Id SA_PLM_ADMIN_RESTART=7 >>>#In the example the $ee_name would expand to (for eg:-) >> safEE=my_linux_os,safHE=64bitmulticore,safDomain=my_domain >>>} >>> >>> +# Force local node reboot as fast as possible >>> +quick_local_node_reboot() >>> +{ >>> +logger -t "opensaf_reboot" "Do quick local node reboot" >> [HansN] perhaps reuse the same logic as in sysf_def.c, i.e. use the sysrq > as >> fallback and use a short timeout > [Vu] > Forcing a node reboot by writing to /proc/sysrq-trigger is not allowed in > containers such as LXC > (as the container is immutable), therefore I provided two more alternatives below > in case the first attempt fails. [HansN] It is preferable to run the sysrq only if the reboot fails, i.e. the same logic as in sysf_def.c; see the SIGALRM and supervision_time. >>> + >>> +$icmd /bin/echo -n 'b' 2> /dev/null > /proc/sysrq-trigger >> [HansN] if not run as root, i.e. icmd is sudo, I think you need to use >> cmd: /bin/echo -n 'b' | $icmd tee /proc/sysrq-trigger , please check >> [HansN] or $icmd /bin/sh -c "/bin/echo -n 'b' 2> /dev/null > /proc/sysrq-trigger" > [Vu] Thanks for your suggestion. Will update accordingly. >>> +ret_code=$? >>> + >>> +if [ $ret_code != 0 ] && [ -x /bin/systemctl ]; then >>> +$icmd /bin/systemctl --force --force reboot >>> +ret_code=$? >>> +fi >>> + >>> +if [ $ret_code != 0 ]; then >>> +$icmd /sbin/reboot -f >>> +fi >>> +} >>> >>>if !
test -f "$NODE_ID_FILE"; then >>>logger -t "opensaf_reboot" "$NODE_ID_FILE doesnt exists,reboot failed "
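Hans suggests trying the normal reboot first under a short supervision time, and using sysrq only if that fails. The POSIX C sketch below shows that ordering. It polls `waitpid()` with a short sleep rather than using the `SIGALRM`-based supervision Hans points to in `sysf_def.c`. The names `run_supervised`, `run_first_success` and `RUN_TIMEOUT` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define RUN_TIMEOUT (-2)

/* Run `cmd` through /bin/sh and wait at most `timeout_sec` seconds.
 * Returns the command's exit status, RUN_TIMEOUT if it did not finish in
 * time (the child is then killed), or -1 if it could not be started or
 * was terminated by a signal. */
static int run_supervised(const char *cmd, int timeout_sec) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
    _exit(127); /* exec failed */
  }
  const struct timespec tick = {0, 50 * 1000 * 1000}; /* 50 ms */
  long waited_ms = 0;
  for (;;) {
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == pid) return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    if (r < 0) return -1;
    if (waited_ms >= (long)timeout_sec * 1000) {
      kill(pid, SIGKILL);
      waitpid(pid, &status, 0); /* reap the killed child */
      return RUN_TIMEOUT;
    }
    nanosleep(&tick, NULL);
    waited_ms += 50;
  }
}

/* Try each command in order and stop at the first one that exits with 0.
 * Returns the index of the command that succeeded, or -1 if none did. */
static int run_first_success(const char *const cmds[], size_t n,
                             int timeout_sec) {
  for (size_t i = 0; i < n; ++i) {
    if (run_supervised(cmds[i], timeout_sec) == 0) return (int)i;
  }
  return -1;
}
```

A caller following Hans's ordering would put `/sbin/reboot -f` first and the sysrq write last. When the script runs unprivileged through sudo, the whole redirection must run under sudo, e.g. `sudo /bin/sh -c "echo b > /proc/sysrq-trigger"`. With a plain `sudo echo b > /proc/sysrq-trigger`, the redirection is performed by the unprivileged calling shell.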
Re: [devel] [PATCH 1/1] osaf: do quick local node reboot when split network [#3001]
Hi Vu, See one more comment below/Thanks HansN -Original Message- From: Hans Nordebäck Sent: den 28 januari 2019 10:15 To: Vu Minh Nguyen ; Gary Lee ; Minh Hon Chau Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [devel] [PATCH 1/1] osaf: do quick local node reboot when split network [#3001] Hi Vu, ack review only. Two comments below/Thanks HansN On 1/25/19 12:34, Vu Minh Nguyen wrote: > --- > scripts/opensaf_reboot | 33 +++-- > src/amf/amfd/ndproc.cc | 4 ++-- > src/base/ncssysf_def.h | 6 ++ > src/base/sysf_def.c | 10 ++ > src/fm/fmd/fm_main.cc| 6 +++--- > src/fm/fmd/fm_rda.cc | 5 ++--- > src/rde/rded/rde_main.cc | 6 ++ > 7 files changed, 52 insertions(+), 18 deletions(-) > > diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot index > 727272e1d..2f7a7daeb 100644 > --- a/scripts/opensaf_reboot > +++ b/scripts/opensaf_reboot > @@ -31,7 +31,7 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH > > # Node fencing: OpenSAF cannot reboot a node when there's no CLM node to > # PLM EE mapping in the information model. In such cases rebooting > would be done -# through proprietary mechanisms, i.e. not through PLM. > Node_id is (the only > +# through proprietary mechanisms, i.e. not through PLM. Node_id is > +(the only > # entity) at the disposal of such a mechanism. > > if [ -f "$pkgsysconfdir/fmd.conf" ]; then @@ -81,7 +81,6 @@ > opensaf_reboot_with_remote_fencing() > #if plm exists in the system,then the reboot is performed using the eename. > opensaf_reboot_with_plm() > { > - > immadm -o 7 $ee_name > retval=$? 
> if [ $retval != 0 ]; then > @@ -96,12 +95,29 @@ opensaf_reboot_with_plm() > logger -t "opensaf_reboot" "abrupt restart > failed for $ee_name: unable to restart remote node" > exit 1 > fi > - fi > + fi > fi > #Note: Operation Id SA_PLM_ADMIN_RESTART=7 > #In the example the $ee_name would expand to (for eg:-) > safEE=my_linux_os,safHE=64bitmulticore,safDomain=my_domain > } > > +# Force local node reboot as fast as possible > +quick_local_node_reboot() > +{ > + logger -t "opensaf_reboot" "Do quick local node reboot" [HansN] perhaps reuse the same logic as in sysf_def.c, i.e. use the sysrq as fallback and use a short timeout > + > + $icmd /bin/echo -n 'b' 2> /dev/null > /proc/sysrq-trigger [HansN] if not run as root, i.e. icmd is sudo, I think you need to use cmd: /bin/echo -n 'b' | $icmd tee /proc/sysrq-trigger , please check [HansN] or $icmd /bin/sh -c "/bin/echo -n 'b' 2> /dev/null > /proc/sysrq-trigger" > + ret_code=$? > + > + if [ $ret_code != 0 ] && [ -x /bin/systemctl ]; then > + $icmd /bin/systemctl --force --force reboot > + ret_code=$? > + fi > + > + if [ $ret_code != 0 ]; then > + $icmd /sbin/reboot -f > + fi > +} > > if ! test -f "$NODE_ID_FILE"; then > logger -t "opensaf_reboot" "$NODE_ID_FILE doesnt exists,reboot failed " > @@ -112,8 +128,13 @@ temp_node_id=`cat "$NODE_ID_FILE"` > temp_node_id=`echo "$temp_node_id" |sed -e 's:^0[bBxX]::'| sed -e 's:^:0x:'` > self_node_id=`printf "%d" $temp_node_id` > > -# If clm cluster reboot requested argument one and two are set but > not used, argument 3 is set to 1, "safe reboot" request -if [ > "$safe_reboot" = 1 ]; then > +# If no argument is provided, forcing node reboot immediately without > +log # flushing, process terminating, disk un-mounting. > +# If clm cluster reboot requested argument one and two are set but > +not used, # argument 3 is set to 1, "safe reboot" request. 
> +if [ "$#" = 0 ]; then > + quick_local_node_reboot > +elif [ "$safe_reboot" = 1 ]; then > opensaf_safe_reboot > else > # A node ID of zero(0) means an order to reboot the local node @@ > -165,7 +186,7 @@ else > logger -t "opensaf_reboot" "Not > rebooting remote node $ee_name as it is not in INSTANTIATED state" > elif [ $plm_node_state != 2 ]; then > opensaf_reboot_with_plm > - else > + else > logger -t "opensaf_reboot" "Not > rebooting remote node $ee_name as it is already in locked state"
Re: [devel] [PATCH 1/2] osaf: update etcd v2 plugin [#3003]
ack, review only/Thanks HansN On 1/24/19 02:17, Gary Lee wrote: > 'etcdctl watch' will return if connection to the etcd server is lost. > If that occurs, send a 'fake' takeover request to rded so rded > will reboot the node. This is in alignment with the etcd v3 plugin. > --- > src/osaf/consensus/plugins/etcd.plugin | 29 + > 1 file changed, 21 insertions(+), 8 deletions(-) > > diff --git a/src/osaf/consensus/plugins/etcd.plugin > b/src/osaf/consensus/plugins/etcd.plugin > index f62cc89..f88a7e7 100644 > --- a/src/osaf/consensus/plugins/etcd.plugin > +++ b/src/osaf/consensus/plugins/etcd.plugin > @@ -17,7 +17,9 @@ > # backward compatible. This plugin may need to be adapted. > > readonly keyname="opensaf_consensus_lock" > +readonly takeover_request="takeover_request" > readonly directory="/opensaf/" > +readonly node_name_file="/etc/opensaf/node_name" > readonly etcd_options="--no-sync" > readonly etcd_timeout="5s" > > @@ -27,7 +29,8 @@ readonly etcd_timeout="5s" > # $1 - > # returns: > # 0 - success, is echoed to stdout > -# non-zero - failure > +# 1 - invalid param > +# other - failure > get() { > readonly key="$1" > > @@ -36,7 +39,7 @@ get() { > echo "$value" > return 0 > else > -return 1 > +return 2 > fi > } > > @@ -73,7 +76,8 @@ setkey() { > # returns: > # 0 - success > # 1 - already exists > -# 2 or above - other failure > +# 2 - invalid param > +# 3 or above - other failure > create_key() { > readonly key="$1" > readonly value="$2" > @@ -90,7 +94,7 @@ create_key() { > fi > fi > > - return 2 > + return 3 > } > > # set > @@ -103,7 +107,8 @@ create_key() { > # $4 - > # returns: > # 0 - success > -# non-zero - failure > +# 1 - invalid param > +# other - failure > setkey_match_prev() { > readonly key="$1" > readonly value="$2" > @@ -115,7 +120,7 @@ setkey_match_prev() { > then > return 0 > else > -return 1 > +return 2 > fi > } > > @@ -158,7 +163,8 @@ lock() { > return 0 > fi > > - if current_owner=$(etcdctl get "$directory$keyname") > + if current_owner=$(etcdctl 
$etcd_options --timeout $etcd_timeout \ > +get "$directory$keyname") > then > # see if we already hold the lock > if [ "$current_owner" = "$owner" ]; then > @@ -252,7 +258,14 @@ watch() { > echo "$value" > return 0 > else > -return 1 > +# etcd down? > +if [ "$key" = "$takeover_request" ]; then > + hostname=`cat $node_name_file` > + echo "$hostname SC-0 1000 UNDEFINED" > + return 0 > +else > + return 1 > +fi > fi > } >
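The patch documents a three-way exit-code convention for the plugin's `get`: `0` means success, `1` means invalid param, and anything else means failure. A C caller could map a plugin's raw exit status onto those outcomes as in the sketch below. The enum and function names are hypothetical, and this is not the OpenSAF consensus code.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Outcomes of a plugin "get" under the documented convention. */
enum plugin_result { PLUGIN_OK, PLUGIN_INVALID_PARAM, PLUGIN_FAILURE };

/* Map a raw status from system()/waitpid() to a plugin_result. A plugin
 * that could not be run or was killed by a signal counts as a failure. */
static enum plugin_result classify_plugin_status(int raw_status) {
  if (raw_status == -1 || !WIFEXITED(raw_status)) return PLUGIN_FAILURE;
  switch (WEXITSTATUS(raw_status)) {
    case 0:
      return PLUGIN_OK;
    case 1:
      return PLUGIN_INVALID_PARAM;
    default:
      return PLUGIN_FAILURE;
  }
}
```

Keeping `1` distinct from `2` and above lets a caller separate "the store answered but the request was bad" from "the store could not be reached". The etcd3 `watch` in this series relies on the same split.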
Re: [devel] [PATCH 2/2] osaf: update sample plugin [#3003]
ack, code review only/Thanks HansN On 1/24/19 02:17, Gary Lee wrote: > --- > src/osaf/consensus/plugins/sample.plugin | 20 +++- > 1 file changed, 15 insertions(+), 5 deletions(-) > > diff --git a/src/osaf/consensus/plugins/sample.plugin > b/src/osaf/consensus/plugins/sample.plugin > index fc4c54c..cadb9e0 100644 > --- a/src/osaf/consensus/plugins/sample.plugin > +++ b/src/osaf/consensus/plugins/sample.plugin > @@ -17,6 +17,8 @@ > # backward compatible. > > readonly keyname="opensaf_consensus_lock" > +readonly takeover_request="takeover_request" > +readonly node_name_file="/etc/opensaf/node_name" > > # get > # retrieve of from key-value store > @@ -24,7 +26,8 @@ readonly keyname="opensaf_consensus_lock" > # $1 - > # returns: > # 0 - success, is echoed to stdout > -# non-zero - failure > +# 1 - invalid param > +# other - failure > get() { > readonly key="$1" > ... > @@ -56,7 +59,8 @@ setkey() { > # returns: > # 0 - success > # 1 - already exists > -# 2 or above - other failure > +# 2 - invalid param > +# 3 or above - other failure > create_key() { > readonly key="$1" > readonly value="$2" > @@ -74,7 +78,8 @@ create_key() { > # $4 - > # returns: > # 0 - success > -# non-zero - failure > +# 1 - invalid param > +# other - failure > setkey_match_prev() { > readonly key="$1" > readonly value="$2" > @@ -101,7 +106,8 @@ erase() { > # $2 - , will automatically unlock after seconds > # returns: > # 0 - success > -# non-zero - failure > +# 1 - the lock is owned by someone else > +# 2 or above - other failure > lock() { > readonly owner="$1" > readonly timeout="$2" > @@ -129,7 +135,7 @@ lock_owner() { > # returns: > # 0 - success > # 1 - the lock is owned by someone else > -# 2 or above - other failure# > +# 2 or above - other failure > unlock() { > readonly owner="$1" > readonly forced=${2:-false} > @@ -146,6 +152,10 @@ unlock() { > watch() { > readonly key="$1" > .. 
> + # if is $takeover_request and we have lost connectivity to the > + # consensus service, a fake takeover_request can be returned to force > + # rded to fence this node. Eg: > + # "$hostname SC-0 1000 UNDEFINED" > } > > # argument parsing
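The fake request the plugins print, `$hostname SC-0 1000 UNDEFINED`, is a line of four whitespace-separated fields. The sketch below assumes those fields are the current owner, the proposed owner, the proposed network size and the state. The field and function names are assumptions, so check the consensus service code for the authoritative layout. Under that assumption, a C consumer could split the line like this:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of a takeover request line such as "SC-1 SC-0 1000 UNDEFINED".
 * The field names are assumptions made for this sketch. */
struct takeover_request {
  char current_owner[64];
  char proposed_owner[64];
  unsigned long proposed_network_size;
  char state[16];
};

/* Parse a whitespace-separated takeover request. Returns 0 on success and
 * -1 if fewer than four fields could be read. */
static int parse_takeover_request(const char *line,
                                  struct takeover_request *req) {
  int n = sscanf(line, "%63s %63s %lu %15s", req->current_owner,
                 req->proposed_owner, &req->proposed_network_size,
                 req->state);
  return n == 4 ? 0 : -1;
}

/* The plugins emit state UNDEFINED when connectivity to the consensus
 * service is lost, which is meant to make rded fence this node. */
static int is_connectivity_loss(const struct takeover_request *req) {
  return strcmp(req->state, "UNDEFINED") == 0;
}
```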
Re: [devel] [PATCH 5/5] rded: add relaxed node promotion feature [#2996]
ack, review only, one question below/Thanks HansN On 1/21/19 04:52, Gary Lee wrote: > Allow promotion of node to active at cluster startup, even if the > consensus service is unavailable, if the peer SC can be seen. > > During normal cluster operation, if the consensus service becomes > unavailable but the peer SC can still be seen, allow the existing > active SC to remain active. > > A new NCSMDS_SVC_ID_RDE_DISCOVERY service ID is exported by rded. > This is installed as soon as rded is started, unlike > NCSMDS_SVC_ID_RDE which is only installed when it becomes > a candidate for election. > --- > src/mds/mds_papi.h | 1 + > src/rde/rded/rde_cb.h| 12 +- > src/rde/rded/rde_main.cc | 71 +++ > src/rde/rded/rde_mds.cc | 94 -- > src/rde/rded/role.cc | 97 > +++- > src/rde/rded/role.h | 4 +- > 6 files changed, 256 insertions(+), 23 deletions(-) > > diff --git a/src/mds/mds_papi.h b/src/mds/mds_papi.h > index 03d755d..7cd543c 100644 > --- a/src/mds/mds_papi.h > +++ b/src/mds/mds_papi.h > @@ -191,6 +191,7 @@ typedef enum ncsmds_svc_id { > NCSMDS_SVC_ID_PLMS = 37, > NCSMDS_SVC_ID_PLMS_HRB = 38, > NCSMDS_SVC_ID_PLMA = 39, > + NCSMDS_SVC_ID_RDE_DISCOVERY = 40, > NCSMDS_SVC_ID_NCSMAX, /* This mnemonic always last */ > > /* The range below is for OpenSAF internal use */ > diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h > index d3f5a24..9a0919c 100644 > --- a/src/rde/rded/rde_cb.h > +++ b/src/rde/rded/rde_cb.h > @@ -34,6 +34,9 @@ >** >*/ > > +enum class State {kNotActive = 0, kNotActiveSeenPeer, kActiveElected, > + kActiveElectedSeenPeer, kActiveFailover}; > + > struct RDE_CONTROL_BLOCK { > SYSF_MBX mbx; > NCSCONTEXT task_handle; > @@ -43,6 +46,9 @@ struct RDE_CONTROL_BLOCK { > bool monitor_lock_thread_running{false}; > bool monitor_takeover_req_thread_running{false}; > std::set cluster_members{}; > + // used for discovering peer controllers, regardless of their role > + std::set peer_controllers{}; > + State state{State::kNotActive}; > }; > > enum RDE_MSG_TYPE { > @@ 
-54,7 +60,9 @@ enum RDE_MSG_TYPE { > RDE_MSG_NODE_UP = 6, > RDE_MSG_NODE_DOWN = 7, > RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8, > - RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9 > + RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9, > + RDE_MSG_CONTROLLER_UP = 10, > + RDE_MSG_CONTROLLER_DOWN = 11 > }; > > struct rde_peer_info { > @@ -82,7 +90,9 @@ extern const char *rde_msg_name[]; > > extern RDE_CONTROL_BLOCK *rde_get_control_block(); > extern uint32_t rde_mds_register(); > +extern uint32_t rde_discovery_mds_register(); > extern uint32_t rde_mds_unregister(); > +extern uint32_t rde_discovery_mds_unregister(); > extern uint32_t rde_mds_send(rde_msg *msg, MDS_DEST to_dest); > extern uint32_t rde_set_role(PCS_RDA_ROLE role); > > diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc > index e5813e4..2d9aa51 100644 > --- a/src/rde/rded/rde_main.cc > +++ b/src/rde/rded/rde_main.cc > @@ -39,6 +39,7 @@ > #include "osaf/consensus/consensus.h" > #include "rde/rded/rde_cb.h" > #include "rde/rded/role.h" > +#include "rde_cb.h" > > #define RDA_MAX_CLIENTS 32 > > @@ -56,7 +57,9 @@ const char *rde_msg_name[] = {"-", > "RDE_MSG_NODE_UP(6)", > "RDE_MSG_NODE_DOWN(7)", > "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)", > - "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)"}; > + "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)", > + "RDE_MSG_CONTROLLER_UP(10)", > + "RDE_MSG_CONTROLLER_DOWN(11)"}; > > static RDE_CONTROL_BLOCK _rde_cb; > static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb; > @@ -157,6 +160,23 @@ static void handle_mbx_event() { > rde_cb->cluster_members.erase(msg->fr_node_id); > TRACE("cluster_size %zu", rde_cb->cluster_members.size()); > break; > +case RDE_MSG_CONTROLLER_UP: > + if (msg->fr_node_id != own_node_id) { > +rde_cb->peer_controllers.insert(msg->fr_node_id); > +TRACE("peer_controllers: size %zu", rde_cb->peer_controllers.size()); > +if (rde_cb->state == State::kNotActive) { > + TRACE("Set state to kNotActiveSeenPeer"); > + rde_cb->state = State::kNotActiveSeenPeer; > +} else if (rde_cb->state == 
State::kActiveElected) { > + TRACE("Set state to kActiveElectedSeenPeer"); > + rde_cb->state = State::kActiveElectedSeenPeer; > +} > + } > + break; > +case RDE_MSG_CONTROLLER_DOWN: > + rde_cb->peer_controllers.erase(msg->fr_node_id); > + TRACE("peer_controllers: size %zu", rde_cb->peer_controllers.size()); > + break; > case
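The `RDE_MSG_CONTROLLER_UP` handler moves between the new states only when a controller other than this node comes up. The sketch below is a C rendering of just that transition. The patch itself uses a C++ `enum class State`, and the node-ID type and function name here are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the State values added to rde_cb.h in this patch. */
enum rde_state {
  kNotActive = 0,
  kNotActiveSeenPeer,
  kActiveElected,
  kActiveElectedSeenPeer,
  kActiveFailover
};

/* Transition applied on RDE_MSG_CONTROLLER_UP, following the rde_main.cc
 * hunk: ignore our own node, otherwise record that a peer SC was seen.
 * All other states are left unchanged. */
static enum rde_state on_controller_up(enum rde_state state,
                                       uint32_t from_node_id,
                                       uint32_t own_node_id) {
  if (from_node_id == own_node_id) return state;
  switch (state) {
    case kNotActive:
      return kNotActiveSeenPeer;
    case kActiveElected:
      return kActiveElectedSeenPeer;
    default:
      return state;
  }
}
```

In the visible part of the patch, `RDE_MSG_CONTROLLER_DOWN` only removes the node from `peer_controllers` and does not change `state`.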
Re: [devel] [PATCH 2/5] fmd: add configuration parameters [#2996]
ack, review only/Thanks HansN On 1/21/19 04:52, Gary Lee wrote: > Add parameters FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE and > FMS_RELAXED_NODE_PROMOTION. > --- > src/fm/fmd/fmd.conf | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/src/fm/fmd/fmd.conf b/src/fm/fmd/fmd.conf > index 9a106bf..209e484 100644 > --- a/src/fm/fmd/fmd.conf > +++ b/src/fm/fmd/fmd.conf > @@ -30,6 +30,23 @@ export FMS_TAKEOVER_REQUEST_VALID_TIME=20 > # Full path to key-value store plugin > #export FMS_KEYVALUE_STORE_PLUGIN_CMD= > > +# In the event of SCs being split into network partitions, we can try to make > +# the active SC reside in the largest network partition. If it is preferable > +# to keep the current SC active, then set this to 0 > +# Default is 1 > +#export FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE=1 > + > +# Default behaviour is not to allow promotion of this node to Active > +# unless a lock can be obtained, if split brain prevention is enabled. > +# Uncomment the next line to allow promotion of this node at cluster startup, > +# if a peer SC can be seen and we have a lower node ID, in the event the > +# consensus service is not available. > +# Also if the consensus service is down, but a peer SC can be seen, > +# then an active SC may remain active. > +# This mode should not be used together with the roaming SC feature > +# Default is 0 > +#export FMS_RELAXED_NODE_PROMOTION=0 > + > # FM will supervise transitions to the ACTIVE role when this variable is > set to > # a non-zero value. The value is the time in the unit of 10 ms to wait for a > # role change to ACTIVE to take effect. If AMF has not give FM an active
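The `export` lines suggest that `fmd.conf` is sourced as a shell file, so the new settings would reach the daemon through its environment. The sketch below reads such a setting, falling back to the documented defaults (`1` for `FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE`, `0` for `FMS_RELAXED_NODE_PROMOTION`). It is a minimal sketch under that assumption, not fmd's actual parsing code, and the helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Read an integer setting from the environment, falling back to `def`
 * when the variable is unset, empty, or not a plain decimal integer. */
static long env_long_or_default(const char *name, long def) {
  const char *s = getenv(name);
  if (s == NULL || *s == '\0') return def;
  char *end = NULL;
  errno = 0;
  long v = strtol(s, &end, 10);
  if (errno != 0 || *end != '\0') return def;
  return v;
}
```

Falling back to the default on malformed input keeps a typo in the configuration file from silently enabling relaxed promotion.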
Re: [devel] [PATCH 1/5] osaf: update etcd3 to poll instead of watch [#2996]
ack, review only/Thanks HansN On 1/21/19 04:52, Gary Lee wrote: > The 'watch' command does not return if the etcd server goes down. > We need to poll the etcd server to properly check we still have > connectivity to the etcd server. > --- > src/osaf/consensus/plugins/etcd3.plugin | 50 > ++--- > 1 file changed, 40 insertions(+), 10 deletions(-) > > diff --git a/src/osaf/consensus/plugins/etcd3.plugin > b/src/osaf/consensus/plugins/etcd3.plugin > index b3814c9..4998df0 100644 > --- a/src/osaf/consensus/plugins/etcd3.plugin > +++ b/src/osaf/consensus/plugins/etcd3.plugin > @@ -17,9 +17,12 @@ > # backward compatible. This plugin may need to be adapted. > > readonly keyname="opensaf_consensus_lock" > +readonly takeover_request="takeover_request" > +readonly node_name_file="/etc/opensaf/node_name" > readonly directory="/opensaf/" > readonly etcd_options="" > -readonly etcd_timeout="10s" > +readonly etcd_timeout="3s" > +readonly heartbeat_interval=2 > > export ETCDCTL_API=3 > > @@ -29,7 +32,8 @@ export ETCDCTL_API=3 > # $1 - > # returns: > # 0 - success, is echoed to stdout > -# non-zero - failure > +# 1 - invalid param > +# other - failure > get() { > readonly key="$1" > > @@ -51,7 +55,7 @@ get() { > return 1 > fi > else > -return 1 > +return 2 > fi > } > > @@ -101,7 +105,8 @@ setkey() { > # returns: > # 0 - success > # 1 - already exists > -# 2 or above - other failure > +# 2 - invalid param > +# 3 or above - other failure > create_key() { > readonly key="$1" > readonly value="$2" > @@ -114,7 +119,7 @@ create_key() { > lease_id=$(echo $output | awk '{print $2}') > lease_param="--lease="$lease_id"" > else > - return 2 > + return 3 > fi > else > lease_param="" > @@ -135,7 +140,7 @@ create_key() { > then > return 1 > else > -return 2 > +return 3 > fi > } > > @@ -149,6 +154,7 @@ create_key() { > # $4 - > # returns: > # 0 - success > +# 1 - invalid param > # non-zero - failure > setkey_match_prev() { > readonly key="$1" > @@ -326,10 +332,34 @@ unlock() { > # non-zero - 
failure > watch() { > readonly watch_key="$1" > - etcdctl $etcd_options --dial-timeout $etcd_timeout \ > -watch "$directory$watch_key" | grep -m0 \"\" 2>&1 > - get "$watch_key" > - return 0 > + > + # get baseline > + orig_value=$(get "$watch_key") > + result=$? > + > + if [ "$result" -le "1" ]; then > +while true > +do > + sleep $heartbeat_interval > + current_value=$(get "$watch_key") > + result=$? > + if [ "$result" -gt "1" ]; then > +# etcd down? > +if [ "$watch_key" == "$takeover_request" ]; then > + hostname=`cat $node_name_file` > + echo "$hostname SC-0 1000 UNDEFINED" > + return 0 > +else > + return 1 > +fi > + elif [ "$orig_value" != "$current_value" ]; then > +echo $current_value > +return 0 > + fi > +done > + fi > + > + return 1 > } > > # argument parsing
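The new `watch` replaces the blocking `etcdctl watch` with a baseline read followed by periodic `get` calls. A result above `1` is treated as lost connectivity. The sketch below expresses the same control flow in C, with key-value access abstracted behind a callback so the loop can be exercised without an etcd server. All names are hypothetical, and the fake request string is copied from the plugin.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Getter following the plugin convention: 0 = value found, 1 = store
 * reachable but no usable value, >1 = store could not be reached. */
typedef int (*kv_get_fn)(const char *key, char *out, size_t len, void *ctx);

/* Poll `key` every `interval_sec` seconds until its value differs from the
 * baseline; the new value is written to `out` and 0 is returned. If the
 * store becomes unreachable while watching "takeover_request", a fake
 * UNDEFINED request is written instead. Returns 1 on any other failure. */
static int poll_watch(const char *key, kv_get_fn get, void *ctx,
                      unsigned interval_sec, const char *hostname,
                      char *out, size_t len) {
  char orig[256] = "", cur[256] = "";
  if (get(key, orig, sizeof orig, ctx) > 1) return 1; /* no baseline */
  for (;;) {
    sleep(interval_sec);
    cur[0] = '\0';
    int rc = get(key, cur, sizeof cur, ctx);
    if (rc > 1) {
      if (strcmp(key, "takeover_request") == 0) {
        snprintf(out, len, "%s SC-0 1000 UNDEFINED", hostname);
        return 0;
      }
      return 1;
    }
    if (strcmp(orig, cur) != 0) {
      snprintf(out, len, "%s", cur);
      return 0;
    }
  }
}

/* Scripted getter for trying poll_watch without a real key-value store. */
struct scripted_kv {
  const char *const *values; /* value returned by each successive call */
  const int *codes;          /* exit code returned by each call */
  size_t calls;
};

static int scripted_get(const char *key, char *out, size_t len, void *ctx) {
  (void)key;
  struct scripted_kv *s = ctx;
  size_t i = s->calls++;
  snprintf(out, len, "%s", s->values[i]);
  return s->codes[i];
}
```

Polling bounds detection time to roughly `heartbeat_interval` plus the `get` timeout. This matters because, according to the commit message, `etcdctl watch` may never return if the server goes down.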
Re: [devel] [PATCH 1/1] amfd: Only start clm track for 2N Opensaf SU in failover [#2980]
Hi Minh, ack, review only/Thanks HansN On 12/10/18 07:00, Minh Chau wrote: > --- > src/amf/amfd/sg_2n_fsm.cc | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/amf/amfd/sg_2n_fsm.cc b/src/amf/amfd/sg_2n_fsm.cc > index a218786..91ffc63 100644 > --- a/src/amf/amfd/sg_2n_fsm.cc > +++ b/src/amf/amfd/sg_2n_fsm.cc > @@ -1784,7 +1784,8 @@ uint32_t SG_2N::susi_success_sg_realign(AVD_SU *su, > AVD_SU_SI_REL *susi, > } > > if ((state == SA_AMF_HA_ACTIVE) && > - (cb->node_id_avd == su->su_on_node->node_info.nodeId)) { > + (cb->node_id_avd == su->su_on_node->node_info.nodeId) && > + (su->sg_of_su->sg_ncs_spec == true)) { > /* This is as a result of failover, start CLM tracking*/ > if (avd_clm_track_start(cb) == SA_AIS_ERR_TRY_AGAIN) > Fifo::queue(new ClmTrackStart());
Re: [devel] [PATCH 1/1] amfd: Update the assignment counters after restore absent assignment from imm [#2977]
Hi Minh, ack, code review only/Thanks HansN On 12/3/18 04:29, Minh Chau wrote: > AMF performs headless recovery by syncing the assignments from AMFND(s) and > re-create them in AMFD's db and IMM. Next step, AMFD compares the assignment > objects from IMM and from AMFND(s) to figure out the on-going assignments > that have been left over before headless and failover them, the assignments > states/counters are also restored in this step. If all payloads come from > headless without occurence of network split (legacy headless), IMM db in all > payloads should be consistent, thus AMFD creates the IMM assignments normally > without any problem. But if the payloads come from headless and there was a > network split before, IMM appears often busy at the time AMFD creates the > synced assignments in IMM. The assignment object creation is pending in the > queue and executed later, but AMFD has missed to restore the assignment states > and counters of the synced assignments at the time comparision between IMM > and AMFND(s). > Also in legacy headless, when both SCs go down, the assignment objects are > still in IMM. Even IMM is busy, AMFD has not missed the counter updates. > > The patch moves the counter update after restoring absent assignment from IMM. 
> --- > src/amf/amfd/siass.cc | 67 > +-- > 1 file changed, 38 insertions(+), 29 deletions(-) > > diff --git a/src/amf/amfd/siass.cc b/src/amf/amfd/siass.cc > index ffde7b1..8a2d217 100644 > --- a/src/amf/amfd/siass.cc > +++ b/src/amf/amfd/siass.cc > @@ -264,14 +264,48 @@ void avd_susi_read_headless_cached_rta(AVD_CL_CB *cb) { > } > > #endif > +} else { // For ABSENT SUSI > + TRACE("Check absent SUSI, ha_state:'%u', fsm_state:'%u'", imm_ha_state, > +imm_susi_fsm); > + if (avd_susi_validate_absent_assignment(su, si, > + imm_ha_state, imm_susi_fsm) == false) { > +avd_saImmOiRtObjectDelete(Amf::to_string()); > +continue; > + } > + absent_susi = avd_susi_create(avd_cb, si, su, imm_ha_state, false, > + AVSV_SUSI_ACT_BASE); > + // Restore the fsm of this absent SUSI, which is used to determine > + // whether a SU should be added in SG's SUOperationList > + // Memorize it in temporary var @absent > + // The fsm of this SUSI will be changed to AVD_SU_SI_STATE_ABSENT > + // after restoring SUOperationList > + absent_susi->fsm = imm_susi_fsm; > + absent_susi->absent = true; > + if (absent_susi->si->saAmfSIAdminState == SA_AMF_ADMIN_LOCKED || > + absent_susi->si->saAmfSIAdminState == SA_AMF_ADMIN_SHUTTING_DOWN) { > +if (absent_susi->fsm == AVD_SU_SI_STATE_MODIFY && > +(absent_susi->state == SA_AMF_HA_QUIESCED || > +absent_susi->state == SA_AMF_HA_QUIESCING)) { > + m_AVD_SET_SG_ADMIN_SI(cb, si); > +} > + } > +} > + } > + (void)immutil_saImmOmSearchFinalize(searchHandle); > + > + // Update all PRESENT SUSI, in case that a SUSI is missed to update because > + // it is not present in IMM > + for (const auto : *su_db) { > +AVD_SU *su = value.second; > +susi = su->list_of_susi; > +while (susi != nullptr && susi->absent == false) { > + AVD_SI *si = susi->si; > // validate SUSI assignments that are over assigned > if (avd_susi_validate_excessive_assignment(susi) == true) { > susi->fsm = AVD_SU_SI_STATE_EXCESSIVE; > } > - > // Checkpoint to add this SUSI > 
m_AVSV_SEND_CKPT_UPDT_ASYNC_ADD(avd_cb, susi, AVSV_CKPT_AVD_SI_ASS); > - > // restore assignment counter > if (susi->fsm == AVD_SU_SI_STATE_ASGN || > susi->fsm == AVD_SU_SI_STATE_ASGND || > @@ -296,36 +330,11 @@ void avd_susi_read_headless_cached_rta(AVD_CL_CB *cb) { > // only restore if not done > if (susi->su->su_on_node->admin_ng == nullptr) > avd_ng_restore_headless_states(cb, susi); > -} else { // For ABSENT SUSI > - TRACE("Check absent SUSI, ha_state:'%u', fsm_state:'%u'", imm_ha_state, > -imm_susi_fsm); > - if (avd_susi_validate_absent_assignment(su, si, > - imm_ha_state, imm_susi_fsm) == false) { > -avd_saImmOiRtObjectDelete(Amf::to_string()); > -continue; > - } > - absent_susi = avd_susi_create(avd_cb, si, su, imm_ha_state, false, > - AVSV_SUSI_ACT_BASE); > - // Restore the fsm of this absent SUSI, which is used to determine > - // whether a SU should be added in SG's SUOperationList > - // Memorize it in temporary var @absent > - // The fsm of this SUSI will be changed to AVD_SU_SI_STATE_ABSENT > - // after restoring SUOperationList > - absent_susi->fsm = imm_susi_fsm; > - absent_susi->absent = true; > - if (absent_susi->si->saAmfSIAdminState == SA_AMF_ADMIN_LOCKED || > - absent_susi->si->saAmfSIAdminState ==
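The commit message describes a two-pass restore. First, absent assignments are recreated from IMM. Then every assignment is walked to update counters, so an assignment whose IMM object creation is still queued is not missed. The C sketch below is heavily simplified to show only that ordering. The record type and function names are hypothetical and are not the amfd data structures. The sketch skips absent entries wherever they appear in the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified assignment record: only the fields this sketch needs. */
struct assignment {
  const char *si_name;
  bool absent; /* restored from IMM but not reported by any node */
};

static bool contains(const struct assignment *db, size_t n, const char *si) {
  for (size_t i = 0; i < n; ++i)
    if (strcmp(db[i].si_name, si) == 0) return true;
  return false;
}

/* Pass 1: add every IMM assignment that no node reported as an absent
 * entry. `db` holds the n synced entries and must have room for the IMM
 * ones. Returns the new entry count. */
static size_t restore_absent(struct assignment *db, size_t n,
                             const char *const imm[], size_t n_imm) {
  for (size_t i = 0; i < n_imm; ++i) {
    if (!contains(db, n, imm[i])) {
      db[n].si_name = imm[i];
      db[n].absent = true;
      ++n;
    }
  }
  return n;
}

/* Pass 2: count present assignments over the whole db, so a synced entry
 * that is missing from IMM is still counted. */
static int count_present(const struct assignment *db, size_t n) {
  int count = 0;
  for (size_t i = 0; i < n; ++i)
    if (!db[i].absent) ++count;
  return count;
}
```

Suppose the nodes report `si1` and `si2`, but IMM holds only `si1` and `si3`. A counter updated only while iterating IMM objects would count `si1` and miss `si2`. The second pass counts both.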
Re: [devel] [PATCH 0/4] Review Request for clm: add new test cases in clm apitest [#2914]
Hi Mohan, ack, review only/Thanks HansN On 11/6/18 11:19, Mohan Kanakam wrote: > Summary: clm: add new test case of API saClmClusterNotificationFree_4() of > apitest [#2914] > Review request for Ticket(s): 2914-1 > Peer Reviewer(s):Anders, Hans, Ravi > Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** > Affected branch(es): develop > Development branch: ticket-2914-1 > Base revision: f8a6848a1cdbff0b518c3db951e4689e260226c7 > Personal repository: git://git.code.sf.net/u/mohan-hasoln/review > > > Impacted area Impact y/n > > Docsn > Build systemn > RPM/packaging n > Configuration files n > Startup scripts n > SAF servicesn > OpenSAF servicesn > Core libraries n > Samples n > Tests y > Other n > > > Comments (indicate scope for each "y" above): > - > *** EXPLAIN/COMMENT THE PATCH SERIES HERE *** > > revision 9a6bf69135567cdceded5cd89b0a22138b7c116a > Author: Mohan Kanakam > Date: Tue, 6 Nov 2018 15:41:15 +0530 > > clm: add new test case in saClmResponse_4() of apitest [#2914] > > > > revision afd96329f2b2cd6211e0dae90a6e46881d140092 > Author: Mohan Kanakam > Date: Tue, 6 Nov 2018 15:15:32 +0530 > > clm: add new test case saClmClusterNodeGetAsync() of apitest [#2914] > > > > revision 71b52f9c79477fe8ee567123b68ebf52cd6ee433 > Author: Mohan Kanakam > Date: Tue, 6 Nov 2018 15:05:34 +0530 > > clm: add new test case of API saClmClusterNodeGet_4() of apitest [#2914] > > > > revision eda88ade8b2d87ba657465ef17d73cb553082551 > Author: Mohan Kanakam > Date: Tue, 6 Nov 2018 14:39:02 +0530 > > clm: add new test case of API saClmClusterNotificationFree_4() of apitest > [#2914] > > > > Complete diffstat: > -- > src/clm/apitest/tet_saClmClusterNodeGet.cc | 21 > + > src/clm/apitest/tet_saClmClusterNodeGetAsync.cc | 19 +++ > src/clm/apitest/tet_saClmClusterNotificationFree.cc | 14 ++ > src/clm/apitest/tet_saClmResponse.cc| 9 + > 4 files changed, 63 insertions(+) > > > Testing Commands: > - > ./clmtest > > Testing, Expected Results: > -- > 5 PASSED 
saClmClusterNotificationFree with finalized handle > 10 PASSED/PASSEDsaClmClusterNodeGet & saClmClusterNodeGet_4 with > Finalized handle > 10 PASSED/PASSEDsaClmClusterNodeGetAsync with finalized handle > 6 PASSED saClmResponse with Finalized handle > > > Conditions of Submission: > - > Ack from mainatiners > > Arch Built StartedLinux distro > --- > mipsn n > mips64 n n > x86 n n > x86_64 y y > powerpc n n > powerpc64 n n > > > Reviewer Checklist: > --- > [Submitters: make sure that your review doesn't trigger any checkmarks!] > > > Your checkin has not passed review because (see checked entries): > > ___ Your RR template is generally incomplete; it has too many blank entries > that need proper data filled in. > > ___ You have failed to nominate the proper persons for review and push. > > ___ Your patches do not have proper short+long header > > ___ You have grammar/spelling in your header that is unacceptable. > > ___ You have exceeded a sensible line length in your headers/comments/text. > > ___ You have failed to put in a proper Trac Ticket # into your commits. > > ___ You have incorrectly put/left internal data in your comments/files > (i.e. internal bug tracking tool IDs, product names etc) > > ___ You have not given any evidence of testing beyond basic build tests. > Demonstrate some level of runtime or other sanity testing. > > ___ You have ^M present in some of your files. These have to be removed. > > ___ You have needlessly changed whitespace or added whitespace crimes > like trailing spaces, or spaces before tabs. > > ___ You have mixed real technical changes with whitespace and other > cosmetic code cleanup changes. These have to be separate commits. > > ___ You need to refactor your submission into logical chunks; there is > too much content into a single commit. 
> > ___ You have extraneous garbage in your review (merge commits etc) > > ___ You have giant attachments which should never have been sent; > Instead you should place your content in a public tree to be pulled. > > ___ You have too many commits attached to an e-mail; resend as threaded > commits, or place in a public tree for a pull. > > ___ You have resent this content multiple times without a clear indication > of what has changed between each re-send. > > ___ You have failed to adequately and individually
Re: [devel] [PATCH 1/1] osaf: Set sticky bit for socket and pipe files [#2953]
Hi Minh, the "sticky" bit here is in fact the "restricted deletion flag". It is used on directories such as /tmp, where several users have read/write access: when the 't' bit is set, only the owner of a file (or the directory owner or root) may delete or rename it. It should only be set on directories, not on files, and I don't think it is needed here. /Thanks HansN On 11/5/18 09:56, Minh Anh Du wrote: > There are files, sockets and pipes have world writable permission, > but only root user and owner should be able to create/delete > these files. Sticky bit should be set for these sockets and pipes > for security reason. > --- > src/base/daemon.c | 2 +- > src/base/osaf_secutil.c | 2 +- > src/dtm/transport/log_server.cc | 2 +- > src/nid/agent/nid_ipc.c | 2 +- > 4 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/src/base/daemon.c b/src/base/daemon.c > index cdde7fd..50ddc50 100644 > --- a/src/base/daemon.c > +++ b/src/base/daemon.c > @@ -162,7 +162,7 @@ static void create_fifofile(const char *fifofile) > > mask = umask(0); > > - if (mkfifo(fifofile, 0666) == -1) { > + if (mkfifo(fifofile, 01666) == -1) { > if (errno == EEXIST) { > syslog(LOG_INFO, "mkfifo already exists: %s %s", > fifofile, strerror(errno)); > diff --git a/src/base/osaf_secutil.c b/src/base/osaf_secutil.c > index 0e175c9..71e512a 100644 > --- a/src/base/osaf_secutil.c > +++ b/src/base/osaf_secutil.c > @@ -147,7 +147,7 @@ static int server_sock_create(const char *pathname) > } > > /* Connecting to the socket object requires read/write permission.
*/ > - if (chmod(pathname, 0777) == -1) { > + if (chmod(pathname, 01777) == -1) { > LOG_ER("%s: chmod failed - %s", __FUNCTION__, strerror(errno)); > return -1; > } > diff --git a/src/dtm/transport/log_server.cc b/src/dtm/transport/log_server.cc > index bef1f07..866fe59 100644 > --- a/src/dtm/transport/log_server.cc > +++ b/src/dtm/transport/log_server.cc > @@ -35,7 +35,7 @@ LogServer::LogServer(int term_fd) > max_backups_{9}, > max_file_size_{5 * 1024 * 1024}, > log_socket_{Osaflog::kServerSocketPath, > base::UnixSocket::kNonblocking, > - 0777}, > + 01777}, > log_streams_{}, > current_stream_{new LogStream{kMdsLogStreamName, 1, 5 * 1024 * 1024}}, > no_of_log_streams_{1} { > diff --git a/src/nid/agent/nid_ipc.c b/src/nid/agent/nid_ipc.c > index 172063a..eae8de3 100644 > --- a/src/nid/agent/nid_ipc.c > +++ b/src/nid/agent/nid_ipc.c > @@ -66,7 +66,7 @@ uint32_t nid_create_ipc(char *strbuf) > mask = umask(0); > > /* Create nid fifo */ > - if (mkfifo(NID_FIFO, 0666) < 0) { > + if (mkfifo(NID_FIFO, 01666) < 0) { > sprintf(strbuf, " FAILURE: Unable To Create FIFO Error:%s\n", > strerror(errno)); > umask(mask); ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
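To make Hans's point concrete, here is a minimal C++ sketch on POSIX. The helper names are hypothetical and not part of OpenSAF. It checks whether a mode carries the sticky/restricted-deletion bit (`S_ISVTX`, octal 01000, which is the leading `1` in `01666` and `01777`) and whether that bit actually means anything, which is only the case for directories.

```cpp
#include <sys/stat.h>

#include <cassert>

// S_ISVTX (octal 01000) is the "sticky" / restricted deletion flag; it is
// the leading 1 in modes such as 01666 and 01777.
constexpr bool HasStickyBit(mode_t mode) { return (mode & S_ISVTX) != 0; }

// The restricted-deletion semantics apply to directories: in a directory
// with the bit set (e.g. /tmp, mode 01777) an entry may only be removed or
// renamed by the entry's owner, the directory's owner, or a privileged user.
// Setting the bit on a FIFO, socket or regular file gives no such protection.
bool StickyBitHasEffect(mode_t st_mode) {
  return S_ISDIR(st_mode) && HasStickyBit(st_mode);
}

// Checks an existing path with stat(2); returns false if stat fails.
bool PathStickyBitHasEffect(const char* path) {
  struct stat st {};
  if (stat(path, &st) != 0) return false;
  return StickyBitHasEffect(st.st_mode);
}
```

For example, `PathStickyBitHasEffect("/tmp")` is typically true on a Linux host. A FIFO created with `mkfifo(path, 01666)` would not gain restricted-deletion protection, because deleting it is governed by the permissions of its parent directory.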
Re: [devel] [PATCH 1/1] mds: Send NCSMDS_DOWN with vdest if there is no any adest [#2941]
Hi Minh, ack, code review and mdstest run. One minor comment below. /Thanks HansN On 10/25/18 04:40, Minh Chau wrote: > If split brain happens and network merges back, at this point in time > there are a few mds events coming to payloads, which are the SVC UP > from the other controller; SVC down from services in both controllers > due to reboot from split brain detection. > In the ticket description, the first partition includes SC1, PL3, > the second partition includes SC2, PL4, PL5. The amfnd on PL3 is > missing NCSMDS_DOWN with vdest in the below scenario: > > - SVC up event from the other amfd (on SC2) > - SVC down event from amfd (SC1), it's the same active adest from > mds-PL3's view, start await_active timer, but no NCSMDS_DOWN with > vdest is sent because the adest on SC2 exists. > - SVC down event from amfd (SC2), it's different active adest. > > Because the payloads reside in different partitions so they don't > have the same active adest view at mds level. When both SCs go down > due to split brain detection, the same SVC down events occur and > comes to all payloads, but they have different view so they behave > differently to the payloads in the other partition. > > The patch adds an additional condition to send NCSMDS_DOWN if there is > no actual adest existed > --- > src/mds/mds_c_api.c | 80 > ++--- > 1 file changed, 46 insertions(+), 34 deletions(-) > > diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c > index f5ba318..73849cc 100644 > --- a/src/mds/mds_c_api.c > +++ b/src/mds/mds_c_api.c > @@ -3644,13 +3644,58 @@ uint32_t mds_mcm_svc_down(PW_ENV_ID pwe_id, > MDS_SVC_ID svc_id, V_DEST_RL role, > local_svc_hdl, svc_id, vdest_id, > _adest, _running, > _result_info, true); > - [HansN] is this log message informative/needed? 
> + m_MDS_LOG_INFO("MCM:API: svc_down: " > + "active_adest:%lu", active_adest); > /* First delete the entry */ > mds_subtn_res_tbl_del( > local_svc_hdl, svc_id, vdest_id, adest, > vdest_policy, svc_sub_part_ver, > archword_type); > > + MDS_SUBSCRIPTION_RESULTS_INFO *s_info = NULL; > + bool adest_exists = false; > + > + /* if no adest remains for this svc > + * send MDS_DOWN > + */ > + status = mds_subtn_res_tbl_getnext_any( > + local_svc_hdl, svc_id, > + _info); > + > + while (status != NCSCC_RC_FAILURE) { > + if (s_info->key.vdest_id != > + m_VDEST_ID_FOR_ADEST_ENTRY) { > + adest_exists = true; > + break; > + } > + > + status = mds_subtn_res_tbl_getnext_any( > + local_svc_hdl, svc_id, _info); > + } > + > + if (active_adest != adest > + && vdest_policy == NCS_VDEST_TYPE_MxN > + && adest_exists == false) { > + m_MDS_LOG_INFO("MCM:API: svc_down : " > + "svc_id = %s(%d) on DEST id = > %d " > + "got NO_ACTIVE for svc_id = > %s(%d) " > +"on Vdest id = %d Adest = %s, rem_svc_pvt_ver=%d", > + get_svc_names( > + > m_MDS_GET_SVC_ID_FROM_SVC_HDL(local_svc_hdl)), > + m_MDS_GET_SVC_ID_FROM_SVC_HDL( > + local_svc_hdl), > + m_MDS_GET_VDEST_ID_FROM_SVC_HDL( > + local_svc_hdl), > + get_svc_names(svc_id), svc_id, > + vdest_id, > + > log_subtn_result_info->sub_adest_details, > + svc_sub_part_ver); > + status = mds_mcm_user_event_callback( > + local_svc_hdl, pwe_id, svc_id, > + role, vdest_id, 0, NCSMDS_DOWN, > +
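The new decision in `mds_mcm_svc_down()` can be restated as a pure function. The model below is simplified and hypothetical. The struct, the enum and the value of `kVdestIdForAdestEntry` are assumed stand-ins for `MDS_SUBSCRIPTION_RESULTS_INFO`, `NCS_VDEST_TYPE_MxN` and `m_VDEST_ID_FOR_ADEST_ENTRY`, not the real MDS definitions. The scan mirrors the `getnext_any` loop in the patch, and the predicate mirrors its three-part condition for delivering `NCSMDS_DOWN` with the vdest.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the MDS types used in the patch.
constexpr uint32_t kVdestIdForAdestEntry = 0;  // assumed marker value
enum class VdestPolicy { k2N, kMxN };

struct SubscriptionEntry {
  uint32_t vdest_id;
  uint64_t adest;
};

// Mirrors the while loop in the patch: scan the subscription results that
// remain for this service after the departing entry was deleted, and stop
// at the first one whose vdest_id is not the ADEST-entry marker.
bool AdestExists(const std::vector<SubscriptionEntry>& remaining) {
  for (const auto& entry : remaining) {
    if (entry.vdest_id != kVdestIdForAdestEntry) return true;
  }
  return false;
}

// Mirrors the new if-condition: send NCSMDS_DOWN for the vdest only when the
// departing adest is not the active one, the policy is MxN, and no entry
// remains that would still represent the service.
bool ShouldSendVdestDown(uint64_t active_adest, uint64_t departing_adest,
                         VdestPolicy policy,
                         const std::vector<SubscriptionEntry>& remaining) {
  return active_adest != departing_adest && policy == VdestPolicy::kMxN &&
         !AdestExists(remaining);
}
```

In the split-brain scenario described above, the final SVC down from SC2 arrives when the table is already empty, so the predicate holds and PL3 finally sees the vdest go down.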
Re: [devel] [PATCH 1/1] amfd: reset snd_msg_id in LostFound state [#2952]
Ack, review only/Thanks HansN From: Gary Lee Sent: 2 November 2018 06:23:09 To: nagendra @ hasolutions . in; Minh Hon Chau; Hans Nordebäck Cc: opensaf-devel@lists.sourceforge.net; Gary Lee Subject: [PATCH 1/1] amfd: reset snd_msg_id in LostFound state [#2952] If a PL rejoins the main network partition before the node failover timer expires, it is told to reboot by AMFD. AMFND thinks it has become headless and resets rcv_msg_id to 0, and shows this when it receives the reboot msg from AMFD: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Message ID mismatch, rec xx, expected 1, OwnNodeId = xx, SupervisionTime = 60 We can avoid this by resetting snd_msg_id for this PL in AMFD in state LostFound, before the reboot msg is sent. --- src/amf/amfd/node_state.cc | 5 + 1 file changed, 5 insertions(+) diff --git a/src/amf/amfd/node_state.cc b/src/amf/amfd/node_state.cc index a8659dcf7..787ddab94 100644 --- a/src/amf/amfd/node_state.cc +++ b/src/amf/amfd/node_state.cc @@ -126,6 +126,11 @@ void LostFound::TimerExpired() { node->node_name.c_str()); if (fsm_->Active() == true) { +// amfnd thinks it's been headless and resets its rcv_msg_id to 0, +// also do the same here to avoid 'Message ID mismatch' errors +// at amfnd +node->snd_msg_id = 0; + LOG_WA("Sending node reboot order"); avd_d2n_reboot_snd(node); -- 2.17.1
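The failure mode in this patch is a strict "next expected id" check falling out of step on the two sides. The sketch below is a hypothetical model, not the AMF code. A receiver accepts only `rcv_msg_id + 1`. Once it resets to 0 ("headless") while the sender keeps counting, the next message is rejected with "expected 1". Resetting the sender's `snd_msg_id` as well, as the patch does in LostFound, puts the two back in step.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the director side (AMFD) numbering its messages.
struct Sender {
  uint32_t snd_msg_id = 0;
  uint32_t NextId() { return ++snd_msg_id; }
};

// Hypothetical model of the node side (AMFND) checking them.
struct Receiver {
  uint32_t rcv_msg_id = 0;
  // Accept only the next expected id; anything else is a "Message ID mismatch".
  bool Accept(uint32_t id) {
    if (id != rcv_msg_id + 1) return false;
    rcv_msg_id = id;
    return true;
  }
  // What AMFND does when it believes it has been headless.
  void BecomeHeadless() { rcv_msg_id = 0; }
};
```

For example, if five messages have been exchanged and the receiver then goes headless, the sixth message is rejected. After the sender's `snd_msg_id` is set back to 0, the next message carries id 1 and is accepted.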
Re: [devel] [PATCH 1/1] imm: fix osafimmnd coredump genereted during sanity test [#2947]
Ack, review only/Thanks HansN -----Original Message----- From: Vu Minh Nguyen Sent: 29 October 2018 10:15 To: Hans Nordebäck ; Lennart Lund ; Gary Lee Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen Subject: [PATCH 1/1] imm: fix osafimmnd coredump genereted during sanity test [#2947] The coredump is generated in the context of processing the message type "IMMND_EVT_D2ND_IMPLDELETE" because the memory is corrupted at the time of decoding that message. It allocated 'size' bytes of memory with the boundary in range [0 - 'size - 1'], but modified - added null terminated, the memory at the index of `size` which was out of that range. This patch fixes such issue. The memory should be allocated with `size + 1` bytes in length. --- src/imm/common/immsv_evt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/imm/common/immsv_evt.c b/src/imm/common/immsv_evt.c index 03a7f8125..c93f82a0f 100644 --- a/src/imm/common/immsv_evt.c +++ b/src/imm/common/immsv_evt.c @@ -2898,7 +2898,7 @@ static uint32_t immsv_evt_dec_sublevels(NCS_UBAID *i_ub, IMMSV_EVT *o_evt) implNameList[i].size = ncs_decode_32bit(); ncs_dec_skip_space(i_ub, 4); - implNameList[i].buf = (char *)malloc(implNameList[i].size); + implNameList[i].buf = (char *)malloc(implNameList[i].size + 1); if (implNameList[i].buf == NULL || ncs_decode_n_octets_from_uba(i_ub, (uint8_t *)implNameList[i].buf, -- 2.18.0
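The fix is the usual rule for length-prefixed strings that are NUL-terminated after decoding: `size` payload bytes occupy indices `0..size-1`, and the terminator needs index `size`, so the buffer must hold `size + 1` bytes. Here is a minimal sketch with a hypothetical helper name. It copies from a plain byte array rather than using the real `ncs_decode_n_octets_from_uba()` API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copies `size` octets from a decode buffer into a
// freshly allocated, NUL-terminated C string. The caller frees it with free().
char* DecodeNameCopy(const uint8_t* src, uint32_t size) {
  assert(src != nullptr || size == 0);
  // size + 1: indices 0..size-1 hold the payload, index `size` holds '\0'.
  // Allocating only `size` bytes and then writing buf[size] is the one-byte
  // heap overflow that the patch fixes.
  char* buf = static_cast<char*>(std::malloc(static_cast<std::size_t>(size) + 1));
  if (buf == nullptr) return nullptr;
  std::memcpy(buf, src, size);
  buf[size] = '\0';
  return buf;
}
```

A one-byte overrun like this often goes unnoticed because allocators round allocation sizes up. That fits the crash surfacing only intermittently, as here during a sanity test.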
Re: [devel] [PATCH 1/4] amfd: add class definitions for new timers [#2918]
ack, code review only/Thanks HansN On 10/24/18 14:26, Gary Lee wrote: > osafAmfDelayNodeFailoverTimeout - the number of seconds we wait > after MDS down is received before we consider it truly down. > > osafAmfDelayNodeFailoverNodeUpWait - the number of seconds we > wait for Node Up after receving MDS up, before we send reboot > to the node. After sending reboot to a node, also wait up to > this number of seconds before we consider the node to be > down (unless MDs down is received first). > --- > src/amf/config/amf_classes.xml | 14 +- > src/amf/config/amf_objects.xml | 8 > 2 files changed, 21 insertions(+), 1 deletion(-) > > diff --git a/src/amf/config/amf_classes.xml b/src/amf/config/amf_classes.xml > index df5cbd92a..182bd97e5 100644 > --- a/src/amf/config/amf_classes.xml > +++ b/src/amf/config/amf_classes.xml > @@ -1452,5 +1452,17 @@ > SA_CONFIG > SA_WRITABLE > > - > + > + osafAmfDelayNodeFailoverTimeout > + SA_TIME_T > + SA_CONFIG > + SA_WRITABLE > + > + > + osafAmfDelayNodeFailoverNodeUpWait > + SA_TIME_T > + SA_CONFIG > + SA_WRITABLE > + > + > > diff --git a/src/amf/config/amf_objects.xml b/src/amf/config/amf_objects.xml > index 6ed68d83d..c008c7520 100644 > --- a/src/amf/config/amf_objects.xml > +++ b/src/amf/config/amf_objects.xml > @@ -6,6 +6,14 @@ > osafAmfRestrictAutoRepairEnable > 1 > > + > + osafAmfDelayNodeFailoverTimeout > + 0 > + > + > + osafAmfDelayNodeFailoverNodeUpWait > + 180 > + > > > safAppType=OpenSafApplicationType ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 3/4] amfd: add checkpointing of node failover state [#2918]
ack, review only/Thanks HansN On 10/24/18 14:26, Gary Lee wrote: > --- > src/amf/amfd/chkop.cc| 10 ++ > src/amf/amfd/ckpt.h | 3 ++- > src/amf/amfd/ckpt_dec.cc | 40 +++- > src/amf/amfd/ckpt_enc.cc | 26 -- > src/amf/amfd/ckpt_msg.h | 1 + > 5 files changed, 76 insertions(+), 4 deletions(-) > > diff --git a/src/amf/amfd/chkop.cc b/src/amf/amfd/chkop.cc > index 1ba4140c7..e9a68f4cd 100644 > --- a/src/amf/amfd/chkop.cc > +++ b/src/amf/amfd/chkop.cc > @@ -1042,6 +1042,16 @@ uint32_t avsv_send_ckpt_data(AVD_CL_CB *cb, uint32_t > action, > return NCSCC_RC_SUCCESS; > } > break; > +case AVSV_CKPT_NODE_FAILOVER_STATE: > + if ((avd_cb->other_avd_adest != 0) && > + (avd_cb->avd_peer_ver < AVD_MBCSV_SUB_PART_VERSION_9)) { > +TRACE( > +"No ckpt for AVSV_CKPT_NODE_FAILOVER_STATE as peer AMFD has" > +" lower version:%d", > +avd_cb->avd_peer_ver); > +return NCSCC_RC_SUCCESS; > + } > + break; > default: > return NCSCC_RC_SUCCESS; > } > diff --git a/src/amf/amfd/ckpt.h b/src/amf/amfd/ckpt.h > index c006f9a69..875776a21 100644 > --- a/src/amf/amfd/ckpt.h > +++ b/src/amf/amfd/ckpt.h > @@ -35,9 +35,10 @@ > #define AMF_AMFD_CKPT_H_ > > // current version > -#define AVD_MBCSV_SUB_PART_VERSION 8 > +#define AVD_MBCSV_SUB_PART_VERSION 9 > > // supported versions > +#define AVD_MBCSV_SUB_PART_VERSION_9 9 > #define AVD_MBCSV_SUB_PART_VERSION_8 8 > #define AVD_MBCSV_SUB_PART_VERSION_7 7 > #define AVD_MBCSV_SUB_PART_VERSION_6 6 > diff --git a/src/amf/amfd/ckpt_dec.cc b/src/amf/amfd/ckpt_dec.cc > index 9f3949a15..022fa8f4b 100644 > --- a/src/amf/amfd/ckpt_dec.cc > +++ b/src/amf/amfd/ckpt_dec.cc > @@ -49,6 +49,7 @@ static uint32_t dec_oper_su(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC > *dec); > static uint32_t dec_node_up_info(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec); > static uint32_t dec_node_admin_state(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec); > static uint32_t dec_node_oper_state(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec); > +static uint32_t dec_node_failover_state(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC > *dec); > static 
uint32_t dec_node_state(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec); > static uint32_t dec_node_rcv_msg_id(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec); > static uint32_t dec_node_snd_msg_id(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec); > @@ -160,7 +161,8 @@ const AVSV_DECODE_CKPT_DATA_FUNC_PTR > avd_dec_data_func_list[] = { > dec_comp_curr_num_csi_stby, dec_comp_oper_state, > dec_comp_readiness_state, > dec_comp_pres_state, dec_comp_restart_count, nullptr, /* > AVSV_SYNC_COMMIT */ > dec_su_restart_count, dec_si_dep_state, dec_ng_admin_state, > -dec_avd_to_avd_job_queue_status > +dec_avd_to_avd_job_queue_status, > +dec_node_failover_state > > }; > > @@ -2958,3 +2960,39 @@ static uint32_t > dec_avd_to_avd_job_queue_status(AVD_CL_CB *cb, > TRACE_LEAVE(); > return NCSCC_RC_SUCCESS; > } > + > +static uint32_t dec_node_failover_state(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC > *dec) { > + TRACE_ENTER(); > + > + uint32_t state; > + SaNameT name; > + > + osaf_decode_sanamet(>i_uba, ); > + const std::string node_name(Amf::to_string()); > + osaf_extended_name_free(); > + > + AVD_AVND* node; > + node = avd_node_get(node_name); > + > + if (node == nullptr) { > +LOG_ER("%s: node not found, nodeid=%s", __FUNCTION__, node_name.c_str()); > +return NCSCC_RC_FAILURE; > + } > + > + osaf_decode_uint32(>i_uba, > + reinterpret_cast()); > + > + auto failed_node = cb->failover_list.find(node->node_info.nodeId); > + if (failed_node != cb->failover_list.end()) { > +failed_node->second->SetState(state); > + } else { > +LOG_NO("Node '%s' not found in failover_list. 
Create new entry", > +node->node_name.c_str()); > +auto new_node = std::make_shared(cb, > + node->node_info.nodeId); > +new_node->SetState(state); > +cb->failover_list[node->node_info.nodeId] = new_node; > + } > + > + return NCSCC_RC_SUCCESS; > +} > \ No newline at end of file > diff --git a/src/amf/amfd/ckpt_enc.cc b/src/amf/amfd/ckpt_enc.cc > index 0a2d73698..0e675aed5 100644 > --- a/src/amf/amfd/ckpt_enc.cc > +++ b/src/amf/amfd/ckpt_enc.cc > @@ -48,6 +48,7 @@ static uint32_t enc_oper_su(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC > *enc); > static uint32_t enc_node_up_info(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC *enc); > static uint32_t enc_node_admin_state(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC *enc); > static uint32_t enc_node_oper_state(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC *enc); > +static uint32_t enc_node_failover_state(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC > *enc); > static uint32_t enc_node_state(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC *enc); > static uint32_t enc_node_rcv_msg_id(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC *enc); > static uint32_t enc_node_snd_msg_id(AVD_CL_CB *cb,
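The patch has two halves: the encoder skips the new record when a standby peer only understands an older MBCSv sub-part version, and the decoder updates or creates the node's entry in `failover_list`. The sketch below models both halves with hypothetical types. `peer_present` stands for `other_avd_adest != 0`, and `FailoverNode` is a stand-in for the real per-node failover object.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Mirrors AVD_MBCSV_SUB_PART_VERSION_9 from ckpt.h in the patch.
constexpr uint16_t kMbcsvSubPartVersion9 = 9;

// Encoder side: skip the record if a peer exists but is older than v9.
bool ShouldCheckpointFailoverState(bool peer_present, uint16_t peer_version) {
  return !(peer_present && peer_version < kMbcsvSubPartVersion9);
}

// Hypothetical stand-in for the per-node failover state object.
struct FailoverNode {
  explicit FailoverNode(uint32_t id) : node_id(id) {}
  void SetState(uint32_t s) { state = s; }
  uint32_t node_id;
  uint32_t state = 0;
};

using FailoverList = std::map<uint32_t, std::shared_ptr<FailoverNode>>;

// Decoder side: update the existing entry, or create one if missing,
// following the find/else branch of dec_node_failover_state().
void ApplyFailoverState(FailoverList& list, uint32_t node_id, uint32_t state) {
  auto it = list.find(node_id);
  if (it != list.end()) {
    it->second->SetState(state);
  } else {
    auto node = std::make_shared<FailoverNode>(node_id);
    node->SetState(state);
    list[node_id] = node;
  }
}
```

Bumping `AVD_MBCSV_SUB_PART_VERSION` to 9 and gating on it lets a mixed-version pair of directors keep running during an upgrade. The older standby simply never receives a record type that it could not decode.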
Re: [devel] [PATCH 2/4] amfnd: allow reboot from any director [#2918]
ack, review only/Thanks HansN On 10/24/18 14:26, Gary Lee wrote: > allow reboot msg to be sent from any director, for > split brain recovery situations > --- > src/amf/amfnd/mds.cc | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/amf/amfnd/mds.cc b/src/amf/amfnd/mds.cc > index 1ee24cf5b..d179ff40e 100644 > --- a/src/amf/amfnd/mds.cc > +++ b/src/amf/amfnd/mds.cc > @@ -328,7 +328,8 @@ uint32_t avnd_mds_rcv(AVND_CB *cb, > MDS_CALLBACK_RECEIVE_INFO *rcv_info) { > * from any other anchor than Active (except for HB message). > */ > if ((rcv_info->i_fr_dest != cb->active_avd_adest) && > - (msg.info.avd->msg_type != AVSV_D2N_HEARTBEAT_MSG)) { > + (msg.info.avd->msg_type != AVSV_D2N_HEARTBEAT_MSG) && > + (msg.info.avd->msg_type != AVSV_D2N_REBOOT_MSG)) { > LOG_ER("Received dest: %" PRIu64 " and cb active AVD adest:%" PRIu64 > " mismatch, message type = %u", > rcv_info->i_fr_dest, cb->active_avd_adest, ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
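The changed condition in `avnd_mds_rcv()` is a small allow-list. The predicate below restates it with a hypothetical enum in place of the `AVSV_D2N_*` message types. A message from a director other than the active one is still dropped, except for heartbeats and, after this patch, reboot orders.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the relevant AVSV_D2N_* message types.
enum class D2NMsgType { kHeartbeat, kReboot, kOther };

// Returns true if AMFND should process the message rather than log the
// "adest mismatch" error and drop it.
bool AcceptDirectorMessage(uint64_t from_adest, uint64_t active_adest,
                           D2NMsgType type) {
  if (from_adest == active_adest) return true;
  return type == D2NMsgType::kHeartbeat || type == D2NMsgType::kReboot;
}
```

Restricting the exception to reboot orders keeps the rule narrow: a director from the other partition can tell a node to restart, but cannot push assignments or other state to it.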
Re: [devel] [PATCH 1/4] amfd: add class definitions for new timers [#2918]
ack, review only/Thanks HansN On 10/24/18 14:26, Gary Lee wrote: > osafAmfDelayNodeFailoverTimeout - the number of seconds we wait > after MDS down is received before we consider it truly down. > > osafAmfDelayNodeFailoverNodeUpWait - the number of seconds we > wait for Node Up after receving MDS up, before we send reboot > to the node. After sending reboot to a node, also wait up to > this number of seconds before we consider the node to be > down (unless MDs down is received first). > --- > src/amf/config/amf_classes.xml | 14 +- > src/amf/config/amf_objects.xml | 8 > 2 files changed, 21 insertions(+), 1 deletion(-) > > diff --git a/src/amf/config/amf_classes.xml b/src/amf/config/amf_classes.xml > index df5cbd92a..182bd97e5 100644 > --- a/src/amf/config/amf_classes.xml > +++ b/src/amf/config/amf_classes.xml > @@ -1452,5 +1452,17 @@ > SA_CONFIG > SA_WRITABLE > > - > + > + osafAmfDelayNodeFailoverTimeout > + SA_TIME_T > + SA_CONFIG > + SA_WRITABLE > + > + > + osafAmfDelayNodeFailoverNodeUpWait > + SA_TIME_T > + SA_CONFIG > + SA_WRITABLE > + > + > > diff --git a/src/amf/config/amf_objects.xml b/src/amf/config/amf_objects.xml > index 6ed68d83d..c008c7520 100644 > --- a/src/amf/config/amf_objects.xml > +++ b/src/amf/config/amf_objects.xml > @@ -6,6 +6,14 @@ > osafAmfRestrictAutoRepairEnable > 1 > > + > + osafAmfDelayNodeFailoverTimeout > + 0 > + > + > + osafAmfDelayNodeFailoverNodeUpWait > + 180 > + > > > safAppType=OpenSafApplicationType ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] imm: add an admin operation to regenerate db from memory [#2940]
Hi Vu, ack, code review only. /BR HansN On 10/19/18 05:58, Vu Minh Nguyen wrote: > After split-brain recovery, there is possibility of having inconsistencies > between IMM data model in memory held by IMMND and one in the back-end > database (sqlite). > > That could happen as we might have 02 active IMMDs, 02 IMMND coordinators > and more than one PBE processes accessing a shared pbe database. > Change to such database from one therefore might not get noticed by the other. > > By using this admin operation ID and targeting to IMM, IMM will regenerate the > back-end database from one in memory to keep them both consistent. > > immadm -o 303 safRdn=immManagement,safApp=safImmService > --- > src/imm/README | 18 ++ > src/imm/common/immsv_api.h | 4 +++- > src/imm/immnd/ImmModel.cc | 22 +- > src/imm/immnd/ImmModel.h | 3 ++- > src/imm/immnd/immnd_init.h | 2 ++ > src/imm/immnd/immnd_proc.c | 8 ++-- > 6 files changed, 52 insertions(+), 5 deletions(-) > > diff --git a/src/imm/README b/src/imm/README > index 750d811a5..71e5c4fe3 100644 > --- a/src/imm/README > +++ b/src/imm/README > @@ -3033,6 +3033,24 @@ expires. > To be possible to use this new feature, bit 10 must be set in > opensafImmNostdFlags attribute in IMM object. > > + > +Provide an admin-operation for re-generating backend database from one in RAM > += > +https://sourceforge.net/p/opensaf/tickets/2940/ > + > +After split-brain recovery, there is possibility of having inconsistencies > +between IMM data model in memory held by IMMND and one in the back-end > +database (sqlite). > + > +That could happen as we might have 02 active IMMDs, 02 IMMND coordinators > +and more than one PBE processes accessing a shared pbe database. > +Change to such database from one therefore might not get noticed by the > other. > + > +By using this admin operation ID and targeting to IMM, IMM will regenerate > the > +back-end database from one in memory to keep them both consistent. 
> + > +immadm -o 303 safRdn=immManagement,safApp=safImmService > + > > DEPENDENCIES > > diff --git a/src/imm/common/immsv_api.h b/src/imm/common/immsv_api.h > index 32fc5738e..e6d613705 100644 > --- a/src/imm/common/immsv_api.h > +++ b/src/imm/common/immsv_api.h > @@ -157,7 +157,9 @@ typedef enum { > typedef enum { > SA_IMM_ADMIN_EXPORT = 1, /* Defined in A.02.01 declared in A.03.01 */ > SA_IMM_ADMIN_INIT_FROM_FILE = 100, /* Non standard, force PBE disable. */ > - SA_IMM_ADMIN_ABORT_CCBS = 202 /* Non standard, abort non critical CCBs. */ > + SA_IMM_ADMIN_ABORT_CCBS = 202, /* Non standard, abort non critical CCBs. */ > + /* Non standard, regenerate pbe database from RAM */ > + SA_IMM_ADMIN_REGENERATE_PBE_DB = 303 > } SaImmMngtAdminOperationT; > > /* > diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc > index 21f48ab59..8e3f338dc 100644 > --- a/src/imm/immnd/ImmModel.cc > +++ b/src/imm/immnd/ImmModel.cc > @@ -596,6 +596,7 @@ static const std::string > saImmRepositoryInit("saImmRepositoryInit"); > static const std::string saImmOiTimeout("saImmOiTimeout"); > > static SaImmRepositoryInitModeT immInitMode = SA_IMM_INIT_FROM_FILE; > +static bool sRegenerateDb = false; > > static SaUint32T sCcbIdLongDnGuard = > 0; /* Disallow long DN additions if longDnsAllowed is being changed in > ccb*/ > @@ -2003,6 +2004,14 @@ void immModel_setLoader(IMMND_CB* cb, SaInt32T > loaderPid) { > ImmModel::instance(>immModel)->setLoader(loaderPid); > } > > +void immModel_setRegenerateDbFlag(IMMND_CB* cb, bool value) { > + ImmModel::instance(>immModel)->setRegenerateDbFlag(value); > +} > + > +bool immModel_getRegenerateDbFlag(IMMND_CB* cb) { > + return ImmModel::instance(>immModel)->getRegenerateDbFlag(); > +} > + > void immModel_recognizedIsolated(IMMND_CB* cb) { > ImmModel::instance(>immModel)->recognizedIsolated(); > } > @@ -2901,6 +2910,14 @@ int ImmModel::adjustEpoch(int suggestedEpoch, > SaUint32T* continuationIdPtr, > return suggestedEpoch; > } > > +bool 
ImmModel::getRegenerateDbFlag() { > + return sRegenerateDb; > +} > + > +void ImmModel::setRegenerateDbFlag(bool value) { > + sRegenerateDb = value; > +} > + > /** >* Fetches the SaImmRepositoryInitT value of the attribute >* 'saImmRepositoryInit' in the object immManagementDn. > @@ -13808,6 +13825,9 @@ SaAisErrorT ImmModel::admoImmMngtObject(const > ImmsvOmAdminOperationInvoke* req, > LOG_IN("sAbortNonCriticalCcbs = true;"); > sAbortNonCriticalCcbs = true; > } > + } else if (req->operationId == SA_IMM_ADMIN_REGENERATE_PBE_DB) { > +LOG_NO("Re-generate the pbe database from one in memory."); > +sRegenerateDb = true; > } else { > LOG_NO("Invalid operation ID %llu, for operation on %s", >
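In the quoted hunks, admin operation 303 does not rebuild the database inline. `admoImmMngtObject()` only sets the static `sRegenerateDb` flag, which is exposed through `immModel_setRegenerateDbFlag()` / `immModel_getRegenerateDbFlag()`. Presumably the `immnd_proc.c` change (not quoted here) acts on it. The class below is a hypothetical reduction of that pattern. Only the operation ids are taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Operation ids as listed in the patch's SaImmMngtAdminOperationT.
enum ImmMngtAdminOp : uint64_t {
  kAdminExport = 1,
  kAdminInitFromFile = 100,
  kAdminAbortCcbs = 202,
  kAdminRegeneratePbeDb = 303,
};

// Hypothetical model: the admin operation only raises a flag; a later
// processing step is expected to regenerate the PBE database and clear it.
class ImmModelSketch {
 public:
  bool HandleAdminOp(uint64_t operation_id) {
    if (operation_id == kAdminRegeneratePbeDb) {
      regenerate_db_ = true;
      return true;
    }
    return false;  // other operations are out of scope for this sketch
  }
  bool GetRegenerateDbFlag() const { return regenerate_db_; }
  void SetRegenerateDbFlag(bool value) { regenerate_db_ = value; }

 private:
  bool regenerate_db_ = false;
};
```

Deferring the work this way keeps the admin-operation handler short. The actual regeneration can then run where the PBE process is managed.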
Re: [devel] [PATCH 1/1] amfnd: change log message severity [#2945]
ack/Thanks Hans On 10/25/18 07:28, Gary Lee wrote: > --- > src/amf/amfnd/clm.cc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/amf/amfnd/clm.cc b/src/amf/amfnd/clm.cc > index f1f65bcef..06eb229c7 100644 > --- a/src/amf/amfnd/clm.cc > +++ b/src/amf/amfnd/clm.cc > @@ -124,7 +124,7 @@ static void clm_to_amf_node(void) { > > error = saImmOmInitialize_cond(, nullptr, ); > if (SA_AIS_OK != error) { > -LOG_CR("saImmOmInitialize failed. Use previous value of nodeName."); > +LOG_WA("saImmOmInitialize failed. Use previous value of nodeName."); > osafassert(avnd_cb->amf_nodeName.empty() == false); > goto done1; > } ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] imm: fix coredump is generated after split-brain recovery [#2942]
Hi Vu, ack review only. One minor comment below. /Thanks HansN On 10/19/18 10:16, Vu Minh Nguyen wrote: After split-recovery, there is possibility of having epoch counters mismatched b/w one on IMMND veteran located at this partition and one from active IMMD on another partition. With that, instead of generating coredump in such case, we should syslog error message and have the IMMND veteran self-terminated. --- src/imm/immnd/immnd_evt.c | 17 - 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c index b260d43ff..bc55ea946 100644 --- a/src/imm/immnd/immnd_evt.c +++ b/src/imm/immnd/immnd_evt.c @@ -10279,22 +10279,21 @@ static uint32_t immnd_evt_proc_start_sync(IMMND_CB *cb, IMMND_EVT *evt, } else { if (cb->mMyEpoch + 1 < cb->mRulingEpoch) { if (cb->mState > IMM_SERVER_LOADING_PENDING) { - LOG_WA( - "Imm at this node has epoch %u, " + LOG_ER( + "Imm at this node has epoch %u, rulling epoch %u" "appears to be a stragler in wrong state %u", - cb->mMyEpoch, cb->mState); - abort(); + cb->mMyEpoch, cb->mRulingEpoch, cb->mState); + exit(1); } else { TRACE_2( "This nodes apparently missed start of sync"); } } else { - osafassert(cb->mMyEpoch + 1 > cb->mRulingEpoch); - LOG_WA( - "Imm at this evs node has epoch %u, " + LOG_ER( + "Imm at this evs node has epoch %u, rulling epoch %u" [HansN] perhaps the log message needs some updates, e.g. "COORDINATOR appears to be a straggler!!, exiting.", "COORDINATOR appears to be a stragler!!, aborting.", - cb->mMyEpoch); - abort(); + cb->mMyEpoch, cb->mRulingEpoch); + exit(1); /* TODO: 080414 re-inserted the osafassert/abort ... This is an extreemely odd case. Possibly it could occur after a failover ?? */ ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
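The branch changed in `immnd_evt_proc_start_sync()` can be read as a three-way classification of the local epoch against the ruling epoch. The enclosing condition is not shown in the quoted hunk, so this sketch covers only the inner branch. The enum values are hypothetical and only their ordering matters. The functional change is the exit path: `abort()` raises SIGABRT, which typically leaves a core dump, whereas `exit(1)` terminates the process normally with a failure status.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, ordered stand-in for the IMMND server states.
enum ImmServerState : int {
  kLoadingPending = 1,
  kReady = 3,  // values are illustrative; only the ordering is used
};

enum class SyncStartVerdict { kMissedSyncStart, kStragglerExit, kCoordinatorExit };

// Mirrors the inner branch after the patch: both "straggler" outcomes now
// log an error and call exit(1) instead of abort().
SyncStartVerdict ClassifyStartSync(uint32_t my_epoch, uint32_t ruling_epoch,
                                   int state) {
  if (my_epoch + 1 < ruling_epoch) {
    return state > kLoadingPending ? SyncStartVerdict::kStragglerExit
                                   : SyncStartVerdict::kMissedSyncStart;
  }
  return SyncStartVerdict::kCoordinatorExit;
}
```

For example, a veteran IMMND at epoch 3 facing a ruling epoch of 10 in a post-loading state is classified as a straggler. Before the patch, the last branch would also have crashed the process through `osafassert()` when `my_epoch + 1 == ruling_epoch`.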
Re: [devel] [PATCH 1/1] uml: add support for plm to run under uml [#2922]
ack, review only (runs OK with the legacy UML without PLM enabled). Minor comment: are the functions cmd_build_container_testprog() and cmd_install_container_testprog() part of ticket #70? /BR HansN On 09/17/2018 09:53 PM, Alex Jones wrote: > Add support for plm to run under uml. > --- > src/plm/config/openhpi.conf| 18 > tools/cluster_sim_uml/archive/scripts/40opensaf.rc | 30 +++ > tools/cluster_sim_uml/build_uml| 95 > -- > 3 files changed, 138 insertions(+), 5 deletions(-) > create mode 100644 src/plm/config/openhpi.conf > > diff --git a/src/plm/config/openhpi.conf b/src/plm/config/openhpi.conf > new file mode 100644 > index 0..b811de134 > --- /dev/null > +++ b/src/plm/config/openhpi.conf > @@ -0,0 +1,18 @@ > +OPENHPI_AUTOINSERT_TIMEOUT = 50 > +OPENHPI_AUTOINSERT_TIMEOUT_READONLY = "NO" > + > +# Section for dynamic_simulator plugin > +handler libdyn_simulator { > +entity_root = "{ADVANCEDTCA_CHASSIS,2}" > +# Location of the simulation data file > +# Normally an example file is installed in the same directory as > openhpi.conf. > +# Please change the following entry if you have configured another install > +# directory or will use your own simulation.data. > +file = "/etc/openhpi/opensaf-plm-sim.txt" > +# infos goes to logfile and stdout > +# the logfile are log00.log, log01.log ...
> +#logflags = "file stdout" > +#logfile = "dynsim" > +# if #logfile_max reached replace the oldest one > +#logfile_max = "5" > +} > diff --git a/tools/cluster_sim_uml/archive/scripts/40opensaf.rc > b/tools/cluster_sim_uml/archive/scripts/40opensaf.rc > index 7df4cfee6..9057d680b 100644 > --- a/tools/cluster_sim_uml/archive/scripts/40opensaf.rc > +++ b/tools/cluster_sim_uml/archive/scripts/40opensaf.rc > @@ -76,4 +76,34 @@ echo "$node_name" > /etc/opensaf/node_name > echo "/tmp/core_%t_%e_%p" > /proc/sys/kernel/core_pattern > ulimit -c unlimited > > +if test -e /etc/plmcd.conf; then > +sc_1_ip=$(grep "SC-1" /etc/hosts | cut -d' ' -f 1) > +sc_2_ip=$(grep "SC-2" /etc/hosts | cut -d' ' -f 1) > +if [ "$node_name" == "SC-1" ]; then > + ee="Linux_os_hosting_clm_node,safHE=f120_slot_1" > + path="my_entity = > \"{ADVANCEDTCA_CHASSIS,2}{PHYSICAL_SLOT,1}{SWITCH_BLADE,0}\"" > +elif [ "$node_name" == "SC-2" ]; then > + ee="Linux_os_hosting_clm_node,safHE=f120_slot_16" > + path="my_entity = > \"{ADVANCEDTCA_CHASSIS,2}{PHYSICAL_SLOT,16}{SWITCH_BLADE,0}\"" > +else > + ee="$node_name" > +fi > +sed -i -e "s/10.105.1.3/$sc_1_ip/" \ > +-e "s/10.105.1.6/$sc_2_ip/" \ > +-e "s/0020f/safEE=$ee,safDomain=domain_1/" \ > +-e "s/1;os;Fedora;2.6.31/1;os;SUSE;2.6/" \ > +-e "/^\/etc\/init.d/s/^/#/" \ > +/etc/plmcd.conf > +cp /etc/openhpi/openhpi.conf /var/opt > +chmod go-rwx /var/opt/openhpi.conf > +echo "$path" > /etc/openhpi/openhpiclient.conf > + > +/usr/sbin/openhpid -c /var/opt/openhpi.conf > + > +# wait for hpi to read in hardware info > +sleep 10 > + > +/usr/local/sbin/plmcd& > +fi > + > /etc/init.d/opensafd start& > diff --git a/tools/cluster_sim_uml/build_uml b/tools/cluster_sim_uml/build_uml > index 16d49d03e..e54e45753 100755 > --- a/tools/cluster_sim_uml/build_uml > +++ b/tools/cluster_sim_uml/build_uml > @@ -121,6 +121,73 @@ cmd_install_testprog() { > cmd_mkcpio > } > > +cmd_build_container_testprog() { > +src=$opensaf_home/samples/amf/container > +libd=$root/usr/local/$lib_dir > 
+installd=$root/opt/amf_demo > + > +mkdir -p "$installd" > +cp $src/amf_container_script $installd > +gcc -g -O2 -Wall -fPIC -I$opensaf_home/src/amf/saf \ > + -I$opensaf_home/src/ais/include \ > + -DSA_EXTENDED_NAME_SOURCE \ > + -o $installd/amf_container_demo $src/amf_container_demo.c \ > + -Wl,--as-needed "-Wl,-rpath-link,$libd:$libd/opensaf" "-L$libd" > -lSaAmf -lopensaf_core > + > +echo "Creating [$root/root.cpio] ..." > +cmd_mkcpio > +} > + > +## install_container_testprog > +## Build and install the AMF container demo program. > +## > +cmd_install_container_testprog() { > +src=$opensaf_home/samples/amf/container > +libd=$root/usr/local/$lib_dir > +installd=$root/opt/amf_demo > +immxml=$root/etc/opensaf/imm.xml > +containedXml=$src/AppConfig-contained-2N.xml > +containerXml=$src/AppConfig-container.xml > + > +mkdir -p $installd > +cp $src/amf_container_script $installd > +gcc -g -O2 -Wall -fPIC -I$opensaf_home/src/amf/saf \ > + -I$opensaf_home/src/ais/include \ > + -DSA_EXTENDED_NAME_SOURCE \ > + -o $installd/amf_container_demo $src/amf_container_demo.c \ > + -Wl,--as-needed "-Wl,-rpath-link,$libd:$libd/opensaf" "-L$libd" > -lSaAmf > + > +test -r $immxml.orig || cp $immxml $immxml.orig > +$opensaf_home/src/imm/tools/immxml-merge \ > + $immxml.orig
Re: [devel] [PATCH 1/1] osaf: modify log severity level in Consensus::Demote [#2912]
Ack (code review only)/Thanks Hans -----Original Message----- From: Gary Lee Sent: 17 August 2018 02:00 To: Hans Nordebäck ; Minh Hon Chau ; Anders Widell Cc: opensaf-devel@lists.sourceforge.net; Gary Lee Subject: [PATCH 1/1] osaf: modify log severity level in Consensus::Demote [#2912] All callers of Consensus::Demote() already log an error if the return code is not SA_AIS_OK. A warning message will suffice. --- src/osaf/consensus/consensus.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc index 2a8e9bb1c..dc5c9bc46 100644 --- a/src/osaf/consensus/consensus.cc +++ b/src/osaf/consensus/consensus.cc @@ -142,7 +142,7 @@ SaAisErrorT Consensus::Demote(const std::string& node) { } if (rc != SA_AIS_OK) { -LOG_ER("Unlock failed (%u)", rc); +LOG_WA("Unlock failed (%u)", rc); return rc; } -- 2.17.1
Re: [devel] [PATCH 1/1] amf: remove assignment for NPI component with enable DisableRestart [#2879]
Hi Thang, ack, review only. I think you should keep V1 with the comments; my only suggestion was to correct the misspelling "thus SU" to "this SU". /Thanks HansN -----Original Message----- From: thang.nguyen Sent: 8 August 2018 08:49 To: Hans Nordebäck ; Gary Lee ; Minh Hon Chau Cc: opensaf-devel@lists.sourceforge.net; Thang Duc Nguyen Subject: [PATCH 1/1] amf: remove assignment for NPI component with enable DisableRestart [#2879] With NPI component configured with saAmfCtDefDisableRestart=1. Once invoking restart admin op, amfnd does not remove the assignment and cause the crash. Remove assignment before change the pres state to TERMINATION in clc. --- src/amf/amfnd/clc.cc | 4 1 file changed, 4 insertions(+) diff --git a/src/amf/amfnd/clc.cc b/src/amf/amfnd/clc.cc index c8e60e6..126362b 100644 --- a/src/amf/amfnd/clc.cc +++ b/src/amf/amfnd/clc.cc @@ -2217,6 +2217,10 @@ uint32_t avnd_comp_clc_inst_restart_hdler(AVND_CB *cb, AVND_COMP *comp) { /* invoke terminate callback */ rc = avnd_comp_cbk_send(cb, comp, AVSV_AMF_COMP_TERM, 0, 0); else { + /* For NPI component with DisableRestart=1 */ + if (m_AVND_COMP_IS_RESTART_DIS(comp) && (comp->csi_list.n_nodes > 0)) { +su_send_suRestart_recovery_msg(comp->su); + } rc = avnd_comp_clc_cmd_execute(cb, comp, AVND_COMP_CLC_CMD_TYPE_TERMINATE); m_AVND_COMP_REG_PARAM_RESET(cb, comp); -- 2.7.4
Re: [devel] [PATCH 1/1] imm: attrDefaultValue is set to NULL if no default value is given [#2901]
Hi Vu, yes, reinterpret_cast should be avoided where possible; in this case static_cast is better. You can ignore my comment about passing `attrDefaultValueBuffer` directly; it is not valid. /Thanks HansN -Original Message- From: Vu Minh Nguyen Sent: 7 August 2018 05:33 To: Hans Nordebäck ; Lennart Lund ; Gary Lee Cc: opensaf-devel@lists.sourceforge.net Subject: RE: [PATCH 1/1] imm: attrDefaultValue is set to NULL if no default value is given [#2901] Hi Hans, Thanks for your comments. I will update the code using static_cast. Passing `attrDefaultValueBuffer` directly to strlen() without type-casting will generate a compile error because of invalid conversion from `void*` to `const char*`, I think. Regards, Vu > -Original Message- > From: Hans Nordeback > Sent: Monday, August 6, 2018 7:07 PM > To: Vu Minh Nguyen ; > lennart.l...@ericsson.com; gary@dektech.com.au > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1/1] imm: attrDefaultValue is set to NULL if no > default value is given [#2901] > > Hi Vu, > > ack, review only. Minor comment below./Thanks HansN > > > On 07/30/2018 10:46 AM, Vu Minh Nguyen wrote: > > When explicitly having tag, but no value is given: > > , set NULL to attrDefaultValue.
> > --- > > src/imm/immloadd/imm_loader.cc | 3 ++- > > src/imm/tools/imm_import.cc| 3 ++- > > 2 files changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/src/imm/immloadd/imm_loader.cc > b/src/imm/immloadd/imm_loader.cc > > index de5a575e9..ad9785e92 100644 > > --- a/src/imm/immloadd/imm_loader.cc > > +++ b/src/imm/immloadd/imm_loader.cc > > @@ -1909,7 +1909,8 @@ void addClassAttributeDefinition( > > attrDefinition.attrFlags = attrFlags; > > > > /* Set the default value */ > > - if (attrDefaultValueBuffer) { > > + if (attrDefaultValueBuffer && > [HansN] use static_cast(attrDefaultValueBuffer) instead > or > > (strlen(attrDefaultValueBuffer) > 0)) { (instead of the reinterpret_cast, > not > needed though) > > > + (strlen(reinterpret_cast(attrDefaultValueBuffer)) > > > + 0)) { > > charsToValueHelper(, attrValueType, > > (const char *)attrDefaultValueBuffer); > > } else { > > diff --git a/src/imm/tools/imm_import.cc > > b/src/imm/tools/imm_import.cc index e2bdcba5c..8145ec572 100644 > > --- a/src/imm/tools/imm_import.cc > > +++ b/src/imm/tools/imm_import.cc > > @@ -2444,7 +2444,8 @@ static void > addClassAttributeDefinition(ParserState *state) { > > } > > > > /* Set the default value */ > > - if (state->attrDefaultValueSet) { > > + if (state->attrDefaultValueSet && > [HansN] use static_cast(attrDefaultValueBuffer) instead > or > > (strlen(attrDefaultValueBuffer) > 0)) { (instead of the reinterpret_cast, > not > needed though) > > > + > > + (strlen(reinterpret_cast(state->attrDefaultValueBuffer)) > > > + 0)) { > > if (charsToValueHelper(, > > state->attrValueType, > > state->attrDefaultValueBuffer, > > state->strictParse)) {
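The sketch below shows the check discussed above: a null test followed by a non-empty test on a `void*` buffer. It assumes the buffer arrives as `void*`, as Vu describes. `HasNonEmptyDefault` is a hypothetical helper and does not exist in imm_loader.cc. The example illustrates two points. First, static_cast is enough for a `void*` to `const char*` conversion. Second, passing the raw `void*` to strlen() does not compile in C++.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: true only if the parsed default-value buffer exists
// and holds a non-empty string.
bool HasNonEmptyDefault(const void* buffer) {
  if (buffer == nullptr) return false;  // no default value was parsed
  // const void* -> const char* is a conversion static_cast can perform, so
  // reinterpret_cast is unnecessary. Writing std::strlen(buffer) directly
  // would not compile in C++: void* does not convert implicitly to
  // const char*.
  const char* text = static_cast<const char*>(buffer);
  return std::strlen(text) > 0;  // an empty tag counts as "no default"
}
```

Checking `text[0] != '\0'` gives the same result without scanning the whole string. The strlen() form is kept here to mirror the patch.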
Re: [devel] [PATCH 1/1] amfd: remove redundant const_cast [#2907]
Ack, review only. /Thanks HansN -Original Message- From: Gary Lee Sent: 1 August 2018 05:47 To: Hans Nordebäck ; Minh Hon Chau Cc: opensaf-devel@lists.sourceforge.net; Gary Lee Subject: [PATCH 1/1] amfd: remove redundant const_cast [#2907] --- src/amf/amfd/clm.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc index e113a65f9..1e67ff389 100644 --- a/src/amf/amfd/clm.cc +++ b/src/amf/amfd/clm.cc @@ -631,7 +631,7 @@ AvdJobDequeueResultT ClmTrackStart::exec(AVD_CL_CB* cb) { AvdJobDequeueResultT res; TRACE_ENTER(); - SaAisErrorT rc = avd_clm_track_start(const_cast(cb)); + SaAisErrorT rc = avd_clm_track_start(cb); if (rc == SA_AIS_OK) { delete Fifo::dequeue(); res = JOB_EXECUTED; @@ -652,7 +652,7 @@ AvdJobDequeueResultT ClmTrackStop::exec(AVD_CL_CB* cb) { AvdJobDequeueResultT res; TRACE_ENTER(); - SaAisErrorT rc = avd_clm_track_stop(const_cast(cb)); + SaAisErrorT rc = avd_clm_track_stop(cb); if (rc == SA_AIS_OK) { delete Fifo::dequeue(); res = JOB_EXECUTED; -- 2.17.1
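The reduced sketch below shows why the const_cast becomes redundant once exec() takes a non-const pointer. `ControlBlock`, `TrackStart`, `ExecBefore` and `ExecAfter` are hypothetical stand-ins for AVD_CL_CB, avd_clm_track_start() and the two versions of ClmTrackStart::exec. Only the const-ness of the parameters is modelled.

```cpp
#include <cassert>

// Hypothetical stand-in for AVD_CL_CB.
struct ControlBlock {
  int track_calls = 0;
};

// The callee modifies the control block, so it needs a non-const pointer.
int TrackStart(ControlBlock* cb) {
  ++cb->track_calls;
  return 0;
}

// Before: exec() received a pointer-to-const, so forwarding it required
// casting const away. This is only well-defined because the object itself
// was not declared const.
int ExecBefore(const ControlBlock* cb) {
  return TrackStart(const_cast<ControlBlock*>(cb));
}

// After: the parameter is non-const, so the call type-checks directly and
// the cast is redundant.
int ExecAfter(ControlBlock* cb) { return TrackStart(cb); }
```

Removing the cast also lets the compiler catch any caller that passes a genuinely const control block, which the cast would otherwise hide.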
Re: [devel] [PATCH 1/1] amfd: put sync jobs into queue if IMM is busy [#2863]
Hi Gary, you can remove all const_casts in functions where the const parameter has been removed, e.g. for the AVD_CL_CB parameter of ClmTrackStart::exec etc. /Thanks HansN -Original Message- From: Gary Lee Sent: 4 July 2018 03:16 To: Minh Hon Chau ; Hans Nordebäck Cc: opensaf-devel@lists.sourceforge.net; Gary Lee Subject: [PATCH 1/1] amfd: put sync jobs into queue if IMM is busy [#2863] --- src/amf/amfd/cb.h | 3 ++- src/amf/amfd/clm.cc | 4 ++-- src/amf/amfd/clm.h | 4 ++-- src/amf/amfd/imm.cc | 33 - src/amf/amfd/imm.h | 18 +- src/amf/amfd/ntf.cc | 2 +- 6 files changed, 40 insertions(+), 24 deletions(-) diff --git a/src/amf/amfd/cb.h b/src/amf/amfd/cb.h index 60bb554de..3b7e6d13f 100644 --- a/src/amf/amfd/cb.h +++ b/src/amf/amfd/cb.h @@ -63,7 +63,8 @@ typedef enum { AVD_IMM_INIT_BASE = 1, AVD_IMM_INIT_ONGOING = 2, AVD_IMM_INIT_DONE = 3, - AVD_IMM_TERMINATING = 4, + AVD_IMM_BUSY = 4, + AVD_IMM_TERMINATING = 5, } AVD_IMM_INIT_STATUS; /* * Sync state of the Standby. diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc index 9d317892a..e113a65f9 100644 --- a/src/amf/amfd/clm.cc +++ b/src/amf/amfd/clm.cc @@ -627,7 +627,7 @@ SaAisErrorT avd_start_clm_init_bg(void) { return SA_AIS_OK; } -AvdJobDequeueResultT ClmTrackStart::exec(const AVD_CL_CB* cb) { +AvdJobDequeueResultT ClmTrackStart::exec(AVD_CL_CB* cb) { AvdJobDequeueResultT res; TRACE_ENTER(); @@ -648,7 +648,7 @@ AvdJobDequeueResultT ClmTrackStart::exec(const AVD_CL_CB* cb) { return res; } -AvdJobDequeueResultT ClmTrackStop::exec(const AVD_CL_CB* cb) { +AvdJobDequeueResultT ClmTrackStop::exec(AVD_CL_CB* cb) { AvdJobDequeueResultT res; TRACE_ENTER(); diff --git a/src/amf/amfd/clm.h b/src/amf/amfd/clm.h index 2bbe320f7..f4399c62e 100644 --- a/src/amf/amfd/clm.h +++ b/src/amf/amfd/clm.h @@ -40,14 +40,14 @@ public: class ClmTrackStart : public ClmJob { public: ClmTrackStart() : ClmJob(){}; - AvdJobDequeueResultT exec(const struct cl_cb_tag *cb); + AvdJobDequeueResultT exec(struct cl_cb_tag *cb); ~ClmTrackStart() {} }; class
ClmTrackStop : public ClmJob { public: ClmTrackStop() : ClmJob(){}; - AvdJobDequeueResultT exec(const struct cl_cb_tag *cb); + AvdJobDequeueResultT exec(struct cl_cb_tag *cb); ~ClmTrackStop() {} }; diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc index 60a997943..3c1a93729 100644 --- a/src/amf/amfd/imm.cc +++ b/src/amf/amfd/imm.cc @@ -133,7 +133,7 @@ Job::~Job() {} // TODO: Make isImmServiceReady as static to limit its scope // This function should belong to AVD_CB class as a method -static bool isImmServiceReady(const AVD_CL_CB *cb) { +static bool isImmServiceReady(const AVD_CL_CB *cb, bool ignore_busy = +false) { TRACE_ENTER(); bool rc = true; @@ -149,16 +149,21 @@ static bool isImmServiceReady(const AVD_CL_CB *cb) { TRACE("Already IMM init is going, try again after sometime"); rc = false; } + if (avd_cb->avd_imm_status == AVD_IMM_BUSY && +ignore_busy == false) { +TRACE("IMM returned TRY_AGAIN. Postponing synchronous calls"); +rc = false; + } TRACE_LEAVE2("%u:", rc); return rc; } // bool ImmJob::isRunnable(const AVD_CL_CB *cb) { - return isImmServiceReady(cb); + return isImmServiceReady(cb, true); } // -AvdJobDequeueResultT ImmObjCreate::exec(const AVD_CL_CB *cb) { +AvdJobDequeueResultT ImmObjCreate::exec(AVD_CL_CB *cb) { SaAisErrorT rc; AvdJobDequeueResultT res; const SaImmOiHandleT immOiHandle = cb->immOiHandle; @@ -173,6 +178,7 @@ AvdJobDequeueResultT ImmObjCreate::exec(const AVD_CL_CB *cb) { } rc = saImmOiRtObjectCreate_2(immOiHandle, className_, parent_name, attrValues_); + cb->avd_imm_status = AVD_IMM_INIT_DONE; if ((rc == SA_AIS_OK) || (rc == SA_AIS_ERR_EXIST)) { delete Fifo::dequeue(); @@ -180,6 +186,7 @@ AvdJobDequeueResultT ImmObjCreate::exec(const AVD_CL_CB *cb) { } else if (rc == SA_AIS_ERR_TRY_AGAIN) { TRACE("TRY-AGAIN"); res = JOB_ETRYAGAIN; +cb->avd_imm_status = AVD_IMM_BUSY; } else if (rc == SA_AIS_ERR_TIMEOUT) { TRACE("TIMEOUT"); res = JOB_ETRYAGAIN; @@ -228,7 +235,7 @@ ImmObjCreate::~ImmObjCreate() { } // -AvdJobDequeueResultT 
ImmObjUpdate::exec(const AVD_CL_CB *cb) { +AvdJobDequeueResultT ImmObjUpdate::exec(AVD_CL_CB *cb) { SaAisErrorT rc; AvdJobDequeueResultT res; const SaImmOiHandleT immOiHandle = cb->immOiHandle; @@ -252,6 +259,7 @@ AvdJobDequeueResultT ImmObjUpdate::exec(const AVD_CL_CB *cb) { attrMod.modAttr.attrValues = attrValues; rc = saImmOiRtObjectUpdate_o3(immOiHandle, dn.c_str(), attrMods); + cb->avd_imm_status = AVD_IMM_INIT_DONE; if ((rc == SA_AIS_OK) || (rc == SA_AIS_ERR_NOT_EXIST)) { delete Fifo::dequeue(); @@ -259,6 +267,7 @@ AvdJobDequeueResultT ImmObjUpdate::exec(const AVD_CL_CB *cb) {
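The busy-state handling in this patch works as follows:
- A TRY_AGAIN from IMM marks the status as busy.
- Any other result resets it to INIT_DONE.
- Synchronous callers back off while IMM is busy.
- Queued jobs keep running, because ImmJob::isRunnable passes `ignore_busy = true`.

Below is a reduced model of that state machine. It assumes only the INIT_ONGOING and BUSY checks matter. The names are hypothetical, and the real isImmServiceReady() checks more state than this.

```cpp
#include <cassert>

// Mirrors AVD_IMM_INIT_STATUS; the other types are hypothetical reductions.
enum class ImmStatus { kInitBase, kInitOngoing, kInitDone, kBusy, kTerminating };
enum class AisResult { kOk, kTryAgain, kTimeout };

struct ControlBlock {
  ImmStatus imm_status = ImmStatus::kInitDone;
};

// Synchronous callers (ignore_busy == false) postpone their IMM calls while
// IMM is busy. Queued jobs (ignore_busy == true) still run, since their
// next successful call is what clears the busy state.
bool IsImmServiceReady(const ControlBlock& cb, bool ignore_busy = false) {
  if (cb.imm_status == ImmStatus::kInitOngoing) return false;
  if (cb.imm_status == ImmStatus::kBusy && !ignore_busy) return false;
  return true;
}

// Called after each IMM request made by a queued job: only TRY_AGAIN marks
// IMM as busy; any other result (including TIMEOUT) resets the status.
void RecordImmResult(ControlBlock& cb, AisResult rc) {
  cb.imm_status = (rc == AisResult::kTryAgain) ? ImmStatus::kBusy
                                               : ImmStatus::kInitDone;
}
```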
Re: [devel] [PATCH 1/1] rded: run controller promotion code in new thread [#2857]
Ack /Thanks HansN -Original Message- From: Gary Lee Sent: den 24 maj 2018 07:57 To: Hans Nordebäck ; ravisekhar.ko...@oracle.com Cc: opensaf-devel@lists.sourceforge.net; Gary Lee Subject: [PATCH 1/1] rded: run controller promotion code in new thread [#2857] Currently, the consensus code relating to node promotion is run from the main thread. We can improve rded's responsiveness by moving this code into another thread. --- src/rde/rded/rde_cb.h| 3 +- src/rde/rded/rde_main.cc | 6 +++- src/rde/rded/role.cc | 83 +++- src/rde/rded/role.h | 2 ++ 4 files changed, 62 insertions(+), 32 deletions(-) diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h index f5ad689c3..877687341 100644 --- a/src/rde/rded/rde_cb.h +++ b/src/rde/rded/rde_cb.h @@ -53,7 +53,8 @@ enum RDE_MSG_TYPE { RDE_MSG_NEW_ACTIVE_CALLBACK = 5, RDE_MSG_NODE_UP = 6, RDE_MSG_NODE_DOWN = 7, - RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8 + RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8, + RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9 }; struct rde_peer_info { diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc index c5b4b8283..c59aa4536 100644 --- a/src/rde/rded/rde_main.cc +++ b/src/rde/rded/rde_main.cc @@ -55,7 +55,8 @@ const char *rde_msg_name[] = {"-", "RDE_MSG_NEW_ACTIVE_CALLBACK(5)" "RDE_MSG_NODE_UP(6)", "RDE_MSG_NODE_DOWN(7)", - "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)"}; + "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)", + "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)"}; static RDE_CONTROL_BLOCK _rde_cb; static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb; @@ -186,6 +187,9 @@ static void handle_mbx_event() { LOG_WA("Received takeover request when not active"); } } break; +case RDE_MSG_ACTIVE_PROMOTION_SUCCESS: + role->NodePromoted(); + break; default: LOG_ER("%s: discarding unknown message type %u", __FUNCTION__, msg->type); break; diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc index 1b5a6ae89..a03372413 100644 --- a/src/rde/rded/role.cc +++ b/src/rde/rded/role.cc @@ -22,6 +22,7 @@ #include "rde/rded/role.h" #include #include 
+#include #include "base/getenv.h" #include "base/logtrace.h" #include "base/ncs_main_papi.h" @@ -63,6 +64,55 @@ void Role::MonitorCallback(const std::string& key, const std::string& new_value, osafassert(status == NCSCC_RC_SUCCESS); } +void Role::PromoteNode(const uint64_t cluster_size) { + TRACE_ENTER(); + SaAisErrorT rc; + + Consensus consensus_service; + + rc = consensus_service.PromoteThisNode(true, cluster_size); if (rc + != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) { +LOG_ER("Unable to set active controller in consensus service"); +opensaf_reboot(0, nullptr, + "Unable to set active controller in consensus + service"); } + + if (rc == SA_AIS_ERR_EXIST) { +LOG_WA("Another controller is already active"); +return; + } + + RDE_CONTROL_BLOCK* cb = rde_get_control_block(); + + // send msg to main thread + rde_msg* msg = static_cast(malloc(sizeof(rde_msg))); + msg->type = RDE_MSG_ACTIVE_PROMOTION_SUCCESS; + uint32_t status; + status = m_NCS_IPC_SEND(>mbx, msg, NCS_IPC_PRIORITY_HIGH); + osafassert(status == NCSCC_RC_SUCCESS); } + +void Role::NodePromoted() { + ExecutePreActiveScript(); + LOG_NO("Switched to ACTIVE from %s", to_string(role())); + role_ = PCS_RDA_ACTIVE; + rde_rda_send_role(role_); + + Consensus consensus_service; + RDE_CONTROL_BLOCK* cb = rde_get_control_block(); + + // register for callback if active controller is changed + // in consensus service + if (cb->monitor_lock_thread_running == false) { +cb->monitor_lock_thread_running = true; +consensus_service.MonitorLock(MonitorCallback, cb->mbx); + } + if (cb->monitor_takeover_req_thread_running == false) { +cb->monitor_takeover_req_thread_running = true; +consensus_service.MonitorTakeoverRequest(MonitorCallback, cb->mbx); + } +} + Role::Role(NODE_ID own_node_id) : known_nodes_{}, role_{PCS_RDA_QUIESCED}, @@ -82,37 +132,10 @@ timespec* Role::Poll(timespec* ts) { *ts = election_end_time_ - now; timeout = ts; } else { + election_end_time_ = base::kTimespecMax; RDE_CONTROL_BLOCK* cb = rde_get_control_block(); - 
SaAisErrorT rc; - Consensus consensus_service; - - rc = consensus_service.PromoteThisNode(true, cb->cluster_members.size()); - if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) { -LOG_ER("Unable to set active controller in consensus service"); -
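This patch moves the slow PromoteThisNode() call off the main thread. The worker then posts RDE_MSG_ACTIVE_PROMOTION_SUCCESS back through the mailbox, and the main thread finishes the promotion in NodePromoted(). The sketch below reproduces that pattern with standard C++ primitives. It uses a mutex/condition-variable queue in place of the NCS IPC mailbox. `Mailbox`, `MsgType` and the `promote_this_node` callback are hypothetical names. One simplification: failures here send nothing, whereas the patch reboots the node on any error other than SA_AIS_ERR_EXIST.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

enum class MsgType { kActivePromotionSuccess };

// Minimal thread-safe queue standing in for the RDE mailbox.
class Mailbox {
 public:
  void Send(MsgType msg) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(msg);
    }
    cv_.notify_one();
  }
  // Blocks until a message is available, like the main loop's mailbox read.
  MsgType Receive() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !queue_.empty(); });
    MsgType msg = queue_.front();
    queue_.pop_front();
    return msg;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<MsgType> queue_;
};

// Worker-thread body: runs the (possibly slow) consensus call and, only on
// success, asks the main thread to complete the promotion.
void PromoteNode(Mailbox* mbx, bool (*promote_this_node)()) {
  if (promote_this_node()) mbx->Send(MsgType::kActivePromotionSuccess);
}
```

A typical use is `std::thread worker(PromoteNode, &mbx, callback);` followed by the main loop calling `mbx.Receive()`. Handling the result on the main thread means the role change stays in the same thread as all other mailbox events.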
Re: [devel] [PATCH 1/1] base: Destructor of TraceLog causes coredump V2 [#2860]
Hi Minh, yes, you are right about the possibility of a segv, but using a std::shared_ptr instead of the naked pointer may be an option? /Thanks Hans From: Minh Hon Chau <minh.c...@dektech.com.au> Sent: 24 May 2018 02:34:13 To: Hans Nordebäck; Anders Widell; Gary Lee Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1/1] base: Destructor of TraceLog causes coredump V2 [#2860] Hi Hans, It is good to give the Mutex class an option not to abort. We can avoid the abort in mutex_unlock (as reported in the coredump), but I feel the issue is still there. We may hit a problem (segv?) with "mutex_->good()" since the other thread is wiping out mutex_ in the destructor; whether it happens is a matter of timing, I guess. As we don't have (and don't want to have) any protection between the two threads for the TraceLog, the better fix (I hope) is to make sure one of those threads does not touch the TraceLog. If you don't want to remove the destructor, another way is to allocate gl_trace/gl_log on the heap? Thanks, Minh On 23/05/18 20:50, Hans Nordeback wrote: > Change Mutex class to make it possible for caller to decide if abort > --- > src/base/logtrace_client.cc | 5 - > src/base/mutex.cc | 2 +- > src/base/mutex.h| 22 +- > 3 files changed, 18 insertions(+), 11 deletions(-) > > diff --git a/src/base/logtrace_client.cc b/src/base/logtrace_client.cc > index 0dac6d389..f597c1ae3 100644 > --- a/src/base/logtrace_client.cc > +++ b/src/base/logtrace_client.cc > @@ -76,7 +76,7 @@ bool TraceLog::Init(const char *msg_id, WriteMode mode) { > msg_id_ = base::LogMessage::MsgId{msg_id}; > log_socket_ = new base::UnixClientSocket{Osaflog::kServerSocketPath, > static_cast(mode)}; > - mutex_ = new base::Mutex{}; > + mutex_ = new base::Mutex{false}; > > return true; > } > @@ -91,6 +91,9 @@ void TraceLog::Log(base::LogMessage::Severity severity, > const char *fmt, > void TraceLog::LogInternal(base::LogMessage::Severity severity, const char > *fmt, > va_list ap) { > base::Lock lock(*mutex_); > + > + if
(!mutex_->good()) return; > + > uint32_t id = sequence_id_; > sequence_id_ = id < kMaxSequenceId ? id + 1 : 1; > buffer_.clear(); > diff --git a/src/base/mutex.cc b/src/base/mutex.cc > index 5fa6ac55a..1627ac20b 100644 > --- a/src/base/mutex.cc > +++ b/src/base/mutex.cc > @@ -20,7 +20,7 @@ > > namespace base { > > -Mutex::Mutex() : mutex_{} { > +Mutex::Mutex(bool abort) : abort_{abort}, mutex_{}, result_{0} { > pthread_mutexattr_t attr; > int result = pthread_mutexattr_init(); > if (result != 0) osaf_abort(result); > diff --git a/src/base/mutex.h b/src/base/mutex.h > index 7b3cee187..e3c54a711 100644 > --- a/src/base/mutex.h > +++ b/src/base/mutex.h > @@ -31,30 +31,34 @@ namespace base { > class Mutex { >public: > using NativeHandleType = pthread_mutex_t*; > - Mutex(); > + Mutex(bool abort = true); > ~Mutex(); > void Lock() { > -int result = pthread_mutex_lock(_); > -if (result != 0) osaf_abort(result); > +result_ = pthread_mutex_lock(_); > +if (abort_ && result_ != 0) osaf_abort(result_); > } > bool TryLock() { > -int result = pthread_mutex_trylock(_); > -if (result == 0) { > +result_ = pthread_mutex_trylock(_); > +if (result_ == 0) { > return true; > -} else if (result == EBUSY) { > +} else if (result_ == EBUSY) { > return false; > } else { > - osaf_abort(result); > + if (abort_) osaf_abort(result_); > + return false; > } > } > void Unlock() { > -int result = pthread_mutex_unlock(_); > -if (result != 0) osaf_abort(result); > +result_ = pthread_mutex_unlock(_); > +if (abort_ && result_ != 0) osaf_abort(result_); > } > NativeHandleType native_handle() { return _; } > > + bool good() const {return result_ == 0;}; >private: > + bool abort_; > pthread_mutex_t mutex_; > + int result_; > DELETE_COPY_AND_MOVE_OPERATORS(Mutex); > };
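Minh's alternative is to allocate gl_trace/gl_log on the heap so that no destructor runs at process exit. The sketch below shows it with the construct-on-first-use idiom. `TraceLog` here is a heavily reduced hypothetical stand-in, not base::TraceLog, and `GlobalTrace()` is an illustrative accessor. Hans's std::shared_ptr suggestion is a different approach and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, heavily reduced stand-in for a trace/log client.
class TraceLog {
 public:
  void Log(const std::string& line) {
    std::lock_guard<std::mutex> lock(mutex_);
    lines_.push_back(line);
  }
  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return lines_.size();
  }

 private:
  std::mutex mutex_;
  std::vector<std::string> lines_;
};

// Construct on first use and intentionally never delete. Because no
// destructor runs when the process exits, a thread still logging while
// main() returns cannot use a mutex that has already been destroyed.
// Initialization of the function-local static is thread-safe since C++11.
TraceLog& GlobalTrace() {
  static TraceLog* instance = new TraceLog();
  return *instance;
}
```

The trade-off is a deliberate, bounded leak that the OS reclaims at exit, and leak checkers may report it.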
Re: [devel] [PATCH 1/1] imm: inform status to NID before exit during start-up phrase [#2845]
Hi Vu, I'll revise my comment a bit: before nid_notify is sent, the FIFO monitoring is not started, so removing the exit() should not be necessary. It would be good if you could test this. /Thanks HansN -Original Message- From: Hans Nordebäck Sent: 8 May 2018 15:06 To: 'Vu Minh Nguyen' <vu.m.ngu...@dektech.com.au>; ravisekhar.ko...@oracle.com; Anders Widell <anders.wid...@ericsson.com>; Lennart Lund <lennart.l...@ericsson.com> Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au> Subject: RE: [PATCH 1/1] imm: inform status to NID before exit during start-up phrase [#2845] Hi Vu, Ack, review only, with one comment. If exit() is called after immnd_ackToNid(), the FIFO monitoring in nodeinit.cc will be activated. I think you should remove the exit(). /Thanks HansN -Original Message- From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] Sent: 3 May 2018 12:20 To: ravisekhar.ko...@oracle.com; Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell <anders.wid...@ericsson.com>; Lennart Lund <lennart.l...@ericsson.com> Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au> Subject: [PATCH 1/1] imm: inform status to NID before exit during start-up phrase [#2845] During node starts up phrase, which AMFD has not been come up, there is a case IMMND exit without informing failure result to NID (refer to the ticket to see syslog). As the result, IMMND may not be respawned by NID process. This patch ensures that NID is informed before exit.
--- src/imm/immnd/immnd_evt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c index 8f3af92..2b9123d 100644 --- a/src/imm/immnd/immnd_evt.c +++ b/src/imm/immnd/immnd_evt.c @@ -10779,6 +10779,7 @@ static uint32_t immnd_evt_proc_fevs_rcv(IMMND_CB *cb, IMMND_EVT *evt, LOG_ER( "MESSAGE:%llu OUT OF ORDER my highest processed:%llu - exiting", msgNo, cb->highestProcessed); + immnd_ackToNid(NCSCC_RC_FAILURE); exit(1); } else if (cb ->mSync) { /* If we receive out of sync message -- 1.9.1
Re: [devel] [PATCH 1/1] imm: inform status to NID before exit during start-up phrase [#2845]
Hi Vu, Ack, review only, with one comment. If exit() is called after immnd_ackToNid(), the FIFO monitoring in nodeinit.cc will be activated. I think you should remove the exit(). /Thanks HansN -Original Message- From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] Sent: 3 May 2018 12:20 To: ravisekhar.ko...@oracle.com; Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell <anders.wid...@ericsson.com>; Lennart Lund <lennart.l...@ericsson.com> Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au> Subject: [PATCH 1/1] imm: inform status to NID before exit during start-up phrase [#2845] During node starts up phrase, which AMFD has not been come up, there is a case IMMND exit without informing failure result to NID (refer to the ticket to see syslog). As the result, IMMND may not be respawned by NID process. This patch ensures that NID is informed before exit. --- src/imm/immnd/immnd_evt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c index 8f3af92..2b9123d 100644 --- a/src/imm/immnd/immnd_evt.c +++ b/src/imm/immnd/immnd_evt.c @@ -10779,6 +10779,7 @@ static uint32_t immnd_evt_proc_fevs_rcv(IMMND_CB *cb, IMMND_EVT *evt, LOG_ER( "MESSAGE:%llu OUT OF ORDER my highest processed:%llu - exiting", msgNo, cb->highestProcessed); + immnd_ackToNid(NCSCC_RC_FAILURE); exit(1); } else if (cb ->mSync) { /* If we receive out of sync message -- 1.9.1
Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]
Hi Alex, Agreed, adding a comment in nid.conf and 00-README.conf is good. The backtrace below looks normal; can you share the syslogs? /BR HansN From: Alex Jones [mailto:ajo...@rbbn.com] Sent: 2 May 2018 15:43 To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell <anders.wid...@ericsson.com> Cc: opensaf-devel@lists.sourceforge.net Subject: Re: SV: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839] Hi Hans, I was finally able to get back to this. Having "Restart=on-failure" set works with REBOOT_ON_FAIL_TIMEOUT as long as RestartSec=xxx is also set in the service file to a value greater than REBOOT_ON_FAIL_TIMEOUT. Maybe we could put a comment in nid.conf saying that if you use systemd, you also need to set RestartSec to a value greater than REBOOT_ON_FAIL_TIMEOUT? Regarding "systemctl start opensafd; sleep 1; pkill -ABRT immnd": in my setup it does not restart after the nid phase. If I increase the sleep time to 3, it starts to work. Here is the backtrace. Nothing looks suspicious.
(gdb) thread apply all bt
Thread 4 (Thread 0x7fbf852e9b00 (LWP 5123)):
#0 0x7fbf839b906d in poll () from /lib64/libc.so.6
#1 0x7fbf8462a370 in poll (__timeout=2, __nfds=2, __fds=) at /usr/include/bits/poll2.h:46
#2 mdtm_process_recv_events_tcp () at src/mds/mds_dt_trans.c:986
#3 0x7fbf83c910db in start_thread () from /lib64/libpthread.so.0
#4 0x7fbf839c1e3d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fbf85309b00 (LWP 5122)):
#0 0x7fbf839b906d in poll () from /lib64/libc.so.6
#1 0x7fbf84601641 in poll (__timeout=4900, __nfds=1, __fds=0x7fbf85309260) at /usr/include/bits/poll2.h:46
#2 osaf_ppoll (io_fds=io_fds@entry=0x7fbf85309260, i_nfds=i_nfds@entry=1, i_timeout_ts=0x7fbf85309280, i_sigmask=i_sigmask@entry=0x0) at src/base/osaf_poll.c:108
#3 0x7fbf84608c2f in ncs_tmr_wait () at src/base/sysf_tmr.c:463
#4 0x7fbf83c910db in start_thread () from /lib64/libpthread.so.0
#5 0x7fbf839c1e3d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fbf82787700 (LWP 5121)):
#0 0x7fbf839b906d in poll () from /lib64/libc.so.6
#1 0x7fbf84601560 in poll (__timeout=-1, __nfds=1, __fds=0x7fbf82786e30) at /usr/include/bits/poll2.h:46
#2 osaf_poll_no_timeout (io_fds=0x7fbf82786e30, i_nfds=1) at src/base/osaf_poll.c:31
#3 0x7fbf846017e5 in osaf_poll (io_fds=io_fds@entry=0x7fbf82786e30, i_nfds=i_nfds@entry=1, i_timeout=i_timeout@entry=-1) at src/base/osaf_poll.c:44
#4 0x7fbf8460197c in auth_server_main (_fd=) at src/base/osaf_secutil.c:176
#5 0x7fbf83c910db in start_thread () from /lib64/libpthread.so.0
#6 0x7fbf839c1e3d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fbf85341740 (LWP 5120)):
#0 0x7fbf839b906d in poll () from /lib64/libc.so.6
#1 0x7fbf850cc3b8 in poll (__timeout=, __nfds=5, __fds=0x7ffdb1e02590) at /usr/include/bits/poll2.h:46
#2 main (argc=, argv=) at src/imm/immnd/immnd_main.c:358
(gdb)
Alex On 04/26/2018 03:38 AM, Hans Nordeback wrote: Hi Alex, I tested this: immnd gets restarted and systemd
reports opensafd.service as active (running), so it works as expected. In your case, is immnd never restarted after the nid phase, or does it work if you increase the sleep time? One thing you can check is to send an ABRT instead of the KILL and check in the core dump, e.g., at which address the signal is received. Perhaps you have found a "window" where immnd is not monitored? /Regards HansN On 04/25/2018 03:23 PM, Alex Jones wrote: Hi Hans, I understand. But what if it doesn't fail in the nid phase? If you run this command in your setup: "systemctl start opensafd; sleep 2; pkill -KILL immnd", does immnd get restarted? And does opensafd successfully come up according to systemd? Alex On 04/25/2018 09:19 AM, Hans Nordebäck wrote: Hi Alex, the reboot should only happen if REBOOT_ON_FAIL_TIMEOUT is set (i.e. not 0). I checked the latest version; the reboot works fine if, e.g., immnd fails in the nid phase and REBOOT_ON_FAIL_TIMEOUT is set. /Thanks HansN From: Alex Jones [mailto:ajo...@rbbn.com] Sent: 25 April 2018 15:05 To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell <anders.wid...@ericsson.com> Cc: opensaf-devel@lists.sourceforge.net Subject: Re: SV: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839] Hi Hans, There must be a hole here, then, because in our setup, if dtmd or immnd crashes early in the startup process, the node doesn't reboot, a
Re: [devel] [PATCH 1/1] clmd: Increase message priority of CLMSV_CLMS_MDS_NODE_EVT to be sent to main thread [#2842]
Ack, review only. /Thanks HansN -Original Message- From: Minh Chau [mailto:minh.c...@dektech.com.au] Sent: 26 April 2018 01:22 To: Anders Widell <anders.wid...@ericsson.com>; Hans Nordebäck <hans.nordeb...@ericsson.com>; ravisekhar.ko...@oracle.com Cc: opensaf-devel@lists.sourceforge.net; Minh Hon Chau <minh.c...@dektech.com.au> Subject: [PATCH 1/1] clmd: Increase message priority of CLMSV_CLMS_MDS_NODE_EVT to be sent to main thread [#2842] In the event of stop/start standby controller, the node is stopped that generates the MDS event CLMSV_CLMS_MDS_NODE_EVT. This event is being sent to main thread with NORMAL priority. When the node is started again, the other event like CLMSV_CLUSTER_JOIN_REQ is being sent with HIGH priority. The race happens as CLMSV_CLMS_MDS_NODE_EVT is processed after the event CLMSV_CLUSTER_JOIN_REQ, possibly caused by the priority. The patch sets priority of CLMSV_CLMS_MDS_NODE_EVT as high as the others so that the order of messages processed in main thread should depend on the timing order of events that occurred. --- src/clm/clmd/clms_mds.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/clm/clmd/clms_mds.cc b/src/clm/clmd/clms_mds.cc index a1f5348..58552cc 100644 --- a/src/clm/clmd/clms_mds.cc +++ b/src/clm/clmd/clms_mds.cc @@ -1097,7 +1097,7 @@ static uint32_t clms_mds_node_event(struct ncsmds_callback_info *mds_info) { clmsv_evt->info.node_mds_info.node_id = mds_info->info.node_evt.node_id; clmsv_evt->info.node_mds_info.nodeup = SA_TRUE; -rc = m_NCS_IPC_SEND(_cb->mbx, clmsv_evt, NCS_IPC_PRIORITY_NORMAL); +rc = m_NCS_IPC_SEND(_cb->mbx, clmsv_evt, + NCS_IPC_PRIORITY_HIGH); if (rc != NCSCC_RC_SUCCESS) { TRACE("IPC send failed %d", rc); free(clmsv_evt); -- 2.7.4
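The reordering described in the commit message can be reproduced with a reduced model of a priority mailbox. In this model the receiver always drains higher-priority messages first. `PriorityMailbox` is hypothetical, uses only two priority levels, and is not the NCS IPC implementation. Event names are shortened for readability.

```cpp
#include <cassert>
#include <deque>
#include <string>

enum class Priority { kNormal, kHigh };

// Reduced two-level mailbox: HIGH messages are always delivered before
// NORMAL ones, and FIFO order is kept only within one priority level.
class PriorityMailbox {
 public:
  void Send(const std::string& evt, Priority prio) {
    (prio == Priority::kHigh ? high_ : normal_).push_back(evt);
  }
  // Precondition: at least one message is queued.
  std::string Receive() {
    std::deque<std::string>& q = high_.empty() ? normal_ : high_;
    std::string evt = q.front();
    q.pop_front();
    return evt;
  }

 private:
  std::deque<std::string> high_;
  std::deque<std::string> normal_;
};
```

When both events are sent at the same priority, they are delivered in send order. That is the ordering the patch aims for by raising the node event to HIGH.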
Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]
Hi Alex, the reboot should only happen if REBOOT_ON_FAIL_TIMEOUT is set (i.e. not 0). I checked the latest version; the reboot works fine if, e.g., immnd fails in the nid phase and REBOOT_ON_FAIL_TIMEOUT is set. /Thanks HansN From: Alex Jones [mailto:ajo...@rbbn.com] Sent: 25 April 2018 15:05 To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell <anders.wid...@ericsson.com> Cc: opensaf-devel@lists.sourceforge.net Subject: Re: SV: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839] Hi Hans, There must be a hole here, then, because in our setup, if dtmd or immnd crashes early in the startup process, the node doesn't reboot, and the executables are not restarted. If I set "Restart=on-failure", it works fine. Can you test this in your setup to see if you see the same thing? Alex On 04/24/2018 05:04 AM, Hans Nordeback wrote: Hi Alex, please see comment below. /Thanks HansN On 04/23/2018 03:56 PM, Alex Jones wrote: Hi Hans, I just did some tests. Maybe there is a bug in nid, but when I do not have "Restart=on-failure", the node does not reboot when I run the command "systemctl start opensafd; sleep 3; pkill -KILL immnd", and opensafd times out and fails, with REBOOT_ON_FAIL_TIMEOUT=30. [HansN] Isn't the nid phase finished before the sleep 3 command? It is only during the nid phase that REBOOT_ON_FAIL_TIMEOUT is used. After the nid phase, OpenSAF enters "normal" operation, and no reboot will be performed since immnd is restartable. Instead of the sleep 3, you can edit the nodeinit.conf.controller file and change the immnd line to e.g. "/usr/local/lib/opensaf/clc-cli/osaf-immndx:IMMND ... "; nid should then fail to start and REBOOT_ON_FAIL_TIMEOUT should take effect. But opensafd restarts every time I run that command with "Restart=on-failure" set.
Alex On 04/19/2018 04:02 PM, Hans Nordebäck wrote: Hi Alex, a question: if opensafd fails (assert or exit code != 0), a reboot of the node will be performed if REBOOT_ON_FAIL_TIMEOUT is configured. I have not checked, but how does systemd handle the reboot request if Restart=on-failure is set? /BR HansN From: Alex Jones <ajo...@rbbn.com> Sent: 19 April 2018 17:27:27 To: Hans Nordebäck; Anders Widell Cc: opensaf-devel@lists.sourceforge.net; Alex Jones Subject: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839] Under certain circumstances opensafd fails to start (immnd or dtmd crashes, etc). Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]: src/dtm/dtmnd/dtm_intra_svc.cc:1778: dtm_process_internode_service_up_msg: Assertion '0' failed. We can tell systemd to restart opensafd if it fails to start. --- src/nid/opensafd.service.in | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/nid/opensafd.service.in b/src/nid/opensafd.service.in index 7f4d75ee3..6050f5e88 100644 --- a/src/nid/opensafd.service.in +++ b/src/nid/opensafd.service.in @@ -12,5 +12,7 @@ ControlGroup=cpu:/ TimeoutStartSec=3hours KillMode=none @systemdtasksmax@ +Restart=on-failure + [Install] WantedBy=multi-user.target -- 2.13.6
Re: [devel] [PATCH 1/1] amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]
Hi Alex, ack, code review and legacy tests run. I added some comments below. /Thanks HansN From: Alex Jones [mailto:ajo...@rbbn.com] Sent: 23 April 2018 16:05 To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Gary Lee <gary@dektech.com.au>; ravisekhar.ko...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1/1] amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835] My comments inline: Alex On 04/20/2018 04:00 AM, Hans Nordeback wrote: Hi Alex, please see below for some comments/questions. /Regards HansN On 04/18/2018 03:41 PM, Alex Jones wrote: When using PLM, an AMF node mapped to a CLM node that is mapped to a PLM EE can get stuck in locked state when rebooting, or when going through a PLM EE lock/unlock. When amfd receives a START step from CLM tracking, it attempts to gracefully shut down the AMF node using the AMF admin operations lock/lock-in. When PLM is involved this doesn't always work correctly, because PLM is also shutting down the node by calling "opensafd stop". There is a race condition between PLM using "opensafd stop" and amfd using the admin operations to bring down the node, so that sometimes the AMF node gets stuck in locked state. If the rootCauseEntity in the CLM tracking is a PLM entity, then don't do anything, as "opensafd stop" is already being called.
--- src/amf/amfd/clm.cc | 25 - 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc index 2bcea2db0..7f675d8e9 100644 --- a/src/amf/amfd/clm.cc +++ b/src/amf/amfd/clm.cc @@ -274,6 +274,27 @@ static void clm_track_cb( TRACE_3("Already got callback for start of this change."); continue; } + + if (strncmp(osaf_extended_name_borrow(rootCauseEntity), + "safEE=", + sizeof("safEE=") - 1) == 0 || + strncmp(osaf_extended_name_borrow(rootCauseEntity), + "safHE=", + sizeof("safHE=") - 1) == 0) { +// PLM will take care of calling opensafd stop +TRACE("rootCause: %s from PLM operation so skipping %u", + osaf_extended_name_borrow(rootCauseEntity), + notifItem->clusterNode.nodeId); + +SaAisErrorT rc(saClmResponse_4(avd_cb->clmHandle, + invocation, + SA_CLM_CALLBACK_RESPONSE_OK)); [HansN] Perhaps use "SaAisErrorT rc = saClmResponse_4(...)" or "SaAisErrorT rc{saClmResponse_4(...)}" instead? [Alex] I'm not sure what you are asking here. Do you not like the function syntax? And what is '{'? I don't understand your second suggestion. [HansN] '{' is used for uniform initialization in C++11 (preferred). +if (rc != SA_AIS_OK) + LOG_ER("saClmResponse_4 failed: %i", rc); + [HansN] I think the AMF operational state has to be checked and set to disabled? And should break be used instead of continue? [Alex] Setting operational state to disabled is taken care of when COMPLETED is received in the track callback. My code change only applies when receiving START. I used "continue" to explicitly mean that we are done processing this node and need to move to the next node in the for loop. The same thing is done in the legacy code above when checking for "clm_change_start_preceded."
[HansN] ok +continue; + } + /* invocation to be used by pending clm response */ node->clm_pend_inv = invocation; clm_node_exit_start(node, notifItem->clusterChange); @@ -304,7 +325,9 @@ static void clm_track_cb( osaf_extended_name_borrow(rootCauseEntity), notifItem->clusterNode.nodeId); if (strncmp(osaf_extended_name_borrow(rootCauseEntity), - "safEE=", 6) == 0) { + "safEE=", 6) == 0 || + strncmp(osaf_extended_name_borrow(rootCauseEntity), + "safHE=", 6) == 0) { [HansN] sizeof("safHE=") as above [Alex] Agreed. I will make this change. And change the older code to conform. /* This callback is because of operation on PLM, so we need to mark the node absent, because PLCD will anyway call opensafd stop.*/ AVD_AVND *node =
Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]
Hi Alex, a question: if opensafd fails (assert or exit code not 0), a reboot of the node will be performed if REBOOT_ON_FAIL_TIMEOUT is configured. I have not checked, but how does systemd handle the reboot request if Restart=on-failure is set? /BR HansN From: Alex Jones <ajo...@rbbn.com> Sent: 19 April 2018 17:27:27 To: Hans Nordebäck; Anders Widell Cc: opensaf-devel@lists.sourceforge.net; Alex Jones Subject: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839] Under certain circumstances opensafd fails to start (immnd or dtmd crashes, etc.). Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]: src/dtm/dtmnd/dtm_intra_svc.cc:1778: dtm_process_internode_service_up_msg: Assertion '0' failed. We can tell systemd to restart opensafd if it fails to start. --- src/nid/opensafd.service.in | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/nid/opensafd.service.in b/src/nid/opensafd.service.in index 7f4d75ee3..6050f5e88 100644 --- a/src/nid/opensafd.service.in +++ b/src/nid/opensafd.service.in @@ -12,5 +12,7 @@ ControlGroup=cpu:/ TimeoutStartSec=3hours KillMode=none @systemdtasksmax@ +Restart=on-failure + [Install] WantedBy=multi-user.target -- 2.13.6
Re: [devel] [PATCH 1/2] base: Add support to direct OpenSAF logging to local node file [#2306]
Hi Minh, one part of this problem is in mbcsv_papi.h: the #ifdef __cplusplus extern "C" { should at least come after the #include statements. Fixing this should remove the need for the extern "C++". Moving this part to a separate header file would also be good, though. As a general comment (for all of OpenSAF): only extern "C" should be needed, and it should be placed carefully, only where it is required (C++ access to a C function/variable with external linkage), not in too wide a scope. It can be placed around a group of C functions, but not around the complete header file. /Thanks HansN From: Hans Nordebäck <hans.nordeb...@ericsson.com> Sent: 18 April 2018 20:43:22 To: Minh Hon Chau; Anders Widell; ravisekhar.ko...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [devel] [PATCH 1/2] base: Add support to direct OpenSAF logging to local node file [#2306] Hi Minh, yes, before this patch logtrace.h was a C header file callable from C and C++. Now it is a C/C++ header file, so including it from a C program without the extern "C++" will fail. In the first review comment I suggested moving this part to a separate header file and keeping logtrace.h as before. /Regards HansN From: Minh Hon Chau <minh.c...@dektech.com.au> Sent: 18 April 2018 16:09:35 To: Hans Nordebäck; Anders Widell; ravisekhar.ko...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1/2] base: Add support to direct OpenSAF logging to local node file [#2306] Hi Hans, one comment regarding extern C++ below. Thanks, Minh On 18/04/18 23:37, Hans Nordebäck wrote: > Hi Minh, > > See my comments below.
> > /Thanks HansN > > -Original Message- > From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] > Sent: 18 April 2018 15:20 > To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell > <anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1/2] base: Add support to direct OpenSAF logging to local > node file [#2306] > > Hi Hans, > > Please check my response with [Minh] > > Thanks > > Minh > > > On 18/04/18 22:40, Hans Nordeback wrote: >> Hi Minh, >> >> ack, code review only. Some comments below. >> >> /Thanks HansN >> >> >> On 04/12/2018 01:12 AM, Minh Chau wrote: >>> Unify TraceLog and MdsLog class to one class (TraceLog) so it can be >>> used as a common log client. >>> Add new instance TraceLog for OpenSAF logging to local file, which >>> can be enabled/disabled via environment variable OSAF_LOCAL_NODE_LOG >>> --- >>>src/base/logtrace.cc | 167 >>> ++- >>>src/base/logtrace.h | 50 +-- >>>src/mds/mds_log.cc | 114 +++ >>>3 files changed, 140 insertions(+), 191 deletions(-) >>> >>> diff --git a/src/base/logtrace.cc b/src/base/logtrace.cc index >>> b046fab..857e31c 100644 >>> --- a/src/base/logtrace.cc >>> +++ b/src/base/logtrace.cc >>> @@ -36,15 +36,10 @@ >>>#include >>>#include >>>#include >>> -#include "base/buffer.h" >>> -#include "base/conf.h" >>> -#include "base/log_message.h" >>> -#include "base/macros.h" >>> -#include "base/mutex.h" >>> +#include "base/getenv.h" >>>#include "base/ncsgl_defs.h" >>>#include "base/osaf_utility.h" >>>#include "base/time.h" >>> -#include "base/unix_client_socket.h" >>>#include "dtm/common/osaflog_protocol.h" >>> namespace global { >>> @@ -55,65 +50,38 @@ const char *const prefix_name[] = {"EM", "AL", >>> "CR", "ER", "WA", "NO", "IN", >>> "T6", "T7", "T8", ">>", "<<"}; >>>char *msg_id; >>>int logmask; >>> +const char* osaf_log_file = "osaf.log"; bool enable_osaf_log = >>> +false; >>> } // namespace global >>>-class TraceLog { >>> - public: >>> - static
bool Init(); >>> - static void Log(base::LogMessage::Severity severity, const char >>> *fmt, >>> - va_list ap); >>> - >>> - private: >>> - TraceLog(const std::strin
Re: [devel] [PATCH 1/2] base: Add support to direct OpenSAF logging to local node file [#2306]
Hi Minh, yes, before this patch logtrace.h was a C header file callable from C and C++. Now it is a C/C++ header file, so including it from a C program without the extern "C++" will fail. In the first review comment I suggested moving this part to a separate header file and keeping logtrace.h as before. /Regards HansN From: Minh Hon Chau <minh.c...@dektech.com.au> Sent: 18 April 2018 16:09:35 To: Hans Nordebäck; Anders Widell; ravisekhar.ko...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1/2] base: Add support to direct OpenSAF logging to local node file [#2306] Hi Hans, one comment regarding extern C++ below. Thanks, Minh On 18/04/18 23:37, Hans Nordebäck wrote: > Hi Minh, > > See my comments below. > > /Thanks HansN > > -Original Message- > From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] > Sent: 18 April 2018 15:20 > To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell > <anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1/2] base: Add support to direct OpenSAF logging to local > node file [#2306] > > Hi Hans, > > Please check my response with [Minh] > > Thanks > > Minh > > > On 18/04/18 22:40, Hans Nordeback wrote: >> Hi Minh, >> >> ack, code review only. Some comments below. >> >> /Thanks HansN >> >> >> On 04/12/2018 01:12 AM, Minh Chau wrote: >>> Unify TraceLog and MdsLog class to one class (TraceLog) so it can be >>> used as a common log client.
>>> Add new instance TraceLog for OpenSAF logging to local file, which >>> can be enabled/disabled via environment variable OSAF_LOCAL_NODE_LOG >>> --- >>>src/base/logtrace.cc | 167 >>> ++- >>>src/base/logtrace.h | 50 +-- >>>src/mds/mds_log.cc | 114 +++ >>>3 files changed, 140 insertions(+), 191 deletions(-) >>> >>> diff --git a/src/base/logtrace.cc b/src/base/logtrace.cc index >>> b046fab..857e31c 100644 >>> --- a/src/base/logtrace.cc >>> +++ b/src/base/logtrace.cc >>> @@ -36,15 +36,10 @@ >>>#include >>>#include >>>#include >>> -#include "base/buffer.h" >>> -#include "base/conf.h" >>> -#include "base/log_message.h" >>> -#include "base/macros.h" >>> -#include "base/mutex.h" >>> +#include "base/getenv.h" >>>#include "base/ncsgl_defs.h" >>>#include "base/osaf_utility.h" >>>#include "base/time.h" >>> -#include "base/unix_client_socket.h" >>>#include "dtm/common/osaflog_protocol.h" >>> namespace global { >>> @@ -55,65 +50,38 @@ const char *const prefix_name[] = {"EM", "AL", >>> "CR", "ER", "WA", "NO", "IN", >>> "T6", "T7", "T8", ">>", "<<"}; >>>char *msg_id; >>>int logmask; >>> +const char* osaf_log_file = "osaf.log"; bool enable_osaf_log = >>> +false; >>> } // namespace global >>>-class TraceLog { >>> - public: >>> - static bool Init(); >>> - static void Log(base::LogMessage::Severity severity, const char >>> *fmt, >>> - va_list ap); >>> - >>> - private: >>> - TraceLog(const std::string , const std::string _name, >>> - uint32_t proc_id, const std::string _id, >>> - const std::string _name); >>> - void LogInternal(base::LogMessage::Severity severity, const char >>> *fmt, >>> - va_list ap); >>> - static constexpr const uint32_t kMaxSequenceId = >>> uint32_t{0x7fff}; >>> - static TraceLog *instance_; >>> - const base::LogMessage::HostName fqdn_; >>> - const base::LogMessage::AppName app_name_; >>> - const base::LogMessage::ProcId proc_id_; >>> - const base::LogMessage::MsgId msg_id_; >>> - uint32_t sequence_id_; >>> - base::UnixClientSocket log_socket_; >>> - 
base::Buffer<512> buffer_; >>> - base::Mutex mutex_; >>> - >>> - DELETE_COPY_AND_MOVE_OPERATORS(TraceLog); >>> -}; >>> - >>> -TraceLog *
Re: [devel] [PATCH 1/2] base: Add support to direct OpenSAF logging to local node file [#2306]
Hi Minh, See my comments below. /Thanks HansN -Original Message- From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] Sent: 18 April 2018 15:20 To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell <anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1/2] base: Add support to direct OpenSAF logging to local node file [#2306] Hi Hans, Please check my response with [Minh] Thanks Minh On 18/04/18 22:40, Hans Nordeback wrote: > Hi Minh, > > ack, code review only. Some comments below. > > /Thanks HansN > > > On 04/12/2018 01:12 AM, Minh Chau wrote: >> Unify TraceLog and MdsLog class to one class (TraceLog) so it can be >> used as a common log client. >> Add new instance TraceLog for OpenSAF logging to local file, which >> can be enabled/disabled via environment variable OSAF_LOCAL_NODE_LOG >> --- >> src/base/logtrace.cc | 167 >> ++- >> src/base/logtrace.h | 50 +-- >> src/mds/mds_log.cc | 114 +++ >> 3 files changed, 140 insertions(+), 191 deletions(-) >> >> diff --git a/src/base/logtrace.cc b/src/base/logtrace.cc index >> b046fab..857e31c 100644 >> --- a/src/base/logtrace.cc >> +++ b/src/base/logtrace.cc >> @@ -36,15 +36,10 @@ >> #include >> #include >> #include >> -#include "base/buffer.h" >> -#include "base/conf.h" >> -#include "base/log_message.h" >> -#include "base/macros.h" >> -#include "base/mutex.h" >> +#include "base/getenv.h" >> #include "base/ncsgl_defs.h" >> #include "base/osaf_utility.h" >> #include "base/time.h" >> -#include "base/unix_client_socket.h" >> #include "dtm/common/osaflog_protocol.h" >> namespace global { >> @@ -55,65 +50,38 @@ const char *const prefix_name[] = {"EM", "AL", >> "CR", "ER", "WA", "NO", "IN", >> "T6", "T7", "T8", ">>", "<<"}; >> char *msg_id; >> int logmask; >> +const char* osaf_log_file = "osaf.log"; bool enable_osaf_log = >> +false; >> } // namespace global >> -class TraceLog { >> - public: >> - static bool Init(); >> - static void
Log(base::LogMessage::Severity severity, const char >> *fmt, >> - va_list ap); >> - >> - private: >> - TraceLog(const std::string , const std::string _name, >> - uint32_t proc_id, const std::string _id, >> - const std::string _name); >> - void LogInternal(base::LogMessage::Severity severity, const char >> *fmt, >> - va_list ap); >> - static constexpr const uint32_t kMaxSequenceId = >> uint32_t{0x7fff}; >> - static TraceLog *instance_; >> - const base::LogMessage::HostName fqdn_; >> - const base::LogMessage::AppName app_name_; >> - const base::LogMessage::ProcId proc_id_; >> - const base::LogMessage::MsgId msg_id_; >> - uint32_t sequence_id_; >> - base::UnixClientSocket log_socket_; >> - base::Buffer<512> buffer_; >> - base::Mutex mutex_; >> - >> - DELETE_COPY_AND_MOVE_OPERATORS(TraceLog); >> -}; >> - >> -TraceLog *TraceLog::instance_ = nullptr; >> - >> -TraceLog::TraceLog(const std::string , const std::string >> _name, >> - uint32_t proc_id, const std::string _id, >> - const std::string _name) >> - : fqdn_{base::LogMessage::HostName{fqdn}}, >> - app_name_{base::LogMessage::AppName{app_name}}, >> - proc_id_{base::LogMessage::ProcId{std::to_string(proc_id)}}, >> - msg_id_{base::LogMessage::MsgId{msg_id}}, >> - sequence_id_{1}, >> - log_socket_{socket_name, base::UnixSocket::kBlocking}, >> - buffer_{}, >> - mutex_{} {} >> - >> -bool TraceLog::Init() { >> - if (instance_ != nullptr) return false; >> - char app_name[49]; >> - char pid_path[1024]; > [HansN] Instead of static globals, use unnamed namespaces. Also > try to avoid globals. Why change Log and Init from static members? (An > alternative is to use a singleton instead, if needed.) [Minh] I changed Log() and Init() to non-static because I am not using a singleton any more. Before this ticket, we had 2 singletons, one for TraceLog and one for MdsLog,
Re: [devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]
Hi Gary, in general I think const member functions and logical constness are preferable (mutable can be used if needed). /Regards HansN -Original Message- From: Gary Lee [mailto:gary@dektech.com.au] Sent: 13 April 2018 09:45 To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell <anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795] Hi Hans, yes, they could be declared const member functions, as they generally don't change anything in the object; the changes are actually in the KV store. But I guess we could potentially mislead callers about the intentions of the functions. What do you think? /Gary On 13/04/18 16:16, Hans Nordebäck wrote: > Hi, > > > > On 04/12/2018 04:15 PM, Gary Lee wrote: >> Hi >> >> >> On 12/04/18 23:34, Anders Widell wrote: >>> Ack with comments: >>> >>> * There is no need to use "const" when passing function arguments by >>> value. E.g. the argument "const uint64_t cluster_size" should be >>> "uint64_t cluster_size". >>> >> >> [GL] Sure, but it doesn't do any harm, and would stop accidental >> assignments (that would be lost anyway). > [HansN] Perhaps these functions should be const member functions? E.g. > SaAisErrorT PromoteThisNode(bool graceful_takeover, uint64_t > cluster_size) const; >> >>> * You assume that all nodes in the cluster have synchronized clocks >>> (probably using NTP). Would it be possible to use an expiration time >>> for the etcd key instead of writing a time stamp in the value, so >>> that etcd automatically deletes the takeover request when it >>> expires? That way we would not require synchronized clocks. >>> >> >> [GL] Good idea. I did question why I hadn't used TTL/lease once I had >> finished the ticket. :-) Will see what I can do!
>> >>> regards, >>> Anders Widell >>> >>> On 04/11/2018 09:35 AM, Gary Lee wrote: >>>> Summary: split-brain: select active SC from largest network >>>> partition V3 [#2795] Review request for Ticket(s): 2795 Peer >>>> Reviewer(s): Anders, Ravi, Hans Pull request to: *** LIST THE >>>> PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop >>>> Development branch: ticket-2795 Base revision: >>>> 1c302a300e449e8a8527671fbd6c7f4e2b41e95d >>>> Personal repository: git://git.code.sf.net/u/userid-2226215/review >>>> >>>> >>>> Impacted area Impact y/n >>>> >>>> Docs n >>>> Build system n >>>> RPM/packaging n >>>> Configuration files n >>>> Startup scripts n >>>> SAF services n >>>> OpenSAF services y >>>> Core libraries y >>>> Samples n >>>> Tests n >>>> Other n >>>> >>>> >>>> Comments (indicate scope for each "y" above): >>>> - >>>> >>>> *** Changes from V2: *** >>>> >>>> fmd: made cluster_size atomic >>>> fmd: wait 3 seconds before promoting to active, to allow topology >>>> events to be processed first >>>> osaf: add check for existing takeover request, before trying to >>>> lock >>>> etcdv3 plugin: reliability improvements >>>> >>>> >>>> revision c7bc78656d5de11f6147727bd8612274fb6e438f >>>> Author: Gary Lee <gary@dektech.com.au> >>>> Date: Wed, 11 Apr 2018 17:16:46 +1000 >>>> >>>> rded: adapt to new Consensus API [#2795] >>>> >>>> - add 3 new internal messages: >>>> >>>> RDE_MSG_NODE_UP >>>> RDE_MSG_NODE_DOWN >>>> RDE_MSG_TAKEOVER_REQUEST_CALLBACK >>>> >>>> - subscribe to AMFND service up events to keep track of the number >>>> of cluster members >>>> >>>> - listen for takeover requests in KV store >>>> >>>> >>>> >>>> revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393 >>>> Author: Gary Lee <gary@dektech.com.au> >>>> Date: Wed
Re: [devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]
Hi Ravi, stonith is not only valid for virtualized environments; I assume stonith also supports other mechanisms, e.g. IPMI, in a legacy environment. The probability of "flickering" may be higher in a virtualized environment, but for redundancy there should be two interfaces configured, which is the normal configuration in legacy deployments. If the problem in this ticket is solved by using stonith, I don't see a need for adding this patch. BTW, does this patch work when stonith is enabled? /Regards HansN On 04/13/2018 10:59 AM, Ravi Sekhar Reddy Konda wrote: Hi Hans, the use case we are addressing here is link flickering when remote fencing is not enabled. Also, remote fencing using stonith is valid only in virtualized environments. I have not tested with stonith enabled, as the use case is the one where remote fencing is disabled. Thanks, Ravi From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com] Sent: Friday, April 13, 2018 1:10 AM To: ravi-sekhar <ravisekhar.ko...@oracle.com>; Anders Widell <anders.wid...@ericsson.com> Cc: opensaf-devel@lists.sourceforge.net Subject: SV: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833] Hi Ravi, I think stonith, implemented in ticket #1859, handles this case. This "flickering" was one of the (manual) tests verifying the added stonith support. It is important to have a separate interface for stonith to be able to perform the remote fencing, similar to using a backplane. Have you tested with stonith enabled?
/Regards HansN From: ravi-sekhar <ravisekhar.ko...@oracle.com> Sent: 12 April 2018 15:29:13 To: Hans Nordebäck; Anders Widell Cc: opensaf-devel@lists.sourceforge.net; ravi-sekhar Subject: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833] --- scripts/opensaf_reboot | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot index df65c26..b219c39 100644 --- a/scripts/opensaf_reboot +++ b/scripts/opensaf_reboot @@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH if [ -f "$pkgsysconfdir/fmd.conf" ]; then . "$pkgsysconfdir/fmd.conf" fi +if [ -f "$pkgsysconfdir/nid.conf" ]; then + . "$pkgsysconfdir/nid.conf" +fi NODE_ID_FILE=$pkglocalstatedir/node_id @@ -118,7 +121,17 @@ else # uncomment the following line if debugging errors that keep restarting the node # exit 0 + # If the application is using different interface for cluster communication, please + # add your application specific isolation commands here + logger -t "opensaf_reboot" "Rebooting local node; timeout=$OPENSAF_REBOOT_TIMEOUT" + + # Isolate the node + if [ "$MDS_TRANSPORT" = "TIPC" ]; then + tipc-config -bd eth:$TIPC_ETH_IF + else + $icmd pkill -STOP osafdtmd + fi # Start a reboot supervision background process. Note that a similar # supervision is also done in the opensaf_reboot() function in LEAP. @@ -128,12 +141,6 @@ else (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > "/proc/sysrq-trigger") & fi - # Stop some important opensaf processes to prevent bad things from happening - $icmd pkill -STOP osafamfwd - $icmd pkill -STOP osafamfnd - $icmd pkill -STOP osafamfd - $icmd pkill -STOP osaffmd - # Flush OpenSAF internal log server messages to disk. $bindir/osaflog --flush -- 1.9.1
Re: [devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]
Hi, On 04/12/2018 04:15 PM, Gary Lee wrote: Hi On 12/04/18 23:34, Anders Widell wrote: Ack with comments: * There is no need to use "const" when passing function arguments by value. E.g. the argument "const uint64_t cluster_size" should be "uint64_t cluster_size". [GL] Sure, but it doesn't do any harm, and would stop accidental assignments (that would be lost anyway). [HansN] Perhaps these functions should be const member functions? E.g. SaAisErrorT PromoteThisNode(bool graceful_takeover, uint64_t cluster_size) const; * You assume that all nodes in the cluster have synchronized clocks (probably using NTP). Would it be possible to use an expiration time for the etcd key instead of writing a time stamp in the value, so that etcd automatically deletes the takeover request when it expires? That way we would not require synchronized clocks. [GL] Good idea. I did question why I hadn't used TTL/lease once I had finished the ticket. :-) Will see what I can do! regards, Anders Widell On 04/11/2018 09:35 AM, Gary Lee wrote: Summary: split-brain: select active SC from largest network partition V3 [#2795] Review request for Ticket(s): 2795 Peer Reviewer(s): Anders, Ravi, Hans Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2795 Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d Personal repository: git://git.code.sf.net/u/userid-2226215/review Impacted area Impact y/n Docs n Build system n RPM/packaging n Configuration files n Startup scripts n SAF services n OpenSAF services y Core libraries y Samples n Tests n Other n Comments (indicate scope for each "y" above): - *** Changes from V2: *** fmd: made cluster_size atomic fmd: wait 3 seconds before promoting to active, to allow topology events to be processed first osaf: add check for existing takeover request, before trying to lock etcdv3 plugin: reliability improvements revision c7bc78656d5de11f6147727bd8612274fb6e438f Author: Gary Lee Date: Wed, 11 Apr
2018 17:16:46 +1000 rded: adapt to new Consensus API [#2795] - add 3 new internal messages: RDE_MSG_NODE_UP RDE_MSG_NODE_DOWN RDE_MSG_TAKEOVER_REQUEST_CALLBACK - subscribe to AMFND service up events to keep track of the number of cluster members - listen for takeover requests in KV store revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393 Author: Gary Lee Date: Wed, 11 Apr 2018 17:16:18 +1000 fmd: adapt to new Consensus API [#2795] revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550 Author: Gary Lee Date: Wed, 11 Apr 2018 17:15:41 +1000 amfd: adapt to new Consensus API [#2795] revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1 Author: Gary Lee Date: Wed, 11 Apr 2018 17:14:39 +1000 osaf: add lock takeover request function [#2795] - add create and set (if previous value matches) functions to KeyValue class - add Consensus::MonitorTakeoverRequest() function for use by RDE to answer takeover requests - add Consensus::CreateTakeoverRequest() - before an SC is promoted to active, it will create a takeover request in the KV store.
An existing SC can reject the lock takeover revision 955be872ba5887b1b521eac9f7732dd3f6afc593 Author: Gary Lee Date: Wed, 11 Apr 2018 17:13:45 +1000 osaf: extend API to include a create key and an enhanced set key function [#2795] - add create_key function (fails if key already exists) - add setkey_match_prev function (set value if previous value matches) - add missing quotes - add etcd3.plugin Added Files: src/osaf/consensus/plugins/etcd3.plugin Complete diffstat: -- src/amf/amfd/role.cc | 2 +- src/fm/fmd/fm_cb.h | 2 +- src/fm/fmd/fm_main.cc | 26 +- src/fm/fmd/fm_mds.cc | 2 + src/fm/fmd/fm_rda.cc | 27 +- src/osaf/consensus/consensus.cc | 435 ++- src/osaf/consensus/consensus.h | 55 +++- src/osaf/consensus/key_value.cc | 105 +--- src/osaf/consensus/key_value.h | 19 +- src/osaf/consensus/plugins/etcd.plugin | 86 +- src/osaf/consensus/plugins/etcd3.plugin | 366 ++ src/osaf/consensus/plugins/sample.plugin | 67 - src/rde/rded/rde_cb.h | 12 +- src/rde/rded/rde_main.cc | 75 -- src/rde/rded/rde_mds.cc | 39 ++- src/rde/rded/rde_rda.cc | 2 +- src/rde/rded/role.cc
Re: [devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]
Hi Ravi, I think stonith, implemented in ticket #1859, handles this case. This "flickering" was one of the (manual) tests verifying the added stonith support. It is important to have a separate interface for stonith to be able to perform the remote fencing, similar to using a backplane. Have you tested with stonith enabled? /Regards HansN From: ravi-sekhar <ravisekhar.ko...@oracle.com> Sent: 12 April 2018 15:29:13 To: Hans Nordebäck; Anders Widell Cc: opensaf-devel@lists.sourceforge.net; ravi-sekhar Subject: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833] --- scripts/opensaf_reboot | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot index df65c26..b219c39 100644 --- a/scripts/opensaf_reboot +++ b/scripts/opensaf_reboot @@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH if [ -f "$pkgsysconfdir/fmd.conf" ]; then . "$pkgsysconfdir/fmd.conf" fi +if [ -f "$pkgsysconfdir/nid.conf" ]; then + . "$pkgsysconfdir/nid.conf" +fi NODE_ID_FILE=$pkglocalstatedir/node_id @@ -118,7 +121,17 @@ else # uncomment the following line if debugging errors that keep restarting the node # exit 0 +# If the application is using different interface for cluster communication, please +# add your application specific isolation commands here + logger -t "opensaf_reboot" "Rebooting local node; timeout=$OPENSAF_REBOOT_TIMEOUT" + +# Isolate the node +if [ "$MDS_TRANSPORT" = "TIPC" ]; then + tipc-config -bd eth:$TIPC_ETH_IF +else + $icmd pkill -STOP osafdtmd +fi # Start a reboot supervision background process. Note that a similar # supervision is also done in the opensaf_reboot() function in LEAP.
@@ -128,12 +141,6 @@ else (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > "/proc/sysrq-trigger") & fi - # Stop some important opensaf processes to prevent bad things from happening - $icmd pkill -STOP osafamfwd - $icmd pkill -STOP osafamfnd - $icmd pkill -STOP osafamfd - $icmd pkill -STOP osaffmd - # Flush OpenSAF internal log server messages to disk. $bindir/osaflog --flush -- 1.9.1
Re: [devel] [PATCH 1/1] imm: fix memory leaked in immnd [#2825]
ack, review only. /Thanks HansN On 04/05/2018 04:39 AM, Vu Minh Nguyen wrote: The allocated memory is not freed before returning from the function ImmModel::setCcbErrorString(). --- src/imm/immnd/ImmModel.cc | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc index f7c8fc0..87ded27 100644 --- a/src/imm/immnd/ImmModel.cc +++ b/src/imm/immnd/ImmModel.cc @@ -10910,7 +10910,6 @@ SaAisErrorT ImmModel::deleteObject(ObjectMap::iterator& oi, SaUint32T reqConn, void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, va_list vl) { int errLen = strlen(errorString) + 1; - char* fmtError = (char*)malloc(errLen); int len; va_list args; int isValidationErrString = 0; @@ -10921,6 +10920,9 @@ void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, return; } + char* fmtError = (char*)malloc(errLen); + osafassert(fmtError); + va_copy(args, vl); len = vsnprintf(fmtError, errLen, errorString, args); va_end(args); @@ -10930,7 +10932,8 @@ void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, if (len > errLen) { char* newFmtError = (char*)realloc(fmtError, len); if (newFmtError == nullptr) { - TRACE_5("realloc error ,No memory "); + TRACE_5("realloc error, no memory"); + free(fmtError); return; } else { fmtError = newFmtError;
Re: [devel] [PATCH 1/1] build: Add support for google gmock framework V2 [#2823]
Hi Anders, I'll remove the turtle example. Perhaps we also can add a git pull if the repo is older than some date? /Thanks HansN On 04/09/2018 10:09 AM, Anders Widell wrote: Ack with comments: 1) You must replace the turtle example with something that you have written yourself, to avoid potential license problems. 2) One inline comment below, marked AndersW> regards, Anders Widell On 04/03/2018 01:05 PM, Hans Nordeback wrote: --- 00-README.unittest | 24 ++-- src/ais/Makefile.am | 5 - src/amf/Makefile.am | 7 +-- src/base/Makefile.am | 21 + src/base/tests/mock_turtle.cc | 20 src/base/tests/mock_turtle.h | 18 ++ src/base/tests/turtle.h | 17 + src/dtm/Makefile.am | 5 - src/experimental/immcpp/api/Makefile.am | 5 - src/log/Makefile.am | 5 - test.sh | 12 +--- 11 files changed, 116 insertions(+), 23 deletions(-) create mode 100644 src/base/tests/mock_turtle.cc create mode 100644 src/base/tests/mock_turtle.h create mode 100644 src/base/tests/turtle.h diff --git a/00-README.unittest b/00-README.unittest index 79e4b4b41..f297bccd0 100644 --- a/00-README.unittest +++ b/00-README.unittest @@ -1,22 +1,25 @@ -Support for using google unit test in openSAF. Using unit test during e.g. refactoring +Support for using google unit test and google mock in openSAF. Using unit test and mocking during e.g. refactoring to identify units and make code unit testable should improve the overall code quality and robustness. 
Regarding google unit test, see: https://code.google.com/p/googletest/ To get and install google test do the following: -wget https://googletest.googlecode.com/files/gtest-1.7.0.zip -unzip gtest-1.7.0.zip -cd gtest-1.7.0 -./configure -make -export GTEST_DIR=`pwd` +git clone https://github.com/google/googletest.git +cd googletest + +autoreconf -vi +./configure --with-pthreads +make -j 4 + +export GTEST_DIR=`pwd`/googletest +export GMOCK_DIR=`pwd`/googlemock configure openSAF as usual, for example: ./bootstrap.ch ./configure CFLAGS="-DRUNASROOT -O2" CXXFLAGS="-DRUNASROOT -O2" --enable-tipc -make -j +make -j 4 To build and run the unit tests make check @@ -40,8 +43,9 @@ services/saf/amf/ └── config The test code to have the following naming convention as below: -tests will be in file test_.cc, where is the name of the unit test case, -e.g test_amfdb.cc. No need to call the RUN_ALL_TESTS() macro, it is included in gtest_main +tests will be in file _test.cc, where is the name of the unit test case, +mocks will be in file mock_.cc, where is the name of the mock. +No need to call the RUN_ALL_TESTS() macro, it is included in gtest_main and gmock_main and are automatically linked with the unit test cases. 
diff --git a/src/ais/Makefile.am b/src/ais/Makefile.am index 1af75a0f4..2ef34b219 100644 --- a/src/ais/Makefile.am +++ b/src/ais/Makefile.am @@ -101,7 +101,8 @@ bin_testlib_CXXFLAGS = \ bin_testlib_CPPFLAGS = \ $(AM_CPPFLAGS) \ - -I$(GTEST_DIR)/include + -I$(GTEST_DIR)/include \ + -I$(GMOCK_DIR)/include bin_testlib_LDFLAGS = \ $(AM_LDFLAGS) @@ -112,4 +113,6 @@ bin_testlib_SOURCES = \ bin_testlib_LDADD = \ $(GTEST_DIR)/lib/libgtest.la \ $(GTEST_DIR)/lib/libgtest_main.la \ + $(GMOCK_DIR)/lib/libgmock.la \ + $(GMOCK_DIR)/lib/libgmock_main.la \ lib/libopensaf_core.la diff --git a/src/amf/Makefile.am b/src/amf/Makefile.am index 25261fded..413571a52 100644 --- a/src/amf/Makefile.am +++ b/src/amf/Makefile.am @@ -194,7 +194,8 @@ bin_testamfd_CXXFLAGS =$(AM_CXXFLAGS) bin_testamfd_CPPFLAGS = \ -DSA_CLM_B01=1 -DSA_EXTENDED_NAME_SOURCE \ $(AM_CPPFLAGS) \ - -I$(GTEST_DIR)/include + -I$(GTEST_DIR)/include \ + -I$(GMOCK_DIR)/include bin_testamfd_LDFLAGS = \ $(AM_LDFLAGS) \ @@ -264,7 +265,9 @@ bin_testamfd_LDADD = \ lib/libSaNtf.la \ lib/libopensaf_core.la \ $(GTEST_DIR)/lib/libgtest.la \ - $(GTEST_DIR)/lib/libgtest_main.la + $(GTEST_DIR)/lib/libgtest_main.la \ + $(GMOCK_DIR)/lib/libgmock.la \ + $(GMOCK_DIR)/lib/libgmock_main.la bin_amfpm_CPPFLAGS = \ -DSA_EXTENDED_NAME_SOURCE \ diff --git a/src/base/Makefile.am b/src/base/Makefile.am index bb13d6c43..a7316ceb7 100644 --- a/src/base/Makefile.am +++ b/src/base/Makefile.am @@ -150,6 +150,8 @@ noinst_HEADERS += \ src/base/tests/mock_osaf_abort.h \ src/base/tests/mock_osafassert.h \ src/base/tests/mock_syslog.h \ + src/base/tests/mock_turtle.h \ + src/base/tests/turtle.h \ src/base/time.h \ src/base/unix_client_socket.h \ src/base/unix_server_socket.h \ @@ -163,7 +165,8 @@
Re: [devel] [PATCH 1/1] base: Check return code from unlink in nid_create_ipc [#2829]
Hi Anders, yes, I'll change it according to your suggestion. /Thanks HansN On 04/06/2018 04:10 PM, Anders Widell wrote: Ack with minor comment: instead of calling access(), you could maybe simply check for the ENOENT errno value from unlink()? regards, Anders Widell On 04/05/2018 11:53 AM, Hans Nordeback wrote: --- src/nid/agent/nid_ipc.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/src/nid/agent/nid_ipc.c b/src/nid/agent/nid_ipc.c index 4f43cd309..1a77fd8e2 100644 --- a/src/nid/agent/nid_ipc.c +++ b/src/nid/agent/nid_ipc.c @@ -28,6 +28,7 @@ #include #include +#include #include "osaf/configmake.h" #include "nid/agent/nid_api.h" @@ -56,7 +57,13 @@ uint32_t nid_create_ipc(char *strbuf) mode_t mask; /* Lets Remove any such file if it already exists */ - unlink(NID_FIFO); + if (access(NID_FIFO, F_OK ) != -1 ) { + if (unlink(NID_FIFO) < 0) { + sprintf(strbuf, " FAILURE: Unable To Delete FIFO Error: %s\n", + strerror(errno)); + return NCSCC_RC_FAILURE; + } + } mask = umask(0);
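Anders' suggestion folds the existence check into unlink() itself: call it unconditionally and treat ENOENT as "nothing to remove". This also avoids the window between access() and unlink() in which the file can change. A minimal sketch; RemoveIfExists is a hypothetical name, and it uses snprintf() with an explicit buffer length instead of the patch's sprintf().

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <unistd.h>

// Hypothetical helper: removes path, treating "does not exist" (ENOENT) as
// success. On any other error a message is written to buf and false is
// returned, mirroring the FAILURE text used in nid_create_ipc().
bool RemoveIfExists(const char* path, char* buf, std::size_t buf_len) {
  if (unlink(path) == 0 || errno == ENOENT) return true;
  std::snprintf(buf, buf_len, " FAILURE: Unable To Delete FIFO Error: %s\n",
                std::strerror(errno));
  return false;
}
```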
Re: [devel] [PATCH 1/1] amfnd: unlock before releasing the monitoring thread to avoid deadlock [#2818]
Hi Ravi, ack, review only. (I agree, NCS_TASK_RELEASE should not be called with the mutex held, since pthread_mutex_lock is not a cancellation point.) /Regards HansN On 03/29/2018 07:59 AM, ravi-sekhar wrote: --- src/amf/amfnd/mon.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/amf/amfnd/mon.cc b/src/amf/amfnd/mon.cc index 9cdfc37..4932d50 100644 --- a/src/amf/amfnd/mon.cc +++ b/src/amf/amfnd/mon.cc @@ -161,6 +161,8 @@ uint32_t avnd_mon_req_del(AVND_CB *cb, SaUint64T pid) { mon_rec = (AVND_MON_REQ *)m_NCS_DBLIST_FIND_FIRST(pid_mon_list); + m_NCS_UNLOCK(&cb->mon_lock, NCS_LOCK_WRITE); + /* No more PIDs exists in the pid_mon_list for monitoring */ if (!mon_rec) { /* destroy the task */ @@ -173,8 +175,6 @@ uint32_t avnd_mon_req_del(AVND_CB *cb, SaUint64T pid) { } } - m_NCS_UNLOCK(&cb->mon_lock, NCS_LOCK_WRITE); - return rc; }
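The reasoning in this ack applies to any "decide under the lock, stop the thread outside it" pattern: the thread being released may itself be waiting for the same mutex, so stopping it while holding that mutex can deadlock. A minimal C++ sketch of that ordering. Monitor is hypothetical, and the sketch uses a cooperative stop flag plus std::thread::join() in place of thread cancellation.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <list>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the monitoring state: a pid list guarded by a
// mutex, and a worker thread that takes the same mutex on every iteration.
struct Monitor {
  std::mutex lock;
  std::list<int> pids;
  std::atomic<bool> stop{false};
  std::thread worker;

  void Start() {
    worker = std::thread([this] {
      while (!stop.load()) {
        {
          std::lock_guard<std::mutex> guard(lock);  // worker needs the lock
          // ... poll the monitored pids ...
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }
    });
  }

  // Removes pid and stops the worker once the list is empty. The decision is
  // made under the lock, but the lock is released before join(): joining
  // while holding it could block forever on a worker waiting for the lock.
  void Remove(int pid) {
    bool now_empty;
    {
      std::lock_guard<std::mutex> guard(lock);
      pids.remove(pid);
      now_empty = pids.empty();
    }
    if (now_empty && worker.joinable()) {
      stop.store(true);
      worker.join();
    }
  }

  ~Monitor() {
    stop.store(true);
    if (worker.joinable()) worker.join();
  }
};
```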
Re: [devel] [PATCH 1/1] imm: fix memory leaked in immnd [#2825]
Hi Zoran, yes, you are right. Hm, I didn't check the whole function... /BR Hans On 04/04/2018 09:57 AM, Zoran Milinkovic wrote: Hi Hans, Variable arguments with vsnprintf will work. The size of the new buffer is resized in if (len > errLen) { ... osafassert(vsnprintf(fmtError, len, errorString, vl) >= 0); } when there are variable arguments. BR, Zoran -Original Message- From: Hans Nordebäck Sent: 4 April 2018 08:25 To: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>; Anders Widell <anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com; Zoran Milinkovic <zoran.milinko...@ericsson.com>; Lennart Lund <lennart.l...@ericsson.com> Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1/1] imm: fix memory leaked in immnd [#2825] Hi Vu, not saying that you should change it now, but an alternative could be: instead of: char* fmtError = (char*) malloc(errLen); osafassert(fmtError); a std::vector can be used (or a std::array if the size is fixed): std::vector<char> fmtError(errLen, 0); : len = vsnprintf(fmtError.data(), errLen, errorString, args); : fmtError.resize(len); : errStr->name.buf = strdup(fmtError.data()); : Another thing I noticed at the beginning of this function: void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, va_list vl) { int errLen = strlen(errorString) + 1; does not include the length of the variable arguments; vsnprintf will work but the resulting string may be truncated. /Regards HansN On 04/04/2018 05:02 AM, Vu Minh Nguyen wrote: Hi Hans, Anders, Please see my responses inline, with [Vu]. P.S.: Please ignore my previous email. I pressed the wrong keys...
Regards, Vu -Original Message- From: Anders Widell [mailto:anders.wid...@ericsson.com] Sent: Tuesday, April 3, 2018 7:07 PM To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>; ravisekhar.ko...@oracle.com; zoran.milinko...@ericsson.com; lennart.l...@ericsson.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1/1] imm: fix memory leaked in immnd [#2825] Ack with comments. There is actually a second memory leak further down in this function: char* newFmtError = (char*)realloc(fmtError, len); if (newFmtError == nullptr) { TRACE_5("realloc error ,No memory "); return; } else { When realloc returns nullptr, the original memory is left untouched (not deallocated). Thus, you need a free(fmtError) before return in the code above. [Vu] Thanks. I will fix this. I agree with Hans that it would be better to use some RAII construction instead, so that you don't need to free() before each return - it is easy to forget. Maybe simply use std::string and resize() it to emulate malloc/realloc? You don't have to do it now but think about it as an improvement. [Vu] The ownership of the allocated memory is later moved to the `global` variable `ccb`. You can see it in the following code lines: if (strstr(errStr->name.buf, IMM_RESOURCE_ABORT) == errStr->name.buf) { free(errStr->name.buf); errStr->name.buf = fmtError; errStr->name.size = len; return; } Or in the other case: else { (*errStrTail) = (ImmsvAttrNameList*)malloc(sizeof(ImmsvAttrNameList)); (*errStrTail)->next = NULL; (*errStrTail)->name.size = len; (*errStrTail)->name.buf = fmtError; } As IMMND mixes C and C++ code, the `CcbInfo ccb` can be used in C code, which deallocates the memory using free(); therefore I keep using malloc() to avoid mixing new/free() or malloc()/delete(). regards, Anders Widell On 04/03/2018 01:42 PM, Hans Nordebäck wrote: Hi Vu, few minor comments below.
/Thanks HansN On 04/03/2018 11:43 AM, Vu Minh Nguyen wrote: The allocated memory is not freed before returning from the function ImmModel::setCcbErrorString(). --- src/imm/immnd/ImmModel.cc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc index f7c8fc0..e01ff8c 100644 --- a/src/imm/immnd/ImmModel.cc +++ b/src/imm/immnd/ImmModel.cc @@ -10910,7 +10910,6 @@ SaAisErrorT ImmModel::deleteObject(ObjectMap::iterator& oi, SaUint32T reqConn, void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, va_list vl) { int errLen = strlen(errorString) + 1; - char* fmtError = (char*)malloc(errLen); int len; va_list args; int isValidationErrString = 0; @@ -10921,6 +10920,9 @@ void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, return; } + char* fmtError = (char*)malloc(errLen); + osafassert(fmtError); [HansN] in c++ new should be used instead of malloc. There is no need to check return value of new if "std::set_new_handler(new_handler)" has bee
Re: [devel] [PATCH 1/1] imm: fix memory leaked in immnd [#2825]
Hi Vu, not saying that you should change it now, but an alternative could be: instead of: char* fmtError = (char*) malloc(errLen); osafassert(fmtError); a std::vector can be used (or a std::array if the size is fixed): std::vector<char> fmtError(errLen, 0); : len = vsnprintf(fmtError.data(), errLen, errorString, args); : fmtError.resize(len); : errStr->name.buf = strdup(fmtError.data()); : Another thing I noticed at the beginning of this function: void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, va_list vl) { int errLen = strlen(errorString) + 1; does not include the length of the variable arguments; vsnprintf will work but the resulting string may be truncated. /Regards HansN On 04/04/2018 05:02 AM, Vu Minh Nguyen wrote: Hi Hans, Anders, Please see my responses inline, with [Vu]. P.S.: Please ignore my previous email. I pressed the wrong keys... Regards, Vu -Original Message- From: Anders Widell [mailto:anders.wid...@ericsson.com] Sent: Tuesday, April 3, 2018 7:07 PM To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>; ravisekhar.ko...@oracle.com; zoran.milinko...@ericsson.com; lennart.l...@ericsson.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1/1] imm: fix memory leaked in immnd [#2825] Ack with comments. There is actually a second memory leak further down in this function: char* newFmtError = (char*)realloc(fmtError, len); if (newFmtError == nullptr) { TRACE_5("realloc error ,No memory "); return; } else { When realloc returns nullptr, the original memory is left untouched (not deallocated). Thus, you need a free(fmtError) before return in the code above. [Vu] Thanks. I will fix this. I agree with Hans that it would be better to use some RAII construction instead, so that you don't need to free() before each return - it is easy to forget. Maybe simply use std::string and resize() it to emulate malloc/realloc? You don't have to do it now but think about it as an improvement.
[Vu] The ownership of the allocated memory is later moved to the `global` variable `ccb`. You can see it in the following code lines: if (strstr(errStr->name.buf, IMM_RESOURCE_ABORT) == errStr->name.buf) { free(errStr->name.buf); errStr->name.buf = fmtError; errStr->name.size = len; return; } Or in the other case: else { (*errStrTail) = (ImmsvAttrNameList*)malloc(sizeof(ImmsvAttrNameList)); (*errStrTail)->next = NULL; (*errStrTail)->name.size = len; (*errStrTail)->name.buf = fmtError; } As IMMND mixes C and C++ code, the `CcbInfo ccb` can be used in C code, which deallocates the memory using free(); therefore I keep using malloc() to avoid mixing new/free() or malloc()/delete(). regards, Anders Widell On 04/03/2018 01:42 PM, Hans Nordebäck wrote: Hi Vu, few minor comments below. /Thanks HansN On 04/03/2018 11:43 AM, Vu Minh Nguyen wrote: The allocated memory is not freed before returning from the function ImmModel::setCcbErrorString(). --- src/imm/immnd/ImmModel.cc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc index f7c8fc0..e01ff8c 100644 --- a/src/imm/immnd/ImmModel.cc +++ b/src/imm/immnd/ImmModel.cc @@ -10910,7 +10910,6 @@ SaAisErrorT ImmModel::deleteObject(ObjectMap::iterator& oi, SaUint32T reqConn, void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, va_list vl) { int errLen = strlen(errorString) + 1; - char* fmtError = (char*)malloc(errLen); int len; va_list args; int isValidationErrString = 0; @@ -10921,6 +10920,9 @@ void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, return; } + char* fmtError = (char*)malloc(errLen); + osafassert(fmtError); [HansN] in c++ new should be used instead of malloc. There is no need to check return value of new if "std::set_new_handler(new_handler)" has been called in advance, e.g. in the main function. (also fmtError is a local variable, it should be possible to use RAII and avoid explicit calls to delete).
[Vu] Please see my earlier responses and let me know your opinion. Thanks. /Vu + va_copy(args, vl); len = vsnprintf(fmtError, errLen, errorString, args); va_end(args);
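Two points from this thread can be combined: size the buffer from the formatted length rather than from strlen(errorString) + 1, and let RAII own the temporary so that no return path needs free(). A minimal C++ sketch under those assumptions; FormatToMallocString and Format are hypothetical helpers. The sketch calls vsnprintf() once with a null buffer to get the required length, which excludes the terminating NUL (hence the + 1). It then formats into a std::vector<char> and returns a strdup() copy, so that, as Vu requires, the result can still be released with free() from C code.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: formats errorString with vl into a malloc()-owned
// string that the caller (possibly C code) releases with free().
// Returns nullptr if formatting or allocation fails.
char* FormatToMallocString(const char* errorString, va_list vl) {
  va_list args;
  va_copy(args, vl);
  // Pass 1: a null buffer of size 0 only measures the full result length.
  int len = std::vsnprintf(nullptr, 0, errorString, args);
  va_end(args);
  if (len < 0) return nullptr;

  // Pass 2: format into a correctly sized, RAII-owned buffer.
  std::vector<char> buf(static_cast<std::size_t>(len) + 1);
  va_copy(args, vl);
  std::vsnprintf(buf.data(), buf.size(), errorString, args);
  va_end(args);
  return strdup(buf.data());  // malloc()-compatible ownership for the caller
}

// Variadic convenience wrapper, mainly for illustration.
char* Format(const char* fmt, ...) {
  va_list vl;
  va_start(vl, fmt);
  char* s = FormatToMallocString(fmt, vl);
  va_end(vl);
  return s;
}
```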
Re: [devel] [PATCH 1/1] imm: fix memory leaked in immnd [#2825]
Hi Vu, few minor comments below. /Thanks HansN On 04/03/2018 11:43 AM, Vu Minh Nguyen wrote: The allocated memory is not freed before returning from the function ImmModel::setCcbErrorString(). --- src/imm/immnd/ImmModel.cc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc index f7c8fc0..e01ff8c 100644 --- a/src/imm/immnd/ImmModel.cc +++ b/src/imm/immnd/ImmModel.cc @@ -10910,7 +10910,6 @@ SaAisErrorT ImmModel::deleteObject(ObjectMap::iterator& oi, SaUint32T reqConn, void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, va_list vl) { int errLen = strlen(errorString) + 1; - char* fmtError = (char*)malloc(errLen); int len; va_list args; int isValidationErrString = 0; @@ -10921,6 +10920,9 @@ void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, return; } + char* fmtError = (char*)malloc(errLen); + osafassert(fmtError); [HansN] in c++ new should be used instead of malloc. There is no need to check return value of new if "std::set_new_handler(new_handler)" has been called in advance, e.g. in the main function. (also fmtError is a local variable, it should be possible to use RAII and avoid explicit calls to delete). + va_copy(args, vl); len = vsnprintf(fmtError, errLen, errorString, args); va_end(args);
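On the new/set_new_handler remark: in standard C++ a plain new expression never returns nullptr. When an allocation fails, operator new calls the handler installed with std::set_new_handler(), if any, and otherwise throws std::bad_alloc. A minimal sketch of installing such a handler once (for example early in main()) together with an RAII buffer that needs no explicit delete. OutOfMemoryHandler, InstallNewHandler and MakeBuffer are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical handler: operator new calls it when an allocation fails.
// Logging and aborting here means call sites never see a failed allocation.
void OutOfMemoryHandler() {
  std::fputs("out of memory\n", stderr);
  std::abort();
}

// Intended to be called once at start-up, e.g. from main().
void InstallNewHandler() { std::set_new_handler(OutOfMemoryHandler); }

// RAII buffer: released automatically on every return path, no delete[].
// The trailing () value-initializes the chars to '\0'.
std::unique_ptr<char[]> MakeBuffer(std::size_t n) {
  return std::unique_ptr<char[]>(new char[n]());
}
```

Because the handler aborts, call sites need neither a null check nor a try/catch; whether aborting is the right policy for a given daemon is a separate design decision.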
[devel] Fwd: [PATCH 1/1] log: Handling of IMM OI BAD HANDLE in log server is incorrect [#2799]
Hi Lennart, I have one question below marked [HansN]. /Regards HansN From: Lennart Lund Sent: 29 March 2018 16:04 To: Vu Minh Nguyen; Canh Van Truong Cc: opensaf-devel@lists.sourceforge.net Subject: [devel] [PATCH 1/1] log: Handling of IMM OI BAD HANDLE in log server is incorrect [#2799] Recovery of OI handle shall be started in all places where BAD HANDLE can be returned. Creation of OI must be done in background thread. Ongoing creation must be possible to stop e.g if server is becoming standby --- src/log/Makefile.am | 3 + src/log/logd/lgs.h | 24 --- src/log/logd/lgs_amf.cc | 26 +-- src/log/logd/lgs_cb.h| 3 - src/log/logd/lgs_config.cc | 22 +- src/log/logd/lgs_config.h| 5 +- src/log/logd/lgs_evt.cc | 60 +++--- src/log/logd/lgs_imm.cc | 312 +++-- src/log/logd/lgs_imm.h | 53 + src/log/logd/lgs_main.cc | 116 --- src/log/logd/lgs_mbcsv_v2.cc | 7 +- src/log/logd/lgs_mbcsv_v3.cc | 7 +- src/log/logd/lgs_mbcsv_v5.cc | 3 +- src/log/logd/lgs_oi_admin.cc | 466 +++ src/log/logd/lgs_oi_admin.h | 105 ++ src/log/logd/lgs_recov.cc| 4 +- src/log/logd/lgs_stream.cc | 85 ++-- 17 files changed, 842 insertions(+), 459 deletions(-) create mode 100644 src/log/logd/lgs_imm.h create mode 100644 src/log/logd/lgs_oi_admin.cc create mode 100644 src/log/logd/lgs_oi_admin.h diff --git a/src/log/Makefile.am b/src/log/Makefile.am index 3d951eb5d..5d33d355b 100644 --- a/src/log/Makefile.am +++ b/src/log/Makefile.am @@ -79,6 +79,7 @@ noinst_HEADERS += \ src/log/logd/lgs_file.h \ src/log/logd/lgs_filehdl.h \ src/log/logd/lgs_fmt.h \ + src/log/logd/lgs_imm.h \ src/log/logd/lgs_imm_gcfg.h \ src/log/logd/lgs_mbcsv.h \ src/log/logd/lgs_mbcsv_v1.h \ @@ -86,6 +87,7 @@ noinst_HEADERS += \ src/log/logd/lgs_mbcsv_v3.h \ src/log/logd/lgs_mbcsv_v5.h \ src/log/logd/lgs_mbcsv_v6.h \ + src/log/logd/lgs_oi_admin.h \ src/log/logd/lgs_recov.h \ src/log/logd/lgs_stream.h \ src/log/logd/lgs_util.h \ @@ -139,6 +141,7 @@ bin_osaflogd_SOURCES = \ src/log/logd/lgs_mbcsv_v5.cc \ src/log/logd/lgs_mbcsv_v6.cc \
src/log/logd/lgs_mds.cc \ + src/log/logd/lgs_oi_admin.cc \ src/log/logd/lgs_recov.cc \ src/log/logd/lgs_stream.cc \ src/log/logd/lgs_util.cc \ diff --git a/src/log/logd/lgs.h b/src/log/logd/lgs.h index 18e6d9281..b1d773375 100644 --- a/src/log/logd/lgs.h +++ b/src/log/logd/lgs.h @@ -95,7 +95,6 @@ extern uint32_t mbox_msgs[NCS_IPC_PRIORITY_MAX]; extern bool mbox_full[NCS_IPC_PRIORITY_MAX]; extern uint32_t mbox_low[NCS_IPC_PRIORITY_MAX]; extern pthread_mutex_t lgs_mbox_init_mutex; -extern pthread_mutex_t lgs_OI_init_mutex; extern uint32_t initialize_for_assignment(lgs_cb_t *cb, SaAmfHAStateT ha_state); @@ -108,27 +107,4 @@ extern uint32_t lgs_mds_msg_send(lgs_cb_t *cb, lgsv_msg_t *msg, MDS_DEST *dest, MDS_SYNC_SND_CTXT *mds_ctxt, MDS_SEND_PRIORITY_TYPE prio); -extern SaAisErrorT lgs_imm_create_configStream(lgs_cb_t *cb); -extern void logRootDirectory_filemove(const std::string _logRootDirectory, - const std::string _logRootDirectory, - time_t *cur_time_in); -extern void logDataGroupname_fileown(const char *new_logDataGroupname); - -extern void lgs_imm_impl_reinit_nonblocking(lgs_cb_t *cb); -extern void lgs_imm_init_OI_handle(SaImmOiHandleT *immOiHandle, - SaSelectionObjectT *immSelectionObject); -extern void lgs_imm_impl_set(SaImmOiHandleT *immOiHandle, - SaSelectionObjectT *immSelectionObject); -extern SaAisErrorT lgs_imm_init_configStreams(lgs_cb_t *cb); - -// Functions for recovery handling -void lgs_cleanup_abandoned_streams(); -void lgs_delete_one_stream_object(const std::string _str); -void lgs_search_stream_objects(); -SaUint32T *lgs_get_scAbsenceAllowed_attr(SaUint32T *attr_val); -int lgs_get_streamobj_attr(SaImmAttrValuesT_2 ***attrib_out, - const std::string _name, - SaImmHandleT *immOmHandle); -int lgs_free_streamobj_attr(SaImmHandleT immHandle); - #endif // LOG_LOGD_LGS_H_ diff --git a/src/log/logd/lgs_amf.cc b/src/log/logd/lgs_amf.cc index 6fa044ff2..3f0de8c1c 100644 --- a/src/log/logd/lgs_amf.cc +++ b/src/log/logd/lgs_amf.cc @@ -23,6 +23,7 @@ #include 
"osaf/immutil/immutil.h" #include "log/logd/lgs.h" #include "log/logd/lgs_config.h" +#include "log/logd/lgs_oi_admin.h" static void close_all_files() { log_stream_t *stream; @@ -63,8 +64,7 @@ static SaAisErrorT amf_active_state_handler(lgs_cb_t *cb, goto done; } -
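The commit message asks for OI creation to run in a background thread that can be stopped, for example when the server becomes standby. A generic C++ sketch of such a stoppable retry loop follows. It assumes the initialization step can be wrapped in a function that returns true on success. BackgroundRetry and try_init are hypothetical and are not the API of lgs_oi_admin.cc, which is not shown in this excerpt.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch: retries an initialization step on a background thread
// until it succeeds or Stop() is called (e.g. when the node becomes standby).
class BackgroundRetry {
 public:
  explicit BackgroundRetry(std::function<bool()> try_init)
      : try_init_(std::move(try_init)) {}
  ~BackgroundRetry() { Stop(); }

  void Start(std::chrono::milliseconds interval) {
    thread_ = std::thread([this, interval] {
      std::unique_lock<std::mutex> lk(mutex_);
      while (!stop_requested_) {
        lk.unlock();
        bool ok = try_init_();  // may block, so run it without the mutex
        lk.lock();
        if (ok) {
          done_ = true;
          return;
        }
        // Wait for the next attempt; Stop() wakes the thread immediately.
        cv_.wait_for(lk, interval, [this] { return stop_requested_; });
      }
    });
  }

  // Requests cancellation and waits for the worker; safe to call twice.
  void Stop() {
    {
      std::lock_guard<std::mutex> lk(mutex_);
      stop_requested_ = true;
    }
    cv_.notify_all();
    if (thread_.joinable()) thread_.join();
  }

  bool done() {
    std::lock_guard<std::mutex> lk(mutex_);
    return done_;
  }

 private:
  std::function<bool()> try_init_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_requested_ = false;
  bool done_ = false;
  std::thread thread_;
};
```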
Re: [devel] [PATCH 1/1] mds: improve thread safety in mdstest - part 2 [#2746]
ack, review only. /Thanks HansN On 03/27/2018 10:34 AM, Hoa Le wrote: - Remove thread-local declaration of svc_to_mds_info --- src/mds/apitest/mdstipc_api.c | 7 ++- src/mds/apitest/mdstipc_conf.c | 43 +- 2 files changed, 27 insertions(+), 23 deletions(-) diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c index 669c770..5bfa7ef 100644 --- a/src/mds/apitest/mdstipc_api.c +++ b/src/mds/apitest/mdstipc_api.c @@ -33,7 +33,6 @@ static MDS_CLIENT_MSG_FORMAT_VER gl_set_msg_fmt_ver; MDS_SVC_ID svc_ids[3] = {2006, 2007, 2008}; -_Thread_local NCSMDS_INFO svc_to_mds_info; pthread_mutex_t safe_printf_mutex = PTHREAD_MUTEX_INITIALIZER; pthread_mutex_t gl_mutex = PTHREAD_MUTEX_INITIALIZER; @@ -3513,6 +3512,7 @@ void tet_just_send_tp_11() MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN}; gl_vdest_indx = 0; + NCSMDS_INFO svc_to_mds_info; char tmp[] = " Hi Receiver "; TET_MDS_MSG *mesg; mesg = (TET_MDS_MSG *)malloc(sizeof(TET_MDS_MSG)); @@ -8020,6 +8020,7 @@ void tet_direct_just_send_tp_9() { int FAIL = 0; MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN}; + NCSMDS_INFO svc_to_mds_info; char message[] = "Direct Message"; /*start up*/ @@ -8145,6 +8146,7 @@ void tet_direct_just_send_tp_11() { int FAIL = 0; MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN}; + NCSMDS_INFO svc_to_mds_info; char message[] = "Direct Message"; /*start up*/ @@ -9998,6 +1,7 @@ void tet_direct_send_ack_tp_10() { int FAIL = 0; MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN}; + NCSMDS_INFO svc_to_mds_info; char message[] = "Direct Message"; /*start up*/ if (tet_initialise_setup(false)) { @@ -10074,6 +10077,7 @@ void tet_direct_send_ack_tp_11() { int FAIL = 0; MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN}; + NCSMDS_INFO svc_to_mds_info; char message[] = "Direct Message"; /*start up*/ if (tet_initialise_setup(false)) { @@ -11709,6 +11713,7 @@ void tet_direct_broadcast_to_svc_tp_8() { int FAIL = 0; MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN}; + NCSMDS_INFO svc_to_mds_info; 
/*Start up*/ if (tet_initialise_setup(false)) { printf("\n Setup Initialisation has Failed \n"); diff --git a/src/mds/apitest/mdstipc_conf.c b/src/mds/apitest/mdstipc_conf.c index c2d7d01..bf4c1de 100644 --- a/src/mds/apitest/mdstipc_conf.c +++ b/src/mds/apitest/mdstipc_conf.c @@ -25,7 +25,6 @@ extern int fill_syncparameters(int); extern uint32_t mds_vdest_tbl_get_role(MDS_VDEST_ID vdest_id, V_DEST_RL *role); extern pthread_mutex_t gl_mds_library_mutex; -extern _Thread_local NCSMDS_INFO svc_to_mds_info; extern pthread_mutex_t safe_printf_mutex; extern pthread_mutex_t gl_mutex; @@ -418,7 +417,7 @@ uint32_t mds_service_install(MDS_HDL mds_hdl, MDS_SVC_ID svc_id, bool mds_q_ownership, bool fail_no_active_sends) { int i; - memset(&svc_to_mds_info, 0, sizeof(svc_to_mds_info)); + NCSMDS_INFO svc_to_mds_info; svc_to_mds_info.i_mds_hdl = mds_hdl; svc_to_mds_info.i_svc_id = svc_id; @@ -465,7 +464,7 @@ uint32_t mds_service_uninstall(MDS_HDL mds_hdl, MDS_SVC_ID svc_id) { int i, j, k, FOUND; uint32_t YES_ADEST; - memset(&svc_to_mds_info, 0, sizeof(svc_to_mds_info)); + NCSMDS_INFO svc_to_mds_info; /*Find whether this Service is on Adest or Vdest*/ YES_ADEST = is_service_on_adest(mds_hdl, svc_id); @@ -560,7 +559,7 @@ uint32_t mds_service_subscribe(MDS_HDL mds_hdl, MDS_SVC_ID svc_id, { int i, j, k, l, FOUND; uint32_t YES_ADEST; - memset(&svc_to_mds_info, 0, sizeof(svc_to_mds_info)); + NCSMDS_INFO svc_to_mds_info; /*Find whether this Service is on Adest or Vdest*/ YES_ADEST = is_service_on_adest(mds_hdl, svc_id); @@ -746,7 +745,7 @@ uint32_t mds_service_redundant_subscribe(MDS_HDL mds_hdl, MDS_SVC_ID svc_id, { int i, j, k, l, FOUND; uint32_t YES_ADEST; - memset(&svc_to_mds_info, 0, sizeof(svc_to_mds_info)); + NCSMDS_INFO svc_to_mds_info; /*Find whether this Service is on Adest or Vdest*/ YES_ADEST = is_service_on_adest(mds_hdl, svc_id); @@ -931,7 +930,7 @@ uint32_t mds_service_cancel_subscription(MDS_HDL mds_hdl, MDS_SVC_ID svc_id, uint8_t num_svcs, MDS_SVC_ID *svc_ids) { int i, j, k, FOUND; -
memset(&svc_to_mds_info, 0, sizeof(svc_to_mds_info)); + NCSMDS_INFO svc_to_mds_info; svc_to_mds_info.i_mds_hdl = mds_hdl; svc_to_mds_info.i_svc_id = svc_id; svc_to_mds_info.i_op = MDS_CANCEL; @@ -998,7 +997,7 @@ uint32_t mds_just_send(MDS_HDL mds_hdl, MDS_SVC_ID svc_id, MDS_SVC_ID to_svc, TET_MDS_MSG
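Replacing the _Thread_local global with a per-function local, as this patch does, removes the shared state altogether. One detail: the removed memset() zeroed the struct, whereas a local declared without an initializer is not zeroed, so any member that is not assigned afterwards holds an indeterminate value (keeping the memset() on the local avoids that in C). A minimal C++ sketch of the same idea with a value-initialized local; RequestInfo and BuildRequest are hypothetical stand-ins for NCSMDS_INFO and the mds_* helpers.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for NCSMDS_INFO; the real struct has more members.
struct RequestInfo {
  unsigned int mds_hdl;
  unsigned int svc_id;
  int op;
};

// Each call builds its own request on its own stack, so concurrent callers
// share no state. "= {}" value-initializes every member to zero, which keeps
// the effect of the removed memset().
RequestInfo BuildRequest(unsigned int hdl, unsigned int svc) {
  RequestInfo info = {};
  info.mds_hdl = hdl;
  info.svc_id = svc;
  return info;  // info.op was never assigned, so it is still zero
}
```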
Re: [devel] [PATCH 1/1] mds: improve thread safety in mdstest - part 2 [#2746]
Hi Hoa, Do we need svc_to_mds_info to be TLS? I used it in the sample added to the ticket to reduce the number of helgrind warnings when I verified safe_printf and safe_fflush. It should have been removed from the sample, as it was not fully verified. /Thanks HansN -Original Message- From: Hoa Le [mailto:hoa...@dektech.com.au] Sent: 27 March 2018 03:47 To: Anders Widell <anders.wid...@ericsson.com>; Hans Nordebäck <hans.nordeb...@ericsson.com> Cc: opensaf-devel@lists.sourceforge.net; Hoa Le <hoa...@dektech.com.au> Subject: [PATCH 1/1] mds: improve thread safety in mdstest - part 2 [#2746] - Use __thread if _Thread_local is not supported in GCC version lower than 4.9 --- src/mds/apitest/mdstipc.h | 6 ++ src/mds/apitest/mdstipc_api.c | 2 +- src/mds/apitest/mdstipc_conf.c | 2 +- 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/src/mds/apitest/mdstipc.h b/src/mds/apitest/mdstipc.h index 01b58c4..f67890a 100644 --- a/src/mds/apitest/mdstipc.h +++ b/src/mds/apitest/mdstipc.h @@ -21,6 +21,12 @@ #include "base/ncssysf_tsk.h" #include "base/ncssysf_def.h" +#if !defined(_Thread_local) +#define MDS_THREAD_LOCAL __thread +#else +#define MDS_THREAD_LOCAL _Thread_local +#endif + typedef struct tet_task { NCS_OS_CB entry; void *arg; diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c index 669c770..2ff8238 100644 --- a/src/mds/apitest/mdstipc_api.c +++ b/src/mds/apitest/mdstipc_api.c @@ -33,7 +33,7 @@ static MDS_CLIENT_MSG_FORMAT_VER gl_set_msg_fmt_ver; MDS_SVC_ID svc_ids[3] = {2006, 2007, 2008}; -_Thread_local NCSMDS_INFO svc_to_mds_info; +MDS_THREAD_LOCAL NCSMDS_INFO svc_to_mds_info; pthread_mutex_t safe_printf_mutex = PTHREAD_MUTEX_INITIALIZER; pthread_mutex_t gl_mutex = PTHREAD_MUTEX_INITIALIZER; diff --git a/src/mds/apitest/mdstipc_conf.c b/src/mds/apitest/mdstipc_conf.c index c2d7d01..d6ee48e 100644 --- a/src/mds/apitest/mdstipc_conf.c +++ b/src/mds/apitest/mdstipc_conf.c @@ -25,7 +25,7 @@ extern int fill_syncparameters(int); extern uint32_t
mds_vdest_tbl_get_role(MDS_VDEST_ID vdest_id, V_DEST_RL *role); extern pthread_mutex_t gl_mds_library_mutex; -extern _Thread_local NCSMDS_INFO svc_to_mds_info; +extern MDS_THREAD_LOCAL NCSMDS_INFO svc_to_mds_info; extern pthread_mutex_t safe_printf_mutex; extern pthread_mutex_t gl_mutex; -- 2.7.4
Re: [devel] [PATCH 1/1] base: Add functions for parsing a string as an integer [#2814]
ack, review only. Minor comment below. /Thanks HansN On 03/22/2018 03:13 PM, Anders Widell wrote: The new functions StrToInt64 and StrToUint64 parse a string as a signed and unsigned 64-bit integer, respectively. Prefixes 0 for octal and 0x for hexadecimal are supported, as well as suffixes k, M and G for kilobytes, megabytes and gigabytes, respectively. --- src/base/Makefile.am| 3 + src/base/string_parse.cc| 79 ++ src/base/string_parse.h | 47 + src/base/tests/string_parse_test.cc | 129 4 files changed, 258 insertions(+) create mode 100644 src/base/string_parse.cc create mode 100644 src/base/string_parse.h create mode 100644 src/base/tests/string_parse_test.cc diff --git a/src/base/Makefile.am b/src/base/Makefile.am index 540c6dfe7..bb13d6c43 100644 --- a/src/base/Makefile.am +++ b/src/base/Makefile.am @@ -65,6 +65,7 @@ lib_libopensaf_core_la_SOURCES += \ src/base/process.cc \ src/base/saf_edu.c \ src/base/saf_error.c \ + src/base/string_parse.cc \ src/base/sysf_def.c \ src/base/sysf_exc_scr.c \ src/base/sysf_ipc.c \ @@ -140,6 +141,7 @@ noinst_HEADERS += \ src/base/saf_error.h \ src/base/saf_mem.h \ src/base/sprr_dl_api.h \ + src/base/string_parse.h \ src/base/sysf_exc_scr.h \ src/base/sysf_ipc.h \ src/base/tests/mock_clock_gettime.h \ @@ -206,6 +208,7 @@ bin_libbase_test_SOURCES = \ src/base/tests/mock_logtrace.cc \ src/base/tests/mock_osaf_abort.cc \ src/base/tests/mock_osafassert.cc \ + src/base/tests/string_parse_test.cc \ src/base/tests/time_add_test.cc \ src/base/tests/time_compare_test.cc \ src/base/tests/time_convert_test.cc \ diff --git a/src/base/string_parse.cc b/src/base/string_parse.cc new file mode 100644 index 0..915f0e95a --- /dev/null +++ b/src/base/string_parse.cc @@ -0,0 +1,79 @@ +/* -*- OpenSAF -*- + * + * Copyright Ericsson AB 2018 - All Rights Reserved. 
+ * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed + * under the GNU Lesser General Public License Version 2.1, February 1999. + * The complete license can be accessed from the following location: + * http://opensource.org/licenses/lgpl-license.php + * See the Copying file included with the OpenSAF distribution for full + * licensing terms. + * + */ + +#include "base/string_parse.h" + +#include +#include +#include + +namespace { + +int64_t ParseSuffix(char** endptr) { + switch (**endptr) { +case 'k': + ++(*endptr); + return 1024; +case 'M': + ++(*endptr); + return 1024 * 1024; +case 'G': + ++(*endptr); + return 1024 * 1024 * 1024; +default: + return 1; + } +} + +} // namespace + +namespace base { + +int64_t StrToInt64(const char* str, bool* success) { + str = RemoveLeadingWhitespace(str); + errno = 0; + char* endptr; + int64_t val = strtoll(str, &endptr, 0); + int64_t multiplier = ParseSuffix(&endptr); + endptr = RemoveLeadingWhitespace(endptr); + *success = *str != '\0' && errno == 0 && *endptr == '\0' && + (val >= 0 ?
val <= (INT64_MAX / multiplier) + : val >= (INT64_MIN / multiplier)); + return val * multiplier; +} + +uint64_t StrToUint64(const char* str, bool* success) { + str = RemoveLeadingWhitespace(str); + errno = 0; + char* endptr; + uint64_t val = strtoull(str, &endptr, 0); + uint64_t multiplier = ParseSuffix(&endptr); + endptr = RemoveLeadingWhitespace(endptr); + *success = *str != '\0' && *str != '-' && errno == 0 && *endptr == '\0' && + val <= (~static_cast<uint64_t>(0) / multiplier); + return val * multiplier; +} + +const char* RemoveLeadingWhitespace(const char* str) { + while (*str == ' ' || *str == '\t' || *str == '\r' || *str == '\n') ++str; + return str; +} + +char* RemoveLeadingWhitespace(char* str) { + while (*str == ' ' || *str == '\t' || *str == '\r' || *str == '\n') ++str; + return str; +} + [HansN] you can rename these functions and use isspace: const char* TrimLeadingWhitespace(const char* str) { while (isspace(*str)) ++str; return str; } char* TrimLeadingWhitespace(char* str) { while (isspace(*str)) ++str; return str; } +} // namespace base diff --git a/src/base/string_parse.h b/src/base/string_parse.h new file mode 100644 index 0..17569241c --- /dev/null +++ b/src/base/string_parse.h @@ -0,0 +1,47 @@ +/* -*- OpenSAF -*- + * + * Copyright Ericsson AB 2018 - All Rights Reserved. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed + *
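The parsing rules in the commit message (decimal, 0-prefixed octal or 0x-prefixed hex, an optional k/M/G suffix, and rejection of empty input, trailing characters and overflow) can be sketched in plain C as below. This is a hypothetical stand-alone illustration, not the code in src/base/string_parse.cc; `str_to_int64`, `skip_space` and `parse_suffix` are made-up names. Two details are deliberate: it requires that strtoll() consumed at least one character, so an input consisting only of a suffix (such as "k") is rejected, and it casts to `unsigned char` before calling isspace(), because passing a negative `char` value to isspace() is undefined behaviour, which is worth keeping in mind if the TrimLeadingWhitespace suggestion is adopted.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Skip leading whitespace; the unsigned char cast avoids undefined
 * behaviour in isspace() when char is signed and the byte is negative. */
static const char *skip_space(const char *s) {
  while (isspace((unsigned char)*s)) ++s;
  return s;
}

/* Consume an optional k/M/G suffix and return its multiplier. */
static int64_t parse_suffix(const char **p) {
  switch (**p) {
    case 'k': ++*p; return INT64_C(1) << 10;
    case 'M': ++*p; return INT64_C(1) << 20;
    case 'G': ++*p; return INT64_C(1) << 30;
    default: return 1;
  }
}

/* Hypothetical stand-in for StrToInt64(): base 0 accepts decimal, octal
 * ("010") and hex ("0x10"). *success is false for empty input, trailing
 * characters, or overflow in strtoll() or in the multiplication; the
 * return value is 0 in that case. */
int64_t str_to_int64(const char *str, bool *success) {
  str = skip_space(str);
  errno = 0;
  char *end;
  long long val = strtoll(str, &end, 0);
  bool parsed = end != str;    /* at least one digit was consumed */
  bool in_range = errno == 0;  /* strtoll() sets ERANGE on overflow */
  const char *p = end;
  int64_t mult = parse_suffix(&p);
  p = skip_space(p);
  bool fits = val >= 0 ? val <= INT64_MAX / mult : val >= INT64_MIN / mult;
  *success = parsed && in_range && *p == '\0' && fits;
  return *success ? (int64_t)val * mult : 0;
}
```

With this sketch, `str_to_int64("2k", &ok)` yields 2048 with `ok` set to true, while `" 12abc"` sets `ok` to false.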
Re: [devel] [PATCH 1/1] mds: improve thread safety in mdstest [#2746]
Hi Hoa, ack. I reviewed the V1 patch and it looks good, and I agree with Zoran's comments regarding using a mutex instead of an rwlock. I'm waiting for the valgrind/helgrind results; so far it looks good. One question to Zoran: doesn't the performance of an rwlock depend on, e.g., the number of threads, access patterns, atomic operations, etc.? Otherwise there would be no need to have separate read and write locks. Say you have a large number of threads that only read some non-atomic variable(s): with read locks they should be able to run in parallel, which is not the case with an ordinary mutex, since it serializes the reads. Have you tested this, or where do these numbers come from? /Regards HansN On 03/22/2018 01:47 PM, Hoa Le wrote: Hi, I replaced "rwlock" with "mutex" as suggested by Zoran in version 2 of the patch; please review it again. Thank you. -- Best regards, Hoa Le On 03/21/2018 07:43 PM, Zoran Milinkovic wrote: Hi, Judging from the patch (I haven't checked the code), I don't see the reason for using rwlock. A pthread mutex will even work better than rwlock in this patch. Reasons for using a mutex: 1. A mutex is much faster than an rwlock in Linux, around 10 times faster, if I remember correctly. 2. Here, we are talking about two threads, and not more. 3. rwlock has never been used in OpenSAF before.
BR, Zoran -Original Message- From: Hoa Le [mailto:hoa...@dektech.com.au] Sent: den 21 mars 2018 11:34 To: Anders Widell<anders.wid...@ericsson.com>; Hans Nordebäck<hans.nordeb...@ericsson.com> Cc:opensaf-devel@lists.sourceforge.net Subject: [devel] [PATCH 1/1] mds: improve thread safety in mdstest [#2746] - Correct helgrind issues in mds/apitest --- src/mds/apitest/mdstest.c | 7 +- src/mds/apitest/mdstipc.h | 7 +- src/mds/apitest/mdstipc_api.c | 196 + src/mds/apitest/mdstipc_conf.c | 89 +-- 4 files changed, 234 insertions(+), 65 deletions(-) diff --git a/src/mds/apitest/mdstest.c b/src/mds/apitest/mdstest.c index bf6e173..3280e5b 100644 --- a/src/mds/apitest/mdstest.c +++ b/src/mds/apitest/mdstest.c @@ -35,6 +35,7 @@ //#include "mdstest.h" SaAisErrorT rc; +pthread_rwlock_t gl_lock; int mds_startup(void) { @@ -83,13 +84,17 @@ int main(int argc, char **argv) if (suite == 999) { return 0; } - if (mds_startup() != 0) { printf("Fail to start mds agents\n"); return 1; } + pthread_rwlock_init(&gl_lock, NULL); + int rc = test_run(suite, tcase); + + pthread_rwlock_destroy(&gl_lock); + mds_shutdown(); return rc; } diff --git a/src/mds/apitest/mdstipc.h b/src/mds/apitest/mdstipc.h index fbb6468..9e93a17 100644 --- a/src/mds/apitest/mdstipc.h +++ b/src/mds/apitest/mdstipc.h @@ -145,13 +145,12 @@ typedef struct tet_mds_recvd_msg_info { } TET_MDS_RECVD_MSG_INFO; /* GLOBAL variables / +extern _Thread_local NCSMDS_INFO svc_to_mds_info; +extern pthread_rwlock_t gl_lock; + TET_ADEST gl_tet_adest; TET_VDEST gl_tet_vdest[4]; /*change it to 6 to run VDS Redundancy: 101 for Stress*/ -NCSADA_INFO ada_info; -NCSVDA_INFO vda_info; -NCSMDS_INFO svc_to_mds_info; -TET_EVENT_INFO gl_event_data; TET_SVC gl_tet_svc; TET_MDS_RECVD_MSG_INFO gl_rcvdmsginfo, gl_direct_rcvmsginfo; int gl_vdest_indx; diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c index 5eb8bd9..3a98ecd 100644 --- a/src/mds/apitest/mdstipc_api.c +++ b/src/mds/apitest/mdstipc_api.c @@ -33,6 +33,28 @@ static 
MDS_CLIENT_MSG_FORMAT_VER gl_set_msg_fmt_ver; MDS_SVC_ID svc_ids[3] = {2006, 2007, 2008}; +pthread_mutex_t safe_printf_mutex = PTHREAD_MUTEX_INITIALIZER; +_Thread_local NCSMDS_INFO svc_to_mds_info; + +void safe_printf(const char* format, ... ) { + pthread_mutex_lock(&safe_printf_mutex); + va_list args; + va_start(args, format); + vfprintf(stdout, format, args); + va_end(args); + pthread_mutex_unlock(&safe_printf_mutex); +} +int safe_fflush(FILE *stream) { + int rc = 0; + pthread_mutex_lock(&safe_printf_mutex); + rc = fflush(stream); + pthread_mutex_unlock(&safe_printf_mutex); + return rc; +} + +#define printf safe_printf +#define fflush safe_fflush + /*/ /SERVICE API TEST CASES / /*/ @@ -363,6 +385,7 @@ void tet_svc_install_tp_10() { int FAIL = 0; SaUint32T rc; + NCSCONTEXT t_handle = 0; // Creating a MxN VDEST with id = 2000 rc = create_vdest(NCS_VDEST_TYPE_MxN, 2000); if (rc != NCSCC_RC_SUCCESS) { @@ -373,25 +396,25 @@ void tet_svc_install_tp_10() printf( "
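Hans's question about read-mostly workloads can be made concrete with a small POSIX threads sketch. Readers take pthread_rwlock_rdlock() and may hold the lock at the same time, while the writer takes pthread_rwlock_wrlock() for exclusive access; with a plain pthread_mutex_t the readers would instead be serialized. This illustrates the locking pattern only and is not the mdstest code: `shared_value`, `reader`, `set_value` and `run_readers_and_writer` are hypothetical names, and whether an rwlock actually outperforms a mutex for a given workload (for example the two-thread case Zoran describes) has to be measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

#define MAX_READERS 16

/* Hypothetical shared state guarded by a reader-writer lock. */
static pthread_rwlock_t value_lock = PTHREAD_RWLOCK_INITIALIZER;
static long shared_value = 0;

/* Reader: many threads may hold the read lock concurrently. */
static void *reader(void *arg) {
  long *sum = arg;  /* each reader accumulates into its own slot */
  for (int i = 0; i < 1000; ++i) {
    pthread_rwlock_rdlock(&value_lock);
    *sum += shared_value;
    pthread_rwlock_unlock(&value_lock);
  }
  return NULL;
}

/* Writer: the write lock excludes all readers and other writers. */
static void set_value(long v) {
  pthread_rwlock_wrlock(&value_lock);
  shared_value = v;
  pthread_rwlock_unlock(&value_lock);
}

/* Start n_readers reader threads, update the value once while they run,
 * join them, and return the final value (or -1 on bad input/failure). */
long run_readers_and_writer(int n_readers, long new_value) {
  pthread_t threads[MAX_READERS];
  long sums[MAX_READERS] = {0};
  if (n_readers < 0 || n_readers > MAX_READERS) return -1;

  int created = 0;
  while (created < n_readers &&
         pthread_create(&threads[created], NULL, reader, &sums[created]) == 0)
    ++created;

  set_value(new_value);
  for (int i = 0; i < created; ++i) pthread_join(threads[i], NULL);
  if (created != n_readers) return -1;

  pthread_rwlock_rdlock(&value_lock);
  long result = shared_value;
  pthread_rwlock_unlock(&value_lock);
  return result;
}
```

Only threads that were actually created are joined, so a failed pthread_create() does not leave running threads behind.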
Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]
Hi Minh, Yes, that will be ok. /Regards HansN -Original Message- From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] Sent: den 20 mars 2018 12:53 To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell <anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805] Hi Hans, So I guess it's ok to push V1 with comments from you and Anders. Regards, Minh
Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]
Hi Minh, I think you can keep V1 but add the missing if statement. The /proc/self/fd directory should only contain the open fds plus the current and parent directory entries. Later you can change strtoimax to the new string-to-integer utility if needed (though I don't think it is likely to be needed). /Regards HansN -Original Message- From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] Sent: den 20 mars 2018 08:29 To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell <anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805] Hi Hans, I have just seen that Anders created #2814 for the utility function. Once #2814 is done, we can replace the integer conversion in this patch with the new utility function, and perhaps also the many atoXX (atoi, atol, ...) calls in other places, for better error handling. What do you think? Thanks, Minh
Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]
Hi Minh, I think this additional check for the current and parent directories is enough for the V1 patch. The use of the 2nd parameter of strtol in the V2 patch can be put in a utility function for broader use. /Regards HansN On 03/19/2018 08:50 AM, Minh Hon Chau wrote: Hi Hans, I agree that the check for "." and ".." should be added in V1. In this V2 I use the second parameter of strtol; it should ensure that anything read from the fd directory consists entirely of digits before closing the fd. There should not be any non-numeric entries other than "." and ".." (the rest are 0, 1, 2, etc.), but using the 2nd parameter of strtol is more general. So I think the strcmp check is not needed? Thanks, Minh
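Minh's V2 idea, accepting a directory entry only if strtol() consumed the whole name, could be packaged as a small helper along the lines Hans suggests. The sketch below is hypothetical (`parse_fd_name` is not an OpenSAF function). It rejects empty names, names that do not start with a digit (strtol() would otherwise accept leading spaces or a sign), trailing characters and out-of-range values, so "." and ".." are filtered out without a separate strcmp() check.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a /proc/self/fd entry name. Returns true and
 * stores the descriptor only if the whole name is a decimal number in
 * [0, INT_MAX]; "", ".", "..", "3x", " 3" and "-1" are all rejected. */
bool parse_fd_name(const char *name, int *fd) {
  /* strtol() would also accept leading whitespace and a sign. */
  if (!isdigit((unsigned char)name[0])) return false;

  char *end;
  errno = 0;
  long val = strtol(name, &end, 10);
  if (*end != '\0') return false;                     /* trailing junk */
  if (errno == ERANGE || val > INT_MAX) return false; /* out of range */

  *fd = (int)val;
  return true;
}
```

In the fd loop, `if (!parse_fd_name(entry->d_name, &fd)) continue;` would then cover both the "."/".." case and the lower-bound check.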
Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]
Hi Minh, my comment was that this check could be added: if (strcmp(pentry->d_name, ".") == 0 || strcmp(pentry->d_name, "..") == 0) continue; /Regards HansN On 03/16/2018 01:27 PM, Minh Hon Chau wrote: Hi Anders, Hans, When I tested the patch, I did see "." and ".." returned from readdir, but strtoimax also returns 0 for them, so it probably won't be a problem to close(0) more than once. But to be safer, it should check the @second_ptr too; I will update and send out a V2. Thanks, Minh
Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]
Hi, yes, I missed that readdir_r is marked deprecated in newer man pages; it is marked deprecated neither in the latest LSB nor in The Open Group Base Specification. Unlisting readdir sounds ok then. /Thanks HansN On 03/16/2018 01:00 PM, Anders Widell wrote: Hi! See my comments below, marked AndersW2>. regards, Anders Widell On 03/16/2018 12:39 PM, Hans Nordebäck wrote: Hi Minh, ack with some comments (on top of AndersW comments). /Thanks HansN On 03/15/2018 07:50 AM, Minh Chau wrote: --- src/base/os_defs.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/src/base/os_defs.c b/src/base/os_defs.c index 6f9ec52..e914011 100644 --- a/src/base/os_defs.c +++ b/src/base/os_defs.c @@ -1052,14 +1052,29 @@ uint32_t ncs_os_process_execute_timed(NCS_OS_PROC_EXECUTE_TIMED_INFO *req) * child */ if (getenv("OPENSAF_KEEP_FD_OPEN_AFTER_FORK") == NULL) { /* Close all inherited file descriptors */ - int i = sysconf(_SC_OPEN_MAX); - if (i == -1) { + int fd_max = sysconf(_SC_OPEN_MAX); + + if (fd_max == -1) { syslog(LOG_ERR, "%s: sysconf failed - %s", - __FUNCTION__, strerror(errno)); + __FUNCTION__, strerror(errno)); exit(EXIT_FAILURE); } - for (i--; i >= 0; --i) - (void)close(i); /* close all descriptors */ + struct dirent *pentry = NULL; + DIR *dir = opendir("/proc/self/fd"); + + if (dir != NULL) { + while ((pentry = readdir(dir)) != NULL) { [HansN] readdir will return not only 0, 1, 2 etc. but also the current directory '.' and the parent directory '..'; these should be handled here. [HansN] perhaps we should use readdir_r instead? AndersW2> I also thought about this, but it turns out that readdir_r is (or will become) obsolete. It is listed as obsolete on our wiki page: https://sourceforge.net/p/opensaf/wiki/Unsafe%20and%20Obsolete%20Functions/ Maybe we should unlist readdir() from the non-thread-safe section? Regarding . and .., I think we should check for parse errors, i.e. whether it was a valid number or not.
The second parameter to strtoimax will (if not NULL) tell us how much of the string was parsed. It should point to a '\0' character if the whole string was parsed as a valid number. In addition, you need to check that the string was not empty to begin with. I am thinking about adding a support function in base that can parse strings into numbers and handle parse errors in a convenient way. The strtoxxx functions are a bit tricky, since you need to check the end pointer and also errno for overflow/underflow errors. + int fd = strtoimax(pentry->d_name, NULL, 10); + if (fd > INT_MIN && fd < fd_max) (void)close(fd); + } + (void)closedir(dir); + } else { + /* fall back, close all possible descriptors */ + syslog(LOG_ERR, "%s: opendir failed - %s", + __FUNCTION__, strerror(errno)); + for (fd_max--; fd_max >= 0; --fd_max) + (void)close(fd_max); + } /* Redirect standard files to /dev/null */ if (freopen("/dev/null", "r", stdin) == NULL) -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]
Hi Minh, ack with some comments (on top of AndersW comments). /Thanks HansN On 03/15/2018 07:50 AM, Minh Chau wrote: --- src/base/os_defs.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/src/base/os_defs.c b/src/base/os_defs.c index 6f9ec52..e914011 100644 --- a/src/base/os_defs.c +++ b/src/base/os_defs.c @@ -1052,14 +1052,29 @@ uint32_t ncs_os_process_execute_timed(NCS_OS_PROC_EXECUTE_TIMED_INFO *req) * child */ if (getenv("OPENSAF_KEEP_FD_OPEN_AFTER_FORK") == NULL) { /* Close all inherited file descriptors */ - int i = sysconf(_SC_OPEN_MAX); - if (i == -1) { + int fd_max = sysconf(_SC_OPEN_MAX); + + if (fd_max == -1) { syslog(LOG_ERR, "%s: sysconf failed - %s", - __FUNCTION__, strerror(errno)); + __FUNCTION__, strerror(errno)); exit(EXIT_FAILURE); } - for (i--; i >= 0; --i) - (void)close(i); /* close all descriptors */ + struct dirent *pentry = NULL; + DIR *dir = opendir("/proc/self/fd"); + + if (dir != NULL) { + while ((pentry = readdir(dir)) != NULL) { [HansN] readdir will return not only 0, 1, 2 etc. but also the current directory '.' and the parent directory '..'; these should be handled here. [HansN] perhaps we should use readdir_r instead? + int fd = strtoimax(pentry->d_name, NULL, 10); + if (fd > INT_MIN && fd < fd_max) (void)close(fd); + } + (void)closedir(dir); + } else { + /* fall back, close all possible descriptors */ + syslog(LOG_ERR, "%s: opendir failed - %s", + __FUNCTION__, strerror(errno)); + for (fd_max--; fd_max >= 0; --fd_max) + (void)close(fd_max); + } /* Redirect standard files to /dev/null */ if (freopen("/dev/null", "r", stdin) == NULL)
Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]
Hi Minh, I'll review the patch shortly (today). /Thanks HansN On 03/16/2018 06:17 AM, Minh Hon Chau wrote: Thanks Anders for your comments. Hi Hans, Ravi, Do you have any comments you would like to add? Otherwise I will update the patch with Anders' comments. Thanks, Minh On 16/03/18 01:41, Anders Widell wrote: Ack with minor comments, marked AndersW> below. regards, Anders Widell On 03/15/2018 07:50 AM, Minh Chau wrote: --- src/base/os_defs.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/src/base/os_defs.c b/src/base/os_defs.c index 6f9ec52..e914011 100644 --- a/src/base/os_defs.c +++ b/src/base/os_defs.c @@ -1052,14 +1052,29 @@ uint32_t ncs_os_process_execute_timed(NCS_OS_PROC_EXECUTE_TIMED_INFO *req) * child */ if (getenv("OPENSAF_KEEP_FD_OPEN_AFTER_FORK") == NULL) { /* Close all inherited file descriptors */ - int i = sysconf(_SC_OPEN_MAX); - if (i == -1) { + int fd_max = sysconf(_SC_OPEN_MAX); AndersW> sysconf() returns a long. Maybe fd_max should have the type long to match the return type of sysconf()? + + if (fd_max == -1) { syslog(LOG_ERR, "%s: sysconf failed - %s", - __FUNCTION__, strerror(errno)); + __FUNCTION__, strerror(errno)); exit(EXIT_FAILURE); } - for (i--; i >= 0; --i) - (void)close(i); /* close all descriptors */ + struct dirent *pentry = NULL; AndersW> pentry is not a good name (avoid abbreviations, and separate words with underscores). Maybe rename it to dir_entry or just entry? + DIR *dir = opendir("/proc/self/fd"); + + if (dir != NULL) { + while ((pentry = readdir(dir)) != NULL) { + int fd = strtoimax(pentry->d_name, NULL, 10); AndersW> strtoimax() is declared in the inttypes.h header file. Add an #include at the top of the file. AndersW> strtoimax() returns an intmax_t. Change the type of fd to intmax_t. + if (fd > INT_MIN && fd < fd_max) (void)close(fd); AndersW> File descriptors cannot be negative. Use fd >= 0 instead of fd > INT_MIN. AndersW> Remove (void). + } + (void)closedir(dir); AndersW> Remove (void).
+ } else { + /* fall back, close all possible descriptors */ + syslog(LOG_ERR, "%s: opendir failed - %s", + __FUNCTION__, strerror(errno)); + for (fd_max--; fd_max >= 0; --fd_max) + (void)close(fd_max); AndersW> Remove (void). + } /* Redirect standard files to /dev/null */ if (freopen("/dev/null", "r", stdin) == NULL)
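Combining the review comments (long for the sysconf() result, intmax_t for the strtoimax() result, a full-string parse check instead of comparing with INT_MIN, and no (void) casts), the descriptor-closing loop might look like the sketch below. It is not the committed code: `close_fds_from` and its `lowest_fd` parameter are hypothetical, added so the function can be exercised without closing stdin, stdout and stderr, whereas the patch closes everything and then redirects the standard streams to /dev/null. The sketch also skips dirfd(dir), since /proc/self/fd lists the descriptor that opendir() itself opened. Closing it mid-iteration may make later readdir() calls fail with EBADF, and closedir() would then close an already-closed descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <inttypes.h>
#include <unistd.h>

/* Hypothetical sketch: close every open descriptor >= lowest_fd.
 * Returns 0 if /proc/self/fd was scanned, 1 if the brute-force fallback
 * was used, and -1 if sysconf() failed. */
int close_fds_from(int lowest_fd) {
  long fd_max = sysconf(_SC_OPEN_MAX);  /* sysconf() returns long */
  if (fd_max == -1) return -1;

  DIR *dir = opendir("/proc/self/fd");
  if (dir == NULL) {
    /* Fallback, as in the patch: try every possible descriptor. */
    for (long fd = fd_max - 1; fd >= lowest_fd; --fd) close((int)fd);
    return 1;
  }

  int own_fd = dirfd(dir);  /* descriptor used by the stream itself */
  struct dirent *entry;
  while ((entry = readdir(dir)) != NULL) {
    char *end;
    errno = 0;
    intmax_t fd = strtoimax(entry->d_name, &end, 10);
    /* Skip ".", "..", partially numeric names and overflow. */
    if (end == entry->d_name || *end != '\0' || errno != 0) continue;
    if (fd < lowest_fd || fd >= fd_max || fd == own_fd) continue;
    close((int)fd);
  }
  closedir(dir);
  return 0;
}
```

In the child after fork(), the equivalent of the patch's behaviour would be `close_fds_from(0)` followed by the existing freopen() calls.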