Re: [devel] [PATCH 1/1] osaf: perform handshake in tcp_server in new thread [#3099]

2019-10-11 Thread Hans Nordebäck via Opensaf-devel
Hi Gary, ack, code review only. One question: with this change it looks as if an
arbitrary client can connect to the TCP server and, e.g., monitor the
"connect state" of the TCP server, but to exchange any data an SSL session has
to be established after the TCP connect. If so, I think this change looks
good. /BR Hans
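
For reference, a minimal sketch of the pattern the patch below relies on: the listening socket accepts plain TCP connections, and the TLS handshake runs in finish_request(), which socketserver's ThreadingMixIn calls from the per-connection worker thread. The mix-in name and the error handling are illustrative assumptions and not part of the patch; the sketch is written for Python 3, whereas the patch targets Python 2.7.

```python
import ssl


class DeferredHandshakeMixIn(object):
    """Run the TLS handshake in finish_request() instead of accept().

    Assumes the listening socket was wrapped with
    do_handshake_on_connect=False, so accept() returns right after the
    TCP connect and a slow client can only stall its own worker thread.
    """

    def finish_request(self, request, client_address):
        try:
            request.do_handshake()
        except (ssl.SSLError, OSError):
            # Illustrative choice (not in the patch): silently drop
            # clients that fail the handshake.
            return
        # Hand over to the next class in the MRO, e.g. the RPC server.
        super().finish_request(request, client_address)
```

A server class would list this mix-in ahead of ThreadingMixIn and the RPC server class in its bases, so that no request data is read before the handshake completes.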
 

-----Original Message-----
From: Gary Lee  
Sent: den 11 oktober 2019 05:22
To: Hans Nordebäck ; Minh Hon Chau 
; Thuan Tran 
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 1/1] osaf: perform handshake in tcp_server in new thread [#3099]

---
 src/osaf/consensus/plugins/tcp/tcp_server.py | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/osaf/consensus/plugins/tcp/tcp_server.py 
b/src/osaf/consensus/plugins/tcp/tcp_server.py
index a7f22f2..c10859c 100755
--- a/src/osaf/consensus/plugins/tcp/tcp_server.py
+++ b/src/osaf/consensus/plugins/tcp/tcp_server.py
@@ -73,10 +73,15 @@ class ThreadedRPCServer(ThreadingMixIn,
 certfile=CERTFILE,
 keyfile=KEYFILE,
 cert_reqs=ssl.CERT_NONE,
-ssl_version=ssl.PROTOCOL_TLSv1_2)
+ssl_version=ssl.PROTOCOL_TLSv1_2,
+do_handshake_on_connect=False)
 self.server_bind()
 self.server_activate()
 
+def finish_request(self, request, client_address):
+ request.do_handshake()
+ return SimpleXMLRPCServer.finish_request(self, request, 
client_address)
+
 
 class Arbitrator(object):
 """ Implementation of a simple arbitrator """
-- 
2.7.4



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] osaf: add tcp arbitrator [#3064]

2019-10-04 Thread Hans Nordebäck via Opensaf-devel
Hi Gary, ack, review only. One comment/suggestion: can we provide a
small script that generates the X.509 certificate (using e.g. openssl x509
...) instead of including a self-signed cert? /BR Hans
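
A rough sketch of what such a script could look like, wrapping the openssl req invocation quoted from the README further down. The function names, the default subject and the default validity are assumptions made for illustration; -subj is added only so that openssl does not prompt interactively.

```python
import subprocess


def build_openssl_cmd(key_path, cert_path, days,
                      subject="/CN=opensaf-arbitrator"):
    """Return the openssl argv for a self-signed cert and unencrypted key.

    The subject default is a placeholder, not a value from the patch.
    """
    return [
        "openssl", "req",
        "-newkey", "rsa:2048",
        "-nodes",                 # do not encrypt the private key
        "-keyout", key_path,
        "-x509",                  # emit a self-signed certificate
        "-days", str(days),
        "-subj", subject,
        "-out", cert_path,
    ]


def generate_cert(key_path="key.pem", cert_path="certificate.pem", days=365):
    """Run openssl; raises CalledProcessError if it exits non-zero."""
    subprocess.check_call(build_openssl_cmd(key_path, cert_path, days))
```

Calling generate_cert() at install time would write key.pem and certificate.pem into the current directory, so no key material needs to be shipped in the source tree.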
On Tue, 2019-10-01 at 12:53 +1000, Gary Lee wrote:
> ---
>  src/osaf/consensus/plugins/tcp/README  |  41 ++
>  src/osaf/consensus/plugins/tcp/certificate.pem |  20 +
>  src/osaf/consensus/plugins/tcp/key.pem |  28 ++
>  src/osaf/consensus/plugins/tcp/tcp.plugin  | 520
> +
>  src/osaf/consensus/plugins/tcp/tcp_server.py   | 157 
>  5 files changed, 766 insertions(+)
>  create mode 100644 src/osaf/consensus/plugins/tcp/README
>  create mode 100644 src/osaf/consensus/plugins/tcp/certificate.pem
>  create mode 100644 src/osaf/consensus/plugins/tcp/key.pem
>  create mode 100755 src/osaf/consensus/plugins/tcp/tcp.plugin
>  create mode 100755 src/osaf/consensus/plugins/tcp/tcp_server.py
> 
> diff --git a/src/osaf/consensus/plugins/tcp/README
> b/src/osaf/consensus/plugins/tcp/README
> new file mode 100644
> index 000..6f739e8
> --- /dev/null
> +++ b/src/osaf/consensus/plugins/tcp/README
> @@ -0,0 +1,41 @@
> +TCP arbitrator
> +
> +The TCP arbitrator may be useful for deployments where deploying
> etcd is not
> +feasible. An example arbitrator is provided to help prevent split
> brain in
> +clusters that contain up to 2 system controllers.
> +
> +The example arbitrator is a simple python based program that can be
> deployed on
> +a single payload or a node external to the cluster.
> +
> +Two main pieces of information are stored on the arbitrator: the
> hostname of the
> +current active controller and a heartbeat timestamp.
> +
> +An active controller sends a heartbeat to the arbitrator every 100ms
> using TLS
> +over a persistent TCP connection. It should self-fence if it is
> unable to
> +heartbeat, as it is likely to be separated from the arbitrator.
> +
> +A candidate active controller must check the existing controller is
> not
> +heartbeating before promoting itself active. On a cluster using
> TIPC,
> +the timeout value is the TIPC link tolerance timeout. On a TCP based
> cluster,
> +the timeout is calculated from FMS_TAKEOVER_REQUEST_VALID_TIME.
> +
> +Suggested fmd.conf configuration:
> +
> +export FMS_SPLIT_BRAIN_PREVENTION=1
> +export FMS_KEYVALUE_STORE_PLUGIN_CMD=/full/path/to/tcp.plugin
> +export FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE=0 (any other setting
> is ignored)
> +export FMS_RELAXED_NODE_PROMOTION=1
> +
> +The above settings will allow a controller to be elected active
> during
> +cluster startup, even if the arbitrator is not yet running.
> +If the arbitrator becomes temporarily unavailable, the controllers
> will
> +remain running if they can see each other. If an active controller
> becomes
> +isolated from the standby *and* the arbitrator, it will self-fence
> and the
> +standby will become active (if located in the same network partition
> as
> +the arbitrator).
> +
> +The provided self-signed certificate is an example only, and was
> generated using:
> +
> +openssl req -newkey rsa:2048 -nodes -keyout key.pem -x509 -days
> 10 -out certificate.pem
> +
> +It must be replaced in an actual deployment!!
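
To make the README's description concrete, here is a minimal sketch of the two pieces of state it says the arbitrator keeps (the active controller's hostname and a heartbeat timestamp) and of the check a candidate performs before promoting itself. The class and method names are hypothetical and this is not the code in tcp_server.py; timestamps are passed in explicitly to keep the sketch deterministic.

```python
class ArbitratorState(object):
    """Hypothetical model of the arbitrator's stored state."""

    def __init__(self):
        self.active = None          # hostname of the current active controller
        self.last_heartbeat = None  # time of its last heartbeat, in seconds

    def heartbeat(self, hostname, now):
        """Record a heartbeat; only the current active may heartbeat."""
        if hostname != self.active:
            return False
        self.last_heartbeat = now
        return True

    def try_promote(self, hostname, now, timeout):
        """Grant the active role unless another controller is still alive."""
        other_is_alive = (
            self.active is not None
            and self.active != hostname
            and now - self.last_heartbeat < timeout
        )
        if other_is_alive:
            return False
        self.active = hostname
        self.last_heartbeat = now
        return True
```

Here timeout stands in for the TIPC link tolerance or the value derived from FMS_TAKEOVER_REQUEST_VALID_TIME mentioned in the README.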
> diff --git a/src/osaf/consensus/plugins/tcp/certificate.pem
> b/src/osaf/consensus/plugins/tcp/certificate.pem
> new file mode 100644
> index 000..e0b4993
> --- /dev/null
> +++ b/src/osaf/consensus/plugins/tcp/certificate.pem
> @@ -0,0 +1,20 @@
> +-----BEGIN CERTIFICATE-----
> +MIIDUTCCAjmgAwIBAgIJANrPYThNMllvMA0GCSqGSIb3DQEBCwUAMD4xCzAJBgNV
> +BAYTAkFVMQ4wDAYDVQQIDAVTdGF0ZTENMAsGA1UEBwwEQ2l0eTEQMA4GA1UECgwH
> +T3BlblNBRjAgFw0xOTA5MzAwMDMxNTRaGA8yMjkzMDcxNTAwMzE1NFowPjELMAkG
> +A1UEBhMCQVUxDjAMBgNVBAgMBVN0YXRlMQ0wCwYDVQQHDARDaXR5MRAwDgYDVQQK
> +DAdPcGVuU0FGMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA5pCFKYnS
> ++pi0gzrRWPRYg1sak9VpNK+MkKbj+m0bptRt/8JvosV62js4q5Da3ldq2AAcEJyf
> +gd02YZ4HUDdCMgMtlWT1CAx89rNpozRwyj5g+4cfmOqiz7ApeZ9yqltInjG720DT
> +lam2/R4/00zmFGAqD2ZGPiOY93bjYx+GhtiHcDvpJuZS2Z2vQ/Dd09v6Omhus0rZ
> +WMrENyfavc7HwFv2z/qi4Hsb/Aa9ZuAXUKp1Q2cvC0XWdRJMdZaZfGUlTfY6X8ar
> +hSnswHJJKIjBq/0jYpztntOubceOuGVyezxPVXPw5qiBLO7ZyYNgN9IMoF6Rbu9y
> +K1O1MvPw3ShlDQIDAQABo1AwTjAdBgNVHQ4EFgQU7UCcR6MgV5c5JXjCHpwcUC+9
> +HIAwHwYDVR0jBBgwFoAU7UCcR6MgV5c5JXjCHpwcUC+9HIAwDAYDVR0TBAUwAwEB
> +/zANBgkqhkiG9w0BAQsFAAOCAQEAAOP3iMgjCx8JNKevOSq24mGcWAqlX0iHP0/1
> +hl7Dd/xRQywM90NfrMmiNTgO9Yyw1rOEKoeM4BFM/qs854iEHpAa7vlcW1ZidvHz
> +eMQZA2Y6+AZ9zyt41bRJGqkqW7YdKVl9yuqWHcFBqBKf1pUsvt0bkab5EZFOBPuB
> +tmKsODrU7cN1qeA1wjINZiOa88Kkh2YxkRoi7tL8NIMp2E40NLS3M5+xLEE8LKTH
> +ouhReM4eEfGfzE171NPe/kzRRp+ujNZwmyQ8xmWp6jPjfD7Mfqdf1WYjspiGzziQ
> +R/cdEHHAWq+wZrfG1aB5/yU4iA0h8xR8PNfVHjAjuUn4N6tSFg==
> +-----END CERTIFICATE-----
> diff --git a/src/osaf/consensus/plugins/tcp/key.pem
> b/src/osaf/consensus/plugins/tcp/key.pem
> new file mode 100644
> index 000..66b8bcb
> --- /dev/null
> +++ b/src/osaf/consensus/plugins/tcp/key.pem
> @@ -0,0 +1,28 @@
> +-BEGIN PRIVATE 

Re: [devel] [PATCH 3/9] mds: Add implementation for TIPC buffer overflow solution [#1960]

2019-08-23 Thread Hans Nordebäck
Hi Minh,

see one comment below. /Thanks Hans

On 2019-08-23 03:48, Minh Hon Chau wrote:
> Hi Hans,
>
> Thanks for your time to review the patch, please see my replies below 
> your comments.
>
> Regards,
>
> Minh
>
> On 22/8/19 11:07 pm, Hans Nordebäck wrote:
>> Hi Minh,
>>
>> It is a large patch, so I have to review it in parts. Below are my
>> comments, marked with [HansN], for files:
>>
>> src/mds/Makefile.am
>> src/mds/mds_dt.h
>> src/mds/mds_dt_tipc.c
>>
>> I'll continue with the rest of the files a bit later. /Thanks Hans
>>
>> On 2019-08-14 08:38, Minh Chau wrote:
>>> This is a collaborative patch by two participants: Thuan, Minh.
>>>
>>> Main changes:
>>> - Add mds_tipc_fctrl_intf.h, mds_tipc_fctrl_intf.cc: These two files
>>> introduce new functions which are called in mds_dt_tipc.c if the flow
>>> control is enabled
>>> - Add mds_tipc_fctrl_portid.h, mds_tipc_fctrl_portid.cc: These files
>>> implement the tipc portid instance, which supports the sliding window,
>>> mds msg queue
>>> - Add mds_tipc_fctrl_msg.h, mds_tipc_fctrl_msg.cc: These files define
>>> the event and messages which are used for this solution.
>>> ---
>>>    src/mds/Makefile.am  |  10 +-
>>>    src/mds/mds_dt.h |   8 +-
>>>    src/mds/mds_dt_tipc.c    | 188 +---
>>>    src/mds/mds_tipc_fctrl_intf.cc   | 376 
>>> +++
>>>    src/mds/mds_tipc_fctrl_intf.h    |  47 +
>>>    src/mds/mds_tipc_fctrl_msg.cc    | 142 +++
>>>    src/mds/mds_tipc_fctrl_msg.h | 129 ++
>>>    src/mds/mds_tipc_fctrl_portid.cc | 261 +++
>>>    src/mds/mds_tipc_fctrl_portid.h  |  87 +
>>>    9 files changed, 1184 insertions(+), 64 deletions(-)
>>>    create mode 100644 src/mds/mds_tipc_fctrl_intf.cc
>>>    create mode 100644 src/mds/mds_tipc_fctrl_intf.h
>>>    create mode 100644 src/mds/mds_tipc_fctrl_msg.cc
>>>    create mode 100644 src/mds/mds_tipc_fctrl_msg.h
>>>    create mode 100644 src/mds/mds_tipc_fctrl_portid.cc
>>>    create mode 100644 src/mds/mds_tipc_fctrl_portid.h
>>>
>>> diff --git a/src/mds/Makefile.am b/src/mds/Makefile.am
>>> index 2d7b652..d849e8f 100644
>>> --- a/src/mds/Makefile.am
>>> +++ b/src/mds/Makefile.am
>>> @@ -48,10 +48,16 @@ lib_libopensaf_core_la_SOURCES += \
>>>    if ENABLE_TIPC_TRANSPORT
>>>    noinst_HEADERS += src/mds/mds_dt_tipc.h \
>>>    src/mds/mds_tipc_recvq_stats.h \
>>> -    src/mds/mds_tipc_recvq_stats_impl.h
>>> +    src/mds/mds_tipc_recvq_stats_impl.h \
>>> +    src/mds/mds_tipc_fctrl_intf.h \
>>> +    src/mds/mds_tipc_fctrl_portid.h \
>>> +    src/mds/mds_tipc_fctrl_msg.h
>>>    lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c \
>>>    src/mds/mds_tipc_recvq_stats.cc \
>>> -    src/mds/mds_tipc_recvq_stats_impl.cc
>>> +    src/mds/mds_tipc_recvq_stats_impl.cc \
>>> +    src/mds/mds_tipc_fctrl_intf.cc \
>>> +    src/mds/mds_tipc_fctrl_portid.cc \
>>> +    src/mds/mds_tipc_fctrl_msg.cc
>>>    endif
>>>       if ENABLE_TESTS
>>> diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
>>> index b645bb4..d9e8633 100644
>>> --- a/src/mds/mds_dt.h
>>> +++ b/src/mds/mds_dt.h
>>> @@ -162,7 +162,7 @@ uint32_t mdtm_del_from_ref_tbl(MDS_SUBTN_REF_VAL 
>>> ref);
>>>    uint32_t mds_tmr_mailbox_processing(void);
>>>    uint32_t mdtm_get_from_ref_tbl(MDS_SUBTN_REF_VAL ref, MDS_SVC_HDL 
>>> *svc_hdl);
>>>    uint32_t mdtm_add_frag_hdr(uint8_t *buf_ptr, uint16_t len, 
>>> uint32_t seq_num,
>>> -   uint16_t frag_byte);
>>> +   uint16_t frag_byte, uint16_t 
>>> fctrl_seq_num);
>>>    uint32_t mdtm_free_reassem_msg_mem(MDS_ENCODED_MSG *msg);
>>>    uint32_t mdtm_process_recv_data(uint8_t *buf, uint16_t len, 
>>> uint64_t tipc_id,
>>>    uint32_t *buff_dump);
>>> @@ -240,9 +240,13 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, 
>>> NCSCONTEXT msg);
>>>       #define MDS_PROT 0xA0
>>>    #define MDS_VERSION 0x08
>>> -#define MDS_PROT_VER_MASK (MDS_PROT | MDS_VERSION)
>>> +#define MDS_PROT_VER_MASK 0xFC
>>>    #define MDTM_PRI_MASK 0x3
>>>    +/* MDS protocol/version for flow control */
>>> +#de

Re: [devel] [PATCH 3/9] mds: Add implementation for TIPC buffer overflow solution [#1960]

2019-08-22 Thread Hans Nordebäck
Hi Minh,

It is a large patch, so I have to review it in parts. Below are my
comments, marked with [HansN], for files:

src/mds/Makefile.am
src/mds/mds_dt.h
src/mds/mds_dt_tipc.c

I'll continue with the rest of the files a bit later. /Thanks Hans

On 2019-08-14 08:38, Minh Chau wrote:
> This is a collaborative patch by two participants: Thuan, Minh.
>
> Main changes:
> - Add mds_tipc_fctrl_intf.h, mds_tipc_fctrl_intf.cc: These two files
> introduce new functions which are called in mds_dt_tipc.c if the flow
> control is enabled
> - Add mds_tipc_fctrl_portid.h, mds_tipc_fctrl_portid.cc: These files
> implement the tipc portid instance, which supports the sliding window,
> mds msg queue
> - Add mds_tipc_fctrl_msg.h, mds_tipc_fctrl_msg.cc: These files define
> the event and messages which are used for this solution.
> ---
>   src/mds/Makefile.am  |  10 +-
>   src/mds/mds_dt.h |   8 +-
>   src/mds/mds_dt_tipc.c| 188 +---
>   src/mds/mds_tipc_fctrl_intf.cc   | 376 
> +++
>   src/mds/mds_tipc_fctrl_intf.h|  47 +
>   src/mds/mds_tipc_fctrl_msg.cc| 142 +++
>   src/mds/mds_tipc_fctrl_msg.h | 129 ++
>   src/mds/mds_tipc_fctrl_portid.cc | 261 +++
>   src/mds/mds_tipc_fctrl_portid.h  |  87 +
>   9 files changed, 1184 insertions(+), 64 deletions(-)
>   create mode 100644 src/mds/mds_tipc_fctrl_intf.cc
>   create mode 100644 src/mds/mds_tipc_fctrl_intf.h
>   create mode 100644 src/mds/mds_tipc_fctrl_msg.cc
>   create mode 100644 src/mds/mds_tipc_fctrl_msg.h
>   create mode 100644 src/mds/mds_tipc_fctrl_portid.cc
>   create mode 100644 src/mds/mds_tipc_fctrl_portid.h
>
> diff --git a/src/mds/Makefile.am b/src/mds/Makefile.am
> index 2d7b652..d849e8f 100644
> --- a/src/mds/Makefile.am
> +++ b/src/mds/Makefile.am
> @@ -48,10 +48,16 @@ lib_libopensaf_core_la_SOURCES += \
>   if ENABLE_TIPC_TRANSPORT
>   noinst_HEADERS += src/mds/mds_dt_tipc.h \
>   src/mds/mds_tipc_recvq_stats.h \
> - src/mds/mds_tipc_recvq_stats_impl.h
> + src/mds/mds_tipc_recvq_stats_impl.h \
> + src/mds/mds_tipc_fctrl_intf.h \
> + src/mds/mds_tipc_fctrl_portid.h \
> + src/mds/mds_tipc_fctrl_msg.h
>   lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c \
>   src/mds/mds_tipc_recvq_stats.cc \
> - src/mds/mds_tipc_recvq_stats_impl.cc
> + src/mds/mds_tipc_recvq_stats_impl.cc \
> + src/mds/mds_tipc_fctrl_intf.cc \
> + src/mds/mds_tipc_fctrl_portid.cc \
> + src/mds/mds_tipc_fctrl_msg.cc
>   endif
>   
>   if ENABLE_TESTS
> diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
> index b645bb4..d9e8633 100644
> --- a/src/mds/mds_dt.h
> +++ b/src/mds/mds_dt.h
> @@ -162,7 +162,7 @@ uint32_t mdtm_del_from_ref_tbl(MDS_SUBTN_REF_VAL ref);
>   uint32_t mds_tmr_mailbox_processing(void);
>   uint32_t mdtm_get_from_ref_tbl(MDS_SUBTN_REF_VAL ref, MDS_SVC_HDL *svc_hdl);
>   uint32_t mdtm_add_frag_hdr(uint8_t *buf_ptr, uint16_t len, uint32_t seq_num,
> -   uint16_t frag_byte);
> +   uint16_t frag_byte, uint16_t fctrl_seq_num);
>   uint32_t mdtm_free_reassem_msg_mem(MDS_ENCODED_MSG *msg);
>   uint32_t mdtm_process_recv_data(uint8_t *buf, uint16_t len, uint64_t 
> tipc_id,
>   uint32_t *buff_dump);
> @@ -240,9 +240,13 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT 
> msg);
>   
>   #define MDS_PROT 0xA0
>   #define MDS_VERSION 0x08
> -#define MDS_PROT_VER_MASK (MDS_PROT | MDS_VERSION)
> +#define MDS_PROT_VER_MASK 0xFC
>   #define MDTM_PRI_MASK 0x3
>   
> +/* MDS protocol/version for flow control */
> +#define MDS_PROT_FCTRL (0xB0 | MDS_VERSION)
> +#define MDS_PROT_FCTRL_ID 0x00AC13F5
> +
>   /* Added for the subscription changes */
>   #define MDS_NCS_CHASSIS_ID (m_NCS_GET_NODE_ID & 0x00ff)
>   #define MDS_TIPC_COMMON_ID 0x01001000
> diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
> index 86b52bb..fef1c50 100644
> --- a/src/mds/mds_dt_tipc.c
> +++ b/src/mds/mds_dt_tipc.c
> @@ -47,6 +47,7 @@
>   #include "mds_dt_tipc.h"
>   #include "mds_dt_tcp_disc.h"
>   #include "mds_core.h"
> +#include "mds_tipc_fctrl_intf.h"
>   #include "mds_tipc_recvq_stats.h"
>   #include "base/osaf_utility.h"
>   #include "base/osaf_poll.h"
> @@ -165,20 +166,22 @@ NCS_PATRICIA_TREE mdtm_reassembly_list;
>   uint32_t mdtm_global_frag_num;
>   
>   const unsigned int MAX_RECV_THRESHOLD = 30;
> +uint8_t gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
>   
> -static bool get_tipc_port_id(int sock, uint32_t* port_id) {
> +static bool get_tipc_port_id(int sock, struct tipc_portid* port_id) {
>   struct sockaddr_tipc addr;
>   socklen_t sz = sizeof(addr);
>   
>   memset(&addr, 0, sizeof(addr));
> - *port_id = 0;
> + port_id->node = 0;
> + port_id->ref = 0;
>   if (0 > getsockname(sock, (struct sockaddr *)&addr, &sz)) {
>   syslog(LOG_ERR, "MDTM:TIPC Failed to get socket 

[devel] [PATCH 0/1] Review Request for util: Fenced should only write a log record when two active controllers are seen [#3073]

2019-08-22 Thread Hans Nordebäck
Summary: util: Fenced should only write a log record when two active
controllers are seen [#3073]
Review request for Ticket(s): 3073
Peer Reviewer(s): Gary, Duc
Pull request to: 
Affected branch(es): develop
Development branch: ticket-3073
Base revision: 729f71fbfff0eea6d4a6a394780142b87a9fb472
Personal repository: git://git.code.sf.net/u/hansnordeback/review


Impacted area           Impact y/n
----------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   y


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 81ec4d662ecdcf6b147e2376697ff423625463d4
Author: Hans Nordeback 
Date:   Thu, 22 Aug 2019 09:14:12 +0200

util: Fenced should only write a log record when two active controllers are seen [#3073]



Complete diffstat:
--
 tools/devel/fenced/node_state_hdlr_pl.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       n
powerpc     n       n
powerpc64   n       n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
for the in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch series
does not contain the patch that updates the Doxygen manual.





[devel] [PATCH 1/1] util: Fenced should only write a log record when two active controllers are seen [#3073]

2019-08-22 Thread Hans Nordebäck
---
 tools/devel/fenced/node_state_hdlr_pl.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/devel/fenced/node_state_hdlr_pl.cc 
b/tools/devel/fenced/node_state_hdlr_pl.cc
index c74fe72b9..6bf032e5a 100644
--- a/tools/devel/fenced/node_state_hdlr_pl.cc
+++ b/tools/devel/fenced/node_state_hdlr_pl.cc
@@ -169,8 +169,8 @@ void NodeStateHdlrPl::check_isolation() {
   isolated_ = NodeIsolationState::kNotIsolated;
   syslog(LOG_NOTICE, "one active controller detected");
 } else {
-  isolated_ = NodeIsolationState::kIsolated;
-  syslog(LOG_NOTICE, "%d active controllers detected, split brain", 
no_of_active);
+  isolated_ = NodeIsolationState::kNotIsolated;
+  syslog(LOG_NOTICE, "%d active controllers detected", no_of_active);
 }
   }
 notify:
-- 
2.17.1





Re: [devel] [PATCH 2/9] mds: Resolve c/c++ linking issue [#1960]

2019-08-20 Thread Hans Nordebäck
Hi Minh,

ack, code review only/Thanks HansN

On 2019-08-14 08:38, Minh Chau wrote:
> (Sending on behalf of Thuan)
> This patch solves the linking issue if mds_dt.h or mds_core.h
> is included in c++ sources.
> ---
>   src/mds/mds_core.h| 74 
> +++
>   src/mds/mds_dt.h  |  4 +--
>   src/mds/mds_dt2c.h| 67 --
>   src/mds/mds_dt_tcp.c  |  2 ++
>   src/mds/mds_dt_tcp.h  |  1 -
>   src/mds/mds_dt_tipc.c |  2 ++
>   6 files changed, 80 insertions(+), 70 deletions(-)
>
> diff --git a/src/mds/mds_core.h b/src/mds/mds_core.h
> index 37696d4..c09b428 100644
> --- a/src/mds/mds_core.h
> +++ b/src/mds/mds_core.h
> @@ -573,6 +573,80 @@ extern uint32_t 
> mds_mcm_free_msg_uba_start(MDS_ENCODED_MSG msg);
>   extern void get_adest_details(MDS_DEST adest, char *adest_details);
>   extern void get_subtn_adest_details(MDS_PWE_HDL pwe_hdl, MDS_SVC_ID svc_id,
>   MDS_DEST adest, char *adest_details);
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +/*  */
> +/*  */
> +/*MCM to MDTM   */
> +/*  */
> +/*  */
> +
> +/* Initialization of MDTM Module */
> +uint32_t (*mds_mdtm_init)(NODE_ID node_id, uint32_t *mds_tipc_ref);
> +
> +/* Destroying the MDTM Module*/
> +uint32_t (*mds_mdtm_destroy)(void);
> +
> +uint32_t (*mds_mdtm_send)(MDTM_SEND_REQ *req);
> +
> +/* SVC Install */
> +uint32_t (*mds_mdtm_svc_install)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id,
> + NCSMDS_SCOPE_TYPE install_scope,
> + V_DEST_RL role, MDS_VDEST_ID vdest_id,
> + NCS_VDEST_TYPE vdest_policy,
> + MDS_SVC_PVT_SUB_PART_VER mds_svc_pvt_ver);
> +
> +/* SVC Uninstall */
> +uint32_t (*mds_mdtm_svc_uninstall)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id,
> +   NCSMDS_SCOPE_TYPE install_scope,
> +   V_DEST_RL role, MDS_VDEST_ID vdest_id,
> +   NCS_VDEST_TYPE vdest_policy,
> +   MDS_SVC_PVT_SUB_PART_VER mds_svc_pvt_ver);
> +
> +/* SVC Subscribe */
> +uint32_t (*mds_mdtm_svc_subscribe)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id,
> +   NCSMDS_SCOPE_TYPE subscribe_scope,
> +   MDS_SVC_HDL local_svc_hdl,
> +   MDS_SUBTN_REF_VAL *subtn_ref_val);
> +
> +/*  added svc_hdl */
> +/* SVC Unsubscribe */
> +uint32_t (*mds_mdtm_svc_unsubscribe)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id,
> + NCSMDS_SCOPE_TYPE subscribe_scope,
> + MDS_SUBTN_REF_VAL subtn_ref_val);
> +
> +/* VDEST Install */
> +uint32_t (*mds_mdtm_vdest_install)(MDS_VDEST_ID vdest_id);
> +
> +/* VDEST Uninstall */
> +uint32_t (*mds_mdtm_vdest_uninstall)(MDS_VDEST_ID vdest_id);
> +
> +/* VDEST Subscribe */
> +uint32_t (*mds_mdtm_vdest_subscribe)(MDS_VDEST_ID vdest_id,
> + MDS_SUBTN_REF_VAL *subtn_ref_val);
> +
> +/* VDEST Unsubscribe */
> +uint32_t (*mds_mdtm_vdest_unsubscribe)(MDS_VDEST_ID vdest_id,
> +   MDS_SUBTN_REF_VAL subtn_ref_val);
> +
> +/* Tx Register (For incrementing the use count) */
> +uint32_t (*mds_mdtm_tx_hdl_register)(MDS_DEST adest);
> +
> +/* Tx Unregister (For decrementing the use count) */
> +uint32_t (*mds_mdtm_tx_hdl_unregister)(MDS_DEST adest);
> +
> +/* Node subscription */
> +uint32_t (*mds_mdtm_node_subscribe)(MDS_SVC_HDL svc_hdl,
> +MDS_SUBTN_REF_VAL *subtn_ref_val);
> +
> +/* Node unsubscription */
> +uint32_t (*mds_mdtm_node_unsubscribe)(MDS_SUBTN_REF_VAL subtn_ref_val);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
>   /*  */
>   /*  */
>   /* MMGR Macros  */
> diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
> index a6e2801..b645bb4 100644
> --- a/src/mds/mds_dt.h
> +++ b/src/mds/mds_dt.h
> @@ -214,10 +214,10 @@ typedef struct mdtm_ref_hdl_list {
> MDS_SVC_HDL svc_hdl;
>   } MDTM_REF_HDL_LIST;
>   
> -MDTM_REF_HDL_LIST *mdtm_ref_hdl_list_hdr;
> +extern MDTM_REF_HDL_LIST *mdtm_ref_hdl_list_hdr;
> +extern NCS_PATRICIA_TREE mdtm_reassembly_list;
>   uint32_t mdtm_attach_mbx(SYSF_MBX mbx);
>   void mds_buff_dump(uint8_t *buff, uint32_t len, uint32_t max);
> -NCS_PATRICIA_TREE mdtm_reassembly_list;
>   
>   uint32_t mdtm_set_transport(MDTM_TX_TYPE transport);
>   bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT msg);
> diff --git a/src/mds/mds_dt2c.h b/src/mds/mds_dt2c.h
> index 012999c..c92fecb 100644
> --- 

Re: [devel] [PATCH 1/9] mds: Add README for solution of TIPC buffer overflow at MDS [#1960]

2019-08-14 Thread Hans Nordebäck
Hi Minh,

ack, some minor comments below/Thanks Hans

On 2019-08-14 08:38, Minh Chau wrote:
> ---
>   src/mds/README | 221 
> +
>   1 file changed, 221 insertions(+)
>   create mode 100644 src/mds/README
>
> diff --git a/src/mds/README b/src/mds/README
> new file mode 100644
> index 000..1b94632
> --- /dev/null
> +++ b/src/mds/README
> @@ -0,0 +1,221 @@
> +/*  -*- OpenSAF  -*-
> + *
> + * (C) Copyright 2019 The OpenSAF Foundation
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
> + * under the GNU Lesser General Public License Version 2.1, February 1999.
> + * The complete license can be accessed from the following location:
> + * http://opensource.org/licenses/lgpl-license.php
> + * See the Copying file included with the OpenSAF distribution for full
> + * licensing terms.
> + *
> + * Author(s): Ericsson AB
> + *
> + */
> +Background
> +==
> +If OpenSAF configures TIPC as transport, the MDS library today will use
> +TIPC SOCK_RDM socket for message distribution in the cluster. The SOCK_RDM
> +datagram socket possibly encounters buffer overflow at receiver ends which
> +has been documented in tipc.io[1]. A temporary solution for this buffer
> +overflow issue is that the socket buffer size can be increased to a larger
> +number. However, if the cluster continues either scaling out or adding more
> +components, the system will be under dimensioned, thus the TIPC buffer
> +overflow can occur again.
> +
> +MDS's solution for TIPC buffer overflow
> +===
> +If MDS disables TIPC_DEST_DROPPABLE, TIPC will return the ancillary message
> +when the original message fails to be delivered. On this event, if the message
> +has been saved in a queue, MDS at the sender side can look it up and retransmit
> +it to the receiver.
> +Once the messages in the sender's queue have been delivered successfully, MDS
> +needs to remove them. MDS introduces an internal ACK message as an
> +acknowledgment from receivers so that the senders can remove the messages
> +from the queue.
> +Also, when the receiver's buffer is overflowing, retransmission may not
> +succeed and may even make things worse at the receiver end (the more
> +retransmissions, the more overflow). MDS imitates the sliding window in TCP[2]
> +to control the flow of data messages towards the receivers.
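
The mechanism described in this section can be illustrated with a generic sliding-window sender. This is a sketch of the general technique only, not the implementation in mds_tipc_fctrl_portid.cc: the class and callback names are hypothetical, ACKs are assumed to be cumulative, and sequence-number wraparound (which a fixed-width on-wire field would require) is ignored.

```python
from collections import OrderedDict


class WindowedSender(object):
    """Generic sliding-window sender with retransmission on rejection."""

    def __init__(self, transmit, window_size):
        self._transmit = transmit      # callable(seq, payload)
        self._window = window_size     # max messages in flight
        self._next_seq = 0
        self._unacked = OrderedDict()  # seq -> payload, sent but not ACKed
        self._backlog = []             # payloads waiting for window space

    def send(self, payload):
        self._backlog.append(payload)
        self._pump()

    def on_ack(self, acked_seq):
        """Cumulative ACK: everything up to and including acked_seq arrived."""
        for seq in [s for s in self._unacked if s <= acked_seq]:
            del self._unacked[seq]
        self._pump()

    def on_rejected(self, seq):
        """The transport returned message seq as undelivered: resend it."""
        if seq in self._unacked:
            self._transmit(seq, self._unacked[seq])

    def _pump(self):
        # Transmit queued payloads while the window has room.
        while self._backlog and len(self._unacked) < self._window:
            payload = self._backlog.pop(0)
            seq = self._next_seq
            self._next_seq += 1
            self._unacked[seq] = payload
            self._transmit(seq, payload)
```

Holding messages in the backlog once the window is full is what keeps the sender from piling more data onto a receiver that is already overflowing.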
> +
> +Legacy MDS data message, new (data + ACK) MDS message, and upgradability
> +
> +Below is the MDS legacy message format that has been used until OpenSAF 5.19.07
> +
> +oct 0  message length
> +oct 1
> +--
> +oct 2  sequence number: incremented for every message sent out to all destined
> +...    tipc portid.
> +oct 5
> +--
> +oct 6  fragment number: a message with same sequence number can be fragmented,
> +oct 7  identified by this fragment number.
> +--
> +oct 8  length check: cross check with message length(oct0,1), NOT USED.
> +oct 9
> +--
> +oct 10 protocol version: (MDS_PROT:0xA0 | MDS_VERSION:0x08) = 0xA8, NOT USED
> +--
> +oct 11 mds length: length of mds header and mds data, starting from oct13
> +oct 12
> +--
> +oct 13 mds header and data
> +...
> +--
> +
> +The current sequence number/fragment number are used in MDS for all
> +messages sent to all discovered tipc portid(s), meaning that every time a
> +message is sent to any tipc portid, the sequence/fragment number is increased.
> +The flow control needs its own sequence number sliding between two tipc
> +portid(s) so that receivers can detect message drops due to buffer overflow.
> +Therefore, oct8 and oct9 are now reused as the flow control sequence number.
> +Oct10, the protocol version, has the new value of 0xB8. The format of the new
> +data message is as below:
> +
> +oct 0  same
> +...
> +oct 7
> +--
> +oct 8  flow control sequence number
> +oct 9
> +--
> +oct 10 protocol version: (MDS_PROT_TIPC_FCTRL:0xB0 | MDS_VERSION:0x08) = 0xB8
> +--
> +oct 11 same
> +...
> +--
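
As a worked example of the layout above, the 13 header octets of the new data message can be packed and parsed as follows. This is a sketch under stated assumptions: big-endian (network) byte order is assumed because the README does not state it, and the helper names are hypothetical; only the octet positions and the 0xB0/0x08 values come from the text.

```python
import struct

MDS_PROT_TIPC_FCTRL = 0xB0
MDS_VERSION = 0x08

# oct 0-1 message length, oct 2-5 sequence number, oct 6-7 fragment number,
# oct 8-9 flow control sequence number, oct 10 protocol version,
# oct 11-12 mds length. Byte order is an assumption.
_HEADER = struct.Struct(">HIHHBH")


def pack_header(msg_len, seq_num, frag_num, fctrl_seq_num, mds_len):
    """Build the 13-octet header of a flow-controlled data message."""
    prot_ver = MDS_PROT_TIPC_FCTRL | MDS_VERSION  # 0xB8
    return _HEADER.pack(msg_len, seq_num, frag_num, fctrl_seq_num,
                        prot_ver, mds_len)


def unpack_header(buf):
    """Return the header fields of buf as a dict."""
    names = ("msg_len", "seq_num", "frag_num", "fctrl_seq_num",
             "prot_ver", "mds_len")
    return dict(zip(names, _HEADER.unpack_from(buf)))


def uses_flow_control(buf):
    """True if oct 10 carries the new protocol version value."""
    return bytearray(buf)[10] == (MDS_PROT_TIPC_FCTRL | MDS_VERSION)
```

Because a legacy message carries 0xA8 in oct 10, a receiver can tell the two formats apart from that single octet before interpreting oct 8 and oct 9.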
> +
> +The ACK message is introduced to acknowledge one data message or a chunk of
> +accumulative data message. The ACK message format:
> +
> +oct 0  message length
> +oct 1
> +--
> +oct 2  8 bytes, NOT USED
> +
> +oct 9
> +--
> +oct 10 

Re: [devel] [PATCH 1/1] amfd: add support for dynamically changing saAmfRank of SaAmfSIRankedSU [#3058]

2019-08-05 Thread Hans Nordebäck
Hi Alex, ack code review only, a few minor comments below/Thanks HansN

On 2019-07-18 21:04, Jones, Alex wrote:
Allow saAmfRank of SaAmfSIRankedSU to be changed at runtime
---
src/amf/amfd/si.cc | 103 +
src/amf/amfd/si.h | 3 ++
src/amf/amfd/siass.cc | 38 ++
src/amf/amfd/sirankedsu.cc | 73 +-
src/amf/amfd/util.cc | 30 ++-
5 files changed, 243 insertions(+), 4 deletions(-)

diff --git a/src/amf/amfd/si.cc b/src/amf/amfd/si.cc
index b308e14a9..3f0a8bf51 100644
--- a/src/amf/amfd/si.cc
+++ b/src/amf/amfd/si.cc
@@ -255,6 +255,109 @@ void AVD_SI::remove_rankedsu(const std::string &suname) {
TRACE_LEAVE();
}

+/**
+ * @brief Update order of sisu list with new rank.
+ *
+ * @param suname
+ * @param saAmfRank
+ */
+void AVD_SI::update_sisu_rank(const std::string& suname, uint32_t newRank) {
+ TRACE_ENTER();
+
+ do {
+ // if there is only one entry nothing really to do
+ if (!list_of_sisu || !list_of_sisu->si_next)
+ break;
+
+ // first find the su, and remove it from the linked list
[HansN] AVD_SU_SI_REL *matched_susi{nullptr};  (also some more places below)
+ AVD_SU_SI_REL *matched_susi(0);
+
+ for (AVD_SU_SI_REL *susi(list_of_sisu), *prev(0);
[HansN] perhaps, for (AVD_SU_SI_REL *susi = list_of_sisu, *prev = nullptr; or 
for (AVD_SU_SI_REL *susi{list_of_sisu}, *prev{nullptr}  ?
+ susi;
+ prev = susi, susi = susi->si_next) {
+ if (suname == susi->su->name) {
+ matched_susi = susi;
+ if (prev)
+ prev->si_next = susi->si_next;
+ else
+ list_of_sisu = susi->si_next;
+ break;
+ }
+ }
+
+ osafassert(matched_susi);
+
+ // now reinsert it at the correct place
+ AVD_SU_SI_REL *prev(nullptr);
+
+ for (AVD_SU_SI_REL *curr_susi(list_of_sisu);
+ curr_susi;
+ prev = curr_susi, curr_susi = curr_susi->si_next) {
+ if (curr_susi->is_per_si == true) {
+ if (false == matched_susi->is_per_si) continue;
+
+ AVD_SUS_PER_SI_RANK *i_su_rank_rec(0);
+
+ /* determine the su_rank rec for this rec */
+ for (const auto &value : *sirankedsu_db) {
+ i_su_rank_rec = value.second;
+ if (i_su_rank_rec->indx.si_name.compare(name) != 0) continue;
+ AVD_SU *curr_su(su_db->find(i_su_rank_rec->su_name));
+ if (curr_su == curr_susi->su) break;
+ }
+
+ osafassert(i_su_rank_rec);
+
+ if (newRank <= i_su_rank_rec->indx.su_rank) break;
+ } else {
+ if (true == matched_susi->is_per_si) break;
+
+ if (newRank <= curr_susi->su->saAmfSURank) break;
+ }
+ }
+
+ if (prev) {
+ matched_susi->si_next = prev->si_next;
+ prev->si_next = matched_susi;
+ } else {
+ matched_susi->si_next = list_of_sisu;
+ list_of_sisu = matched_susi;
+ }
+
+ // update PG rank
+ for (AVD_SU_SI_REL *curr_susi(matched_susi->si->list_of_sisu);
+ curr_susi;
+ curr_susi = curr_susi->si_next) {
+ if (curr_susi->state == SA_AMF_HA_STANDBY)
+ avd_pg_susi_chg_prc(avd_cb, curr_susi);
+ }
+ } while (false);
+
+ TRACE_LEAVE();
+}
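
The core of update_sisu_rank() is an unlink-and-reinsert on a singly linked list kept in rank order. A simplified sketch of just that step, with the rank stored on the node itself (the real code looks it up via the SU or sirankedsu_db and also handles is_per_si); the Node type and function names are hypothetical.

```python
class Node(object):
    """Hypothetical stand-in for one list entry."""

    def __init__(self, name, rank):
        self.name = name
        self.rank = rank
        self.next = None


def move_to_rank(head, name, new_rank):
    """Unlink node `name`, give it new_rank, reinsert in order.

    The node is placed before the first entry whose rank is >= new_rank,
    mirroring the `newRank <= rank` break condition. Returns the new head.
    """
    prev, node = None, head
    while node is not None and node.name != name:
        prev, node = node, node.next
    if node is None:
        raise KeyError(name)
    if prev is None:
        head = node.next
    else:
        prev.next = node.next

    node.rank = new_rank
    prev, curr = None, head
    while curr is not None and new_rank > curr.rank:
        prev, curr = curr, curr.next
    if prev is None:
        node.next = head
        head = node
    else:
        node.next = prev.next
        prev.next = node
    return head


def names(head):
    """List node names from head to tail."""
    out = []
    while head is not None:
        out.append(head.name)
        head = head.next
    return out
```

Returning the head instead of mutating a member keeps the sketch free of class state; the patch updates list_of_sisu in place.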
+
+uint32_t AVD_SI::get_sisu_rank(const std::string& suname) const {
+ uint32_t rank(0);
+
+ TRACE_ENTER2("%s", suname.c_str());
+
+ for (const AVD_SU_SI_REL *susi(list_of_sisu); susi; susi = susi->si_next) {
+ TRACE("su: %s si: %s state: %i",
+ susi->su->name.c_str(),
+ susi->si->name.c_str(),
+ susi->state);
+ if (susi->state == SA_AMF_HA_STANDBY)
+ rank++;
+
+ if (suname == susi->su->name)
+ break;
+ }
+
+ TRACE_LEAVE();
+
+ return rank;
+}
+
void AVD_SI::remove_csi(AVD_CSI *csi) {
osafassert(csi->si == this);
/* remove CSI from the SI */
diff --git a/src/amf/amfd/si.h b/src/amf/amfd/si.h
index 3b93e56b1..0db8dde13 100644
--- a/src/amf/amfd/si.h
+++ b/src/amf/amfd/si.h
@@ -128,6 +128,9 @@ class AVD_SI {

void add_rankedsu(const std::string , uint32_t saAmfRank);
void remove_rankedsu(const std::string );
+ void update_sisu_rank(const std::string& suname, uint32_t saAmfRank);
+
+ uint32_t get_sisu_rank(const std::string& suname) const;

void set_si_switch(AVD_CL_CB *cb, const SaToggleState state);

diff --git a/src/amf/amfd/siass.cc b/src/amf/amfd/siass.cc
index 8a2d2175e..8895af3b4 100644
--- a/src/amf/amfd/siass.cc
+++ b/src/amf/amfd/siass.cc
@@ -616,6 +616,21 @@ done:
avd_gen_su_ha_state_changed_ntf(cb, su_si);
}

+ /*
+ * If we are adding an entry which is not the last we need to send PG updates
+ * for rank.
+ */
+ if (su_si->si_next) {
+ for (AVD_SU_SI_REL *curr_susi(si->list_of_sisu);
+ curr_susi;
+ curr_susi = curr_susi->si_next) {
+ // skip this one since a pg update will be sent for it already
+ if (su_si == curr_susi || curr_susi->state != SA_AMF_HA_STANDBY) continue;
+
+ avd_pg_susi_chg_prc(avd_cb, curr_susi);
+ }
+ }
+
TRACE_LEAVE();
return su_si;
}
@@ -742,13 +757,36 @@ uint32_t avd_susi_delete(AVD_CL_CB *cb, AVD_SU_SI_REL 
*susi, bool ckpt) {
susi->su_next = nullptr;
}

+ bool sendPgUpdate(false);
+
/* now delete it from the SI list */
if (p_si_su == nullptr) {
susi->si->list_of_sisu = susi->si_next;
susi->si_next = nullptr;
+
+ if (susi->si->list_of_sisu)
+ sendPgUpdate = true;
} else {
p_si_su->si_next = susi->si_next;
susi->si_next = nullptr;
+
+ if (p_si_su->si_next)
+ sendPgUpdate = true;
+ 
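
The remove-then-reinsert pattern used by update_sisu_rank() above can be
sketched in isolation. This is only a minimal sketch, assuming a simplified
node type: Node, reposition() and the plain rank field are hypothetical names
for illustration, not OpenSAF types, and the per-SI ranked-SU lookup is left
out. It uses brace initialization with nullptr, as suggested in the review
comments.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for AVD_SU_SI_REL: a singly linked node with a rank.
struct Node {
  std::string name;
  uint32_t rank;
  Node *next;
};

// Unlink the node called `name`, give it `new_rank`, and reinsert it so the
// list stays sorted by ascending rank. Returns the (possibly new) head.
Node *reposition(Node *head, const std::string &name, uint32_t new_rank) {
  // With zero or one entries there is nothing to reorder.
  if (!head || !head->next) {
    if (head && head->name == name) head->rank = new_rank;
    return head;
  }

  // Step 1: find the node and unlink it.
  Node *matched{nullptr};
  for (Node *cur{head}, *prev{nullptr}; cur; prev = cur, cur = cur->next) {
    if (cur->name == name) {
      matched = cur;
      if (prev)
        prev->next = cur->next;
      else
        head = cur->next;
      break;
    }
  }
  assert(matched);  // the caller is expected to pass a name that exists
  matched->rank = new_rank;

  // Step 2: walk to the first node whose rank is >= new_rank.
  Node *prev{nullptr};
  for (Node *cur{head}; cur; prev = cur, cur = cur->next) {
    if (new_rank <= cur->rank) break;
  }

  // Step 3: splice the node in after `prev`, or at the head.
  if (prev) {
    matched->next = prev->next;
    prev->next = matched;
  } else {
    matched->next = head;
    head = matched;
  }
  return head;
}
```

Returning the head instead of mutating a member keeps the sketch free of any
class state; the patch itself updates list_of_sisu in place.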

Re: [devel] [PATCH 1/1] amfd: prevent infinite loop [#3050]

2019-06-20 Thread Hans Nordebäck
Hi Gary,

ack, code review only/Thanks HansN

On 2019-06-20 04:13, Gary Lee wrote:
> In handle_event_in_failover_state(), we iterate through
> queue_evt in a while loop, but process_event() can insert
> items into the queue inside the loop, and we may end
> up never exiting the while loop.
> ---
>   src/amf/amfd/main.cc | 10 --
>   1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/src/amf/amfd/main.cc b/src/amf/amfd/main.cc
> index 50daa59..e3d0957 100644
> --- a/src/amf/amfd/main.cc
> +++ b/src/amf/amfd/main.cc
> @@ -406,12 +406,18 @@ static void handle_event_in_failover_state(AVD_EVT 
> *evt) {
>   
>   /* Dequeue, all the messages from the queue
>  and process them now */
> -
> -while (!cb->evt_queue.empty()) {
> +auto size_before_loop = cb->evt_queue.size();
> +std::queue<AVD_EVT_QUEUE *>::size_type count = 0;
> +while (count < size_before_loop) {
> +  // note: process_event() may insert items into
> +  // the queue, so terminate loop when we have
> +  // processed all the original elements
> +  // to avoid infinite loop
> AVD_EVT_QUEUE *queue_evt = cb->evt_queue.front();
> cb->evt_queue.pop();
> process_event(cb, queue_evt->evt);
> delete queue_evt;
> +  ++count;
>   }
>   
>   /* Walk through all the nodes to check if any of the nodes state is
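
The loop bound introduced by this patch can be sketched in isolation. This is
a minimal sketch, assuming a plain std::queue<int> and a caller-supplied
handler in place of cb->evt_queue and process_event(); drain_snapshot() and
the handler signature are hypothetical names, and the handler is assumed to
only push items, never pop them.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>

// Process only the elements that were queued when the call started.
// `handler` may push new items onto `q`; those are left for a later pass,
// so the loop always terminates. Returns the number of items processed.
std::size_t drain_snapshot(
    std::queue<int> &q,
    const std::function<void(std::queue<int> &, int)> &handler) {
  const std::queue<int>::size_type size_before_loop = q.size();
  std::queue<int>::size_type count = 0;
  while (count < size_before_loop) {
    assert(!q.empty());  // holds as long as the handler never pops
    int item = q.front();
    q.pop();
    handler(q, item);  // may enqueue follow-up work
    ++count;
  }
  return count;
}
```

A loop written as while (!q.empty()) would never finish if every handled item
enqueued another one; counting against the size taken before the loop avoids
that, at the cost of leaving the newly queued items for the next call.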



[devel] [PATCH 0/1] Review Request for utils: Use a fence daemon as an alternative to payload reboot fencing [#3048]

2019-06-05 Thread Hans Nordebäck
Summary: utils: Use a fence daemon as an alternative to payload reboot fencing 
[#3048]
Review request for Ticket(s): 3048
Peer Reviewer(s): Duc, Gary, Anders
Pull request to: 
Affected branch(es): develop
Development branch: ticket-3048
Base revision: 3895c7a88bdb3c6f86da1083ea0fd9e2cd642d01
Personal repository: git://git.code.sf.net/u/hansnordeback/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   y

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-


revision 810b5dc5ce2f1e8830c157b09f2649e47a8ea070
Author: Hans Nordeback 
Date:   Wed, 5 Jun 2019 10:22:30 +0200

utils: Use a fence daemon as an alternative to payload reboot fencing [#3048]



Added Files:

 src/fm/fmd/tipc_server.cc
 src/fm/fmd/tipc_server.h
 tools/devel/fenced/command.cc
 tools/devel/fenced/command.h
 tools/devel/fenced/cpp_macros.h
 tools/devel/fenced/fenced.conf
 tools/devel/fenced/fenced_main.cc
 tools/devel/fenced/Makefile
 tools/devel/fenced/node_state_file.cc
 tools/devel/fenced/node_state_file.h
 tools/devel/fenced/node_state_hdlr.cc
 tools/devel/fenced/node_state_hdlr_factory.cc
 tools/devel/fenced/node_state_hdlr_factory.h
 tools/devel/fenced/node_state_hdlr.h
 tools/devel/fenced/node_state_hdlr_pl.cc
 tools/devel/fenced/node_state_hdlr_pl.h
 tools/devel/fenced/node_state_hdlr_sc.cc
 tools/devel/fenced/node_state_hdlr_sc.h
 tools/devel/fenced/osaffenced.service
 tools/devel/fenced/README_TOOLS
 tools/devel/fenced/service.cc
 tools/devel/fenced/service.h
 tools/devel/fenced/timer.cc
 tools/devel/fenced/timer.h
 tools/devel/fenced/watchdog.cc
 tools/devel/fenced/watchdog.h


Complete diffstat:
--
 src/fm/Makefile.am|   6 +-
 src/fm/fmd/fm_amf.cc  |  14 ++
 src/fm/fmd/tipc_server.cc |  93 
 src/fm/fmd/tipc_server.h  |  45 
 tools/devel/fenced/Makefile   |  63 ++
 tools/devel/fenced/README_TOOLS   |  15 ++
 tools/devel/fenced/command.cc | 134 
 tools/devel/fenced/command.h  |  43 
 tools/devel/fenced/cpp_macros.h   |  33 +++
 tools/devel/fenced/fenced.conf|  17 ++
 tools/devel/fenced/fenced_main.cc | 179 
 tools/devel/fenced/node_state_file.cc |  87 
 tools/devel/fenced/node_state_file.h  |  41 
 tools/devel/fenced/node_state_hdlr.cc |  54 +
 tools/devel/fenced/node_state_hdlr.h  |  45 
 tools/devel/fenced/node_state_hdlr_factory.cc |  66 ++
 tools/devel/fenced/node_state_hdlr_factory.h  |  35 +++
 tools/devel/fenced/node_state_hdlr_pl.cc  | 292 ++
 tools/devel/fenced/node_state_hdlr_pl.h   |  60 ++
 tools/devel/fenced/node_state_hdlr_sc.cc  |  42 
 tools/devel/fenced/node_state_hdlr_sc.h   |  41 
 tools/devel/fenced/osaffenced.service |  14 ++
 tools/devel/fenced/service.cc |  53 +
 tools/devel/fenced/service.h  |  42 
 tools/devel/fenced/timer.cc   |  62 ++
 tools/devel/fenced/timer.h|  53 +
 tools/devel/fenced/watchdog.cc|  37 
 tools/devel/fenced/watchdog.h |  39 
 28 files changed, 1703 insertions(+), 2 deletions(-)


Testing Commands:
-
In tools/devel/fenced, run 'make; make install' to build and install fenced on
a payload.
Enable 'headless' mode by setting export IMMSV_SC_ABSENCE_ALLOWED=900 in
immd.conf.
osaffenced on the payload stops opensaf if the node becomes 'isolated',
i.e. there is no active SC.
osaffenced starts opensaf again when the node is 'not isolated', i.e. an
active SC is present.
A file is also written to expose the node's isolated/not isolated state.

Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
---
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need 

[devel] [PATCH 1/1] utils: Use a fence daemon as an alternative to payload reboot fencing [#3048]

2019-06-05 Thread Hans Nordebäck
---
 src/fm/Makefile.am|   6 +-
 src/fm/fmd/fm_amf.cc  |  14 +
 src/fm/fmd/tipc_server.cc |  93 ++
 src/fm/fmd/tipc_server.h  |  45 +++
 tools/devel/fenced/Makefile   |  63 
 tools/devel/fenced/README_TOOLS   |  15 +
 tools/devel/fenced/command.cc | 134 
 tools/devel/fenced/command.h  |  43 +++
 tools/devel/fenced/cpp_macros.h   |  33 ++
 tools/devel/fenced/fenced.conf|  17 +
 tools/devel/fenced/fenced_main.cc | 179 +++
 tools/devel/fenced/node_state_file.cc |  87 ++
 tools/devel/fenced/node_state_file.h  |  41 +++
 tools/devel/fenced/node_state_hdlr.cc |  54 
 tools/devel/fenced/node_state_hdlr.h  |  45 +++
 tools/devel/fenced/node_state_hdlr_factory.cc |  66 
 tools/devel/fenced/node_state_hdlr_factory.h  |  35 +++
 tools/devel/fenced/node_state_hdlr_pl.cc  | 292 ++
 tools/devel/fenced/node_state_hdlr_pl.h   |  60 
 tools/devel/fenced/node_state_hdlr_sc.cc  |  42 +++
 tools/devel/fenced/node_state_hdlr_sc.h   |  41 +++
 tools/devel/fenced/osaffenced.service |  14 +
 tools/devel/fenced/service.cc |  53 
 tools/devel/fenced/service.h  |  42 +++
 tools/devel/fenced/timer.cc   |  62 
 tools/devel/fenced/timer.h|  53 
 tools/devel/fenced/watchdog.cc|  37 +++
 tools/devel/fenced/watchdog.h |  39 +++
 28 files changed, 1703 insertions(+), 2 deletions(-)
 create mode 100644 src/fm/fmd/tipc_server.cc
 create mode 100644 src/fm/fmd/tipc_server.h
 create mode 100755 tools/devel/fenced/Makefile
 create mode 100644 tools/devel/fenced/README_TOOLS
 create mode 100644 tools/devel/fenced/command.cc
 create mode 100644 tools/devel/fenced/command.h
 create mode 100644 tools/devel/fenced/cpp_macros.h
 create mode 100644 tools/devel/fenced/fenced.conf
 create mode 100644 tools/devel/fenced/fenced_main.cc
 create mode 100644 tools/devel/fenced/node_state_file.cc
 create mode 100644 tools/devel/fenced/node_state_file.h
 create mode 100644 tools/devel/fenced/node_state_hdlr.cc
 create mode 100644 tools/devel/fenced/node_state_hdlr.h
 create mode 100644 tools/devel/fenced/node_state_hdlr_factory.cc
 create mode 100644 tools/devel/fenced/node_state_hdlr_factory.h
 create mode 100644 tools/devel/fenced/node_state_hdlr_pl.cc
 create mode 100644 tools/devel/fenced/node_state_hdlr_pl.h
 create mode 100644 tools/devel/fenced/node_state_hdlr_sc.cc
 create mode 100644 tools/devel/fenced/node_state_hdlr_sc.h
 create mode 100644 tools/devel/fenced/osaffenced.service
 create mode 100644 tools/devel/fenced/service.cc
 create mode 100644 tools/devel/fenced/service.h
 create mode 100644 tools/devel/fenced/timer.cc
 create mode 100644 tools/devel/fenced/timer.h
 create mode 100644 tools/devel/fenced/watchdog.cc
 create mode 100644 tools/devel/fenced/watchdog.h

diff --git a/src/fm/Makefile.am b/src/fm/Makefile.am
index 0f254b94f..325847ae9 100644
--- a/src/fm/Makefile.am
+++ b/src/fm/Makefile.am
@@ -20,7 +20,8 @@ noinst_HEADERS += \
src/fm/fmd/fm_cb.h \
src/fm/fmd/fm_evt.h \
src/fm/fmd/fm_mds.h \
-   src/fm/fmd/fm_mem.h
+   src/fm/fmd/fm_mem.h \
+   src/fm/fmd/tipc_server.h
 
 osaf_execbin_PROGRAMS += bin/osaffmd
 nodist_pkgclccli_SCRIPTS += \
@@ -44,7 +45,8 @@ bin_osaffmd_SOURCES = \
src/fm/fmd/fm_amf.cc \
src/fm/fmd/fm_main.cc \
src/fm/fmd/fm_mds.cc \
-   src/fm/fmd/fm_rda.cc
+   src/fm/fmd/fm_rda.cc \
+   src/fm/fmd/tipc_server.cc
 
 bin_osaffmd_LDADD = \
lib/libSaAmf.la \
diff --git a/src/fm/fmd/fm_amf.cc b/src/fm/fmd/fm_amf.cc
index e99f3ba7e..8cf284f97 100644
--- a/src/fm/fmd/fm_amf.cc
+++ b/src/fm/fmd/fm_amf.cc
@@ -34,6 +34,12 @@
 **/
 
 #include "fm.h"
+#include "tipc_server.h"
+
+namespace {
+TIPCServer tipc_srv;
+}
+
 extern uint32_t gl_fm_hdl;
 
 uint32_t fm_amf_init(FM_AMF_CB *fm_amf_cb);
@@ -151,6 +157,11 @@ void fm_saf_CSI_set_callback(SaInvocationT invocation, 
const SaNameT *compName,
 } else {
   fm_cb->amf_state = new_haState;
   fm_cb->csi_assigned = true;
+  if (new_haState == SA_AMF_HA_ACTIVE) {
+tipc_srv.publish();
+  } else {
+tipc_srv.unpublish();
+  }
 }
 error = saAmfResponse(fm_amf_cb->amf_hdl, invocation, error);
   }
@@ -300,6 +311,9 @@ uint32_t fm_amf_init(FM_AMF_CB *fm_amf_cb) {
   SaNameT sname;
   uint32_t rc = NCSCC_RC_SUCCESS;
   TRACE_ENTER();
+
+  tipc_srv.init();
+
   memset(, 0, sizeof(SaAmfCallbacksT));
   if (fm_amf_cb->nid_started &&
   amf_comp_name_get_set_from_file("FM_COMP_NAME_FILE", ) !=
diff --git a/src/fm/fmd/tipc_server.cc b/src/fm/fmd/tipc_server.cc
new file mode 100644
index 

Re: [devel] [PATCH 1/1] rded: improve self-fencing response time [#3039]

2019-05-27 Thread Hans Nordebäck
Hi Gary,

ack, review only/Thanks HansN

On 5/27/19 2:09 AM, Gary Lee wrote:
> When connectivity to consensus service is lost, it is recorded
> in a state variable. When all RDE peers are lost, the node will
> now self-fence immediately.
> ---
>   src/rde/rded/rde_cb.h|  5 +
>   src/rde/rded/rde_main.cc | 18 --
>   src/rde/rded/role.cc | 24 
>   src/rde/rded/role.h  |  3 +++
>   4 files changed, 48 insertions(+), 2 deletions(-)
>
> diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
> index 9a0919c..e35fdab 100644
> --- a/src/rde/rded/rde_cb.h
> +++ b/src/rde/rded/rde_cb.h
> @@ -18,6 +18,7 @@
>   #ifndef RDE_RDED_RDE_CB_H_
>   #define RDE_RDED_RDE_CB_H_
>   
> +#include <atomic>
>   #include 
>   #include 
>   #include "base/osaf_utility.h"
> @@ -37,6 +38,8 @@
>   enum class State {kNotActive = 0, kNotActiveSeenPeer, kActiveElected,
> kActiveElectedSeenPeer, kActiveFailover};
>   
> +enum class ConsensusState {kUnknown = 0, kConnected, kDisconnected};
> +
>   struct RDE_CONTROL_BLOCK {
> SYSF_MBX mbx;
> NCSCONTEXT task_handle;
> @@ -49,6 +52,8 @@ struct RDE_CONTROL_BLOCK {
> // used for discovering peer controllers, regardless of their role
> std::set peer_controllers{};
> State state{State::kNotActive};
> +  std::atomic<ConsensusState> consensus_service_state{ConsensusState::kUnknown};
> +  std::atomic<bool> state_refresh_thread_started{false}; // consensus service
>   };
>   
>   enum RDE_MSG_TYPE {
> diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
> index 456d2ce..1a7e587 100644
> --- a/src/rde/rded/rde_main.cc
> +++ b/src/rde/rded/rde_main.cc
> @@ -178,6 +178,19 @@ static void handle_mbx_event() {
>   case RDE_MSG_CONTROLLER_DOWN:
> rde_cb->peer_controllers.erase(msg->fr_node_id);
> TRACE("peer_controllers: size %zu", rde_cb->peer_controllers.size());
> +  if (role->role() == PCS_RDA_ACTIVE) {
> +Consensus consensus_service;
> +if (consensus_service.IsEnabled() == true &&
> +rde_cb->consensus_service_state == ConsensusState::kDisconnected 
> &&
> +consensus_service.IsRelaxedNodePromotionEnabled() == true &&
> +role->IsPeerPresent() == false) {
> +LOG_NO("Lost connectivity to consensus service. No peer 
> present");
> +if (consensus_service.IsRemoteFencingEnabled() == false) {
> +opensaf_quick_reboot("Lost connectivity to consensus 
> service. "
> + "Rebooting this node");
> +}
> +}
> +  }
> break;
>   case RDE_MSG_TAKEOVER_REQUEST_CALLBACK: {
> rde_cb->monitor_takeover_req_thread_running = false;
> @@ -214,7 +227,7 @@ static void handle_mbx_event() {
> if (consensus_service.IsRelaxedNodePromotionEnabled() == true) {
> if (rde_cb->state == State::kActiveElected) {
>   TRACE("Relaxed mode is enabled");
> -TRACE(" No peer SC yet seen, ignore consensus service 
> failure");
> +TRACE("No peer SC yet seen, ignore consensus service 
> failure");
>   // if relaxed node promotion is enabled, and we have yet to 
> see
>   // a peer SC after being promoted, tolerate consensus 
> service
>   // not working
> @@ -227,13 +240,14 @@ static void handle_mbx_event() {
>   // we have seen the peer, and peer is still connected, 
> tolerate
>   // consensus service not working
>   fencing_required = false;
> +rde_cb->consensus_service_state = 
> ConsensusState::kDisconnected;
> }
> }
> if (fencing_required == true) {
>   LOG_NO("Lost connectivity to consensus service");
>   if (consensus_service.IsRemoteFencingEnabled() == false) {
>   opensaf_quick_reboot("Lost connectivity to consensus 
> service. "
> -   "Rebooting this node");
> + "Rebooting this node");
>   }
> }
>   }
> diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
> index 3effc25..b8c8157 100644
> --- a/src/rde/rded/role.cc
> +++ b/src/rde/rded/role.cc
> @@ -215,6 +215,18 @@ timespec* Role::Poll(timespec* ts) {
>   is_candidate).detach();
> }
>   }
> +  } else if (role_ == PCS_RDA_ACTIVE) {
> +RDE_CONTROL_BLOCK* cb = rde_get_control_block();
> +if (cb->consensus_service_state == ConsensusState::kUnknown ||
> +cb->consensus_service_state == ConsensusState::kDisconnected) {
> +  // consensus service was previously disconnected, refresh state
> +  Consensus consensus_service;
> +  if (consensus_service.IsEnabled() == true &&
> +cb->state_refresh_thread_started == false) {
> +cb->state_refresh_thread_started = true;
> +
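
The condition added to the RDE_MSG_CONTROLLER_DOWN branch can be read as a
single predicate over a few inputs. The following is a minimal sketch under
that reading: FenceInputs and should_self_fence() are hypothetical names, the
booleans stand in for the Consensus and Role queries in the patch, and the
patch's separate logging step is folded into the result.

```cpp
#include <atomic>

enum class ConsensusState { kUnknown = 0, kConnected, kDisconnected };

// Hypothetical snapshot of the values checked when a peer controller goes
// down; in the patch these come from Role and Consensus method calls.
struct FenceInputs {
  bool is_active;          // role->role() == PCS_RDA_ACTIVE
  bool consensus_enabled;  // consensus_service.IsEnabled()
  bool relaxed_promotion;  // IsRelaxedNodePromotionEnabled()
  bool peer_present;       // role->IsPeerPresent()
  bool remote_fencing;     // IsRemoteFencingEnabled()
};

// True when the node should reboot itself: it is active, the consensus
// service was already recorded as disconnected, relaxed promotion is on,
// no peer is left, and no remote fencing is configured.
bool should_self_fence(const FenceInputs &in,
                       const std::atomic<ConsensusState> &state) {
  return in.is_active && in.consensus_enabled &&
         state.load() == ConsensusState::kDisconnected &&
         in.relaxed_promotion && !in.peer_present && !in.remote_fencing;
}
```

The state is held in a std::atomic because, going by the patch, it is written
from the mailbox handler and read from Role::Poll(), which may start a
separate refresh thread.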

Re: [devel] [PATCH 1/1] uml: Update to Linux 4.18.20, iproute2 5.1.0 and busybox 1.30.1 [#3042]

2019-05-20 Thread Hans Nordebäck
Hi Anders,

ack, (I think CONFIG_KCOV=n should be added to the config file). /BR HansN

On 2019-05-20 14:38, Anders Widell wrote:
> Uplift the Linux kernel version for UML to 4.18.20, to make it possible to 
> build
> UML with newer glibc version (e.g. on Ubuntu 18.04).
> ---
>   tools/cluster_sim_uml/README  |   2 +
>   tools/cluster_sim_uml/uml/bin/uml_start   |   2 +-
>   tools/cluster_sim_uml/uml/build_uml   |  21 +-
>   .../config/{busybox-1.27.2 => busybox-1.30.1} | 120 +++---
>   .../{linux-4.13.3-i686 => linux-4.18.20-i686} | 225 --
>   ...nux-4.13.3-x86_64 => linux-4.18.20-x86_64} | 221 -
>   6 files changed, 291 insertions(+), 300 deletions(-)
>   rename tools/cluster_sim_uml/uml/config/{busybox-1.27.2 => busybox-1.30.1} 
> (93%)
>   rename tools/cluster_sim_uml/uml/config/{linux-4.13.3-i686 => 
> linux-4.18.20-i686} (88%)
>   rename tools/cluster_sim_uml/uml/config/{linux-4.13.3-x86_64 => 
> linux-4.18.20-x86_64} (88%)
>
> diff --git a/tools/cluster_sim_uml/README b/tools/cluster_sim_uml/README
> index 1d5912156..e747786ae 100644
> --- a/tools/cluster_sim_uml/README
> +++ b/tools/cluster_sim_uml/README
> @@ -202,6 +202,8 @@ The following Debian/Ubuntu packages are known to work. 
> Also make sure that you
>   have installed the corresponding development packages for these libraries.
>   
>   - bash  4.3
> +- bison  3.0.4
> +- flex   2.6.0
>   - libc6 2.23
>   - libgcc1   6.0.1
>   - libmnl0   1.0.3
> diff --git a/tools/cluster_sim_uml/uml/bin/uml_start 
> b/tools/cluster_sim_uml/uml/bin/uml_start
> index de4cb289e..501bf73f8 100755
> --- a/tools/cluster_sim_uml/uml/bin/uml_start
> +++ b/tools/cluster_sim_uml/uml/bin/uml_start
> @@ -36,7 +36,7 @@ uid=$(id -u)
>   byte1=2
>   byte2=0
>   byte3=0
> -if [ "$OSAF_UML_DYNAMIC_MAC" -eq "1" ]; then
> +if [ "$OSAF_UML_DYNAMIC_MAC" = "1" ]; then
> byte4=$(echo $(od -N1 -An -tx1 /dev/urandom))
>   else
> byte4=1
> diff --git a/tools/cluster_sim_uml/uml/build_uml 
> b/tools/cluster_sim_uml/uml/build_uml
> index ac7246058..df59b0caf 100755
> --- a/tools/cluster_sim_uml/uml/build_uml
> +++ b/tools/cluster_sim_uml/uml/build_uml
> @@ -65,28 +65,32 @@ help() {
>   exit 0
>   }
>   test -n "$1" || help
> +
> +type -t bison > /dev/null || die "Missing the tool 'bison'"
> +type -t flex > /dev/null || die "Missing the tool 'flex'"
> +
>   cd "$dir"
>   archive=${OSAF_UML_ARCHIVE:-$dir/archive}
>   build=${OSAF_UML_BUILD:-$dir}
>   configd=${OSAF_UML_CONFIGD:-$dir/config}
>   
> -kver=${OSAF_UML_KVER:-4.13.3}
> +kver=${OSAF_UML_KVER:-4.18.20}
>   kbasedir=$(echo "$kver" | cut -d. -f1).x
>   #kurlbase=${KURLBASE:-"https://www.kernel.org/pub/linux/kernel/v$kbasedir"}
>   
> kurlbase=${KURLBASE:-"http://ftp.funet.fi/pub/mirrors/ftp.kernel.org/pub/linux/kernel/v$kbasedir"}
>   
> #kurlbase=${KURLBASE:-"http://ftp.funet.fi/pub/mirrors/ftp.kernel.org/pub/linux/kernel/v$kbasedir/testing"}
>   kernel_decompress=xz
>   kurl="$kurlbase/linux-$kver.tar.xz"
> -kernel_sha256sum='03d22c74a102b66341b6f52e72142f0544cea3b413ca78bffe7d2a09e288caab
>   linux-4.13.3.tar.xz'
> +kernel_sha256sum='68ac319e0fb7edd6b6051541d9cf112cd4f77a29e16a69ae1e133ff51117f653
>   linux-4.18.20.tar.xz'
>   
> -iproute2ver=${OSAF_UML_IPRVER:-4.13.0}
> +iproute2ver=${OSAF_UML_IPRVER:-5.1.0}
>   
> iproute2url="https://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-$iproute2ver.tar.xz"
>   
> #iproute2url="http://ftp.funet.fi/pub/mirrors/ftp.kernel.org/pub/linux/utils/net/iproute2/iproute2-$iproute2ver.tar.xz"
> -iproute2_sha256sum='9cfb81edf8c8509e03daa77cf62aead01c4a827132f6c506578f94cc19415c50
>   iproute2-4.13.0.tar.xz'
> +iproute2_sha256sum='dc5a980873eabf6b00c0be976b6e5562b1400d47d1d07d2ac35d5e5acbcf7bcf
>   iproute2-5.1.0.tar.xz'
>   
> -bbver=${OSAF_UML_BBVER:-1.27.2}
> +bbver=${OSAF_UML_BBVER:-1.30.1}
>   bburl="http://busybox.net/downloads/busybox-$bbver.tar.bz2"
> -bb_sha256sum='9d4be516b61e6480f156b11eb42577a13529f75d3383850bb75c50c285de63df
>   busybox-1.27.2.tar.bz2'
> +bb_sha256sum='3d1d04a4dbd34048f4794815a5c48ebb9eb53c5277e09c060323b95dfbdc
>   busybox-1.30.1.tar.bz2'
>   
>   umlutilsver=20070815
>   
> umlutilsurl="http://user-mode-linux.sourceforge.net/uml_utilities_$umlutilsver.tar.bz2"
> @@ -375,10 +379,7 @@ cmd_build_iproute2()
>   test -d bin || mkdir -p bin
>   cd iproute2-$iproute2ver
>   ./configure
> -cd tipc
> -mkdir linux
> -cp $build/linux-$kver/include/uapi/linux/tipc*.h linux
> -make -j$no_of_processors CFLAGS="-s -pipe -O2 -I." tipc
> +make -j$no_of_processors
>   cd $build
>   cp iproute2-$iproute2ver/tipc/tipc bin || die "Could not build tipc"
>   }
> diff --git a/tools/cluster_sim_uml/uml/config/busybox-1.27.2 
> b/tools/cluster_sim_uml/uml/config/busybox-1.30.1
> similarity index 93%
> rename from tools/cluster_sim_uml/uml/config/busybox-1.27.2
> rename to 

[devel] [PATCH 0/1] Review Request for mds: use new TIPC getsockopt to log receive buffer utilization [#3038]

2019-05-20 Thread Hans Nordebäck
Summary: mds: use new TIPC getsockopt to log receive queue utilization [#3038]
Review request for Ticket(s): 3038
Peer Reviewer(s): AndersW, Lennart, Gary
Pull request to: 
Affected branch(es): develop
Development branch: ticket-3038
Base revision: 3b124020051730287ace8bd9ab28a8fa431fc85a
Personal repository: git://git.code.sf.net/u/hansnordeback/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 7d1b566f8167f9fbe1512a78f0bcf4fb1c58f449
Author: Hans Nordeback 
Date:   Mon, 20 May 2019 13:47:28 +0200

mds: use new TIPC getsockopt to log receive queue utilization [#3038]



Added Files:

 src/base/statistics.h
 src/mds/mds_tipc_recvq_stats.cc
 src/mds/mds_tipc_recvq_stats.h
 src/mds/mds_tipc_recvq_stats_impl.cc
 src/mds/mds_tipc_recvq_stats_impl.h


Complete diffstat:
--
 00-README.conf   |  14 +++
 src/base/Makefile.am |   1 +
 src/base/statistics.h|  88 +
 src/mds/Makefile.am  |   8 +-
 src/mds/mds_dt_tipc.c|   3 +
 src/mds/mds_tipc_recvq_stats.cc  |  29 ++
 src/mds/mds_tipc_recvq_stats.h   |  32 +++
 src/mds/mds_tipc_recvq_stats_impl.cc | 178 +++
 src/mds/mds_tipc_recvq_stats_impl.h  |  39 
 9 files changed, 390 insertions(+), 2 deletions(-)


Testing Commands:
-
Pls. see 00-README.conf


Testing, Expected Results:
--
Pls. see 00-README.conf


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
---
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/1] mds: use new TIPC getsockopt to log receive queue utilization [#3038]

2019-05-20 Thread Hans Nordebäck
---
 00-README.conf   |  14 +++
 src/base/Makefile.am |   1 +
 src/base/statistics.h|  88 +
 src/mds/Makefile.am  |   8 +-
 src/mds/mds_dt_tipc.c|   3 +
 src/mds/mds_tipc_recvq_stats.cc  |  29 +
 src/mds/mds_tipc_recvq_stats.h   |  32 +
 src/mds/mds_tipc_recvq_stats_impl.cc | 178 +++
 src/mds/mds_tipc_recvq_stats_impl.h  |  39 ++
 9 files changed, 390 insertions(+), 2 deletions(-)
 create mode 100644 src/base/statistics.h
 create mode 100644 src/mds/mds_tipc_recvq_stats.cc
 create mode 100644 src/mds/mds_tipc_recvq_stats.h
 create mode 100644 src/mds/mds_tipc_recvq_stats_impl.cc
 create mode 100644 src/mds/mds_tipc_recvq_stats_impl.h

diff --git a/00-README.conf b/00-README.conf
index 8f20e5209..da1825f06 100644
--- a/00-README.conf
+++ b/00-README.conf
@@ -737,3 +737,17 @@ initiate a 'self-fencing' by rebooting the node, if it 
determines the node
 should no longer be active according to the consensus service, to prevent
 a split-brain situation.
 
+TIPC receive queue utilization
+==
+
+Setting the environment variable MDS_RECVQ_STATS_LOG_FREQ_SEC in a service
+config file enables TIPC receive queue utilization statistics. The value is
+the interval, in seconds, at which the statistics are written to syslog.
+
+Example amfd.conf:
+
+export MDS_RECVQ_STATS_LOG_FREQ_SEC=5
+
+then every 5 seconds a log record is written:
+
+May 20 12:23:30 SC-1 local0.notice osafamfd[545]: NO TIPC receive queue 
utilization (in %): min: 3.86 max: 4.38 mean: 4.15 std dev: 0.18
diff --git a/src/base/Makefile.am b/src/base/Makefile.am
index ce93562e5..025fb86a2 100644
--- a/src/base/Makefile.am
+++ b/src/base/Makefile.am
@@ -157,6 +157,7 @@ noinst_HEADERS += \
src/base/saf_error.h \
src/base/saf_mem.h \
src/base/sprr_dl_api.h \
+   src/base/statistics.h \
src/base/string_parse.h \
src/base/sysf_exc_scr.h \
src/base/sysf_ipc.h \
diff --git a/src/base/statistics.h b/src/base/statistics.h
new file mode 100644
index 0..9ce980fc1
--- /dev/null
+++ b/src/base/statistics.h
@@ -0,0 +1,88 @@
+/*  -*- OpenSAF  -*-
+ *
+ * (C) Copyright 2019 The OpenSAF Foundation
+ * Copyright Ericsson AB 2019 - All Rights Reserved.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+ * under the GNU Lesser General Public License Version 2.1, February 1999.
+ * The complete license can be accessed from the following location:
+ * http://opensource.org/licenses/lgpl-license.php
+ * See the Copying file included with the OpenSAF distribution for full
+ * licensing terms.
+ *
+ * Author(s): Ericsson AB
+ *
+ */
+
+#ifndef STATISTICS_H_
+#define STATISTICS_H_
+
+#include <cmath>
+
+namespace base {
+
+class Statistics {
+ public:
+  void clear() {
+n_ = 0;
+  }
+
+  void push(double x) {
+n_++;
+
+// See Knuth, Art Of Computer Programming, Volume 2. The Seminumerical 
Algorithms, 4.2.2. Accuracy of Floating Point Arithmetic,
+// using the recurrence formulas:
+// M1 = x1, Mk = Mk-1 + (xk - Mk-1) / k  (15)
+// S1 = 0, Sk = Sk-1 + (xk - Mk-1) * (xk - Mk)  (16)
+// for 2 <= k <= n, sqrt(Sn/(n-1))
+if (n_ == 1) {
+  prev_m_ = current_m_ = x;
+  prev_s_ = 0;
+  min_ = x;
+  max_ = x;
+} else {
+  current_m_ = prev_m_ + (x - prev_m_) / n_;
+  current_s_ =  prev_s_ + (x - prev_m_) * (x - current_m_);
+
+  if (x > max_) max_ = x;
+  if (x < min_) min_ = x;
+  prev_m_ = current_m_;
+  prev_s_ = current_s_;
+}
+  }
+
+  double mean() const {
+return (n_ > 0) ?  current_m_ : 0;
+  }
+
+  double variance() const {
+return (n_ > 1) ? current_s_ / (n_ - 1) : 0;
+  }
+
+  double std_dev() const {
+return sqrt(variance());
+  }
+
+  double min() const {
+return min_;
+  }
+  double max() const {
+return max_;
+  }
+
+ private:
+  int n_{0};
+  double prev_m_{0};
+  double current_m_{0};
+  double prev_s_{0};
+  double current_s_{0};
+  double min_{0};
+  double max_{0};
+};
+
+}  // namespace base
+
+#endif  // STATISTICS_H_
+
diff --git a/src/mds/Makefile.am b/src/mds/Makefile.am
index 3724d2ea8..2d7b652e9 100644
--- a/src/mds/Makefile.am
+++ b/src/mds/Makefile.am
@@ -46,8 +46,12 @@ lib_libopensaf_core_la_SOURCES += \
src/mds/ncs_vda.c
 
 if ENABLE_TIPC_TRANSPORT
-noinst_HEADERS += src/mds/mds_dt_tipc.h
-lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c
+noinst_HEADERS += src/mds/mds_dt_tipc.h \
+   src/mds/mds_tipc_recvq_stats.h \
+   src/mds/mds_tipc_recvq_stats_impl.h
+lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c \
+   src/mds/mds_tipc_recvq_stats.cc \
+   src/mds/mds_tipc_recvq_stats_impl.cc
 endif
 
 if ENABLE_TESTS
diff --git 
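
The recurrences (15) and (16) quoted in statistics.h can be checked against a
small hand-computed data set. This is a minimal sketch, assuming a
free-standing accumulator: RunningStats is a hypothetical name, not the
base::Statistics class from the patch, and min/max tracking is left out.

```cpp
#include <cmath>
#include <cstddef>

// Single-pass mean and variance using the recurrences
//   M_k = M_{k-1} + (x_k - M_{k-1}) / k
//   S_k = S_{k-1} + (x_k - M_{k-1}) * (x_k - M_k)
struct RunningStats {
  std::size_t n = 0;
  double mean = 0.0;
  double s = 0.0;  // running sum of squared deviations from the mean

  void push(double x) {
    ++n;
    const double prev_mean = mean;
    mean = prev_mean + (x - prev_mean) / static_cast<double>(n);  // (15)
    s += (x - prev_mean) * (x - mean);                            // (16)
  }

  // Sample variance, S_n / (n - 1); defined as 0 for fewer than two samples.
  double variance() const {
    return n > 1 ? s / static_cast<double>(n - 1) : 0.0;
  }

  double std_dev() const { return std::sqrt(variance()); }
};
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the squared
deviations sum to 32, so the sample variance is 32/7. The single-pass form
needs no stored samples, which suits a statistic that is logged periodically.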

Re: [devel] [PATCH 1/1] base: strip leading and trailing quotes [#3041]

2019-05-17 Thread Hans Nordebäck
Hi Gary, ack, code review only/Thanks HansN

On 2019-05-17 14:47, Gary Lee wrote:
> ConfigFileReader enables runtime 'reload' of .conf files.
> However, if the environment variable is surrounded by quotes,
> it adds the quotes to the value which is not the expected behaviour.
>
> export FOO="foo"
>
> FOO should contain just foo, not "foo".
> ---
>   src/base/config_file_reader.cc  | 15 +++
>   src/osaf/consensus/consensus.cc |  1 +
>   2 files changed, 16 insertions(+)
>
> diff --git a/src/base/config_file_reader.cc b/src/base/config_file_reader.cc
> index 63cad7d..0132547 100644
> --- a/src/base/config_file_reader.cc
> +++ b/src/base/config_file_reader.cc
> @@ -36,6 +36,18 @@ static void trim(std::string& str) {
> right_trim(str);
>   }
>   
> +static void strip_quotes(std::string& str) {
> +  // trim leading and trailing quotes
> +  if (str.front() == '"' ||
> +  str.front() == '\'') {
> +str.erase(0, 1);  // delete first char
> +  }
> +  if (str.back() == '"' ||
> +str.back() == '\'') {
> +str.pop_back();  // delete last char
> +  }
> +}
> +
>   ConfigFileReader::SettingsMap ConfigFileReader::ParseFile(
>   const std::string& filename) {
> const std::string prefix("export");
> @@ -80,6 +92,9 @@ ConfigFileReader::SettingsMap ConfigFileReader::ParseFile(
> std::string value = line.substr(equal + 1);
> trim(value);
>   
> +  strip_quotes(key);
> +  strip_quotes(value);
> +
> map[key] = value;
>   }
>   file.close();
> diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
> index 480f7d2..0bebab2 100644
> --- a/src/osaf/consensus/consensus.cc
> +++ b/src/osaf/consensus/consensus.cc
> @@ -295,6 +295,7 @@ bool Consensus::ReloadConfiguration() {
> continue;
>   }
>   int rc;
> +TRACE("Setting '%s' to '%s'", kv.first.c_str(), kv.second.c_str());
>   rc = setenv(kv.first.c_str(), kv.second.c_str(), 1);
>   osafassert(rc == 0);
> }
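
One detail in the quoted strip_quotes() helper may deserve a second look: calling front() or back() on an empty std::string is undefined behaviour, and a value such as a lone quote character becomes empty after the first erase. Below is a minimal sketch of a guarded variant. The name strip_quotes_guarded is hypothetical and this is not the code that was pushed; it only illustrates the empty-string check.

```cpp
#include <string>

// Hypothetical variant of the strip_quotes() helper from the patch above.
// Same intent (drop one leading and one trailing quote character), but the
// string is checked for emptiness before front()/back() are used.
static void strip_quotes_guarded(std::string& str) {
  if (!str.empty() && (str.front() == '"' || str.front() == '\'')) {
    str.erase(0, 1);  // delete first char
  }
  // Re-check emptiness: the erase above may have removed the only character.
  if (!str.empty() && (str.back() == '"' || str.back() == '\'')) {
    str.pop_back();  // delete last char
  }
}
```

With this variant, a line such as export FOO= (empty value) or export FOO=" (single quote character) is handled without touching an empty string.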



Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-05-03 Thread Hans Nordebäck
Hi Thuan,

Ack, code review only. A few comments below:

- Agree, but shouldn't the description in the ticket be something like
"mds: At MDS broadcast use TIPC multicast for fragmented messages instead of 
unicasting to only one destination"?

- And adding the test cases would be good.

/BR HansN

On 2019-05-03 11:03, Tran Thuan wrote:
Hi Hans,

static uint32_t mdtm_sendto(uint8_t *buffer, uint16_t buff_len,
                            struct tipc_portid tipc_id);
static uint32_t mdtm_mcast_sendto(void *buffer, size_t size,
                                  const MDTM_SEND_REQ *req);

Before the fix, fragmented packages are sent via mdtm_sendto(), which is
designed (MDS design) to send to one destination.
After the fix, fragmented packages are sent via mdtm_mcast_sendto(), which is
designed (MDS design) to send to all destinations.

Both functions call TIPC's sendto(), just with different parameters.

Best Regards,
ThuanTr

From: Hans Nordebäck 
<mailto:hans.nordeb...@ericsson.com>
Sent: Friday, May 3, 2019 3:50 PM
To: Thuan Tran <mailto:thuan.t...@dektech.com.au>; 
Minh Hon Chau <mailto:minh.c...@dektech.com.au>; Vu 
Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au>
Cc: 
opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>;
 Anders Widell <mailto:anders.wid...@ericsson.com>
Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3


Hi Thuan,

a question: the old code uses unicast for the fragments, now multicast is used. 
But from the 'sendto' TIPC documentation:

"If the destination is a service range, the message is a multicast to all 
matching sockets."

So before, when unicast was used for the fragments, did TIPC multicast the 
fragments? What problem does this patch solve, can you clarify? /Thanks HansN
On 2019-05-03 10:20, Tran Thuan wrote:

Hi Hans,

Yes, we try that kind of basic test, IMMD can deliver big message via multicast.

Best Regards,
ThuanTr

-Original Message-
From: Hans Nordebäck <mailto:hans.nordeb...@ericsson.com>
Sent: Friday, May 3, 2019 3:11 PM
To: Thuan Tran <mailto:thuan.t...@dektech.com.au>; 
Minh Hon Chau <mailto:minh.c...@dektech.com.au>; 
Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>
Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Thuan,

that sounds good, is this how you test now? When looking at the MDS code before 
this change it looks that large multicast messages are fragmented and only sent 
to one receiver using unicast, but with this change the fragments are 
multicasted to all receivers, which seems more correct. /Thanks HansN

On 2019-05-03 10:04, Tran Thuan wrote:

Hi Hans,

Current MDS apitest only binary execution on one node.
It is easier if create IMM test case to make IMMD send broadcast big message.
I think we can create new ticket for this additional test.

Best Regards,
ThuanTr

-Original Message-
From: Hans Nordebäck <mailto:hans.nordeb...@ericsson.com>
Sent: Friday, May 3, 2019 2:45 PM
To: Thuan Tran <mailto:thuan.t...@dektech.com.au>; 
Minh Hon Chau <mailto:minh.c...@dektech.com.au>; 
Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast
fragmented messages [#3033] V3

Hi Thuan,
ok, if we can add additional tests to the mds api test suite would be
good/Thanks HansN

-----Original Message-
From: Tran Thuan <mailto:thuan.t...@dektech.com.au>
Sent: den 3 maj 2019 09:41
To: Hans Nordebäck <mailto:hans.nordeb...@ericsson.com>; 
Minh Hon Chau <mailto:minh.c...@dektech.com.au>; 
Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast
fragmented messages [#3033] V3

Hi Hans,

I don't see this kind of test in mds apitests.

Best Regards,
ThuanTr

-Original Message-
From: Hans Nordebäck <mailto:hans.nordeb...@ericsson.com>
Sent: Friday, May 3, 2019 2:31 PM
To: Thuan Tran <mailto:thuan.t...@dektech.com.au>; 
Minh Hon Chau <mailto:minh.c...@dektech.com.au>; 
Vu Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast
fragmented messages [#3033] V3

Hi Thuan,
I'm reviewing the patch now. I haven't checked yet but do you know if
the mds apitests cover this case sending large multicast messages?
/Thanks HansN


Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-05-03 Thread Hans Nordebäck
Hi Thuan,

a question: the old code uses unicast for the fragments, now multicast is used. 
But from the 'sendto' TIPC documentation:

"If the destination is a service range, the message is a multicast to all 
matching sockets."

So before, when unicast was used for the fragments, did TIPC multicast the 
fragments? What problem does this patch solve, can you clarify? /Thanks HansN

On 2019-05-03 10:20, Tran Thuan wrote:

Hi Hans,

Yes, we try that kind of basic test, IMMD can deliver big message via multicast.

Best Regards,
ThuanTr

-Original Message-----
From: Hans Nordebäck 
<mailto:hans.nordeb...@ericsson.com>
Sent: Friday, May 3, 2019 3:11 PM
To: Thuan Tran <mailto:thuan.t...@dektech.com.au>; 
Minh Hon Chau <mailto:minh.c...@dektech.com.au>; Vu 
Minh Nguyen <mailto:vu.m.ngu...@dektech.com.au>
Cc: 
opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>
Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Thuan,

that sounds good, is this how you test now? When looking at the MDS code before 
this change it looks that

large multicast messages are fragmented and only sent to one receiver using 
unicast, but with this change the fragments

are multicasted to all receivers, which seems more correct. /Thanks HansN

On 2019-05-03 10:04, Tran Thuan wrote:


Hi Hans,

Current MDS apitest only binary execution on one node.
It is easier if create IMM test case to make IMMD send broadcast big message.
I think we can create new ticket for this additional test.

Best Regards,
ThuanTr

-Original Message-
From: Hans Nordebäck 
<mailto:hans.nordeb...@ericsson.com>
Sent: Friday, May 3, 2019 2:45 PM
To: Thuan Tran <mailto:thuan.t...@dektech.com.au>; 
Minh Hon Chau
<mailto:minh.c...@dektech.com.au>; Vu Minh Nguyen
<mailto:vu.m.ngu...@dektech.com.au>
Cc: 
opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast
fragmented messages [#3033] V3

Hi Thuan,
ok, if we can add additional tests to the mds api test suite would be
good/Thanks HansN

-Original Message-
From: Tran Thuan <mailto:thuan.t...@dektech.com.au>
Sent: den 3 maj 2019 09:41
To: Hans Nordebäck 
<mailto:hans.nordeb...@ericsson.com>; Minh Hon Chau
<mailto:minh.c...@dektech.com.au>; Vu Minh Nguyen
<mailto:vu.m.ngu...@dektech.com.au>
Cc: 
opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast
fragmented messages [#3033] V3

Hi Hans,

I don't see this kind of test in mds apitests.

Best Regards,
ThuanTr

-Original Message-
From: Hans Nordebäck 
<mailto:hans.nordeb...@ericsson.com>
Sent: Friday, May 3, 2019 2:31 PM
To: Thuan Tran <mailto:thuan.t...@dektech.com.au>; 
Minh Hon Chau
<mailto:minh.c...@dektech.com.au>; Vu Minh Nguyen
<mailto:vu.m.ngu...@dektech.com.au>
Cc: 
opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast
fragmented messages [#3033] V3

Hi Thuan,
I'm reviewing the patch now. I haven't checked yet but do you know if
the mds apitests cover this case sending large multicast messages?
/Thanks HansN

-Original Message-
From: Tran Thuan <mailto:thuan.t...@dektech.com.au>
Sent: den 2 maj 2019 05:56
To: Minh Hon Chau <mailto:minh.c...@dektech.com.au>; 
Vu Minh Nguyen
<mailto:vu.m.ngu...@dektech.com.au>; Hans Nordebäck
<mailto:hans.nordeb...@ericsson.com>
Cc: 
opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast
fragmented messages [#3033] V3

Hi Hans,

Do you have any further comment?
Can we push the patch?

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau <mailto:minh.c...@dektech.com.au>
Sent: Friday, April 26, 2019 4:11 PM
To: Vu Minh Nguyen 
<mailto:vu.m.ngu...@dektech.com.au>; 'Hans 
Nordebäck'
<mailto:hans.nordeb...@ericsson.com>; 'Thuan Tran'
<mailto:thuan.t...@dektech.com.au>
Cc: 
opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>
Subject: Re: [PATCH 0/1] Review Request for mds: support multicast
fragmented messages [#3033] V3

Hi,

ack from me (code review)

Thanks

Minh

On 25/4/19 9:33 pm, Vu Minh Nguyen wrote:


Hi Hans,

Probably you were looking at code that included this Thuan's patch.

In legacy code, only mdtm_sendto() is called inside the function 
mdtm_frag_and_send().

Regards, Vu



-Original Message-
From: Hans Nordebäck 
<mailto:hans.nordeb...@ericsson.com>
Sent: Thursday, April 25, 2019 6:10 PM
To: Vu Minh Nguyen 
<mailto:vu.m.ngu...@dektech.com.au>; Thuan Tran
<mailto:thuan.t...@dektech.com.au>; Mi

Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-05-03 Thread Hans Nordebäck
Hi Thuan,

that sounds good, is this how you test now? When looking at the MDS code 
before this change, it looks like large multicast messages are fragmented 
and only sent to one receiver using unicast, but with this change the 
fragments are multicast to all receivers, which seems more correct. 
/Thanks HansN

On 2019-05-03 10:04, Tran Thuan wrote:
> Hi Hans,
>
> Current MDS apitest only binary execution on one node.
> It is easier if create IMM test case to make IMMD send broadcast big message.
> I think we can create new ticket for this additional test.
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Hans Nordebäck 
> Sent: Friday, May 3, 2019 2:45 PM
> To: Thuan Tran ; Minh Hon Chau 
> ; Vu Minh Nguyen 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
> messages [#3033] V3
>
> Hi Thuan,
> ok, if we can add additional tests to the mds api test suite would be 
> good/Thanks HansN
>
> -Original Message-
> From: Tran Thuan 
> Sent: den 3 maj 2019 09:41
> To: Hans Nordebäck ; Minh Hon Chau 
> ; Vu Minh Nguyen 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
> messages [#3033] V3
>
> Hi Hans,
>
> I don't see this kind of test in mds apitests.
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Hans Nordebäck 
> Sent: Friday, May 3, 2019 2:31 PM
> To: Thuan Tran ; Minh Hon Chau 
> ; Vu Minh Nguyen 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
> messages [#3033] V3
>
> Hi Thuan,
> I'm reviewing the patch now. I haven't checked yet but do you know if the mds 
> apitests cover this case sending large multicast messages? /Thanks HansN
>
> -Original Message-
> From: Tran Thuan 
> Sent: den 2 maj 2019 05:56
> To: Minh Hon Chau ; Vu Minh Nguyen 
> ; Hans Nordebäck 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
> messages [#3033] V3
>
> Hi Hans,
>
> Do you have any further comment?
> Can we push the patch?
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Minh Hon Chau 
> Sent: Friday, April 26, 2019 4:11 PM
> To: Vu Minh Nguyen ; 'Hans Nordebäck' 
> ; 'Thuan Tran' 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented 
> messages [#3033] V3
>
> Hi,
>
> ack from me (code review)
>
> Thanks
>
> Minh
>
> On 25/4/19 9:33 pm, Vu Minh Nguyen wrote:
>> Hi Hans,
>>
>> Probably you were looking at code that included this Thuan's patch.
>>
>> In legacy code, only mdtm_sendto() is called inside the function 
>> mdtm_frag_and_send().
>>
>> Regards, Vu
>>
>>> -Original Message-
>>> From: Hans Nordebäck 
>>> Sent: Thursday, April 25, 2019 6:10 PM
>>> To: Vu Minh Nguyen ; Thuan Tran
>>> ; Minh Hon Chau 
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast
>>> fragmented messages [#3033] V3
>>>
>>>
>>> Hi Vu,
>>> It seems mdtm_mcast_sendto is used in mdtm_frag_and_send, at
>>> MDS_SENDTYPE_BCAST/BR Hans -Original Message-
>>> From: Vu Minh Nguyen 
>>> Sent: den 25 april 2019 12:20
>>> To: Hans Nordebäck ; Thuan Tran
>>> ; Minh Hon Chau 
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast
>>> fragmented messages [#3033] V3
>>>
>>> Hi Hans,
>>>
>>> See my responses inline.
>>>
>>> Regards, Vu
>>>
>>>> -Original Message-
>>>> From: Hans Nordebäck 
>>>> Sent: Thursday, April 25, 2019 4:28 PM
>>>> To: Thuan Tran ; Vu Minh Nguyen
>>>> ; Minh Hon Chau
>>> 
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: Re: [PATCH 0/1] Review Request for mds: support multicast
>>>> fragmented messages [#3033] V3
>>>>
>>>> Hi Vu and Thuan,
>>>>
>>>> a few question, is the text in the ticket description correct? E.g
>>>> it says unicast is used if a multicast message is fragmented, (I
>>>> think multicast still is used
>>>>
>>>> to send the fragments), this is what you mean with 2 different channels?
>>>> (only

Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-05-03 Thread Hans Nordebäck
Hi Thuan,
I'm reviewing the patch now. I haven't checked yet, but do you know if the mds 
apitests cover this case of sending large multicast messages? /Thanks HansN 

-Original Message-
From: Tran Thuan  
Sent: den 2 maj 2019 05:56
To: Minh Hon Chau ; Vu Minh Nguyen 
; Hans Nordebäck 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Hans,

Do you have any further comment?
Can we push the patch?

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau 
Sent: Friday, April 26, 2019 4:11 PM
To: Vu Minh Nguyen ; 'Hans Nordebäck' 
; 'Thuan Tran' 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi,

ack from me (code review)

Thanks

Minh

On 25/4/19 9:33 pm, Vu Minh Nguyen wrote:
> Hi Hans,
>
> Probably you were looking at code that included this Thuan's patch.
>
> In legacy code, only mdtm_sendto() is called inside the function 
> mdtm_frag_and_send().
>
> Regards, Vu
>
>> -----Original Message-
>> From: Hans Nordebäck 
>> Sent: Thursday, April 25, 2019 6:10 PM
>> To: Vu Minh Nguyen ; Thuan Tran 
>> ; Minh Hon Chau 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
>> fragmented messages [#3033] V3
>>
>>
>> Hi Vu,
>> It seems mdtm_mcast_sendto is used in mdtm_frag_and_send, at 
>> MDS_SENDTYPE_BCAST/BR Hans -Original Message-
>> From: Vu Minh Nguyen 
>> Sent: den 25 april 2019 12:20
>> To: Hans Nordebäck ; Thuan Tran 
>> ; Minh Hon Chau 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
>> fragmented messages [#3033] V3
>>
>> Hi Hans,
>>
>> See my responses inline.
>>
>> Regards, Vu
>>
>>> -Original Message-
>>> From: Hans Nordebäck 
>>> Sent: Thursday, April 25, 2019 4:28 PM
>>> To: Thuan Tran ; Vu Minh Nguyen 
>>> ; Minh Hon Chau
>> 
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 0/1] Review Request for mds: support multicast 
>>> fragmented messages [#3033] V3
>>>
>>> Hi Vu and Thuan,
>>>
>>> a few question, is the text in the ticket description correct? E.g 
>>> it says unicast is used if a multicast message is fragmented, (I 
>>> think multicast still is used
>>>
>>> to send the fragments), this is what you mean with 2 different channels?
>>> (only one socket is used, BSRsock),
>> [Vu] Yes. Unicast is used to send fragmented messages. Here is the 
>> current logic in case of sending a large package:
>> Iterate over destinations { // mcm_pvt_process_svc_bcast_common() @ 
>> mds_c_sndrcv.c
>>  1) Fragment the package // mdtm_frag_and_send() @ mds_dt_tipc.c
>>  2) Unicast to a specific adest  // mdtm_sendto() @
>> mds_dt_tipc.c
>>  4) Continue with next adest
>> }
>>
>>> The problem stated is sending one large multicast message and then 
>>> several smaller multicast messages, have you checked the
>>>
>>> fragment re-assembly part of the common code?
>> [Vu] Yes. At the receive side, if msg is fragmented, mds will not 
>> forward to upper layer until all fragmented msgs are collected.
>> If the message is not fragmented, mds will transfer the msg to upper 
>> right away.
>>
>> I checked with TIPC guys here, and he said that TIPC does not 
>> guarantee the order if we send msgs in different channels (unicast vs mcast).
>>
>>> /BR Hans
>>>
>>>
>>> On 2019-04-24 13:06, thuan.tran wrote:
>>>> Summary: mds: support multicast fragmented messages [#3033] Review 
>>>> request for Ticket(s): 3033 Peer Reviewer(s): Hans, Minh, Vu Pull 
>>>> request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected
>>>> branch(es): develop Development branch: ticket-3033 Base revision:
>>>> 7916ac316e86478c621c8359cf2aca4886288a38
>>>> Personal repository: git://git.code.sf.net/u/thuantr/review
>>>>
>>>> 
>>>> Impacted area           Impact y/n
>>>> -----------------------------------
>>>>   Docs                    n
>>>>   Build system            n
>>>>   RPM/packaging           n
>>>>   Configuration files     n
>>>>   Startup scripts         n
>>>>   SAF services            y
>>>>OpenSAF se

Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-05-03 Thread Hans Nordebäck
Hi Thuan,
OK, it would be good if we can add additional tests to the mds api test 
suite. /Thanks HansN

-Original Message-
From: Tran Thuan  
Sent: den 3 maj 2019 09:41
To: Hans Nordebäck ; Minh Hon Chau 
; Vu Minh Nguyen 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Hans,

I don't see this kind of test in mds apitests.

Best Regards,
ThuanTr

-Original Message-
From: Hans Nordebäck 
Sent: Friday, May 3, 2019 2:31 PM
To: Thuan Tran ; Minh Hon Chau 
; Vu Minh Nguyen 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Thuan,
I'm reviewing the patch now. I haven't checked yet but do you know if the mds 
apitests cover this case sending large multicast messages? /Thanks HansN 

-Original Message-
From: Tran Thuan 
Sent: den 2 maj 2019 05:56
To: Minh Hon Chau ; Vu Minh Nguyen 
; Hans Nordebäck 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Hans,

Do you have any further comment?
Can we push the patch?

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau 
Sent: Friday, April 26, 2019 4:11 PM
To: Vu Minh Nguyen ; 'Hans Nordebäck' 
; 'Thuan Tran' 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi,

ack from me (code review)

Thanks

Minh

On 25/4/19 9:33 pm, Vu Minh Nguyen wrote:
> Hi Hans,
>
> Probably you were looking at code that included this Thuan's patch.
>
> In legacy code, only mdtm_sendto() is called inside the function 
> mdtm_frag_and_send().
>
> Regards, Vu
>
>> -----Original Message-
>> From: Hans Nordebäck 
>> Sent: Thursday, April 25, 2019 6:10 PM
>> To: Vu Minh Nguyen ; Thuan Tran 
>> ; Minh Hon Chau 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
>> fragmented messages [#3033] V3
>>
>>
>> Hi Vu,
>> It seems mdtm_mcast_sendto is used in mdtm_frag_and_send, at 
>> MDS_SENDTYPE_BCAST/BR Hans -Original Message-
>> From: Vu Minh Nguyen 
>> Sent: den 25 april 2019 12:20
>> To: Hans Nordebäck ; Thuan Tran 
>> ; Minh Hon Chau 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
>> fragmented messages [#3033] V3
>>
>> Hi Hans,
>>
>> See my responses inline.
>>
>> Regards, Vu
>>
>>> -Original Message-
>>> From: Hans Nordebäck 
>>> Sent: Thursday, April 25, 2019 4:28 PM
>>> To: Thuan Tran ; Vu Minh Nguyen 
>>> ; Minh Hon Chau
>> 
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 0/1] Review Request for mds: support multicast 
>>> fragmented messages [#3033] V3
>>>
>>> Hi Vu and Thuan,
>>>
>>> a few question, is the text in the ticket description correct? E.g 
>>> it says unicast is used if a multicast message is fragmented, (I 
>>> think multicast still is used
>>>
>>> to send the fragments), this is what you mean with 2 different channels?
>>> (only one socket is used, BSRsock),
>> [Vu] Yes. Unicast is used to send fragmented messages. Here is the 
>> current logic in case of sending a large package:
>> Iterate over destinations { // mcm_pvt_process_svc_bcast_common() @ 
>> mds_c_sndrcv.c
>>  1) Fragment the package // mdtm_frag_and_send() @ mds_dt_tipc.c
>>  2) Unicast to a specific adest  // mdtm_sendto() @
>> mds_dt_tipc.c
>>  4) Continue with next adest
>> }
>>
>>> The problem stated is sending one large multicast message and then 
>>> several smaller multicast messages, have you checked the
>>>
>>> fragment re-assembly part of the common code?
>> [Vu] Yes. At the receive side, if msg is fragmented, mds will not 
>> forward to upper layer until all fragmented msgs are collected.
>> If the message is not fragmented, mds will transfer the msg to upper 
>> right away.
>>
>> I checked with TIPC guys here, and he said that TIPC does not 
>> guarantee the order if we send msgs in different channels (unicast vs mcast).
>>
>>> /BR Hans
>>>
>>>
>>> On 2019-04-24 13:06, thuan.tran wrote:
>>>> Summary: mds: support multicast fragmented messages [#3033] Review 
>>>> request for Ticket(s): 3033 Peer Reviewer(s): Hans, Minh, Vu Pull 
>>>> request to: *** LIST THE PERSON

Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-04-25 Thread Hans Nordebäck
Hi Vu and Thuan,

A few questions: is the text in the ticket description correct? E.g. it 
says unicast is used if a multicast message is fragmented (I think 
multicast is still used to send the fragments). Is this what you mean by 
2 different channels? (Only one socket is used, BSRsock.)

The problem stated is sending one large multicast message and then 
several smaller multicast messages. Have you checked the fragment 
re-assembly part of the common code?

/BR Hans


On 2019-04-24 13:06, thuan.tran wrote:
> Summary: mds: support multicast fragmented messages [#3033]
> Review request for Ticket(s): 3033
> Peer Reviewer(s): Hans, Minh, Vu
> Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
> Affected branch(es): develop
> Development branch: ticket-3033
> Base revision: 7916ac316e86478c621c8359cf2aca4886288a38
> Personal repository: git://git.code.sf.net/u/thuantr/review
>
> 
> Impacted area           Impact y/n
> -----------------------------------
>   Docs                    n
>   Build system            n
>   RPM/packaging           n
>   Configuration files     n
>   Startup scripts         n
>   SAF services            y
>   OpenSAF services        n
>   Core libraries          n
>   Samples                 n
>   Tests                   n
>   Other                   n
>
> NOTE: Patch(es) contain lines longer than 80 characters
>
> Comments (indicate scope for each "y" above):
> -
> N/A
>
> revision 568f09774f936506f5e05e03813fa572af0fe0d3
> Author:   thuan.tran 
> Date: Wed, 24 Apr 2019 17:54:25 +0700
>
> mds: support multicast fragmented messages [#3033]
>
> - A sender may broadcast big messages (> 65K) followed by small messages (< 65K).
> Current MDS loops over all destinations and unicasts the fragmented messages
> to each destination one by one, but multicasts non-fragmented messages to all
> destinations. Therefore, receivers may get messages in the wrong order:
> non-fragmented messages may arrive before fragmented messages.
> For example, this may lead to OUT OF ORDER for IMMNDs during IMMD sync.
> - Solution: multicast each fragment of a fragmented message to avoid
> reordering of arriving broadcast messages.
>
>
>
> Complete diffstat:
> --
>   src/mds/mds_c_sndrcv.c |   3 +-
>   src/mds/mds_dt_tipc.c  | 104 +++--
>   2 files changed, 40 insertions(+), 67 deletions(-)
>
>
> Testing Commands:
> -
> N/A
>
> Testing, Expected Results:
> --
> N/A
>
> Conditions of Submission:
> -
> N/A
>
> Arch        Built   Started   Linux distro
> -------------------------------------------
> mips        n       n
> mips64      n       n
> x86         n       n
> x86_64      y       y
> powerpc     n       n
> powerpc64   n       n
>
>
> Reviewer Checklist:
> ---
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>  that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>  (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>  Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>  like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>  cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>  too much content into a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>  Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>  commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>  of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>  comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured 

Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-04-25 Thread Hans Nordebäck
Hi Vu, you are right, my concern was the description of the problem, and 
it looks ok with your explanation. /Thanks Hans

On 2019-04-25 13:33, Vu Minh Nguyen wrote:
> Hi Hans,
>
> Probably you were looking at code that included this Thuan's patch.
>
> In legacy code, only mdtm_sendto() is called inside the function 
> mdtm_frag_and_send().
>
> Regards, Vu
>
>> -----Original Message-
>> From: Hans Nordebäck 
>> Sent: Thursday, April 25, 2019 6:10 PM
>> To: Vu Minh Nguyen ; Thuan Tran
>> ; Minh Hon Chau
>> 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast
>> fragmented messages [#3033] V3
>>
>>
>> Hi Vu,
>> It seems mdtm_mcast_sendto is used in mdtm_frag_and_send, at
>> MDS_SENDTYPE_BCAST/BR Hans
>> -Original Message-
>> From: Vu Minh Nguyen 
>> Sent: den 25 april 2019 12:20
>> To: Hans Nordebäck ; Thuan Tran
>> ; Minh Hon Chau
>> 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast
>> fragmented messages [#3033] V3
>>
>> Hi Hans,
>>
>> See my responses inline.
>>
>> Regards, Vu
>>
>>> -Original Message-
>>> From: Hans Nordebäck 
>>> Sent: Thursday, April 25, 2019 4:28 PM
>>> To: Thuan Tran ; Vu Minh Nguyen
>>> ; Minh Hon Chau
>> 
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 0/1] Review Request for mds: support multicast
>>> fragmented messages [#3033] V3
>>>
>>> Hi Vu and Thuan,
>>>
>>> a few question, is the text in the ticket description correct? E.g it
>>> says unicast is used if a multicast message is fragmented, (I think
>>> multicast still is used
>>>
>>> to send the fragments), this is what you mean with 2 different channels?
>>> (only one socket is used, BSRsock),
>> [Vu] Yes. Unicast is used to send fragmented messages. Here is the current
>> logic in case of sending a large package:
>> Iterate over destinations { // mcm_pvt_process_svc_bcast_common() @
>> mds_c_sndrcv.c
>>  1) Fragment the package // mdtm_frag_and_send() @ mds_dt_tipc.c
>>  2) Unicast to a specific adest  // mdtm_sendto() @
>> mds_dt_tipc.c
>>  4) Continue with next adest
>> }
>>
>>> The problem stated is sending one large multicast message and then
>>> several smaller multicast messages, have you checked the
>>>
>>> fragment re-assembly part of the common code?
>> [Vu] Yes. At the receive side, if msg is fragmented, mds will not forward to
>> upper layer until all fragmented msgs are collected.
>> If the message is not fragmented, mds will transfer the msg to upper right
>> away.
>>
>> I checked with TIPC guys here, and he said that TIPC does not guarantee the
>> order if we send msgs in different channels (unicast vs mcast).
>>
>>> /BR Hans
>>>
>>>
>>> On 2019-04-24 13:06, thuan.tran wrote:
>>>> Summary: mds: support multicast fragmented messages [#3033] Review
>>>> request for Ticket(s): 3033 Peer Reviewer(s): Hans, Minh, Vu Pull
>>>> request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected
>>>> branch(es): develop Development branch: ticket-3033 Base revision:
>>>> 7916ac316e86478c621c8359cf2aca4886288a38
>>>> Personal repository: git://git.code.sf.net/u/thuantr/review
>>>>
>>>> 
>>>> Impacted area           Impact y/n
>>>> -----------------------------------
>>>>   Docs                    n
>>>>   Build system            n
>>>>   RPM/packaging           n
>>>>   Configuration files     n
>>>>   Startup scripts         n
>>>>   SAF services            y
>>>>   OpenSAF services        n
>>>>   Core libraries          n
>>>>   Samples                 n
>>>>   Tests                   n
>>>>   Other                   n
>>>>
>>>> NOTE: Patch(es) contain lines longer than 80 characters
>>>>
>>>> Comments (indicate scope for each "y" above):
>>>> -
>>>> N/A
>>>>
>>>> revision 568f09774f936506f5e05e03813fa572af0fe0d3
>>>> Author:thuan.tran 
>>>> Date:  Wed, 24 Apr 2019 17:54:25 +0700
>>>>
>>>> mds: 

Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-04-25 Thread Hans Nordebäck

Hi Vu,
It seems mdtm_mcast_sendto is used in mdtm_frag_and_send in the
MDS_SENDTYPE_BCAST case.
/BR Hans
-Original Message-
From: Vu Minh Nguyen  
Sent: den 25 april 2019 12:20
To: Hans Nordebäck ; Thuan Tran 
; Minh Hon Chau 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Hans,

See my responses inline.

Regards, Vu

> -Original Message-
> From: Hans Nordebäck 
> Sent: Thursday, April 25, 2019 4:28 PM
> To: Thuan Tran ; Vu Minh Nguyen 
> ; Minh Hon Chau 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 0/1] Review Request for mds: support multicast 
> fragmented messages [#3033] V3
> 
> Hi Vu and Thuan,
> 
> A few questions. Is the text in the ticket description correct? E.g. it
> says unicast is used if a multicast message is fragmented (I think
> multicast is still used to send the fragments). Is this what you mean
> by 2 different channels? (Only one socket is used, BSRsock.)
[Vu] Yes. Unicast is used to send fragmented messages. Here is the current
logic in case of sending a large package:
Iterate over destinations { // mcm_pvt_process_svc_bcast_common() @ mds_c_sndrcv.c
1) Fragment the package // mdtm_frag_and_send() @ mds_dt_tipc.c
2) Unicast to a specific adest  // mdtm_sendto() @ mds_dt_tipc.c
3) Continue with next adest
}

> 
> The problem stated is sending one large multicast message and then
> several smaller multicast messages. Have you checked the fragment
> re-assembly part of the common code?
[Vu] Yes. On the receive side, if a msg is fragmented, mds will not forward it
to the upper layer until all fragments are collected.
If the message is not fragmented, mds passes it to the upper layer right away.

I checked with the TIPC guys here, and they said that TIPC does not guarantee
ordering between msgs sent on different channels (unicast vs mcast).
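
To make the ordering problem concrete, here is a minimal sketch of the receive-side behaviour described above. It assumes a hypothetical `Packet` layout and `Receiver` class (neither is the real MDS code): fragments are held until the last one arrives, while an unfragmented message is delivered at once, so a small message that arrives between two fragments overtakes the large one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical wire unit: seq identifies the message, more_frags is true
// for every fragment except the last. Not the real MDS header layout.
struct Packet {
  uint32_t seq;
  bool more_frags;
  std::string payload;
};

class Receiver {
 public:
  // Buffers fragments per seq; delivers only when the last one arrives.
  void OnPacket(const Packet& p) {
    partial_[p.seq] += p.payload;
    if (!p.more_frags) {
      delivered_.push_back(p.seq);
      partial_.erase(p.seq);
    }
  }
  const std::vector<uint32_t>& delivered() const { return delivered_; }

 private:
  std::map<uint32_t, std::string> partial_;
  std::vector<uint32_t> delivered_;
};

// Message 1 is large (two fragments), message 2 is small. If the small
// one arrives between the fragments, it reaches the upper layer first.
std::vector<uint32_t> InterleavedArrival() {
  Receiver r;
  r.OnPacket({1, true, "big-part-1"});
  r.OnPacket({2, false, "small"});
  r.OnPacket({1, false, "big-part-2"});
  return r.delivered();
}
```

Here `InterleavedArrival()` reports message 2 before message 1, which is the kind of disorder the ticket describes when fragments and small messages travel on channels with no mutual ordering guarantee.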

> 
> /BR Hans
> 
> 
> On 2019-04-24 13:06, thuan.tran wrote:
> > Summary: mds: support multicast fragmented messages [#3033]
> > Review request for Ticket(s): 3033
> > Peer Reviewer(s): Hans, Minh, Vu
> > Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
> > Affected branch(es): develop
> > Development branch: ticket-3033
> > Base revision: 7916ac316e86478c621c8359cf2aca4886288a38
> > Personal repository: git://git.code.sf.net/u/thuantr/review
> >
> > Impacted area           Impact y/n
> >
> >   Docs                    n
> >   Build system            n
> >   RPM/packaging           n
> >   Configuration files     n
> >   Startup scripts         n
> >   SAF services            y
> >   OpenSAF services        n
> >   Core libraries          n
> >   Samples                 n
> >   Tests                   n
> >   Other                   n
> >
> > NOTE: Patch(es) contain lines longer than 80 characters
> >
> > Comments (indicate scope for each "y" above):
> > -
> > N/A
> >
> > revision 568f09774f936506f5e05e03813fa572af0fe0d3
> > Author: thuan.tran 
> > Date:   Wed, 24 Apr 2019 17:54:25 +0700
> >
> > mds: support multicast fragmented messages [#3033]
> >
> > - Sender may send big broadcast messages (> 65K) and then small messages
> > (< 65K).
> > Current MDS just loops over all destinations and unicasts all fragmented
> > messages to the destinations one by one, but sends non-fragmented
> > messages by multicast to all destinations. Therefore, receivers may get
> > messages in the wrong order: non-fragmented messages may arrive before
> > fragmented messages.
> > For example, it may lead to OUT OF ORDER for IMMNDs during IMMD sync.
> > - Solution: support sending each fragment by multicast to avoid
> > disorder of arrived broadcast messages.
> >
> >
> >
> > Complete diffstat:
> > --
> >   src/mds/mds_c_sndrcv.c |   3 +-
> >   src/mds/mds_dt_tipc.c  | 104 
> > +++-
> -
> >   2 files changed, 40 insertions(+), 67 deletions(-)
> >
> >
> > Testing Commands:
> > -
> > N/A
> >
> > Testing, Expected Results:
> > --
> > N/A
> >
> > Conditions of Submission:
> > -
> > N/A
> >
> > Arch  Built StartedLinux distro
> > ---
> > mipsn  n
> > mips64  n  n

Re: [devel] [PATCH 1/1] osaf: ensure an error is returned if takeover_request fails [#3023]

2019-03-26 Thread Hans Nordebäck
ack, review only/Thanks HansN

-Original Message-
From: Gary Lee  
Sent: den 26 mars 2019 02:05
To: Minh Hon Chau ; Hans Nordebäck 

Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 1/1] osaf: ensure an error is returned if takeover_request 
fails [#3023]

if we cannot read the result of a takeover_request, ensure we return an error
---
 src/osaf/consensus/consensus.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc 
index cf307b3..480f7d2 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -433,6 +433,8 @@ SaAisErrorT Consensus::CreateTakeoverRequest(const std::string& current_owner,
  return rc;
   }
 
+  // in case takeover request cannot be read
+  rc = SA_AIS_ERR_FAILED_OPERATION;
   // wait up to max_takeover_retry seconds for request to be answered
   retries = 0;
   while (retries < max_takeover_retry_) {
--
2.7.4
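
The pattern in this patch (reset the return code to an error before a retry loop, so that "nothing could be read" is never reported as success) can be sketched as follows. This is only an illustration under stated assumptions: `Status`, `WaitForTakeoverResult`, the `read_result` callback and the "ACCEPTED" string are hypothetical names, not the actual consensus.cc code.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical status type standing in for SaAisErrorT.
enum class Status { kOk, kFailedOperation };

// Hypothetical stand-in for the wait loop in CreateTakeoverRequest().
// read_result models reading the takeover result from the key-value
// store: it returns true and fills *out only when a result is available.
Status WaitForTakeoverResult(
    const std::function<bool(std::string*)>& read_result, int max_retries) {
  // Start from failure, so a stale success code from an earlier step
  // cannot leak out if the result is never read.
  Status rc = Status::kFailedOperation;
  for (int retries = 0; retries < max_retries; ++retries) {
    std::string result;
    if (read_result(&result)) {
      // "ACCEPTED" is an assumed value used only for this sketch.
      rc = (result == "ACCEPTED") ? Status::kOk : Status::kFailedOperation;
      break;
    }
  }
  return rc;
}
```

If every read attempt fails, the function returns `kFailedOperation` rather than whatever `rc` held before the loop.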





Re: [devel] [PATCH 1/1] osaf: improve response time in etcd3.plugin [#3016]

2019-03-12 Thread Hans Nordebäck
Hi Gary,

ack, review only. /BR HansN

On 3/12/19 01:32, Gary Lee wrote:
> if the initial call to watch takeover request in etcd3.plugin
> is made when etcd has already been shut down (for example,
> when etcd is running locally and the node is being shut down),
> the plugin should return 0 with a fake takeover request to ensure
> rded shuts down promptly. Otherwise, it will keep calling
> watch, delaying node shutdown.
> ---
>   src/osaf/consensus/plugins/etcd3.plugin | 11 +--
>   1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/src/osaf/consensus/plugins/etcd3.plugin 
> b/src/osaf/consensus/plugins/etcd3.plugin
> index acccd98..d926885 100644
> --- a/src/osaf/consensus/plugins/etcd3.plugin
> +++ b/src/osaf/consensus/plugins/etcd3.plugin
> @@ -357,9 +357,16 @@ watch() {
>   return 0
> fi
>   done
> +  else
> +# etcd down?
> +if [ "$watch_key" == "$takeover_request" ]; then
> +  hostname=`cat $node_name_file`
> +  echo "$hostname SC-0 1000 UNDEFINED"
> +  return 0
> +else
> +  return 1
> +fi
> fi
> -
> -  return 1
>   }
>   
>   # argument parsing



Re: [devel] [PATCH 1/1] dtm: Fix dtm close socket due to duplication of adding node IP info [#2984]

2019-03-06 Thread Hans Nordebäck
Hi Canh,

ack, review only. I think it would be good to move the refactoring part
into a separate ticket though.

/BR Hans

On 12/18/18 08:25, Canh Van Truong wrote:
> During cluster start, one node (node 1) broadcasts an up msg to the other
> nodes. A remote node (node 2) gets this msg and connects to node 1
> (connect()). Similarly, node 1 connects to node 2 after node 2 broadcasts
> its up msg.
> Besides calling connect() to node 1, node 2 also adds the IP and ID info
> of node 1 to the database. But before that, node 2 may also accept the
> connection that comes from node 1, and the accept path also adds the node
> ID of node 1. So the node ID info of node 1 is added twice to the database
> in node 2. This causes the socket connection to be closed and the node to
> restart.
>
> The patch changes connection processing to retrieve the node from the
> database by node IP instead of node ID. This rejects both the duplicate
> connection establishment between the 2 nodes and the duplicate add of the
> node IP to the database.
> ---
>   src/dtm/dtmnd/dtm.h   | 11 --
>   src/dtm/dtmnd/dtm_inter_trans.cc  |  3 +-
>   src/dtm/dtmnd/dtm_node.cc |  2 +-
>   src/dtm/dtmnd/dtm_node_db.cc  | 79 
> ---
>   src/dtm/dtmnd/dtm_node_sockets.cc | 20 ++
>   5 files changed, 72 insertions(+), 43 deletions(-)
>
> diff --git a/src/dtm/dtmnd/dtm.h b/src/dtm/dtmnd/dtm.h
> index 28c811e65..a06b8f503 100644
> --- a/src/dtm/dtmnd/dtm.h
> +++ b/src/dtm/dtmnd/dtm.h
> @@ -45,6 +45,11 @@ typedef enum {
> DTM_MBX_MSG_TYPE = 5,
>   } MBX_POST_TYPES;
>   
> +typedef enum {
> +  DTM_NODE_ID_KEY_TYPE = 0,
> +  DTM_NODE_IP_KEY_TYPE = 2,
> +} KEY_TYPES;
> +
>   typedef struct dtm_rcv_msg_elem {
> void *next;
> MBX_POST_TYPES type;
> @@ -99,10 +104,10 @@ typedef struct dtm_snd_msg_elem {
>   
>   extern void node_discovery_process(void *arg);
>   extern uint32_t dtm_cb_init(DTM_INTERNODE_CB *dtms_cb);
> -extern DTM_NODE_DB *dtm_node_get_by_id(uint32_t nodeid);
> +extern DTM_NODE_DB *dtm_node_get(uint8_t *key, KEY_TYPES type);
>   extern DTM_NODE_DB *dtm_node_getnext_by_id(uint32_t node_id);
> -extern uint32_t dtm_node_add(DTM_NODE_DB *node, int i);
> -extern uint32_t dtm_node_delete(DTM_NODE_DB *nnode, int i);
> +extern uint32_t dtm_node_add(DTM_NODE_DB *node, KEY_TYPES type);
> +extern uint32_t dtm_node_delete(DTM_NODE_DB *nnode, KEY_TYPES type);
>   extern DTM_NODE_DB *dtm_node_new(const DTM_NODE_DB *new_node);
>   extern void dtm_print_config(DTM_INTERNODE_CB *config);
>   extern int dtm_read_config(DTM_INTERNODE_CB *config,
> diff --git a/src/dtm/dtmnd/dtm_inter_trans.cc 
> b/src/dtm/dtmnd/dtm_inter_trans.cc
> index 9d8335466..9b4194614 100644
> --- a/src/dtm/dtmnd/dtm_inter_trans.cc
> +++ b/src/dtm/dtmnd/dtm_inter_trans.cc
> @@ -235,9 +235,10 @@ static uint32_t dtm_internode_snd_msg_common(DTM_NODE_DB 
> *node, uint8_t *buffer,
>   uint32_t dtm_internode_snd_msg_to_node(uint8_t *buffer, uint16_t len,
>  NODE_ID node_id) {
> DTM_NODE_DB *node = nullptr;
> +  uint8_t *key = reinterpret_cast<uint8_t *>(&node_id);
>   
> TRACE_ENTER();
> -  node = dtm_node_get_by_id(node_id);
> +  node = dtm_node_get(key, DTM_NODE_ID_KEY_TYPE);
>   
> if (nullptr != node) {
>   if (NCSCC_RC_SUCCESS != dtm_internode_snd_msg_common(node, buffer, 
> len)) {
> diff --git a/src/dtm/dtmnd/dtm_node.cc b/src/dtm/dtmnd/dtm_node.cc
> index de2f94738..72506f262 100644
> --- a/src/dtm/dtmnd/dtm_node.cc
> +++ b/src/dtm/dtmnd/dtm_node.cc
> @@ -125,7 +125,7 @@ uint32_t dtm_process_node_info(DTM_INTERNODE_CB *dtms_cb, 
> DTM_NODE_DB *node,
> memcpy(node->node_name, data, nodename_len);
> node->node_name[nodename_len] = '\0';
> node->comm_status = true;
> -  if (dtm_node_add(node, 0) != NCSCC_RC_SUCCESS) {
> +  if (dtm_node_add(node, DTM_NODE_ID_KEY_TYPE) != NCSCC_RC_SUCCESS) {
>   LOG_ER(
>   "DTM:  A node already exists in the cluster with similar "
>   "configuration (possible duplicate IP address and/or node id), 
> please "
> diff --git a/src/dtm/dtmnd/dtm_node_db.cc b/src/dtm/dtmnd/dtm_node_db.cc
> index 1c9da4dac..1038f0918 100644
> --- a/src/dtm/dtmnd/dtm_node_db.cc
> +++ b/src/dtm/dtmnd/dtm_node_db.cc
> @@ -123,24 +123,49 @@ uint32_t dtm_cb_init(DTM_INTERNODE_CB *dtms_cb) {
>   }
>   
>   /**
> - * Retrieve node from node db by nodeid
> + * Retrieve node from node db
>*
> - * @param nodeid
> + * @param key
> + * @param i
>*
> - * @return NCSCC_RC_SUCCESS
> - * @return NCSCC_RC_FAILURE
> + * @return node
>*
>*/
> -DTM_NODE_DB *dtm_node_get_by_id(uint32_t nodeid) {
> +DTM_NODE_DB *dtm_node_get(uint8_t *key, KEY_TYPES type) {
> TRACE_ENTER();
> DTM_INTERNODE_CB *dtms_cb = dtms_gl_cb;
> +  DTM_NODE_DB *node = nullptr;
>   
> -  DTM_NODE_DB *node = reinterpret_cast<DTM_NODE_DB *>(ncs_patricia_tree_get(
> -  &dtms_cb->nodeid_tree, reinterpret_cast<uint8_t *>(&nodeid)));
> -  if (node != nullptr) 
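
The idea in the patch description (look the node up by IP so that a second connection between the same two nodes is rejected instead of being added twice) can be sketched like this. It is a minimal illustration, not the dtmnd implementation: `Node`, `NodeTable`, `AddByIp` and `GetByIp` are hypothetical names, and `std::map` is swapped in for the patricia trees the real code uses.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, reduced node record.
struct Node {
  uint32_t node_id;
  std::string node_ip;
};

// Hypothetical node table keyed by IP address.
class NodeTable {
 public:
  // Returns false if a node with this IP is already known, so the caller
  // can drop the second connection instead of adding the node twice.
  bool AddByIp(const Node& node) {
    return by_ip_.emplace(node.node_ip, node).second;
  }

  // Returns the stored node for this IP, or nullptr if there is none.
  const Node* GetByIp(const std::string& ip) const {
    auto it = by_ip_.find(ip);
    return it == by_ip_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, Node> by_ip_;
};
```

With this shape, whichever of connect() or accept() registers the peer first wins, and the other path sees the existing entry and can back off.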

Re: [devel] [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

2019-03-04 Thread Hans Nordebäck
Hi Vu,

ack, with one comment below. /BR HansN

On 3/1/19 10:07, Vu Minh Nguyen wrote:
> There is a dependency b/w svc_monitor_thread and spawn_services.
> The coredump happens when spawn_services is executed while
> the thread has not yet started. In this case, data is sent to the
> pipe but no one consumes it. When the thread later reads from the
> pipe, it gets unexpected data and the program crashes.
>
> This patch ensures things happen in the right order:
> svc_monitor_thread must be in ready state before spawn_services()
> is executed.
> ---
>   src/nid/nodeinit.cc | 34 +++---
>   1 file changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/src/nid/nodeinit.cc b/src/nid/nodeinit.cc
> index 5f15916b4..2e6a5cd05 100644
> --- a/src/nid/nodeinit.cc
> +++ b/src/nid/nodeinit.cc
> @@ -47,6 +47,8 @@
>*any notification.  *
>/
>   
> +#include "nid/nodeinit.h"
> +
>   #include 
>   #include 
>   #include 
> @@ -61,20 +63,18 @@
>   #include 
>   #include 
>   
> -#include "osaf/configmake.h"
> -#include "rde/agent/rda_papi.h"
> -#include "base/logtrace.h"
> -
> +#include 
>   #include 
>   #include 
>   #include 
>   #include 
>   
> +#include "osaf/configmake.h"
> +#include "rde/agent/rda_papi.h"
> +#include "base/logtrace.h"
>   #include "base/conf.h"
>   #include "base/osaf_poll.h"
>   #include "base/osaf_time.h"
> -
> -#include "nid/nodeinit.h"
>   #include "base/file_notify.h"
>   
>   #define SETSIG(sa, sig, fun, flags) \
> @@ -134,6 +134,7 @@ static int start_monitor_svc(const char *svc);
>   /* Data declarations for service monitoring */
>   static int svc_mon_fd = -1;
>   static int next_svc_fds_slot = 0;
> +static std::atomic<bool> svc_monitor_thread_ready{false};
>   
>   struct SAFServices {
> const std::string fifo_dir = PKGLOCALSTATEDIR;
> @@ -712,9 +713,9 @@ int32_t fork_daemon(NID_SPAWN_INFO *service, char *app, 
> char *args[],
>   
>   tmp_pid = getpid();
>   while (write(filedes[1], &tmp_pid, sizeof(int)) < 0) {
> -  if (errno == EINTR)
> +  if (errno == EINTR) {
>   continue;
> -  else if (errno == EPIPE) {
> +  } else if (errno == EPIPE) {
>   LOG_ER("Reader not available to return my PID");
> } else {
>   LOG_ER("Problem writing to pipe, err=%s", strerror(errno));
> @@ -1517,6 +1518,7 @@ void *svc_monitor_thread(void *fd) {
> next_svc_fds_slot++;
>   
> while (true) {
> +svc_monitor_thread_ready = true;
>   unsigned rc = osaf_poll(fds, next_svc_fds_slot, -1);
>   if (rc > 0) {
> // check if any monitored service has exit
> @@ -1529,9 +1531,9 @@ void *svc_monitor_thread(void *fd) {
>   
> if (fds[FD_SVC_MON_THR].revents & POLLIN) {
>   while (true) {
> -  read_rc = read(svc_mon_thr_fd, nid_name, NID_MAXSNAME);
> +  read_rc = recv(svc_mon_thr_fd, nid_name, NID_MAXSNAME, 
> MSG_DONTWAIT);
> if (read_rc == -1) {
> -if (errno == EINTR) {
> +if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) {

[HansN] should be

  if (errno == EINTR)
    continue;
  if (errno == EAGAIN || errno == EWOULDBLOCK)
    break;


> continue;
>   } else {
> LOG_ER("Failed to read on socketpair descriptor: %s",
> @@ -1574,7 +1576,7 @@ uint32_t create_svc_monitor_thread(void) {
>   
> TRACE_ENTER();
>   
> -  if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, s_pair) == -1) {
> +  if (socketpair(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0, s_pair) == -1) {
>   LOG_ER("socketpair FAILED: %s", strerror(errno));
>   return NCSCC_RC_FAILURE;
> }
> @@ -1655,6 +1657,16 @@ int main(int argc, char *argv[]) {
>   exit(EXIT_FAILURE);
> }
>   
> +  // Waiting until svc_monitor_thread is up and in ready state.
> +  unsigned no_repeat = 0;
> +  while (svc_monitor_thread_ready == false && no_repeat < 100) {
> +osaf_nanosleep();
> +no_repeat++;
> +  }
> +
> +  osafassert(svc_monitor_thread_ready);
> +  LOG_NO("svc_monitor_thread is up and in ready state");
> +
> if (parse_nodeinit_conf(sbuf) != NCSCC_RC_SUCCESS) {
>   LOG_ER("Failed to parse file %s. Exiting", sbuf);
>   exit(EXIT_FAILURE);
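
The errno handling suggested in the inline comment can be sketched as a small drain loop. This is an illustration under stated assumptions, not the nodeinit.cc code: `DrainNames` and `Demo` are hypothetical helpers, the buffer size is arbitrary (the patch uses NID_MAXSNAME), and the service names are made up. EINTR means "retry", EAGAIN/EWOULDBLOCK means "queue empty, go back to poll", anything else is a real error.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>

// Drains all datagrams currently queued on fd without blocking.
// Returns false only on a real receive error.
bool DrainNames(int fd, std::vector<std::string>* names) {
  char buf[64];  // arbitrary size for this sketch
  while (true) {
    ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
    if (n == -1) {
      if (errno == EINTR) continue;                        // interrupted: retry
      if (errno == EAGAIN || errno == EWOULDBLOCK) break;  // nothing left
      return false;                                        // real error
    }
    names->emplace_back(buf, static_cast<std::size_t>(n));
  }
  return true;
}

// Sends two names over an AF_UNIX datagram socketpair and drains them.
std::vector<std::string> Demo() {
  int s[2];
  std::vector<std::string> names;
  if (socketpair(AF_UNIX, SOCK_DGRAM, 0, s) == -1) return names;
  send(s[0], "svc-a", 5, 0);
  send(s[0], "svc-b", 5, 0);
  DrainNames(s[1], &names);
  close(s[0]);
  close(s[1]);
  return names;
}
```

Treating EAGAIN like EINTR (that is, `continue`) would spin on an empty socket, which is why the two cases are split.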



Re: [devel] [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

2019-02-28 Thread Hans Nordebäck
Hi Vu,
fine. Perhaps also change the static bool svc_monitor_thread_running = false
to std::atomic<bool>?
/BR Hans
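
A minimal sketch of the suggestion, assuming hypothetical names (`monitor_ready`, `MonitorThread`, `WaitUntilReady`, `StartAndWait`) rather than the nodeinit.cc ones: the flag is a `std::atomic<bool>` so the write in one thread and the read in another are not a data race, and the wait is bounded so a thread that never starts cannot hang the caller forever.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

static std::atomic<bool> monitor_ready{false};

// Stand-in for the monitoring thread: reports readiness once, at the
// point where the real thread would enter its poll loop.
void MonitorThread() { monitor_ready.store(true); }

// Polls the flag up to max_tries times, sleeping `step` between tries.
// Returns true once the thread has reported ready.
bool WaitUntilReady(int max_tries, std::chrono::milliseconds step) {
  for (int i = 0; i < max_tries && !monitor_ready.load(); ++i) {
    std::this_thread::sleep_for(step);
  }
  return monitor_ready.load();
}

// Starts the thread, then waits for it before doing anything that
// depends on it (spawning services, in the case discussed here).
bool StartAndWait() {
  std::thread t(MonitorThread);
  bool ready = WaitUntilReady(100, std::chrono::milliseconds(10));
  t.join();
  return ready;
}
```

A plain `bool` may appear to work, but the compiler is allowed to hoist the read out of the wait loop; the atomic makes the intent explicit.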

From: Vu Minh Nguyen 
Sent: den 28 februari 2019 09:30
To: Hans Nordebäck ; Gary Lee 

Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

Thanks Hans. I will send the V2 for these updates.

Regards, Vu

From: Hans Nordebäck <hans.nordeb...@ericsson.com>
Sent: Thursday, February 28, 2019 2:16 PM
To: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>; Gary Lee <gary@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]


Hi Vu,

you can keep your patch for the ready state, but also change SOCK_STREAM to
SOCK_DGRAM and change the read(svc_mon_thr_fd, nid_name, NID_MAXSNAME) in
svc_monitor_thread to recv(svc_mon_thr_fd, nid_name, NID_MAXSNAME,
MSG_DONTWAIT) and also handle EAGAIN and EWOULDBLOCK. Then only one nid_name
per read/recv will be given instead of several nid_names as in the
SOCK_STREAM case.

/BR Hans


On 2/28/19 05:30, Vu Minh Nguyen wrote:

Hi Hans,

Thanks for your comment.

But I have a concern that the service-monitoring function may not fully work
if a service crashes before the svc_monitor_thread goes to ready state.

Is it mandatory for the monitoring thread to enter ready state before spawning
SAF services?

Regards, Vu

-Original Message-
From: Hans Nordebäck <hans.nordeb...@ericsson.com>
Sent: Wednesday, February 27, 2019 8:23 PM
To: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>; Gary Lee <gary@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>
Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

Hi Vu,
I discussed this a bit with Anders; it should likely work if the socketpair is
changed from SOCK_STREAM to socketpair(AF_UNIX, SOCK_DGRAM, ...).
/BR Hans

-Original Message-
From: Hans Nordebäck
Sent: den 27 februari 2019 11:55
To: 'Vu Minh Nguyen' <vu.m.ngu...@dektech.com.au>; Gary Lee <gary@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>
Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

Hi Vu,
ack, code review only/Thanks HansN

-Original Message-
From: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>
Sent: den 27 februari 2019 11:48
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Gary Lee <gary@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>
Subject: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

There is a dependency b/w svc_monitor_thread and spawn_services.
The coredump happens when spawn_services is executed while the thread has not
yet started. In this case, data is sent to the pipe but no one consumes it.
Later on, reading data from the pipe yields unexpected data and crashes the
program.

This patch ensures the order: svc_monitor_thread must be in ready state before
spawn_services() is executed.
---
 src/nid/nodeinit.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/nid/nodeinit.cc b/src/nid/nodeinit.cc
index 5f15916b4..b4945b05c 100644
--- a/src/nid/nodeinit.cc
+++ b/src/nid/nodeinit.cc
@@ -134,6 +134,7 @@ static int start_monitor_svc(const char *svc);
 /* Data declarations for service monitoring */
 static int svc_mon_fd = -1;
 static int next_svc_fds_slot = 0;
+static bool svc_monitor_thread_running = false;
 
 struct SAFServices {
   const std::string fifo_dir = PKGLOCALSTATEDIR;
@@ -1517,6 +1518,7 @@ void *svc_monitor_thread(void *fd) {
   next_svc_fds_slot++;
 
   while (true) {
+    svc_monitor_thread_running = true;
     unsigned rc = osaf_poll(fds, next_svc_fds_slot, -1);
     if (rc > 0) {
       // check if any monitored service has exit
@@ -1655,6 +1657,15 @@ int main(int argc, char *argv[]) {
     exit(EXIT_FAILURE);
   }
 
+  // Waiting until svc_monitor_thread is up and in ready state.
+  // If spawn_services runs before the thread is in ready state,
+  // receive side of the pipe s_pair will get unexpected data and
+  // may crash the process.
+  while (svc_monitor_thread_running == false) {
+    usleep(100);
+  }
+
+  LOG_NO("svc_monitor_thread is up and in ready state");
   if (parse_nodeinit_conf(sbuf) != NCSCC_RC_SUCCESS) {
     LOG_ER("Failed to parse file %s. Exiting", sbuf);
     exit(EXIT_FAILURE);
--
2.19.2


Re: [devel] [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

2019-02-27 Thread Hans Nordebäck
Hi Vu,

you can keep your patch for the ready state, but also change SOCK_STREAM to
SOCK_DGRAM and change the read(svc_mon_thr_fd, nid_name, NID_MAXSNAME) in
svc_monitor_thread to recv(svc_mon_thr_fd, nid_name, NID_MAXSNAME,
MSG_DONTWAIT) and also handle EAGAIN and EWOULDBLOCK. Then only one nid_name
per read/recv will be given instead of several nid_names as in the
SOCK_STREAM case.

/BR Hans
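
The difference between the two socket types can be seen with a small sketch, assuming a hypothetical helper `FirstRead` and made-up service names: two short names are written back to back, then a single recv() is performed. With SOCK_DGRAM each send is one message, so the first recv() yields exactly the first name. With SOCK_STREAM there are no message boundaries, so one recv() may return both names run together.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <string>

// Writes two names to one end of an AF_UNIX socketpair of the given type
// and returns whatever a single recv() on the other end yields
// (an empty string on error).
std::string FirstRead(int sock_type) {
  int s[2];
  if (socketpair(AF_UNIX, sock_type, 0, s) == -1) return "";
  send(s[0], "svc-a", 5, 0);
  send(s[0], "svc-b", 5, 0);
  char buf[64];
  ssize_t n = recv(s[1], buf, sizeof(buf), 0);
  close(s[0]);
  close(s[1]);
  return n > 0 ? std::string(buf, static_cast<std::size_t>(n)) : "";
}
```

Calling `FirstRead(SOCK_DGRAM)` gives just the first name; `FirstRead(SOCK_STREAM)` is allowed to give more than one name in a single read, which is the situation a fixed-size name parser cannot handle.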


On 2/28/19 05:30, Vu Minh Nguyen wrote:

Hi Hans,

Thanks for your comment.

But I have a concern that the service-monitoring function may not fully work
if a service crashes before the svc_monitor_thread goes to ready state.

Is it mandatory for the monitoring thread to enter ready state before spawning
SAF services?

Regards, Vu

-Original Message-
From: Hans Nordebäck <hans.nordeb...@ericsson.com>
Sent: Wednesday, February 27, 2019 8:23 PM
To: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>; Gary Lee <gary@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>
Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

Hi Vu,
I discussed this a bit with Anders; it should likely work if the socketpair is
changed from SOCK_STREAM to socketpair(AF_UNIX, SOCK_DGRAM, ...).
/BR Hans

-Original Message-
From: Hans Nordebäck
Sent: den 27 februari 2019 11:55
To: 'Vu Minh Nguyen' <vu.m.ngu...@dektech.com.au>; Gary Lee <gary@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>
Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

Hi Vu,
ack, code review only/Thanks HansN

-Original Message-
From: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>
Sent: den 27 februari 2019 11:48
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Gary Lee <gary@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>
Subject: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

There is a dependency b/w svc_monitor_thread and spawn_services.
The coredump happens when spawn_services is executed while the thread has not
yet started. In this case, data is sent to the pipe but no one consumes it.
Later on, reading data from the pipe yields unexpected data and crashes the
program.

This patch ensures the order: svc_monitor_thread must be in ready state before
spawn_services() is executed.
---
 src/nid/nodeinit.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/nid/nodeinit.cc b/src/nid/nodeinit.cc
index 5f15916b4..b4945b05c 100644
--- a/src/nid/nodeinit.cc
+++ b/src/nid/nodeinit.cc
@@ -134,6 +134,7 @@ static int start_monitor_svc(const char *svc);
 /* Data declarations for service monitoring */
 static int svc_mon_fd = -1;
 static int next_svc_fds_slot = 0;
+static bool svc_monitor_thread_running = false;
 
 struct SAFServices {
   const std::string fifo_dir = PKGLOCALSTATEDIR;
@@ -1517,6 +1518,7 @@ void *svc_monitor_thread(void *fd) {
   next_svc_fds_slot++;
 
   while (true) {
+    svc_monitor_thread_running = true;
     unsigned rc = osaf_poll(fds, next_svc_fds_slot, -1);
     if (rc > 0) {
       // check if any monitored service has exit
@@ -1655,6 +1657,15 @@ int main(int argc, char *argv[]) {
     exit(EXIT_FAILURE);
   }
 
+  // Waiting until svc_monitor_thread is up and in ready state.
+  // If spawn_services runs before the thread is in ready state,
+  // receive side of the pipe s_pair will get unexpected data and
+  // may crash the process.
+  while (svc_monitor_thread_running == false) {
+    usleep(100);
+  }
+
+  LOG_NO("svc_monitor_thread is up and in ready state");
   if (parse_nodeinit_conf(sbuf) != NCSCC_RC_SUCCESS) {
     LOG_ER("Failed to parse file %s. Exiting", sbuf);
     exit(EXIT_FAILURE);
--
2.19.2



Re: [devel] [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

2019-02-27 Thread Hans Nordebäck
Hi Vu,
I discussed this a bit with Anders; it should likely work if the socketpair is
changed from SOCK_STREAM to socketpair(AF_UNIX, SOCK_DGRAM, ...).
/BR Hans

-Original Message-
From: Hans Nordebäck 
Sent: den 27 februari 2019 11:55
To: 'Vu Minh Nguyen' ; Gary Lee 

Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

Hi Vu,
ack, code review only/Thanks HansN

-Original Message-
From: Vu Minh Nguyen 
Sent: den 27 februari 2019 11:48
To: Hans Nordebäck ; Gary Lee 

Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

There is a dependency b/w svc_monitor_thread and spawn_services.
The coredump happens when spawn_services is executed while the thread has not
yet started. In this case, data is sent to the pipe but no one consumes it.
Later on, reading data from the pipe yields unexpected data and crashes the
program.

This patch ensures the order: svc_monitor_thread must be in ready state before
spawn_services() is executed.
---
 src/nid/nodeinit.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/nid/nodeinit.cc b/src/nid/nodeinit.cc
index 5f15916b4..b4945b05c 100644
--- a/src/nid/nodeinit.cc
+++ b/src/nid/nodeinit.cc
@@ -134,6 +134,7 @@ static int start_monitor_svc(const char *svc);
 /* Data declarations for service monitoring */
 static int svc_mon_fd = -1;
 static int next_svc_fds_slot = 0;
+static bool svc_monitor_thread_running = false;
 
 struct SAFServices {
   const std::string fifo_dir = PKGLOCALSTATEDIR;
@@ -1517,6 +1518,7 @@ void *svc_monitor_thread(void *fd) {
   next_svc_fds_slot++;
 
   while (true) {
+    svc_monitor_thread_running = true;
     unsigned rc = osaf_poll(fds, next_svc_fds_slot, -1);
     if (rc > 0) {
       // check if any monitored service has exit
@@ -1655,6 +1657,15 @@ int main(int argc, char *argv[]) {
     exit(EXIT_FAILURE);
   }
 
+  // Waiting until svc_monitor_thread is up and in ready state.
+  // If spawn_services runs before the thread is in ready state,
+  // receive side of the pipe s_pair will get unexpected data and
+  // may crash the process.
+  while (svc_monitor_thread_running == false) {
+    usleep(100);
+  }
+
+  LOG_NO("svc_monitor_thread is up and in ready state");
   if (parse_nodeinit_conf(sbuf) != NCSCC_RC_SUCCESS) {
     LOG_ER("Failed to parse file %s. Exiting", sbuf);
     exit(EXIT_FAILURE);
--
2.19.2





Re: [devel] [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

2019-02-27 Thread Hans Nordebäck
Hi Vu,
ack, code review only/Thanks HansN

-Original Message-
From: Vu Minh Nguyen  
Sent: den 27 februari 2019 11:48
To: Hans Nordebäck ; Gary Lee 

Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013]

There is a dependency b/w svc_monitor_thread and spawn_services.
The coredump happens when spawn_services is executed while the thread has not
yet started. In this case, data is sent to the pipe but no one consumes it.
Later on, reading data from the pipe yields unexpected data and crashes the
program.

This patch ensures the order: svc_monitor_thread must be in ready state before
spawn_services() is executed.
---
 src/nid/nodeinit.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/nid/nodeinit.cc b/src/nid/nodeinit.cc
index 5f15916b4..b4945b05c 100644
--- a/src/nid/nodeinit.cc
+++ b/src/nid/nodeinit.cc
@@ -134,6 +134,7 @@ static int start_monitor_svc(const char *svc);
 /* Data declarations for service monitoring */
 static int svc_mon_fd = -1;
 static int next_svc_fds_slot = 0;
+static bool svc_monitor_thread_running = false;
 
 struct SAFServices {
   const std::string fifo_dir = PKGLOCALSTATEDIR;
@@ -1517,6 +1518,7 @@ void *svc_monitor_thread(void *fd) {
   next_svc_fds_slot++;
 
   while (true) {
+    svc_monitor_thread_running = true;
     unsigned rc = osaf_poll(fds, next_svc_fds_slot, -1);
     if (rc > 0) {
       // check if any monitored service has exit
@@ -1655,6 +1657,15 @@ int main(int argc, char *argv[]) {
     exit(EXIT_FAILURE);
   }
 
+  // Waiting until svc_monitor_thread is up and in ready state.
+  // If spawn_services runs before the thread is in ready state,
+  // receive side of the pipe s_pair will get unexpected data and
+  // may crash the process.
+  while (svc_monitor_thread_running == false) {
+    usleep(100);
+  }
+
+  LOG_NO("svc_monitor_thread is up and in ready state");
   if (parse_nodeinit_conf(sbuf) != NCSCC_RC_SUCCESS) {
     LOG_ER("Failed to parse file %s. Exiting", sbuf);
     exit(EXIT_FAILURE);
--
2.19.2





Re: [devel] [PATCH 1/1] base: fix warnings [#3006]

2019-02-26 Thread Hans Nordebäck
Hi Gary,

ack, review only/Thanks HansN

On 2/9/19 04:11, Gary Lee wrote:
> fix warnings about unused variables and add SA_RESTART
> ---
>   src/base/daemon.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/base/daemon.c b/src/base/daemon.c
> index 2f7f37f..e24eaaa 100644
> --- a/src/base/daemon.c
> +++ b/src/base/daemon.c
> @@ -539,6 +539,9 @@ static void sigterm_handler(int signum, siginfo_t *info, 
> void *ptr)
>*/
>   static void sighup_handler(int signum, siginfo_t *info, void *ptr)
>   {
> + (void)signum;
> + (void)info;
> + (void)ptr;
>   ncs_sel_obj_ind(_sel_obj);
>   }
>   
> @@ -605,7 +608,7 @@ NCS_SEL_OBJ* daemon_sighup_install(int *hangup_fd)
>   
>   sigemptyset(&act.sa_mask);
>   act.sa_sigaction = sighup_handler;
> - act.sa_flags = SA_SIGINFO;
> + act.sa_flags = SA_RESTART | SA_SIGINFO;
>   if (sigaction(SIGHUP, &act, NULL) < 0) {
>   syslog(LOG_ERR, "sigaction HUP failed: %s", strerror(errno));
>   exit(EXIT_FAILURE);
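
For reference, the two changes in this patch (void casts for unused handler parameters, and SA_RESTART on the SIGHUP handler) look like this in a minimal, standalone form. The names `hup_count`, `OnHup` and `InstallHupHandler` are assumptions for the sketch and are not taken from daemon.c.

```cpp
#include <signal.h>

#include <cassert>

// Counter touched by the handler; sig_atomic_t is the type that is safe
// to write from a signal handler.
static volatile sig_atomic_t hup_count = 0;

// Three-argument handler, as selected by SA_SIGINFO. The casts to void
// silence unused-parameter warnings without changing behaviour.
static void OnHup(int signum, siginfo_t* info, void* ptr) {
  (void)signum;
  (void)info;
  (void)ptr;
  hup_count = hup_count + 1;
}

// Installs the SIGHUP handler. SA_RESTART asks the kernel to restart
// most interrupted system calls instead of failing them with EINTR.
bool InstallHupHandler() {
  struct sigaction act = {};
  sigemptyset(&act.sa_mask);
  act.sa_sigaction = OnHup;
  act.sa_flags = SA_RESTART | SA_SIGINFO;
  return sigaction(SIGHUP, &act, nullptr) == 0;
}
```

Without SA_RESTART, a SIGHUP that arrives while the process is blocked in a call such as read() makes that call fail with EINTR, and every call site then needs its own retry loop.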



Re: [devel] [PATCH 1/2] fmd: improve failover response time [#3008]

2019-02-20 Thread Hans Nordebäck
Hi Gary,

ack, review only/BR HansN

On 2/19/19 05:10, Gary Lee wrote:
> Improve failover response time if split brain prevention is enabled
> but FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 0.
>
> Also, return immediately if node promotion fails to avoid
> sending active role to RDA.
> ---
>   src/fm/fmd/fm_rda.cc | 14 +-
>   1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc
> index 504757c..d3063ba 100644
> --- a/src/fm/fmd/fm_rda.cc
> +++ b/src/fm/fmd/fm_rda.cc
> @@ -88,17 +88,20 @@ uint32_t fm_rda_set_role(FM_CB *fm_cb, PCS_RDA_ROLE role) 
> {
>   
> Consensus consensus_service;
> if (consensus_service.IsEnabled() == true) {
> -// Allow topology events to be processed first. The MDS thread may
> -// be processing MDS down events and updating cluster_size concurrently.
> -// We need cluster_size to be as accurate as possible, without waiting
> -// too long for node down events.
> -std::this_thread::sleep_for(std::chrono::seconds(4));
> +if (consensus_service.PrioritisePartitionSize() == true) {
> +  // Allow topology events to be processed first. The MDS thread may
> +  // be processing MDS down events and updating cluster_size 
> concurrently.
> +  // We need cluster_size to be as accurate as possible, without waiting
> +  // too long for node down events.
> +  std::this_thread::sleep_for(std::chrono::seconds(4));
> +}
>   
>   rc = consensus_service.PromoteThisNode(true, fm_cb->cluster_size);
>   if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
> LOG_ER("Unable to set active controller in consensus service");
> opensaf_quick_reboot("Unable to set active controller "
> "in consensus service");
> +  return NCSCC_RC_FAILURE;
>   } else if (rc == SA_AIS_ERR_EXIST) {
> // @todo if we don't reboot, we don't seem to recover from this. Can 
> we
> // improve?
> @@ -107,6 +110,7 @@ uint32_t fm_rda_set_role(FM_CB *fm_cb, PCS_RDA_ROLE role) 
> {
> "cluster?");
> opensaf_quick_reboot("A controller is already active. We were 
> separated "
>  "from the cluster?");
> +  return NCSCC_RC_FAILURE;
>   }
> }
>   



Re: [devel] [PATCH 2/2] rded: do not send SUCCESS to main thread [#3008]

2019-02-20 Thread Hans Nordebäck
Hi Gary,

A question: why were the returns added? /BR HansN

On 2/19/19 05:10, Gary Lee wrote:
> do not send RDE_MSG_ACTIVE_PROMOTION_SUCCESS to
> main thread if lock cannot be obtained
> ---
>   src/rde/rded/role.cc | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
> index 06e93c6..3effc25 100644
> --- a/src/rde/rded/role.cc
> +++ b/src/rde/rded/role.cc
> @@ -114,6 +114,7 @@ void Role::PromoteNode(const uint64_t cluster_size,
>   LOG_ER("Unable to set active controller in consensus service");
>   opensaf_quick_reboot("Unable to set active controller "
>   "in consensus service");
> +return;
> }
>   
> RDE_CONTROL_BLOCK* cb = rde_get_control_block();
> @@ -135,6 +136,7 @@ void Role::PromoteNode(const uint64_t cluster_size,
>   LOG_ER("Unable to set active controller in consensus service");
>   opensaf_quick_reboot("Unable to set active controller in "
>   "consensus service");
> +return;
> }
> std::this_thread::sleep_for(std::chrono::seconds(1));
>   }



Re: [devel] [PATCH 1/1] clm: Incorrect encode/decode time_super [#3007]

2019-02-20 Thread Hans Nordebäck
Hi Thanh,

ack, review only/Thanks HansN

On 2/20/19 06:19, Thanh Nguyen wrote:
> Change the encoding of time_super to use 64 bits instead of 32 bits.
> ---
>   src/clm/clmd/clms_mds.cc | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/clm/clmd/clms_mds.cc b/src/clm/clmd/clms_mds.cc
> index 833d18c..5a77885 100644
> --- a/src/clm/clmd/clms_mds.cc
> +++ b/src/clm/clmd/clms_mds.cc
> @@ -542,7 +542,7 @@ static uint32_t clms_enc_track_cbk_msg(NCS_UBAID *uba, 
> CLMSV_MSG *msg) {
>   TRACE("p8 nullptr!!!");
>   return 0;
> }
> -  ncs_encode_32bit(, track->time_super);
> +  ncs_encode_64bit(, track->time_super);
> ncs_enc_claim_space(uba, 8);
> total_bytes += 8;
>   
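
Why the one-line change matters: the surrounding code claims 8 bytes
(ncs_enc_claim_space(uba, 8)), so a 32-bit encode would presumably carry
only the low half of the value. A minimal shell sketch of that effect
follows. The function names and the sample timestamp are illustrative
assumptions, not OpenSAF APIs, and the sketch assumes a shell with 64-bit
arithmetic.

```shell
# Illustrative stand-ins only; these are not the OpenSAF encode routines.

# Keep only the low 32 bits, as a 32-bit encoder would.
encode_32bit() {
  echo $(( $1 & 0xFFFFFFFF ))
}

# Split a 64-bit value into its high and low 32-bit words.
encode_64bit() {
  echo "$(( ($1 >> 32) & 0xFFFFFFFF )) $(( $1 & 0xFFFFFFFF ))"
}

# Rebuild the 64-bit value from its two words.
decode_64bit() {
  echo $(( ($1 << 32) | $2 ))
}

# Hypothetical nanosecond timestamp, too large to fit in 32 bits.
time_super=1550643540000000000

echo "32-bit encode keeps: $(encode_32bit "$time_super")"
set -- $(encode_64bit "$time_super")
echo "64-bit round trip:   $(decode_64bit "$1" "$2")"
```

Only the 64-bit path reproduces the original value; the 32-bit path
silently loses the upper word.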



Re: [devel] [PATCH 1/1] osaf: Call opensaf_quick_reboot if failed to set active role in consensus [#3001]

2019-02-15 Thread Hans Nordebäck
Hi Minh,

ack, review only/Thanks HansN

On 2/15/19 10:51, Minh Chau wrote:
> ---
>   src/fm/fmd/fm_rda.cc | 4 ++--
>   src/rde/rded/rde_main.cc | 8 +++-
>   src/rde/rded/role.cc | 8 
>   3 files changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc
> index 028bfa3..0aa5a3d 100644
> --- a/src/fm/fmd/fm_rda.cc
> +++ b/src/fm/fmd/fm_rda.cc
> @@ -97,8 +97,8 @@ uint32_t fm_rda_set_role(FM_CB *fm_cb, PCS_RDA_ROLE role) {
>   rc = consensus_service.PromoteThisNode(true, fm_cb->cluster_size);
>   if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
> LOG_ER("Unable to set active controller in consensus service");
> -  opensaf_reboot(0, nullptr,
> - "Unable to set active controller in consensus service");
> +  opensaf_quick_reboot("Unable to set active controller"
> +  "in consensus service");
>   } else if (rc == SA_AIS_ERR_EXIST) {
> // @todo if we don't reboot, we don't seem to recover from this. Can 
> we
> // improve?
> diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
> index bb17133..3487f0b 100644
> --- a/src/rde/rded/rde_main.cc
> +++ b/src/rde/rded/rde_main.cc
> @@ -203,9 +203,8 @@ static void handle_mbx_event() {
>   if (state == Consensus::TakeoverState::ACCEPTED) {
> LOG_NO("Accepted takeover request");
> if (consensus_service.IsRemoteFencingEnabled() == false) {
> -opensaf_reboot(0, nullptr,
> -   "Another controller is taking over the active 
> role. "
> -   "Rebooting this node");
> +opensaf_quick_reboot("Another controller is taking over"
> +"the active role. Rebooting this node");
> }
>   } else if (state == Consensus::TakeoverState::UNDEFINED) {
> bool fencing_required = true;
> @@ -233,8 +232,7 @@ static void handle_mbx_event() {
> if (fencing_required == true) {
>   LOG_NO("Lost connectivity to consensus service");
>   if (consensus_service.IsRemoteFencingEnabled() == false) {
> -opensaf_reboot(0, nullptr,
> -   "Lost connectivity to consensus service. "
> +opensaf_quick_reboot("Lost connectivity to consensus 
> service. "
>  "Rebooting this node");
>   }
> }
> diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
> index 499f7c8..b2b9b49 100644
> --- a/src/rde/rded/role.cc
> +++ b/src/rde/rded/role.cc
> @@ -112,8 +112,8 @@ void Role::PromoteNode(const uint64_t cluster_size,
>   promotion_pending = true;
> } else if (rc != SA_AIS_OK) {
>   LOG_ER("Unable to set active controller in consensus service");
> -opensaf_reboot(0, nullptr,
> -   "Unable to set active controller in consensus service");
> +opensaf_quick_reboot("Unable to set active controller"
> +"in consensus service");
> }
>   
> RDE_CONTROL_BLOCK* cb = rde_get_control_block();
> @@ -133,8 +133,8 @@ void Role::PromoteNode(const uint64_t cluster_size,
> rc = consensus_service.PromoteThisNode(true, cluster_size);
> if (rc == SA_AIS_ERR_EXIST) {
>   LOG_ER("Unable to set active controller in consensus service");
> -opensaf_reboot(0, nullptr,
> -   "Unable to set active controller in consensus 
> service");
> +opensaf_quick_reboot("Unable to set active controller in"
> +"consensus service");
> }
> std::this_thread::sleep_for(std::chrono::seconds(1));
>   }



Re: [devel] [PATCH 1/1] osaf: do quick local node reboot when split network [#3001]

2019-01-29 Thread Hans Nordebäck
Hi Vu, see my comment below/Hans

On 1/28/19 10:59, Vu Minh Nguyen wrote:
> Hi Hans,
>
> Thanks for your comments. See my comment inline. Thanks
>
> Regards, Vu
>
>> -Original Message-
>> From: Hans Nordebäck 
>> Sent: Monday, January 28, 2019 4:37 PM
>> To: Hans Nordebäck ; Vu Minh Nguyen
>> ; Gary Lee ;
>> Minh Hon Chau 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [devel] [PATCH 1/1] osaf: do quick local node reboot when
> split
>> network [#3001]
>>
>> Hi Vu,
>> See one more comment below/Thanks HansN
>>
>> -Original Message-
>> From: Hans Nordebäck 
>> Sent: den 28 januari 2019 10:15
>> To: Vu Minh Nguyen ; Gary Lee
>> ; Minh Hon Chau 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [devel] [PATCH 1/1] osaf: do quick local node reboot when
> split
>> network [#3001]
>>
>> Hi Vu, ack review only. Two comments below/Thanks HansN
>>
>> On 1/25/19 12:34, Vu Minh Nguyen wrote:
>>> ---
>>>scripts/opensaf_reboot   | 33 +++--
>>>src/amf/amfd/ndproc.cc   |  4 ++--
>>>src/base/ncssysf_def.h   |  6 ++
>>>src/base/sysf_def.c  | 10 ++
>>>src/fm/fmd/fm_main.cc|  6 +++---
>>>src/fm/fmd/fm_rda.cc |  5 ++---
>>>src/rde/rded/rde_main.cc |  6 ++
>>>7 files changed, 52 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot index
>>> 727272e1d..2f7a7daeb 100644
>>> --- a/scripts/opensaf_reboot
>>> +++ b/scripts/opensaf_reboot
>>> @@ -31,7 +31,7 @@ export
>> LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
>>># Node fencing: OpenSAF cannot reboot a node when there's no CLM
>> node to
>>># PLM EE mapping in the information model. In such cases rebooting
>>> would be done -# through proprietary mechanisms, i.e. not through PLM.
>>> Node_id is (the only
>>> +# through proprietary mechanisms, i.e. not through PLM. Node_id is
>>> +(the only
>>># entity) at the disposal of such a mechanism.
>>>
>>>if [ -f "$pkgsysconfdir/fmd.conf" ]; then @@ -81,7 +81,6 @@
>>> opensaf_reboot_with_remote_fencing()
>>>#if plm exists in the system,then the reboot is performed using the
>> eename.
>>>opensaf_reboot_with_plm()
>>>{
>>> -
>>>immadm -o 7 $ee_name
>>>retval=$?
>>>if [ $retval != 0 ]; then
>>> @@ -96,12 +95,29 @@ opensaf_reboot_with_plm()
>>>logger -t "opensaf_reboot" "abrupt restart failed for $ee_name: unable
> to
>> restart remote node"
>>>exit 1
>>>fi
>>> -fi
>>> +fi
>>>fi
>>>#Note: Operation Id SA_PLM_ADMIN_RESTART=7
>>>#In the example the $ee_name would expand to (for eg:-)
>> safEE=my_linux_os,safHE=64bitmulticore,safDomain=my_domain
>>>}
>>>
>>> +# Force local node reboot as fast as possible
>>> +quick_local_node_reboot()
>>> +{
>>> +logger -t "opensaf_reboot" "Do quick local node reboot"
>> [HansN] perhaps reuse the same logic as in sysf_def.c, i.e. use the sysrq
> as
>> fallback and use a short timeout
> [Vu]
> Forcing a node reboot by writing to /proc/sysrq-trigger is not allowed in
> containers such as LXC (as the container is immutable), therefore I provided
> two more alternatives below in case the first attempt fails.
[HansN] It is preferable to run the sysrq only if the reboot fails, i.e. the
same logic as in sysf_def.c; see the SIGALRM and supervision_time.
>>> +
>>> +$icmd /bin/echo -n 'b' 2> /dev/null > /proc/sysrq-trigger
>> [HansN] if not run as root, i.e. icmd is sudo, I think you need to use
>> cmd: /bin/echo -n 'b' | $icmd tee /proc/sysrq-trigger , please check
>> [HansN] or $icmd  /bin/sh -c "/bin/echo -n 'b' 2> /dev/null > /proc/sysrq-
>> trigger"
> [Vu] Thanks for your suggestion. Will update accordingly.
>>> +ret_code=$?
>>> +
>>> +if [ $ret_code != 0 ] && [ -x /bin/systemctl ]; then
>>> +$icmd /bin/systemctl --force --force reboot
>>> +ret_code=$?
>>> +fi
>>> +
>>> +if [ $ret_code != 0 ]; then
>>> +$icmd /sbin/reboot -f
>>> +fi
>>> +}
>>>
>>>if ! test -f "$NODE_ID_FILE"; then
>>>logger -t "opensaf_reboot" "$NODE_ID_FILE doesnt exists,reboot failed "
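
The ordering suggested in this reply (try the regular reboot first under a
short supervision time, and use sysrq only as the fallback) can be sketched
in shell as follows. This is a hedged sketch and not the logic in
sysf_def.c: it assumes the coreutils `timeout` utility is available, the
function name try_in_order is made up, and the commands passed in are
harmless stand-ins rather than real reboot commands.

```shell
# Seconds each attempt may take before the next one is tried (assumed value).
supervision_time=1

# Run each command string in order; stop at the first that succeeds in time.
try_in_order() {
  for attempt in "$@"; do
    if timeout "$supervision_time" sh -c "$attempt"; then
      echo "succeeded: $attempt"
      return 0
    fi
    echo "failed or timed out: $attempt" >&2
  done
  return 1
}

# Stand-ins: a "reboot" that hangs, one that fails, and a last resort.
try_in_order "sleep 5" "false" "true"
```

In a real script the last entry would be the sysrq write, so it is reached
only after the supervised attempts have failed or timed out.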

Re: [devel] [PATCH 1/1] osaf: do quick local node reboot when split network [#3001]

2019-01-28 Thread Hans Nordebäck
Hi Vu, 
See one more comment below/Thanks HansN

-Original Message-
From: Hans Nordebäck  
Sent: den 28 januari 2019 10:15
To: Vu Minh Nguyen ; Gary Lee 
; Minh Hon Chau 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] osaf: do quick local node reboot when split 
network [#3001]

Hi Vu, ack review only. Two comments below/Thanks HansN

On 1/25/19 12:34, Vu Minh Nguyen wrote:
> ---
>   scripts/opensaf_reboot   | 33 +++--
>   src/amf/amfd/ndproc.cc   |  4 ++--
>   src/base/ncssysf_def.h   |  6 ++
>   src/base/sysf_def.c  | 10 ++
>   src/fm/fmd/fm_main.cc|  6 +++---
>   src/fm/fmd/fm_rda.cc |  5 ++---
>   src/rde/rded/rde_main.cc |  6 ++
>   7 files changed, 52 insertions(+), 18 deletions(-)
>
> diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot index 
> 727272e1d..2f7a7daeb 100644
> --- a/scripts/opensaf_reboot
> +++ b/scripts/opensaf_reboot
> @@ -31,7 +31,7 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
>   
>   # Node fencing: OpenSAF cannot reboot a node when there's no CLM node to
>   # PLM EE mapping in the information model. In such cases rebooting 
> would be done -# through proprietary mechanisms, i.e. not through PLM. 
> Node_id is (the only
> +# through proprietary mechanisms, i.e. not through PLM. Node_id is 
> +(the only
>   # entity) at the disposal of such a mechanism.
>   
>   if [ -f "$pkgsysconfdir/fmd.conf" ]; then @@ -81,7 +81,6 @@ 
> opensaf_reboot_with_remote_fencing()
>   #if plm exists in the system,then the reboot is performed using the eename.
>   opensaf_reboot_with_plm()
>   {
> -
>   immadm -o 7 $ee_name
>   retval=$?
>   if [ $retval != 0 ]; then
> @@ -96,12 +95,29 @@ opensaf_reboot_with_plm()
>   logger -t "opensaf_reboot" "abrupt restart 
> failed for $ee_name: unable to restart remote node"
>   exit 1
>   fi
> - fi
> + fi
>   fi
>   #Note: Operation Id SA_PLM_ADMIN_RESTART=7
>   #In the example the $ee_name would expand to (for eg:-) 
> safEE=my_linux_os,safHE=64bitmulticore,safDomain=my_domain
>   }
>   
> +# Force local node reboot as fast as possible
> +quick_local_node_reboot()
> +{
> + logger -t "opensaf_reboot" "Do quick local node reboot"
[HansN] perhaps reuse the same logic as in sysf_def.c, i.e. use the sysrq as 
fallback and use a short timeout
> +
> + $icmd /bin/echo -n 'b' 2> /dev/null > /proc/sysrq-trigger
[HansN] if not run as root, i.e. icmd is sudo, I think you need to use
cmd: /bin/echo -n 'b' | $icmd tee /proc/sysrq-trigger , please check
[HansN] or $icmd  /bin/sh -c "/bin/echo -n 'b' 2> /dev/null > 
/proc/sysrq-trigger"
> + ret_code=$?
> +
> + if [ $ret_code != 0 ] && [ -x /bin/systemctl ]; then
> + $icmd /bin/systemctl --force --force reboot
> + ret_code=$?
> + fi
> +
> + if [ $ret_code != 0 ]; then
> + $icmd /sbin/reboot -f
> + fi
> +}
>   
>   if ! test -f "$NODE_ID_FILE"; then
>   logger -t "opensaf_reboot" "$NODE_ID_FILE doesnt exists,reboot failed "
> @@ -112,8 +128,13 @@ temp_node_id=`cat "$NODE_ID_FILE"`
>   temp_node_id=`echo "$temp_node_id" |sed -e 's:^0[bBxX]::'| sed -e 's:^:0x:'`
>   self_node_id=`printf "%d" $temp_node_id`
>   
> -# If clm cluster reboot requested argument one and two are set but 
> not used, argument 3 is set to 1, "safe reboot" request -if [ 
> "$safe_reboot" = 1 ]; then
> +# If no argument is provided, forcing node reboot immediately without 
> +log # flushing, process terminating, disk un-mounting.
> +# If clm cluster reboot requested argument one and two are set but 
> +not used, # argument 3 is set to 1, "safe reboot" request.
> +if [ "$#" = 0 ]; then
> + quick_local_node_reboot
> +elif [ "$safe_reboot" = 1 ]; then
>   opensaf_safe_reboot
>   else
>   # A node ID of zero(0) means an order to reboot the local node @@ 
> -165,7 +186,7 @@ else
>   logger -t "opensaf_reboot" "Not 
> rebooting remote node $ee_name as it is not in INSTANTIATED state"
>   elif [ $plm_node_state != 2 ]; then
>   opensaf_reboot_with_plm
> - else
> + else
>   logger -t "opensaf_reboot" "Not 
> rebooting remote node $ee_name as it is already in locked state"
>
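
The concern about $icmd and the redirection can be demonstrated without
root. In a command such as `$icmd /bin/echo -n 'b' > /proc/sysrq-trigger`
the calling shell opens the redirection target before $icmd runs, so any
privileges gained through sudo would not apply to the write. The sketch
below is an analogy under stated assumptions: icmd is a hypothetical
stand-in that switches working directory instead of user, and the
trigger_* file names are made up.

```shell
caller_dir=$(mktemp -d)
target_dir=$(mktemp -d)

# Hypothetical stand-in for "$icmd": run the command in another context.
# The "context" here is a different working directory, not root privilege.
icmd() {
  ( cd "$target_dir" && "$@" )
}

cd "$caller_dir" || exit 1

# Form from the patch: the calling shell opens the file, outside the context.
icmd /bin/echo -n 'b' > trigger_plain

# Suggested form 1: tee opens the file inside the context.
/bin/echo -n 'b' | icmd tee trigger_tee > /dev/null

# Suggested form 2: the whole command line, redirection included, runs inside.
icmd /bin/sh -c "/bin/echo -n 'b' > trigger_sh"

echo "caller context: $(ls "$caller_dir" | tr '\n' ' ')"
echo "target context: $(ls "$target_dir" | tr '\n' ' ')"
```

Only the tee and sh -c forms create their file inside the wrapped context,
which is why either is needed when icmd is sudo.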

Re: [devel] [PATCH 1/2] osaf: update etcd v2 plugin [#3003]

2019-01-28 Thread Hans Nordebäck
ack, review only/Thanks HansN

On 1/24/19 02:17, Gary Lee wrote:
> 'etcdctl watch' will return if connection to the etcd server is lost.
> If that occurs, send a 'fake' takeover request to rded so rded
> will reboot the node. This is in alignment with the etcd v3 plugin.
> ---
>   src/osaf/consensus/plugins/etcd.plugin | 29 +
>   1 file changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/src/osaf/consensus/plugins/etcd.plugin 
> b/src/osaf/consensus/plugins/etcd.plugin
> index f62cc89..f88a7e7 100644
> --- a/src/osaf/consensus/plugins/etcd.plugin
> +++ b/src/osaf/consensus/plugins/etcd.plugin
> @@ -17,7 +17,9 @@
>   # backward compatible. This plugin may need to be adapted.
>   
>   readonly keyname="opensaf_consensus_lock"
> +readonly takeover_request="takeover_request"
>   readonly directory="/opensaf/"
> +readonly node_name_file="/etc/opensaf/node_name"
>   readonly etcd_options="--no-sync"
>   readonly etcd_timeout="5s"
>   
> @@ -27,7 +29,8 @@ readonly etcd_timeout="5s"
>   #   $1 - 
>   # returns:
>   #   0 - success,  is echoed to stdout
> -#   non-zero - failure
> +#   1 - invalid param
> +#   other - failure
>   get() {
> readonly key="$1"
>   
> @@ -36,7 +39,7 @@ get() {
>   echo "$value"
>   return 0
> else
> -return 1
> +return 2
> fi
>   }
>   
> @@ -73,7 +76,8 @@ setkey() {
>   # returns:
>   #   0 - success
>   #   1 - already exists
> -#   2 or above - other failure
> +#   2 - invalid param
> +#   3 or above - other failure
>   create_key() {
> readonly key="$1"
> readonly value="$2"
> @@ -90,7 +94,7 @@ create_key() {
>   fi
> fi
>   
> -  return 2
> +  return 3
>   }
>   
>   # set
> @@ -103,7 +107,8 @@ create_key() {
>   #   $4 - 
>   # returns:
>   #   0 - success
> -#   non-zero - failure
> +#   1 - invalid param
> +#   other - failure
>   setkey_match_prev() {
> readonly key="$1"
> readonly value="$2"
> @@ -115,7 +120,7 @@ setkey_match_prev() {
> then
>   return 0
> else
> -return 1
> +return 2
> fi
>   }
>   
> @@ -158,7 +163,8 @@ lock() {
>   return 0
> fi
>   
> -  if current_owner=$(etcdctl get "$directory$keyname")
> +  if current_owner=$(etcdctl $etcd_options --timeout $etcd_timeout \
> +get "$directory$keyname")
> then
>   # see if we already hold the lock
>   if [ "$current_owner" = "$owner" ]; then
> @@ -252,7 +258,14 @@ watch() {
>   echo "$value"
>   return 0
> else
> -return 1
> +# etcd down?
> +if [ "$key" = "$takeover_request" ]; then
> +  hostname=`cat $node_name_file`
> +  echo "$hostname SC-0 1000 UNDEFINED"
> +  return 0
> +else
> +  return 1
> +fi
> fi
>   }
>   
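
The fallback added to watch() reads as: if the watch command fails while
watching the takeover request key, report a synthetic request so that rded
reboots the node. A minimal sketch of that branch, assuming a hypothetical
watch_cmd stand-in for the etcdctl call and a temporary file in place of
/etc/opensaf/node_name:

```shell
takeover_request="takeover_request"
node_name_file=$(mktemp)
echo "SC-1" > "$node_name_file"   # hypothetical node name

# Stand-in for the etcdctl watch call; always fails, as if etcd were down.
watch_cmd() {
  return 1
}

watch() {
  key="$1"
  if value=$(watch_cmd "$key"); then
    echo "$value"
    return 0
  fi
  # Watch failed: fake a takeover request only for the takeover key.
  if [ "$key" = "$takeover_request" ]; then
    hostname=$(cat "$node_name_file")
    echo "$hostname SC-0 1000 UNDEFINED"
    return 0
  fi
  return 1
}

watch "$takeover_request"
```

For any other key the failure is passed back to the caller unchanged.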



Re: [devel] [PATCH 2/2] osaf: update sample plugin [#3003]

2019-01-27 Thread Hans Nordebäck
ack, code review only/Thanks HansN

On 1/24/19 02:17, Gary Lee wrote:
> ---
>   src/osaf/consensus/plugins/sample.plugin | 20 +++-
>   1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/src/osaf/consensus/plugins/sample.plugin 
> b/src/osaf/consensus/plugins/sample.plugin
> index fc4c54c..cadb9e0 100644
> --- a/src/osaf/consensus/plugins/sample.plugin
> +++ b/src/osaf/consensus/plugins/sample.plugin
> @@ -17,6 +17,8 @@
>   # backward compatible.
>   
>   readonly keyname="opensaf_consensus_lock"
> +readonly takeover_request="takeover_request"
> +readonly node_name_file="/etc/opensaf/node_name"
>   
>   # get
>   #   retrieve  of  from key-value store
> @@ -24,7 +26,8 @@ readonly keyname="opensaf_consensus_lock"
>   #   $1 - 
>   # returns:
>   #   0 - success,  is echoed to stdout
> -#   non-zero - failure
> +#   1 - invalid param
> +#   other - failure
>   get() {
> readonly key="$1"
> ...
> @@ -56,7 +59,8 @@ setkey() {
>   # returns:
>   #   0 - success
>   #   1 - already exists
> -#   2 or above - other failure
> +#   2 - invalid param
> +#   3 or above - other failure
>   create_key() {
> readonly key="$1"
> readonly value="$2"
> @@ -74,7 +78,8 @@ create_key() {
>   #   $4 - 
>   # returns:
>   #   0 - success
> -#   non-zero - failure
> +#   1 - invalid param
> +#   other - failure
>   setkey_match_prev() {
> readonly key="$1"
> readonly value="$2"
> @@ -101,7 +106,8 @@ erase() {
>   #   $2 - , will automatically unlock after  seconds
>   # returns:
>   #   0 - success
> -#   non-zero - failure
> +#   1 - the lock is owned by someone else
> +#   2 or above - other failure
>   lock() {
> readonly owner="$1"
> readonly timeout="$2"
> @@ -129,7 +135,7 @@ lock_owner() {
>   # returns:
>   #   0 - success
>   #   1 - the lock is owned by someone else
> -#   2 or above - other failure#
> +#   2 or above - other failure
>   unlock() {
> readonly owner="$1"
> readonly forced=${2:-false}
> @@ -146,6 +152,10 @@ unlock() {
>   watch() {
> readonly key="$1"
> ..
> +  # if  is $takeover_request and we have lost connectivity to the
> +  # consensus service, a fake takeover_request can be returned to force
> +  # rded to fence this node. Eg:
> +  # "$hostname SC-0 1000 UNDEFINED"
>   }
>   
>   # argument parsing
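
The more specific return codes documented in this patch let a caller tell
"already exists" apart from bad input or a store failure. A small sketch of
the create_key convention (0 success, 1 already exists, 2 invalid param,
3 or above other failure), assuming a directory of files as a hypothetical
key-value store; the check-then-write here is not atomic, which a real
store would need to be:

```shell
store_dir=$(mktemp -d)   # hypothetical backing store: one file per key

create_key() {
  key="$1"
  value="$2"
  [ -n "$key" ] && [ -n "$value" ] || return 2   # invalid param
  [ -d "$store_dir" ] || return 3                # store unavailable
  [ ! -e "$store_dir/$key" ] || return 1         # already exists
  printf '%s' "$value" > "$store_dir/$key" || return 3
  return 0
}

# Map a return code to the meaning documented in the plugin comments.
describe() {
  case "$1" in
    0) echo "success" ;;
    1) echo "already exists" ;;
    2) echo "invalid param" ;;
    *) echo "other failure" ;;
  esac
}

create_key "opensaf_consensus_lock" "SC-1"; describe $?
create_key "opensaf_consensus_lock" "SC-2"; describe $?
create_key "" "SC-2"; describe $?
```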



Re: [devel] [PATCH 5/5] rded: add relaxed node promotion feature [#2996]

2019-01-22 Thread Hans Nordebäck
ack, review only, one question below/Thanks HansN

On 1/21/19 04:52, Gary Lee wrote:
> Allow promotion of node to active at cluster startup, even if the
> consensus service is unavailable, if the peer SC can be seen.
>
> During normal cluster operation, if the consensus service becomes
> unavailable but the peer SC can still be seen, allow the existing
> active SC to remain active.
>
> A new NCSMDS_SVC_ID_RDE_DISCOVERY service ID is exported by rded.
> This is installed as soon as rded is started, unlike
> NCSMDS_SVC_ID_RDE which is only installed when it becomes
> a candidate for election.
> ---
>   src/mds/mds_papi.h   |  1 +
>   src/rde/rded/rde_cb.h| 12 +-
>   src/rde/rded/rde_main.cc | 71 +++
>   src/rde/rded/rde_mds.cc  | 94 --
>   src/rde/rded/role.cc | 97 
> +++-
>   src/rde/rded/role.h  |  4 +-
>   6 files changed, 256 insertions(+), 23 deletions(-)
>
> diff --git a/src/mds/mds_papi.h b/src/mds/mds_papi.h
> index 03d755d..7cd543c 100644
> --- a/src/mds/mds_papi.h
> +++ b/src/mds/mds_papi.h
> @@ -191,6 +191,7 @@ typedef enum ncsmds_svc_id {
> NCSMDS_SVC_ID_PLMS = 37,
> NCSMDS_SVC_ID_PLMS_HRB = 38,
> NCSMDS_SVC_ID_PLMA = 39,
> +  NCSMDS_SVC_ID_RDE_DISCOVERY = 40,
> NCSMDS_SVC_ID_NCSMAX, /* This mnemonic always last */
>   
> /* The range below is for OpenSAF internal use */
> diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
> index d3f5a24..9a0919c 100644
> --- a/src/rde/rded/rde_cb.h
> +++ b/src/rde/rded/rde_cb.h
> @@ -34,6 +34,9 @@
>**
>*/
>   
> +enum class State {kNotActive = 0, kNotActiveSeenPeer, kActiveElected,
> +  kActiveElectedSeenPeer, kActiveFailover};
> +
>   struct RDE_CONTROL_BLOCK {
> SYSF_MBX mbx;
> NCSCONTEXT task_handle;
> @@ -43,6 +46,9 @@ struct RDE_CONTROL_BLOCK {
> bool monitor_lock_thread_running{false};
> bool monitor_takeover_req_thread_running{false};
> std::set cluster_members{};
> +  // used for discovering peer controllers, regardless of their role
> +  std::set peer_controllers{};
> +  State state{State::kNotActive};
>   };
>   
>   enum RDE_MSG_TYPE {
> @@ -54,7 +60,9 @@ enum RDE_MSG_TYPE {
> RDE_MSG_NODE_UP = 6,
> RDE_MSG_NODE_DOWN = 7,
> RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8,
> -  RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9
> +  RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9,
> +  RDE_MSG_CONTROLLER_UP = 10,
> +  RDE_MSG_CONTROLLER_DOWN = 11
>   };
>   
>   struct rde_peer_info {
> @@ -82,7 +90,9 @@ extern const char *rde_msg_name[];
>   
>   extern RDE_CONTROL_BLOCK *rde_get_control_block();
>   extern uint32_t rde_mds_register();
> +extern uint32_t rde_discovery_mds_register();
>   extern uint32_t rde_mds_unregister();
> +extern uint32_t rde_discovery_mds_unregister();
>   extern uint32_t rde_mds_send(rde_msg *msg, MDS_DEST to_dest);
>   extern uint32_t rde_set_role(PCS_RDA_ROLE role);
>   
> diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
> index e5813e4..2d9aa51 100644
> --- a/src/rde/rded/rde_main.cc
> +++ b/src/rde/rded/rde_main.cc
> @@ -39,6 +39,7 @@
>   #include "osaf/consensus/consensus.h"
>   #include "rde/rded/rde_cb.h"
>   #include "rde/rded/role.h"
> +#include "rde_cb.h"
>   
>   #define RDA_MAX_CLIENTS 32
>   
> @@ -56,7 +57,9 @@ const char *rde_msg_name[] = {"-",
> "RDE_MSG_NODE_UP(6)",
> "RDE_MSG_NODE_DOWN(7)",
> "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)",
> -  "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)"};
> +  "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)",
> +  "RDE_MSG_CONTROLLER_UP(10)",
> +  "RDE_MSG_CONTROLLER_DOWN(11)"};
>   
>   static RDE_CONTROL_BLOCK _rde_cb;
>   static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb;
> @@ -157,6 +160,23 @@ static void handle_mbx_event() {
> rde_cb->cluster_members.erase(msg->fr_node_id);
> TRACE("cluster_size %zu", rde_cb->cluster_members.size());
> break;
> +case RDE_MSG_CONTROLLER_UP:
> +  if (msg->fr_node_id != own_node_id) {
> +rde_cb->peer_controllers.insert(msg->fr_node_id);
> +TRACE("peer_controllers: size %zu", rde_cb->peer_controllers.size());
> +if (rde_cb->state == State::kNotActive) {
> +  TRACE("Set state to kNotActiveSeenPeer");
> +  rde_cb->state = State::kNotActiveSeenPeer;
> +} else if (rde_cb->state == State::kActiveElected) {
> +  TRACE("Set state to kActiveElectedSeenPeer");
> +  rde_cb->state = State::kActiveElectedSeenPeer;
> +}
> +  }
> +  break;
> +case RDE_MSG_CONTROLLER_DOWN:
> +  rde_cb->peer_controllers.erase(msg->fr_node_id);
> +  TRACE("peer_controllers: size %zu", rde_cb->peer_controllers.size());
> +  break;
>   case 
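
The RDE_MSG_CONTROLLER_UP handling in this hunk amounts to a small state
transition, applied only when the message comes from another node. It is
sketched below in shell for brevity; the function name on_controller_up is
an assumption, and only the two transitions visible in the diff are
modelled.

```shell
# Next state when a peer controller is seen; other states are left unchanged.
on_controller_up() {
  case "$1" in
    kNotActive)     echo "kNotActiveSeenPeer" ;;
    kActiveElected) echo "kActiveElectedSeenPeer" ;;
    *)              echo "$1" ;;
  esac
}

for state in kNotActive kActiveElected kActiveFailover; do
  echo "$state -> $(on_controller_up "$state")"
done
```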

Re: [devel] [PATCH 2/5] fmd: add configuration parameters [#2996]

2019-01-21 Thread Hans Nordebäck
ack, review only/Thanks HansN

On 1/21/19 04:52, Gary Lee wrote:
> Add parameters FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE and
> FMS_RELAXED_NODE_PROMOTION.
> ---
>   src/fm/fmd/fmd.conf | 17 +
>   1 file changed, 17 insertions(+)
>
> diff --git a/src/fm/fmd/fmd.conf b/src/fm/fmd/fmd.conf
> index 9a106bf..209e484 100644
> --- a/src/fm/fmd/fmd.conf
> +++ b/src/fm/fmd/fmd.conf
> @@ -30,6 +30,23 @@ export FMS_TAKEOVER_REQUEST_VALID_TIME=20
>   # Full path to key-value store plugin
>   #export FMS_KEYVALUE_STORE_PLUGIN_CMD=
>   
> +# In the event of SCs being split into network partitions, we can try to make
> +# the active SC reside in the largest network partition. If it is preferable
> +# to keep the current SC active, then set this to 0
> +# Default is 1
> +#export FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE=1
> +
> +# Default behaviour is not to allow promotion of this node to Active
> +# unless a lock can be obtained, if split brain prevention is enabled.
> +# Uncomment the next line to allow promotion of this node at cluster startup,
> +# if a peer SC can be seen and we have a lower node ID, in the event the
> +# consensus service is not available.
> +# Also if the consensus service is down, but a peer SC can be seen,
> +# then an active SC may remain active.
> +# This mode should not be used together with the roaming SC feature
> +# Default is 0
> +#export FMS_RELAXED_NODE_PROMOTION=0
> +
>   # FM will supervise transitions to the ACTIVE role when this variable is 
> set to
>   # a non-zero value. The value is the time in the unit of 10 ms to wait for a
>   # role change to ACTIVE to take effect. If AMF has not give FM an active
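
Both new parameters are commented out in fmd.conf, so the documented
defaults (1 and 0) apply unless they are exported. A hedged sketch of how a
shell consumer could read them with those defaults; the helper fms_setting
is hypothetical, and how fmd itself parses the variables is not shown in
this patch.

```shell
# $1 = current value (may be empty), $2 = documented default
fms_setting() {
  echo "${1:-$2}"
}

prioritise=$(fms_setting "${FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE:-}" 1)
relaxed=$(fms_setting "${FMS_RELAXED_NODE_PROMOTION:-}" 0)

echo "prioritise partition size: $prioritise"
echo "relaxed node promotion:    $relaxed"
```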



Re: [devel] [PATCH 1/5] osaf: update etcd3 to poll instead of watch [#2996]

2019-01-21 Thread Hans Nordebäck
ack, review only/Thanks HansN

On 1/21/19 04:52, Gary Lee wrote:
> The 'watch' command does not return if the etcd server goes down.
> We need to poll the etcd server to properly check we still have
> connectivity to the etcd server.
> ---
>   src/osaf/consensus/plugins/etcd3.plugin | 50 
> ++---
>   1 file changed, 40 insertions(+), 10 deletions(-)
>
> diff --git a/src/osaf/consensus/plugins/etcd3.plugin 
> b/src/osaf/consensus/plugins/etcd3.plugin
> index b3814c9..4998df0 100644
> --- a/src/osaf/consensus/plugins/etcd3.plugin
> +++ b/src/osaf/consensus/plugins/etcd3.plugin
> @@ -17,9 +17,12 @@
>   # backward compatible. This plugin may need to be adapted.
>   
>   readonly keyname="opensaf_consensus_lock"
> +readonly takeover_request="takeover_request"
> +readonly node_name_file="/etc/opensaf/node_name"
>   readonly directory="/opensaf/"
>   readonly etcd_options=""
> -readonly etcd_timeout="10s"
> +readonly etcd_timeout="3s"
> +readonly heartbeat_interval=2
>   
>   export ETCDCTL_API=3
>   
> @@ -29,7 +32,8 @@ export ETCDCTL_API=3
>   #   $1 - 
>   # returns:
>   #   0 - success,  is echoed to stdout
> -#   non-zero - failure
> +#   1 - invalid param
> +#   other - failure
>   get() {
> readonly key="$1"
>   
> @@ -51,7 +55,7 @@ get() {
> return 1
>   fi
> else
> -return 1
> +return 2
> fi
>   }
>   
> @@ -101,7 +105,8 @@ setkey() {
>   # returns:
>   #   0 - success
>   #   1 - already exists
> -#   2 or above - other failure
> +#   2 - invalid param
> +#   3 or above - other failure
>   create_key() {
> readonly key="$1"
> readonly value="$2"
> @@ -114,7 +119,7 @@ create_key() {
> lease_id=$(echo $output | awk '{print $2}')
> lease_param="--lease="$lease_id""
>   else
> -  return 2
> +  return 3
>   fi
> else
>   lease_param=""
> @@ -135,7 +140,7 @@ create_key() {
> then
>   return 1
> else
> -return 2
> +return 3
> fi
>   }
>   
> @@ -149,6 +154,7 @@ create_key() {
>   #   $4 - 
>   # returns:
>   #   0 - success
> +#   1 - invalid param
>   #   non-zero - failure
>   setkey_match_prev() {
> readonly key="$1"
> @@ -326,10 +332,34 @@ unlock() {
>   #   non-zero - failure
>   watch() {
> readonly watch_key="$1"
> -  etcdctl $etcd_options --dial-timeout $etcd_timeout \
> -watch "$directory$watch_key" | grep -m0 \"\" 2>&1
> -  get "$watch_key"
> -  return 0
> +
> +  # get baseline
> +  orig_value=$(get "$watch_key")
> +  result=$?
> +
> +  if [ "$result" -le "1" ]; then
> +while true
> +do
> +  sleep $heartbeat_interval
> +  current_value=$(get "$watch_key")
> +  result=$?
> +  if [ "$result" -gt "1" ]; then
> +# etcd down?
> +if [ "$watch_key" == "$takeover_request" ]; then
> +  hostname=`cat $node_name_file`
> +  echo "$hostname SC-0 1000 UNDEFINED"
> +  return 0
> +else
> +  return 1
> +fi
> +  elif [ "$orig_value" != "$current_value" ]; then
> +echo $current_value
> +return 0
> +  fi
> +done
> +  fi
> +
> +  return 1
>   }
>   
>   # argument parsing
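
The polling approach that replaces 'etcdctl watch' can be sketched as
below: take a baseline, then re-read the key every heartbeat interval and
return when the value changes or the store stops answering. The file-backed
get() and the name poll_for_change are assumptions for illustration, not
the plugin's code.

```shell
store=$(mktemp)            # hypothetical stand-in for the etcd key
heartbeat_interval=2

# Stand-in for the plugin's get(): 0 on success, 2 if the store is unreachable.
get() {
  [ -r "$store" ] || return 2
  cat "$store"
}

poll_for_change() {
  orig_value=$(get) || return 1
  while true; do
    sleep "$heartbeat_interval"
    current_value=$(get)
    result=$?
    if [ "$result" -gt 1 ]; then
      return 1             # store unreachable
    elif [ "$orig_value" != "$current_value" ]; then
      echo "$current_value"
      return 0
    fi
  done
}

echo "old" > "$store"
( sleep 1; echo "new" > "$store" ) &   # simulate another writer
poll_for_change
```

In the patch, the unreachable branch additionally emits a fake takeover
request when the watched key is the takeover request key.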



Re: [devel] [PATCH 1/1] amfd: Only start clm track for 2N Opensaf SU in failover [#2980]

2018-12-11 Thread Hans Nordebäck
Hi Minh,

ack, review only/Thanks HansN

On 12/10/18 07:00, Minh Chau wrote:
> ---
>   src/amf/amfd/sg_2n_fsm.cc | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/amf/amfd/sg_2n_fsm.cc b/src/amf/amfd/sg_2n_fsm.cc
> index a218786..91ffc63 100644
> --- a/src/amf/amfd/sg_2n_fsm.cc
> +++ b/src/amf/amfd/sg_2n_fsm.cc
> @@ -1784,7 +1784,8 @@ uint32_t SG_2N::susi_success_sg_realign(AVD_SU *su, 
> AVD_SU_SI_REL *susi,
> }
>   
> if ((state == SA_AMF_HA_ACTIVE) &&
> -  (cb->node_id_avd == su->su_on_node->node_info.nodeId)) {
> +  (cb->node_id_avd == su->su_on_node->node_info.nodeId) &&
> +  (su->sg_of_su->sg_ncs_spec == true)) {
>   /* This is as a result of failover, start CLM tracking*/
>   if (avd_clm_track_start(cb) == SA_AIS_ERR_TRY_AGAIN)
> Fifo::queue(new ClmTrackStart());



Re: [devel] [PATCH 1/1] amfd: Update the assignment counters after restore absent assignment from imm [#2977]

2018-12-11 Thread Hans Nordebäck
Hi Minh,

ack, code review only/Thanks HansN

On 12/3/18 04:29, Minh Chau wrote:
> AMF performs headless recovery by syncing the assignments from AMFND(s) and
> re-create them in AMFD's db and IMM. Next step, AMFD compares the assignment
> objects from IMM and from AMFND(s) to figure out the on-going assignments
> that have been left over before headless and failover them, the assignments
> states/counters are also restored in this step. If all payloads come from
> headless without occurrence of network split (legacy headless), IMM db in all
> payloads should be consistent, thus AMFD creates the IMM assignments normally
> without any problem. But if the payloads come from headless and there was a
> network split before, IMM appears often busy at the time AMFD creates the
> synced assignments in IMM. The assignment object creation is pending in the
> queue and executed later, but AMFD has failed to restore the assignment states
> and counters of the synced assignments at the time of the comparison between
> IMM and AMFND(s).
> Also in legacy headless, when both SCs go down, the assignment objects are
> still in IMM. Even IMM is busy, AMFD has not missed the counter updates.
>
> The patch moves the counter update after restoring absent assignment from IMM.
> ---
>   src/amf/amfd/siass.cc | 67 
> +--
>   1 file changed, 38 insertions(+), 29 deletions(-)
>
> diff --git a/src/amf/amfd/siass.cc b/src/amf/amfd/siass.cc
> index ffde7b1..8a2d217 100644
> --- a/src/amf/amfd/siass.cc
> +++ b/src/amf/amfd/siass.cc
> @@ -264,14 +264,48 @@ void avd_susi_read_headless_cached_rta(AVD_CL_CB *cb) {
> }
>   
>   #endif
> +} else {  // For ABSENT SUSI
> +  TRACE("Check absent SUSI, ha_state:'%u', fsm_state:'%u'", imm_ha_state,
> +imm_susi_fsm);
> +  if (avd_susi_validate_absent_assignment(su, si,
> +  imm_ha_state, imm_susi_fsm) == false) {
> +avd_saImmOiRtObjectDelete(Amf::to_string());
> +continue;
> +  }
> +  absent_susi = avd_susi_create(avd_cb, si, su, imm_ha_state, false,
> +  AVSV_SUSI_ACT_BASE);
> +  // Restore the fsm of this absent SUSI, which is used to determine
> +  // whether a SU should be added in SG's SUOperationList
> +  // Memorize it in temporary var @absent
> +  // The fsm of this SUSI will be changed to AVD_SU_SI_STATE_ABSENT
> +  // after restoring SUOperationList
> +  absent_susi->fsm = imm_susi_fsm;
> +  absent_susi->absent = true;
> +  if (absent_susi->si->saAmfSIAdminState == SA_AMF_ADMIN_LOCKED ||
> +  absent_susi->si->saAmfSIAdminState == SA_AMF_ADMIN_SHUTTING_DOWN) {
> +if (absent_susi->fsm == AVD_SU_SI_STATE_MODIFY &&
> +(absent_susi->state == SA_AMF_HA_QUIESCED ||
> +absent_susi->state == SA_AMF_HA_QUIESCING)) {
> +  m_AVD_SET_SG_ADMIN_SI(cb, si);
> +}
> +  }
> +}
> +  }
> +  (void)immutil_saImmOmSearchFinalize(searchHandle);
> +
> +  // Update all PRESENT SUSI, in case that a SUSI is missed to update because
> +  // it is not present in IMM
> +  for (const auto  : *su_db) {
> +AVD_SU *su = value.second;
> +susi = su->list_of_susi;
> +while (susi != nullptr && susi->absent == false) {
> +  AVD_SI *si = susi->si;
> // validate SUSI assignments that are over assigned
> if (avd_susi_validate_excessive_assignment(susi) == true) {
>   susi->fsm = AVD_SU_SI_STATE_EXCESSIVE;
> }
> -
> // Checkpoint to add this SUSI
> m_AVSV_SEND_CKPT_UPDT_ASYNC_ADD(avd_cb, susi, AVSV_CKPT_AVD_SI_ASS);
> -
> // restore assignment counter
> if (susi->fsm == AVD_SU_SI_STATE_ASGN ||
> susi->fsm == AVD_SU_SI_STATE_ASGND ||
> @@ -296,36 +330,11 @@ void avd_susi_read_headless_cached_rta(AVD_CL_CB *cb) {
> // only restore if not done
> if (susi->su->su_on_node->admin_ng == nullptr)
>   avd_ng_restore_headless_states(cb, susi);
> -} else {  // For ABSENT SUSI
> -  TRACE("Check absent SUSI, ha_state:'%u', fsm_state:'%u'", imm_ha_state,
> -imm_susi_fsm);
> -  if (avd_susi_validate_absent_assignment(su, si,
> -  imm_ha_state, imm_susi_fsm) == false) {
> -avd_saImmOiRtObjectDelete(Amf::to_string());
> -continue;
> -  }
> -  absent_susi = avd_susi_create(avd_cb, si, su, imm_ha_state, false,
> -  AVSV_SUSI_ACT_BASE);
> -  // Restore the fsm of this absent SUSI, which is used to determine
> -  // whether a SU should be added in SG's SUOperationList
> -  // Memorize it in temporary var @absent
> -  // The fsm of this SUSI will be changed to AVD_SU_SI_STATE_ABSENT
> -  // after restoring SUOperationList
> -  absent_susi->fsm = imm_susi_fsm;
> -  absent_susi->absent = true;
> -  if (absent_susi->si->saAmfSIAdminState == SA_AMF_ADMIN_LOCKED ||
> -  absent_susi->si->saAmfSIAdminState == 

Re: [devel] [PATCH 0/4] Review Request for clm: add new test cases in clm apitest [#2914]

2018-11-09 Thread Hans Nordebäck
Hi Mohan,

ack, review only/Thanks HansN

On 11/6/18 11:19, Mohan Kanakam wrote:
> Summary: clm: add new test case of API saClmClusterNotificationFree_4() of 
> apitest [#2914]
> Review request for Ticket(s): 2914-1
> Peer Reviewer(s):Anders, Hans, Ravi
> Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
> Affected branch(es): develop
> Development branch: ticket-2914-1
> Base revision: f8a6848a1cdbff0b518c3db951e4689e260226c7
> Personal repository: git://git.code.sf.net/u/mohan-hasoln/review
>
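> ------------------------------------------------
> Impacted area           Impact y/n
> ------------------------------------------------
>   Docs                    n
>   Build system            n
>   RPM/packaging           n
>   Configuration files     n
>   Startup scripts         n
>   SAF services            n
>   OpenSAF services        n
>   Core libraries          n
>   Samples                 n
>   Tests                   y
>   Other                   n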
>
>
> Comments (indicate scope for each "y" above):
> -
> *** EXPLAIN/COMMENT THE PATCH SERIES HERE ***
>
> revision 9a6bf69135567cdceded5cd89b0a22138b7c116a
> Author:   Mohan Kanakam 
> Date: Tue, 6 Nov 2018 15:41:15 +0530
>
> clm: add new test case in saClmResponse_4() of apitest [#2914]
>
>
>
> revision afd96329f2b2cd6211e0dae90a6e46881d140092
> Author:   Mohan Kanakam 
> Date: Tue, 6 Nov 2018 15:15:32 +0530
>
> clm: add new test case saClmClusterNodeGetAsync() of apitest [#2914]
>
>
>
> revision 71b52f9c79477fe8ee567123b68ebf52cd6ee433
> Author:   Mohan Kanakam 
> Date: Tue, 6 Nov 2018 15:05:34 +0530
>
> clm: add new test case of API saClmClusterNodeGet_4() of apitest [#2914]
>
>
>
> revision eda88ade8b2d87ba657465ef17d73cb553082551
> Author:   Mohan Kanakam 
> Date: Tue, 6 Nov 2018 14:39:02 +0530
>
> clm: add new test case of API saClmClusterNotificationFree_4() of apitest 
> [#2914]
>
>
>
> Complete diffstat:
> --
>   src/clm/apitest/tet_saClmClusterNodeGet.cc  | 21 
> +
>   src/clm/apitest/tet_saClmClusterNodeGetAsync.cc | 19 +++
>   src/clm/apitest/tet_saClmClusterNotificationFree.cc | 14 ++
>   src/clm/apitest/tet_saClmResponse.cc|  9 +
>   4 files changed, 63 insertions(+)
>
>
> Testing Commands:
> -
> ./clmtest
>
> Testing, Expected Results:
> --
> 5   PASSED          saClmClusterNotificationFree with finalized handle
> 10  PASSED/PASSED   saClmClusterNodeGet & saClmClusterNodeGet_4 with Finalized handle
> 10  PASSED/PASSED   saClmClusterNodeGetAsync with finalized handle
> 6   PASSED          saClmResponse with Finalized handle
>
>
> Conditions of Submission:
> -
> Ack from maintainers
>
> Arch        Built   Started   Linux distro
> -------------------------------------------
> mips        n       n
> mips64      n       n
> x86         n       n
> x86_64      y       y
> powerpc     n       n
> powerpc64   n       n
>
>
> Reviewer Checklist:
> ---
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>  that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>  (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>  Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>  like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>  cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>  too much content into a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>  Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>  commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>  of what has changed between each re-send.
>
> ___ You have failed to adequately and individually 

Re: [devel] [PATCH 1/1] osaf: Set sticky bit for socket and pipe files [#2953]

2018-11-09 Thread Hans Nordebäck
Hi Minh,

the "sticky" bit here is in fact the "restricted deletion" bit, which is
used on directories, e.g. the /tmp directory, where several users have r/w
access but, when the 't' bit is set, only the owner of a file may delete it.
It should be set only on directories, not on files, and I don't think it is
needed here. /Thanks HansN
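
To make the distinction above concrete, here is a minimal C sketch, assuming a POSIX system. The helper names `make_shared_dir` and `has_restricted_deletion` are hypothetical and not part of OpenSAF; they only illustrate that `S_ISVTX` restricts deletion when it is set on a directory.

```c
#define _XOPEN_SOURCE 700 /* for S_ISVTX under strict C standards */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create a world-writable directory in which entries
 * may only be deleted or renamed by their owner, by the directory owner,
 * or by root (the same arrangement as /tmp). */
int make_shared_dir(const char *path)
{
	assert(path != NULL);
	if (mkdir(path, 0700) == -1)
		return -1;
	/* chmod() is used so that the result does not depend on the umask. */
	return chmod(path, S_ISVTX | 0777);
}

/* Hypothetical helper: true only for a directory with the 't' bit set. */
bool has_restricted_deletion(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) == -1)
		return false;
	return S_ISDIR(st.st_mode) && (st.st_mode & S_ISVTX) != 0;
}
```

Under this reading, protecting a socket or FIFO from deletion by other users would be a matter of the mode of the directory that contains it, not of the mode passed to `mkfifo()` or `chmod()` for the file itself.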

On 11/5/18 09:56, Minh Anh Du wrote:
> There are files, sockets and pipes that have world-writable permissions,
> but only the root user and the owner should be able to create/delete
> these files. The sticky bit should be set on these sockets and pipes
> for security reasons.
> ---
>   src/base/daemon.c   | 2 +-
>   src/base/osaf_secutil.c | 2 +-
>   src/dtm/transport/log_server.cc | 2 +-
>   src/nid/agent/nid_ipc.c | 2 +-
>   4 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/base/daemon.c b/src/base/daemon.c
> index cdde7fd..50ddc50 100644
> --- a/src/base/daemon.c
> +++ b/src/base/daemon.c
> @@ -162,7 +162,7 @@ static void create_fifofile(const char *fifofile)
>   
>   mask = umask(0);
>   
> - if (mkfifo(fifofile, 0666) == -1) {
> + if (mkfifo(fifofile, 01666) == -1) {
>   if (errno == EEXIST) {
>   syslog(LOG_INFO, "mkfifo already exists: %s %s",
>  fifofile, strerror(errno));
> diff --git a/src/base/osaf_secutil.c b/src/base/osaf_secutil.c
> index 0e175c9..71e512a 100644
> --- a/src/base/osaf_secutil.c
> +++ b/src/base/osaf_secutil.c
> @@ -147,7 +147,7 @@ static int server_sock_create(const char *pathname)
>   }
>   
>   /* Connecting to the socket object requires read/write permission. */
> - if (chmod(pathname, 0777) == -1) {
> + if (chmod(pathname, 01777) == -1) {
>   LOG_ER("%s: chmod failed - %s", __FUNCTION__, strerror(errno));
>   return -1;
>   }
> diff --git a/src/dtm/transport/log_server.cc b/src/dtm/transport/log_server.cc
> index bef1f07..866fe59 100644
> --- a/src/dtm/transport/log_server.cc
> +++ b/src/dtm/transport/log_server.cc
> @@ -35,7 +35,7 @@ LogServer::LogServer(int term_fd)
> max_backups_{9},
> max_file_size_{5 * 1024 * 1024},
> log_socket_{Osaflog::kServerSocketPath, 
> base::UnixSocket::kNonblocking,
> -  0777},
> +  01777},
> log_streams_{},
> current_stream_{new LogStream{kMdsLogStreamName, 1, 5 * 1024 * 1024}},
> no_of_log_streams_{1} {
> diff --git a/src/nid/agent/nid_ipc.c b/src/nid/agent/nid_ipc.c
> index 172063a..eae8de3 100644
> --- a/src/nid/agent/nid_ipc.c
> +++ b/src/nid/agent/nid_ipc.c
> @@ -66,7 +66,7 @@ uint32_t nid_create_ipc(char *strbuf)
>   mask = umask(0);
>   
>   /* Create nid fifo */
> - if (mkfifo(NID_FIFO, 0666) < 0) {
> + if (mkfifo(NID_FIFO, 01666) < 0) {
>   sprintf(strbuf, " FAILURE: Unable To Create FIFO Error:%s\n",
>   strerror(errno));
>   umask(mask);



Re: [devel] [PATCH 1/1] mds: Send NCSMDS_DOWN with vdest if there is no any adest [#2941]

2018-11-02 Thread Hans Nordebäck
Hi Minh,

ack, code review and mdstest run. One minor comment below. /Thanks HansN

On 10/25/18 04:40, Minh Chau wrote:
> If split brain happens and network merges back, at this point in time
> there are a few mds events coming to payloads, which are the SVC UP
> from the other controller; SVC down from services in both controllers
> due to reboot from split brain detection.
> In the ticket description, the first partition includes SC1, PL3,
> the second partition includes SC2, PL4, PL5. The amfnd on PL3 is
> missing NCSMDS_DOWN with vdest in the below scenario:
>
> - SVC up event from the other amfd (on SC2)
> - SVC down event from amfd (SC1), it's the same active adest from
> mds-PL3's view, start await_active timer, but no NCSMDS_DOWN with
> vdest is sent because the adest on SC2 exists.
> - SVC down event from amfd (SC2), it's different active adest.
>
> Because the payloads reside in different partitions, they don't
> have the same view of the active adest at mds level. When both SCs go
> down due to split brain detection, the same SVC down events occur and
> reach all payloads, but because their views differ they behave
> differently from the payloads in the other partition.
>
> The patch adds an additional condition to send NCSMDS_DOWN if no
> actual adest exists.
> ---
>   src/mds/mds_c_api.c | 80 
> ++---
>   1 file changed, 46 insertions(+), 34 deletions(-)
>
> diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
> index f5ba318..73849cc 100644
> --- a/src/mds/mds_c_api.c
> +++ b/src/mds/mds_c_api.c
> @@ -3644,13 +3644,58 @@ uint32_t mds_mcm_svc_down(PW_ENV_ID pwe_id, 
> MDS_SVC_ID svc_id, V_DEST_RL role,
>   local_svc_hdl, svc_id, vdest_id,
>   _adest, _running,
>   _result_info, true);
> -
[HansN] is this log message informative/needed?
> + m_MDS_LOG_INFO("MCM:API: svc_down: "
> +   "active_adest:%lu", active_adest);
>   /* First delete the entry */
>   mds_subtn_res_tbl_del(
>   local_svc_hdl, svc_id, vdest_id, adest,
>   vdest_policy, svc_sub_part_ver,
>   archword_type);
>   
> + MDS_SUBSCRIPTION_RESULTS_INFO *s_info = NULL;
> + bool adest_exists = false;
> +
> + /* if no adest remains for this svc
> +  * send MDS_DOWN
> +  */
> + status = mds_subtn_res_tbl_getnext_any(
> + local_svc_hdl, svc_id,
> + _info);
> +
> + while (status != NCSCC_RC_FAILURE) {
> + if (s_info->key.vdest_id !=
> + m_VDEST_ID_FOR_ADEST_ENTRY) {
> + adest_exists = true;
> + break;
> + }
> +
> + status = mds_subtn_res_tbl_getnext_any(
> + local_svc_hdl, svc_id, _info);
> + }
> +
> + if (active_adest != adest
> +   && vdest_policy == NCS_VDEST_TYPE_MxN
> + && adest_exists == false) {
> + m_MDS_LOG_INFO("MCM:API: svc_down : "
> + "svc_id = %s(%d) on DEST id = 
> %d "
> + "got NO_ACTIVE for svc_id = 
> %s(%d) "
> +"on Vdest id = %d Adest = %s, rem_svc_pvt_ver=%d",
> + get_svc_names(
> + 
> m_MDS_GET_SVC_ID_FROM_SVC_HDL(local_svc_hdl)),
> + m_MDS_GET_SVC_ID_FROM_SVC_HDL(
> + local_svc_hdl),
> + m_MDS_GET_VDEST_ID_FROM_SVC_HDL(
> + local_svc_hdl),
> + get_svc_names(svc_id), svc_id,
> + vdest_id,
> + 
> log_subtn_result_info->sub_adest_details,
> + svc_sub_part_ver);
> + status = mds_mcm_user_event_callback(
> +   local_svc_hdl, pwe_id, svc_id,
> +   role, vdest_id, 0, NCSMDS_DOWN,
> + 

Re: [devel] [PATCH 1/1] amfd: reset snd_msg_id in LostFound state [#2952]

2018-11-02 Thread Hans Nordebäck
Ack, review only/Thanks HansN

From: Gary Lee 
Sent: 2 November 2018 06:23:09
To: nagendra @ hasolutions . in; Minh Hon Chau; Hans Nordebäck
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee
Subject: [PATCH 1/1] amfd: reset snd_msg_id in LostFound state [#2952]

If a PL rejoins the main network partition before the node failover timer
expires, it is told to reboot by AMFD. AMFND thinks it has become headless
and resets rcv_msg_id to 0, and shows this when it receives the reboot msg
from AMFD:

Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Message ID mismatch, rec xx, expected 1, OwnNodeId = xx, SupervisionTime = 60

We can avoid this by resetting snd_msg_id for this PL in AMFD in state
LostFound, before the reboot msg is sent.
---
 src/amf/amfd/node_state.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/amf/amfd/node_state.cc b/src/amf/amfd/node_state.cc
index a8659dcf7..787ddab94 100644
--- a/src/amf/amfd/node_state.cc
+++ b/src/amf/amfd/node_state.cc
@@ -126,6 +126,11 @@ void LostFound::TimerExpired() {
   node->node_name.c_str());

   if (fsm_->Active() == true) {
+// amfnd thinks it's been headless and resets its rcv_msg_id to 0,
+// also do the same here to avoid 'Message ID mismatch' errors
+// at amfnd
+node->snd_msg_id = 0;
+
 LOG_WA("Sending node reboot order");
 avd_d2n_reboot_snd(node);

--
2.17.1
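
The mismatch described in the commit message can be sketched in C as follows. This is a simplified model under stated assumptions, not the OpenSAF implementation: the structs and the functions `next_msg_id` and `accept_msg` are hypothetical, and the sketch assumes the receiver accepts only the ID that directly follows the last one it saw (which is what "expected 1" after a reset to 0 suggests).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the two counters. */
struct director_view { uint32_t snd_msg_id; }; /* kept per node by the director */
struct node_view { uint32_t rcv_msg_id; };     /* kept by the node director */

/* Director side: number the next message sent to the node. */
uint32_t next_msg_id(struct director_view *d)
{
	assert(d != NULL);
	return ++d->snd_msg_id;
}

/* Node side: accept only the message that directly follows the last one. */
bool accept_msg(struct node_view *n, uint32_t msg_id)
{
	assert(n != NULL);
	if (msg_id != n->rcv_msg_id + 1)
		return false; /* corresponds to "Message ID mismatch" */
	n->rcv_msg_id = msg_id;
	return true;
}
```

In this model, once the node has reset `rcv_msg_id` to 0, a director that keeps counting from its old `snd_msg_id` is rejected, while one that also resets to 0 sends ID 1 and is accepted.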




Re: [devel] [PATCH 1/1] imm: fix osafimmnd coredump genereted during sanity test [#2947]

2018-11-01 Thread Hans Nordebäck
Ack, review only/Thanks HansN

-Original Message-
From: Vu Minh Nguyen
Sent: 29 October 2018 10:15
To: Hans Nordebäck; Lennart Lund; Gary Lee
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
Subject: [PATCH 1/1] imm: fix osafimmnd coredump genereted during sanity test [#2947]

The coredump is generated while processing a message of type
"IMMND_EVT_D2ND_IMPLDELETE", because memory is corrupted when that message
is decoded.

The decoder allocated 'size' bytes, whose valid index range is
[0, 'size - 1'], but then wrote the null terminator at index `size`,
which is outside that range.

This patch fixes the issue by allocating `size + 1` bytes.
---
 src/imm/common/immsv_evt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/imm/common/immsv_evt.c b/src/imm/common/immsv_evt.c
index 03a7f8125..c93f82a0f 100644
--- a/src/imm/common/immsv_evt.c
+++ b/src/imm/common/immsv_evt.c
@@ -2898,7 +2898,7 @@ static uint32_t immsv_evt_dec_sublevels(NCS_UBAID *i_ub, IMMSV_EVT *o_evt)
    implNameList[i].size = ncs_decode_32bit();
    ncs_dec_skip_space(i_ub, 4);
 
-   implNameList[i].buf = (char *)malloc(implNameList[i].size);
+   implNameList[i].buf = (char *)malloc(implNameList[i].size + 1);
    if (implNameList[i].buf == NULL ||
        ncs_decode_n_octets_from_uba(i_ub,
                                     (uint8_t *)implNameList[i].buf,
--
2.18.0
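
The off-by-one described above can be illustrated with a short, self-contained C sketch. The function `dup_octets_as_string` is a hypothetical helper written for this note, not an OpenSAF API; it only shows why a buffer that receives `size` payload bytes plus a terminator needs `size + 1` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `size` octets from a decode buffer into a newly
 * allocated, null-terminated C string. The caller frees the result. */
char *dup_octets_as_string(const uint8_t *src, uint32_t size)
{
	assert(src != NULL);
	/* `size` bytes of payload plus one byte for the terminator, so the
	 * valid indexes are 0..size. With malloc(size) the write to
	 * buf[size] below would land one byte past the allocation. */
	char *buf = malloc((size_t)size + 1);
	if (buf == NULL)
		return NULL;
	memcpy(buf, src, size);
	buf[size] = '\0';
	return buf;
}
```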





Re: [devel] [PATCH 1/4] amfd: add class definitions for new timers [#2918]

2018-10-31 Thread Hans Nordebäck
ack, code review only/Thanks HansN

On 10/24/18 14:26, Gary Lee wrote:
> osafAmfDelayNodeFailoverTimeout - the number of seconds we wait
> after MDS down is received before we consider the node truly down.
>
> osafAmfDelayNodeFailoverNodeUpWait - the number of seconds we
> wait for Node Up after receiving MDS up, before we send reboot
> to the node. After sending reboot to a node, also wait up to
> this number of seconds before we consider the node to be
> down (unless MDS down is received first).
> ---
>   src/amf/config/amf_classes.xml | 14 +-
>   src/amf/config/amf_objects.xml |  8 
>   2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/src/amf/config/amf_classes.xml b/src/amf/config/amf_classes.xml
> index df5cbd92a..182bd97e5 100644
> --- a/src/amf/config/amf_classes.xml
> +++ b/src/amf/config/amf_classes.xml
> @@ -1452,5 +1452,17 @@
>   SA_CONFIG
>   SA_WRITABLE
>   
> - 
> + 
> + osafAmfDelayNodeFailoverTimeout
> + SA_TIME_T
> + SA_CONFIG
> + SA_WRITABLE
> + 
> + 
> + osafAmfDelayNodeFailoverNodeUpWait
> + SA_TIME_T
> + SA_CONFIG
> + SA_WRITABLE
> + 
> +
>   
> diff --git a/src/amf/config/amf_objects.xml b/src/amf/config/amf_objects.xml
> index 6ed68d83d..c008c7520 100644
> --- a/src/amf/config/amf_objects.xml
> +++ b/src/amf/config/amf_objects.xml
> @@ -6,6 +6,14 @@
>   osafAmfRestrictAutoRepairEnable
>   1
>   
> + 
> + osafAmfDelayNodeFailoverTimeout
> + 0
> + 
> + 
> + osafAmfDelayNodeFailoverNodeUpWait
> + 180
> + 
>   
>   
>   safAppType=OpenSafApplicationType



Re: [devel] [PATCH 3/4] amfd: add checkpointing of node failover state [#2918]

2018-10-31 Thread Hans Nordebäck
ack, review only/Thanks HansN

On 10/24/18 14:26, Gary Lee wrote:
> ---
>   src/amf/amfd/chkop.cc| 10 ++
>   src/amf/amfd/ckpt.h  |  3 ++-
>   src/amf/amfd/ckpt_dec.cc | 40 +++-
>   src/amf/amfd/ckpt_enc.cc | 26 --
>   src/amf/amfd/ckpt_msg.h  |  1 +
>   5 files changed, 76 insertions(+), 4 deletions(-)
>
> diff --git a/src/amf/amfd/chkop.cc b/src/amf/amfd/chkop.cc
> index 1ba4140c7..e9a68f4cd 100644
> --- a/src/amf/amfd/chkop.cc
> +++ b/src/amf/amfd/chkop.cc
> @@ -1042,6 +1042,16 @@ uint32_t avsv_send_ckpt_data(AVD_CL_CB *cb, uint32_t 
> action,
>   return NCSCC_RC_SUCCESS;
> }
> break;
> +case AVSV_CKPT_NODE_FAILOVER_STATE:
> +  if ((avd_cb->other_avd_adest != 0) &&
> +  (avd_cb->avd_peer_ver < AVD_MBCSV_SUB_PART_VERSION_9)) {
> +TRACE(
> +"No ckpt for AVSV_CKPT_NODE_FAILOVER_STATE as peer AMFD has"
> +" lower version:%d",
> +avd_cb->avd_peer_ver);
> +return NCSCC_RC_SUCCESS;
> +  }
> +  break;
>   default:
> return NCSCC_RC_SUCCESS;
> }
> diff --git a/src/amf/amfd/ckpt.h b/src/amf/amfd/ckpt.h
> index c006f9a69..875776a21 100644
> --- a/src/amf/amfd/ckpt.h
> +++ b/src/amf/amfd/ckpt.h
> @@ -35,9 +35,10 @@
>   #define AMF_AMFD_CKPT_H_
>   
>   // current version
> -#define AVD_MBCSV_SUB_PART_VERSION 8
> +#define AVD_MBCSV_SUB_PART_VERSION 9
>   
>   // supported versions
> +#define AVD_MBCSV_SUB_PART_VERSION_9 9
>   #define AVD_MBCSV_SUB_PART_VERSION_8 8
>   #define AVD_MBCSV_SUB_PART_VERSION_7 7
>   #define AVD_MBCSV_SUB_PART_VERSION_6 6
> diff --git a/src/amf/amfd/ckpt_dec.cc b/src/amf/amfd/ckpt_dec.cc
> index 9f3949a15..022fa8f4b 100644
> --- a/src/amf/amfd/ckpt_dec.cc
> +++ b/src/amf/amfd/ckpt_dec.cc
> @@ -49,6 +49,7 @@ static uint32_t dec_oper_su(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC 
> *dec);
>   static uint32_t dec_node_up_info(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec);
>   static uint32_t dec_node_admin_state(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec);
>   static uint32_t dec_node_oper_state(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec);
> +static uint32_t dec_node_failover_state(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC 
> *dec);
>   static uint32_t dec_node_state(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec);
>   static uint32_t dec_node_rcv_msg_id(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec);
>   static uint32_t dec_node_snd_msg_id(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec);
> @@ -160,7 +161,8 @@ const AVSV_DECODE_CKPT_DATA_FUNC_PTR 
> avd_dec_data_func_list[] = {
>   dec_comp_curr_num_csi_stby, dec_comp_oper_state, 
> dec_comp_readiness_state,
>   dec_comp_pres_state, dec_comp_restart_count, nullptr, /* 
> AVSV_SYNC_COMMIT */
>   dec_su_restart_count, dec_si_dep_state, dec_ng_admin_state,
> -dec_avd_to_avd_job_queue_status
> +dec_avd_to_avd_job_queue_status,
> +dec_node_failover_state
>   
>   };
>   
> @@ -2958,3 +2960,39 @@ static uint32_t 
> dec_avd_to_avd_job_queue_status(AVD_CL_CB *cb,
> TRACE_LEAVE();
> return NCSCC_RC_SUCCESS;
>   }
> +
> +static uint32_t dec_node_failover_state(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC 
> *dec) {
> +  TRACE_ENTER();
> +
> +  uint32_t state;
> +  SaNameT name;
> +
> +  osaf_decode_sanamet(>i_uba, );
> +  const std::string node_name(Amf::to_string());
> +  osaf_extended_name_free();
> +
> +  AVD_AVND* node;
> +  node = avd_node_get(node_name);
> +
> +  if (node == nullptr) {
> +LOG_ER("%s: node not found, nodeid=%s", __FUNCTION__, node_name.c_str());
> +return NCSCC_RC_FAILURE;
> +  }
> +
> +  osaf_decode_uint32(>i_uba,
> + reinterpret_cast());
> +
> +  auto failed_node = cb->failover_list.find(node->node_info.nodeId);
> +  if (failed_node != cb->failover_list.end()) {
> +failed_node->second->SetState(state);
> +  } else {
> +LOG_NO("Node '%s' not found in failover_list. Create new entry",
> +node->node_name.c_str());
> +auto new_node = std::make_shared(cb,
> +  node->node_info.nodeId);
> +new_node->SetState(state);
> +cb->failover_list[node->node_info.nodeId] = new_node;
> +  }
> +
> +  return NCSCC_RC_SUCCESS;
> +}
> \ No newline at end of file
> diff --git a/src/amf/amfd/ckpt_enc.cc b/src/amf/amfd/ckpt_enc.cc
> index 0a2d73698..0e675aed5 100644
> --- a/src/amf/amfd/ckpt_enc.cc
> +++ b/src/amf/amfd/ckpt_enc.cc
> @@ -48,6 +48,7 @@ static uint32_t enc_oper_su(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC 
> *enc);
>   static uint32_t enc_node_up_info(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC *enc);
>   static uint32_t enc_node_admin_state(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC *enc);
>   static uint32_t enc_node_oper_state(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC *enc);
> +static uint32_t enc_node_failover_state(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC 
> *enc);
>   static uint32_t enc_node_state(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC *enc);
>   static uint32_t enc_node_rcv_msg_id(AVD_CL_CB *cb, NCS_MBCSV_CB_ENC *enc);
>   static uint32_t enc_node_snd_msg_id(AVD_CL_CB *cb, 

Re: [devel] [PATCH 2/4] amfnd: allow reboot from any director [#2918]

2018-10-31 Thread Hans Nordebäck
ack, review only/Thanks HansN

On 10/24/18 14:26, Gary Lee wrote:
> allow reboot msg to be sent from any director, for
> split brain recovery situations
> ---
>   src/amf/amfnd/mds.cc | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/amf/amfnd/mds.cc b/src/amf/amfnd/mds.cc
> index 1ee24cf5b..d179ff40e 100644
> --- a/src/amf/amfnd/mds.cc
> +++ b/src/amf/amfnd/mds.cc
> @@ -328,7 +328,8 @@ uint32_t avnd_mds_rcv(AVND_CB *cb, 
> MDS_CALLBACK_RECEIVE_INFO *rcv_info) {
>  * from any other anchor than Active (except for HB message).
>  */
> if ((rcv_info->i_fr_dest != cb->active_avd_adest) &&
> -  (msg.info.avd->msg_type != AVSV_D2N_HEARTBEAT_MSG)) {
> +  (msg.info.avd->msg_type != AVSV_D2N_HEARTBEAT_MSG) &&
> +  (msg.info.avd->msg_type != AVSV_D2N_REBOOT_MSG)) {
>   LOG_ER("Received dest: %" PRIu64 " and cb active AVD adest:%" PRIu64
>  " mismatch, message type = %u",
>  rcv_info->i_fr_dest, cb->active_avd_adest,
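
The changed condition in the hunk above can be restated as a small predicate. The following C sketch is only an illustration: `enum msg_type`, `struct rcv_info` and `reject_from_non_active` are hypothetical stand-ins for the AVSV_D2N_* message types and the MDS receive info, not the real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message types standing in for the AVSV_D2N_* values. */
enum msg_type { MSG_HEARTBEAT, MSG_REBOOT, MSG_OTHER };

/* Hypothetical, reduced view of what the node director learns on receive. */
struct rcv_info {
	uint64_t from_dest;
	enum msg_type type;
};

/* Returns true when a director-to-node message should be rejected because
 * it did not come from the director the node currently regards as active.
 * Heartbeats and, with this patch, reboot orders are exempt. */
bool reject_from_non_active(const struct rcv_info *rcv, uint64_t active_dest)
{
	assert(rcv != NULL);
	return rcv->from_dest != active_dest &&
	       rcv->type != MSG_HEARTBEAT &&
	       rcv->type != MSG_REBOOT;
}
```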



Re: [devel] [PATCH 1/4] amfd: add class definitions for new timers [#2918]

2018-10-31 Thread Hans Nordebäck
ack, review only/Thanks HansN

On 10/24/18 14:26, Gary Lee wrote:
> osafAmfDelayNodeFailoverTimeout - the number of seconds we wait
> after MDS down is received before we consider the node truly down.
>
> osafAmfDelayNodeFailoverNodeUpWait - the number of seconds we
> wait for Node Up after receiving MDS up, before we send reboot
> to the node. After sending reboot to a node, also wait up to
> this number of seconds before we consider the node to be
> down (unless MDS down is received first).
> ---
>   src/amf/config/amf_classes.xml | 14 +-
>   src/amf/config/amf_objects.xml |  8 
>   2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/src/amf/config/amf_classes.xml b/src/amf/config/amf_classes.xml
> index df5cbd92a..182bd97e5 100644
> --- a/src/amf/config/amf_classes.xml
> +++ b/src/amf/config/amf_classes.xml
> @@ -1452,5 +1452,17 @@
>   SA_CONFIG
>   SA_WRITABLE
>   
> - 
> + 
> + osafAmfDelayNodeFailoverTimeout
> + SA_TIME_T
> + SA_CONFIG
> + SA_WRITABLE
> + 
> + 
> + osafAmfDelayNodeFailoverNodeUpWait
> + SA_TIME_T
> + SA_CONFIG
> + SA_WRITABLE
> + 
> +
>   
> diff --git a/src/amf/config/amf_objects.xml b/src/amf/config/amf_objects.xml
> index 6ed68d83d..c008c7520 100644
> --- a/src/amf/config/amf_objects.xml
> +++ b/src/amf/config/amf_objects.xml
> @@ -6,6 +6,14 @@
>   osafAmfRestrictAutoRepairEnable
>   1
>   
> + 
> + osafAmfDelayNodeFailoverTimeout
> + 0
> + 
> + 
> + osafAmfDelayNodeFailoverNodeUpWait
> + 180
> + 
>   
>   
>   safAppType=OpenSafApplicationType



Re: [devel] [PATCH 1/1] imm: add an admin operation to regenerate db from memory [#2940]

2018-10-26 Thread Hans Nordebäck
Hi Vu,

ack, code review only.

/BR HansN

On 10/19/18 05:58, Vu Minh Nguyen wrote:
> After split-brain recovery, the IMM data model held in memory by IMMND may
> be inconsistent with the one in the back-end database (sqlite).
>
> That can happen because there may have been two active IMMDs, two IMMND
> coordinators and more than one PBE process accessing a shared PBE database.
> A change made to the database by one of them might therefore go unnoticed
> by the other.
>
> Invoking this admin operation ID on the IMM management object makes IMM
> regenerate the back-end database from the one in memory, so that both are
> consistent.
>
> immadm -o 303 safRdn=immManagement,safApp=safImmService
> ---
>   src/imm/README | 18 ++
>   src/imm/common/immsv_api.h |  4 +++-
>   src/imm/immnd/ImmModel.cc  | 22 +-
>   src/imm/immnd/ImmModel.h   |  3 ++-
>   src/imm/immnd/immnd_init.h |  2 ++
>   src/imm/immnd/immnd_proc.c |  8 ++--
>   6 files changed, 52 insertions(+), 5 deletions(-)
>
> diff --git a/src/imm/README b/src/imm/README
> index 750d811a5..71e5c4fe3 100644
> --- a/src/imm/README
> +++ b/src/imm/README
> @@ -3033,6 +3033,24 @@ expires.
>   To be possible to use this new feature, bit 10 must be set in
>   opensafImmNostdFlags attribute in IMM object.
>   
> +
> +Provide an admin-operation for re-generating backend database from one in RAM
> +=
> +https://sourceforge.net/p/opensaf/tickets/2940/
> +
> +After split-brain recovery, there is possibility of having inconsistencies
> +between IMM data model in memory held by IMMND and one in the back-end
> +database (sqlite).
> +
> +That could happen as we might have 02 active IMMDs, 02 IMMND coordinators
> +and more than one PBE processes accessing a shared pbe database.
> +Change to such database from one therefore might not get noticed by the 
> other.
> +
> +By using this admin operation ID and targeting to IMM, IMM will regenerate 
> the
> +back-end database from one in memory to keep them both consistent.
> +
> +immadm -o 303 safRdn=immManagement,safApp=safImmService
> +
>   
>   DEPENDENCIES
>   
> diff --git a/src/imm/common/immsv_api.h b/src/imm/common/immsv_api.h
> index 32fc5738e..e6d613705 100644
> --- a/src/imm/common/immsv_api.h
> +++ b/src/imm/common/immsv_api.h
> @@ -157,7 +157,9 @@ typedef enum {
>   typedef enum {
> SA_IMM_ADMIN_EXPORT = 1, /* Defined in A.02.01 declared in  A.03.01 */
> SA_IMM_ADMIN_INIT_FROM_FILE = 100, /* Non standard, force PBE disable. */
> -  SA_IMM_ADMIN_ABORT_CCBS = 202 /* Non standard, abort non critical CCBs. */
> +  SA_IMM_ADMIN_ABORT_CCBS = 202, /* Non standard, abort non critical CCBs. */
> +  /* Non standard, regenerate pbe database from RAM */
> +  SA_IMM_ADMIN_REGENERATE_PBE_DB = 303
>   } SaImmMngtAdminOperationT;
>   
>   /*
> diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc
> index 21f48ab59..8e3f338dc 100644
> --- a/src/imm/immnd/ImmModel.cc
> +++ b/src/imm/immnd/ImmModel.cc
> @@ -596,6 +596,7 @@ static const std::string 
> saImmRepositoryInit("saImmRepositoryInit");
>   static const std::string saImmOiTimeout("saImmOiTimeout");
>   
>   static SaImmRepositoryInitModeT immInitMode = SA_IMM_INIT_FROM_FILE;
> +static bool sRegenerateDb = false;
>   
>   static SaUint32T sCcbIdLongDnGuard =
>   0; /* Disallow long DN additions if longDnsAllowed is being changed in 
> ccb*/
> @@ -2003,6 +2004,14 @@ void immModel_setLoader(IMMND_CB* cb, SaInt32T 
> loaderPid) {
> ImmModel::instance(>immModel)->setLoader(loaderPid);
>   }
>   
> +void immModel_setRegenerateDbFlag(IMMND_CB* cb, bool value) {
> +  ImmModel::instance(>immModel)->setRegenerateDbFlag(value);
> +}
> +
> +bool immModel_getRegenerateDbFlag(IMMND_CB* cb) {
> +  return ImmModel::instance(>immModel)->getRegenerateDbFlag();
> +}
> +
>   void immModel_recognizedIsolated(IMMND_CB* cb) {
> ImmModel::instance(>immModel)->recognizedIsolated();
>   }
> @@ -2901,6 +2910,14 @@ int ImmModel::adjustEpoch(int suggestedEpoch, 
> SaUint32T* continuationIdPtr,
> return suggestedEpoch;
>   }
>   
> +bool ImmModel::getRegenerateDbFlag() {
> +  return sRegenerateDb;
> +}
> +
> +void ImmModel::setRegenerateDbFlag(bool value) {
> +  sRegenerateDb = value;
> +}
> +
>   /**
>* Fetches the SaImmRepositoryInitT value of the attribute
>* 'saImmRepositoryInit' in the object immManagementDn.
> @@ -13808,6 +13825,9 @@ SaAisErrorT ImmModel::admoImmMngtObject(const 
> ImmsvOmAdminOperationInvoke* req,
> LOG_IN("sAbortNonCriticalCcbs = true;");
> sAbortNonCriticalCcbs = true;
>   }
> +  } else if (req->operationId == SA_IMM_ADMIN_REGENERATE_PBE_DB) {
> +LOG_NO("Re-generate the pbe database from one in memory.");
> +sRegenerateDb = true;
> } else {
>   LOG_NO("Invalid operation ID %llu, for operation on %s",
>  

Re: [devel] [PATCH 1/1] amfnd: change log message severity [#2945]

2018-10-25 Thread Hans Nordebäck
ack/Thanks Hans

On 10/25/18 07:28, Gary Lee wrote:
> ---
>   src/amf/amfnd/clm.cc | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amf/amfnd/clm.cc b/src/amf/amfnd/clm.cc
> index f1f65bcef..06eb229c7 100644
> --- a/src/amf/amfnd/clm.cc
> +++ b/src/amf/amfnd/clm.cc
> @@ -124,7 +124,7 @@ static void clm_to_amf_node(void) {
>   
> error = saImmOmInitialize_cond(, nullptr, );
> if (SA_AIS_OK != error) {
> -LOG_CR("saImmOmInitialize failed. Use previous value of nodeName.");
> +LOG_WA("saImmOmInitialize failed. Use previous value of nodeName.");
>   osafassert(avnd_cb->amf_nodeName.empty() == false);
>   goto done1;
> }

___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] imm: fix coredump is generated after split-brain recovery [#2942]

2018-10-19 Thread Hans Nordebäck
Hi Vu, ack review only. One minor comment below. /Thanks HansN

On 10/19/18 10:16, Vu Minh Nguyen wrote:

After split-brain recovery, the epoch counter on the IMMND veteran in this
partition may not match the one from the active IMMD in the other partition.

In that case, instead of generating a coredump, we should log an error message
to syslog and have the IMMND veteran terminate itself.
---
 src/imm/immnd/immnd_evt.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index b260d43ff..bc55ea946 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -10279,22 +10279,21 @@ static uint32_t immnd_evt_proc_start_sync(IMMND_CB 
*cb, IMMND_EVT *evt,
} else {
if (cb->mMyEpoch + 1 < cb->mRulingEpoch) {
if (cb->mState > IMM_SERVER_LOADING_PENDING) {
-   LOG_WA(
-   "Imm at this node has epoch %u, "
+   LOG_ER(
+   "Imm at this node has epoch %u, rulling 
epoch %u"
"appears to be a stragler in wrong state 
%u",
-   cb->mMyEpoch, cb->mState);
-   abort();
+   cb->mMyEpoch, cb->mRulingEpoch, cb->mState);
+   exit(1);
} else {
TRACE_2(
"This nodes apparently missed start of 
sync");
}
} else {
-   osafassert(cb->mMyEpoch + 1 > cb->mRulingEpoch);
-   LOG_WA(
-   "Imm at this evs node has epoch %u, "
+   LOG_ER(
+   "Imm at this evs node has epoch %u, rulling epoch 
%u"

[HansN] perhaps the log message needs some updates, e.g. "COORDINATOR appears 
to be a straggler!!, exiting.",


"COORDINATOR appears to be a stragler!!, aborting.",
-   cb->mMyEpoch);
-   abort();
+   cb->mMyEpoch, cb->mRulingEpoch);
+   exit(1);
/* TODO: 080414 re-inserted the osafassert/abort ...
   This is an extreemely odd case. Possibly it could
   occur after a failover ?? */




Re: [devel] [PATCH 1/1] uml: add support for plm to run under uml [#2922]

2018-09-18 Thread Hans Nordebäck
Ack, review only (ran OK with the legacy UML without PLM enabled).

Minor comment: are the functions cmd_build_container_testprog() and
cmd_install_container_testprog() part of ticket #70?

/BR HansN

On 09/17/2018 09:53 PM, Alex Jones wrote:
> Add support for plm to run under uml.
> ---
>   src/plm/config/openhpi.conf| 18 
>   tools/cluster_sim_uml/archive/scripts/40opensaf.rc | 30 +++
>   tools/cluster_sim_uml/build_uml| 95 
> --
>   3 files changed, 138 insertions(+), 5 deletions(-)
>   create mode 100644 src/plm/config/openhpi.conf
>
> diff --git a/src/plm/config/openhpi.conf b/src/plm/config/openhpi.conf
> new file mode 100644
> index 0..b811de134
> --- /dev/null
> +++ b/src/plm/config/openhpi.conf
> @@ -0,0 +1,18 @@
> +OPENHPI_AUTOINSERT_TIMEOUT = 50
> +OPENHPI_AUTOINSERT_TIMEOUT_READONLY = "NO"
> +
> +# Section for dynamic_simulator plugin
> +handler libdyn_simulator {
> +entity_root = "{ADVANCEDTCA_CHASSIS,2}"
> +# Location of the simulation data file
> +# Normally an example file is installed in the same directory as 
> openhpi.conf.
> +# Please change the following entry if you have configured another install
> +# directory or will use your own simulation.data.
> +file = "/etc/openhpi/opensaf-plm-sim.txt"
> +# infos goes to logfile and stdout
> +# the logfile are log00.log, log01.log ...
> +#logflags = "file stdout"
> +#logfile = "dynsim"
> +# if #logfile_max reached replace the oldest one
> +#logfile_max = "5"
> +}
> diff --git a/tools/cluster_sim_uml/archive/scripts/40opensaf.rc 
> b/tools/cluster_sim_uml/archive/scripts/40opensaf.rc
> index 7df4cfee6..9057d680b 100644
> --- a/tools/cluster_sim_uml/archive/scripts/40opensaf.rc
> +++ b/tools/cluster_sim_uml/archive/scripts/40opensaf.rc
> @@ -76,4 +76,34 @@ echo "$node_name" > /etc/opensaf/node_name
>   echo "/tmp/core_%t_%e_%p" > /proc/sys/kernel/core_pattern
>   ulimit -c unlimited
>   
> +if test -e /etc/plmcd.conf; then
> +sc_1_ip=$(grep "SC-1" /etc/hosts | cut -d' ' -f 1)
> +sc_2_ip=$(grep "SC-2" /etc/hosts | cut -d' ' -f 1)
> +if [ "$node_name" == "SC-1" ]; then
> +  ee="Linux_os_hosting_clm_node,safHE=f120_slot_1"
> +  path="my_entity = 
> \"{ADVANCEDTCA_CHASSIS,2}{PHYSICAL_SLOT,1}{SWITCH_BLADE,0}\""
> +elif [ "$node_name" == "SC-2" ]; then
> +  ee="Linux_os_hosting_clm_node,safHE=f120_slot_16"
> +  path="my_entity = 
> \"{ADVANCEDTCA_CHASSIS,2}{PHYSICAL_SLOT,16}{SWITCH_BLADE,0}\""
> +else
> +  ee="$node_name"
> +fi
> +sed -i -e "s/10.105.1.3/$sc_1_ip/" \
> +-e "s/10.105.1.6/$sc_2_ip/" \
> +-e "s/0020f/safEE=$ee,safDomain=domain_1/" \
> +-e "s/1;os;Fedora;2.6.31/1;os;SUSE;2.6/" \
> +-e "/^\/etc\/init.d/s/^/#/" \
> +/etc/plmcd.conf
> +cp /etc/openhpi/openhpi.conf /var/opt
> +chmod go-rwx /var/opt/openhpi.conf
> +echo "$path" > /etc/openhpi/openhpiclient.conf
> +
> +/usr/sbin/openhpid -c /var/opt/openhpi.conf
> +
> +# wait for hpi to read in hardware info
> +sleep 10
> +
> +/usr/local/sbin/plmcd&
> +fi
> +
>   /etc/init.d/opensafd start&
> diff --git a/tools/cluster_sim_uml/build_uml b/tools/cluster_sim_uml/build_uml
> index 16d49d03e..e54e45753 100755
> --- a/tools/cluster_sim_uml/build_uml
> +++ b/tools/cluster_sim_uml/build_uml
> @@ -121,6 +121,73 @@ cmd_install_testprog() {
>   cmd_mkcpio
>   }
>   
> +cmd_build_container_testprog() {
> +src=$opensaf_home/samples/amf/container
> +libd=$root/usr/local/$lib_dir
> +installd=$root/opt/amf_demo
> +
> +mkdir -p "$installd"
> +cp $src/amf_container_script $installd
> +gcc -g -O2 -Wall -fPIC -I$opensaf_home/src/amf/saf \
> +   -I$opensaf_home/src/ais/include \
> +   -DSA_EXTENDED_NAME_SOURCE \
> +   -o $installd/amf_container_demo $src/amf_container_demo.c \
> +   -Wl,--as-needed "-Wl,-rpath-link,$libd:$libd/opensaf" "-L$libd" 
> -lSaAmf -lopensaf_core
> +
> +echo "Creating [$root/root.cpio] ..."
> +cmd_mkcpio
> +}
> +
> +##   install_container_testprog
> +## Build and install the AMF container demo program.
> +##
> +cmd_install_container_testprog() {
> +src=$opensaf_home/samples/amf/container
> +libd=$root/usr/local/$lib_dir
> +installd=$root/opt/amf_demo
> +immxml=$root/etc/opensaf/imm.xml
> +containedXml=$src/AppConfig-contained-2N.xml
> +containerXml=$src/AppConfig-container.xml
> +
> +mkdir -p $installd
> +cp $src/amf_container_script $installd
> +gcc -g -O2 -Wall -fPIC -I$opensaf_home/src/amf/saf \
> +   -I$opensaf_home/src/ais/include \
> +   -DSA_EXTENDED_NAME_SOURCE \
> +   -o $installd/amf_container_demo $src/amf_container_demo.c \
> +   -Wl,--as-needed "-Wl,-rpath-link,$libd:$libd/opensaf" "-L$libd" 
> -lSaAmf
> +
> +test -r $immxml.orig || cp $immxml $immxml.orig
> +$opensaf_home/src/imm/tools/immxml-merge \
> +   $immxml.orig 

Re: [devel] [PATCH 1/1] osaf: modify log severity level in Consensus::Demote [#2912]

2018-08-17 Thread Hans Nordebäck
Ack, (code review only)/Thanks Hans

-Original Message-
From: Gary Lee  
Sent: 17 August 2018 02:00
To: Hans Nordebäck ; Minh Hon Chau 
; Anders Widell 
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 1/1] osaf: modify log severity level in Consensus::Demote 
[#2912]

All callers of Consensus::Demote() already log an error if the return code is 
not SA_AIS_OK.
A warning message will suffice.
---
 src/osaf/consensus/consensus.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc 
index 2a8e9bb1c..dc5c9bc46 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -142,7 +142,7 @@ SaAisErrorT Consensus::Demote(const std::string& node) {
   }
 
   if (rc != SA_AIS_OK) {
-LOG_ER("Unlock failed (%u)", rc);
+LOG_WA("Unlock failed (%u)", rc);
 return rc;
   }
 
--
2.17.1


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot


Re: [devel] [PATCH 1/1] amf: remove assignment for NPI component with enable DisableRestart [#2879]

2018-08-08 Thread Hans Nordebäck
Hi Thang,
Ack, review only. I think you should keep V1, with the comments; my only
suggestion was to correct the misspelled "thus SU" to "this SU". /Thanks HansN

-Original Message-
From: thang.nguyen  
Sent: 8 August 2018 08:49
To: Hans Nordebäck ; Gary Lee 
; Minh Hon Chau 
Cc: opensaf-devel@lists.sourceforge.net; Thang Duc Nguyen 

Subject: [PATCH 1/1] amf: remove assignment for NPI component with enable 
DisableRestart [#2879]

When an NPI component is configured with saAmfCtDefDisableRestart=1 and a
restart admin operation is invoked, amfnd does not remove the assignment, which
causes a crash.

Remove the assignment before changing the presence state to TERMINATION in clc.
---
 src/amf/amfnd/clc.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/amf/amfnd/clc.cc b/src/amf/amfnd/clc.cc
index c8e60e6..126362b 100644
--- a/src/amf/amfnd/clc.cc
+++ b/src/amf/amfnd/clc.cc
@@ -2217,6 +2217,10 @@ uint32_t avnd_comp_clc_inst_restart_hdler(AVND_CB *cb, 
AVND_COMP *comp) {
 /* invoke terminate callback */
 rc = avnd_comp_cbk_send(cb, comp, AVSV_AMF_COMP_TERM, 0, 0);
 else {
+  /* For NPI component with DisableRestart=1 */
+  if (m_AVND_COMP_IS_RESTART_DIS(comp) && (comp->csi_list.n_nodes > 0)) {
+su_send_suRestart_recovery_msg(comp->su);
+  }
   rc =
   avnd_comp_clc_cmd_execute(cb, comp, 
AVND_COMP_CLC_CMD_TYPE_TERMINATE);
   m_AVND_COMP_REG_PARAM_RESET(cb, comp);
--
2.7.4




Re: [devel] [PATCH 1/1] imm: attrDefaultValue is set to NULL if no default value is given [#2901]

2018-08-06 Thread Hans Nordebäck
Hi Vu,

Yes, reinterpret_cast should be avoided if possible; in this case
static_cast is better. You can ignore my comment
about passing `attrDefaultValueBuffer` directly; it is not valid.
/Thanks HansN

-Original Message-
From: Vu Minh Nguyen  
Sent: 7 August 2018 05:33
To: Hans Nordebäck ; Lennart Lund 
; Gary Lee 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] imm: attrDefaultValue is set to NULL if no default 
value is given [#2901]

Hi Hans,

Thanks for your comments. I will update the code using static_cast.

Passing `attrDefaultValueBuffer` directly to strlen() without type-casting will 
generate a compile error because of invalid conversion from `void*` to `const 
char*`, I think.

Regards, Vu
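
The cast discussed above can be illustrated in isolation. This is a minimal sketch that assumes, as stated in this thread, that the buffer is held as a `void*`; `HasDefaultValue` is a hypothetical helper name and not a function from imm_loader.cc or imm_import.cc.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper mirroring the patched condition: true only when the
// buffer is non-null and holds a non-empty C string.
bool HasDefaultValue(const void* buffer) {
  if (buffer == nullptr) return false;
  // A void pointer converts to an object pointer with static_cast, so
  // reinterpret_cast is not needed. Passing `buffer` to strlen() without a
  // cast does not compile in C++, because void* does not convert implicitly
  // to const char*.
  return std::strlen(static_cast<const char*>(buffer)) > 0;
}
```

An empty string is treated the same way as a missing buffer, which matches the intent of setting attrDefaultValue to NULL when no value is given.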

> -Original Message-
> From: Hans Nordeback 
> Sent: Monday, August 6, 2018 7:07 PM
> To: Vu Minh Nguyen ; 
> lennart.l...@ericsson.com; gary@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1/1] imm: attrDefaultValue is set to NULL if no 
> default value is given [#2901]
> 
> Hi Vu,
> 
> ack, review only. Minor comment below./Thanks HansN
> 
> 
> On 07/30/2018 10:46 AM, Vu Minh Nguyen wrote:
> > When explicitly having  tag, but no value is given:
> > , set NULL to attrDefaultValue.
> > ---
> >   src/imm/immloadd/imm_loader.cc | 3 ++-
> >   src/imm/tools/imm_import.cc| 3 ++-
> >   2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/imm/immloadd/imm_loader.cc
> b/src/imm/immloadd/imm_loader.cc
> > index de5a575e9..ad9785e92 100644
> > --- a/src/imm/immloadd/imm_loader.cc
> > +++ b/src/imm/immloadd/imm_loader.cc
> > @@ -1909,7 +1909,8 @@ void addClassAttributeDefinition(
> > attrDefinition.attrFlags = attrFlags;
> >
> > /* Set the default value */
> > -  if (attrDefaultValueBuffer) {
> > +  if (attrDefaultValueBuffer &&
> [HansN] use static_cast(attrDefaultValueBuffer) instead 
> or
> 
> (strlen(attrDefaultValueBuffer) > 0)) {   (instead of the reinterpret_cast, 
> not
> needed though)
> 
> > +  (strlen(reinterpret_cast(attrDefaultValueBuffer)) > 
> > + 0)) {
> >   charsToValueHelper(, attrValueType,
> >  (const char *)attrDefaultValueBuffer);
> > } else {
> > diff --git a/src/imm/tools/imm_import.cc 
> > b/src/imm/tools/imm_import.cc index e2bdcba5c..8145ec572 100644
> > --- a/src/imm/tools/imm_import.cc
> > +++ b/src/imm/tools/imm_import.cc
> > @@ -2444,7 +2444,8 @@ static void
> addClassAttributeDefinition(ParserState *state) {
> > }
> >
> > /* Set the default value */
> > -  if (state->attrDefaultValueSet) {
> > +  if (state->attrDefaultValueSet &&
> [HansN] use static_cast(attrDefaultValueBuffer) instead 
> or
> 
> (strlen(attrDefaultValueBuffer) > 0)) {   (instead of the reinterpret_cast, 
> not
> needed though)
> 
> > +  
> > + (strlen(reinterpret_cast(state->attrDefaultValueBuffer)) > 
> > + 0)) {
> >   if (charsToValueHelper(,
> >  state->attrValueType, 
> > state->attrDefaultValueBuffer,
> >  state->strictParse)) {




Re: [devel] [PATCH 1/1] amfd: remove redundant const_cast [#2907]

2018-08-01 Thread Hans Nordebäck
Ack, review only/Thanks HansN

-Original Message-
From: Gary Lee  
Sent: 1 August 2018 05:47
To: Hans Nordebäck ; Minh Hon Chau 

Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 1/1] amfd: remove redundant const_cast [#2907]

---
 src/amf/amfd/clm.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index e113a65f9..1e67ff389 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -631,7 +631,7 @@ AvdJobDequeueResultT ClmTrackStart::exec(AVD_CL_CB* cb) {
   AvdJobDequeueResultT res;
   TRACE_ENTER();
 
-  SaAisErrorT rc = avd_clm_track_start(const_cast(cb));
+  SaAisErrorT rc = avd_clm_track_start(cb);
   if (rc == SA_AIS_OK) {
 delete Fifo::dequeue();
 res = JOB_EXECUTED;
@@ -652,7 +652,7 @@ AvdJobDequeueResultT ClmTrackStop::exec(AVD_CL_CB* cb) {
   AvdJobDequeueResultT res;
   TRACE_ENTER();
 
-  SaAisErrorT rc = avd_clm_track_stop(const_cast(cb));
+  SaAisErrorT rc = avd_clm_track_stop(cb);
   if (rc == SA_AIS_OK) {
 delete Fifo::dequeue();
 res = JOB_EXECUTED;
--
2.17.1




Re: [devel] [PATCH 1/1] amfd: put sync jobs into queue if IMM is busy [#2863]

2018-07-31 Thread Hans Nordebäck
Hi Gary,

You can remove all const_casts in functions where the const qualifier on the
parameter has been removed, e.g. the AVD_CL_CB parameter of
ClmTrackStart::exec etc. /Thanks HansN
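
To illustrate why the casts become redundant, here is a small sketch. `ControlBlock`, `TrackStart`, `ExecOld` and `ExecNew` are hypothetical stand-ins for AVD_CL_CB, avd_clm_track_start() and the two versions of exec(); they are not the amfd definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for AVD_CL_CB.
struct ControlBlock {
  int imm_status = 0;
};

// Hypothetical stand-in for a callee that needs a mutable control block.
int TrackStart(ControlBlock* cb) {
  cb->imm_status = 1;
  return cb->imm_status;
}

// Before: the parameter is const, so the callee can only be reached by
// casting the constness away.
int ExecOld(const ControlBlock* cb) {
  return TrackStart(const_cast<ControlBlock*>(cb));
}

// After: the parameter is non-const, so the pointer is passed as-is and the
// const_cast has nothing left to do.
int ExecNew(ControlBlock* cb) { return TrackStart(cb); }
```

Both variants behave the same for a non-const object; the second one simply states the mutation in its signature.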

-Original Message-
From: Gary Lee  
Sent: 4 July 2018 03:16
To: Minh Hon Chau ; Hans Nordebäck 

Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 1/1] amfd: put sync jobs into queue if IMM is busy [#2863]

---
 src/amf/amfd/cb.h   |  3 ++-
 src/amf/amfd/clm.cc |  4 ++--
 src/amf/amfd/clm.h  |  4 ++--
 src/amf/amfd/imm.cc | 33 -
 src/amf/amfd/imm.h  | 18 +-
 src/amf/amfd/ntf.cc |  2 +-
 6 files changed, 40 insertions(+), 24 deletions(-)

diff --git a/src/amf/amfd/cb.h b/src/amf/amfd/cb.h
index 60bb554de..3b7e6d13f 100644
--- a/src/amf/amfd/cb.h
+++ b/src/amf/amfd/cb.h
@@ -63,7 +63,8 @@ typedef enum {
   AVD_IMM_INIT_BASE = 1,
   AVD_IMM_INIT_ONGOING = 2,
   AVD_IMM_INIT_DONE = 3,
-  AVD_IMM_TERMINATING = 4,
+  AVD_IMM_BUSY = 4,
+  AVD_IMM_TERMINATING = 5,
 } AVD_IMM_INIT_STATUS;
 /*
  * Sync state of the Standby.
diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index 9d317892a..e113a65f9 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -627,7 +627,7 @@ SaAisErrorT avd_start_clm_init_bg(void) {
   return SA_AIS_OK;
 }
 
-AvdJobDequeueResultT ClmTrackStart::exec(const AVD_CL_CB* cb) {
+AvdJobDequeueResultT ClmTrackStart::exec(AVD_CL_CB* cb) {
   AvdJobDequeueResultT res;
   TRACE_ENTER();
 
@@ -648,7 +648,7 @@ AvdJobDequeueResultT ClmTrackStart::exec(const AVD_CL_CB* 
cb) {
   return res;
 }
 
-AvdJobDequeueResultT ClmTrackStop::exec(const AVD_CL_CB* cb) {
+AvdJobDequeueResultT ClmTrackStop::exec(AVD_CL_CB* cb) {
   AvdJobDequeueResultT res;
   TRACE_ENTER();
 
diff --git a/src/amf/amfd/clm.h b/src/amf/amfd/clm.h
index 2bbe320f7..f4399c62e 100644
--- a/src/amf/amfd/clm.h
+++ b/src/amf/amfd/clm.h
@@ -40,14 +40,14 @@ public:
 class ClmTrackStart : public ClmJob {
  public:
   ClmTrackStart() : ClmJob(){};
-  AvdJobDequeueResultT exec(const struct cl_cb_tag *cb);
+  AvdJobDequeueResultT exec(struct cl_cb_tag *cb);
   ~ClmTrackStart() {}
 };
 
 class ClmTrackStop : public ClmJob {
  public:
   ClmTrackStop() : ClmJob(){};
-  AvdJobDequeueResultT exec(const struct cl_cb_tag *cb);
+  AvdJobDequeueResultT exec(struct cl_cb_tag *cb);
   ~ClmTrackStop() {}
 };
 
diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
index 60a997943..3c1a93729 100644
--- a/src/amf/amfd/imm.cc
+++ b/src/amf/amfd/imm.cc
@@ -133,7 +133,7 @@ Job::~Job() {}
 
 // TODO: Make isImmServiceReady as static to limit its scope
 // This function should belong to AVD_CB class as a method
-static bool isImmServiceReady(const AVD_CL_CB *cb) {
+static bool isImmServiceReady(const AVD_CL_CB *cb, bool ignore_busy = false) {
   TRACE_ENTER();
   bool rc = true;
 
@@ -149,16 +149,21 @@ static bool isImmServiceReady(const AVD_CL_CB *cb) {
 TRACE("Already IMM init is going, try again after sometime");
 rc = false;
   }
+  if (avd_cb->avd_imm_status == AVD_IMM_BUSY &&
+ignore_busy == false) {
+TRACE("IMM returned TRY_AGAIN. Postponing synchronous calls");
+rc = false;
+  }
   TRACE_LEAVE2("%u:", rc);
   return rc;
 }
 
 //
 bool ImmJob::isRunnable(const AVD_CL_CB *cb) {
-  return isImmServiceReady(cb);
+  return isImmServiceReady(cb, true);
 }
 //
-AvdJobDequeueResultT ImmObjCreate::exec(const AVD_CL_CB *cb) {
+AvdJobDequeueResultT ImmObjCreate::exec(AVD_CL_CB *cb) {
   SaAisErrorT rc;
   AvdJobDequeueResultT res;
   const SaImmOiHandleT immOiHandle = cb->immOiHandle;
@@ -173,6 +178,7 @@ AvdJobDequeueResultT ImmObjCreate::exec(const AVD_CL_CB *cb) {
   }
   rc = saImmOiRtObjectCreate_2(immOiHandle, className_, parent_name,
attrValues_);
+  cb->avd_imm_status = AVD_IMM_INIT_DONE;
 
   if ((rc == SA_AIS_OK) || (rc == SA_AIS_ERR_EXIST)) {
 delete Fifo::dequeue();
@@ -180,6 +186,7 @@ AvdJobDequeueResultT ImmObjCreate::exec(const AVD_CL_CB 
*cb) {
   } else if (rc == SA_AIS_ERR_TRY_AGAIN) {
 TRACE("TRY-AGAIN");
 res = JOB_ETRYAGAIN;
+cb->avd_imm_status = AVD_IMM_BUSY;
   } else if (rc == SA_AIS_ERR_TIMEOUT) {
 TRACE("TIMEOUT");
 res = JOB_ETRYAGAIN;
@@ -228,7 +235,7 @@ ImmObjCreate::~ImmObjCreate() {
 }
 
 //
-AvdJobDequeueResultT ImmObjUpdate::exec(const AVD_CL_CB *cb) {
+AvdJobDequeueResultT ImmObjUpdate::exec(AVD_CL_CB *cb) {
   SaAisErrorT rc;
   AvdJobDequeueResultT res;
   const SaImmOiHandleT immOiHandle = cb->immOiHandle;
@@ -252,6 +259,7 @@ AvdJobDequeueResultT ImmObjUpdate::exec(const AVD_CL_CB *cb) {
   attrMod.modAttr.attrValues = attrValues;
 
   rc = saImmOiRtObjectUpdate_o3(immOiHandle, dn.c_str(), attrMods);
+  cb->avd_imm_status = AVD_IMM_INIT_DONE;
 
   if ((rc == SA_AIS_OK) || (rc == SA_AIS_ERR_NOT_EXIST)) {
 delete Fifo::dequeue();
@@ -259,6 +267,7 @@ AvdJobDequeueResultT ImmObjUpdate::exec(const AVD_CL_CB 
*cb) {
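
The idea in the subject line, queueing calls while IMM answers TRY_AGAIN, can be sketched as follows. This is a simplified illustration only: `ImmJobQueue`, `Submit` and `Drain` are hypothetical names, a job is modelled as a callable that returns false for TRY_AGAIN, and the real patch works with avd_imm_status and the existing Fifo instead.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical sketch, not the amfd implementation.
class ImmJobQueue {
 public:
  // Runs `call` immediately when nothing is pending and IMM is not marked
  // busy; otherwise appends it so that ordering is preserved.
  void Submit(std::function<bool()> call) {
    if (busy_ || !pending_.empty()) {
      pending_.push_back(std::move(call));
      return;
    }
    if (!call()) {  // false models SA_AIS_ERR_TRY_AGAIN
      busy_ = true;
      pending_.push_back(std::move(call));
    }
  }

  // Retries queued jobs in order and stops at the first one that still
  // reports TRY_AGAIN. Queued jobs ignore the busy flag, similar to the
  // ignore_busy argument in the patch.
  void Drain() {
    while (!pending_.empty()) {
      if (!pending_.front()()) {
        busy_ = true;
        return;
      }
      pending_.pop_front();
    }
    busy_ = false;
  }

  bool busy() const { return busy_; }
  std::size_t pending() const { return pending_.size(); }

 private:
  bool busy_ = false;
  std::deque<std::function<bool()>> pending_;
};
```

The busy flag is cleared only after the queue has been emptied, so a later synchronous call cannot overtake jobs that were postponed earlier.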
 

Re: [devel] [PATCH 1/1] rded: run controller promotion code in new thread [#2857]

2018-05-30 Thread Hans Nordebäck
Ack /Thanks HansN

-Original Message-
From: Gary Lee  
Sent: 24 May 2018 07:57
To: Hans Nordebäck ; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 1/1] rded: run controller promotion code in new thread [#2857]

Currently, the consensus code relating to node promotion is run from the main 
thread. We can improve rded's responsiveness by moving this code into another 
thread.
---
 src/rde/rded/rde_cb.h|  3 +-
 src/rde/rded/rde_main.cc |  6 +++-
 src/rde/rded/role.cc | 83 +++-
 src/rde/rded/role.h  |  2 ++
 4 files changed, 62 insertions(+), 32 deletions(-)

diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index f5ad689c3..877687341 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -53,7 +53,8 @@ enum RDE_MSG_TYPE {
   RDE_MSG_NEW_ACTIVE_CALLBACK = 5,
   RDE_MSG_NODE_UP = 6,
   RDE_MSG_NODE_DOWN = 7,
-  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8
+  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8,
+  RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9
 };
 
 struct rde_peer_info {
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index c5b4b8283..c59aa4536 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -55,7 +55,8 @@ const char *rde_msg_name[] = {"-",
   "RDE_MSG_NEW_ACTIVE_CALLBACK(5)"
   "RDE_MSG_NODE_UP(6)",
   "RDE_MSG_NODE_DOWN(7)",
-  "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)"};
+  "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)",
+  "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)"};
 
 static RDE_CONTROL_BLOCK _rde_cb;
 static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb;
@@ -186,6 +187,9 @@ static void handle_mbx_event() {
 LOG_WA("Received takeover request when not active");
   }
 } break;
+case RDE_MSG_ACTIVE_PROMOTION_SUCCESS:
+  role->NodePromoted();
+  break;
 default:
   LOG_ER("%s: discarding unknown message type %u", __FUNCTION__, 
msg->type);
   break;
diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 1b5a6ae89..a03372413 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -22,6 +22,7 @@
 #include "rde/rded/role.h"
 #include 
 #include 
+#include 
 #include "base/getenv.h"
 #include "base/logtrace.h"
 #include "base/ncs_main_papi.h"
@@ -63,6 +64,55 @@ void Role::MonitorCallback(const std::string& key, const 
std::string& new_value,
   osafassert(status == NCSCC_RC_SUCCESS);
 }
 
+void Role::PromoteNode(const uint64_t cluster_size) {
+  TRACE_ENTER();
+  SaAisErrorT rc;
+
+  Consensus consensus_service;
+
+  rc = consensus_service.PromoteThisNode(true, cluster_size);
+  if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
+    LOG_ER("Unable to set active controller in consensus service");
+    opensaf_reboot(0, nullptr,
+                   "Unable to set active controller in consensus service");
+  }
+
+  if (rc == SA_AIS_ERR_EXIST) {
+LOG_WA("Another controller is already active");
+return;
+  }
+
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  // send msg to main thread
+  rde_msg* msg = static_cast(malloc(sizeof(rde_msg)));
+  msg->type = RDE_MSG_ACTIVE_PROMOTION_SUCCESS;
+  uint32_t status;
+  status = m_NCS_IPC_SEND(>mbx, msg, NCS_IPC_PRIORITY_HIGH);
+  osafassert(status == NCSCC_RC_SUCCESS);
+}
+
+void Role::NodePromoted() {
+  ExecutePreActiveScript();
+  LOG_NO("Switched to ACTIVE from %s", to_string(role()));
+  role_ = PCS_RDA_ACTIVE;
+  rde_rda_send_role(role_);
+
+  Consensus consensus_service;
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  // register for callback if active controller is changed
+  // in consensus service
+  if (cb->monitor_lock_thread_running == false) {
+cb->monitor_lock_thread_running = true;
+consensus_service.MonitorLock(MonitorCallback, cb->mbx);
+  }
+  if (cb->monitor_takeover_req_thread_running == false) {
+cb->monitor_takeover_req_thread_running = true;
+consensus_service.MonitorTakeoverRequest(MonitorCallback, cb->mbx);
+  }
+}
+
 Role::Role(NODE_ID own_node_id)
 : known_nodes_{},
   role_{PCS_RDA_QUIESCED},
@@ -82,37 +132,10 @@ timespec* Role::Poll(timespec* ts) {
   *ts = election_end_time_ - now;
   timeout = ts;
 } else {
+  election_end_time_ = base::kTimespecMax;
   RDE_CONTROL_BLOCK* cb = rde_get_control_block();
-  SaAisErrorT rc;
-  Consensus consensus_service;
-
-  rc = consensus_service.PromoteThisNode(true, cb->cluster_members.size());
-  if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
-LOG_ER("Unable to set active controller in consensus service");
-   
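
The pattern in this patch, running the slow promotion off the main thread and reporting back through the mailbox, can be sketched with the standard library. Everything below is a hypothetical stand-in: `Mailbox` replaces the NCS IPC mailbox, `PromoteThisNode` is a stub for the consensus call, and `StartPromotion` is not a function from role.cc.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

enum class Msg { kPromotionSuccess };

// Hypothetical stand-in for the mailbox between worker and main thread.
class Mailbox {
 public:
  void Send(Msg m) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(m);
    }
    cv_.notify_one();
  }

  // Blocks until a message is available, then returns it.
  Msg Receive() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !queue_.empty(); });
    Msg m = queue_.front();
    queue_.pop();
    return m;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<Msg> queue_;
};

// Stub for the potentially slow consensus call; always succeeds here.
bool PromoteThisNode() { return true; }

// Runs the promotion on its own thread and posts the outcome to the main
// thread, which stays free to handle other events in the meantime.
std::thread StartPromotion(Mailbox* mbx) {
  return std::thread([mbx] {
    if (PromoteThisNode()) mbx->Send(Msg::kPromotionSuccess);
  });
}
```

The main loop then reacts to the success message in the same place as its other mailbox events, which keeps the role change itself on the main thread.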

Re: [devel] [PATCH 1/1] base: Destructor of TraceLog causes coredump V2 [#2860]

2018-05-23 Thread Hans Nordebäck
Hi Minh,


Yes, you are right about the possibility of a segv, but using a std::shared_ptr
instead of the naked pointer may be an option?


/Thanks Hans
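
The lifetime property behind the std::shared_ptr suggestion can be sketched as follows. `Tracer` and `LogWith` are hypothetical names, not the TraceLog class. The sketch shows ownership only; fetching the handle while another thread resets it would still need its own synchronization.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical sketch, not the TraceLog class.
class Tracer {
 public:
  Tracer() : mutex_(std::make_shared<std::mutex>()) {}

  // Hands out a co-owning handle; the mutex is destroyed only when the last
  // owner releases it.
  std::shared_ptr<std::mutex> mutex() const { return mutex_; }

  // Models what a destructor does to a naked pointer: the tracer gives up
  // its own reference.
  void Shutdown() { mutex_.reset(); }

 private:
  std::shared_ptr<std::mutex> mutex_;
};

// Locks through a co-owning handle; returns false if there is no mutex.
bool LogWith(const std::shared_ptr<std::mutex>& m) {
  if (!m) return false;
  std::lock_guard<std::mutex> lock(*m);
  return true;
}
```

A thread that already holds a handle keeps the mutex alive across Shutdown(), so locking it stays well defined, whereas a naked pointer would dangle at that point.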


From: Minh Hon Chau <minh.c...@dektech.com.au>
Sent: 24 May 2018 02:34:13
To: Hans Nordebäck; Anders Widell; Gary Lee
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] base: Destructor of TraceLog causes coredump V2 [#2860]

Hi Hans,

It is good to give the Mutex class an option not to abort. That avoids
the abort in mutex_unlock (as reported in the coredump), but I feel the
issue is still there.

We may still hit a problem (segv?) with "mutex_->good()", since the other
thread is wiping out mutex_ in the destructor; I guess it is only a matter
of timing.

Since we don't have (and don't want to have) any protection between the two
threads for the TraceLog, the best option (I hope) is to make one of
those threads not touch the TraceLog.

If you don't want to remove the destructor, another way is to allocate
gl_trace/gl_log on the heap?

Thanks,

Minh



On 23/05/18 20:50, Hans Nordeback wrote:
> Change Mutex class to make it possible for caller to decide if abort
> ---
>   src/base/logtrace_client.cc |  5 -
>   src/base/mutex.cc   |  2 +-
>   src/base/mutex.h| 22 +-
>   3 files changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/src/base/logtrace_client.cc b/src/base/logtrace_client.cc
> index 0dac6d389..f597c1ae3 100644
> --- a/src/base/logtrace_client.cc
> +++ b/src/base/logtrace_client.cc
> @@ -76,7 +76,7 @@ bool TraceLog::Init(const char *msg_id, WriteMode mode) {
> msg_id_ = base::LogMessage::MsgId{msg_id};
> log_socket_ = new base::UnixClientSocket{Osaflog::kServerSocketPath,
>   static_cast(mode)};
> -  mutex_ = new base::Mutex{};
> +  mutex_ = new base::Mutex{false};
>
> return true;
>   }
> @@ -91,6 +91,9 @@ void TraceLog::Log(base::LogMessage::Severity severity, 
> const char *fmt,
>   void TraceLog::LogInternal(base::LogMessage::Severity severity, const char 
> *fmt,
>  va_list ap) {
> base::Lock lock(*mutex_);
> +
> +  if (!mutex_->good()) return;
> +
> uint32_t id = sequence_id_;
> sequence_id_ = id < kMaxSequenceId ? id + 1 : 1;
> buffer_.clear();
> diff --git a/src/base/mutex.cc b/src/base/mutex.cc
> index 5fa6ac55a..1627ac20b 100644
> --- a/src/base/mutex.cc
> +++ b/src/base/mutex.cc
> @@ -20,7 +20,7 @@
>
>   namespace base {
>
> -Mutex::Mutex() : mutex_{} {
> +Mutex::Mutex(bool abort) : abort_{abort}, mutex_{}, result_{0} {
> pthread_mutexattr_t attr;
> int result = pthread_mutexattr_init();
> if (result != 0) osaf_abort(result);
> diff --git a/src/base/mutex.h b/src/base/mutex.h
> index 7b3cee187..e3c54a711 100644
> --- a/src/base/mutex.h
> +++ b/src/base/mutex.h
> @@ -31,30 +31,34 @@ namespace base {
>   class Mutex {
>public:
> using NativeHandleType = pthread_mutex_t*;
> -  Mutex();
> +  Mutex(bool abort = true);
> ~Mutex();
> void Lock() {
> -int result = pthread_mutex_lock(_);
> -if (result != 0) osaf_abort(result);
> +result_ = pthread_mutex_lock(_);
> +if (abort_ && result_ != 0) osaf_abort(result_);
> }
> bool TryLock() {
> -int result = pthread_mutex_trylock(_);
> -if (result == 0) {
> +result_ = pthread_mutex_trylock(_);
> +if (result_ == 0) {
> return true;
> -} else if (result == EBUSY) {
> +} else if (result_ == EBUSY) {
> return false;
>   } else {
> -  osaf_abort(result);
> +  if (abort_) osaf_abort(result_);
> +  return false;
>   }
> }
> void Unlock() {
> -int result = pthread_mutex_unlock(_);
> -if (result != 0) osaf_abort(result);
> +result_ = pthread_mutex_unlock(_);
> +if (abort_ && result_ != 0) osaf_abort(result_);
> }
> NativeHandleType native_handle() { return _; }
>
> +  bool good() const {return result_ == 0;};
>private:
> +  bool abort_;
> pthread_mutex_t mutex_;
> +  int result_;
> DELETE_COPY_AND_MOVE_OPERATORS(Mutex);
>   };
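
The approach in the quoted patch, a mutex wrapper that records the last pthread result instead of always aborting, can be tried standalone. This is a sketch under assumptions: the class below is not base::Mutex, the `Check` helper is made up for brevity, and the error-checking mutex type is assumed so that an invalid unlock reports an error.

```cpp
#include <cassert>
#include <cstdlib>
#include <pthread.h>

// Hypothetical sketch, not base::Mutex.
class Mutex {
 public:
  explicit Mutex(bool abort_on_error = true)
      : abort_(abort_on_error), result_(0) {
    pthread_mutexattr_t attr;
    Check(pthread_mutexattr_init(&attr));
    // Assumption: error-checking type, so misuse returns an error code.
    Check(pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK));
    Check(pthread_mutex_init(&mutex_, &attr));
    pthread_mutexattr_destroy(&attr);
  }
  ~Mutex() { pthread_mutex_destroy(&mutex_); }
  Mutex(const Mutex&) = delete;
  Mutex& operator=(const Mutex&) = delete;

  void Lock() { Check(pthread_mutex_lock(&mutex_)); }
  void Unlock() { Check(pthread_mutex_unlock(&mutex_)); }

  // True if the most recent pthread call succeeded.
  bool good() const { return result_ == 0; }

 private:
  // Stores the result and aborts only when the caller asked for that.
  void Check(int result) {
    result_ = result;
    if (abort_ && result != 0) std::abort();
  }

  bool abort_;
  pthread_mutex_t mutex_;
  int result_;
};
```

Constructed with `false`, the wrapper lets the caller inspect good() after each operation, for example after unlocking a mutex that is not locked. It does not by itself protect against the mutex object being destroyed by another thread.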
>



Re: [devel] [PATCH 1/1] imm: inform status to NID before exit during start-up phrase [#2845]

2018-05-09 Thread Hans Nordebäck
Hi Vu,

I'll revise my comment a bit: before nid_notify is sent, the fifo monitoring is
not started, so removing the exit should not be necessary. It would be good if
you can test this.

/Thanks HansN
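
The ordering at stake, report the failure first and only then exit, can be sketched generically on a POSIX system. This is not the NID protocol: a plain pipe stands in for the NID fifo, the status byte 'F' is made up, and `ReportAndExit` and `SuperviseOnce` are hypothetical names.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Child side: writes a one-byte status before terminating, so the supervisor
// sees an explicit failure instead of a silent disappearance.
[[noreturn]] void ReportAndExit(int fd, char status) {
  if (write(fd, &status, 1) != 1) {
    // Nothing more can be done here; fall through to exit.
  }
  _exit(1);
}

// Supervisor side: forks a child that fails at once and returns the status
// byte it reported, or '\0' if nothing arrived.
char SuperviseOnce() {
  int fds[2];
  if (pipe(fds) != 0) return '\0';
  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return '\0';
  }
  if (pid == 0) {
    close(fds[0]);
    ReportAndExit(fds[1], 'F');
  }
  close(fds[1]);
  char status = '\0';
  ssize_t n = read(fds[0], &status, 1);
  close(fds[0]);
  waitpid(pid, nullptr, 0);
  return n == 1 ? status : '\0';
}
```

If the child exited without the write, the supervisor would only see end-of-file on the pipe, which corresponds to the case where the starting process gets no result from the service.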

-Original Message-
From: Hans Nordebäck 
Sent: 8 May 2018 15:06
To: 'Vu Minh Nguyen' <vu.m.ngu...@dektech.com.au>; ravisekhar.ko...@oracle.com; 
Anders Widell <anders.wid...@ericsson.com>; Lennart Lund 
<lennart.l...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 
<vu.m.ngu...@dektech.com.au>
Subject: RE: [PATCH 1/1] imm: inform status to NID before exit during start-up 
phrase [#2845]

Hi Vu,

Ack review only with one comment. If the exit() is called after 
immnd_ackToNid() the fifo monitoring in nodeinit.cc will be activated.
I think you should remove the exit().
/Thanks HansN

-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] 
Sent: 3 May 2018 12:20
To: ravisekhar.ko...@oracle.com; Hans Nordebäck <hans.nordeb...@ericsson.com>; 
Anders Widell <anders.wid...@ericsson.com>; Lennart Lund 
<lennart.l...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 
<vu.m.ngu...@dektech.com.au>
Subject: [PATCH 1/1] imm: inform status to NID before exit during start-up 
phrase [#2845]

During the node start-up phase, before AMFD has come up, there is a case where
IMMND exits without informing NID of the failure (refer to the ticket to see
the syslog). As a result, IMMND may not be respawned by the NID process.

This patch ensures that NID is informed before exit.
---
 src/imm/immnd/immnd_evt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index 8f3af92..2b9123d 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -10779,6 +10779,7 @@ static uint32_t immnd_evt_proc_fevs_rcv(IMMND_CB *cb, 
IMMND_EVT *evt,
LOG_ER(
"MESSAGE:%llu OUT OF ORDER my highest 
processed:%llu - exiting",
msgNo, cb->highestProcessed);
+   immnd_ackToNid(NCSCC_RC_FAILURE);
exit(1);
} else if (cb
   ->mSync) { /* If we receive out of sync message
--
1.9.1




Re: [devel] [PATCH 1/1] imm: inform status to NID before exit during start-up phrase [#2845]

2018-05-08 Thread Hans Nordebäck
Hi Vu,

Ack review only with one comment. If the exit() is called after 
immnd_ackToNid() the fifo monitoring in nodeinit.cc will be activated.
I think you should remove the exit().
/Thanks HansN

-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] 
Sent: 3 May 2018 12:20
To: ravisekhar.ko...@oracle.com; Hans Nordebäck <hans.nordeb...@ericsson.com>; 
Anders Widell <anders.wid...@ericsson.com>; Lennart Lund 
<lennart.l...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 
<vu.m.ngu...@dektech.com.au>
Subject: [PATCH 1/1] imm: inform status to NID before exit during start-up 
phrase [#2845]

During the node start-up phase, before AMFD has come up, there is a case where 
IMMND exits without reporting the failure result to NID (refer to the ticket for 
the syslog). As a result, IMMND may not be respawned by the NID process.

This patch ensures that NID is informed before exit.
---
 src/imm/immnd/immnd_evt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index 8f3af92..2b9123d 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -10779,6 +10779,7 @@ static uint32_t immnd_evt_proc_fevs_rcv(IMMND_CB *cb, IMMND_EVT *evt,
    LOG_ER(
        "MESSAGE:%llu OUT OF ORDER my highest processed:%llu - exiting",
        msgNo, cb->highestProcessed);
+   immnd_ackToNid(NCSCC_RC_FAILURE);
    exit(1);
    } else if (cb
       ->mSync) { /* If we receive out of sync message
--
1.9.1




Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

2018-05-04 Thread Hans Nordebäck
Hi Alex,

Agree, adding a comment in nid.conf and 00-README.conf is good. The backtrace 
below looks normal, can you share the syslogs?

/BR HansN

From: Alex Jones [mailto:ajo...@rbbn.com]
Sent: 2 May 2018 15:43
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
<anders.wid...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: SV: [PATCH 1/1] nid: restart opensafd on failure when systemd 
enabled [#2839]


Hi Hans,

I was finally able to get back to this.

Having "Restart=on-failure" set works with REBOOT_ON_FAIL_TIMEOUT as long 
as RestartSec=xxx is also set in the service file to something greater than 
REBOOT_ON_FAIL_TIMEOUT. Maybe we could put a comment in nid.conf that says if 
you use systemd you also need to set RestartSec to a value greater than 
REBOOT_ON_FAIL_TIMEOUT?

Regarding "systemctl start opensafd; sleep 1; pkill -ABRT immnd". In my 
setup it does not restart after the nid phase. If I increase the time to 3, it 
starts to work. Here is the backtrace. Nothing looks suspicious.

(gdb) thread apply all bt

Thread 4 (Thread 0x7fbf852e9b00 (LWP 5123)):
#0  0x7fbf839b906d in poll () from /lib64/libc.so.6
#1  0x7fbf8462a370 in poll (__timeout=2, __nfds=2, __fds=) at /usr/include/bits/poll2.h:46
#2  mdtm_process_recv_events_tcp () at src/mds/mds_dt_trans.c:986
#3  0x7fbf83c910db in start_thread () from /lib64/libpthread.so.0
#4  0x7fbf839c1e3d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7fbf85309b00 (LWP 5122)):
#0  0x7fbf839b906d in poll () from /lib64/libc.so.6
#1  0x7fbf84601641 in poll (__timeout=4900, __nfds=1, __fds=0x7fbf85309260) 
at /usr/include/bits/poll2.h:46
#2  osaf_ppoll (io_fds=io_fds@entry=0x7fbf85309260, i_nfds=i_nfds@entry=1, 
i_timeout_ts=0x7fbf85309280, i_sigmask=i_sigmask@entry=0x0) at 
src/base/osaf_poll.c:108
#3  0x7fbf84608c2f in ncs_tmr_wait () at src/base/sysf_tmr.c:463
#4  0x7fbf83c910db in start_thread () from /lib64/libpthread.so.0
#5  0x7fbf839c1e3d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fbf82787700 (LWP 5121)):
#0  0x7fbf839b906d in poll () from /lib64/libc.so.6
#1  0x7fbf84601560 in poll (__timeout=-1, __nfds=1, __fds=0x7fbf82786e30) 
at /usr/include/bits/poll2.h:46
#2  osaf_poll_no_timeout (io_fds=0x7fbf82786e30, i_nfds=1) at 
src/base/osaf_poll.c:31
#3  0x7fbf846017e5 in osaf_poll (io_fds=io_fds@entry=0x7fbf82786e30, 
i_nfds=i_nfds@entry=1, i_timeout=i_timeout@entry=-1) at src/base/osaf_poll.c:44
#4  0x7fbf8460197c in auth_server_main (_fd=) at 
src/base/osaf_secutil.c:176
#5  0x7fbf83c910db in start_thread () from /lib64/libpthread.so.0
#6  0x7fbf839c1e3d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fbf85341740 (LWP 5120)):
#0  0x7fbf839b906d in poll () from /lib64/libc.so.6
#1  0x7fbf850cc3b8 in poll (__timeout=, __nfds=5, 
__fds=0x7ffdb1e02590) at /usr/include/bits/poll2.h:46
#2  main (argc=, argv=) at 
src/imm/immnd/immnd_main.c:358
(gdb)

Alex



On 04/26/2018 03:38 AM, Hans Nordeback wrote:

NOTICE: This email was received from an EXTERNAL sender



Hi Alex,

I tested this: immnd gets restarted and systemd reports opensafd.service as 
active (running), so it works as expected. In your case, is immnd never 
restarted after the nid phase, or does it work if you increase the sleep time? 
One thing you can try is to send an ABRT instead of the KILL and check in the 
core dump where, e.g. at which address, the signal was received. Perhaps you 
have found a "window" where immnd is not monitored?

/Regards HansN

On 04/25/2018 03:23 PM, Alex Jones wrote:

Hi Hans,

I understand. But, what if it doesn't fail in the nid phase?

If you run this command in your setup: "systemctl start opensafd; sleep 2; 
pkill -KILL immnd", does immnd get restarted? And does opensafd successfully 
come up according to systemd?

Alex

On 04/25/2018 09:19 AM, Hans Nordebäck wrote:



Hi Alex,

the reboot should only happen if REBOOT_ON_FAIL_TIMEOUT is set, (i.e. not 0).
I checked the latest version, the reboot works fine if e.g. immnd fails in the 
nid phase and REBOOT_ON_FAIL_TIMEOUT is set.

/Thanks HansN

From: Alex Jones [mailto:ajo...@rbbn.com]
Sent: 25 April 2018 15:05
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
<anders.wid...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: SV: [PATCH 1/1] nid: restart opensafd on failure when systemd 
enabled [#2839]


Hi Hans,



There must be a hole here, then. Because in our setup, if dtmd or immnd 
crashes early in the startup process, the node doesn't reboot, a

Re: [devel] [PATCH 1/1] clmd: Increase message priority of CLMSV_CLMS_MDS_NODE_EVT to be sent to main thread [#2842]

2018-05-03 Thread Hans Nordebäck
Ack, review only. /Thanks HansN

-Original Message-
From: Minh Chau [mailto:minh.c...@dektech.com.au] 
Sent: 26 April 2018 01:22
To: Anders Widell <anders.wid...@ericsson.com>; Hans Nordebäck 
<hans.nordeb...@ericsson.com>; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net; Minh Hon Chau 
<minh.c...@dektech.com.au>
Subject: [PATCH 1/1] clmd: Increase message priority of CLMSV_CLMS_MDS_NODE_EVT 
to be sent to main thread [#2842]

When the standby controller is stopped and started, stopping the node 
generates the MDS event CLMSV_CLMS_MDS_NODE_EVT. This event is sent to the 
main thread with NORMAL priority. When the node is started again, other 
events such as CLMSV_CLUSTER_JOIN_REQ are sent with HIGH priority.

A race occurs when CLMSV_CLMS_MDS_NODE_EVT is processed after 
CLMSV_CLUSTER_JOIN_REQ, possibly because of the priority difference.

The patch raises the priority of CLMSV_CLMS_MDS_NODE_EVT to match the others, 
so that the order in which the main thread processes messages depends on the 
order in which the events occurred.
---
 src/clm/clmd/clms_mds.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/clm/clmd/clms_mds.cc b/src/clm/clmd/clms_mds.cc
index a1f5348..58552cc 100644
--- a/src/clm/clmd/clms_mds.cc
+++ b/src/clm/clmd/clms_mds.cc
@@ -1097,7 +1097,7 @@ static uint32_t clms_mds_node_event(struct ncsmds_callback_info *mds_info) {
 clmsv_evt->info.node_mds_info.node_id = mds_info->info.node_evt.node_id;
 clmsv_evt->info.node_mds_info.nodeup = SA_TRUE;
 
-rc = m_NCS_IPC_SEND(_cb->mbx, clmsv_evt, NCS_IPC_PRIORITY_NORMAL);
+rc = m_NCS_IPC_SEND(_cb->mbx, clmsv_evt, 
+ NCS_IPC_PRIORITY_HIGH);
 if (rc != NCSCC_RC_SUCCESS) {
   TRACE("IPC send failed %d", rc);
   free(clmsv_evt);
--
2.7.4
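
The reordering described in the commit message can be reproduced with a small
model of a priority mailbox. The sketch below is not OpenSAF code: Mailbox,
Priority and the string payloads are hypothetical stand-ins for the IPC mailbox
and m_NCS_IPC_SEND, and it assumes only that higher-priority messages are
dequeued first and that messages of equal priority are delivered in FIFO order.

```cpp
#include <array>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a priority mailbox (not the OpenSAF implementation).
enum class Priority : std::size_t { kNormal = 0, kHigh = 1 };

class Mailbox {
 public:
  void Send(std::string msg, Priority prio) {
    queues_[static_cast<std::size_t>(prio)].push_back(std::move(msg));
  }

  // Returns false when the mailbox is empty. The HIGH queue is drained first.
  bool Receive(std::string* out) {
    for (std::size_t i = queues_.size(); i-- > 0;) {
      if (!queues_[i].empty()) {
        *out = std::move(queues_[i].front());
        queues_[i].pop_front();
        return true;
      }
    }
    return false;
  }

 private:
  std::array<std::deque<std::string>, 2> queues_;
};

// Sends NODE_EVT (node down) first and JOIN_REQ second, then returns the
// order in which a single receiver would process them.
std::vector<std::string> ProcessingOrder(Priority node_evt_priority) {
  Mailbox mbx;
  mbx.Send("NODE_EVT", node_evt_priority);
  mbx.Send("JOIN_REQ", Priority::kHigh);
  std::vector<std::string> order;
  std::string msg;
  while (mbx.Receive(&msg)) order.push_back(msg);
  return order;
}
```

With NODE_EVT sent at normal priority the receiver sees JOIN_REQ first even
though it was sent later; with both at high priority the send order is kept,
which is the behaviour the patch relies on.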




Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

2018-04-25 Thread Hans Nordebäck
Hi Alex,

the reboot should only happen if REBOOT_ON_FAIL_TIMEOUT is set, (i.e. not 0).
I checked the latest version, the reboot works fine if e.g. immnd fails in the 
nid phase and REBOOT_ON_FAIL_TIMEOUT is set.

/Thanks HansN

From: Alex Jones [mailto:ajo...@rbbn.com]
Sent: 25 April 2018 15:05
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
<anders.wid...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: SV: [PATCH 1/1] nid: restart opensafd on failure when systemd 
enabled [#2839]


Hi Hans,



There must be a hole here, then. Because in our setup, if dtmd or immnd 
crashes early in the startup process, the node doesn't reboot, and the 
executables are not restarted. If I set "Restart=on-failure" it works fine.



Can you test this in your setup to see if you see the same thing?



Alex

On 04/24/2018 05:04 AM, Hans Nordeback wrote:




Hi Alex,



please see comment below.



/Thanks HansN

On 04/23/2018 03:56 PM, Alex Jones wrote:

Hi Hans,



I just did some tests. Maybe there is a bug in nid, but when I do not have 
"Restart=on-failure", the node does not reboot when I run the command 
"systemctl start opensafd; sleep 3; pkill -KILL immnd", and opensafd times out 
and fails, with REBOOT_ON_FAIL_TIMEOUT=30.
[HansN] Isn't the nid phase finished before the sleep 3 command? 
REBOOT_ON_FAIL_TIMEOUT is only used during the nid phase. After the nid phase, 
opensaf enters "normal" operation and no reboot will be performed, as immnd is 
restartable. Instead of the sleep 3, you can edit the nodeinit.conf.controller 
file and change the immnd line to e.g. 
"/usr/local/lib/opensaf/clc-cli/osaf-immndx:IMMND ... "; then nid should fail 
to start and REBOOT_ON_FAIL_TIMEOUT should work.





But, opensafd restarts every time when I run that command with 
"Restart=on-failure" set.



Alex

On 04/19/2018 04:02 PM, Hans Nordebäck wrote:




Hi Alex,



A question: if opensafd fails (assert or non-zero exit code), a reboot of the 
node will be performed if REBOOT_ON_FAIL_TIMEOUT is configured. I have not 
checked, but how does systemd handle the reboot request if Restart=on-failure 
is set?



/BR HansN


From: Alex Jones <ajo...@rbbn.com>
Sent: 19 April 2018 17:27:27
To: Hans Nordebäck; Anders Widell
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones
Subject: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

Under certain circumstances opensafd fails to start (immnd or dtmd crashes,
etc).

Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]: 
src/dtm/dtmnd/dtm_intra_svc.cc:1778: dtm_process_internode_service_up_msg: 
Assertion '0' failed.

We can tell systemd to restart opensafd if it fails to start.
---
 src/nid/opensafd.service.in | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/nid/opensafd.service.in b/src/nid/opensafd.service.in
index 7f4d75ee3..6050f5e88 100644
--- a/src/nid/opensafd.service.in
+++ b/src/nid/opensafd.service.in
@@ -12,5 +12,7 @@ ControlGroup=cpu:/
 TimeoutStartSec=3hours
 KillMode=none
 @systemdtasksmax@
+Restart=on-failure
+
 [Install]
 WantedBy=multi-user.target
--
2.13.6





Re: [devel] [PATCH 1/1] amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]

2018-04-23 Thread Hans Nordebäck
Hi Alex,

Ack, code review and legacy tests run. I added some comments below.

/Thanks HansN

From: Alex Jones [mailto:ajo...@rbbn.com]
Sent: 23 April 2018 16:05
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Gary Lee 
<gary@dektech.com.au>; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] amfd: if rootCauseEntity is PLM entity don't engage 
lock/lock-in [#2835]


My comments inline:

Alex

On 04/20/2018 04:00 AM, Hans Nordeback wrote:




Hi Alex,

please see below for some comments/questions.

/Regards HansN

On 04/18/2018 03:41 PM, Alex Jones wrote:

When using PLM, an AMF node mapped to a CLM node mapped to a PLM EE can get
stuck in the locked state when rebooting or going through a PLM EE lock/unlock.

When amfd receives a START step from CLM tracking, it attempts to gracefully
shut down the AMF node using the AMF admin operations lock/lock-in. When PLM is
involved this doesn't always work correctly, because PLM is also shutting down
the node by calling "opensafd stop". There is a race condition between PLM
using "opensafd stop" and amfd using the admin operations to bring down the
node, so that sometimes the AMF node gets stuck in the locked state.

If the rootCauseEntity in the CLM tracking is a PLM entity then don't do
anything, as "opensafd stop" is already being called.

---
 src/amf/amfd/clm.cc | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index 2bcea2db0..7f675d8e9 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -274,6 +274,27 @@ static void clm_track_cb(
 TRACE_3("Already got callback for start of this change.");
 continue;
   }
+
+  if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
+  "safEE=",
+  sizeof("safEE=") - 1) == 0 ||
+  strncmp(osaf_extended_name_borrow(rootCauseEntity),
+  "safHE=",
+  sizeof("safHE=") - 1) == 0) {
+// PLM will take care of calling opensafd stop
+TRACE("rootCause: %s from PLM operation so skipping %u",
+  osaf_extended_name_borrow(rootCauseEntity),
+  notifItem->clusterNode.nodeId);
+
+SaAisErrorT rc(saClmResponse_4(avd_cb->clmHandle,
+   invocation,
+   SA_CLM_CALLBACK_RESPONSE_OK));
[HansN] Perhaps use "SaAisErrorT rc = saClmResponse_4(...)" or 
"SaAisErrorT rc{saClmResponse_4(...)}" instead?
[Alex] I'm not sure what you are asking here. Do you not like the function 
syntax? And what is '{'? I don't understand your second suggestion.
[HansN] '{' is used for uniform initialization in C++11 (preferred).





+if (rc != SA_AIS_OK)
+  LOG_ER("saClmResponse_4 failed: %i", rc);
+
[HansN] I think the amf operational state has to be checked and set to 
disabled? And should
break be used instead of continue?
[Alex] Setting operational state to disabled is taken care of when COMPLETED is 
received in the track callback. My code change is only when receiving START. I 
used "continue" to explicitly mean that we are done processing this node, and 
we need to move to the next node in the for loop. The same thing is done in 
legacy code above when checking for "clm_change_start_preceded."
[HansN] ok


+continue;
+  }
+
   /* invocation to be used by pending clm response */
   node->clm_pend_inv = invocation;
   clm_node_exit_start(node, notifItem->clusterChange);
@@ -304,7 +325,9 @@ static void clm_track_cb(
 osaf_extended_name_borrow(rootCauseEntity),
 notifItem->clusterNode.nodeId);
   if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
-  "safEE=", 6) == 0) {
+  "safEE=", 6) == 0 ||
+  strncmp(osaf_extended_name_borrow(rootCauseEntity),
+  "safHE=", 6) == 0) {
[HansN] sizeof("safHE=") as above
[Alex] Agreed. I will make this change. And change the older code to conform.


 /* This callback is because of operation on PLM, so we need to mark
the node absent, because PLCD will anyway call opensafd stop.*/
 AVD_AVND *node =
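
Two of the review points in this message, brace initialization and deriving the
prefix length from the string literal instead of writing 6, can be illustrated
in isolation. The sketch below is not taken from the patch: HasPrefix,
IsPlmEntity, FakeResponse and InitForms are hypothetical names, and plain int
stands in for SaAisErrorT.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of the patch: true if `name` starts with the
// string literal `prefix`. N counts the terminating '\0', so the number of
// characters compared is N - 1, the same value as sizeof("safEE=") - 1.
template <std::size_t N>
bool HasPrefix(const char* name, const char (&prefix)[N]) {
  return std::strncmp(name, prefix, N - 1) == 0;
}

// Mirrors the check in the patch: the root cause is a PLM entity (EE or HE).
bool IsPlmEntity(const char* root_cause) {
  return HasPrefix(root_cause, "safEE=") || HasPrefix(root_cause, "safHE=");
}

// Stand-in for an API call that returns a status code (hypothetical).
int FakeResponse() { return 1; }

int InitForms() {
  int a(FakeResponse());   // direct initialization, as written in the patch
  int b = FakeResponse();  // copy initialization
  int c{FakeResponse()};   // C++11 brace (uniform) initialization
  // Unlike the other two forms, braces reject narrowing conversions:
  // int d{2.5};  // would not compile
  return a + b + c;
}
```

Deducing the length from the literal keeps the prefix and its length in one
place, so changing the prefix cannot leave a stale magic number behind.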




Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

2018-04-19 Thread Hans Nordebäck
Hi Alex,


A question: if opensafd fails (assert or non-zero exit code), a reboot of the 
node will be performed if REBOOT_ON_FAIL_TIMEOUT is configured. I have not 
checked, but how does systemd handle the reboot request if Restart=on-failure 
is set?


/BR HansN


From: Alex Jones <ajo...@rbbn.com>
Sent: 19 April 2018 17:27:27
To: Hans Nordebäck; Anders Widell
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones
Subject: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

Under certain circumstances opensafd fails to start (immnd or dtmd crashes,
etc).

Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]: 
src/dtm/dtmnd/dtm_intra_svc.cc:1778: dtm_process_internode_service_up_msg: 
Assertion '0' failed.

We can tell systemd to restart opensafd if it fails to start.
---
 src/nid/opensafd.service.in | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/nid/opensafd.service.in b/src/nid/opensafd.service.in
index 7f4d75ee3..6050f5e88 100644
--- a/src/nid/opensafd.service.in
+++ b/src/nid/opensafd.service.in
@@ -12,5 +12,7 @@ ControlGroup=cpu:/
 TimeoutStartSec=3hours
 KillMode=none
 @systemdtasksmax@
+Restart=on-failure
+
 [Install]
 WantedBy=multi-user.target
--
2.13.6



Re: [devel] [PATCH 1/2] base: Add support to direct OpenSAF logging to local node file [#2306]

2018-04-18 Thread Hans Nordebäck
Hi Minh,


One part of this problem is in mbcsv_papi.h: the

#ifdef __cplusplus
extern "C" {

should at least come after the #include statements. Fixing this should remove 
the need for the extern "C++". But moving this part to a separate header file 
would also be good.

As a general comment (for all of OpenSAF), only extern "C" should be needed, 
and it should be carefully placed only where it is required, i.e. where C++ 
accesses a C function or variable with external linkage, and not in too wide a 
scope. It can be placed around a group of C functions but not around the 
complete header file.

/Thanks HansN
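
A minimal sketch of the header layout suggested above, assuming a hypothetical
header example_papi.h with a single C function example_checksum (neither exists
in OpenSAF). The point is only the ordering: includes first, then a narrow
extern "C" block around the C declarations.

```cpp
// example_papi.h (hypothetical header, shown inline so the sketch is
// self-contained)
#ifndef EXAMPLE_PAPI_H_
#define EXAMPLE_PAPI_H_

// Includes come first, outside any linkage block, so the headers they pull
// in are not forced to C linkage.
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

// Only the C functions get C linkage.
uint32_t example_checksum(const uint8_t* data, uint32_t len);

#ifdef __cplusplus
}  // extern "C"
#endif

#endif  // EXAMPLE_PAPI_H_

// example_papi.c would normally hold the definition; it is placed here, with
// matching linkage, only to keep the sketch in one translation unit.
extern "C" uint32_t example_checksum(const uint8_t* data, uint32_t len) {
  uint32_t sum = 0;
  for (uint32_t i = 0; i < len; ++i) sum += data[i];
  return sum;
}
```

A header laid out this way can be included unchanged from both C and C++
sources, so no extern "C++" is needed to undo an enclosing extern "C".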

____
From: Hans Nordebäck <hans.nordeb...@ericsson.com>
Sent: 18 April 2018 20:43:22
To: Minh Hon Chau; Anders Widell; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/2] base: Add support to direct OpenSAF logging to 
local node file [#2306]

Hi Minh,


Yes, before this patch logtrace.h was a C header file callable from both C and 
C++. Now it is a C/C++ header file, so including it from a C program without 
the extern "C++" will fail. In the first review comment I suggested moving 
this part to a separate header file and keeping logtrace.h as before.

/Regards HansN


From: Minh Hon Chau <minh.c...@dektech.com.au>
Sent: 18 April 2018 16:09:35
To: Hans Nordebäck; Anders Widell; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/2] base: Add support to direct OpenSAF logging to local node 
file [#2306]

Hi Hans,

One comment regarding extern C++ as below

Thanks,

Minh


On 18/04/18 23:37, Hans Nordebäck wrote:
> Hi Minh,
>
>   See my comments below.
>
> /Thanks HansN
>
> -Original Message-
> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> Sent: den 18 april 2018 15:20
> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
> <anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1/2] base: Add support to direct OpenSAF logging to local 
> node file [#2306]
>
> Hi Hans,
>
> Please check my response with [Minh]
>
> Thanks
>
> Minh
>
>
> On 18/04/18 22:40, Hans Nordeback wrote:
>> Hi Minh,
>>
>> ack, code review only. Some comments below.
>>
>> /Thanks HansN
>>
>>
>> On 04/12/2018 01:12 AM, Minh Chau wrote:
>>> Unify TraceLog and MdsLog class to one class (TraceLog) so it can be
>>> used as a common log client.
>>> Add new instance TraceLog for OpenSAF logging to local file, which
>>> can be enabled/disabled via environment variable OSAF_LOCAL_NODE_LOG
>>> ---
>>>src/base/logtrace.cc | 167
>>> ++-
>>>src/base/logtrace.h  |  50 +--
>>>src/mds/mds_log.cc   | 114 +++
>>>3 files changed, 140 insertions(+), 191 deletions(-)
>>>
>>> diff --git a/src/base/logtrace.cc b/src/base/logtrace.cc index
>>> b046fab..857e31c 100644
>>> --- a/src/base/logtrace.cc
>>> +++ b/src/base/logtrace.cc
>>> @@ -36,15 +36,10 @@
>>>#include 
>>>#include 
>>>#include 
>>> -#include "base/buffer.h"
>>> -#include "base/conf.h"
>>> -#include "base/log_message.h"
>>> -#include "base/macros.h"
>>> -#include "base/mutex.h"
>>> +#include "base/getenv.h"
>>>#include "base/ncsgl_defs.h"
>>>#include "base/osaf_utility.h"
>>>#include "base/time.h"
>>> -#include "base/unix_client_socket.h"
>>>#include "dtm/common/osaflog_protocol.h"
>>>  namespace global {
>>> @@ -55,65 +50,38 @@ const char *const prefix_name[] = {"EM", "AL",
>>> "CR", "ER", "WA", "NO", "IN",
>>>   "T6", "T7", "T8", ">>", "<<"};
>>>char *msg_id;
>>>int logmask;
>>> +const char* osaf_log_file = "osaf.log"; bool enable_osaf_log =
>>> +false;
>>>  }  // namespace global
>>>-class TraceLog {
>>> - public:
>>> -  static bool Init();
>>> -  static void Log(base::LogMessage::Severity severity, const char
>>> *fmt,
>>> -  va_list ap);
>>> -
>>> - private:
>>> -  TraceLog(const std::strin

Re: [devel] [PATCH 1/2] base: Add support to direct OpenSAF logging to local node file [#2306]

2018-04-18 Thread Hans Nordebäck
Hi Minh,


Yes, before this patch logtrace.h was a C header file callable from both C and 
C++. Now it is a C/C++ header file, so including it from a C program without 
the extern "C++" will fail. In the first review comment I suggested moving 
this part to a separate header file and keeping logtrace.h as before.

/Regards HansN


From: Minh Hon Chau <minh.c...@dektech.com.au>
Sent: 18 April 2018 16:09:35
To: Hans Nordebäck; Anders Widell; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/2] base: Add support to direct OpenSAF logging to local node 
file [#2306]

Hi Hans,

One comment regarding extern C++ as below

Thanks,

Minh


On 18/04/18 23:37, Hans Nordebäck wrote:
> Hi Minh,
>
>   See my comments below.
>
> /Thanks HansN
>
> -Original Message-
> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> Sent: 18 April 2018 15:20
> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
> <anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1/2] base: Add support to direct OpenSAF logging to local 
> node file [#2306]
>
> Hi Hans,
>
> Please check my response with [Minh]
>
> Thanks
>
> Minh
>
>
> On 18/04/18 22:40, Hans Nordeback wrote:
>> Hi Minh,
>>
>> ack, code review only. Some comments below.
>>
>> /Thanks HansN
>>
>>
>> On 04/12/2018 01:12 AM, Minh Chau wrote:
>>> Unify TraceLog and MdsLog class to one class (TraceLog) so it can be
>>> used as a common log client.
>>> Add new instance TraceLog for OpenSAF logging to local file, which
>>> can be enabled/disabled via environment variable OSAF_LOCAL_NODE_LOG
>>> ---
>>>src/base/logtrace.cc | 167
>>> ++-
>>>src/base/logtrace.h  |  50 +--
>>>src/mds/mds_log.cc   | 114 +++
>>>3 files changed, 140 insertions(+), 191 deletions(-)
>>>
>>> diff --git a/src/base/logtrace.cc b/src/base/logtrace.cc index
>>> b046fab..857e31c 100644
>>> --- a/src/base/logtrace.cc
>>> +++ b/src/base/logtrace.cc
>>> @@ -36,15 +36,10 @@
>>>#include 
>>>#include 
>>>#include 
>>> -#include "base/buffer.h"
>>> -#include "base/conf.h"
>>> -#include "base/log_message.h"
>>> -#include "base/macros.h"
>>> -#include "base/mutex.h"
>>> +#include "base/getenv.h"
>>>#include "base/ncsgl_defs.h"
>>>#include "base/osaf_utility.h"
>>>#include "base/time.h"
>>> -#include "base/unix_client_socket.h"
>>>#include "dtm/common/osaflog_protocol.h"
>>>  namespace global {
>>> @@ -55,65 +50,38 @@ const char *const prefix_name[] = {"EM", "AL",
>>> "CR", "ER", "WA", "NO", "IN",
>>>   "T6", "T7", "T8", ">>", "<<"};
>>>char *msg_id;
>>>int logmask;
>>> +const char* osaf_log_file = "osaf.log"; bool enable_osaf_log =
>>> +false;
>>>  }  // namespace global
>>>-class TraceLog {
>>> - public:
>>> -  static bool Init();
>>> -  static void Log(base::LogMessage::Severity severity, const char
>>> *fmt,
>>> -  va_list ap);
>>> -
>>> - private:
>>> -  TraceLog(const std::string &fqdn, const std::string &app_name,
>>> -   uint32_t proc_id, const std::string &msg_id,
>>> -   const std::string &socket_name);
>>> -  void LogInternal(base::LogMessage::Severity severity, const char
>>> *fmt,
>>> -   va_list ap);
>>> -  static constexpr const uint32_t kMaxSequenceId =
>>> uint32_t{0x7fff};
>>> -  static TraceLog *instance_;
>>> -  const base::LogMessage::HostName fqdn_;
>>> -  const base::LogMessage::AppName app_name_;
>>> -  const base::LogMessage::ProcId proc_id_;
>>> -  const base::LogMessage::MsgId msg_id_;
>>> -  uint32_t sequence_id_;
>>> -  base::UnixClientSocket log_socket_;
>>> -  base::Buffer<512> buffer_;
>>> -  base::Mutex mutex_;
>>> -
>>> -  DELETE_COPY_AND_MOVE_OPERATORS(TraceLog);
>>> -};
>>> -
>>> -TraceLog *

Re: [devel] [PATCH 1/2] base: Add support to direct OpenSAF logging to local node file [#2306]

2018-04-18 Thread Hans Nordebäck
Hi Minh,

 See my comments below.

/Thanks HansN

-Original Message-
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] 
Sent: 18 April 2018 15:20
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
<anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/2] base: Add support to direct OpenSAF logging to local 
node file [#2306]

Hi Hans,

Please check my response with [Minh]

Thanks

Minh


On 18/04/18 22:40, Hans Nordeback wrote:
> Hi Minh,
>
> ack, code review only. Some comments below.
>
> /Thanks HansN
>
>
> On 04/12/2018 01:12 AM, Minh Chau wrote:
>> Unify TraceLog and MdsLog class to one class (TraceLog) so it can be 
>> used as a common log client.
>> Add new instance TraceLog for OpenSAF logging to local file, which 
>> can be enabled/disabled via environment variable OSAF_LOCAL_NODE_LOG
>> ---
>>   src/base/logtrace.cc | 167
>> ++-
>>   src/base/logtrace.h  |  50 +--
>>   src/mds/mds_log.cc   | 114 +++
>>   3 files changed, 140 insertions(+), 191 deletions(-)
>>
>> diff --git a/src/base/logtrace.cc b/src/base/logtrace.cc index 
>> b046fab..857e31c 100644
>> --- a/src/base/logtrace.cc
>> +++ b/src/base/logtrace.cc
>> @@ -36,15 +36,10 @@
>>   #include 
>>   #include 
>>   #include 
>> -#include "base/buffer.h"
>> -#include "base/conf.h"
>> -#include "base/log_message.h"
>> -#include "base/macros.h"
>> -#include "base/mutex.h"
>> +#include "base/getenv.h"
>>   #include "base/ncsgl_defs.h"
>>   #include "base/osaf_utility.h"
>>   #include "base/time.h"
>> -#include "base/unix_client_socket.h"
>>   #include "dtm/common/osaflog_protocol.h"
>>     namespace global {
>> @@ -55,65 +50,38 @@ const char *const prefix_name[] = {"EM", "AL", 
>> "CR", "ER", "WA", "NO", "IN",
>>  "T6", "T7", "T8", ">>", "<<"};
>>   char *msg_id;
>>   int logmask;
>> +const char* osaf_log_file = "osaf.log"; bool enable_osaf_log = 
>> +false;
>>     }  // namespace global
>>   -class TraceLog {
>> - public:
>> -  static bool Init();
>> -  static void Log(base::LogMessage::Severity severity, const char 
>> *fmt,
>> -  va_list ap);
>> -
>> - private:
>> -  TraceLog(const std::string &fqdn, const std::string &app_name,
>> -   uint32_t proc_id, const std::string &msg_id,
>> -   const std::string &socket_name);
>> -  void LogInternal(base::LogMessage::Severity severity, const char 
>> *fmt,
>> -   va_list ap);
>> -  static constexpr const uint32_t kMaxSequenceId = 
>> uint32_t{0x7fff};
>> -  static TraceLog *instance_;
>> -  const base::LogMessage::HostName fqdn_;
>> -  const base::LogMessage::AppName app_name_;
>> -  const base::LogMessage::ProcId proc_id_;
>> -  const base::LogMessage::MsgId msg_id_;
>> -  uint32_t sequence_id_;
>> -  base::UnixClientSocket log_socket_;
>> -  base::Buffer<512> buffer_;
>> -  base::Mutex mutex_;
>> -
>> -  DELETE_COPY_AND_MOVE_OPERATORS(TraceLog);
>> -};
>> -
>> -TraceLog *TraceLog::instance_ = nullptr;
>> -
>> -TraceLog::TraceLog(const std::string &fqdn, const std::string &app_name,
>> -   uint32_t proc_id, const std::string &msg_id,
>> -   const std::string &socket_name)
>> -    : fqdn_{base::LogMessage::HostName{fqdn}},
>> -  app_name_{base::LogMessage::AppName{app_name}},
>> - proc_id_{base::LogMessage::ProcId{std::to_string(proc_id)}},
>> -  msg_id_{base::LogMessage::MsgId{msg_id}},
>> -  sequence_id_{1},
>> -  log_socket_{socket_name, base::UnixSocket::kBlocking},
>> -  buffer_{},
>> -  mutex_{} {}
>> -
>> -bool TraceLog::Init() {
>> -  if (instance_ != nullptr) return false;
>> -  char app_name[49];
>> -  char pid_path[1024];
> [HansN] Instead of static globals, use unnamed namespaces. Also 
> try to avoid globals; why change Log and Init from static members? (An 
> alternative is to use a singleton, if needed.)
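
As an illustration of the two suggestions in this comment, here is a minimal
sketch of an unnamed namespace for file-local names combined with a
function-local static singleton. Logger and kFirstSequenceId are hypothetical
and only loosely modelled on the TraceLog class in the patch.

```cpp
#include <cstdint>

namespace {

// File-local constant: the unnamed namespace gives internal linkage, like a
// `static` global, and unlike `static` it also works for types.
constexpr std::uint32_t kFirstSequenceId = 1;

}  // namespace

// Hypothetical logger (not the OpenSAF TraceLog class).
class Logger {
 public:
  // Function-local static: constructed on first use; the initialization is
  // thread-safe since C++11.
  static Logger& Instance() {
    static Logger instance;
    return instance;
  }

  // Returns the sequence id assigned to this log call.
  std::uint32_t Log() { return sequence_id_++; }

  Logger(const Logger&) = delete;
  Logger& operator=(const Logger&) = delete;

 private:
  Logger() = default;
  std::uint32_t sequence_id_ = kFirstSequenceId;
};
```

State that would otherwise be a global lives in the single instance, and
nothing declared in the unnamed namespace is visible to other translation
units.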
[Minh] Changing Log() and Init() not to be static because I am not using 
singleton any more. Before this ticket, we have 2 singleton, one for TraceLog 
and one for MdsLog, 

Re: [devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]

2018-04-18 Thread Hans Nordebäck
Hi Gary,
In general, const member functions and logical constness are to be preferred, 
I think. (If needed, mutable can be used.) 
/Regards HansN
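
A minimal sketch of logical constness as discussed here, assuming a
hypothetical KeyValueStore in place of the real KV store backend and bool in
place of SaAisErrorT. The member function is const because the Consensus
object itself does not change; the write goes to the external store, and
mutable covers internal bookkeeping.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical external key-value store; the real backend API is not shown
// in this thread.
class KeyValueStore {
 public:
  void Set(const std::string& key, const std::string& value) {
    data_[key] = value;
  }
  std::string Get(const std::string& key) const {
    auto it = data_.find(key);
    return it == data_.end() ? std::string{} : it->second;
  }

 private:
  std::map<std::string, std::string> data_;
};

class Consensus {
 public:
  explicit Consensus(KeyValueStore* store) : store_{store} {}

  // Logically const: the observable state of this object is unchanged.
  // Parameters are passed by value without a top-level const.
  bool PromoteThisNode(bool graceful_takeover,
                       std::uint64_t cluster_size) const {
    std::lock_guard<std::mutex> lock{mutex_};
    ++calls_;
    store_->Set("active", graceful_takeover ? "graceful" : "forced");
    store_->Set("cluster_size", std::to_string(cluster_size));
    return true;
  }

  std::uint64_t calls() const {
    std::lock_guard<std::mutex> lock{mutex_};
    return calls_;
  }

 private:
  KeyValueStore* const store_;      // pointee may change via a const method
  mutable std::mutex mutex_;        // locking is not a logical state change
  mutable std::uint64_t calls_{0};  // bookkeeping only
};
```

The trade-off raised in the thread still applies: a const signature tells
callers the object is unchanged, not that the call is free of side effects.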

-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au] 
Sent: 13 April 2018 09:45
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
<anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0/5] Review Request for split-brain: select active SC from 
largest network partition V3 [#2795]

Hi Hans

Yes, they could be declared const member functions, as they generally don't 
change anything in the object. The changes are actually in the KV store.

But I guess we could potentially mislead callers about the intentions of the 
functions though.

What do you think?

/Gary

On 13/04/18 16:16, Hans Nordebäck wrote:
> Hi,
>
>
>
> On 04/12/2018 04:15 PM, Gary Lee wrote:
>> Hi
>>
>>
>> On 12/04/18 23:34, Anders Widell wrote:
>>> Ack with comments:
>>>
>>> * There is no need to use "const" when passing function arguments by 
>>> value. E.g. the argument "const uint64_t cluster_size" should be 
>>> "uint64_t cluster_size".
>>>
>>
>> [GL] Sure, but it doesn't do any harm, and would stop accidental 
>> assignments (that would be lost anyway).
> [HansN] perhaps these functions should be const member functions? E.g. 
> SaAisErrorT PromoteThisNode(bool graceful_takeover, uint64_t
> cluster_size) const;
>>
>>> * You assume that all nodes in the cluster have synchronized clocks 
>>> (probably using NTP). Would it be possible to use an expiration time 
>>> for the etcd key instead of writing a time stamp in the value, so 
>>> that etcd automatically deletes the takeover request when it 
>>> expires? That way we would not require synchronized clocks.
>>>
>>
>> [GL] Good idea. I did question why I hadn't used TTL/lease once I had 
>> finished the ticket. :-) Will see what I can do!
>>
>>> regards,
>>> Anders Widell
>>>
>>> On 04/11/2018 09:35 AM, Gary Lee wrote:
>>>> Summary: split-brain: select active SC from largest network partition V3 [#2795]
>>>> Review request for Ticket(s): 2795
>>>> Peer Reviewer(s): Anders, Ravi, Hans
>>>> Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
>>>> Affected branch(es): develop
>>>> Development branch: ticket-2795
>>>> Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d
>>>> Personal repository: git://git.code.sf.net/u/userid-2226215/review
>>>>
>>>> 
>>>> Impacted area   Impact y/n
>>>> 
>>>>   Docs    n
>>>>   Build system    n
>>>>   RPM/packaging   n
>>>>   Configuration files n
>>>>   Startup scripts n
>>>>   SAF services    n
>>>>   OpenSAF services    y
>>>>   Core libraries  y
>>>>   Samples n
>>>>   Tests   n
>>>>   Other   n
>>>>
>>>>
>>>> Comments (indicate scope for each "y" above):
>>>> -
>>>>
>>>> *** Changes from V2: ***
>>>>
>>>> fmd: made cluster_size atomic
>>>> fmd: wait 3 seconds before promoting to active, to allow topology 
>>>> events to be processed first
>>>> osaf: add check for existing takeover request, before trying to 
>>>> lock
>>>> etcdv3 plugin: reliability improvements
>>>>
>>>>
>>>> revision c7bc78656d5de11f6147727bd8612274fb6e438f
>>>> Author:    Gary Lee <gary@dektech.com.au>
>>>> Date:    Wed, 11 Apr 2018 17:16:46 +1000
>>>>
>>>> rded: adapt to new Consensus API [#2795]
>>>>
>>>> - add 3 new internal message:
>>>>
>>>> RDE_MSG_NODE_UP
>>>> RDE_MSG_NODE_DOWN
>>>> RDE_MSG_TAKEOVER_REQUEST_CALLBACK
>>>>
>>>> - subscribe to AMFND service up events to keep track of the number
>>>>    of cluster members
>>>>
>>>> - listen for takeover requests in KV store
>>>>
>>>>
>>>>
>>>> revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393
>>>> Author:    Gary Lee <gary@dektech.com.au>
>>>> Date:    Wed

Re: [devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

2018-04-13 Thread Hans Nordebäck

Hi Ravi,


stonith is not only valid for virtualized environments; I assume stonith 
also supports other mechanisms, e.g. IPMI, in a legacy environment. The 
probability of "flickering" may be higher in a virtualized environment, 
but for redundancy there should be two interfaces configured, which is 
the normal configuration in legacy. If the problem in this ticket is 
solved by using stonith I don't see a need for adding this patch.


BTW, does this patch work when stonith is enabled?

/Regards HansN


On 04/13/2018 10:59 AM, Ravi Sekhar Reddy Konda wrote:


Hi Hans,

The use case that we are addressing here is link flickering when 
remote fencing is not enabled. Also, remote fencing using stonith is 
valid only in virtualized environments. I have not tested with 
stonith enabled, as the use case is the one where remote fencing is 
disabled.


Thanks,

Ravi

*From:*Hans Nordebäck [mailto:hans.nordeb...@ericsson.com]
*Sent:* Friday, April 13, 2018 1:10 AM
*To:* ravi-sekhar <ravisekhar.ko...@oracle.com>; Anders Widell 
<anders.wid...@ericsson.com>

*Cc:* opensaf-devel@lists.sourceforge.net
*Subject:* Re: [PATCH 1/1] osaf: Isolate the node in the 
opensaf_reboot [#2833]


Hi Ravi,

I think stonith, implemented in ticket #1859, handles this case. This 
"flickering" was one of the (manual) tests verifying the added stonith 
support.


It is important to have a separate interface for stonith, to be able 
to perform the remote fencing, similar to using a backplane.


Have you tested with stonith enabled?

/Regards HansN



*From:* ravi-sekhar <ravisekhar.ko...@oracle.com 
<mailto:ravisekhar.ko...@oracle.com>>

*Sent:* 12 April 2018 15:29:13
*To:* Hans Nordebäck; Anders Widell
*Cc:* opensaf-devel@lists.sourceforge.net 
<mailto:opensaf-devel@lists.sourceforge.net>; ravi-sekhar

*Subject:* [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

---
 scripts/opensaf_reboot | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index df65c26..b219c39 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
 if [ -f "$pkgsysconfdir/fmd.conf" ]; then
   . "$pkgsysconfdir/fmd.conf"
 fi
+if [ -f "$pkgsysconfdir/nid.conf" ]; then
+  . "$pkgsysconfdir/nid.conf"
+fi

 NODE_ID_FILE=$pkglocalstatedir/node_id

@@ -118,7 +121,17 @@ else
 # uncomment the following line if debugging errors 
that keep restarting the node

 # exit 0

+    # If the application is using different interface for 
cluster communication, please

+    # add your application specific isolation commands here
+
 logger -t "opensaf_reboot" "Rebooting local node; 
timeout=$OPENSAF_REBOOT_TIMEOUT"

+
+    # Isolate the node
+    if [ "$MDS_TRANSPORT" = "TIPC" ]; then
+   tipc-config -bd eth:$TIPC_ETH_IF
+    else
+   $icmd pkill -STOP osafdtmd
+    fi

 # Start a reboot supervision background process. Note 
that a similar
 # supervision is also done in the opensaf_reboot() 
function in LEAP.

@@ -128,12 +141,6 @@ else
 (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" 
> "/proc/sysrq-trigger") &

 fi

-   # Stop some important opensaf processes to prevent bad 
things from happening

-   $icmd pkill -STOP osafamfwd
-   $icmd pkill -STOP osafamfnd
-   $icmd pkill -STOP osafamfd
-   $icmd pkill -STOP osaffmd
-
 # Flush OpenSAF internal log server messages to disk.
 $bindir/osaflog --flush

--
1.9.1



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]

2018-04-13 Thread Hans Nordebäck

Hi,



On 04/12/2018 04:15 PM, Gary Lee wrote:

Hi


On 12/04/18 23:34, Anders Widell wrote:

Ack with comments:

* There is no need to use "const" when passing function arguments by 
value. E.g. the argument "const uint64_t cluster_size" should be 
"uint64_t cluster_size".




[GL] Sure, but it doesn't do any harm, and would stop accidental 
assignments (that would be lost anyway).
[HansN] perhaps these functions should be const member functions? E.g. 
SaAisErrorT PromoteThisNode(bool graceful_takeover, uint64_t 
cluster_size) const;


* You assume that all nodes in the cluster have synchronized clocks 
(probably using NTP). Would it be possible to use an expiration time 
for the etcd key instead of writing a time stamp in the value, so 
that etcd automatically deletes the takeover request when it expires? 
That way we would not require synchronized clocks.




[GL] Good idea. I did question why I hadn't used TTL/lease once I had 
finished the ticket. :-) Will see what I can do!
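
The expiration idea above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: `ExpiringStore` is a hypothetical class, not the etcd API and not the OpenSAF plugin. The point it shows is that expiry is decided on the store's own monotonic clock, so the writing and reading nodes never compare wall-clock time stamps with each other.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical in-memory store; it stands in for the server side of a
// key-value service that supports per-key expiry.
class ExpiringStore {
 public:
  // Stores `value` under `key`; the entry counts as gone once `ttl` has
  // elapsed on the store's own clock. `now` is passed in to keep the
  // sketch deterministic.
  void Put(const std::string& key, const std::string& value,
           std::chrono::seconds ttl, Clock::time_point now) {
    entries_[key] = Entry{value, now + ttl};
  }

  // Returns true and fills `value` if the key exists and has not expired.
  bool Get(const std::string& key, Clock::time_point now,
           std::string* value) {
    auto it = entries_.find(key);
    if (it == entries_.end()) return false;
    if (now >= it->second.expires_at) {
      entries_.erase(it);  // expired: delete lazily on access
      return false;
    }
    *value = it->second.value;
    return true;
  }

 private:
  struct Entry {
    std::string value;
    Clock::time_point expires_at;
  };
  std::map<std::string, Entry> entries_;
};
```

With a 10 second TTL, a read 5 seconds after the write sees the takeover request and a read at 10 seconds or later does not, regardless of what the clients' own clocks say.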



regards,
Anders Widell

On 04/11/2018 09:35 AM, Gary Lee wrote:
Summary: split-brain: select active SC from largest network 
partition V3 [#2795]

Review request for Ticket(s): 2795
Peer Reviewer(s): Anders, Ravi, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

  Docs    n
  Build system    n
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF services    n
  OpenSAF services    y
  Core libraries  y
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

*** Changes from V2: ***

fmd: made cluster_size atomic
fmd: wait 3 seconds before promoting to active, to allow topology 
events to be processed first

osaf: add check for existing takeover request, before trying to lock
etcdv3 plugin: reliability improvements


revision c7bc78656d5de11f6147727bd8612274fb6e438f
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:16:46 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
   of cluster members

- listen for takeover requests in KV store



revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:16:18 +1000

fmd: adapt to new Consensus API [#2795]



revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:15:41 +1000

amfd: adapt to new Consensus API [#2795]



revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:14:39 +1000

osaf: add lock takeover request function [#2795]

- add create and set (if previous value matches) functions to 
KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to 
answer takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted 
to active, it will
   create a takeover request in the KV store. An existing SC can 
reject the lock takeover




revision 955be872ba5887b1b521eac9f7732dd3f6afc593
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:13:45 +1000

osaf: extend API to include a create key and an enhanced set key 
function [#2795]


- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:

  src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
--
  src/amf/amfd/role.cc |   2 +-
  src/fm/fmd/fm_cb.h   |   2 +-
  src/fm/fmd/fm_main.cc    |  26 +-
  src/fm/fmd/fm_mds.cc |   2 +
  src/fm/fmd/fm_rda.cc |  27 +-
  src/osaf/consensus/consensus.cc  | 435 
++-

  src/osaf/consensus/consensus.h   |  55 +++-
  src/osaf/consensus/key_value.cc  | 105 +---
  src/osaf/consensus/key_value.h   |  19 +-
  src/osaf/consensus/plugins/etcd.plugin   |  86 +-
  src/osaf/consensus/plugins/etcd3.plugin  | 366 
++

  src/osaf/consensus/plugins/sample.plugin |  67 -
  src/rde/rded/rde_cb.h    |  12 +-
  src/rde/rded/rde_main.cc |  75 --
  src/rde/rded/rde_mds.cc  |  39 ++-
  src/rde/rded/rde_rda.cc  |   2 +-
  src/rde/rded/role.cc   

Re: [devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

2018-04-12 Thread Hans Nordebäck
Hi Ravi,


I think stonith, implemented in ticket #1859, handles this case. This 
"flickering" was one of the (manual) tests verifying the added stonith support.

It is important to have a separate interface for stonith, to be able to perform 
the remote fencing, similar to using a backplane.

Have you tested with stonith enabled?


/Regards HansN


From: ravi-sekhar <ravisekhar.ko...@oracle.com>
Sent: 12 April 2018 15:29:13
To: Hans Nordebäck; Anders Widell
Cc: opensaf-devel@lists.sourceforge.net; ravi-sekhar
Subject: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

---
 scripts/opensaf_reboot | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index df65c26..b219c39 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
 if [ -f "$pkgsysconfdir/fmd.conf" ]; then
   . "$pkgsysconfdir/fmd.conf"
 fi
+if [ -f "$pkgsysconfdir/nid.conf" ]; then
+  . "$pkgsysconfdir/nid.conf"
+fi

 NODE_ID_FILE=$pkglocalstatedir/node_id

@@ -118,7 +121,17 @@ else
 # uncomment the following line if debugging errors that keep 
restarting the node
 # exit 0

+# If the application is using different interface for cluster 
communication, please
+# add your application specific isolation commands here
+
 logger -t "opensaf_reboot" "Rebooting local node; 
timeout=$OPENSAF_REBOOT_TIMEOUT"
+
+# Isolate the node
+if [ "$MDS_TRANSPORT" = "TIPC" ]; then
+   tipc-config -bd eth:$TIPC_ETH_IF
+else
+   $icmd pkill -STOP osafdtmd
+fi

 # Start a reboot supervision background process. Note that a 
similar
 # supervision is also done in the opensaf_reboot() function in 
LEAP.
@@ -128,12 +141,6 @@ else
 (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > 
"/proc/sysrq-trigger") &
 fi

-   # Stop some important opensaf processes to prevent bad things 
from happening
-   $icmd pkill -STOP osafamfwd
-   $icmd pkill -STOP osafamfnd
-   $icmd pkill -STOP osafamfd
-   $icmd pkill -STOP osaffmd
-
 # Flush OpenSAF internal log server messages to disk.
 $bindir/osaflog --flush

--
1.9.1



Re: [devel] [PATCH 1/1] imm: fix memory leaked in immnd [#2825]

2018-04-11 Thread Hans Nordebäck

ack, review only.

/Thanks HansN


On 04/05/2018 04:39 AM, Vu Minh Nguyen wrote:

The allocated memory is not freed before returning from the function
ImmModel::setCcbErrorString().
---
  src/imm/immnd/ImmModel.cc | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc
index f7c8fc0..87ded27 100644
--- a/src/imm/immnd/ImmModel.cc
+++ b/src/imm/immnd/ImmModel.cc
@@ -10910,7 +10910,6 @@ SaAisErrorT ImmModel::deleteObject(ObjectMap::iterator& 
oi, SaUint32T reqConn,
  void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString,
   va_list vl) {
int errLen = strlen(errorString) + 1;
-  char* fmtError = (char*)malloc(errLen);
int len;
va_list args;
int isValidationErrString = 0;
@@ -10921,6 +10920,9 @@ void ImmModel::setCcbErrorString(CcbInfo* ccb, const 
char* errorString,
  return;
}
  
+  char* fmtError = (char*)malloc(errLen);

+  osafassert(fmtError);
+
va_copy(args, vl);
len = vsnprintf(fmtError, errLen, errorString, args);
va_end(args);
@@ -10930,7 +10932,8 @@ void ImmModel::setCcbErrorString(CcbInfo* ccb, const 
char* errorString,
if (len > errLen) {
  char* newFmtError = (char*)realloc(fmtError, len);
  if (newFmtError == nullptr) {
-  TRACE_5("realloc error ,No memory ");
+  TRACE_5("realloc error, no memory");
+  free(fmtError);
return;
  } else {
fmtError = newFmtError;





Re: [devel] [PATCH 1/1] build: Add support for google gmock framework V2 [#2823]

2018-04-09 Thread Hans Nordebäck

Hi Anders,

I'll remove the turtle example. Perhaps we also can add a git pull if 
the repo is older than some date?


/Thanks HansN


On 04/09/2018 10:09 AM, Anders Widell wrote:

Ack with comments:

1) You must replace the turtle example with something that you have 
written yourself, to avoid potential license problems.


2) One inline comment below, marked AndersW>

regards,

Anders Widell


On 04/03/2018 01:05 PM, Hans Nordeback wrote:

---
  00-README.unittest  | 24 ++--
  src/ais/Makefile.am |  5 -
  src/amf/Makefile.am |  7 +--
  src/base/Makefile.am    | 21 +
  src/base/tests/mock_turtle.cc   | 20 
  src/base/tests/mock_turtle.h    | 18 ++
  src/base/tests/turtle.h | 17 +
  src/dtm/Makefile.am |  5 -
  src/experimental/immcpp/api/Makefile.am |  5 -
  src/log/Makefile.am |  5 -
  test.sh | 12 +---
  11 files changed, 116 insertions(+), 23 deletions(-)
  create mode 100644 src/base/tests/mock_turtle.cc
  create mode 100644 src/base/tests/mock_turtle.h
  create mode 100644 src/base/tests/turtle.h

diff --git a/00-README.unittest b/00-README.unittest
index 79e4b4b41..f297bccd0 100644
--- a/00-README.unittest
+++ b/00-README.unittest
@@ -1,22 +1,25 @@
-Support for using google unit test in openSAF. Using unit test 
during e.g. refactoring
+Support for using google unit test and google mock in openSAF. Using 
unit test and mocking during e.g. refactoring
  to identify units and make code unit testable should improve the 
overall code quality and robustness.

  Regarding google unit test, see:
  https://code.google.com/p/googletest/
    To get and install google test do the following:
  -wget https://googletest.googlecode.com/files/gtest-1.7.0.zip
-unzip gtest-1.7.0.zip
-cd gtest-1.7.0
-./configure
-make
-export GTEST_DIR=`pwd`
+git clone https://github.com/google/googletest.git
+cd googletest
+
+autoreconf -vi
+./configure --with-pthreads
+make -j 4
+
+export GTEST_DIR=`pwd`/googletest
+export GMOCK_DIR=`pwd`/googlemock
    configure openSAF as usual, for example:
  ./bootstrap.ch
  ./configure CFLAGS="-DRUNASROOT -O2" CXXFLAGS="-DRUNASROOT -O2" 
--enable-tipc

  -make -j
+make -j 4
    To build and run the unit tests
  make check
@@ -40,8 +43,9 @@ services/saf/amf/
  └── config
    The test code to have the following naming convention as below:
-tests will be in file test_.cc, where  is the name of the 
unit test case,
-e.g test_amfdb.cc. No need to call the RUN_ALL_TESTS() macro, it is 
included in gtest_main
+tests will be in file _test.cc, where  is the name of the 
unit test case,

+mocks will be in file mock_.cc, where  is the name of the mock.
+No need to call the RUN_ALL_TESTS() macro, it is included in 
gtest_main and gmock_main

  and are automatically linked with the unit test cases.
    diff --git a/src/ais/Makefile.am b/src/ais/Makefile.am
index 1af75a0f4..2ef34b219 100644
--- a/src/ais/Makefile.am
+++ b/src/ais/Makefile.am
@@ -101,7 +101,8 @@ bin_testlib_CXXFLAGS = \
    bin_testlib_CPPFLAGS = \
  $(AM_CPPFLAGS) \
-    -I$(GTEST_DIR)/include
+    -I$(GTEST_DIR)/include \
+    -I$(GMOCK_DIR)/include
    bin_testlib_LDFLAGS = \
  $(AM_LDFLAGS)
@@ -112,4 +113,6 @@ bin_testlib_SOURCES = \
  bin_testlib_LDADD = \
  $(GTEST_DIR)/lib/libgtest.la \
  $(GTEST_DIR)/lib/libgtest_main.la \
+    $(GMOCK_DIR)/lib/libgmock.la \
+    $(GMOCK_DIR)/lib/libgmock_main.la \
  lib/libopensaf_core.la
diff --git a/src/amf/Makefile.am b/src/amf/Makefile.am
index 25261fded..413571a52 100644
--- a/src/amf/Makefile.am
+++ b/src/amf/Makefile.am
@@ -194,7 +194,8 @@ bin_testamfd_CXXFLAGS =$(AM_CXXFLAGS)
  bin_testamfd_CPPFLAGS = \
  -DSA_CLM_B01=1 -DSA_EXTENDED_NAME_SOURCE \
  $(AM_CPPFLAGS) \
-    -I$(GTEST_DIR)/include
+    -I$(GTEST_DIR)/include \
+    -I$(GMOCK_DIR)/include
    bin_testamfd_LDFLAGS = \
  $(AM_LDFLAGS) \
@@ -264,7 +265,9 @@ bin_testamfd_LDADD = \
  lib/libSaNtf.la \
  lib/libopensaf_core.la \
  $(GTEST_DIR)/lib/libgtest.la \
-    $(GTEST_DIR)/lib/libgtest_main.la
+    $(GTEST_DIR)/lib/libgtest_main.la \
+    $(GMOCK_DIR)/lib/libgmock.la \
+    $(GMOCK_DIR)/lib/libgmock_main.la
    bin_amfpm_CPPFLAGS = \
  -DSA_EXTENDED_NAME_SOURCE \
diff --git a/src/base/Makefile.am b/src/base/Makefile.am
index bb13d6c43..a7316ceb7 100644
--- a/src/base/Makefile.am
+++ b/src/base/Makefile.am
@@ -150,6 +150,8 @@ noinst_HEADERS += \
  src/base/tests/mock_osaf_abort.h \
  src/base/tests/mock_osafassert.h \
  src/base/tests/mock_syslog.h \
+    src/base/tests/mock_turtle.h \
+    src/base/tests/turtle.h \
  src/base/time.h \
  src/base/unix_client_socket.h \
  src/base/unix_server_socket.h \
@@ -163,7 +165,8 @@ 

Re: [devel] [PATCH 1/1] base: Check return code from unlink in nid_create_ipc [#2829]

2018-04-09 Thread Hans Nordebäck

Hi Anders,

yes, I'll change to your suggestion.

/Thanks HansN


On 04/06/2018 04:10 PM, Anders Widell wrote:
Ack with minor comment: instead of calling access(), you could maybe 
simply check for the ENOENT errno value from unlink()?
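
The suggestion above can be sketched like this. It is a minimal illustration, not the patch that was eventually pushed: `RemoveIfExists` is a hypothetical helper name, and the message format is an assumption. Checking `errno` after `unlink()` also avoids the window between a separate `access()` check and the delete.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <unistd.h>

// Removes `path` if it exists. A missing file is not treated as an error.
// Returns true on success; on failure writes a message into `errbuf`.
bool RemoveIfExists(const char* path, char* errbuf, std::size_t errbuf_size) {
  if (unlink(path) == 0 || errno == ENOENT) return true;
  std::snprintf(errbuf, errbuf_size, "unable to delete %s: %s", path,
                std::strerror(errno));
  return false;
}
```

Calling the helper on a path that does not exist returns true, exactly as it does after successfully deleting an existing file; only other errors (for example `EACCES`) are reported.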


regards,

Anders Widell


On 04/05/2018 11:53 AM, Hans Nordeback wrote:

---
  src/nid/agent/nid_ipc.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/nid/agent/nid_ipc.c b/src/nid/agent/nid_ipc.c
index 4f43cd309..1a77fd8e2 100644
--- a/src/nid/agent/nid_ipc.c
+++ b/src/nid/agent/nid_ipc.c
@@ -28,6 +28,7 @@
    #include 
  #include 
+#include 
  #include "osaf/configmake.h"
    #include "nid/agent/nid_api.h"
@@ -56,7 +57,13 @@ uint32_t nid_create_ipc(char *strbuf)
  mode_t mask;
    /* Lets Remove any such file if it already exists */
-    unlink(NID_FIFO);
+    if (access(NID_FIFO, F_OK ) != -1 ) {
+    if (unlink(NID_FIFO) < 0) {
+    sprintf(strbuf, " FAILURE: Unable To Delete FIFO Error: 
%s\n",

+    strerror(errno));
+    return NCSCC_RC_FAILURE;
+    }
+    }
    mask = umask(0);







Re: [devel] [PATCH 1/1] amfnd: unlock before releasing the monitoring thread to avoid deadlock [#2818]

2018-04-05 Thread Hans Nordebäck
Hi Ravi,

ack, review only. (I agree, NCS_TASK_RELEASE should not be called with the 
mutex taken, as pthread_mutex_lock is not a cancellation point.)

/Regards HansN

On 03/29/2018 07:59 AM, ravi-sekhar wrote:

---
 src/amf/amfnd/mon.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/amf/amfnd/mon.cc b/src/amf/amfnd/mon.cc
index 9cdfc37..4932d50 100644
--- a/src/amf/amfnd/mon.cc
+++ b/src/amf/amfnd/mon.cc
@@ -161,6 +161,8 @@ uint32_t avnd_mon_req_del(AVND_CB *cb, SaUint64T pid) {

   mon_rec = (AVND_MON_REQ *)m_NCS_DBLIST_FIND_FIRST(pid_mon_list);

+  m_NCS_UNLOCK(&cb->mon_lock, NCS_LOCK_WRITE);
+
   /* No more PIDs exists in the pid_mon_list for monitoring */
   if (!mon_rec) {
 /* destroy the task */
@@ -173,8 +175,6 @@ uint32_t avnd_mon_req_del(AVND_CB *cb, SaUint64T pid) {
 }
   }

-  m_NCS_UNLOCK(&cb->mon_lock, NCS_LOCK_WRITE);
-
   return rc;
 }





Re: [devel] [PATCH 1/1] imm: fix memory leaked in immnd [#2825]

2018-04-04 Thread Hans Nordebäck

Hi Zoran,

yes you are right, hm, I didn't check the whole function... /BR Hans


On 04/04/2018 09:57 AM, Zoran Milinkovic wrote:

Hi Hans,

Variable arguments with vsnprintf will work.
The size of the new buffer is resized in
if (len > errLen) {
...
osafassert(vsnprintf(fmtError, len, errorString, vl) >= 0);
}
when there are variable arguments.

BR,
Zoran
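
The sizing behaviour described above can be sketched as a two-pass format. This is a minimal illustration, not the ImmModel code: `FormatV` and `Format` are hypothetical helper names. It relies on `vsnprintf` returning the number of characters the full output needs, excluding the terminating null, so the buffer must hold that length plus one.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats into a buffer sized from vsnprintf's own return value.
std::string FormatV(const char* fmt, va_list vl) {
  va_list args;
  va_copy(args, vl);  // a va_list is consumed by use, so work on copies
  int len = std::vsnprintf(nullptr, 0, fmt, args);  // length without the NUL
  va_end(args);
  if (len < 0) return std::string();  // encoding error

  std::vector<char> buf(static_cast<std::size_t>(len) + 1);
  va_copy(args, vl);
  std::vsnprintf(buf.data(), buf.size(), fmt, args);
  va_end(args);
  return std::string(buf.data(), static_cast<std::size_t>(len));
}

// Variadic convenience wrapper around FormatV.
std::string Format(const char* fmt, ...) {
  va_list vl;
  va_start(vl, fmt);
  std::string out = FormatV(fmt, vl);
  va_end(vl);
  return out;
}
```

For example, `Format("ccb %d: %s", 7, "aborted")` yields the complete string even though the format string alone is shorter than the result.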

-Original Message-
From: Hans Nordebäck
Sent: 4 April 2018 08:25
To: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>; Anders Widell 
<anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com; Zoran Milinkovic 
<zoran.milinko...@ericsson.com>; Lennart Lund <lennart.l...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] imm: fix memory leaked in immnd [#2825]

Hi Vu,

not saying that you should change now, but an alternative can be to:

instead of:

char* fmtError = (char*) malloc(errLen);
    osafassert(fmtError);

a std::vector can be used, (or a std::array if fixed size):

std::vector<char> fmtError(errLen, 0);

     :

len = vsnprintf(fmtError.data(), errLen, errorString, args);

     :

fmtError.resize(len);

:

errStr->name.buf = strdup(fmtError.data());

    :

Another thing I noticed in the beginning of this function:

void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString,
   va_list vl) {
     int errLen = strlen(errorString) + 1;

does not include the length of the variable arguments, so vsnprintf will work 
but the resulting string may be truncated.
/Regards HansN


On 04/04/2018 05:02 AM, Vu Minh Nguyen wrote:

Hi Hans, Anders,

Please see my responses inline, with [Vu].

P.s:
Please ignore previous email. I pressed wrong keys...

Regards, Vu


-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com]
Sent: Tuesday, April 3, 2018 7:07 PM
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Vu Minh Nguyen
<vu.m.ngu...@dektech.com.au>; ravisekhar.ko...@oracle.com;
zoran.milinko...@ericsson.com; lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] imm: fix memory leaked in immnd [#2825]

Ack with comments. There is actually a second memory leak further
down in this function:

   char* newFmtError = (char*)realloc(fmtError, len);
   if (newFmtError == nullptr) {
 TRACE_5("realloc error ,No memory ");
 return;
   } else {

When realloc returns nullptr, the original memory is left untouched
(not deallocated). Thus, you need a free(fmtError) before return in
the code above.

[Vu] Thanks. I will fix this.
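
The realloc behaviour described above can be sketched as a small helper. This is a minimal illustration only: `GrowOrFree` is a hypothetical name and is not part of the patch. It shows that on failure the old block is still allocated and has to be released explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Grows `*buf` to `new_size` bytes. On failure the original block is
// released and `*buf` is set to nullptr, so the caller cannot leak it.
// `new_size` must be non-zero: realloc with size 0 is implementation-defined.
bool GrowOrFree(char** buf, std::size_t new_size) {
  assert(buf != nullptr && new_size > 0);
  char* grown = static_cast<char*>(std::realloc(*buf, new_size));
  if (grown == nullptr) {
    std::free(*buf);  // realloc left the old block allocated
    *buf = nullptr;
    return false;
  }
  *buf = grown;  // the old pointer must not be used after a successful realloc
  return true;
}
```

Assigning the result of `realloc` to a temporary first, as done here and in the quoted code, is what makes the explicit `free` possible; assigning it directly to the original pointer would lose the only reference to the old block.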

I agree with Hans that it would be better to use some RAII
construction instead, so that you don't need to free() before each
return - it is easy to forget. Maybe simply use std::string and
resize() it to emulate malloc/realloc? You don't have to do it now
but think about it as an improvement.

[Vu] The ownership of the allocated memory later on is moved to the `global` 
variable `ccb`.
You can see it at following code lines:
   if (strstr(errStr->name.buf, IMM_RESOURCE_ABORT) == errStr->name.buf) {
free(errStr->name.buf);
errStr->name.buf = fmtError;
errStr->name.size = len;
return;
  }

Or in other case:
else {
  (*errStrTail) = (ImmsvAttrNameList*)malloc(sizeof(ImmsvAttrNameList));
  (*errStrTail)->next = NULL;
  (*errStrTail)->name.size = len;
  (*errStrTail)->name.buf = fmtError;
}

As IMMND mixes C and C++ code, the `CcbInfo ccb` can be used in C
code, which deallocates memory using free(). Therefore I keep using malloc() to 
avoid mixing new/free() or malloc()/delete().
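
The two concerns above (RAII for early returns, and memory that is later released with free() by C code) can be combined. The following is a minimal sketch under stated assumptions: `NameRecord`, `FreeDeleter` and `StoreName` are hypothetical names, not IMM types. The buffer comes from malloc(), is guarded by a `std::unique_ptr` with a free()-based deleter, and ownership is handed over with `release()` only on the success path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical C-style record that ends up owning the buffer and is
// cleaned up elsewhere with free().
struct NameRecord {
  char* buf;
  std::size_t size;
};

// Deleter that matches malloc(), so new/delete are never mixed in.
struct FreeDeleter {
  void operator()(void* p) const { std::free(p); }
};
using MallocPtr = std::unique_ptr<char, FreeDeleter>;

// Copies `text` into malloc'ed memory and hands it to `rec`. Every early
// return before release() frees the buffer automatically.
bool StoreName(NameRecord* rec, const char* text) {
  std::size_t size = std::strlen(text) + 1;
  MallocPtr tmp(static_cast<char*>(std::malloc(size)));
  if (!tmp) return false;
  std::memcpy(tmp.get(), text, size);
  if (size == 1) return false;  // empty text rejected; tmp is freed here
  rec->size = size;
  rec->buf = tmp.release();  // ownership moves to the C struct
  return true;
}
```

After a successful call the record owns the buffer and the caller (or C code) releases it with `free(rec.buf)`; on the rejected path nothing is stored and nothing leaks.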


regards,
Anders Widell

On 04/03/2018 01:42 PM, Hans Nordebäck wrote:

Hi Vu,

few minor comments below.

/Thanks HansN


On 04/03/2018 11:43 AM, Vu Minh Nguyen wrote:

The allocated memory is not freed before returning from the
function ImmModel::setCcbErrorString().
---
src/imm/immnd/ImmModel.cc | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/imm/immnd/ImmModel.cc

b/src/imm/immnd/ImmModel.cc

index f7c8fc0..e01ff8c 100644
--- a/src/imm/immnd/ImmModel.cc
+++ b/src/imm/immnd/ImmModel.cc
@@ -10910,7 +10910,6 @@ SaAisErrorT
ImmModel::deleteObject(ObjectMap::iterator& oi, SaUint32T reqConn,
void ImmModel::setCcbErrorString(CcbInfo* ccb, const char*
errorString,
 va_list vl) {
  int errLen = strlen(errorString) + 1;
-  char* fmtError = (char*)malloc(errLen);
  int len;
  va_list args;
  int isValidationErrString = 0; @@ -10921,6 +10920,9 @@ void
ImmModel::setCcbErrorString(CcbInfo*
ccb, const char* errorString,
return;
  }
+  char* fmtError = (char*)malloc(errLen);
+  osafassert(fmtError);

[HansN] in c++ new should be used instead of malloc. There is no
need to check return value of new if "std::set_new_handler(new_handler)"
has bee

Re: [devel] [PATCH 1/1] imm: fix memory leaked in immnd [#2825]

2018-04-04 Thread Hans Nordebäck

Hi Vu,

not saying that you should change now, but an alternative can be to:

instead of:

char* fmtError = (char*) malloc(errLen);
  osafassert(fmtError);

a std::vector can be used, (or a std::array if fixed size):

std::vector<char> fmtError(errLen, 0);

   :

len = vsnprintf(fmtError.data(), errLen, errorString, args);

   :

fmtError.resize(len);

:

errStr->name.buf = strdup(fmtError.data());

  :

Another thing I noticed in the beginning of this function:

void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString,
 va_list vl) {
   int errLen = strlen(errorString) + 1;

does not include the length of the variable arguments, so vsnprintf will 
work but the resulting string may be truncated.

/Regards HansN


On 04/04/2018 05:02 AM, Vu Minh Nguyen wrote:

Hi Hans, Anders,

Please see my responses inline, with [Vu].

P.s:
Please ignore previous email. I pressed wrong keys...

Regards, Vu


-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com]
Sent: Tuesday, April 3, 2018 7:07 PM
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Vu Minh Nguyen
<vu.m.ngu...@dektech.com.au>; ravisekhar.ko...@oracle.com;
zoran.milinko...@ericsson.com; lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] imm: fix memory leaked in immnd [#2825]

Ack with comments. There is actually a second memory leak further down
in this function:

  char* newFmtError = (char*)realloc(fmtError, len);
  if (newFmtError == nullptr) {
TRACE_5("realloc error ,No memory ");
return;
  } else {

When realloc returns nullptr, the original memory is left untouched (not
deallocated). Thus, you need a free(fmtError) before return in the code
above.

[Vu] Thanks. I will fix this.

I agree with Hans that it would be better to use some RAII construction
instead, so that you don't need to free() before each return - it is
easy to forget. Maybe simply use std::string and resize() it to emulate
malloc/realloc? You don't have to do it now but think about it as an
improvement.

[Vu] The ownership of the allocated memory later on is moved to the `global` 
variable `ccb`.
You can see it at following code lines:
  if (strstr(errStr->name.buf, IMM_RESOURCE_ABORT) == errStr->name.buf) {
   free(errStr->name.buf);
   errStr->name.buf = fmtError;
   errStr->name.size = len;
   return;
 }

Or in other case:
else {
 (*errStrTail) = (ImmsvAttrNameList*)malloc(sizeof(ImmsvAttrNameList));
 (*errStrTail)->next = NULL;
 (*errStrTail)->name.size = len;
 (*errStrTail)->name.buf = fmtError;
   }

As IMMND mixes C and C++ code, the `CcbInfo ccb` can be used in C code, 
which deallocates memory using free().
Therefore I keep using malloc() to avoid mixing new/free() or 
malloc()/delete().


regards,
Anders Widell

On 04/03/2018 01:42 PM, Hans Nordebäck wrote:

Hi Vu,

few minor comments below.

/Thanks HansN


On 04/03/2018 11:43 AM, Vu Minh Nguyen wrote:

The allocated memory is not freed before returning from the function
ImmModel::setCcbErrorString().
---
   src/imm/immnd/ImmModel.cc | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/imm/immnd/ImmModel.cc

b/src/imm/immnd/ImmModel.cc

index f7c8fc0..e01ff8c 100644
--- a/src/imm/immnd/ImmModel.cc
+++ b/src/imm/immnd/ImmModel.cc
@@ -10910,7 +10910,6 @@ SaAisErrorT
ImmModel::deleteObject(ObjectMap::iterator& oi, SaUint32T reqConn,
   void ImmModel::setCcbErrorString(CcbInfo* ccb, const char*
errorString,
va_list vl) {
 int errLen = strlen(errorString) + 1;
-  char* fmtError = (char*)malloc(errLen);
 int len;
 va_list args;
 int isValidationErrString = 0;
@@ -10921,6 +10920,9 @@ void ImmModel::setCcbErrorString(CcbInfo*
ccb, const char* errorString,
   return;
 }
   +  char* fmtError = (char*)malloc(errLen);
+  osafassert(fmtError);

[HansN] in c++ new should be used instead of malloc. There is no need
to check return value of new if "std::set_new_handler(new_handler)"
has been called in advance, e.g. in the main function. (also fmtError
is a local variable, it should be possible to use RAII and avoid
explicit calls to delete).

[Vu] Please see my responses previously and let me know your opinion. Thanks.

/Vu


+
 va_copy(args, vl);
 len = vsnprintf(fmtError, errLen, errorString, args);
 va_end(args);







Re: [devel] [PATCH 1/1] imm: fix memory leaked in immnd [#2825]

2018-04-03 Thread Hans Nordebäck

Hi Vu,

few minor comments below.

/Thanks HansN


On 04/03/2018 11:43 AM, Vu Minh Nguyen wrote:

The allocated memory is not freed before returning from the function
ImmModel::setCcbErrorString().
---
  src/imm/immnd/ImmModel.cc | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc
index f7c8fc0..e01ff8c 100644
--- a/src/imm/immnd/ImmModel.cc
+++ b/src/imm/immnd/ImmModel.cc
@@ -10910,7 +10910,6 @@ SaAisErrorT ImmModel::deleteObject(ObjectMap::iterator& 
oi, SaUint32T reqConn,
  void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString,
   va_list vl) {
int errLen = strlen(errorString) + 1;
-  char* fmtError = (char*)malloc(errLen);
int len;
va_list args;
int isValidationErrString = 0;
@@ -10921,6 +10920,9 @@ void ImmModel::setCcbErrorString(CcbInfo* ccb, const 
char* errorString,
  return;
}
  
+  char* fmtError = (char*)malloc(errLen);

+  osafassert(fmtError);
[HansN] In C++, new should be used instead of malloc. There is no need to 
check the return value of new if "std::set_new_handler(new_handler)"
has been called in advance, e.g. in the main function. (Also, fmtError is 
a local variable, so it should be possible to use RAII and avoid explicit 
calls to delete.)
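
A minimal sketch of the RAII suggestion above, not the actual ImmModel code. FormatErrorString and FormatError are hypothetical names. Unlike the patch, which sizes the buffer from strlen(errorString), this sketch measures the formatted length first, so that part is an assumption about what is wanted. The buffer lives in a std::vector, so no explicit free or delete is needed on any return path.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats a printf-style message into a std::string.
// The temporary buffer is owned by a std::vector and released automatically.
std::string FormatErrorString(const char* format, va_list vl) {
  assert(format != nullptr);

  // First pass: ask vsnprintf how many characters the result needs.
  va_list args;
  va_copy(args, vl);
  int needed = std::vsnprintf(nullptr, 0, format, args);
  va_end(args);
  if (needed < 0) return std::string();  // encoding error

  // Second pass: format into a buffer of exactly the right size.
  std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
  va_copy(args, vl);
  std::vsnprintf(buffer.data(), buffer.size(), format, args);
  va_end(args);
  return std::string(buffer.data(), static_cast<std::size_t>(needed));
}

// Variadic convenience wrapper around FormatErrorString.
std::string FormatError(const char* format, ...) {
  va_list vl;
  va_start(vl, format);
  std::string result = FormatErrorString(format, vl);
  va_end(vl);
  return result;
}
```
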

+
va_copy(args, vl);
len = vsnprintf(fmtError, errLen, errorString, args);
va_end(args);





[devel] FW: [PATCH 1/1] log: Handling of IMM OI BAD HANDLE in log server is incorrect [#2799]

2018-03-30 Thread Hans Nordebäck
Hi Lennart,


I have one question below marked [HansN].


/Regards HansN



From: Lennart Lund 
Sent: 29 March 2018 16:04
To: Vu Minh Nguyen; Canh Van Truong
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] log: Handling of IMM OI BAD HANDLE in log server is 
incorrect [#2799]

Recovery of the OI handle shall be started in all places where BAD HANDLE
can be returned. Creation of the OI must be done in a background thread.
It must be possible to stop an ongoing creation, e.g. if the server is
becoming standby.
---
 src/log/Makefile.am  |   3 +
 src/log/logd/lgs.h   |  24 ---
 src/log/logd/lgs_amf.cc  |  26 +--
 src/log/logd/lgs_cb.h|   3 -
 src/log/logd/lgs_config.cc   |  22 +-
 src/log/logd/lgs_config.h|   5 +-
 src/log/logd/lgs_evt.cc  |  60 +++---
 src/log/logd/lgs_imm.cc  | 312 +++--
 src/log/logd/lgs_imm.h   |  53 +
 src/log/logd/lgs_main.cc | 116 ---
 src/log/logd/lgs_mbcsv_v2.cc |   7 +-
 src/log/logd/lgs_mbcsv_v3.cc |   7 +-
 src/log/logd/lgs_mbcsv_v5.cc |   3 +-
 src/log/logd/lgs_oi_admin.cc | 466 +++
 src/log/logd/lgs_oi_admin.h  | 105 ++
 src/log/logd/lgs_recov.cc|   4 +-
 src/log/logd/lgs_stream.cc   |  85 ++--
 17 files changed, 842 insertions(+), 459 deletions(-)
 create mode 100644 src/log/logd/lgs_imm.h
 create mode 100644 src/log/logd/lgs_oi_admin.cc
 create mode 100644 src/log/logd/lgs_oi_admin.h

diff --git a/src/log/Makefile.am b/src/log/Makefile.am
index 3d951eb5d..5d33d355b 100644
--- a/src/log/Makefile.am
+++ b/src/log/Makefile.am
@@ -79,6 +79,7 @@ noinst_HEADERS += \
 src/log/logd/lgs_file.h \
 src/log/logd/lgs_filehdl.h \
 src/log/logd/lgs_fmt.h \
+   src/log/logd/lgs_imm.h \
 src/log/logd/lgs_imm_gcfg.h \
 src/log/logd/lgs_mbcsv.h \
 src/log/logd/lgs_mbcsv_v1.h \
@@ -86,6 +87,7 @@ noinst_HEADERS += \
 src/log/logd/lgs_mbcsv_v3.h \
 src/log/logd/lgs_mbcsv_v5.h \
 src/log/logd/lgs_mbcsv_v6.h \
+   src/log/logd/lgs_oi_admin.h \
 src/log/logd/lgs_recov.h \
 src/log/logd/lgs_stream.h \
 src/log/logd/lgs_util.h \
@@ -139,6 +141,7 @@ bin_osaflogd_SOURCES = \
 src/log/logd/lgs_mbcsv_v5.cc \
 src/log/logd/lgs_mbcsv_v6.cc \
 src/log/logd/lgs_mds.cc \
+   src/log/logd/lgs_oi_admin.cc \
 src/log/logd/lgs_recov.cc \
 src/log/logd/lgs_stream.cc \
 src/log/logd/lgs_util.cc \
diff --git a/src/log/logd/lgs.h b/src/log/logd/lgs.h
index 18e6d9281..b1d773375 100644
--- a/src/log/logd/lgs.h
+++ b/src/log/logd/lgs.h
@@ -95,7 +95,6 @@ extern uint32_t mbox_msgs[NCS_IPC_PRIORITY_MAX];
 extern bool mbox_full[NCS_IPC_PRIORITY_MAX];
 extern uint32_t mbox_low[NCS_IPC_PRIORITY_MAX];
 extern pthread_mutex_t lgs_mbox_init_mutex;
-extern pthread_mutex_t lgs_OI_init_mutex;

 extern uint32_t initialize_for_assignment(lgs_cb_t *cb, SaAmfHAStateT 
ha_state);

@@ -108,27 +107,4 @@ extern uint32_t lgs_mds_msg_send(lgs_cb_t *cb, lgsv_msg_t 
*msg, MDS_DEST *dest,
  MDS_SYNC_SND_CTXT *mds_ctxt,
  MDS_SEND_PRIORITY_TYPE prio);

-extern SaAisErrorT lgs_imm_create_configStream(lgs_cb_t *cb);
-extern void logRootDirectory_filemove(const std::string _logRootDirectory,
-  const std::string _logRootDirectory,
-  time_t *cur_time_in);
-extern void logDataGroupname_fileown(const char *new_logDataGroupname);
-
-extern void lgs_imm_impl_reinit_nonblocking(lgs_cb_t *cb);
-extern void lgs_imm_init_OI_handle(SaImmOiHandleT *immOiHandle,
-   SaSelectionObjectT *immSelectionObject);
-extern void lgs_imm_impl_set(SaImmOiHandleT *immOiHandle,
- SaSelectionObjectT *immSelectionObject);
-extern SaAisErrorT lgs_imm_init_configStreams(lgs_cb_t *cb);
-
-// Functions for recovery handling
-void lgs_cleanup_abandoned_streams();
-void lgs_delete_one_stream_object(const std::string _str);
-void lgs_search_stream_objects();
-SaUint32T *lgs_get_scAbsenceAllowed_attr(SaUint32T *attr_val);
-int lgs_get_streamobj_attr(SaImmAttrValuesT_2 ***attrib_out,
-   const std::string _name,
-   SaImmHandleT *immOmHandle);
-int lgs_free_streamobj_attr(SaImmHandleT immHandle);
-
 #endif  // LOG_LOGD_LGS_H_
diff --git a/src/log/logd/lgs_amf.cc b/src/log/logd/lgs_amf.cc
index 6fa044ff2..3f0de8c1c 100644
--- a/src/log/logd/lgs_amf.cc
+++ b/src/log/logd/lgs_amf.cc
@@ -23,6 +23,7 @@
 #include "osaf/immutil/immutil.h"
 #include "log/logd/lgs.h"
 #include "log/logd/lgs_config.h"
+#include "log/logd/lgs_oi_admin.h"

 static void close_all_files() {
   log_stream_t *stream;
@@ -63,8 +64,7 @@ static SaAisErrorT amf_active_state_handler(lgs_cb_t *cb,
 goto done;
   }

-  

Re: [devel] [PATCH 1/1] mds: improve thread safety in mdstest - part 2 [#2746]

2018-03-27 Thread Hans Nordebäck

ack, review only. /Thanks HansN


On 03/27/2018 10:34 AM, Hoa Le wrote:

- Remove thread-local declaration of svc_to_mds_info
---
  src/mds/apitest/mdstipc_api.c  |  7 ++-
  src/mds/apitest/mdstipc_conf.c | 43 +-
  2 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index 669c770..5bfa7ef 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -33,7 +33,6 @@ static MDS_CLIENT_MSG_FORMAT_VER gl_set_msg_fmt_ver;
  
  MDS_SVC_ID svc_ids[3] = {2006, 2007, 2008};
  
-_Thread_local NCSMDS_INFO svc_to_mds_info;

  pthread_mutex_t safe_printf_mutex = PTHREAD_MUTEX_INITIALIZER;
  pthread_mutex_t gl_mutex = PTHREAD_MUTEX_INITIALIZER;
  
@@ -3513,6 +3512,7 @@ void tet_just_send_tp_11()

MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};
gl_vdest_indx = 0;
  
+	NCSMDS_INFO svc_to_mds_info;

char tmp[] = " Hi Receiver ";
TET_MDS_MSG *mesg;
mesg = (TET_MDS_MSG *)malloc(sizeof(TET_MDS_MSG));
@@ -8020,6 +8020,7 @@ void tet_direct_just_send_tp_9()
  {
int FAIL = 0;
MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};
+   NCSMDS_INFO svc_to_mds_info;
char message[] = "Direct Message";
  
  	/*start up*/

@@ -8145,6 +8146,7 @@ void tet_direct_just_send_tp_11()
  {
int FAIL = 0;
MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};
+   NCSMDS_INFO svc_to_mds_info;
char message[] = "Direct Message";
  
  	/*start up*/

@@ -9998,6 +1,7 @@ void tet_direct_send_ack_tp_10()
  {
int FAIL = 0;
MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};
+   NCSMDS_INFO svc_to_mds_info;
char message[] = "Direct Message";
/*start up*/
if (tet_initialise_setup(false)) {
@@ -10074,6 +10077,7 @@ void tet_direct_send_ack_tp_11()
  {
int FAIL = 0;
MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};
+   NCSMDS_INFO svc_to_mds_info;
char message[] = "Direct Message";
/*start up*/
if (tet_initialise_setup(false)) {
@@ -11709,6 +11713,7 @@ void tet_direct_broadcast_to_svc_tp_8()
  {
int FAIL = 0;
MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};
+   NCSMDS_INFO svc_to_mds_info;
/*Start up*/
if (tet_initialise_setup(false)) {
printf("\n Setup Initialisation has Failed \n");
diff --git a/src/mds/apitest/mdstipc_conf.c b/src/mds/apitest/mdstipc_conf.c
index c2d7d01..bf4c1de 100644
--- a/src/mds/apitest/mdstipc_conf.c
+++ b/src/mds/apitest/mdstipc_conf.c
@@ -25,7 +25,6 @@ extern int fill_syncparameters(int);
  extern uint32_t mds_vdest_tbl_get_role(MDS_VDEST_ID vdest_id, V_DEST_RL 
*role);
  extern pthread_mutex_t gl_mds_library_mutex;
  
-extern _Thread_local NCSMDS_INFO svc_to_mds_info;

  extern pthread_mutex_t safe_printf_mutex;
  extern pthread_mutex_t gl_mutex;
  
@@ -418,7 +417,7 @@ uint32_t mds_service_install(MDS_HDL mds_hdl, MDS_SVC_ID svc_id,

 bool mds_q_ownership, bool fail_no_active_sends)
  {
int i;
-   memset(&svc_to_mds_info, 0, sizeof(svc_to_mds_info));
+   NCSMDS_INFO svc_to_mds_info;
  
  	svc_to_mds_info.i_mds_hdl = mds_hdl;

svc_to_mds_info.i_svc_id = svc_id;
@@ -465,7 +464,7 @@ uint32_t mds_service_uninstall(MDS_HDL mds_hdl, MDS_SVC_ID 
svc_id)
  {
int i, j, k, FOUND;
uint32_t YES_ADEST;
-   memset(&svc_to_mds_info, 0, sizeof(svc_to_mds_info));
+   NCSMDS_INFO svc_to_mds_info;
/*Find whether this Service is on Adest or Vdest*/
YES_ADEST = is_service_on_adest(mds_hdl, svc_id);
  
@@ -560,7 +559,7 @@ uint32_t mds_service_subscribe(MDS_HDL mds_hdl, MDS_SVC_ID svc_id,

  {
int i, j, k, l, FOUND;
uint32_t YES_ADEST;
-   memset(&svc_to_mds_info, 0, sizeof(svc_to_mds_info));
+   NCSMDS_INFO svc_to_mds_info;
/*Find whether this Service is on Adest or Vdest*/
YES_ADEST = is_service_on_adest(mds_hdl, svc_id);
  
@@ -746,7 +745,7 @@ uint32_t mds_service_redundant_subscribe(MDS_HDL mds_hdl, MDS_SVC_ID svc_id,

  {
int i, j, k, l, FOUND;
uint32_t YES_ADEST;
-   memset(&svc_to_mds_info, 0, sizeof(svc_to_mds_info));
+   NCSMDS_INFO svc_to_mds_info;
/*Find whether this Service is on Adest or Vdest*/
YES_ADEST = is_service_on_adest(mds_hdl, svc_id);
  
@@ -931,7 +930,7 @@ uint32_t mds_service_cancel_subscription(MDS_HDL mds_hdl, MDS_SVC_ID svc_id,

 uint8_t num_svcs, MDS_SVC_ID *svc_ids)
  {
int i, j, k, FOUND;
-   memset(&svc_to_mds_info, 0, sizeof(svc_to_mds_info));
+   NCSMDS_INFO svc_to_mds_info;
svc_to_mds_info.i_mds_hdl = mds_hdl;
svc_to_mds_info.i_svc_id = svc_id;
svc_to_mds_info.i_op = MDS_CANCEL;
@@ -998,7 +997,7 @@ uint32_t mds_just_send(MDS_HDL mds_hdl, MDS_SVC_ID svc_id, 
MDS_SVC_ID to_svc,
   TET_MDS_MSG 

Re: [devel] [PATCH 1/1] mds: improve thread safety in mdstest - part 2 [#2746]

2018-03-27 Thread Hans Nordebäck
Hi Hoa,

Do we need svc_to_mds_info to be TLS? I used it in the sample added to the 
ticket to reduce the number of helgrind warnings when I verified safe_printf 
and safe_fflush. It should have been removed from the sample, as it was not 
fully verified.
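
One alternative to thread-local storage is to give each call its own local request structure. A minimal sketch, assuming a stand-in type: RequestInfo and BuildInstallRequest are hypothetical and are not the real NCSMDS_INFO or mdstest code. Value-initialisation with {} zeroes every member, which plays the role that memset had for the global.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a request structure such as NCSMDS_INFO.
struct RequestInfo {
  std::uint32_t handle;
  std::uint32_t svc_id;
  int op;
};

// Builds a request in a function-local variable. Each call (and therefore
// each thread) works on its own copy, so no TLS or locking is needed.
RequestInfo BuildInstallRequest(std::uint32_t handle, std::uint32_t svc_id) {
  assert(svc_id != 0);   // illustrative precondition only
  RequestInfo info{};    // value-initialised: all members start at zero
  info.handle = handle;
  info.svc_id = svc_id;
  return info;
}
```
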
/Thanks HansN 


-Original Message-
From: Hoa Le [mailto:hoa...@dektech.com.au] 
Sent: 27 March 2018 03:47
To: Anders Widell <anders.wid...@ericsson.com>; Hans Nordebäck 
<hans.nordeb...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net; Hoa Le <hoa...@dektech.com.au>
Subject: [PATCH 1/1] mds: improve thread safety in mdstest - part 2 [#2746]

- Use __thread if _Thread_local is not supported in GCC version lower than 4.9
---
 src/mds/apitest/mdstipc.h  | 6 ++
 src/mds/apitest/mdstipc_api.c  | 2 +-
 src/mds/apitest/mdstipc_conf.c | 2 +-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/mds/apitest/mdstipc.h b/src/mds/apitest/mdstipc.h
index 01b58c4..f67890a 100644
--- a/src/mds/apitest/mdstipc.h
+++ b/src/mds/apitest/mdstipc.h
@@ -21,6 +21,12 @@
 #include "base/ncssysf_tsk.h"
 #include "base/ncssysf_def.h"
 
+#if !defined(_Thread_local)
+#define MDS_THREAD_LOCAL __thread
+#else
+#define MDS_THREAD_LOCAL _Thread_local
+#endif
+
 typedef struct tet_task {
   NCS_OS_CB entry;
   void *arg;
diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index 669c770..2ff8238 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -33,7 +33,7 @@ static MDS_CLIENT_MSG_FORMAT_VER gl_set_msg_fmt_ver;
 
 MDS_SVC_ID svc_ids[3] = {2006, 2007, 2008};
 
-_Thread_local NCSMDS_INFO svc_to_mds_info;
+MDS_THREAD_LOCAL NCSMDS_INFO svc_to_mds_info;
 pthread_mutex_t safe_printf_mutex = PTHREAD_MUTEX_INITIALIZER;
 pthread_mutex_t gl_mutex = PTHREAD_MUTEX_INITIALIZER;
 
diff --git a/src/mds/apitest/mdstipc_conf.c b/src/mds/apitest/mdstipc_conf.c
index c2d7d01..d6ee48e 100644
--- a/src/mds/apitest/mdstipc_conf.c
+++ b/src/mds/apitest/mdstipc_conf.c
@@ -25,7 +25,7 @@ extern int fill_syncparameters(int);
 extern uint32_t mds_vdest_tbl_get_role(MDS_VDEST_ID vdest_id, V_DEST_RL *role);
 extern pthread_mutex_t gl_mds_library_mutex;
 
-extern _Thread_local NCSMDS_INFO svc_to_mds_info;
+extern MDS_THREAD_LOCAL NCSMDS_INFO svc_to_mds_info;
 extern pthread_mutex_t safe_printf_mutex;
 extern pthread_mutex_t gl_mutex;
 
--
2.7.4




Re: [devel] [PATCH 1/1] base: Add functions for parsing a string as an integer [#2814]

2018-03-26 Thread Hans Nordebäck

ack, review only. Minor comment below.

/Thanks HansN


On 03/22/2018 03:13 PM, Anders Widell wrote:

The new functions StrToInt64 and StrToUint64 parse a string as a signed and
unsigned 64-bit integer, respectively. Prefixes 0 for octal and 0x for
hexadecimal are supported, as well as suffixes k, M and G for kilobytes,
megabytes and gigabytes, respectively.
---
  src/base/Makefile.am|   3 +
  src/base/string_parse.cc|  79 ++
  src/base/string_parse.h |  47 +
  src/base/tests/string_parse_test.cc | 129 
  4 files changed, 258 insertions(+)
  create mode 100644 src/base/string_parse.cc
  create mode 100644 src/base/string_parse.h
  create mode 100644 src/base/tests/string_parse_test.cc

diff --git a/src/base/Makefile.am b/src/base/Makefile.am
index 540c6dfe7..bb13d6c43 100644
--- a/src/base/Makefile.am
+++ b/src/base/Makefile.am
@@ -65,6 +65,7 @@ lib_libopensaf_core_la_SOURCES += \
src/base/process.cc \
src/base/saf_edu.c \
src/base/saf_error.c \
+   src/base/string_parse.cc \
src/base/sysf_def.c \
src/base/sysf_exc_scr.c \
src/base/sysf_ipc.c \
@@ -140,6 +141,7 @@ noinst_HEADERS += \
src/base/saf_error.h \
src/base/saf_mem.h \
src/base/sprr_dl_api.h \
+   src/base/string_parse.h \
src/base/sysf_exc_scr.h \
src/base/sysf_ipc.h \
src/base/tests/mock_clock_gettime.h \
@@ -206,6 +208,7 @@ bin_libbase_test_SOURCES = \
src/base/tests/mock_logtrace.cc \
src/base/tests/mock_osaf_abort.cc \
src/base/tests/mock_osafassert.cc \
+   src/base/tests/string_parse_test.cc \
src/base/tests/time_add_test.cc \
src/base/tests/time_compare_test.cc \
src/base/tests/time_convert_test.cc \
diff --git a/src/base/string_parse.cc b/src/base/string_parse.cc
new file mode 100644
index 0..915f0e95a
--- /dev/null
+++ b/src/base/string_parse.cc
@@ -0,0 +1,79 @@
+/*  -*- OpenSAF  -*-
+ *
+ * Copyright Ericsson AB 2018 - All Rights Reserved.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+ * under the GNU Lesser General Public License Version 2.1, February 1999.
+ * The complete license can be accessed from the following location:
+ * http://opensource.org/licenses/lgpl-license.php
+ * See the Copying file included with the OpenSAF distribution for full
+ * licensing terms.
+ *
+ */
+
+#include "base/string_parse.h"
+
+#include 
+#include 
+#include 
+
+namespace {
+
+int64_t ParseSuffix(char** endptr) {
+  switch (**endptr) {
+case 'k':
+  ++(*endptr);
+  return 1024;
+case 'M':
+  ++(*endptr);
+  return 1024 * 1024;
+case 'G':
+  ++(*endptr);
+  return 1024 * 1024 * 1024;
+default:
+  return 1;
+  }
+}
+
+}  // namespace
+
+namespace base {
+
+int64_t StrToInt64(const char* str, bool* success) {
+  str = RemoveLeadingWhitespace(str);
+  errno = 0;
+  char* endptr;
+  int64_t val = strtoll(str, &endptr, 0);
+  int64_t multiplier = ParseSuffix(&endptr);
+  endptr = RemoveLeadingWhitespace(endptr);
+  *success = *str != '\0' && errno == 0 && *endptr == '\0' &&
+ (val >= 0 ? val <= (INT64_MAX / multiplier)
+   : val >= (INT64_MIN / multiplier));
+  return val * multiplier;
+}
+
+uint64_t StrToUint64(const char* str, bool* success) {
+  str = RemoveLeadingWhitespace(str);
+  errno = 0;
+  char* endptr;
+  uint64_t val = strtoull(str, &endptr, 0);
+  uint64_t multiplier = ParseSuffix(&endptr);
+  endptr = RemoveLeadingWhitespace(endptr);
+  *success = *str != '\0' && *str != '-' && errno == 0 && *endptr == '\0' &&
+ val <= (~static_cast<uint64_t>(0) / multiplier);
+  return val * multiplier;
+}
+
+const char* RemoveLeadingWhitespace(const char* str) {
+  while (*str == ' ' || *str == '\t' || *str == '\r' || *str == '\n') ++str;
+  return str;
+}
+
+char* RemoveLeadingWhitespace(char* str) {
+  while (*str == ' ' || *str == '\t' || *str == '\r' || *str == '\n') ++str;
+  return str;
+}
+

[HansN] You can change the name of these functions and use isspace:
const char* TrimLeadingWhitespace(const char* str) {
  while (isspace(*str)) ++str;
  return str;
}

char* TrimLeadingWhitespace(char* str) {
  while (isspace(*str)) ++str;
  return str;
}
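
A small sketch of the isspace variant suggested above, with two caveats worth keeping in mind. Passing a plain char with a negative value to isspace is undefined behaviour, so the sketch casts to unsigned char first. Also, in the "C" locale isspace accepts '\v' and '\f' in addition to the four characters tested in the patch, so the behaviour is not identical. The assert is only an illustrative precondition.

```cpp
#include <cassert>
#include <cctype>

// Skips leading whitespace and returns a pointer to the first
// non-whitespace character (or to the terminating '\0').
const char* TrimLeadingWhitespace(const char* str) {
  assert(str != nullptr);
  // The cast avoids undefined behaviour for negative char values.
  while (std::isspace(static_cast<unsigned char>(*str))) ++str;
  return str;
}
```
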

+}  // namespace base
diff --git a/src/base/string_parse.h b/src/base/string_parse.h
new file mode 100644
index 0..17569241c
--- /dev/null
+++ b/src/base/string_parse.h
@@ -0,0 +1,47 @@
+/*  -*- OpenSAF  -*-
+ *
+ * Copyright Ericsson AB 2018 - All Rights Reserved.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+ * 

Re: [devel] [PATCH 1/1] mds: improve thread safety in mdstest [#2746]

2018-03-22 Thread Hans Nordebäck

Hi Hoa,

ack, I reviewed the V1 patch and it looks good, and I agree with Zoran's 
comments regarding using a mutex instead of an rwlock.

I'm waiting for the valgrind/helgrind results; so far it looks good. One 
question to Zoran: doesn't the performance of rwlock depend on e.g. the 
number of threads, access patterns, atomic operations etc.? Otherwise there 
would be no need to have read and write locks. Say you have a large number 
of threads that are only reading some variable(s), not atomic; with read 
locks it should be possible to run them in parallel, which is not the case 
with an ordinary mutex, which serializes the reads. Have you tested this, 
or where do these numbers come from?
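
To illustrate only the semantic difference discussed here (it says nothing about which lock is faster), a minimal single-threaded sketch: a pthread rwlock can be read-locked twice at the same time, while a second lock attempt on a default mutex that is already held is refused with EBUSY. The function names are hypothetical.

```cpp
#include <pthread.h>

#include <cerrno>

// Returns true if two read locks can be held on one rwlock at once.
bool TwoReadersCanHoldRwlock() {
  pthread_rwlock_t lock;
  pthread_rwlock_init(&lock, nullptr);
  int first = pthread_rwlock_tryrdlock(&lock);
  int second = pthread_rwlock_tryrdlock(&lock);
  // Release once per successful acquisition.
  if (second == 0) pthread_rwlock_unlock(&lock);
  if (first == 0) pthread_rwlock_unlock(&lock);
  pthread_rwlock_destroy(&lock);
  return first == 0 && second == 0;
}

// Returns true if a second lock attempt on a held mutex is refused.
bool SecondMutexLockIsRefused() {
  pthread_mutex_t mutex;
  pthread_mutex_init(&mutex, nullptr);
  pthread_mutex_lock(&mutex);
  int second = pthread_mutex_trylock(&mutex);  // expected: EBUSY
  if (second == 0) pthread_mutex_unlock(&mutex);
  pthread_mutex_unlock(&mutex);
  pthread_mutex_destroy(&mutex);
  return second == EBUSY;
}
```
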


/Regards HansN


On 03/22/2018 01:47 PM, Hoa Le wrote:


Hi,

I replaced "rwlock" with "mutex" as suggested by Zoran in version 2 of 
the patch, please help review it again.


Thank you.

--
Best regards,
Hoa Le
On 03/21/2018 07:43 PM, Zoran Milinkovic wrote:

Hi,

According to the patch (I haven't checked the code), I don't see the reason for 
using rwlock. Pthread mutex will even work better than rwlock in the patch.

Reasons for using mutex:
1. Mutex is much faster than rwlock in Linux, around 10 times faster, if I 
remember correctly
2. Here, we are talking about two threads, and not more.
3. rwlock has never been used in OpenSAF before.

BR,
Zoran

-Original Message-
From: Hoa Le [mailto:hoa...@dektech.com.au]
Sent: 21 March 2018 11:34
To: Anders Widell <anders.wid...@ericsson.com>; Hans Nordebäck 
<hans.nordeb...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] mds: improve thread safety in mdstest [#2746]

- Correct helgrind issues in mds/apitest
---
  src/mds/apitest/mdstest.c  |   7 +-
  src/mds/apitest/mdstipc.h  |   7 +-
  src/mds/apitest/mdstipc_api.c  | 196 +
  src/mds/apitest/mdstipc_conf.c |  89 +--
  4 files changed, 234 insertions(+), 65 deletions(-)

diff --git a/src/mds/apitest/mdstest.c b/src/mds/apitest/mdstest.c
index bf6e173..3280e5b 100644
--- a/src/mds/apitest/mdstest.c
+++ b/src/mds/apitest/mdstest.c
@@ -35,6 +35,7 @@
  //#include "mdstest.h"
  
  SaAisErrorT rc;

+pthread_rwlock_t gl_lock;
  
  int mds_startup(void)

  {
@@ -83,13 +84,17 @@ int main(int argc, char **argv)
if (suite == 999) {
return 0;
}
-
if (mds_startup() != 0) {
printf("Fail to start mds agents\n");
return 1;
}
  
+	pthread_rwlock_init(&gl_lock, NULL);

+
int rc = test_run(suite, tcase);
+
+   pthread_rwlock_destroy(&gl_lock);
+
mds_shutdown();
return rc;
  }
diff --git a/src/mds/apitest/mdstipc.h b/src/mds/apitest/mdstipc.h
index fbb6468..9e93a17 100644
--- a/src/mds/apitest/mdstipc.h
+++ b/src/mds/apitest/mdstipc.h
@@ -145,13 +145,12 @@ typedef struct tet_mds_recvd_msg_info {
  } TET_MDS_RECVD_MSG_INFO;
  
  /* GLOBAL variables /

+extern _Thread_local NCSMDS_INFO svc_to_mds_info;
+extern pthread_rwlock_t gl_lock;
+
  TET_ADEST gl_tet_adest;
  TET_VDEST
  gl_tet_vdest[4]; /*change it to 6 to run VDS Redundancy: 101 for Stress*/
-NCSADA_INFO ada_info;
-NCSVDA_INFO vda_info;
-NCSMDS_INFO svc_to_mds_info;
-TET_EVENT_INFO gl_event_data;
  TET_SVC gl_tet_svc;
  TET_MDS_RECVD_MSG_INFO gl_rcvdmsginfo, gl_direct_rcvmsginfo;
  int gl_vdest_indx;
diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index 5eb8bd9..3a98ecd 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -33,6 +33,28 @@ static MDS_CLIENT_MSG_FORMAT_VER gl_set_msg_fmt_ver;
  
  MDS_SVC_ID svc_ids[3] = {2006, 2007, 2008};
  
+pthread_mutex_t safe_printf_mutex = PTHREAD_MUTEX_INITIALIZER;

+_Thread_local NCSMDS_INFO svc_to_mds_info;
+
+void safe_printf(const char* format, ... ) {
+   pthread_mutex_lock(&safe_printf_mutex);
+   va_list args;
+   va_start(args, format);
+   vfprintf(stdout, format, args);
+   va_end(args);
+   pthread_mutex_unlock(&safe_printf_mutex);
+}
+int safe_fflush(FILE *stream) {
+   int rc = 0;
+   pthread_mutex_lock(&safe_printf_mutex);
+   rc = fflush(stream);
+   pthread_mutex_unlock(&safe_printf_mutex);
+   return rc;
+}
+
+#define printf safe_printf
+#define fflush safe_fflush
+
  
/*/
  /SERVICE API TEST CASES   
/
  
/*/
@@ -363,6 +385,7 @@ void tet_svc_install_tp_10()
  {
int FAIL = 0;
SaUint32T rc;
+   NCSCONTEXT t_handle = 0;
// Creating a MxN VDEST with id = 2000
rc = create_vdest(NCS_VDEST_TYPE_MxN, 2000);
if (rc != NCSCC_RC_SUCCESS) {
@@ -373,25 +396,25 @@ void tet_svc_install_tp_10()
printf(
"

Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]

2018-03-20 Thread Hans Nordebäck
Hi Minh,
Yes, that will be ok.

/Regards HansN

-Original Message-
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] 
Sent: 20 March 2018 12:53
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
<anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] base: Only close inherited fd(s) after fork() in child 
process [#2805]

Hi Hans,

So I guess it's ok to push V1 with comments from you and Anders.

Regards,

Minh


On 20/03/18 19:16, Hans Nordebäck wrote:
> Hi Minh,
>
> I think you can keep v1 but add the missing if stmt. The /proc/self/fd 
> directory should only contain open fd's and the current and parent directory.
> Later you can change strtoimax to the new string-to-integer utility, if needed, 
> (not likely it is needed though, I think).
> /Regards HansN
>
> -Original Message-
> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> Sent: 20 March 2018 08:29
> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
> <anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1/1] base: Only close inherited fd(s) after fork() 
> in child process [#2805]
>
> Hi Hans,
>
> I have just seen Anders created #2814 for the utility function, once
> #2814 is done, we can change the integer conversion in this patch by new 
> utility function, maybe for a numbers of atoXX (atoi, atol,..) calls in many 
> other places for a better error handling. How do you think?
>
> Thanks,
>
> Minh
>
>
> On 20/03/18 01:29, Hans Nordebäck wrote:
>> Hi Minh,
>>
>> I think this additional check, current and parent, directory is 
>> enough for V1 patch. The usage of the 2nd
>>
>> parameter of strtol in V2 patch can be put in a utility function for 
>> a broader use.
>>
>> /Regards HansN
>>
>>
>> On 03/19/2018 08:50 AM, Minh Hon Chau wrote:
>>> Hi Hans,
>>>
>>> Agree that the check of "." and ".." should be added in V1.
>>>
>>> This V2 I use the second parameter of strtol, it should ensure that 
>>> anything read from the fd directory is entirely digit, before close 
>>> the fd.
>>>
>>> There should not be any alphabet-based directories other than ".", 
>>> ".." and 0, 1, 2, ..etc, but the usage of 2nd parameter of strtol is 
>>> more generalized
>>>
>>> So I think the check of strcmp is not needed?
>>>
>>> Thanks,
>>> Minh
>>> On 19/03/18 18:00, Hans Nordebäck wrote:
>>>> Hi Minh,
>>>>
>>>> my comment was that this check could be added:
>>>>
>>>> if (strcmp(pentry->d_name, ".") == 0 || strcmp(pentry->d_name, 
>>>> "..") == 0)
>>>>
>>>>     continue;
>>>>
>>>> /Regards HansN
>>>>
>>>> On 03/16/2018 01:27 PM, Minh Hon Chau wrote:
>>>>> Hi Anders, Hans,
>>>>>
>>>>> When I tested the patch, I did see the "." and ".." returned from 
>>>>> readdir, but strtoimax also returns 0, so it probably won't be a 
>>>>> problem to close(0) more than once
>>>>>
>>>>> But to be more safety, it should check the @second_ptr too, I will 
>>>>> update and send out a V2.
>>>>>
>>>>> Thanks
>>>>> Minh
>>>>> On 16/03/18 23:00, Anders Widell wrote:
>>>>>> Hi!
>>>>>>
>>>>>> See my comments below, marked AndersW2>.
>>>>>>
>>>>>> regards,
>>>>>>
>>>>>> Anders Widell
>>>>>>
>>>>>>
>>>>>> On 03/16/2018 12:39 PM, Hans Nordebäck wrote:
>>>>>>> Hi Minh, ack with some comments, (on top of AndersW comments).
>>>>>>>
>>>>>>> /Thanks HansN
>>>>>>>
>>>>>>>
>>>>>>> On 03/15/2018 07:50 AM, Minh Chau wrote:
>>>>>>>> ---
>>>>>>>>    src/base/os_defs.c | 25 -
>>>>>>>>    1 file changed, 20 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/src/base/os_defs.c b/src/base/os_defs.c index
>>>>>>>> 6f9ec52..e914011 100644
>>>>>>>> --- a/src/base/os_defs.c
>&g

Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]

2018-03-20 Thread Hans Nordebäck
Hi Minh,

I think you can keep V1 but add the missing if statement. The /proc/self/fd 
directory should only contain open fds and the current and parent directory.
Later you can change strtoimax to the new string-to-integer utility, if needed 
(though I don't think that is likely).
/Regards HansN

-Original Message-
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] 
Sent: 20 March 2018 08:29
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
<anders.wid...@ericsson.com>; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] base: Only close inherited fd(s) after fork() in child 
process [#2805]

Hi Hans,

I have just seen that Anders created #2814 for the utility function. Once
#2814 is done, we can change the integer conversion in this patch to use the 
new utility function, and maybe do the same for a number of atoXX (atoi, 
atol, ...) calls in many other places for better error handling. What do you think?

Thanks,

Minh


On 20/03/18 01:29, Hans Nordebäck wrote:
> Hi Minh,
>
> I think this additional check, current and parent, directory is enough 
> for V1 patch. The usage of the 2nd
>
> parameter of strtol in V2 patch can be put in a utility function for a 
> broader use.
>
> /Regards HansN
>
>
> On 03/19/2018 08:50 AM, Minh Hon Chau wrote:
>> Hi Hans,
>>
>> Agree that the check of "." and ".." should be added in V1.
>>
>> This V2 I use the second parameter of strtol, it should ensure that 
>> anything read from the fd directory is entirely digit, before close 
>> the fd.
>>
>> There should not be any alphabet-based directories other than ".", 
>> ".." and 0, 1, 2, ..etc, but the usage of 2nd parameter of strtol is 
>> more generalized
>>
>> So I think the check of strcmp is not needed?
>>
>> Thanks,
>> Minh
>> On 19/03/18 18:00, Hans Nordebäck wrote:
>>> Hi Minh,
>>>
>>> my comment was that this check could be added:
>>>
>>> if (strcmp(pentry->d_name, ".") == 0 || strcmp(pentry->d_name, "..") 
>>> == 0)
>>>
>>>    continue;
>>>
>>> /Regards HansN
>>>
>>> On 03/16/2018 01:27 PM, Minh Hon Chau wrote:
>>>> Hi Anders, Hans,
>>>>
>>>> When I tested the patch, I did see the "." and ".." returned from 
>>>> readdir, but strtoimax also returns 0, so it probably won't be a 
>>>> problem to close(0) more than once
>>>>
>>>> But to be more safety, it should check the @second_ptr too, I will 
>>>> update and send out a V2.
>>>>
>>>> Thanks
>>>> Minh
>>>> On 16/03/18 23:00, Anders Widell wrote:
>>>>> Hi!
>>>>>
>>>>> See my comments below, marked AndersW2>.
>>>>>
>>>>> regards,
>>>>>
>>>>> Anders Widell
>>>>>
>>>>>
>>>>> On 03/16/2018 12:39 PM, Hans Nordebäck wrote:
>>>>>> Hi Minh, ack with some comments, (on top of AndersW comments).
>>>>>>
>>>>>> /Thanks HansN
>>>>>>
>>>>>>
>>>>>> On 03/15/2018 07:50 AM, Minh Chau wrote:
>>>>>>> ---
>>>>>>>   src/base/os_defs.c | 25 -
>>>>>>>   1 file changed, 20 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/src/base/os_defs.c b/src/base/os_defs.c index 
>>>>>>> 6f9ec52..e914011 100644
>>>>>>> --- a/src/base/os_defs.c
>>>>>>> +++ b/src/base/os_defs.c
>>>>>>> @@ -1052,14 +1052,29 @@ uint32_t 
>>>>>>> ncs_os_process_execute_timed(NCS_OS_PROC_EXECUTE_TIMED_INFO 
>>>>>>> *req)
>>>>>>>    * child */
>>>>>>>   if (getenv("OPENSAF_KEEP_FD_OPEN_AFTER_FORK") == NULL) 
>>>>>>> {
>>>>>>>   /* Close all inherited file descriptors */
>>>>>>> -    int i = sysconf(_SC_OPEN_MAX);
>>>>>>> -    if (i == -1) {
>>>>>>> +    int fd_max = sysconf(_SC_OPEN_MAX);
>>>>>>> +
>>>>>>> +    if (fd_max == -1) {
>>>>>>>   syslog(LOG_ERR, "%s: sysconf failed - %s",
>>>>>>> -   __FUNCTION__, strerror(errno));
>>>>>&g

Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]

2018-03-19 Thread Hans Nordebäck

Hi Minh,

I think this additional check for the current and parent directory is 
enough for the V1 patch. The usage of the 2nd parameter of strtol in the 
V2 patch can be put in a utility function for broader use.


/Regards HansN


On 03/19/2018 08:50 AM, Minh Hon Chau wrote:

Hi Hans,

Agree that the check of "." and ".." should be added in V1.

In this V2 I use the second parameter of strtol; it should ensure that 
anything read from the fd directory consists entirely of digits before 
closing the fd.

There should not be any alphabet-based directories other than ".", 
".." and 0, 1, 2, etc., but using the 2nd parameter of strtol is 
more general.

So I think the strcmp check is not needed?

Thanks,
Minh
On 19/03/18 18:00, Hans Nordebäck wrote:

Hi Minh,

my comment was that this check could be added:

if (strcmp(pentry->d_name, ".") == 0 || strcmp(pentry->d_name, "..") 
== 0)


   continue;

/Regards HansN

On 03/16/2018 01:27 PM, Minh Hon Chau wrote:

Hi Anders, Hans,

When I tested the patch, I did see "." and ".." returned from 
readdir, but strtoimax also returns 0 for them, so it probably won't be a 
problem to close(0) more than once.


To be safer, though, it should check the @second_ptr too; I will 
update and send out a V2.


Thanks
Minh
On 16/03/18 23:00, Anders Widell wrote:

Hi!

See my comments below, marked AndersW2>.

regards,

Anders Widell


On 03/16/2018 12:39 PM, Hans Nordebäck wrote:

Hi Minh, ack with some comments, (on top of AndersW comments).

/Thanks HansN


On 03/15/2018 07:50 AM, Minh Chau wrote:

---
  src/base/os_defs.c | 25 -
  1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/src/base/os_defs.c b/src/base/os_defs.c
index 6f9ec52..e914011 100644
--- a/src/base/os_defs.c
+++ b/src/base/os_defs.c
@@ -1052,14 +1052,29 @@ uint32_t 
ncs_os_process_execute_timed(NCS_OS_PROC_EXECUTE_TIMED_INFO *req)

   * child */
  if (getenv("OPENSAF_KEEP_FD_OPEN_AFTER_FORK") == NULL) {
  /* Close all inherited file descriptors */
-    int i = sysconf(_SC_OPEN_MAX);
-    if (i == -1) {
+    int fd_max = sysconf(_SC_OPEN_MAX);
+
+    if (fd_max == -1) {
  syslog(LOG_ERR, "%s: sysconf failed - %s",
-   __FUNCTION__, strerror(errno));
+    __FUNCTION__, strerror(errno));
  exit(EXIT_FAILURE);
  }
-    for (i--; i >= 0; --i)
-    (void)close(i); /* close all descriptors */
+    struct dirent *pentry = NULL;
+    DIR *dir = opendir("/proc/self/fd");
+
+    if (dir != NULL) {
+    while ((pentry = readdir(dir)) != NULL) {
[HansN] readdir will return not only 0, 1, 2 etc. but also the 
current directory '.' and the parent directory '..'; this should be handled here.

[HansN] perhaps we should use readdir_r instead?


AndersW2> I also thought about this, but it turns out that 
readdir_r is (or will become) obsolete. It is listed as obsolete on 
our wiki page:


https://sourceforge.net/p/opensaf/wiki/Unsafe%20and%20Obsolete%20Functions/ 



Maybe we should unlist readdir() from the non-thread-safe section?

Regarding . and .., I think we should check for parse errors, i.e. 
whether it was a valid number or not. The second parameter to strtoimax 
will (if not NULL) tell us how much of the string was parsed. 
It should point to a '\0' character if the whole string was parsed 
as a valid number. In addition, you need to check that the string 
was not empty to begin with.


I am thinking about adding a support function in base that can 
parse strings into numbers and handle parse errors in a convenient 
way. The strtoxxx functions are a bit tricky since you need to 
check the end pointer, and also errno for overflow/underflow errors.
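
The checks described above could look roughly like the sketch below. It is only an illustration: parse_fd_number is a hypothetical name, not an existing function in base, and the decision to reject negative values and values above INT_MAX is an assumption made because the result is meant to be used as a file descriptor.

```c
#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenSAF): parse a decimal string into a
 * non-negative int. Returns true only if the whole string is a valid number
 * in the range 0..INT_MAX. */
static bool parse_fd_number(const char *str, int *result)
{
	char *end = NULL;
	intmax_t value;

	if (str == NULL || *str == '\0')
		return false; /* empty string: nothing to parse */

	errno = 0;
	value = strtoimax(str, &end, 10);

	if (end == str || *end != '\0')
		return false; /* no digits at all, or trailing characters */
	if (errno == ERANGE)
		return false; /* overflow or underflow of intmax_t */
	if (value < 0 || value > INT_MAX)
		return false; /* cannot be a file descriptor number */

	*result = (int)value;
	return true;
}
```

With a helper like this, "." and ".." are rejected as parse errors instead of being silently converted to 0.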




+    int fd = strtoimax(pentry->d_name, NULL, 10);
+    if (fd > INT_MIN && fd < fd_max) (void)close(fd);

+    }
+    (void)closedir(dir);
+    } else {
+    /* fall back, close all possible descriptors */
+    syslog(LOG_ERR, "%s: opendir failed - %s",
+    __FUNCTION__, strerror(errno));
+    for (fd_max--; fd_max >= 0; --fd_max)
+    (void)close(fd_max);
+    }
    /* Redirect standard files to /dev/null */
  if (freopen("/dev/null", "r", stdin) == NULL)














--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]

2018-03-19 Thread Hans Nordebäck

Hi Minh,

my comment was that this check could be added:

if (strcmp(pentry->d_name, ".") == 0 || strcmp(pentry->d_name, "..") == 0)

   continue;
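
A minimal sketch of how that check could sit inside a readdir loop. The names is_dot_entry and count_entries are hypothetical and exist only for this illustration; they are not part of the patch.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true for the "." and ".." directory entries. */
static bool is_dot_entry(const char *name)
{
	return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}

/* Hypothetical example: count the entries of a directory, ignoring "."
 * and "..". Returns -1 if the directory cannot be opened. */
static int count_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *entry;
	int count = 0;

	if (dir == NULL)
		return -1;

	while ((entry = readdir(dir)) != NULL) {
		if (is_dot_entry(entry->d_name))
			continue; /* skip current and parent directory */
		++count;
	}
	closedir(dir);
	return count;
}
```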

/Regards HansN

On 03/16/2018 01:27 PM, Minh Hon Chau wrote:

Hi Anders, Hans,

When I tested the patch, I did see "." and ".." returned from 
readdir, but strtoimax also returns 0 for them, so it probably won't be 
a problem to close(0) more than once.


But to be safer, it should check the second parameter of strtoimax 
(the end pointer) too. I will update and send out a V2.


Thanks
Minh
On 16/03/18 23:00, Anders Widell wrote:

Hi!

See my comments below, marked AndersW2>.

regards,

Anders Widell


On 03/16/2018 12:39 PM, Hans Nordebäck wrote:

Hi Minh, ack with some comments, (on top of AndersW comments).

/Thanks HansN


On 03/15/2018 07:50 AM, Minh Chau wrote:

---
  src/base/os_defs.c | 25 -
  1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/src/base/os_defs.c b/src/base/os_defs.c
index 6f9ec52..e914011 100644
--- a/src/base/os_defs.c
+++ b/src/base/os_defs.c
@@ -1052,14 +1052,29 @@ uint32_t ncs_os_process_execute_timed(NCS_OS_PROC_EXECUTE_TIMED_INFO *req)

   * child */
  if (getenv("OPENSAF_KEEP_FD_OPEN_AFTER_FORK") == NULL) {
  /* Close all inherited file descriptors */
-    int i = sysconf(_SC_OPEN_MAX);
-    if (i == -1) {
+    int fd_max = sysconf(_SC_OPEN_MAX);
+
+    if (fd_max == -1) {
  syslog(LOG_ERR, "%s: sysconf failed - %s",
-   __FUNCTION__, strerror(errno));
+    __FUNCTION__, strerror(errno));
  exit(EXIT_FAILURE);
  }
-    for (i--; i >= 0; --i)
-    (void)close(i); /* close all descriptors */
+    struct dirent *pentry = NULL;
+    DIR *dir = opendir("/proc/self/fd");
+
+    if (dir != NULL) {
+    while ((pentry = readdir(dir)) != NULL) {
[HansN] readdir will return not only 0, 1, 2 etc. but also the 
current directory '.' and the parent directory '..'; this should be 
handled here.

[HansN] perhaps we should use readdir_r instead?


AndersW2> I also thought about this, but it turns out that readdir_r 
is (or will become) obsolete. It is listed as obsolete on our wiki page:


https://sourceforge.net/p/opensaf/wiki/Unsafe%20and%20Obsolete%20Functions/ 



Maybe we should unlist readdir() from the non-thread-safe section?

Regarding . and .., I think we should check for parse errors, i.e. 
whether it was a valid number or not. The second parameter to strtoimax 
will (if not NULL) tell us how much of the string was parsed. It 
should point to a '\0' character if the whole string was parsed as a 
valid number. In addition, you need to check that the string was not 
empty to begin with.


I am thinking about adding a support function in base that can parse 
strings into numbers and handle parse errors in a convenient way. The 
strtoxxx functions are a bit tricky since you need to check the end 
pointer, and also errno for overflow/underflow errors.




+    int fd = strtoimax(pentry->d_name, NULL, 10);
+    if (fd > INT_MIN && fd < fd_max) (void)close(fd);
+    }
+    (void)closedir(dir);
+    } else {
+    /* fall back, close all possible descriptors */
+    syslog(LOG_ERR, "%s: opendir failed - %s",
+    __FUNCTION__, strerror(errno));
+    for (fd_max--; fd_max >= 0; --fd_max)
+    (void)close(fd_max);
+    }
    /* Redirect standard files to /dev/null */
  if (freopen("/dev/null", "r", stdin) == NULL)











Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]

2018-03-16 Thread Hans Nordebäck

Hi,

Yes, I missed that readdir_r is marked deprecated in newer man pages.

It is not marked deprecated in the latest LSB or in The Open Group Base 
Specifications.


Unlisting readdir sounds ok then.

/Thanks HansN


On 03/16/2018 01:00 PM, Anders Widell wrote:

Hi!

See my comments below, marked AndersW2>.

regards,

Anders Widell


On 03/16/2018 12:39 PM, Hans Nordebäck wrote:

Hi Minh, ack with some comments, (on top of AndersW comments).

/Thanks HansN


On 03/15/2018 07:50 AM, Minh Chau wrote:

---
  src/base/os_defs.c | 25 -
  1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/src/base/os_defs.c b/src/base/os_defs.c
index 6f9ec52..e914011 100644
--- a/src/base/os_defs.c
+++ b/src/base/os_defs.c
@@ -1052,14 +1052,29 @@ uint32_t ncs_os_process_execute_timed(NCS_OS_PROC_EXECUTE_TIMED_INFO *req)

   * child */
  if (getenv("OPENSAF_KEEP_FD_OPEN_AFTER_FORK") == NULL) {
  /* Close all inherited file descriptors */
-    int i = sysconf(_SC_OPEN_MAX);
-    if (i == -1) {
+    int fd_max = sysconf(_SC_OPEN_MAX);
+
+    if (fd_max == -1) {
  syslog(LOG_ERR, "%s: sysconf failed - %s",
-   __FUNCTION__, strerror(errno));
+    __FUNCTION__, strerror(errno));
  exit(EXIT_FAILURE);
  }
-    for (i--; i >= 0; --i)
-    (void)close(i); /* close all descriptors */
+    struct dirent *pentry = NULL;
+    DIR *dir = opendir("/proc/self/fd");
+
+    if (dir != NULL) {
+    while ((pentry = readdir(dir)) != NULL) {
[HansN] readdir will return not only 0, 1, 2 etc. but also the 
current directory '.' and the parent directory '..'; this should be 
handled here.

[HansN] perhaps we should use readdir_r instead?


AndersW2> I also thought about this, but it turns out that readdir_r 
is (or will become) obsolete. It is listed as obsolete on our wiki page:


https://sourceforge.net/p/opensaf/wiki/Unsafe%20and%20Obsolete%20Functions/ 



Maybe we should unlist readdir() from the non-thread-safe section?

Regarding . and .., I think we should check for parse errors, i.e. 
whether it was a valid number or not. The second parameter to strtoimax 
will (if not NULL) tell us how much of the string was parsed. It 
should point to a '\0' character if the whole string was parsed as a 
valid number. In addition, you need to check that the string was not 
empty to begin with.


I am thinking about adding a support function in base that can parse 
strings into numbers and handle parse errors in a convenient way. The 
strtoxxx functions are a bit tricky since you need to check the end 
pointer, and also errno for overflow/underflow errors.




+    int fd = strtoimax(pentry->d_name, NULL, 10);
+    if (fd > INT_MIN && fd < fd_max) (void)close(fd);
+    }
+    (void)closedir(dir);
+    } else {
+    /* fall back, close all possible descriptors */
+    syslog(LOG_ERR, "%s: opendir failed - %s",
+    __FUNCTION__, strerror(errno));
+    for (fd_max--; fd_max >= 0; --fd_max)
+    (void)close(fd_max);
+    }
    /* Redirect standard files to /dev/null */
  if (freopen("/dev/null", "r", stdin) == NULL)









Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]

2018-03-16 Thread Hans Nordebäck

Hi Minh, ack with some comments, (on top of AndersW comments).

/Thanks HansN


On 03/15/2018 07:50 AM, Minh Chau wrote:

---
  src/base/os_defs.c | 25 -
  1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/src/base/os_defs.c b/src/base/os_defs.c
index 6f9ec52..e914011 100644
--- a/src/base/os_defs.c
+++ b/src/base/os_defs.c
@@ -1052,14 +1052,29 @@ uint32_t ncs_os_process_execute_timed(NCS_OS_PROC_EXECUTE_TIMED_INFO *req)
 * child */
if (getenv("OPENSAF_KEEP_FD_OPEN_AFTER_FORK") == NULL) {
/* Close all inherited file descriptors */
-   int i = sysconf(_SC_OPEN_MAX);
-   if (i == -1) {
+   int fd_max = sysconf(_SC_OPEN_MAX);
+
+   if (fd_max == -1) {
syslog(LOG_ERR, "%s: sysconf failed - %s",
-  __FUNCTION__, strerror(errno));
+   __FUNCTION__, strerror(errno));
exit(EXIT_FAILURE);
}
-   for (i--; i >= 0; --i)
-   (void)close(i); /* close all descriptors */
+   struct dirent *pentry = NULL;
+   DIR *dir = opendir("/proc/self/fd");
+
+   if (dir != NULL) {
+   while ((pentry = readdir(dir)) != NULL) {
[HansN] readdir will return not only 0, 1, 2 etc. but also the current 
directory '.' and the parent directory '..'; this should be handled here.

[HansN] perhaps we should use readdir_r instead?

+   int fd = strtoimax(pentry->d_name, NULL, 10);
+   if (fd > INT_MIN && fd < fd_max) (void)close(fd);
+   }
+   (void)closedir(dir);
+   } else {
+   /* fall back, close all possible descriptors */
+   syslog(LOG_ERR, "%s: opendir failed - %s",
+   __FUNCTION__, strerror(errno));
+   for (fd_max--; fd_max >= 0; --fd_max)
+   (void)close(fd_max);
+   }
  
/* Redirect standard files to /dev/null */
if (freopen("/dev/null", "r", stdin) == NULL)





Re: [devel] [PATCH 1/1] base: Only close inherited fd(s) after fork() in child process [#2805]

2018-03-16 Thread Hans Nordebäck

Hi Minh,

I'll review the patch shortly (today). /Thanks HansN


On 03/16/2018 06:17 AM, Minh Hon Chau wrote:

Thanks Anders for your comments.

Hi Hans, Ravi,

Are there any comments you would like to add? Otherwise I will update 
the patch with Anders' comments.


Thanks,

Minh


On 16/03/18 01:41, Anders Widell wrote:

Ack with minor comments, marked AndersW> below.

regards,

Anders Widell


On 03/15/2018 07:50 AM, Minh Chau wrote:

---
  src/base/os_defs.c | 25 -
  1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/src/base/os_defs.c b/src/base/os_defs.c
index 6f9ec52..e914011 100644
--- a/src/base/os_defs.c
+++ b/src/base/os_defs.c
@@ -1052,14 +1052,29 @@ uint32_t ncs_os_process_execute_timed(NCS_OS_PROC_EXECUTE_TIMED_INFO *req)

   * child */
  if (getenv("OPENSAF_KEEP_FD_OPEN_AFTER_FORK") == NULL) {
  /* Close all inherited file descriptors */
-    int i = sysconf(_SC_OPEN_MAX);
-    if (i == -1) {
+    int fd_max = sysconf(_SC_OPEN_MAX);


AndersW> sysconf() returns a long. Maybe fd_max should have the type 
long to match the return type of sysconf()?



+
+    if (fd_max == -1) {
  syslog(LOG_ERR, "%s: sysconf failed - %s",
-   __FUNCTION__, strerror(errno));
+    __FUNCTION__, strerror(errno));
  exit(EXIT_FAILURE);
  }
-    for (i--; i >= 0; --i)
-    (void)close(i); /* close all descriptors */
+    struct dirent *pentry = NULL;


AndersW> pentry is not a good name (avoid abbreviations, and separate 
words with underscores). Maybe rename it to dir_entry or just entry?



+    DIR *dir = opendir("/proc/self/fd");
+
+    if (dir != NULL) {
+    while ((pentry = readdir(dir)) != NULL) {
+    int fd = strtoimax(pentry->d_name, NULL, 10);


AndersW> strtoimax() is declared in the inttypes.h header file. Add 
an #include at the top of the file.
AndersW> strtoimax() returns an intmax_t. Change the type of fd to 
intmax_t.



+    if (fd > INT_MIN && fd < fd_max) (void)close(fd);


AndersW> File descriptors cannot be negative. Use fd >= 0 instead of 
fd > INT_MIN.

AndersW> Remove (void).


+    }
+    (void)closedir(dir);


AndersW> Remove (void).


+    } else {
+    /* fall back, close all possible descriptors */
+    syslog(LOG_ERR, "%s: opendir failed - %s",
+    __FUNCTION__, strerror(errno));
+    for (fd_max--; fd_max >= 0; --fd_max)
+    (void)close(fd_max);


AndersW> Remove (void).


+    }
    /* Redirect standard files to /dev/null */
  if (freopen("/dev/null", "r", stdin) == NULL)









