Re: [devel] [PATCH 1/9] mds: Add README for solution of TIPC buffer overflow at MDS [#1960]

2019-09-16 Thread Minh Hon Chau

Hi Vu,

Thanks for your time to review the patches, the question is interesting.

At this moment with normal load traffic, the resource towards the new 
standby (old active) is not released and will be reused if standby 
switches back to active. The reason is that mds won't start the "tx 
probation" again to confirm flow control support as mds has known it had 
enabled flow control on this port id. The messages towards the new 
active are sent on another port id thus they are running on a different 
flow control counter. The test of multiple switchover looks ok so far. 
However, the problem probably happens with overloaded traffic while a 
failover/switchover (I haven't tested this case). The pending messages 
under overload state to be sent to the old active won't be sent to the 
new active, I guess the mds user would get TIMEOUT and try again to send 
the message to the new active, which at least corresponds to legacy 
behavior. However, this could be looked at as an improvement as we have 
pending messages, we know the new active, we can send the pending 
messages to new active, but another question is that whether the 
existing users expect to receive these pending messages according to 
their current logics.


Regards,

Minh

On 16/9/19 5:34 pm, Nguyen Minh Vu wrote:

Hi Minh,

I have just finished my review to your MDS patches, and I have a 
question:


With 2N services, suppose the active is having TIPC overloaded issue;
it will do some memory allocations, and probably starting a timer 
there too.


Then, what happens if that active service is changed to the standby role?
Shall allocated memory/timer be freed up and is there any impact on 
the subsequent messages sent to the new active?


Regards, Vu

On 8/14/19 1:38 PM, Minh Chau wrote:

---
  src/mds/README | 221 
+

  1 file changed, 221 insertions(+)
  create mode 100644 src/mds/README

diff --git a/src/mds/README b/src/mds/README
new file mode 100644
index 000..1b94632
--- /dev/null
+++ b/src/mds/README
@@ -0,0 +1,221 @@
+/*  -*- OpenSAF  -*-
+ *
+ * (C) Copyright 2019 The OpenSAF Foundation
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of 
MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are 
licensed
+ * under the GNU Lesser General Public License Version 2.1, February 
1999.

+ * The complete license can be accessed from the following location:
+ * http://opensource.org/licenses/lgpl-license.php
+ * See the Copying file included with the OpenSAF distribution for full
+ * licensing terms.
+ *
+ * Author(s): Ericsson AB
+ *
+ */
+Background
+==
+If OpenSAF configures TIPC as transport, the MDS library today will use
+TIPC SOCK_RDM socket for message distribution in the cluster. The 
SOCK_RDM
+datagram socket possibly encounters buffer overflow at receiver ends 
which

+has been documented in tipc.io[1]. A temporary solution for this buffer
+overflow issue is that the socket buffer size can be increased to a 
larger
+number. However, if the cluster continues either scaling out or 
adding more

+components, the system will be under dimensioned, thus the TIPC buffer
+overflow can occur again.
+
+MDS's solution for TIPC buffer overflow
+===
+If MDS disables TIPC_DEST_DROPPABLE, TIPC will return the ancillary 
message
+when the original message is failed to deliver. By this event, if 
the message
+has been saved in queue, MDS at sender sides can search and 
retransmit this

+message to the receivers.
+Once the messages in the sender's queue has been delivered 
successfully, MDS

+needs to remove them. MDS introduces its internal ACK message as an
+acknowledgment from receivers so that the senders can remove the 
messages

+out of the queue.
+Also, as such situation of buffer overflow at receivers, the 
retransmission may
+not succeed or even become worse at receiver ends (the more 
retransmission,
+the more overflow to occur). MDS imitates the sliding window in 
TCP[2] to

+control the flow of data message towards the receivers.
+
+Legacy MDS data message, new (data + ACK) MDS message, and 
upgradability
+ 

+Below is the MDS legacy message format that has been used till 
OpenSAF 5.19.07

+
+oct 0  message length
+oct 1
+--
+oct 2  sequence number: incremented for every message sent out to 
all destined

+...   tipc portid.
+oct 5
+--
+oct 6  fragment number: a message with same sequence number can be 
fragmented,

+oct 7  identified by this fragment number.
+--
+oct 8  length check: cross check with message length(oct0,1), NOT USED.
+oct 9
+--
+oct 10 protocol version: (MDS_PROT:0xA0 | 

Re: [devel] [PATCH 1/9] mds: Add README for solution of TIPC buffer overflow at MDS [#1960]

2019-08-15 Thread Minh Hon Chau

Hi Hans,

I will update txprob -> "tx probation"

The kEnabled, it means for a state of a tipc portid only. There's 
another @is_fctrl_enabled, that's for the feature whether mds has flow 
control enabled/disabled.


Thanks

Minh

On 14/8/19 5:48 pm, Hans Nordebäck wrote:

Hi Minh,

ack, some minor comments below/Thanks Hans

On 2019-08-14 08:38, Minh Chau wrote:

---
   src/mds/README | 221 
+
   1 file changed, 221 insertions(+)
   create mode 100644 src/mds/README

diff --git a/src/mds/README b/src/mds/README
new file mode 100644
index 000..1b94632
--- /dev/null
+++ b/src/mds/README
@@ -0,0 +1,221 @@
+/*  -*- OpenSAF  -*-
+ *
+ * (C) Copyright 2019 The OpenSAF Foundation
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+ * under the GNU Lesser General Public License Version 2.1, February 1999.
+ * The complete license can be accessed from the following location:
+ * http://opensource.org/licenses/lgpl-license.php
+ * See the Copying file included with the OpenSAF distribution for full
+ * licensing terms.
+ *
+ * Author(s): Ericsson AB
+ *
+ */
+Background
+==
+If OpenSAF configures TIPC as transport, the MDS library today will use
+TIPC SOCK_RDM socket for message distribution in the cluster. The SOCK_RDM
+datagram socket possibly encounters buffer overflow at receiver ends which
+has been documented in tipc.io[1]. A temporary solution for this buffer
+overflow issue is that the socket buffer size can be increased to a larger
+number. However, if the cluster continues either scaling out or adding more
+components, the system will be under dimensioned, thus the TIPC buffer
+overflow can occur again.
+
+MDS's solution for TIPC buffer overflow
+===
+If MDS disables TIPC_DEST_DROPPABLE, TIPC will return the ancillary message
+when the original message is failed to deliver. By this event, if the message
+has been saved in queue, MDS at sender sides can search and retransmit this
+message to the receivers.
+Once the messages in the sender's queue has been delivered successfully, MDS
+needs to remove them. MDS introduces its internal ACK message as an
+acknowledgment from receivers so that the senders can remove the messages
+out of the queue.
+Also, as such situation of buffer overflow at receivers, the retransmission may
+not succeed or even become worse at receiver ends (the more retransmission,
+the more overflow to occur). MDS imitates the sliding window in TCP[2] to
+control the flow of data message towards the receivers.
+
+Legacy MDS data message, new (data + ACK) MDS message, and upgradability
+
+Below is the MDS legacy message format that has been used till OpenSAF 5.19.07
+
+oct 0  message length
+oct 1
+--
+oct 2  sequence number: incremented for every message sent out to all destined
+...   tipc portid.
+oct 5
+--
+oct 6  fragment number: a message with same sequence number can be fragmented,
+oct 7  identified by this fragment number.
+--
+oct 8  length check: cross check with message length(oct0,1), NOT USED.
+oct 9
+--
+oct 10 protocol version: (MDS_PROT:0xA0 | MDS_VERSION:0x08) = 0xA8, NOT USED
+--
+oct 11 mds length: length of mds header and mds data, starting from oct13
+oct 12
+--
+oct 13 mds header and data
+...
+--
+
+The current sequence number/fragment number are being used in MDS for all
+messages sent to all discovered tipc portid(s), meaning that every message is 
sent
+to any tipc portid, the sequence/fragment number is increased. The flow control
+needs its own sequence number sliding between two tipc porid(s) so that 
receivers
+can detect message drop due to buffer overload. Therefore, the oct8 and oct9 
are
+now reused as flow control sequence number. The oct10, protocol version, has 
new
+value of 0xB8. The format of new data message as below:
+
+oct 0  same
+...
+oct 7
+--
+oct 8  flow control sequence number
+oct 9
+--
+oct 10 protocol version: (MDS_PROT_TIPC_FCTRL:0xB0 | MDS_VERSION:0x08) = 0xB8
+--
+oct 11 same
+...
+--
+
+The ACK message is introduced to acknowledge one data message or a chunk of
+accumulative data message. The ACK message format:
+
+oct 0  message length
+oct 1
+--
+oct 2  8 bytes, NOT USED
+
+oct 9

Re: [devel] [PATCH 1/9] mds: Add README for solution of TIPC buffer overflow at MDS [#1960]

2019-08-14 Thread Hans Nordebäck
Hi Minh,

ack, some minor comments below/Thanks Hans

On 2019-08-14 08:38, Minh Chau wrote:
> ---
>   src/mds/README | 221 
> +
>   1 file changed, 221 insertions(+)
>   create mode 100644 src/mds/README
>
> diff --git a/src/mds/README b/src/mds/README
> new file mode 100644
> index 000..1b94632
> --- /dev/null
> +++ b/src/mds/README
> @@ -0,0 +1,221 @@
> +/*  -*- OpenSAF  -*-
> + *
> + * (C) Copyright 2019 The OpenSAF Foundation
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
> + * under the GNU Lesser General Public License Version 2.1, February 1999.
> + * The complete license can be accessed from the following location:
> + * http://opensource.org/licenses/lgpl-license.php
> + * See the Copying file included with the OpenSAF distribution for full
> + * licensing terms.
> + *
> + * Author(s): Ericsson AB
> + *
> + */
> +Background
> +==
> +If OpenSAF configures TIPC as transport, the MDS library today will use
> +TIPC SOCK_RDM socket for message distribution in the cluster. The SOCK_RDM
> +datagram socket possibly encounters buffer overflow at receiver ends which
> +has been documented in tipc.io[1]. A temporary solution for this buffer
> +overflow issue is that the socket buffer size can be increased to a larger
> +number. However, if the cluster continues either scaling out or adding more
> +components, the system will be under dimensioned, thus the TIPC buffer
> +overflow can occur again.
> +
> +MDS's solution for TIPC buffer overflow
> +===
> +If MDS disables TIPC_DEST_DROPPABLE, TIPC will return the ancillary message
> +when the original message is failed to deliver. By this event, if the message
> +has been saved in queue, MDS at sender sides can search and retransmit this
> +message to the receivers.
> +Once the messages in the sender's queue has been delivered successfully, MDS
> +needs to remove them. MDS introduces its internal ACK message as an
> +acknowledgment from receivers so that the senders can remove the messages
> +out of the queue.
> +Also, as such situation of buffer overflow at receivers, the retransmission 
> may
> +not succeed or even become worse at receiver ends (the more retransmission,
> +the more overflow to occur). MDS imitates the sliding window in TCP[2] to
> +control the flow of data message towards the receivers.
> +
> +Legacy MDS data message, new (data + ACK) MDS message, and upgradability
> +
> +Below is the MDS legacy message format that has been used till OpenSAF 
> 5.19.07
> +
> +oct 0  message length
> +oct 1
> +--
> +oct 2  sequence number: incremented for every message sent out to all 
> destined
> +...   tipc portid.
> +oct 5
> +--
> +oct 6  fragment number: a message with same sequence number can be 
> fragmented,
> +oct 7  identified by this fragment number.
> +--
> +oct 8  length check: cross check with message length(oct0,1), NOT USED.
> +oct 9
> +--
> +oct 10 protocol version: (MDS_PROT:0xA0 | MDS_VERSION:0x08) = 0xA8, NOT USED
> +--
> +oct 11 mds length: length of mds header and mds data, starting from oct13
> +oct 12
> +--
> +oct 13 mds header and data
> +...
> +--
> +
> +The current sequence number/fragment number are being used in MDS for all
> +messages sent to all discovered tipc portid(s), meaning that every message 
> is sent
> +to any tipc portid, the sequence/fragment number is increased. The flow 
> control
> +needs its own sequence number sliding between two tipc porid(s) so that 
> receivers
> +can detect message drop due to buffer overload. Therefore, the oct8 and oct9 
> are
> +now reused as flow control sequence number. The oct10, protocol version, has 
> new
> +value of 0xB8. The format of new data message as below:
> +
> +oct 0  same
> +...
> +oct 7
> +--
> +oct 8  flow control sequence number
> +oct 9
> +--
> +oct 10 protocol version: (MDS_PROT_TIPC_FCTRL:0xB0 | MDS_VERSION:0x08) = 0xB8
> +--
> +oct 11 same
> +...
> +--
> +
> +The ACK message is introduced to acknowledge one data message or a chunk of
> +accumulative data message. The ACK message format:
> +
> +oct 0  message length
> +oct 1
> +--
> +oct 2  8 bytes, NOT USED
> +
> +oct 9
> +--
> +oct 10