Re: [devel] [PATCH 1/9] mds: Add README for solution of TIPC buffer overflow at MDS [#1960]
Hi Vu, Thanks for your time to review the patches, the question is interesting. At this moment with normal load traffic, the resource towards the new standby (old active) is not released and will be reused if standby switches back to active. The reason is that mds won't start the "tx probation" again to confirm flow control support as mds has known it had enabled flow control on this port id. The messages towards the new active are sent on another port id thus they are running on a different flow control counter. The test of multiple switchover looks ok so far. However, the problem probably happens with overloaded traffic while a failover/switchover (I haven't tested this case). The pending messages under overload state to be sent to the old active won't be sent to the new active, I guess the mds user would get TIMEOUT and try again to send the message to the new active, which at least corresponds to legacy behavior. However, this could be looked at as an improvement as we have pending messages, we know the new active, we can send the pending messages to new active, but another question is that whether the existing users expect to receive these pending messages according to their current logics. Regards, Minh On 16/9/19 5:34 pm, Nguyen Minh Vu wrote: Hi Minh, I have just finished my review to your MDS patches, and I have a question: With 2N services, suppose the active is having TIPC overloaded issue; it will do some memory allocations, and probably starting a timer there too. Then, what happens if that active service is changed to the standby role? Shall allocated memory/timer be freed up and is there any impact on the subsequent messages sent to the new active? Regards, Vu On 8/14/19 1:38 PM, Minh Chau wrote: --- src/mds/README | 221 + 1 file changed, 221 insertions(+) create mode 100644 src/mds/README diff --git a/src/mds/README b/src/mds/README new file mode 100644 index 000..1b94632 --- /dev/null +++ b/src/mds/README @@ -0,0 +1,221 @@ +/* -*- OpenSAF -*- + * + * (C) Copyright 2019 The OpenSAF Foundation + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed + * under the GNU Lesser General Public License Version 2.1, February 1999. + * The complete license can be accessed from the following location: + * http://opensource.org/licenses/lgpl-license.php + * See the Copying file included with the OpenSAF distribution for full + * licensing terms. + * + * Author(s): Ericsson AB + * + */ +Background +== +If OpenSAF configures TIPC as transport, the MDS library today will use +TIPC SOCK_RDM socket for message distribution in the cluster. The SOCK_RDM +datagram socket possibly encounters buffer overflow at receiver ends which +has been documented in tipc.io[1]. A temporary solution for this buffer +overflow issue is that the socket buffer size can be increased to a larger +number. However, if the cluster continues either scaling out or adding more +components, the system will be under dimensioned, thus the TIPC buffer +overflow can occur again. + +MDS's solution for TIPC buffer overflow +=== +If MDS disables TIPC_DEST_DROPPABLE, TIPC will return the ancillary message +when the original message is failed to deliver. By this event, if the message +has been saved in queue, MDS at sender sides can search and retransmit this +message to the receivers. +Once the messages in the sender's queue has been delivered successfully, MDS +needs to remove them. MDS introduces its internal ACK message as an +acknowledgment from receivers so that the senders can remove the messages +out of the queue. +Also, as such situation of buffer overflow at receivers, the retransmission may +not succeed or even become worse at receiver ends (the more retransmission, +the more overflow to occur). MDS imitates the sliding window in TCP[2] to +control the flow of data message towards the receivers. + +Legacy MDS data message, new (data + ACK) MDS message, and upgradability + +Below is the MDS legacy message format that has been used till OpenSAF 5.19.07 + +oct 0 message length +oct 1 +-- +oct 2 sequence number: incremented for every message sent out to all destined +... tipc portid. +oct 5 +-- +oct 6 fragment number: a message with same sequence number can be fragmented, +oct 7 identified by this fragment number. +-- +oct 8 length check: cross check with message length(oct0,1), NOT USED. +oct 9 +-- +oct 10 protocol version: (MDS_PROT:0xA0 | MDS_V
Re: [devel] [PATCH 1/9] mds: Add README for solution of TIPC buffer overflow at MDS [#1960]
Hi Minh, I have just finished my review to your MDS patches, and I have a question: With 2N services, suppose the active is having TIPC overloaded issue; it will do some memory allocations, and probably starting a timer there too. Then, what happens if that active service is changed to the standby role? Shall allocated memory/timer be freed up and is there any impact on the subsequent messages sent to the new active? Regards, Vu On 8/14/19 1:38 PM, Minh Chau wrote: --- src/mds/README | 221 + 1 file changed, 221 insertions(+) create mode 100644 src/mds/README diff --git a/src/mds/README b/src/mds/README new file mode 100644 index 000..1b94632 --- /dev/null +++ b/src/mds/README @@ -0,0 +1,221 @@ +/* -*- OpenSAF -*- + * + * (C) Copyright 2019 The OpenSAF Foundation + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed + * under the GNU Lesser General Public License Version 2.1, February 1999. + * The complete license can be accessed from the following location: + * http://opensource.org/licenses/lgpl-license.php + * See the Copying file included with the OpenSAF distribution for full + * licensing terms. + * + * Author(s): Ericsson AB + * + */ +Background +== +If OpenSAF configures TIPC as transport, the MDS library today will use +TIPC SOCK_RDM socket for message distribution in the cluster. The SOCK_RDM +datagram socket possibly encounters buffer overflow at receiver ends which +has been documented in tipc.io[1]. A temporary solution for this buffer +overflow issue is that the socket buffer size can be increased to a larger +number. However, if the cluster continues either scaling out or adding more +components, the system will be under dimensioned, thus the TIPC buffer +overflow can occur again. + +MDS's solution for TIPC buffer overflow +=== +If MDS disables TIPC_DEST_DROPPABLE, TIPC will return the ancillary message +when the original message is failed to deliver. By this event, if the message +has been saved in queue, MDS at sender sides can search and retransmit this +message to the receivers. +Once the messages in the sender's queue has been delivered successfully, MDS +needs to remove them. MDS introduces its internal ACK message as an +acknowledgment from receivers so that the senders can remove the messages +out of the queue. +Also, as such situation of buffer overflow at receivers, the retransmission may +not succeed or even become worse at receiver ends (the more retransmission, +the more overflow to occur). MDS imitates the sliding window in TCP[2] to +control the flow of data message towards the receivers. + +Legacy MDS data message, new (data + ACK) MDS message, and upgradability + +Below is the MDS legacy message format that has been used till OpenSAF 5.19.07 + +oct 0 message length +oct 1 +-- +oct 2 sequence number: incremented for every message sent out to all destined +... tipc portid. +oct 5 +-- +oct 6 fragment number: a message with same sequence number can be fragmented, +oct 7 identified by this fragment number. +-- +oct 8 length check: cross check with message length(oct0,1), NOT USED. +oct 9 +-- +oct 10 protocol version: (MDS_PROT:0xA0 | MDS_VERSION:0x08) = 0xA8, NOT USED +-- +oct 11 mds length: length of mds header and mds data, starting from oct13 +oct 12 +-- +oct 13 mds header and data +... +-- + +The current sequence number/fragment number are being used in MDS for all +messages sent to all discovered tipc portid(s), meaning that every message is sent +to any tipc portid, the sequence/fragment number is increased. The flow control +needs its own sequence number sliding between two tipc porid(s) so that receivers +can detect message drop due to buffer overload. Therefore, the oct8 and oct9 are +now reused as flow control sequence number. The oct10, protocol version, has new +value of 0xB8. The format of new data message as below: + +oct 0 same +... +oct 7 +-- +oct 8 flow control sequence number +oct 9 +-- +oct 10 protocol version: (MDS_PROT_TIPC_FCTRL:0xB0 | MDS_VERSION:0x08) = 0xB8 +-- +oct 11 same +... +-- + +The ACK message is introduced to acknowledge one data message or a chunk of +accumulative data message. The ACK message format: + +oct 0 message length +
Re: [devel] [PATCH 1/9] mds: Add README for solution of TIPC buffer overflow at MDS [#1960]
Hi Hans, I will update txprob -> "tx probation" The kEnabled, it means for a state of a tipc portid only. There's another @is_fctrl_enabled, that's for the feature whether mds has flow control enabled/disabled. Thanks Minh On 14/8/19 5:48 pm, Hans Nordebäck wrote: Hi Minh, ack, some minor comments below/Thanks Hans On 2019-08-14 08:38, Minh Chau wrote: --- src/mds/README | 221 + 1 file changed, 221 insertions(+) create mode 100644 src/mds/README diff --git a/src/mds/README b/src/mds/README new file mode 100644 index 000..1b94632 --- /dev/null +++ b/src/mds/README @@ -0,0 +1,221 @@ +/* -*- OpenSAF -*- + * + * (C) Copyright 2019 The OpenSAF Foundation + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed + * under the GNU Lesser General Public License Version 2.1, February 1999. + * The complete license can be accessed from the following location: + * http://opensource.org/licenses/lgpl-license.php + * See the Copying file included with the OpenSAF distribution for full + * licensing terms. + * + * Author(s): Ericsson AB + * + */ +Background +== +If OpenSAF configures TIPC as transport, the MDS library today will use +TIPC SOCK_RDM socket for message distribution in the cluster. The SOCK_RDM +datagram socket possibly encounters buffer overflow at receiver ends which +has been documented in tipc.io[1]. A temporary solution for this buffer +overflow issue is that the socket buffer size can be increased to a larger +number. However, if the cluster continues either scaling out or adding more +components, the system will be under dimensioned, thus the TIPC buffer +overflow can occur again. + +MDS's solution for TIPC buffer overflow +=== +If MDS disables TIPC_DEST_DROPPABLE, TIPC will return the ancillary message +when the original message is failed to deliver. By this event, if the message +has been saved in queue, MDS at sender sides can search and retransmit this +message to the receivers. +Once the messages in the sender's queue has been delivered successfully, MDS +needs to remove them. MDS introduces its internal ACK message as an +acknowledgment from receivers so that the senders can remove the messages +out of the queue. +Also, as such situation of buffer overflow at receivers, the retransmission may +not succeed or even become worse at receiver ends (the more retransmission, +the more overflow to occur). MDS imitates the sliding window in TCP[2] to +control the flow of data message towards the receivers. + +Legacy MDS data message, new (data + ACK) MDS message, and upgradability + +Below is the MDS legacy message format that has been used till OpenSAF 5.19.07 + +oct 0 message length +oct 1 +-- +oct 2 sequence number: incremented for every message sent out to all destined +... tipc portid. +oct 5 +-- +oct 6 fragment number: a message with same sequence number can be fragmented, +oct 7 identified by this fragment number. +-- +oct 8 length check: cross check with message length(oct0,1), NOT USED. +oct 9 +-- +oct 10 protocol version: (MDS_PROT:0xA0 | MDS_VERSION:0x08) = 0xA8, NOT USED +-- +oct 11 mds length: length of mds header and mds data, starting from oct13 +oct 12 +-- +oct 13 mds header and data +... +-- + +The current sequence number/fragment number are being used in MDS for all +messages sent to all discovered tipc portid(s), meaning that every message is sent +to any tipc portid, the sequence/fragment number is increased. The flow control +needs its own sequence number sliding between two tipc porid(s) so that receivers +can detect message drop due to buffer overload. Therefore, the oct8 and oct9 are +now reused as flow control sequence number. The oct10, protocol version, has new +value of 0xB8. The format of new data message as below: + +oct 0 same +... +oct 7 +-- +oct 8 flow control sequence number +oct 9 +-- +oct 10 protocol version: (MDS_PROT_TIPC_FCTRL:0xB0 | MDS_VERSION:0x08) = 0xB8 +-- +oct 11 same +... +-- + +The ACK message is introduced to acknowledge one data message or a chunk of +accumulative data message. The ACK message format: + +oct 0 message length +oct 1 +-- +oct 2 8 bytes, NOT USED + +oct 9 +---
Re: [devel] [PATCH 1/9] mds: Add README for solution of TIPC buffer overflow at MDS [#1960]
Hi Minh, ack, some minor comments below/Thanks Hans On 2019-08-14 08:38, Minh Chau wrote: > --- > src/mds/README | 221 > + > 1 file changed, 221 insertions(+) > create mode 100644 src/mds/README > > diff --git a/src/mds/README b/src/mds/README > new file mode 100644 > index 000..1b94632 > --- /dev/null > +++ b/src/mds/README > @@ -0,0 +1,221 @@ > +/* -*- OpenSAF -*- > + * > + * (C) Copyright 2019 The OpenSAF Foundation > + * > + * This program is distributed in the hope that it will be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY > + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed > + * under the GNU Lesser General Public License Version 2.1, February 1999. > + * The complete license can be accessed from the following location: > + * http://opensource.org/licenses/lgpl-license.php > + * See the Copying file included with the OpenSAF distribution for full > + * licensing terms. > + * > + * Author(s): Ericsson AB > + * > + */ > +Background > +== > +If OpenSAF configures TIPC as transport, the MDS library today will use > +TIPC SOCK_RDM socket for message distribution in the cluster. The SOCK_RDM > +datagram socket possibly encounters buffer overflow at receiver ends which > +has been documented in tipc.io[1]. A temporary solution for this buffer > +overflow issue is that the socket buffer size can be increased to a larger > +number. However, if the cluster continues either scaling out or adding more > +components, the system will be under dimensioned, thus the TIPC buffer > +overflow can occur again. > + > +MDS's solution for TIPC buffer overflow > +=== > +If MDS disables TIPC_DEST_DROPPABLE, TIPC will return the ancillary message > +when the original message is failed to deliver. By this event, if the message > +has been saved in queue, MDS at sender sides can search and retransmit this > +message to the receivers. > +Once the messages in the sender's queue has been delivered successfully, MDS > +needs to remove them. MDS introduces its internal ACK message as an > +acknowledgment from receivers so that the senders can remove the messages > +out of the queue. > +Also, as such situation of buffer overflow at receivers, the retransmission > may > +not succeed or even become worse at receiver ends (the more retransmission, > +the more overflow to occur). MDS imitates the sliding window in TCP[2] to > +control the flow of data message towards the receivers. > + > +Legacy MDS data message, new (data + ACK) MDS message, and upgradability > + > +Below is the MDS legacy message format that has been used till OpenSAF > 5.19.07 > + > +oct 0 message length > +oct 1 > +-- > +oct 2 sequence number: incremented for every message sent out to all > destined > +... tipc portid. > +oct 5 > +-- > +oct 6 fragment number: a message with same sequence number can be > fragmented, > +oct 7 identified by this fragment number. > +-- > +oct 8 length check: cross check with message length(oct0,1), NOT USED. > +oct 9 > +-- > +oct 10 protocol version: (MDS_PROT:0xA0 | MDS_VERSION:0x08) = 0xA8, NOT USED > +-- > +oct 11 mds length: length of mds header and mds data, starting from oct13 > +oct 12 > +-- > +oct 13 mds header and data > +... > +-- > + > +The current sequence number/fragment number are being used in MDS for all > +messages sent to all discovered tipc portid(s), meaning that every message > is sent > +to any tipc portid, the sequence/fragment number is increased. The flow > control > +needs its own sequence number sliding between two tipc porid(s) so that > receivers > +can detect message drop due to buffer overload. Therefore, the oct8 and oct9 > are > +now reused as flow control sequence number. The oct10, protocol version, has > new > +value of 0xB8. The format of new data message as below: > + > +oct 0 same > +... > +oct 7 > +-- > +oct 8 flow control sequence number > +oct 9 > +-- > +oct 10 protocol version: (MDS_PROT_TIPC_FCTRL:0xB0 | MDS_VERSION:0x08) = 0xB8 > +-- > +oct 11 same > +... > +-- > + > +The ACK message is introduced to acknowledge one data message or a chunk of > +accumulative data message. The ACK message format: > + > +oct 0 message length > +oct 1 > +-- > +oct 2 8 bytes, NOT USED > + > +oct 9 > +-- > +oct 10 p