But after all retries are still failed, we might need to terminate the portid, which leads to a MDS DOWN event, but let's look at it later.

On 27/11/19 3:23 pm, Minh Hon Chau wrote:
Hi Thuan,

I'm thinking to retry 3 times with 100 ms in between, but you can decide it. Also, we need to ensure not to make the mds main receiving thread being blocked with the retry (on the flow of processing data). The retry in this patch is ok since it retries on the mds flow control thread, so it does not delay the mds main receiving thread.

Thanks

Minh

On 27/11/19 2:40 pm, Tran Thuan wrote:
Hi Minh,

I think it's good if retry some times for normal Send().
Do you have any idea how many retries? Interval b/w tries?

Best Regards,
ThuanTr

-----Original Message-----
From: Minh Hon Chau <minh.c...@dektech.com.au>
Sent: Wednesday, November 27, 2019 10:30 AM
To: thuan.tran <thuan.t...@dektech.com.au>; thang . d . nguyen <thang.d.ngu...@dektech.com.au>; 'Nguyen Minh Vu' <vu.m.ngu...@dektech.com.au>; gary....@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: Fix mds flow control keep all messages in queue [#3123]

Hi Thuan,

The TipcPortId:Send is also called at a few other places, do you think
it is good if we make a wrapper of TipcPortId::Send with a few retries
on failures, says TipcPortId::TryToSend(), and call TryToSend() instead
of Send()?

Thanks

Minh

On 27/11/19 1:26 pm, thuan.tran wrote:
When overflow happens, mds with flow control enabled may keep
all messages in queue if it fails to send a message when receiving
Nack or ChunkAck since no more trigger come after that.
MDS flow control should retry to send message in this scenario.
---
   src/mds/mds_tipc_fctrl_portid.cc | 16 ++++++++++++----
   1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 724eb7b7b..e6e179669 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -17,6 +17,7 @@
      #include "mds/mds_tipc_fctrl_portid.h"
   #include "base/ncssysf_def.h"
+#include "base/osaf_time.h"
      #include "mds/mds_dt.h"
   #include "mds/mds_log.h"
@@ -440,13 +441,14 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, uint16_t chksize) {
       // try to send a few pending msg
       DataMessage* msg = nullptr;
       uint16_t send_msg_cnt = 0;
-    while (send_msg_cnt++ < chunk_size_) {
+    while (send_msg_cnt < chunk_size_) {
         // find the lowest sequence unsent yet
         msg = sndqueue_.FirstUnsent();
         if (msg == nullptr) {
           break;
         } else {
             if (Send(msg->msg_data_, msg->header_.msg_len_) == NCSCC_RC_SUCCESS) {
+            send_msg_cnt++;
               msg->is_sent_ = true;
               m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
                   "SndQData[fseq:%u, len:%u], "
@@ -455,7 +457,10 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, uint16_t chksize) {
                   msg->header_.fseq_, msg->header_.msg_len_,
                   sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
             } else {
-            break;
+            // If not retry, all messages are kept in queue
+            // and no more trigger to send messages
+            osaf_nanosleep(&kTenMilliseconds);
+            continue;
             }
         }
       }
@@ -508,9 +513,12 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t mfrag,
     DataMessage* msg = sndqueue_.Find(Seq16(fseq));
     if (msg != nullptr) {
       // Resend the msg found
-    if (Send(msg->msg_data_, msg->header_.msg_len_) == NCSCC_RC_SUCCESS) {
-      msg->is_sent_ = true;
+    while (Send(msg->msg_data_, msg->header_.msg_len_) != NCSCC_RC_SUCCESS) {
+      // If not retry, all messages are kept in queue
+      // and no more trigger to send messages
+      osaf_nanosleep(&kTenMilliseconds);
       }
+    msg->is_sent_ = true;
       m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
           "RsndData[mseq:%u, mfrag:%u, fseq:%u], "
           "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",



_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to