RE: Some question about DCTCP implementation in FreeBSD
(win - ((win * - dctcp_data->alpha) >> 11)) / mss, 2) * mss; - CCV(ccv, snd_cwnd) = CCV(ccv, snd_ssthresh); - ENTER_CONGRECOVERY(CCV(ccv, t_flags)); - } - dctcp_data->ece_curr = 1; - break; - case CC_RTO: - if (CCV(ccv, t_flags) & TF_ECN_PERMIT) { - CCV(ccv, t_flags) |= TF_ECN_SND_CWR; - dctcp_update_alpha(ccv); - dctcp_data->save_sndnxt += CCV(ccv, t_maxseg); - dctcp_data->num_cong_events++; + break; } - break; - } + } else + newreno_cc_algo.newreno_cong_signal(ccv, type); } static void @@ -303,7 +356,7 @@ dctcp_conn_init(struct cc_var *ccv) static void dctcp_post_recovery(struct cc_var *ccv) { - dctcp_cc_algo.post_recovery = newreno_cc_algo.post_recovery; + newreno_cc_algo.post_recovery(ccv); if (CCV(ccv, t_flags) & TF_ECN_PERMIT) dctcp_update_alpha(ccv); @@ -328,9 +381,12 @@ dctcp_ecnpkt_handler(struct cc_var *ccv) ccflag = ccv->flags; delay_ack = 1; /* -* DCTCP responses an ACK immediately when the CE state -* in between this segment and the last segment is not same. +* DCTCP responds with an ACK immediately when the CE state +* in between this segment and the last segment has changed. */ if (ccflag & CCF_IPHDR_CE) { if (!dctcp_data->ce_prev && (ccflag & CCF_DELACK)) @@ -350,8 +406,11 @@ dctcp_ecnpkt_handler(struct cc_var *ccv) if (delay_ack == 0) ccv->flags |= CCF_ACKNOW; - else - ccv->flags &= ~CCF_ACKNOW; +// else +// ccv->flags &= ~CCF_ACKNOW; //XXRMS: shouldn't we leave this alone here? } /* @@ -366,7 +425,8 @@ dctcp_update_alpha(struct cc_var *ccv) dctcp_data = ccv->cc_data; alpha_prev = dctcp_data->alpha; - dctcp_data->bytes_total = max(dctcp_data->bytes_total, 1); + if (dctcp_data->bytes_total < 1) + dctcp_data->bytes_total = 1; /* * Update alpha: alpha = (1 - g) * alpha + g * F. @@ -379,8 +439,8 @@ dctcp_update_alpha(struct cc_var *ccv) * updated every RTT * Alpha must be round to 0 - MAX_ALPHA_VALUE. */ - dctcp_data->alpha = min(alpha_prev - (alpha_prev >> V_dctcp_shift_g) + - (dctcp_data->bytes_ecn << (10 - V_dctcp_shift_g)) / + dctcp_data->alpha = ulmin(alpha_prev - (alpha_prev >> V_dctcp_shift_g) + + ((uint64_t)dctcp_data->bytes_ecn << (DCTCP_SHIFT - V_dctcp_shift_g)) / dctcp_data->bytes_total, MAX_ALPHA_VALUE); /* Initialize internal parameters for next alpha calculation */ @@ -398,14 +458,10 @@ dctcp_alpha_handler(SYSCTL_HANDLER_ARGS) new = V_dctcp_alpha; error = sysctl_handle_int(oidp, , 0, req); if (error == 0 && req->newptr != NULL) { - if (new > 1) + if (new > 1) //XXRMS: this only effectively allows dctcp_alpha to be set to 0 or 1? error = EINVAL; - else { - if (new > MAX_ALPHA_VALUE) - V_dctcp_alpha = MAX_ALPHA_VALUE; - else - V_dctcp_alpha = new; - } + else + V_dctcp_alpha = new; } return (error); @@ -420,10 +476,15 @@ dctcp_shift_g_handler(SYSCTL_HANDLER_ARGS) new = V_dctcp_shift_g; error = sysctl_handle_int(oidp, , 0, req); if (error == 0 && req->newptr != NULL) { - if (new > 1) + if (new < 0) //XXRMS: ditto error = EINVAL; - else - V_dctcp_shift_g = new; + else { + if (new > DCTCP_SHIFT) { + V_dctcp_shift_g = DCTCP_SHIFT; + error = EINVAL; + } else + V_dctcp_shift_g = new; + } } return (error); @@ -438,7 +499,7 @@ dctcp_slowstart_handler(SYSCTL_HANDLER_ARGS) new = V_dctcp_slowstart; error = sysctl_handle_int(oidp, , 0, req); if (error == 0 && req->newptr != NULL) { - if (new > 1) + if ((new > 1) || (new < 0)) error = EINVAL; else V_dctcp_slowstart = new; @@ -454,7 +515,7 @@ SYSCTL_NODE(_net_inet_tcp_cc, OID_AUTO, dctcp, CTLFLAG_RW, NULL, SYSCTL_PROC(_net_inet_tcp_cc_dctcp, OID_AUTO, alpha, CTLFLAG_VNET|CTLTYPE_UINT|CTLFLAG_RW, _NAME(dctcp_alpha
Re: Some question about DCTCP implementation in FreeBSD
Hi, glad to see interest in DCTCP! On 2019-6-4, at 11:05, Yu He via freebsd-net wrote: > In line 387 of file cc_tcp.c, the update of alpha is calculated by following > code: > > dctcp_data->alpha = min(alpha_prev - (alpha_prev >> V_dctcp_shift_g) + > (dctcp_data->bytes_ecn << (10 - V_dctcp_shift_g)) / > dctcp_data->bytes_total, MAX_ALPHA_VALUE); > > As the update formula from the original paper is alpha = (1 - g) * alpha + g > * F, I’m wandering about what the intention is of using left-shift when > calculating the g * F part, which might seemingly multiplying the value > rather than dividing it as suggested by the previous code. Let alone the > operand (10 - V_dctcp_shift_g) might be a negative value, which will lead > to an undefined behavior in C. You should really, really be looking at RFC8257 (https://tools.ietf.org/html/rfc8257) if you are interested in a DCTCP implementation, and not the original paper (which is of course still required reading for much of the background). The RFC fixes a bunch of issues with the paper, and this is one of them. Also, the t...@ietf.org list is likely going to be a better forum for these questions, except for FreeBSD-specific ones. Lars signature.asc Description: Message signed with OpenPGP
Some question about DCTCP implementation in FreeBSD
Greetings! I’m a graduate student from Carnegie Mellon University, and currently an intern in VMware. I’m now working on some internal project about implementing DCTCP algorithm and coming across the implementation in https://reviews.freebsd.org/rS277054. There is one thing I’m quite curious about. In line 387 of file cc_tcp.c, the update of alpha is calculated by following code: dctcp_data->alpha = min(alpha_prev - (alpha_prev >> V_dctcp_shift_g) + (dctcp_data->bytes_ecn << (10 - V_dctcp_shift_g)) / dctcp_data->bytes_total, MAX_ALPHA_VALUE); As the update formula from the original paper is alpha = (1 - g) * alpha + g * F, I’m wandering about what the intention is of using left-shift when calculating the g * F part, which might seemingly multiplying the value rather than dividing it as suggested by the previous code. Let alone the operand (10 - V_dctcp_shift_g) might be a negative value, which will lead to an undefined behavior in C. Because this is the very central formula of DCTCP algorithm, I’m feeling it necessary to have an explanation from the original authors. Or if in another way this is actually a bug, I’m willing to improve it. Best, Yu He Intern-Product Development-NSBU, VMware Master of Science, Information Networking, Carnegie Mellon University ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: DCTCP implementation
Hi Lawrence, Thank you so much for your response. I discuss KPI changes with Hiren over a month and understood your idea. The main point in your discussion thread was the KPI changes in cc_module. So, I describe about it as follows. As you mention, the changes in my implemtation is DCTCP specific ones. It makes sense not to add these two functions (ect_handler and ecnpkt_handler) into the main stream. At the same time, without these functions, it means that we cannot use DCTCP in FreeBSD. We have to find alternative ways to be putting into them. I think the compatibility with RFC3168 is the thing we should consider. The current KPI has not thought about it. There are two idea to achieve this: - append parameters to handle ECN in struct cc_var - change APIs to process ECN The former idea is easy and elegant. Which idea do you like when we implement DCTCP? And let me know if you have more better idea. I love to work on this to complete the DCTCP implementation. Hiren, if I miss something about our discussion, please provide additional explanation. Many thanks, -- Midori (9/2/14, 9:05 AM), Lawrence Stewart wrote: Hi Midori ( Lars), On 03/31/14 16:37, Midori Kato wrote: Hi FreeBSD developpers, I'm Midori Kato. I'm working on the DCTCP implementation in the FreeBSD with Lars Eggert. I mail you because I would like to ask you a code review and testing. The attached patch is not good enough to test our code. Please give me your message. I will send an ECN marking implmenetation in dummynet and test scripts personally to you. [snip] Firstly, please let me apologise for not providing follow up feedback and review after our initial discussion of your work last year. It was always my intention to come back to this but life has been particularly crazy the past 12 months and I have been unsuccessful in my attempts to keep a number of balls in the air. Hiren has been diligently working on shepherding your code into the FreeBSD source tree and has been prodding me for a review which I finally got around to yesterday. Unfortunately, my understanding is that non developers are unable to interact with the code review system being used, so we're moving the discussion back to the mailing list so that you and others can participate. You can read the review discussion history to date at [1]. Leaving editorial work on the documentation aside for the time being, I'd like to discuss the KPI changes and stack integration aspects of the patch to start with. Specifically: - The ect_handler KPI addition seems redundant to me and I believe the patch can be simplified by removing it entirely. - The ecnpkt_handler KPI addition takes arguments which are too specific to DCTCP, and is too indiscriminate in what it matches i.e. all packets for ECN enabled connections. I would argue we either add a fully generalist hook which would catch all packets of an ESTABLISHED connection, and have the dctcp module check if ECN is enabled or not before continuing processing (not my preferred option), or we extract the essence of what dctcp_ecnpkt_handler() needs to do into a less generic hook or bit of meta data passed in to an existing hook via struct cc_var (I need to study the code a bit more, but will likely strongly push for this option). - I don't believe the code correctly handles the case of dctcp being used on connections which did not successfully negotiate ECN. There is more to discuss but I think getting these high level technical things resolved first will help guide the remainder of the review discussion. Cheers, Lawrence [1] https://reviews.freebsd.org/D604 ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: DCTCP implementation
Hi Midori ( Lars), On 03/31/14 16:37, Midori Kato wrote: Hi FreeBSD developpers, I'm Midori Kato. I'm working on the DCTCP implementation in the FreeBSD with Lars Eggert. I mail you because I would like to ask you a code review and testing. The attached patch is not good enough to test our code. Please give me your message. I will send an ECN marking implmenetation in dummynet and test scripts personally to you. [snip] Firstly, please let me apologise for not providing follow up feedback and review after our initial discussion of your work last year. It was always my intention to come back to this but life has been particularly crazy the past 12 months and I have been unsuccessful in my attempts to keep a number of balls in the air. Hiren has been diligently working on shepherding your code into the FreeBSD source tree and has been prodding me for a review which I finally got around to yesterday. Unfortunately, my understanding is that non developers are unable to interact with the code review system being used, so we're moving the discussion back to the mailing list so that you and others can participate. You can read the review discussion history to date at [1]. Leaving editorial work on the documentation aside for the time being, I'd like to discuss the KPI changes and stack integration aspects of the patch to start with. Specifically: - The ect_handler KPI addition seems redundant to me and I believe the patch can be simplified by removing it entirely. - The ecnpkt_handler KPI addition takes arguments which are too specific to DCTCP, and is too indiscriminate in what it matches i.e. all packets for ECN enabled connections. I would argue we either add a fully generalist hook which would catch all packets of an ESTABLISHED connection, and have the dctcp module check if ECN is enabled or not before continuing processing (not my preferred option), or we extract the essence of what dctcp_ecnpkt_handler() needs to do into a less generic hook or bit of meta data passed in to an existing hook via struct cc_var (I need to study the code a bit more, but will likely strongly push for this option). - I don't believe the code correctly handles the case of dctcp being used on connections which did not successfully negotiate ECN. There is more to discuss but I think getting these high level technical things resolved first will help guide the remainder of the review discussion. Cheers, Lawrence [1] https://reviews.freebsd.org/D604 ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: DCTCP implementation
Hi Hiren, Yes, I intentionally replace the location of DELAY_ACK() macro. A DCTCP receiver sends an immediate ACK when incoming packets sets CE and non-CE bit by turns. To implement this processing, I prepare the cc_ecnpkt_handler() function. This function calls the DEKAY_ACK() macro to check if this packet is in delayed ack or not. This is why I replace the macro position. Let me know if my explanation is not good enough. Regards, -- Midori (4/3/14, 10:24 AM), hiren panchasara wrote: On Sun, Mar 30, 2014 at 10:37 PM, Midori Kato kat...@sfc.wide.ad.jp wrote: Hi FreeBSD developpers, I'm Midori Kato. I'm working on the DCTCP implementation in the FreeBSD with Lars Eggert. I mail you because I would like to ask you a code review and testing. The attached patch is not good enough to test our code. Please give me your message. I will send an ECN marking implmenetation in dummynet and test scripts personally to you. Midori, First thing I noticed in your dctcp.patch is that you are dropping r256920 which adds a new argument tlen to DELAY_ACK() macro. I also see you removed and re-added that macro definition without any changes to it. Is that intentional? If not, can you please fix that and resend the patch? It is usually a good idea to work on -HEAD for such things so patching/committing is easier. cheers, Hiren ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: DCTCP implementation
On Thu, Apr 3, 2014 at 4:27 AM, Midori Kato kat...@sfc.wide.ad.jp wrote: Hi Hiren, Yes, I intentionally replace the location of DELAY_ACK() macro. A DCTCP receiver sends an immediate ACK when incoming packets sets CE and non-CE bit by turns. To implement this processing, I prepare the cc_ecnpkt_handler() function. This function calls the DEKAY_ACK() macro to check if this packet is in delayed ack or not. This is why I replace the macro position. That is fine but macro definition is also changed. This is how it looks on -HEAD right now: #define DELAY_ACK(tp, tlen) \ ((!tcp_timer_active(tp, TT_DELACK) \ (tp-t_flags TF_RXWIN0SENT) == 0) \ (tlen = tp-t_maxopd) \ (V_tcp_delack_enabled || (tp-t_flags TF_NEEDSYN))) We need to pass this new argument tlen when calling DELAY_ACK() from cc_ecnpkt_handler() function. We can probably pass that to cc_encpkg_handler() while calling it from tcp_do_segment(). I'll make that change and see how it goes. cheers, Hiren Regards, -- Midori (4/3/14, 10:24 AM), hiren panchasara wrote: On Sun, Mar 30, 2014 at 10:37 PM, Midori Kato kat...@sfc.wide.ad.jp wrote: Hi FreeBSD developpers, I'm Midori Kato. I'm working on the DCTCP implementation in the FreeBSD with Lars Eggert. I mail you because I would like to ask you a code review and testing. The attached patch is not good enough to test our code. Please give me your message. I will send an ECN marking implmenetation in dummynet and test scripts personally to you. Midori, First thing I noticed in your dctcp.patch is that you are dropping r256920 which adds a new argument tlen to DELAY_ACK() macro. I also see you removed and re-added that macro definition without any changes to it. Is that intentional? If not, can you please fix that and resend the patch? It is usually a good idea to work on -HEAD for such things so patching/committing is easier. cheers, Hiren ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: DCTCP implementation
On Sun, Mar 30, 2014 at 10:37 PM, Midori Kato kat...@sfc.wide.ad.jp wrote: Hi FreeBSD developpers, I'm Midori Kato. I'm working on the DCTCP implementation in the FreeBSD with Lars Eggert. I mail you because I would like to ask you a code review and testing. The attached patch is not good enough to test our code. Please give me your message. I will send an ECN marking implmenetation in dummynet and test scripts personally to you. Midori, First thing I noticed in your dctcp.patch is that you are dropping r256920 which adds a new argument tlen to DELAY_ACK() macro. I also see you removed and re-added that macro definition without any changes to it. Is that intentional? If not, can you please fix that and resend the patch? It is usually a good idea to work on -HEAD for such things so patching/committing is easier. cheers, Hiren ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: DCTCP implementation
On Sun, Mar 30, 2014 at 10:37 PM, Midori Kato kat...@sfc.wide.ad.jp wrote: Hi FreeBSD developpers, I'm Midori Kato. I'm working on the DCTCP implementation in the FreeBSD with Lars Eggert. I mail you because I would like to ask you a code review and testing. The attached patch is not good enough to test our code. Please give me your message. I will send an ECN marking implmenetation in dummynet and test scripts personally to you. Hi Midori, Thank you for your work and I'd like to test this. Please let me know how you setup the dummynet testing cluster and I'll try it. cheers, Hiren ccing gnn@ as he was also asking for the same (testing details). ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: DCTCP implementation
Hi, On 2014-3-31, at 7:37, Midori Kato kat...@sfc.wide.ad.jp wrote: I will send an ECN marking implmenetation in dummynet and test scripts personally to you. I think you can send the dummynet ECN patch also to the list. I know Luigi is reviewing it for a merge, but that lets people play with it already. Lars signature.asc Description: Message signed with OpenPGP using GPGMail
DCTCP implementation
Hi FreeBSD developpers, I'm Midori Kato. I'm working on the DCTCP implementation in the FreeBSD with Lars Eggert. I mail you because I would like to ask you a code review and testing. The attached patch is not good enough to test our code. Please give me your message. I will send an ECN marking implmenetation in dummynet and test scripts personally to you. A DCTCP paper is published in SIGCOMM 2010. DCTCP is also published as an IETF draft [1]. In our implementation, there are a change to the modular congestion control framework and five changes from the original DCTCP algorithm. I briefly describe each of them as below. 1 A change for the modular congestion control framework DCTCP uses the different ECN processing from RFC3168. We need three functions to do the proper DCTCP ECN processing. a) The kernel decides whether an ECE flag should be set in the outgoing TCP segment. (tcp_input.c) b) The kernel controls congestion if an ECE flag is set in the arriving TCP segment. (tcp_input.c) c) After the outgoing TCP segment is generated, the kernel decides whether an ECT bit should be set in an ECN field of IP header in the outgoing packet. (tcp_output.c) The current framework has no housekeeping functions for (a) and (b). Therefore, I add two functions into the moduler cc framework: ecnpkt_handler() and ect_handler(). - ecnpkt_handler() allows the kernel to do the additional ECN processing by snooping ECN field in IP and TCP headers. As an option, this function check a delayed ACK flag, which tells whether this function is in the delayed ACK. This function returns an integer value. When the return value is set, the kernel force to disable delayed ACK. - ect_handler() allows the kernel to use the different rule from RFC3168 in terms of an ECT marking in the outgoing segment. This function returns an integer value. If the value is set, an ECT bit is set to the outgoing segment. 2 Five changes from the original DCTCP algorithm In order to reflect the DCTCP motivation correctly, the following modifications are included in our patch. First four modifications are prepared for senders and the last modification is prepared for receivers. (1) ECE processing FreeBSD handles ECN as a congestion event but it's not true for DCTCP senders. A DCTCP sender uses ECN as a means to understand the extent of congestions. Therefore, a modified DCTCP sender never enters congestion recovery mode in any situations. (2) selective initial alpha value DCTCP defines alpha as a parameter to see the depth of a congestion. When the alpha value is large, it allows a saw-toothed CWND behavior to a DCTCP sender. A problem is that the alpha value is not reliable during a dozen of RTTs because there is no way to identify the depth of a congestion over a network from the beginning. When considering the alpha reliability, I think the initial alpha should be selective for applications by users. When a user chooses DCTCP for latency-sensitive applications, the initial alpha is preferred. Otherwise, DCTCP senders had better to set the initial alpha value to zero. The default alpha value is set to zero in our implementation. (3) alpha value initialization after an idle period The original DCTCP paper does not define how the sender behaves after idle time. A DCTCP sender resets alpha to the initial value when an idle time happens. The following changes is applied to eliminate a compatibility issue to standard ECN defined in RFC3465. Currently, DCTCP and standard ECN servers have no way to identify which mechanism is working on the peer. Thus, we eliminate the worst situation in a network mixing DCTCP senders/receivers and standard ECN senders/receivers. (4) Emitting CWRs at one-sided senders This change is applied for a situation when a sender uses DCTCP and a reciever uses standard ECN. Under the situation, we find that a DCTCP sender minimizes CWND. Fortunately, the current tcp_input() function complement this change, thus, there is no modification in our patch. (5) delayed ACK at one-sided receivers This change is applied for a situation when a sender uses standard ECN and a reciever uses DCTCP. Under the situation, we find that a standard ECN sender increases smaller CWND than expected when the one-sided DCTCP receiver unsets delayed ACK against a packet with CWR flag. Thus, we always apply delayed ACK only when CWR flag is set in the arriving packet. If you want to understand the detailed background of these modifications, see my thesis [2], especially in section 3 and 4. I'm looking forward to hear from you! Regards, -- Midori [1] http://tools.ietf.org/html/draft-bensley-tcpm-dctcp [2]https://eggert.org/students/kato-thesis.pdf diff --git a/sys/modules/cc/Makefile b/sys/modules/cc/Makefile index 7b851f5..7f4e94e 100644 --- a/sys/modules/cc/Makefile +++ b/sys/modules/cc/Makefile @@ -3,6 +3,7 @@ SUBDIR=cc_cdg \ cc_chd