Re: [RFC2][PATCH 7/7] [TFRC]: New rx history code
Em Thu, Dec 06, 2007 at 02:02:25PM +, Gerrit Renker escreveu: | The first six patches in this series are unmodified, so if you | are OK with them please send me your Signed-off-by. Patches [1/7], [2/7], and [6/7] already have a signed-off and there are no changes. Just acknowledged [3..5/7], will look at [7/7] now. OK, please let me know if there are still any problems. The removal of timestamp insertion in ccid3_hc_rx_insert_options will be put in another cset. - Arnaldo - To unsubscribe from this list: send the line unsubscribe dccp in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/7] [TFRC]: New rx history code
Credit here goes to Gerrit Renker, that provided the initial implementation for this new codebase. I modified it just to try to make it closer to the existing API, renaming some functions, add namespacing and fix one bug where the tfrc_rx_hist_alloc was not freeing the allocated ring entries on the error path. Original changeset comment from Gerrit: --- This provides a new, self-contained and generic RX history service for TFRC based protocols. Details: * new data structure, initialisation and cleanup routines; * allocation of dccp_rx_hist entries local to packet_history.c, as a service exported by the dccp_tfrc_lib module. * interface to automatically track highest-received seqno; * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1); * a generic function to test for `data packets' as per RFC 4340, sec. 7.7. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED] --- net/dccp/ccids/ccid3.c | 288 -- net/dccp/ccids/ccid3.h | 14 +- net/dccp/ccids/lib/loss_interval.c | 13 ++- net/dccp/ccids/lib/packet_history.c | 290 +-- net/dccp/ccids/lib/packet_history.h | 83 +-- 5 files changed, 330 insertions(+), 358 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 5ff5aab..faacffa 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -641,6 +641,15 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len, /* * Receiver Half-Connection Routines */ + +/* CCID3 feedback types */ +enum ccid3_fback_type { + CCID3_FBACK_NONE = 0, + CCID3_FBACK_INITIAL, + CCID3_FBACK_PERIODIC, + CCID3_FBACK_PARAM_CHANGE +}; + #ifdef CONFIG_IP_DCCP_CCID3_DEBUG static const char *ccid3_rx_state_name(enum ccid3_hc_rx_states state) { @@ -667,59 +676,60 @@ static void ccid3_hc_rx_set_state(struct sock *sk, hcrx-ccid3hcrx_state = state; } -static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) -{ - if (likely(len 0))/* don't update on empty packets (e.g. ACKs) */ - hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 9); -} - -static void ccid3_hc_rx_send_feedback(struct sock *sk) +static void ccid3_hc_rx_send_feedback(struct sock *sk, + const struct sk_buff *skb, + enum ccid3_fback_type fbtype) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - struct tfrc_rx_hist_entry *packet; ktime_t now; - suseconds_t delta; + s64 delta = 0; ccid3_pr_debug(%s(%p) - entry \n, dccp_role(sk), sk); + if (unlikely(hcrx-ccid3hcrx_state == TFRC_RSTATE_TERM)) + return; + now = ktime_get_real(); - switch (hcrx-ccid3hcrx_state) { - case TFRC_RSTATE_NO_DATA: + switch (fbtype) { + case CCID3_FBACK_INITIAL: hcrx-ccid3hcrx_x_recv = 0; + hcrx-ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */ break; - case TFRC_RSTATE_DATA: - delta = ktime_us_delta(now, - hcrx-ccid3hcrx_tstamp_last_feedback); - DCCP_BUG_ON(delta 0); - hcrx-ccid3hcrx_x_recv = - scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta); + case CCID3_FBACK_PARAM_CHANGE: + /* +* When parameters change (new loss or p p_prev), we do not +* have a reliable estimate for R_m of [RFC 3448, 6.2] and so +* need to reuse the previous value of X_recv. However, when +* X_recv was 0 (due to early loss), this would kill X down to +* s/t_mbi (i.e. one packet in 64 seconds). +* To avoid such drastic reduction, we approximate X_recv as +* the number of bytes since last feedback. +* This is a safe fallback, since X is bounded above by X_calc. +*/ + if (hcrx-ccid3hcrx_x_recv 0) + break; + /* fall through */ + case CCID3_FBACK_PERIODIC: + delta = ktime_us_delta(now, hcrx-ccid3hcrx_tstamp_last_feedback); + if (delta = 0) + DCCP_BUG(delta (%ld) = 0, (long)delta); + else + hcrx-ccid3hcrx_x_recv = + scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta); break; - case TFRC_RSTATE_TERM: - DCCP_BUG(%s(%p) - Illegal state TERM, dccp_role(sk), sk); + default: return; } - packet = tfrc_rx_hist_find_data_packet(hcrx-ccid3hcrx_hist); - if (unlikely(packet == NULL)) { - DCCP_WARN(%s(%p), no data
Re: [PATCH 7/7] [TFRC] New rx history code
Em Wed, Dec 05, 2007 at 09:35:30AM +, Gerrit Renker escreveu: Quoting Arnaldo: | Em Tue, Dec 04, 2007 at 06:55:17AM +, Gerrit Renker escreveu: | NAK. You have changed the control flow of the algorithm and the underlying | data structure. Originally it had been an array of pointers, and this had | been a design decision right from the first submissions in March. From I | of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt | | 1. 'ring' is an array of pointers rather than a contiguous area in | memory. The reason is that, when having to swap adjacent entries | to sort the history, it is easier to swap pointers than to copy | (heavy-weight) history-entry data structs. | | But in your algorithm it becomes a normal linear array. | | As a consequence all subsequent code needs to be rewritten to use | copying instead of swapping pointers. And there is a lot of that, since | the loss detection code makes heavy use of swapping entries in order to | cope with reordered and/or delayed packets. | | So lets not do that, the decision was made based on the RX history patch | alone, where, as pointed out, no swapping occurs. | | This is not what I would call entirely equivalent as you claim. Your | algorithm is fundamentally different, the control flow is not the same, | nor are the underlying data structures. | | Its is equivalent up to what is in the tree, thank you for point out | that in subsequent patches it will be used. | | Had this design decision been made explicit in the changeset comment | that introduces the data structure I wouldn't have tried to reduce the | number of allocations. | | And you even said that it was a good decision when first reacting to | this change, which I saw as positive feedback for the RFC I sent. | That was my fault. Your solution looked much better to me but even I had forgotten that the algorithm is based on an array of pointers. When I finally sat down And you wrote that code, I acted on looking the code together with reading the changeset comment, to me it also looked much better to do that, but as there were many changes, I sent an [RFC] message. It served its purpose, as you after two iterations realised it was in fact not a good idea as later an array of pointers is required. I should have taken that swap routine as the definitive hint that indeed it was needed. and looked through the patch set I found the original notes from March and saw that using a linear array instead of an array of pointers will require quite a few changes and means changing the algorithm. I then thought through whether there could be any advantage in using a linear array as in this patch, but could find none. In the end this went back to the same questions and issues that were tackled between February and March this year. I wish we could have had this dicussion then; it would have made this email unnecessary. That is why it is important that the changeset comment collects these scattered notes and discussions. Think about in some years from now we will be left with a situation where we would have to look at many places to understando some aspects of what is in the code. While this happens and we have google for that I think that since you keep such detailed notes it is not a problem to get them in the changesets. I thank you for your replay and I am sorry if you took offense at my perhaps a little strong reaction, here is the essence what I tried to convey but what maybe did not succeed in conveying: * I am absolutely ok with you making changes to variable names, namespaces, file organisation, overall arrangement etc (as per previous email). You have experience and this will often be a benefit. For instance, you combined tfrc_entry_from_skb() with tfrc_rx_hist_update() to give a new function, tfrc_rx_hist_add_packet(). This is ok since entry_from_skb() is not otherwise used (and I only found this out when reading your patch). (On the other hand, since this is a 4-element ring buffer, it is no real adding, the same entry will frequently be overwritten, this is why it was initially called hist_update().) From the RX history API user perspective (CCID3 right now, others may come) it is adding a packet to the history. In the past it went to a list, now we have a ring and don't keep more than TFRC_NDUPACK + 1 entries, but this is an internal detail that users don't need to know. * I am also not so much concerned about credit. As long as the (changed) end result is as least as good as or hopefully better than the submitted input, the author name is a side issue; so I do think that your approach is sound and fair. The only place where credits are needed is for the SatSix project (www.ist-satsix.org) which funded this work. This is why some of the
Re: [PATCH 7/7] [TFRC] New rx history code
| In the end this went back to the same questions and issues that were tackled between | February and March this year. I wish we could have had this dicussion then; it would | have made this email unnecessary. | | That is why it is important that the changeset comment collects these | scattered notes and discussions. Think about in some years from now we | will be left with a situation where we would have to look at many places | to understando some aspects of what is in the code. While this happens | and we have google for that I think that since you keep such detailed | notes it is not a problem to get them in the changesets. | I agree, the changelogs are all a bit short. When condensing from 40 to 16 patches the changelogs were also cut down, for fear of being too verbose. Will go through the patches in the next days and see if/where this can be improved, that would probably have saved some trouble. | * But where I need to interact is when changes are made to the algorithm, even if these may | seem small. The underlying reasons that lead to the code may not be immediately clear, | but since this is a solution for a specific problem, there are also specific reasons; which | is why for algorithm changes (and underlying data structure) I'd ask you to discuss this first | before committing the patch. | | I did this for the RX handling, where changes were made. Did that for | the TX but ended up commiting before comments from you, but I think its | fair to say that the changes for the TX history were more organizational | than in the essence of the algorithm. | It was a similar situation: the RFC patch came out on Thursday afternoon; I replied a bit hurriedly on Friday that it seemed ok, but found the full confirmation only on Sunday after putting all recent changes into the test tree. And yes, you made a good job of it. - To unsubscribe from this list: send the line unsubscribe dccp in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/7] [TFRC] New rx history code
Quoting Arnaldo: | Em Tue, Dec 04, 2007 at 06:55:17AM +, Gerrit Renker escreveu: | NAK. You have changed the control flow of the algorithm and the underlying | data structure. Originally it had been an array of pointers, and this had | been a design decision right from the first submissions in March. From I | of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt | | 1. 'ring' is an array of pointers rather than a contiguous area in | memory. The reason is that, when having to swap adjacent entries | to sort the history, it is easier to swap pointers than to copy | (heavy-weight) history-entry data structs. | | But in your algorithm it becomes a normal linear array. | | As a consequence all subsequent code needs to be rewritten to use | copying instead of swapping pointers. And there is a lot of that, since | the loss detection code makes heavy use of swapping entries in order to | cope with reordered and/or delayed packets. | | So lets not do that, the decision was made based on the RX history patch | alone, where, as pointed out, no swapping occurs. | | This is not what I would call entirely equivalent as you claim. Your | algorithm is fundamentally different, the control flow is not the same, | nor are the underlying data structures. | | Its is equivalent up to what is in the tree, thank you for point out | that in subsequent patches it will be used. | | Had this design decision been made explicit in the changeset comment | that introduces the data structure I wouldn't have tried to reduce the | number of allocations. | | And you even said that it was a good decision when first reacting to | this change, which I saw as positive feedback for the RFC I sent. | That was my fault. Your solution looked much better to me but even I had forgotten that the algorithm is based on an array of pointers. When I finally sat down and looked through the patch set I found the original notes from March and saw that using a linear array instead of an array of pointers will require quite a few changes and means changing the algorithm. I then thought through whether there could be any advantage in using a linear array as in this patch, but could find none. In the end this went back to the same questions and issues that were tackled between February and March this year. I wish we could have had this dicussion then; it would have made this email unnecessary. I thank you for your replay and I am sorry if you took offense at my perhaps a little strong reaction, here is the essence what I tried to convey but what maybe did not succeed in conveying: * I am absolutely ok with you making changes to variable names, namespaces, file organisation, overall arrangement etc (as per previous email). You have experience and this will often be a benefit. For instance, you combined tfrc_entry_from_skb() with tfrc_rx_hist_update() to give a new function, tfrc_rx_hist_add_packet(). This is ok since entry_from_skb() is not otherwise used (and I only found this out when reading your patch). (On the other hand, since this is a 4-element ring buffer, it is no real adding, the same entry will frequently be overwritten, this is why it was initially called hist_update().) * I am also not so much concerned about credit. As long as the (changed) end result is as least as good as or hopefully better than the submitted input, the author name is a side issue; so I do think that your approach is sound and fair. The only place where credits are needed is for the SatSix project (www.ist-satsix.org) which funded this work. This is why some of the copyrights now included University of Aberdeen. What in any case I'd like to add is at least the Signed-off-by: if it turns out to be a mess I want to contribute my part of responsibility. * But where I need to interact is when changes are made to the algorithm, even if these may seem small. The underlying reasons that lead to the code may not be immediately clear, but since this is a solution for a specific problem, there are also specific reasons; which is why for algorithm changes (and underlying data structure) I'd ask you to discuss this first before committing the patch. * In part you are already doing this (message subject); it may be necessary to adapt the way of discussion/presentation. The subject is complex (spent a week with the flow chart only) and it is a lot - initially this had been 40 small patches. | | I modified it just to try to make it closer to the existing API, hide details from | | the CCIDs and fix a couple bugs found in the initial implementation. | What is a couple bugs? So far you have pointed out only one problem and that was | agreed, it can be fixed. But it remains a side issue, it is not a failure of the | | I pointed two problems one ended up being just the