Re: [RFC2][PATCH 7/7] [TFRC]: New rx history code

2007-12-06 Thread Arnaldo Carvalho de Melo
Em Thu, Dec 06, 2007 at 02:02:25PM +, Gerrit Renker escreveu:
 | The first six patches in this series are unmodified, so if you
 | are OK with them please send me your Signed-off-by.
 Patches [1/7], [2/7], and [6/7] already have a signed-off and there are
 no changes. Just acknowledged [3..5/7], will look at [7/7] now.

OK, please let me know if there are still any problems.

The removal of timestamp insertion in ccid3_hc_rx_insert_options will be
put in another cset.

- Arnaldo
-
To unsubscribe from this list: send the line unsubscribe dccp in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] [TFRC]: New rx history code

2007-12-06 Thread Arnaldo Carvalho de Melo
Credit here goes to Gerrit Renker, that provided the initial implementation for
this new codebase.

I modified it just to try to make it closer to the existing API, renaming some
functions, add namespacing and fix one bug where the tfrc_rx_hist_alloc was not
freeing the allocated ring entries on the error path.

Original changeset comment from Gerrit:
  ---
This provides a new, self-contained and generic RX history service for TFRC
based protocols.

Details:
 * new data structure, initialisation and cleanup routines;
 * allocation of dccp_rx_hist entries local to packet_history.c,
   as a service exported by the dccp_tfrc_lib module.
 * interface to automatically track highest-received seqno;
 * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
 * a generic function to test for `data packets' as per  RFC 4340, sec. 7.7.

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
 net/dccp/ccids/ccid3.c  |  288 --
 net/dccp/ccids/ccid3.h  |   14 +-
 net/dccp/ccids/lib/loss_interval.c  |   13 ++-
 net/dccp/ccids/lib/packet_history.c |  290 +--
 net/dccp/ccids/lib/packet_history.h |   83 +--
 5 files changed, 330 insertions(+), 358 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 5ff5aab..faacffa 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -641,6 +641,15 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const 
int optname, int len,
 /*
  * Receiver Half-Connection Routines
  */
+
+/* CCID3 feedback types */
+enum ccid3_fback_type {
+   CCID3_FBACK_NONE = 0,
+   CCID3_FBACK_INITIAL,
+   CCID3_FBACK_PERIODIC,
+   CCID3_FBACK_PARAM_CHANGE
+};
+
 #ifdef CONFIG_IP_DCCP_CCID3_DEBUG
 static const char *ccid3_rx_state_name(enum ccid3_hc_rx_states state)
 {
@@ -667,59 +676,60 @@ static void ccid3_hc_rx_set_state(struct sock *sk,
hcrx-ccid3hcrx_state = state;
 }
 
-static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
-{
-   if (likely(len  0))/* don't update on empty packets (e.g. ACKs) */
-   hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 9);
-}
-
-static void ccid3_hc_rx_send_feedback(struct sock *sk)
+static void ccid3_hc_rx_send_feedback(struct sock *sk,
+ const struct sk_buff *skb,
+ enum ccid3_fback_type fbtype)
 {
struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
struct dccp_sock *dp = dccp_sk(sk);
-   struct tfrc_rx_hist_entry *packet;
ktime_t now;
-   suseconds_t delta;
+   s64 delta = 0;
 
ccid3_pr_debug(%s(%p) - entry \n, dccp_role(sk), sk);
 
+   if (unlikely(hcrx-ccid3hcrx_state == TFRC_RSTATE_TERM))
+   return;
+
now = ktime_get_real();
 
-   switch (hcrx-ccid3hcrx_state) {
-   case TFRC_RSTATE_NO_DATA:
+   switch (fbtype) {
+   case CCID3_FBACK_INITIAL:
hcrx-ccid3hcrx_x_recv = 0;
+   hcrx-ccid3hcrx_pinv   = ~0U;   /* see RFC 4342, 8.5 */
break;
-   case TFRC_RSTATE_DATA:
-   delta = ktime_us_delta(now,
-  hcrx-ccid3hcrx_tstamp_last_feedback);
-   DCCP_BUG_ON(delta  0);
-   hcrx-ccid3hcrx_x_recv =
-   scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta);
+   case CCID3_FBACK_PARAM_CHANGE:
+   /*
+* When parameters change (new loss or p  p_prev), we do not
+* have a reliable estimate for R_m of [RFC 3448, 6.2] and so
+* need to  reuse the previous value of X_recv. However, when
+* X_recv was 0 (due to early loss), this would kill X down to
+* s/t_mbi (i.e. one packet in 64 seconds).
+* To avoid such drastic reduction, we approximate X_recv as
+* the number of bytes since last feedback.
+* This is a safe fallback, since X is bounded above by X_calc.
+*/
+   if (hcrx-ccid3hcrx_x_recv  0)
+   break;
+   /* fall through */
+   case CCID3_FBACK_PERIODIC:
+   delta = ktime_us_delta(now, 
hcrx-ccid3hcrx_tstamp_last_feedback);
+   if (delta = 0)
+   DCCP_BUG(delta (%ld) = 0, (long)delta);
+   else
+   hcrx-ccid3hcrx_x_recv =
+   scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta);
break;
-   case TFRC_RSTATE_TERM:
-   DCCP_BUG(%s(%p) - Illegal state TERM, dccp_role(sk), sk);
+   default:
return;
}
 
-   packet = tfrc_rx_hist_find_data_packet(hcrx-ccid3hcrx_hist);
-   if (unlikely(packet == NULL)) {
-   DCCP_WARN(%s(%p), no data 

Re: [PATCH 7/7] [TFRC] New rx history code

2007-12-05 Thread Arnaldo Carvalho de Melo
Em Wed, Dec 05, 2007 at 09:35:30AM +, Gerrit Renker escreveu:
 Quoting Arnaldo:
 | Em Tue, Dec 04, 2007 at 06:55:17AM +, Gerrit Renker escreveu:
 |  NAK. You have changed the control flow of the algorithm and the underlying
 |  data structure. Originally it had been an array of pointers, and this had
 |  been a design decision right from the first submissions in March. From I
 |  of 
 http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
 |   
 |   1. 'ring' is an array of pointers rather than a contiguous area in
 |   memory. The reason is that, when having to swap adjacent entries
 |   to sort the history, it is easier to swap pointers than to copy
 |   (heavy-weight) history-entry data structs.
 |  
 |  But in your algorithm it becomes a normal linear array.
 |  
 |  As a consequence all subsequent code needs to be rewritten to use
 |  copying instead of swapping pointers. And there is a lot of that, since
 |  the loss detection code makes heavy use of swapping entries in order to
 |  cope with reordered and/or delayed packets.
 | 
 | So lets not do that, the decision was made based on the RX history patch
 | alone, where, as pointed out, no swapping occurs.
 |  
 |  This is not what I would call entirely equivalent as you claim. Your
 |  algorithm is fundamentally different, the control flow is not the same,
 |  nor are the underlying data structures.
 | 
 | Its is equivalent up to what is in the tree, thank you for point out
 | that in subsequent patches it will be used.
 | 
 | Had this design decision been made explicit in the changeset comment
 | that introduces the data structure I wouldn't have tried to reduce the
 | number of allocations.
 | 
 | And you even said that it was a good decision when first reacting to
 | this change, which I saw as positive feedback for the RFC I sent.
 | 
 That was my fault. Your solution looked much better to me but even I had 
 forgotten
 that the algorithm is based on an array of pointers. When I finally sat down 

And you wrote that code, I acted on looking the code together with
reading the changeset comment, to me it also looked much better to do
that, but as there were many changes, I sent an [RFC] message. It served
its purpose, as you after two iterations realised it was in fact not a
good idea as later an array of pointers is required. I should have taken
that swap routine as the definitive hint that indeed it was needed.

 and looked through the patch set I found the original notes from March and
 saw that using a linear array instead of an array of pointers will require 
 quite
 a few changes and means changing the algorithm. I then thought through 
 whether there
 could be any advantage in using a linear array as in this patch, but could 
 find none.
 In the end this went back to the same questions and issues that were tackled 
 between
 February and March this year. I wish we could have had this dicussion then; 
 it would
 have made this email unnecessary.

That is why it is important that the changeset comment collects these
scattered notes and discussions. Think about in some years from now we
will be left with a situation where we would have to look at many places
to understando some aspects of what is in the code. While this happens
and we have google for that I think that since you keep such detailed
notes it is not a problem to get them in the changesets.
 
 I thank you for your replay and I am sorry if you took offense at my perhaps 
 a little strong
 reaction, here is the essence what I tried to convey but what maybe did not 
 succeed in conveying:
 
  * I am absolutely ok with you making changes to variable names, namespaces, 
 file organisation,
overall arrangement etc (as per previous email). You have experience and 
 this will often be a 
benefit. For instance, you combined tfrc_entry_from_skb() with 
 tfrc_rx_hist_update() to give
a new function, tfrc_rx_hist_add_packet(). This is ok since 
 entry_from_skb() is not otherwise
used (and I only found this out when reading your patch). (On the other 
 hand, since this is a
4-element ring buffer, it is no real adding, the same entry will 
 frequently be overwritten,
this is why it was initially called hist_update().)

From the RX history API user perspective (CCID3 right now, others may
come) it is adding a packet to the history. In the past it went to a
list, now we have a ring and don't keep more than TFRC_NDUPACK + 1
entries, but this is an internal detail that users don't need to know.
 
  * I am also not so much concerned about credit. As long as the (changed) end 
 result is as
least as good as or hopefully better than the submitted input, the author 
 name is a side
issue; so I do think that your approach is sound and fair. The only place 
 where credits
are needed is for the SatSix project (www.ist-satsix.org) which funded 
 this work. This is
why some of the 

Re: [PATCH 7/7] [TFRC] New rx history code

2007-12-05 Thread Gerrit Renker
|  In the end this went back to the same questions and issues that were 
tackled between
|  February and March this year. I wish we could have had this dicussion then; 
it would
|  have made this email unnecessary.
| 
| That is why it is important that the changeset comment collects these
| scattered notes and discussions. Think about in some years from now we
| will be left with a situation where we would have to look at many places
| to understando some aspects of what is in the code. While this happens
| and we have google for that I think that since you keep such detailed
| notes it is not a problem to get them in the changesets.
|  
I agree, the changelogs are all a bit short. When condensing from 40 to 16 
patches the changelogs
were also cut down, for fear of being too verbose. Will go through the patches 
in the next days and
see if/where this can be improved, that would probably have saved some trouble.

|   * But where I need to interact is when changes are made to the algorithm, 
even if these may 
| seem small. The underlying reasons that lead to the code may not be 
immediately clear,
| but since this is a solution for a specific problem, there are also 
specific reasons; which
| is why for algorithm changes (and underlying data structure) I'd ask you 
to discuss this first
| before committing the patch. 
| 
| I did this for the RX handling, where changes were made. Did that for
| the TX but ended up commiting before comments from you, but I think its
| fair to say that the changes for the TX history were more organizational
| than in the essence of the algorithm.
|  
It was a similar situation: the RFC patch came out on Thursday afternoon; I 
replied a bit hurriedly
on Friday that it seemed ok, but found the full confirmation only on Sunday 
after putting all
recent changes into the test tree. And yes, you made a good job of it.
-
To unsubscribe from this list: send the line unsubscribe dccp in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] [TFRC] New rx history code

2007-12-05 Thread Gerrit Renker
Quoting Arnaldo:
| Em Tue, Dec 04, 2007 at 06:55:17AM +, Gerrit Renker escreveu:
|  NAK. You have changed the control flow of the algorithm and the underlying
|  data structure. Originally it had been an array of pointers, and this had
|  been a design decision right from the first submissions in March. From I
|  of 
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
|   
|   1. 'ring' is an array of pointers rather than a contiguous area in
|   memory. The reason is that, when having to swap adjacent entries
|   to sort the history, it is easier to swap pointers than to copy
|   (heavy-weight) history-entry data structs.
|  
|  But in your algorithm it becomes a normal linear array.
|  
|  As a consequence all subsequent code needs to be rewritten to use
|  copying instead of swapping pointers. And there is a lot of that, since
|  the loss detection code makes heavy use of swapping entries in order to
|  cope with reordered and/or delayed packets.
| 
| So lets not do that, the decision was made based on the RX history patch
| alone, where, as pointed out, no swapping occurs.
|  
|  This is not what I would call entirely equivalent as you claim. Your
|  algorithm is fundamentally different, the control flow is not the same,
|  nor are the underlying data structures.
| 
| Its is equivalent up to what is in the tree, thank you for point out
| that in subsequent patches it will be used.
| 
| Had this design decision been made explicit in the changeset comment
| that introduces the data structure I wouldn't have tried to reduce the
| number of allocations.
| 
| And you even said that it was a good decision when first reacting to
| this change, which I saw as positive feedback for the RFC I sent.
| 
That was my fault. Your solution looked much better to me but even I had 
forgotten
that the algorithm is based on an array of pointers. When I finally sat down 
and looked through the patch set I found the original notes from March and
saw that using a linear array instead of an array of pointers will require quite
a few changes and means changing the algorithm. I then thought through whether 
there
could be any advantage in using a linear array as in this patch, but could find 
none.
In the end this went back to the same questions and issues that were tackled 
between
February and March this year. I wish we could have had this dicussion then; it 
would
have made this email unnecessary.

I thank you for your replay and I am sorry if you took offense at my perhaps a 
little strong
reaction, here is the essence what I tried to convey but what maybe did not 
succeed in conveying:

 * I am absolutely ok with you making changes to variable names, namespaces, 
file organisation,
   overall arrangement etc (as per previous email). You have experience and 
this will often be a 
   benefit. For instance, you combined tfrc_entry_from_skb() with 
tfrc_rx_hist_update() to give
   a new function, tfrc_rx_hist_add_packet(). This is ok since entry_from_skb() 
is not otherwise
   used (and I only found this out when reading your patch). (On the other 
hand, since this is a
   4-element ring buffer, it is no real adding, the same entry will frequently 
be overwritten,
   this is why it was initially called hist_update().)

 * I am also not so much concerned about credit. As long as the (changed) end 
result is as
   least as good as or hopefully better than the submitted input, the author 
name is a side
   issue; so I do think that your approach is sound and fair. The only place 
where credits
   are needed is for the SatSix project (www.ist-satsix.org) which funded this 
work. This is
   why some of the copyrights now included University of Aberdeen.  What in 
any case I'd like
   to add is at least the Signed-off-by: if it turns out to be a mess I want to 
contribute my
   part of responsibility.

 * But where I need to interact is when changes are made to the algorithm, even 
if these may 
   seem small. The underlying reasons that lead to the code may not be 
immediately clear,
   but since this is a solution for a specific problem, there are also specific 
reasons; which
   is why for algorithm changes (and underlying data structure) I'd ask you to 
discuss this first
   before committing the patch. 

 * In part you are already doing this (message subject); it may be necessary to 
adapt the way
   of discussion/presentation. The subject is complex (spent a week with the 
flow chart only)
   and it is a lot - initially this had been 40 small patches.  
   
|  | I modified it just to try to make it closer to the existing API, hide 
details from
|  | the CCIDs and fix a couple bugs found in the initial implementation.
|  What is a couple bugs? So far you have pointed out only one problem and 
that was
|  agreed, it can be fixed. But it remains a side issue, it is not a failure 
of the
| 
| I pointed two problems one ended up being just the