Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Wed, 17 Feb 2016 15:12:31 +0900
   From: Ryota Ozaki 

   On Wed, Feb 17, 2016 at 2:13 PM, Taylor R Campbell
    wrote:
   > Note that this queueing takes effect only if the link state changes
   > multiple times within maybe a few microseconds, before the softint can
   > run.  If your link state is changing that many times so quickly,
   > losing an event or two is probably the least of your worries

   Yeah, it's nitpicking, but for that reason, I think it's better to pass
   events as they are to userland.

   > -- but
   > you're probably more interested in seeing something like ...down...up
   > than ...up/up/up.

   Yes. (up/up/up events are eliminated in the first place though.)

So what events would you choose to skip, if not the scheme that Roy
described?

I bet that, whatever events you would choose to skip, we can still
prove that the resulting queues need be no longer than, say, three
elements, and we'd still usefully report link flapping to userland --
as long as we can make enough progress to run softints and userland
processes at all.

Here are some example reductions that intuitively sound reasonable to
me:

down/down = down
up/up = up
down/unknown/up = down/up
down/up/down = down

What other sequences of events would you simplify?


Re: CVS commit: src/sys/net

2016-02-16 Thread Ryota Ozaki
On Wed, Feb 17, 2016 at 1:40 PM, Roy Marples  wrote:
> On 2016-02-17 02:49, Ryota Ozaki wrote:
>>>
>>> If you don't read the patch, here is the comment I added to the
>>> if_link_state_change() function:
>>>
>>> * Queue a change in the interface link state.
>>> *
>>> * The queue itself is very limited:
>>> *   no state can be queued more than once
>>> *   a higher priority state will remove lower priority states
>>> *
>>> * The queue state priority in descending order is DOWN, UNKNOWN, UP.
>>> * Rationale is that if the kernel can't process the link state change
>>> queue
>>> * fast enough then userland has no chance of catching up.
>>> * Any lossage is logged to the console.
>>> * The queue state priority is ordered so that link state aware programs
>>> * will still have the correct end result regardless of any lossage.
>>>
>>> Any comments or objections?
>>
>>
>> I worry about the priority order, where it comes from? I feel that
>> the kernel does too much decision; which event is important. I think
>> it depends on applications.
>
>
> Fair enough.
> Can you state a use case for an application needing the full ordering?

Event tracking of link states for example.

>
> Let us be more specific
> UP: DOWN - UP
> Is passed fully
> DOWN: UP - DOWN
> UP is not passed to userland.
> What do you want to achieve in userland during the UP when the kernel
> already considers the link DOWN?

Even if a UP state is transient, it's an event that may provide us a
hint of network conditions for diagnostic. We may be able to get it
from the console output, but it's not so convenient; we need to track
events via two different facilities.

>
>> And the priority provides asymmetric event deliveries; when the state
>> repeats up and down, a down event is delivered if the final state is down
>> while a down event and a up event are delivered if the final state is up.
>> It's confusable to me.
>
>
> It's not that confusing :)
> If you're an application, what benefit do you have of processing an UP state
> on the link when a DOWN state is about to follow?
> Aside from logging it (which we still do, just not via a route message just
> a console diagnostic) what would you do you with it?
> Because a DOWN state is coming, it cannot possibly do anything over the
> interface, so why bother?

Applications can ignore the UP event. IMO we should delegate such a decision
to applications because we don't know all requirements of all applications.

  ozaki-r


Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Wed, 17 Feb 2016 11:49:48 +0900
   From: Ryota Ozaki 

   And the priority provides asymmetric event deliveries; when the state
   repeats up and down, a down event is delivered if the final state is down
   while a down event and a up event are delivered if the final state is up.
   It's confusable to me.

This is the part I'm still not sure about: Roy's patch reduces up/down
to just down, but leaves down/up alone.  I'm guessing that Roy expects
applications like dhcpcd to want to clear some state if the link ever
goes down, but to be uninterested in knowing that the link went up for
a microsecond and then back down again.

   Can we pass events as they are as much as possible? I don't complain that
   event reductions in the kernel, but I think it should be down based on time
   series manner (e.g., pick latest three events), not based on some priority
   things. If we accept event reductions, we can do it with bit-encoding
   (w/o a linked list (memory allocations)), for example represent the state
   as 2 bits and encode event series into a variable (say 16 events in int).

Note that this queueing takes effect only if the link state changes
multiple times within maybe a few microseconds, before the softint can
run.  If your link state is changing that many times so quickly,
losing an event or two is probably the least of your worries -- but
you're probably more interested in seeing something like ...down...up
than ...up/up/up.


Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples

On 2016-02-17 02:49, Ryota Ozaki wrote:

If you don't read the patch, here is the comment I added to the
if_link_state_change() function:

* Queue a change in the interface link state.
*
* The queue itself is very limited:
*   no state can be queued more than once
*   a higher priority state will remove lower priority states
*
* The queue state priority in descending order is DOWN, UNKNOWN, UP.
* Rationale is that if the kernel can't process the link state change 
queue

* fast enough then userland has no chance of catching up.
* Any lossage is logged to the console.
* The queue state priority is ordered so that link state aware 
programs

* will still have the correct end result regardless of any lossage.

Any comments or objections?


I worry about the priority order, where it comes from? I feel that
the kernel does too much decision; which event is important. I think
it depends on applications.


Fair enough.
Can you state a use case for an application needing the full ordering?

Let us be more specific
UP: DOWN - UP
Is passed fully
DOWN: UP - DOWN
UP is not passed to userland.
What do you want to achieve in userland during the UP when the kernel 
already considers the link DOWN?



And the priority provides asymmetric event deliveries; when the state
repeats up and down, a down event is delivered if the final state is 
down
while a down event and a up event are delivered if the final state is 
up.

It's confusable to me.


It's not that confusing :)
If you're an application, what benefit do you have of processing an UP 
state on the link when a DOWN state is about to follow?
Aside from logging it (which we still do, just not via a route message 
just a console diagnostic) what would you do you with it?
Because a DOWN state is coming, it cannot possibly do anything over the 
interface, so why bother?


Can we pass events as they are as much as possible? I don't complain 
that
event reductions in the kernel, but I think it should be down based on 
time
series manner (e.g., pick latest three events), not based on some 
priority

things. If we accept event reductions, we can do it with bit-encoding
(w/o a linked list (memory allocations)), for example represent the 
state
as 2 bits and encode event series into a variable (say 16 events in 
int).


Of course we could.
But until we understand a reason why this is needed why should we?

Roy


Re: CVS commit: src/sys/net

2016-02-16 Thread Ryota Ozaki
On Wed, Feb 17, 2016 at 10:46 AM, Roy Marples  wrote:
> On Tuesday 16 February 2016 22:20:37 Taylor R Campbell wrote:
>> Except for an issue with queueing discussed privately (scheduling a
>> softint that is already scheduled won't cause it to run again, so
>> if_link_state_change_si needs to process the whole queue in one go),
>> that approach looks fine to me, although as we also discussed
>> privately we can easily compact it into a three-bit mask with a
>> trivial update instead of a whole array of states.
>
> Patch attached to solve change from a priority array into a bit mask approach
> where we swallow the whole queue in a softint.
> Thanks to Taylor for some reviews and suggestions.
>
>> This has the consequence that if the link goes up/down in quick
>> succession, and then up/down in quick succession, , with the queue
>> processed after each up/down transition, userland will never be
>> notified.  However, if the link goes down/up, then down/up, , the
>> userland will be notified of all the transitions.  Roy claims that
>> that's OK, and I'm inclined to believe the author of dhcpcd about
>> this.
>
> If you don't read the patch, here is the comment I added to the
> if_link_state_change() function:
>
> * Queue a change in the interface link state.
> *
> * The queue itself is very limited:
> *   no state can be queued more than once
> *   a higher priority state will remove lower priority states
> *
> * The queue state priority in descending order is DOWN, UNKNOWN, UP.
> * Rationale is that if the kernel can't process the link state change queue
> * fast enough then userland has no chance of catching up.
> * Any lossage is logged to the console.
> * The queue state priority is ordered so that link state aware programs
> * will still have the correct end result regardless of any lossage.
>
> Any comments or objections?

I worry about the priority order, where it comes from? I feel that
the kernel does too much decision; which event is important. I think
it depends on applications.

And the priority provides asymmetric event deliveries; when the state
repeats up and down, a down event is delivered if the final state is down
while a down event and a up event are delivered if the final state is up.
It's confusable to me.

Can we pass events as they are as much as possible? I don't complain that
event reductions in the kernel, but I think it should be down based on time
series manner (e.g., pick latest three events), not based on some priority
things. If we accept event reductions, we can do it with bit-encoding
(w/o a linked list (memory allocations)), for example represent the state
as 2 bits and encode event series into a variable (say 16 events in int).

  ozaki-r


Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Tue, 16 Feb 2016 13:24:15 +
   From: Roy Marples 

   I found the time to work on this, here is the patch I've been running
   for an hour or so upping and downing interfaces.

The rule in Roy's patch is that each new state changes kicks out all
higher-priority transitions, and is queued after all lower-priority
transitions, with priorities DOWN = -1, UNKNOWN = 0, UP = 1.

This has the effect that each queue has the regular expression

down? unknown? up?

with the following equivalences:

down/down = down
down/unknown = down/unknown
down/up = down/up
unknown/down = down
unknown/unknown = unknown
unknown/up = unknown/up
up/down = down
up/unknown = unknown
up/up = up

Except for an issue with queueing discussed privately (scheduling a
softint that is already scheduled won't cause it to run again, so
if_link_state_change_si needs to process the whole queue in one go),
that approach looks fine to me, although as we also discussed
privately we can easily compact it into a three-bit mask with a
trivial update instead of a whole array of states.

This has the consequence that if the link goes up/down in quick
succession, and then up/down in quick succession, , with the queue
processed after each up/down transition, userland will never be
notified.  However, if the link goes down/up, then down/up, , the
userland will be notified of all the transitions.  Roy claims that
that's OK, and I'm inclined to believe the author of dhcpcd about
this.


Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Tue, 16 Feb 2016 16:30:54 +0900
   From: Ryota Ozaki 

   On Tue, Feb 16, 2016 at 6:16 AM, Taylor R Campbell
    wrote:
   > void
   > if_link_state_change(struct ifnet *ifp, int link_state)
   > {
   > int s;
   >
   > s = splnet();
   > if (ifp->if_link_state_pending == link_state)
   > goto out;
   >
   > if (ifp->if_link_state_pending != ifp->if_link_state) {

   What does the conditional intend?

I don't remember!  It doesn't make sense now that I look at it again.
This was just a sketch -- obviously it was missing some parts, such as
pool_cache_put, and some parts seem to make no sense.  I would just
take it out.

   Anyway I made a patch based on the above code:
 http://www.netbsd.org/~ozaki-r/link_state_change.diff

   What I did are to make your code compilable and workable,
   and limit the number of pending events.

Looks reasonable to me.  We should have some automatic tests for all
this, though!


Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples
On 16/02/2016 08:38, Roy Marples wrote:
> On Tuesday 16 February 2016 13:37:28 you wrote:
>>> Oh, I see. We shouldn't drop any events of link state changes.
>>
>> i don't see any point in keeping a huge series of link changes
>> if they can be collapsed into an equivalent sequence.  with VMs
>> and other user-controlled systems it's possible to flood such a
>> link state checking engine.
> 
> The queue can be kept quite short because we don't care about any prior 
> entries each time state changes to LINK_STATE_DOWN.
> Also, we don't care about any prior entries (other than LINK_STATE_DOWN) when 
> setting LINK_STATE_UNKNOWN.
> Finally we set LINK_STATE_UP.
> 
> So the queue should only ever hold a maximum of 3 items. Could even work it 
> like so

I found the time to work on this, here is the patch I've been running
for an hour or so upping and downing interfaces.

OK to commit?

Roy
Index: sys/net/if.c
===
RCS file: /cvsroot/src/sys/net/if.c,v
retrieving revision 1.324
diff -u -r1.324 if.c
--- sys/net/if.c15 Feb 2016 08:08:04 -  1.324
+++ sys/net/if.c16 Feb 2016 13:22:43 -
@@ -603,7 +603,6 @@
ifp->if_broadcastaddr = 0; /* reliably crash if used uninitialized */
 
ifp->if_link_state = LINK_STATE_UNKNOWN;
-   ifp->if_old_link_state = LINK_STATE_UNKNOWN;
 
ifp->if_capenable = 0;
ifp->if_csum_flags_tx = 0;
@@ -1562,41 +1561,73 @@
}
 }
 
+/* Priority list for link state changes */
+static const int
+if_link_state_prio[LINK_STATE_MAX] = {
+   0,  /* LINK_STATE_UNKNOWN */
+   -1, /* LINK_STATE_DOWN */
+   1,  /* LINK_STATE_UP */
+};
+
 /*
  * Handle a change in the interface link state.
- * XXX: We should listen to the routing socket in-kernel rather
- * than calling in6_if_link_* functions directly from here.
  */
 void
 if_link_state_change(struct ifnet *ifp, int link_state)
 {
-   int s;
+   int s, i;
+
+   KASSERT(link_state >= 0 && link_state < LINK_STATE_MAX);
 
s = splnet();
-   if (ifp->if_link_state == link_state) {
-   splx(s);
-   return;
+   if (ifp->if_link_state == link_state && ifp->if_link_queue_len == 0)
+   goto out;
+
+   for (i = 0; i < ifp->if_link_queue_len; i++) {
+   if (if_link_state_prio[link_state] <=
+   if_link_state_prio[ifp->if_link_queue[i]])
+   {
+   ifp->if_link_queue[i] = link_state;
+   ifp->if_link_queue_len = i + 1;
+   /* Because we are trimming the queue,
+* there is no need to call softint_schedule again. */
+   goto out;
+   }
}
 
-   ifp->if_link_state = link_state;
+   KASSERT(ifp->if_link_queue_len + 1 < __arraycount(ifp->if_link_queue));
+   ifp->if_link_queue[ifp->if_link_queue_len++] = link_state;
softint_schedule(ifp->if_link_si);
 
+out:
splx(s);
 }
 
-
 static void
 if_link_state_change_si(void *arg)
 {
struct ifnet *ifp = arg;
-   int s;
+   int s, i;
int link_state, old_link_state;
struct domain *dp;
 
s = splnet();
-   link_state = ifp->if_link_state;
-   old_link_state = ifp->if_old_link_state;
-   ifp->if_old_link_state = ifp->if_link_state;
+   KASSERT(ifp->if_link_queue_len >= 0); /* unlikely */
+   if (ifp->if_link_queue_len == 0)
+   goto out;
+
+   /* Pop the link state change from the queue */
+   link_state = ifp->if_link_queue[0];
+   ifp->if_link_queue_len--;
+   for (i = 0; i < ifp->if_link_queue_len; i++)
+   ifp->if_link_queue[i] = ifp->if_link_queue[i + 1];
+
+   /* Ensure link state is actually changing */
+   if (ifp->if_link_state == link_state)
+   goto out;
+
+   old_link_state = ifp->if_link_state;
+   ifp->if_link_state = link_state;
 
 #ifdef DEBUG
log(LOG_DEBUG, "%s: link state %s (was %s)\n", ifp->if_xname,
@@ -1640,6 +1671,7 @@
dp->dom_if_link_state_change(ifp, link_state);
}
 
+out:
splx(s);
 }
 
Index: sys/net/if.h
===
RCS file: /cvsroot/src/sys/net/if.h,v
retrieving revision 1.197
diff -u -r1.197 if.h
--- sys/net/if.h16 Feb 2016 01:31:26 -  1.197
+++ sys/net/if.h16 Feb 2016 13:22:43 -
@@ -191,10 +191,12 @@
 
 /*
  * Values for if_link_state.
+ * When changing these values, consider if_link_state_prio in if.c.
  */
 #defineLINK_STATE_UNKNOWN  0   /* link invalid/unknown */
 #defineLINK_STATE_DOWN 1   /* link is down */
 #defineLINK_STATE_UP   2   /* link is up */
+#defineLINK_STATE_MAX  3   /* size of link states */
 
 /*
  * Structure defining a queue for a network interface.
@@ 

Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples
On 16/02/2016 08:38, Roy Marples wrote:
>   for (i = 0; i < ifp->if_link_queue_len; i++) {
>   if (link_state <= ifp->if_link_queue[i]) {
>   ifp->if_link_queue[i] = link_state;
>   ifp->if_link_queue_len = i;

This should be ifp->if_link_queue_len = i + 1;

>   /* Because we are trimming the queue,
>* there is no need to call softint_schedule again. */
>   goto out;
>   }
>   }

Roy


Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples
On Tuesday 16 February 2016 13:37:28 you wrote:
> > Oh, I see. We shouldn't drop any events of link state changes.
> 
> i don't see any point in keeping a huge series of link changes
> if they can be collapsed into an equivalent sequence.  with VMs
> and other user-controlled systems it's possible to flood such a
> link state checking engine.

The queue can be kept quite short because we don't care about any prior 
entries each time state changes to LINK_STATE_DOWN.
Also, we don't care about any prior entries (other than LINK_STATE_DOWN) when 
setting LINK_STATE_UNKNOWN.
Finally we set LINK_STATE_UP.

So the queue should only ever hold a maximum of 3 items. Could even work it 
like so

void
if_link_state_change(struct ifnet *ifp, int link_state)
{
int s, i;

s = splnet();

if (ifp->if_link_queue_len == 0 && ifp->link_state == link_state)
goto out;

for (i = 0; i < ifp->if_link_queue_len; i++) {
if (link_state <= ifp->if_link_queue[i]) {
ifp->if_link_queue[i] = link_state;
ifp->if_link_queue_len = i;
/* Because we are trimming the queue,
 * there is no need to call softint_schedule again. */
goto out;
}
}

/* Add bounds check here for sanity */
ifp->if_link_queue[ifp->if_link_queue_len++] = link_state;
softint_schedule(ifp->if_link_state_softint);

out:
splx(s);
}


static void
if_link_state_change_si(void *arg)
{
int s, link_state, i;

s = splnet();
if (ifp->if_link_queue_len == 0)
goto out;

link_state = ifp->if_link_queue[0];
ifp->if_link_queue_len--;
for (i = 0; i < ifp->link_queue_len; i++)
ifp->if_link_queue[i] = ifp->if_link_queue[i + 1];

/* process state change as normal */

out:
splx(s);
}

Thoughts?

Roy