Re: CVS commit: src/sys/net
Date: Wed, 17 Feb 2016 15:12:31 +0900 From: Ryota OzakiOn Wed, Feb 17, 2016 at 2:13 PM, Taylor R Campbell wrote: > Note that this queueing takes effect only if the link state changes > multiple times within maybe a few microseconds, before the softint can > run. If your link state is changing that many times so quickly, > losing an event or two is probably the least of your worries Yeah, it's nitpicking, but for that reason, I think it's better to pass events as they are to userland. > -- but > you're probably more interested in seeing something like ...down...up > than ...up/up/up. Yes. (up/up/up events are eliminated in the first place though.) So what events would you choose to skip, if not the scheme that Roy described? I bet that, whatever events you would choose to skip, we can still prove that the resulting queues need be no longer than, say, three elements, and we'd still usefully report link flapping to userland -- as long as we can make enough progress to run softints and userland processes at all. Here are some example reductions that intuitively sound reasonable to me: down/down = down up/up = up down/unknown/up = down/up down/up/down = down What other sequences of events would you simplify?
Re: CVS commit: src/sys/net
On Wed, Feb 17, 2016 at 1:40 PM, Roy Marpleswrote: > On 2016-02-17 02:49, Ryota Ozaki wrote: >>> >>> If you don't read the patch, here is the comment I added to the >>> if_link_state_change() function: >>> >>> * Queue a change in the interface link state. >>> * >>> * The queue itself is very limited: >>> * no state can be queued more than once >>> * a higher priority state will remove lower priority states >>> * >>> * The queue state priority in descending order is DOWN, UNKNOWN, UP. >>> * Rationale is that if the kernel can't process the link state change >>> queue >>> * fast enough then userland has no chance of catching up. >>> * Any lossage is logged to the console. >>> * The queue state priority is ordered so that link state aware programs >>> * will still have the correct end result regardless of any lossage. >>> >>> Any comments or objections? >> >> >> I worry about the priority order, where it comes from? I feel that >> the kernel does too much decision; which event is important. I think >> it depends on applications. > > > Fair enough. > Can you state a use case for an application needing the full ordering? Event tracking of link states for example. > > Let us be more specific > UP: DOWN - UP > Is passed fully > DOWN: UP - DOWN > UP is not passed to userland. > What do you want to achieve in userland during the UP when the kernel > already considers the link DOWN? Even if a UP state is transient, it's an event that may provide us a hint of network conditions for diagnostic. We may be able to get it from the console output, but it's not so convenient; we need to track events via two different facilities. > >> And the priority provides asymmetric event deliveries; when the state >> repeats up and down, a down event is delivered if the final state is down >> while a down event and a up event are delivered if the final state is up. >> It's confusable to me. > > > It's not that confusing :) > If you're an application, what benefit do you have of processing an UP state > on the link when a DOWN state is about to follow? > Aside from logging it (which we still do, just not via a route message just > a console diagnostic) what would you do you with it? > Because a DOWN state is coming, it cannot possibly do anything over the > interface, so why bother? Applications can ignore the UP event. IMO we should delegate such a decision to applications because we don't know all requirements of all applications. ozaki-r
Re: CVS commit: src/sys/net
Date: Wed, 17 Feb 2016 11:49:48 +0900 From: Ryota OzakiAnd the priority provides asymmetric event deliveries; when the state repeats up and down, a down event is delivered if the final state is down while a down event and a up event are delivered if the final state is up. It's confusable to me. This is the part I'm still not sure about: Roy's patch reduces up/down to just down, but leaves down/up alone. I'm guessing that Roy expects applications like dhcpcd to want to clear some state if the link ever goes down, but to be uninterested in knowing that the link went up for a microsecond and then back down again. Can we pass events as they are as much as possible? I don't complain that event reductions in the kernel, but I think it should be down based on time series manner (e.g., pick latest three events), not based on some priority things. If we accept event reductions, we can do it with bit-encoding (w/o a linked list (memory allocations)), for example represent the state as 2 bits and encode event series into a variable (say 16 events in int). Note that this queueing takes effect only if the link state changes multiple times within maybe a few microseconds, before the softint can run. If your link state is changing that many times so quickly, losing an event or two is probably the least of your worries -- but you're probably more interested in seeing something like ...down...up than ...up/up/up.
Re: CVS commit: src/sys/net
On 2016-02-17 02:49, Ryota Ozaki wrote: If you don't read the patch, here is the comment I added to the if_link_state_change() function: * Queue a change in the interface link state. * * The queue itself is very limited: * no state can be queued more than once * a higher priority state will remove lower priority states * * The queue state priority in descending order is DOWN, UNKNOWN, UP. * Rationale is that if the kernel can't process the link state change queue * fast enough then userland has no chance of catching up. * Any lossage is logged to the console. * The queue state priority is ordered so that link state aware programs * will still have the correct end result regardless of any lossage. Any comments or objections? I worry about the priority order, where it comes from? I feel that the kernel does too much decision; which event is important. I think it depends on applications. Fair enough. Can you state a use case for an application needing the full ordering? Let us be more specific UP: DOWN - UP Is passed fully DOWN: UP - DOWN UP is not passed to userland. What do you want to achieve in userland during the UP when the kernel already considers the link DOWN? And the priority provides asymmetric event deliveries; when the state repeats up and down, a down event is delivered if the final state is down while a down event and a up event are delivered if the final state is up. It's confusable to me. It's not that confusing :) If you're an application, what benefit do you have of processing an UP state on the link when a DOWN state is about to follow? Aside from logging it (which we still do, just not via a route message just a console diagnostic) what would you do you with it? Because a DOWN state is coming, it cannot possibly do anything over the interface, so why bother? Can we pass events as they are as much as possible? I don't complain that event reductions in the kernel, but I think it should be down based on time series manner (e.g., pick latest three events), not based on some priority things. If we accept event reductions, we can do it with bit-encoding (w/o a linked list (memory allocations)), for example represent the state as 2 bits and encode event series into a variable (say 16 events in int). Of course we could. But until we understand a reason why this is needed why should we? Roy
Re: CVS commit: src/sys/net
On Wed, Feb 17, 2016 at 10:46 AM, Roy Marpleswrote: > On Tuesday 16 February 2016 22:20:37 Taylor R Campbell wrote: >> Except for an issue with queueing discussed privately (scheduling a >> softint that is already scheduled won't cause it to run again, so >> if_link_state_change_si needs to process the whole queue in one go), >> that approach looks fine to me, although as we also discussed >> privately we can easily compact it into a three-bit mask with a >> trivial update instead of a whole array of states. > > Patch attached to solve change from a priority array into a bit mask approach > where we swallow the whole queue in a softint. > Thanks to Taylor for some reviews and suggestions. > >> This has the consequence that if the link goes up/down in quick >> succession, and then up/down in quick succession, , with the queue >> processed after each up/down transition, userland will never be >> notified. However, if the link goes down/up, then down/up, , the >> userland will be notified of all the transitions. Roy claims that >> that's OK, and I'm inclined to believe the author of dhcpcd about >> this. > > If you don't read the patch, here is the comment I added to the > if_link_state_change() function: > > * Queue a change in the interface link state. > * > * The queue itself is very limited: > * no state can be queued more than once > * a higher priority state will remove lower priority states > * > * The queue state priority in descending order is DOWN, UNKNOWN, UP. > * Rationale is that if the kernel can't process the link state change queue > * fast enough then userland has no chance of catching up. > * Any lossage is logged to the console. > * The queue state priority is ordered so that link state aware programs > * will still have the correct end result regardless of any lossage. > > Any comments or objections? I worry about the priority order, where it comes from? I feel that the kernel does too much decision; which event is important. I think it depends on applications. And the priority provides asymmetric event deliveries; when the state repeats up and down, a down event is delivered if the final state is down while a down event and a up event are delivered if the final state is up. It's confusable to me. Can we pass events as they are as much as possible? I don't complain that event reductions in the kernel, but I think it should be down based on time series manner (e.g., pick latest three events), not based on some priority things. If we accept event reductions, we can do it with bit-encoding (w/o a linked list (memory allocations)), for example represent the state as 2 bits and encode event series into a variable (say 16 events in int). ozaki-r
Re: CVS commit: src/sys/net
Date: Tue, 16 Feb 2016 13:24:15 + From: Roy MarplesI found the time to work on this, here is the patch I've been running for an hour or so upping and downing interfaces. The rule in Roy's patch is that each new state changes kicks out all higher-priority transitions, and is queued after all lower-priority transitions, with priorities DOWN = -1, UNKNOWN = 0, UP = 1. This has the effect that each queue has the regular expression down? unknown? up? with the following equivalences: down/down = down down/unknown = down/unknown down/up = down/up unknown/down = down unknown/unknown = unknown unknown/up = unknown/up up/down = down up/unknown = unknown up/up = up Except for an issue with queueing discussed privately (scheduling a softint that is already scheduled won't cause it to run again, so if_link_state_change_si needs to process the whole queue in one go), that approach looks fine to me, although as we also discussed privately we can easily compact it into a three-bit mask with a trivial update instead of a whole array of states. This has the consequence that if the link goes up/down in quick succession, and then up/down in quick succession, , with the queue processed after each up/down transition, userland will never be notified. However, if the link goes down/up, then down/up, , the userland will be notified of all the transitions. Roy claims that that's OK, and I'm inclined to believe the author of dhcpcd about this.
Re: CVS commit: src/sys/net
Date: Tue, 16 Feb 2016 16:30:54 +0900 From: Ryota OzakiOn Tue, Feb 16, 2016 at 6:16 AM, Taylor R Campbell wrote: > void > if_link_state_change(struct ifnet *ifp, int link_state) > { > int s; > > s = splnet(); > if (ifp->if_link_state_pending == link_state) > goto out; > > if (ifp->if_link_state_pending != ifp->if_link_state) { What does the conditional intend? I don't remember! It doesn't make sense now that I look at it again. This was just a sketch -- obviously it was missing some parts, such as pool_cache_put, and some parts seem to make no sense. I would just take it out. Anyway I made a patch based on the above code: http://www.netbsd.org/~ozaki-r/link_state_change.diff What I did are to make your code compilable and workable, and limit the number of pending events. Looks reasonable to me. We should have some automatic tests for all this, though!
Re: CVS commit: src/sys/net
On 16/02/2016 08:38, Roy Marples wrote: > On Tuesday 16 February 2016 13:37:28 you wrote: >>> Oh, I see. We shouldn't drop any events of link state changes. >> >> i don't see any point in keeping a huge series of link changes >> if they can be collapsed into an equivalent sequence. with VMs >> and other user-controlled systems it's possible to flood such a >> link state checking engine. > > The queue can be kept quite short because we don't care about any prior > entries each time state changes to LINK_STATE_DOWN. > Also, we don't care about any prior entries (other than LINK_STATE_DOWN) when > setting LINK_STATE_UNKNOWN. > Finally we set LINK_STATE_UP. > > So the queue should only ever hold a maximum of 3 items. Could even work it > like so I found the time to work on this, here is the patch I've been running for an hour or so upping and downing interfaces. OK to commit? Roy Index: sys/net/if.c === RCS file: /cvsroot/src/sys/net/if.c,v retrieving revision 1.324 diff -u -r1.324 if.c --- sys/net/if.c15 Feb 2016 08:08:04 - 1.324 +++ sys/net/if.c16 Feb 2016 13:22:43 - @@ -603,7 +603,6 @@ ifp->if_broadcastaddr = 0; /* reliably crash if used uninitialized */ ifp->if_link_state = LINK_STATE_UNKNOWN; - ifp->if_old_link_state = LINK_STATE_UNKNOWN; ifp->if_capenable = 0; ifp->if_csum_flags_tx = 0; @@ -1562,41 +1561,73 @@ } } +/* Priority list for link state changes */ +static const int +if_link_state_prio[LINK_STATE_MAX] = { + 0, /* LINK_STATE_UNKNOWN */ + -1, /* LINK_STATE_DOWN */ + 1, /* LINK_STATE_UP */ +}; + /* * Handle a change in the interface link state. - * XXX: We should listen to the routing socket in-kernel rather - * than calling in6_if_link_* functions directly from here. */ void if_link_state_change(struct ifnet *ifp, int link_state) { - int s; + int s, i; + + KASSERT(link_state >= 0 && link_state < LINK_STATE_MAX); s = splnet(); - if (ifp->if_link_state == link_state) { - splx(s); - return; + if (ifp->if_link_state == link_state && ifp->if_link_queue_len == 0) + goto out; + + for (i = 0; i < ifp->if_link_queue_len; i++) { + if (if_link_state_prio[link_state] <= + if_link_state_prio[ifp->if_link_queue[i]]) + { + ifp->if_link_queue[i] = link_state; + ifp->if_link_queue_len = i + 1; + /* Because we are trimming the queue, +* there is no need to call softint_schedule again. */ + goto out; + } } - ifp->if_link_state = link_state; + KASSERT(ifp->if_link_queue_len + 1 < __arraycount(ifp->if_link_queue)); + ifp->if_link_queue[ifp->if_link_queue_len++] = link_state; softint_schedule(ifp->if_link_si); +out: splx(s); } - static void if_link_state_change_si(void *arg) { struct ifnet *ifp = arg; - int s; + int s, i; int link_state, old_link_state; struct domain *dp; s = splnet(); - link_state = ifp->if_link_state; - old_link_state = ifp->if_old_link_state; - ifp->if_old_link_state = ifp->if_link_state; + KASSERT(ifp->if_link_queue_len >= 0); /* unlikely */ + if (ifp->if_link_queue_len == 0) + goto out; + + /* Pop the link state change from the queue */ + link_state = ifp->if_link_queue[0]; + ifp->if_link_queue_len--; + for (i = 0; i < ifp->if_link_queue_len; i++) + ifp->if_link_queue[i] = ifp->if_link_queue[i + 1]; + + /* Ensure link state is actually changing */ + if (ifp->if_link_state == link_state) + goto out; + + old_link_state = ifp->if_link_state; + ifp->if_link_state = link_state; #ifdef DEBUG log(LOG_DEBUG, "%s: link state %s (was %s)\n", ifp->if_xname, @@ -1640,6 +1671,7 @@ dp->dom_if_link_state_change(ifp, link_state); } +out: splx(s); } Index: sys/net/if.h === RCS file: /cvsroot/src/sys/net/if.h,v retrieving revision 1.197 diff -u -r1.197 if.h --- sys/net/if.h16 Feb 2016 01:31:26 - 1.197 +++ sys/net/if.h16 Feb 2016 13:22:43 - @@ -191,10 +191,12 @@ /* * Values for if_link_state. + * When changing these values, consider if_link_state_prio in if.c. */ #defineLINK_STATE_UNKNOWN 0 /* link invalid/unknown */ #defineLINK_STATE_DOWN 1 /* link is down */ #defineLINK_STATE_UP 2 /* link is up */ +#defineLINK_STATE_MAX 3 /* size of link states */ /* * Structure defining a queue for a network interface. @@
Re: CVS commit: src/sys/net
On 16/02/2016 08:38, Roy Marples wrote: > for (i = 0; i < ifp->if_link_queue_len; i++) { > if (link_state <= ifp->if_link_queue[i]) { > ifp->if_link_queue[i] = link_state; > ifp->if_link_queue_len = i; This should be ifp->if_link_queue_len = i + 1; > /* Because we are trimming the queue, >* there is no need to call softint_schedule again. */ > goto out; > } > } Roy
Re: CVS commit: src/sys/net
On Tuesday 16 February 2016 13:37:28 you wrote: > > Oh, I see. We shouldn't drop any events of link state changes. > > i don't see any point in keeping a huge series of link changes > if they can be collapsed into an equivalent sequence. with VMs > and other user-controlled systems it's possible to flood such a > link state checking engine. The queue can be kept quite short because we don't care about any prior entries each time state changes to LINK_STATE_DOWN. Also, we don't care about any prior entries (other than LINK_STATE_DOWN) when setting LINK_STATE_UNKNOWN. Finally we set LINK_STATE_UP. So the queue should only ever hold a maximum of 3 items. Could even work it like so void if_link_state_change(struct ifnet *ifp, int link_state) { int s, i; s = splnet(); if (ifp->if_link_queue_len == 0 && ifp->link_state == link_state) goto out; for (i = 0; i < ifp->if_link_queue_len; i++) { if (link_state <= ifp->if_link_queue[i]) { ifp->if_link_queue[i] = link_state; ifp->if_link_queue_len = i; /* Because we are trimming the queue, * there is no need to call softint_schedule again. */ goto out; } } /* Add bounds check here for sanity */ ifp->if_link_queue[ifp->if_link_queue_len++] = link_state; softint_schedule(ifp->if_link_state_softint); out: splx(s); } static void if_link_state_change_si(void *arg) { int s, link_state, i; s = splnet(); if (ifp->if_link_queue_len == 0) goto out; link_state = ifp->if_link_queue[0]; ifp->if_link_queue_len--; for (i = 0; i < ifp->link_queue_len; i++) ifp->if_link_queue[i] = ifp->if_link_queue[i + 1]; /* process state change as normal */ out: splx(s); } Thoughts? Roy