Re: [net] protecting interfaces from races between control and data ?

2013-08-08 Thread Scott Long
Yup, it's an incredibly unsafe pattern.  It also leads to the pattern where
auxiliary processing is handed off to a taskqueue, which then interleaves
the lock ownership with the ithread and produces out-of-order packet
reception.

Scott

On Aug 8, 2013, at 5:18 PM, Adrian Chadd  wrote:

> .. and it's not just about "saturate the port" with traffic.
> 
> It's also about "what happens if I shut down the MAC whilst I'm in the
> process of programming in new RX/TX descriptors?"
> 
> The ath(4) driver had a spectacular behaviour where, if you messed things
> up the wrong way, it would quite happily DMA crap all over the memory of
> your running system, leading to quite hilarious bugs.
> 
> 
> 
> -adrian



Re: [net] protecting interfaces from races between control and data ?

2013-08-08 Thread Adrian Chadd
.. and it's not just about "saturate the port" with traffic.

It's also about "what happens if I shut down the MAC whilst I'm in the
process of programming in new RX/TX descriptors?"

The ath(4) driver had a spectacular behaviour where, if you messed things
up the wrong way, it would quite happily DMA crap all over the memory of
your running system, leading to quite hilarious bugs.



-adrian


Re: [net] protecting interfaces from races between control and data ?

2013-08-08 Thread Scott Long

On Aug 7, 2013, at 10:16 PM, Warner Losh  wrote:

> 
> On Aug 5, 2013, at 11:20 AM, Adrian Chadd wrote:
> 
>> .. and I bet it's not a design pattern, and this is total conjecture on my 
>> part:
>> 
>> * the original drivers weren't SMP safe;
>> * noone really sat down and figured out how to correctly synchronise
>> all of this stuff;
>> * people did the minimum amount of work to keep the driver from
>> immediately crashing, but didn't really think things through at a
>> larger scale.
>> 
>> Almost every driver is this way Luigi. :-)
> 
> Most of the drivers in the tree don't support hardware that performs well
> enough for this to be a problem. :) Any driver that's still around from the
> pre-locking days can easily saturate the lines (or the hardware) on today's
> (and even yesterday's) hardware.
> 
> All the rest have come up with different ways to cope…
> 

Not really.

So the typical pattern is:

foo_rxeof()
{
        ...
        FOO_UNLOCK(sc);         /* drop the driver lock... */
        ifp->if_input(ifp, m);  /* ...across the call up into the stack */
        FOO_LOCK(sc);           /* ...and take it back afterwards */
        ...
}

Grepping for an approximation of this pattern:

[nflx1] ~/svn/head/sys/dev% grep -r -B 5 if_input * | grep -i UNLOCK | cut -d '/' -f 1
ae
age
alc
ale
an
an
bce
bfe
bge
bm
cadence
cas
cas
dc
de
e1000
e1000
e1000
ed
ep
et
ex
fe
fxp
gem
gxemul
hme
ie
ixgb
ixgbe
ixgbe
jme
le
le
lge
mge
msk
msk
my
nfe
nfe
nge
nve
pcn
pdq
re
sbni
sf
sge
sis
sk
sk
sn
snc
ste
stge
ti
tl
tsec
tx
txp
usb
usb
vge
virtio
vr
vte
vx
wb
wl
xe
xl

Sure, a lot of these are very legacy.  But there's a lot in here that are not:
bge, bce, e1000, ixgbe, virtio, etc., probably more that I'm not catching in
this quick pass.

Scott





Re: [net] protecting interfaces from races between control and data ?

2013-08-07 Thread Warner Losh

On Aug 5, 2013, at 11:20 AM, Adrian Chadd wrote:

> .. and I bet it's not a design pattern, and this is total conjecture on my 
> part:
> 
> * the original drivers weren't SMP safe;
> * noone really sat down and figured out how to correctly synchronise
> all of this stuff;
> * people did the minimum amount of work to keep the driver from
> immediately crashing, but didn't really think things through at a
> larger scale.
> 
> Almost every driver is this way Luigi. :-)

Most of the drivers in the tree don't support hardware that performs well
enough for this to be a problem. :) Any driver that's still around from the
pre-locking days can easily saturate the lines (or the hardware) on today's
(and even yesterday's) hardware.

All the rest have come up with different ways to cope...

Warner



Re: [net] protecting interfaces from races between control and data ?

2013-08-07 Thread Tim Kientzle

On Aug 6, 2013, at 9:43 AM, Andre Oppermann wrote:

> The driver supplies a TX frame transmit function (mostly like if_transmit
> today) which does all locking and multi-queue handling internally (driver
> owned).  This gives driver writers the freedom to better adjust to different
> hardware communication methods as they become available, like we witnessed
> a couple of times in the past.

How would you handle TX dequeue?

I'm curious because I got a nice speedup with cpsw
by not using the TX interrupt at all:  I just dequeued
completed packets at the end of the TX transmit
function.
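
In rough sketch form (all foo_* names and ring fields here are hypothetical;
the real code is cpsw's), the idea is:

static int
foo_transmit(struct ifnet *ifp, struct mbuf *m)
{
        struct foo_softc *sc = ifp->if_softc;
        int error;

        FOO_TX_LOCK(sc);
        error = foo_encap(sc, m);       /* enqueue on the TX DMA ring */

        /* No TX interrupt: reclaim whatever completed since last time. */
        while (sc->tx_cons != sc->tx_prod &&
            (sc->tx_desc[sc->tx_cons].flags & FOO_TXD_DONE) != 0) {
                m_freem(sc->tx_mbuf[sc->tx_cons]);
                sc->tx_mbuf[sc->tx_cons] = NULL;
                sc->tx_cons = (sc->tx_cons + 1) % FOO_NTXDESC;
        }
        FOO_TX_UNLOCK(sc);
        return (error);
}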

I suppose this would still work with your scheme.

Tim


Re: [net] protecting interfaces from races between control and data ?

2013-08-07 Thread Andre Oppermann

On 07.08.2013 09:18, Luigi Rizzo wrote:
> On Wed, Aug 7, 2013 at 5:26 AM, Mike Karels <m...@karels.net> wrote:
>> Jumping to (near) the end of the thread, I like most of Andre's proposal.
>> Running with minimal locks at this layer is an admirable goal, and I agree
>> with most of what was said.  I have a few observations on the general
>> changes, or related issues:
>>
>> There was mention of taskqueues.  I think that with MSI-X, taskqueues
>> should not be needed or used.  More specifically, having separate ithreads
>> and taskqueues, with ithreads deferring to taskqueues after some limit,
>> makes sense only for MSI and legacy interrupts.  With MSI-X, an interrupt
>> thread should be able to process packets indefinitely with sufficient CPU
>> resources, and there is no reason to context switch to a different thread
>> periodically.  A periodic "yield" might be reasonable, but if it is
>> necessary, small packet performance will suffer.  However, most of this is
>> internal to the driver.
>
> i am not completely clear on what is the difference between ithreads and
> taskqueues.

The difference between ithreads and taskqueues is actually very small and
mostly in name and how they're set up and kicked off when work is to be done.
Both are real kernel threads.  Most of the confusion, also in Mike's response,
seems to come from the name "ithreads" for interrupt threads.  However an
ithread is not running in interrupt context; only the interrupt handler
(called the filter) does.  Scheduling a taskqueue from an ithread is a bit
round-about but commonly done.  The bus_setup_intr(9) man page isn't helping
to clear up the confusion.

The idea is that a (legacy) interrupt is handled in two stages: 1) a small
function reading the hardware interrupt status register to determine whether
this hardware actually raised an interrupt, and acknowledging it (to prevent
additional interrupts from firing while this one is handled); if it wasn't
this card, it hands off to the next handler if the interrupt line is shared;
2) the actual function/thread handling the data that was indicated by the
interrupt.  Only step 1 runs in interrupt context and thus must be very small
and avoid any form of blocking.  Step 2 is a normal kernel thread set up as
an ithread running at an elevated priority compared to user-space
processes/threads.

MSI and MSI-X are always exclusive, non-shared interrupts.  Thus a handler
running in interrupt context isn't necessary for non-legacy interrupt
sources.  The ithread can be scheduled right away to do its work.
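
In code the two-stage split looks roughly as follows; a sketch only, with
hypothetical foo_* softc fields, registers and helpers, where only the
bus_setup_intr(9) usage and the FILTER_* return values are the real API:

static int
foo_intr_filter(void *arg)              /* stage 1: interrupt context */
{
        struct foo_softc *sc = arg;

        if ((FOO_READ(sc, FOO_ISR) & FOO_ISR_OURS) == 0)
                return (FILTER_STRAY);  /* shared line, not our card */
        FOO_WRITE(sc, FOO_IMR, 0);      /* ack/mask further interrupts */
        return (FILTER_SCHEDULE_THREAD); /* kick stage 2 */
}

static void
foo_intr_ithread(void *arg)             /* stage 2: ithread, may block */
{
        struct foo_softc *sc = arg;

        foo_rxeof(sc);                  /* handle the indicated work */
        foo_txeof(sc);
        FOO_WRITE(sc, FOO_IMR, FOO_IMR_ALL); /* re-enable interrupts */
}

/* In foo_attach(): */
error = bus_setup_intr(dev, sc->irq_res, INTR_TYPE_NET | INTR_MPSAFE,
    foo_intr_filter, foo_intr_ithread, sc, &sc->intr_cookie);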


> Also, Andre's proposal requires to force-kill the ithread, but i am unclear
> on how to do it safely (i.e. without leaving the data structures in some
> inconsistent state), unless the ithread periodically yields the CPU when it
> is in a safe state.  While this is internal to the driver, we should
> probably provide some template code to avoid that each driver implements
> its own way to shutdown the ithread.


Yes, when done a well-tested, stable and performant template will be provided.

--
Andre



Re: [net] protecting interfaces from races between control and data ?

2013-08-07 Thread Luigi Rizzo
On Wed, Aug 7, 2013 at 5:26 AM, Mike Karels  wrote:

> I'm replying to one of the last messages of this thread, but in part going
> back to the beginning; then I'm following up on Andre's proposal.
>
> Luigi wrote:
> > i am slightly unclear of what mechanisms we use to prevent races
> > between interface being reconfigured (up/down/multicast setting, etc,
> > all causing reinitialization of the rx and tx rings) and
>
> > i) packets from the host stack being sent out;
> > ii) interrupts from the network card being processed.
>
> > I think in the old times IFF_DRV_RUNNING was used for this purpose,
> > but now it is not enough.
> > Acquiring the "core lock" in the NIC does not seem enough, either,
> > because newer drivers, especially multiqueue ones, have per-queue
> > rx and tx locks.
>
> > Does anyone know if there is a generic mechanism, or each driver
> > reimplements its own way ?
>
> I'm not sure I understand the question, or its motivation.  What problem(s)
> are we trying to solve here?  It seems to me that this is mostly internal
> to drivers, and I don't see the issue with races.  In particular, the only
> external guarantees that I see are that control operations will affect the
> packet stream "soon" but at some undefined place.  Not all of the cited
> operations (e.g. multicast changes) need to cause the rings to be
> reinitialized; if they do, that's a chip or driver flaw.  Clearing the UP
> flag should cause packets to stop arriving "soon", but presumably
> processing
> those already in memory; packets to stop being sent "soon", probably
> including
> some already accepted for transmission; and new attempts to transmit
> receiving
> an error "soon".  And, of course, the driver should not crash or misbehave.
> Other than that, I don't see what external guarantees need to be met.
>

i only want 'driver should not crash or misbehave', which is something that
(I believe) many current drivers do not guarantee because of the races
mentioned in the thread (control path reinitializes rings without waiting
for the datapath to drain).
My specific problem was achieving this safe behaviour when moving between
netmap mode and regular mode; i hoped i could replicate whatever scheme was
implemented by the drivers in 'normal' mode, and this is when i realized
that there was no such protection in place.

Jumping to (near) the end of the thread, I like most of Andre's proposal.
> Running with minimal locks at this layer is an admirable goal, and I agree
> with most of what was said.  I have a few observations on the general
> changes,
> or related issues:
>
> There was mention of taskqueues.  I think that with MSI-X, taskqueues
> should not be needed or used.  More specifically, having separate ithreads
> and taskqueues, with ithreads deferring to taskqueues after some limit,
> makes
> sense only for MSI and legacy interrupts.  With MSI-X, an interrupt thread
> should be able to process packets indefinitely with sufficient CPU
> resources,
> and there is no reason to context switch to a different thread
> periodically.
> A periodic "yield" might be reasonable, but if it is necessary, small
> packet
> performance will suffer.  However, most of this is internal to the driver.
>

i am not completely clear on what is the difference between ithreads and
taskqueues.

Also, Andre's proposal requires to force-kill the ithread, but i am unclear
on how to do it safely (i.e. without leaving the data structures in some
inconsistent state), unless the ithread periodically yields the CPU when it
is in a safe state. While this is internal to the driver, we should probably
provide some template code to avoid that each driver implements its own way
to shutdown the ithread.

cheers
luigi


Re: [net] protecting interfaces from races between control and data ?

2013-08-06 Thread Luigi Rizzo
thanks for the explanations and for experimenting with the various
alternatives.

I started this thread just to understand whether
something was already in place, and to make sure that what I
do with netmap is not worse than the situation we have now.

I guess that until the best solution comes out, a quick and
non-intrusive workaround is to at least follow the approach that
Bryan suggested.

cheers
luigi

On Tue, Aug 6, 2013 at 6:43 PM, Andre Oppermann  wrote:

> On 05.08.2013 23:53, Luigi Rizzo wrote:
>
>> On Mon, Aug 05, 2013 at 11:04:44PM +0200, Andre Oppermann wrote:
>>
>>> On 05.08.2013 19:36, Luigi Rizzo wrote:
>>>
>> ...
>>
>>>
>>> [picking a post at random to reply in this thread]
>>>
>>>> tell whether or not we should bail out).

>>>
>>> Ideally we don't want to have any locks in the RX and TX path at all.
>>>
>>
>> Ok i have read your proposal below but there are a few things
>> I do not completely understand, below -- essentially I do not
>> understand whether the removal of IFF_DRV_RUNNING (or equivalent)
>> forces you to replace the ifp->new_tx_function() with something
>> else when you want to do an "ifconfig down"
>>
>
> Sorry, it's IFF_DRV_OACTIVE that'll go away, not IFF_DRV_RUNNING.
> It's of no use with multi-queue anyways.  Though we may get more
> differentiated states for interface availability.
>
>
>> Specifically, here are my doubts:
>>
>> - Considering that the rxq lock is rarely contended
>>(only when the control path is active, which in your approach
>>below requires killing and restarting the ithread),
>>and is acquired/released only once per interrupt/batch,
>>i am under the impression that the per-queue RX lock is
>>not a performance problem and makes it easier to reason
>>on the code (and does not require different approach
>>for MSI-x and other cases).
>>
>
> There are two important things here: 1) there must only be one
> (i)thread per RX queue to prevent lock contention; 2) even with
> a non-contended lock there are effects felt by the other cores
> or CPUs like bus lock cycles which are considerable.  Stopping
> and restarting the ithread/taskqueue in those cases that require
> more involved (hardware) reconfiguration isn't much of an overhead
> compared to per-packet or per-packet-batch locking in the hot path.
>
>
>> - in the tx datapath, do you want to acquire the txq lock
>>before or after calling ifp->new_tx_function() ?
>>(btw how it compares to ifp->if_start or ifp->if_transmit ?)
>>
>
> Struct ifnet is going to be split into two parts, the stack owned
> part and the driver owned part.  The stack will never fumble the
> driver owned parts and the driver must only change the stack owned
> parts through accessor functions.  (This split has also been proposed
> by Juniper quite some time ago but for different reasons).
>
> The driver supplies a TX frame transmit function (mostly like if_transmit
> today) which does all locking and multi-queue handling internally (driver
> owned).  This gives driver writers the freedom to better adjust to different
> hardware communication methods as they become available, like we witnessed
> a couple of times in the past.
>
> If the driver is multi-queue capable it takes the rss-hash from the packet
> header
> to select an appropriate queue which may have its own dedicated lock.  If
> there
> is only one queue then it will obtain that lock which may see some
> contention
> when multiple cores try to send at the same time.  The driver may do more
> extensive locking, however that likely comes at the expense of performance.
>
> Typically on modern NICs the TX function will be a thin locked wrapper
> around
> the DMA enqueue function.  For or older or other kinds of interfaces it may
> implement full internal soft queuing (as the IFQ did).  An efficient
> generic
> implementation of those will be provided for driver to make use of.
>
>
>> If you do it within the tx handler then you lose the ability
>> of replacing the handler when you do a reconfiguration,
>> because grabbing the tx lock in the control path does not tell
>> you whether anyone is in the handler.
>> If you do it outside, then the driver should also expose the locks,
>> or some locking function.
>
> The stack calls the transmit function without any driver-specific locks
> held.
> It'll make sure though that the stack-ifnet doesn't go away in between
> probably
> by using cheap rmlocks.
>
> The drivers control path gets a function to ensure that the stack has
> drained
> all writers and it is safe to reconfigure (as in callout_drain()).  Not all
> hardware and control path changes necessarily require a reinitialization.
>
> The stack-ifnet shadows some information, like interface state, and may do
> its
> own independent SMP optimizations to avoid contention.
>
>
>> Overall, it seems to me that keeping the IFF_DRV_RUNNING
>> flag is a lot more practical, not to mention fewer modifications
>> to the code.
>>
>
> Generally we want to optimize for the fast packet path, reduce any friction
> we can, and take a hit on reconfig being more "expensive" as it is much less
> frequent.

Re: [net] protecting interfaces from races between control and data ?

2013-08-06 Thread Andre Oppermann

On 05.08.2013 23:53, Luigi Rizzo wrote:
> On Mon, Aug 05, 2013 at 11:04:44PM +0200, Andre Oppermann wrote:
>> On 05.08.2013 19:36, Luigi Rizzo wrote:
>> ...
>>
>> [picking a post at random to reply in this thread]
>>> tell whether or not we should bail out).
>>
>> Ideally we don't want to have any locks in the RX and TX path at all.
>
> Ok i have read your proposal below but there are a few things
> I do not completely understand, below -- essentially I do not
> understand whether the removal of IFF_DRV_RUNNING (or equivalent)
> forces you to replace the ifp->new_tx_function() with something
> else when you want to do an "ifconfig down"


Sorry, it's IFF_DRV_OACTIVE that'll go away, not IFF_DRV_RUNNING.
It's of no use with multi-queue anyways.  Though we may get more
differentiated states for interface availability.


> Specifically, here are my doubts:
>
> - Considering that the rxq lock is rarely contended
>   (only when the control path is active, which in your approach
>   below requires killing and restarting the ithread),
>   and is acquired/released only once per interrupt/batch,
>   i am under the impression that the per-queue RX lock is
>   not a performance problem and makes it easier to reason
>   on the code (and does not require different approach
>   for MSI-x and other cases).


There are two important things here: 1) there must only be one
(i)thread per RX queue to prevent lock contention; 2) even with
a non-contended lock there are effects felt by the other cores
or CPUs like bus lock cycles which are considerable.  Stopping
and restarting the ithread/taskqueue in those cases that require
more involved (hardware) reconfiguration isn't much of an overhead
compared to per-packet or per-packet-batch locking in the hot path.


> - in the tx datapath, do you want to acquire the txq lock
>   before or after calling ifp->new_tx_function() ?
>   (btw how it compares to ifp->if_start or ifp->if_transmit ?)


Struct ifnet is going to be split into two parts, the stack owned
part and the driver owned part.  The stack will never fumble the
driver owned parts and the driver must only change the stack owned
parts through accessor functions.  (This split has also been proposed
by Juniper quite some time ago but for different reasons).

The driver supplies a TX frame transmit function (mostly like if_transmit
today) which does all locking and multi-queue handling internally (driver
owned).  This gives driver writers the freedom to better adjust to different
hardware communication methods as they become available, like we witnessed
a couple of times in the past.

If the driver is multi-queue capable it takes the rss-hash from the packet
header to select an appropriate queue which may have its own dedicated lock.
If there is only one queue then it will obtain that lock which may see some
contention when multiple cores try to send at the same time.  The driver may
do more extensive locking, however that likely comes at the expense of
performance.

Typically on modern NICs the TX function will be a thin locked wrapper around
the DMA enqueue function.  For older or other kinds of interfaces it may
implement full internal soft queuing (as the IFQ did).  An efficient generic
implementation of those will be provided for drivers to make use of.
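
In sketch form such a driver-owned transmit function might look like the
following (all foo_* names are hypothetical; the flowid is the standard
mbuf packet-header field):

static int
foo_transmit(struct ifnet *ifp, struct mbuf *m)
{
        struct foo_softc *sc = ifp->if_softc;
        struct foo_txq *txq;
        int error;

        /* Same flow -> same queue, preserving per-session ordering. */
        if (m->m_flags & M_FLOWID)
                txq = &sc->txq[m->m_pkthdr.flowid % sc->ntxq];
        else
                txq = &sc->txq[curcpu % sc->ntxq];

        mtx_lock(&txq->mtx);        /* per-queue lock on its own cache line */
        error = foo_encap(txq, m);  /* thin wrapper: enqueue on the DMA ring */
        mtx_unlock(&txq->mtx);
        return (error);
}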

> If you do it within the tx handler then you lose the ability
> of replacing the handler when you do a reconfiguration,
> because grabbing the tx lock in the control path does not tell
> you whether anyone is in the handler.
> If you do it outside, then the driver should also expose the locks,
> or some locking function.

The stack calls the transmit function without any driver-specific locks held.
It'll make sure though that the stack-ifnet doesn't go away in between,
probably by using cheap rmlocks.

The driver's control path gets a function to ensure that the stack has
drained all writers and it is safe to reconfigure (as in callout_drain()).
Not all hardware and control path changes necessarily require a
reinitialization.

The stack-ifnet shadows some information, like interface state, and may do
its own independent SMP optimizations to avoid contention.


> Overall, it seems to me that keeping the IFF_DRV_RUNNING
> flag is a lot more practical, not to mention fewer modifications
> to the code.


Generally we want to optimize for the fast packet path, reduce any friction we
can, and take a hit on reconfig being more "expensive" as it is much less
frequent.

Mind you not all of this is worked out in detail yet and may be subject to
further changes and refinements as more benchmarking of different approaches
is performed.

--
Andre

[PS: I'm currently on summer vacation and write this while having access]


> [description of Andre's proposal below, for reference]
>
> cheers
> luigi
>
>> Every queue has its own separate RX interrupt vector which is serviced by
>> a single dedicated ithread (or taskqueue) which does while(1) for work on
>> the RX ring only going to sleep when it reaches the end of work on the ring.

Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Luigi Rizzo
On Mon, Aug 05, 2013 at 11:04:44PM +0200, Andre Oppermann wrote:
> On 05.08.2013 19:36, Luigi Rizzo wrote:
...
> 
> [picking a post at random to reply in this thread]
> > tell whether or not we should bail out).
> 
> Ideally we don't want to have any locks in the RX and TX path at all.

Ok i have read your proposal below but there are a few things
I do not completely understand, below -- essentially I do not
understand whether the removal of IFF_DRV_RUNNING (or equivalent)
forces you to replace the ifp->new_tx_function() with something
else when you want to do an "ifconfig down"

Specifically, here are my doubts:

- Considering that the rxq lock is rarely contended
  (only when the control path is active, which in your approach
  below requires killing and restarting the ithread),
  and is acquired/released only once per interrupt/batch,
  i am under the impression that the per-queue RX lock is
  not a performance problem and makes it easier to reason
  on the code (and does not require different approach
  for MSI-x and other cases).

- in the tx datapath, do you want to acquire the txq lock
  before or after calling ifp->new_tx_function() ?
  (btw how it compares to ifp->if_start or ifp->if_transmit ?)
  If you do it within the tx handler then you lose the ability
  of replacing the handler when you do a reconfiguration,
  because grabbing the tx lock in the control path does not tell
  you whether anyone is in the handler.
  If you do it outside, then the driver should also expose the locks,
  or some locking function.

Overall, it seems to me that keeping the IFF_DRV_RUNNING
flag is a lot more practical, not to mention fewer modifications
to the code.

[description of Andre's proposal below, for reference]

cheers
luigi

> Every queue has its own separate RX interrupt vector which is serviced by
> a single dedicated ithread (or taskqueue) which does while(1) for work on
> the RX ring only going to sleep when it reaches the end of work on the ring.
> It is then woken up by an interrupt to resume work.  To prevent a live-lock
> on the CPU it would periodically yield to allow other kernel and user-space
> threads to run in between.  Each queue's ithread (taskqueue) can be pinned
> to a particular core or float with a certain stickiness.  To reconfigure the
> card (when the RX ring is affected) the ithread (taskqueue) is terminated
> and after the changes restarted again.  This is done by the control path
> and allows us to run RX completely lock free because there is only ever one
> ithread accessing the RX queue doing away with the need for further locking
> synchronization.

ok I admit i did not think of killing and restarting the ithread,
but i wonder how 
> RX with MSI or legacy interrupts:
> 
> Here multi-queue is impeded because of some synchronization issues.  Depending
> on the hardware synchronization model the ithreads again do while(1) but have
> to be made runnable by the interrupt handler when new work has been indicated.
> 
> TX in general:
> 
> This is a bit more tricky as we have the hard requirement of packet ordering
> for individual sessions (tcp and others).  That means we must use the same
> queue for all packets of the same session.  This makes running TX non-locked
> a bit tricky because when we pin a TX queue to a core we must switch to that
> core first before adding anything to the queue.  If the packet was generated
> on that core there is no overhead, OTOH if not there is the scheduler over-
> head and some cache losses.  Ensuring that the entire TX path, possibly
> including user-space (write et al) happens on the same core is difficult and
> may have its own inefficiencies.  The number of TX queue vs. number of cores
> is another point of contention.  To make the 1:1 scheme work well, one must
> have as many queues as cores.
> 
> A more scalable and generic approach doesn't bind TX queues to any particular
> core and covers each by its own lock to protect the TX ring enqueue.  To
> prevent false lock cache line sharing each TX structure should be on its own
> cache line.  As each session has an affinity hash they become distributed
> across the TX queues significantly reducing contention.
> 
> The TX complete handling freeing the mbuf(s) is done the same way as for the
> RX ring with a dedicated ithread (taskqueue) for each.  Also bpf should hook
> in here (the packet has been sent) instead of at the TX enqueue time.
> 
> The whole concept of IFF_DRV_RUNNING will go away along with the IFQ macros.
> Each driver then provides a direct TX function pointer which is put into a
> decoupled ifnet TX field for use by the stack.  Under normal operation all
> interface TX goes directly into TX DMA ring.  The particular multi-queue
> and locking model is decided by the driver.  The kernel provides a couple
> of common optimized abstractions for use by all drivers that want/need it
> to avoid code and functionality duplication.  When things like ALTQ are
> activated

Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Andre Oppermann

On 05.08.2013 19:36, Luigi Rizzo wrote:
> On Mon, Aug 5, 2013 at 7:17 PM, Adrian Chadd  wrote:
>
>> I'm travelling back to San Jose today; poke me tomorrow and I'll brain
>> dump what I did in ath(4) and the lessons learnt.
>>
>> The TL;DR version - you don't want to grab an extra lock in the
>> read/write paths as that slows things down. Reuse the same per-queue
>> TX/RX lock and have:
>>
>> * a reset flag that is set when something is resetting; that says to
>> the queue "don't bother processing anything, just dive out";
>> * 'i am doing Tx / Rx' flags per queue that is set at the start of
>> TX/RX servicing and finishes at the end; that way the reset code knows
>> if there's something pending;
>> * have the reset path grab each lock, set the 'reset' flag on each,
>> then walk each queue again and make sure they're all marked as 'not
>> doing TX/RX'. At that point the reset can occur, then the flag can be
>> cleared, then TX/RX can resume.

[picking a post at random to reply in this thread]

> so this is slightly different from what Bryan suggested (and you endorsed)
> before, as in that case there was a single 'reset' flag IFF_DRV_RUNNING
> protected by the 'core' lock, then a nested round on all tx and rx locks
> to make sure that all customers have seen it.
> In both cases the tx and rx paths only need the per-queue lock.
>
> As i see it, having a per-queue reset flag removes the need for nesting
> core + queue locks, but since this is only in the control path perhaps
> it is not a big deal (and is better to have a single place to look at to
> tell whether or not we should bail out).


Ideally we don't want to have any locks in the RX and TX path at all.

For the TX path this is not easy to achieve but for RX it isn't difficult
at all provided the correct model is used.

My upcoming stack/driver proposal is along these lines:

RX with MSI-X:

Every queue has its own separate RX interrupt vector which is serviced by
a single dedicated ithread (or taskqueue) which does while(1) for work on
the RX ring only going to sleep when it reaches the end of work on the ring.
It is then woken up by an interrupt to resume work.  To prevent a live-lock
on the CPU it would periodically yield to allow other kernel and user-space
threads to run in between.  Each queue's ithread (taskqueue) can be pinned
to a particular core or float with a certain stickiness.  To reconfigure the
card (when the RX ring is affected) the ithread (taskqueue) is terminated
and after the changes restarted again.  This is done by the control path
and allows us to run RX completely lock free because there is only ever one
ithread accessing the RX queue doing away with the need for further locking
synchronization.
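
A bare-bones sketch of such a per-queue RX ithread (all foo_* names are
hypothetical; kern_yield()/tsleep() are the stock primitives):

static void
foo_rx_ithread(void *arg)
{
        struct foo_rxq *rxq = arg;

        while (!rxq->terminate) {       /* set by the control path */
                while (foo_rxring_has_work(rxq)) {
                        foo_rxeof(rxq);         /* process a batch, lock free */
                        kern_yield(PRI_USER);   /* don't live-lock the CPU */
                }
                /* Ring empty: re-enable this queue's vector and sleep. */
                foo_rx_intr_enable(rxq);
                tsleep(rxq, 0, "fooqrx", 0);    /* woken by the interrupt */
        }
        kthread_exit();         /* control path restarts us after reconfig */
}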

RX with MSI or legacy interrupts:

Here multi-queue is impeded because of some synchronization issues.  Depending
on the hardware synchronization model the ithreads again do while(1) but have
to be made runnable by the interrupt handler when new work has been indicated.

TX in general:

This is a bit more tricky as we have the hard requirement of packet ordering
for individual sessions (tcp and others).  That means we must use the same
queue for all packets of the same session.  This makes running TX non-locked
a bit tricky because when we pin a TX queue to a core we must switch to that
core first before adding anything to the queue.  If the packet was generated
on that core there is no overhead, OTOH if not there is the scheduler over-
head and some cache losses.  Ensuring that the entire TX path, possibly
including user-space (write et al) happens on the same core is difficult and
may have its own inefficiencies.  The number of TX queue vs. number of cores
is another point of contention.  To make the 1:1 scheme work well, one must
have as many queues as cores.

A more scalable and generic approach doesn't bind TX queues to any particular
core and covers each by its own lock to protect the TX ring enqueue.  To
prevent false lock cache line sharing each TX structure should be on its own
cache line.  As each session has an affinity hash they become distributed
across the TX queues significantly reducing contention.

The TX complete handling freeing the mbuf(s) is done the same way as for the
RX ring with a dedicated ithread (taskqueue) for each.  Also bpf should hook
in here (the packet has been sent) instead of at the TX enqueue time.

The whole concept of IFF_DRV_RUNNING will go away along with the IFQ macros.
Each driver then provides a direct TX function pointer which is put into a
decoupled ifnet TX field for use by the stack.  Under normal operation all
interface TX goes directly into TX DMA ring.  The particular multi-queue
and locking model is decided by the driver.  The kernel provides a couple
of common optimized abstractions for use by all drivers that want/need it
to avoid code and functionality duplication.  When things like ALTQ are
activated on an interface the ifnet TX function pointer is replaced with
the appropriate intermediate function pointer which eventually will call

Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Bryan Venteicher


- Original Message -
> On Mon, Aug 5, 2013 at 8:19 PM, Adrian Chadd  wrote:
> 
> > No, Bryan said two things:
> >
> > * the flag, protected by the core lock
> > * per-queue flags
> >
> 
> i see no mention of per-queue flags in his email.
> This is the relevant part
> 

Right, I just use the IFF_DRV_RUNNING flag. I think Adrian meant
'per-queue locks' here? 

> 
> 
> What I've done in my drivers is:
>   * Lock the core mutex
>   * Clear IFF_DRV_RUNNING
>   * Lock/unlock each queue's lock
> 
> The various Rx/Tx queue functions check for IFF_DRV_RUNNING after
> (re)acquiring their queue lock. See vtnet_stop_rendezvous() at
> [1] for an example.
> 
> [1]
> http://svnweb.freebsd.org/base/user/bryanv/vtnetmq/sys/dev/virtio/network/if_vtnet.c?revision=252451&view=markup
> 
> -
> 
> 
> >
> >
> >
> > -adrian
> >
> 
> 
> 
> --
> -+---
>  Prof. Luigi RIZZO, ri...@iet.unipi.it  . Dip. di Ing. dell'Informazione
>  http://www.iet.unipi.it/~luigi/. Universita` di Pisa
>  TEL  +39-050-2211611   . via Diotisalvi 2
>  Mobile   +39-338-6809875   . 56122 PISA (Italy)
> -+---
> 


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Luigi Rizzo
On Mon, Aug 5, 2013 at 8:46 PM, Jack Vogel  wrote:

> What do you think about this change?
>
>
looks good to me. but there is no need to rush; it will be nice
if all interested parties agree on an approach and possibly even
naming.

I do not have any specific test case but the problem came up
when an interface is in netmap mode and dispatches to the
in-kernel VALE switch, and userland tries to move the interface
back to regular mode.

cheers
luigi


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Andre Oppermann

On 05.08.2013 16:59, Bryan Venteicher wrote:
> - Original Message -
>> i am slightly unclear of what mechanisms we use to prevent races
>> between interface being reconfigured (up/down/multicast setting, etc,
>> all causing reinitialization of the rx and tx rings) and
>>
>> i) packets from the host stack being sent out;
>> ii) interrupts from the network card being processed.
>>
>> I think in the old times IFF_DRV_RUNNING was used for this purpose,
>> but now it is not enough.
>> Acquiring the "core lock" in the NIC does not seem enough, either,
>> because newer drivers, especially multiqueue ones, have per-queue
>> rx and tx locks.
>
> What I've done in my drivers is:
>   * Lock the core mutex
>   * Clear IFF_DRV_RUNNING
>   * Lock/unlock each queue's lock
>
> The various Rx/Tx queue functions check for IFF_DRV_RUNNING after
> (re)acquiring their queue lock. See vtnet_stop_rendezvous() at
> [1] for an example.
>
>> Does anyone know if there is a generic mechanism, or each driver
>> reimplements its own way ?
>
> We desperately need a saner ifnet/driver interface. I think andre@
> had some previous work in this area (and additional plans as well?).

Yes.  I have received a grant from the FF to look at this in depth,
evaluate different approaches and to propose an implementation for
the whole stack.

--
Andre

> IMO, there's a lot to like on what DragonflyBSD has done in this area.
>
> [1] -
> http://svnweb.freebsd.org/base/user/bryanv/vtnetmq/sys/dev/virtio/network/if_vtnet.c?revision=252451&view=markup
>
>> thanks
>> luigi


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Luigi Rizzo
On Mon, Aug 5, 2013 at 8:19 PM, Adrian Chadd  wrote:

> No, Bryan said two things:
>
> * the flag, protected by the core lock
> * per-queue flags
>

i see no mention of per-queue flags in his email.
This is the relevant part



What I've done in my drivers is:
  * Lock the core mutex
  * Clear IFF_DRV_RUNNING
  * Lock/unlock each queue's lock

The various Rx/Tx queue functions check for IFF_DRV_RUNNING after
(re)acquiring their queue lock. See vtnet_stop_rendezvous() at
[1] for an example.

[1]
http://svnweb.freebsd.org/base/user/bryanv/vtnetmq/sys/dev/virtio/network/if_vtnet.c?revision=252451&view=markup

-


>
>
>
> -adrian
>



-- 
-+---
 Prof. Luigi RIZZO, ri...@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/. Universita` di Pisa
 TEL  +39-050-2211611   . via Diotisalvi 2
 Mobile   +39-338-6809875   . 56122 PISA (Italy)
-+---


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Jack Vogel
What do you think about this change?

Cheers,

Jack



On Mon, Aug 5, 2013 at 10:58 AM, Luigi Rizzo  wrote:

> On Mon, Aug 5, 2013 at 7:49 PM, Jack Vogel  wrote:
>
>> Sigh, this ends up being ugly I'm afraid. I need some time to look at
>> code and think about it.
>>
>>
> actually the intel drivers seem in decent shape,
> especially if we reuse IFF_DRV_RUNNING as the reset flag
> and the core+queue lock in the control path.
>
> cheers
> luigi
>
>
>
>> Jack
>>
>>
>>
>> On Mon, Aug 5, 2013 at 10:36 AM, Luigi Rizzo  wrote:
>>
>>> On Mon, Aug 5, 2013 at 7:17 PM, Adrian Chadd  wrote:
>>>
>>> > I'm travelling back to San Jose today; poke me tomorrow and I'll brain
>>> > dump what I did in ath(4) and the lessons learnt.
>>> >
>>> > The TL;DR version - you don't want to grab an extra lock in the
>>> > read/write paths as that slows things down. Reuse the same per-queue
>>> > TX/RX lock and have:
>>> >
>>> > * a reset flag that is set when something is resetting; that says to
>>> > the queue "don't bother processing anything, just dive out";
>>> > * 'i am doing Tx / Rx' flags per queue that is set at the start of
>>> > TX/RX servicing and finishes at the end; that way the reset code knows
>>> > if there's something pending;
>>> > * have the reset path grab each lock, set the 'reset' flag on each,
>>> > then walk each queue again and make sure they're all marked as 'not
>>> > doing TX/RX'. At that point the reset can occur, then the flag can be
>>> > cleared, then TX/RX can resume.
>>> >
>>>
>>> so this is slightly different from what Bryan suggested (and you
>>> endorsed)
>>> before, as in that case there was a single 'reset' flag IFF_DRV_RUNNING
>>> protected by the 'core' lock, then a nested round on all tx and rx locks
>>> to make sure that all customers have seen it.
>>> In both cases the tx and rx paths only need the per-queue lock.
>>>
>>> As i see it, having a per-queue reset flag removes the need for nesting
>>> core + queue locks, but since this is only in the control path perhaps
>>> it is not a big deal (and is better to have a single place to look at to
>>> tell whether or not we should bail out).
>>>
>>> cheers
>>> luigi
>>>
>>>
>>
>>
>
>
> --
> -+---
>  Prof. Luigi RIZZO, ri...@iet.unipi.it  . Dip. di Ing. dell'Informazione
>  http://www.iet.unipi.it/~luigi/. Universita` di Pisa
>  TEL  +39-050-2211611   . via Diotisalvi 2
>  Mobile   +39-338-6809875   . 56122 PISA (Italy)
> -+---
>


quiesce.diff
Description: Binary data

Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Adrian Chadd
No, Bryan said two things:

* the flag, protected by the core lock
* per-queue flags



-adrian


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Luigi Rizzo
On Mon, Aug 5, 2013 at 7:49 PM, Jack Vogel  wrote:

> Sigh, this ends up being ugly I'm afraid. I need some time to look at code
> and think about it.
>
>
actually the intel drivers seem in decent shape,
especially if we reuse IFF_DRV_RUNNING as the reset flag
and the core+queue lock in the control path.

cheers
luigi



> Jack
>
>
>
> On Mon, Aug 5, 2013 at 10:36 AM, Luigi Rizzo  wrote:
>
>> On Mon, Aug 5, 2013 at 7:17 PM, Adrian Chadd  wrote:
>>
>> > I'm travelling back to San Jose today; poke me tomorrow and I'll brain
>> > dump what I did in ath(4) and the lessons learnt.
>> >
>> > The TL;DR version - you don't want to grab an extra lock in the
>> > read/write paths as that slows things down. Reuse the same per-queue
>> > TX/RX lock and have:
>> >
>> > * a reset flag that is set when something is resetting; that says to
>> > the queue "don't bother processing anything, just dive out";
>> > * 'i am doing Tx / Rx' flags per queue that is set at the start of
>> > TX/RX servicing and finishes at the end; that way the reset code knows
>> > if there's something pending;
>> > * have the reset path grab each lock, set the 'reset' flag on each,
>> > then walk each queue again and make sure they're all marked as 'not
>> > doing TX/RX'. At that point the reset can occur, then the flag can be
>> > cleared, then TX/RX can resume.
>> >
>>
>> so this is slightly different from what Bryan suggested (and you endorsed)
>> before, as in that case there was a single 'reset' flag IFF_DRV_RUNNING
>> protected by the 'core' lock, then a nested round on all tx and rx locks
>> to make sure that all customers have seen it.
>> In both cases the tx and rx paths only need the per-queue lock.
>>
>> As i see it, having a per-queue reset flag removes the need for nesting
>> core + queue locks, but since this is only in the control path perhaps
>> it is not a big deal (and is better to have a single place to look at to
>> tell whether or not we should bail out).
>>
>> cheers
>> luigi
>>
>> ___
>> freebsd-...@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
>>
>
>


-- 
-+---
 Prof. Luigi RIZZO, ri...@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/. Universita` di Pisa
 TEL  +39-050-2211611   . via Diotisalvi 2
 Mobile   +39-338-6809875   . 56122 PISA (Italy)
-+---


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Jack Vogel
Sigh, this ends up being ugly I'm afraid. I need some time to look at code
and think about it.

Jack



On Mon, Aug 5, 2013 at 10:36 AM, Luigi Rizzo  wrote:

> On Mon, Aug 5, 2013 at 7:17 PM, Adrian Chadd  wrote:
>
> > I'm travelling back to San Jose today; poke me tomorrow and I'll brain
> > dump what I did in ath(4) and the lessons learnt.
> >
> > The TL;DR version - you don't want to grab an extra lock in the
> > read/write paths as that slows things down. Reuse the same per-queue
> > TX/RX lock and have:
> >
> > * a reset flag that is set when something is resetting; that says to
> > the queue "don't bother processing anything, just dive out";
> > * 'i am doing Tx / Rx' flags per queue that is set at the start of
> > TX/RX servicing and finishes at the end; that way the reset code knows
> > if there's something pending;
> > * have the reset path grab each lock, set the 'reset' flag on each,
> > then walk each queue again and make sure they're all marked as 'not
> > doing TX/RX'. At that point the reset can occur, then the flag can be
> > cleared, then TX/RX can resume.
> >
>
> so this is slightly different from what Bryan suggested (and you endorsed)
> before, as in that case there was a single 'reset' flag IFF_DRV_RUNNING
> protected by the 'core' lock, then a nested round on all tx and rx locks
> to make sure that all customers have seen it.
> In both cases the tx and rx paths only need the per-queue lock.
>
> As i see it, having a per-queue reset flag removes the need for nesting
> core + queue locks, but since this is only in the control path perhaps
> it is not a big deal (and is better to have a single place to look at to
> tell whether or not we should bail out).
>
> cheers
> luigi
>


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Scott Long

On Aug 5, 2013, at 11:20 AM, Adrian Chadd  wrote:

> .. and I bet it's not a design pattern, and this is total conjecture on my 
> part:
> 
> * the original drivers weren't SMP safe;
> * noone really sat down and figured out how to correctly synchronise
> all of this stuff;
> * people did the minimum amount of work to keep the driver from
> immediately crashing, but didn't really think things through at a
> larger scale.
> 
> Almost every driver is this way Luigi. :-)
> 
> 


Yes, but let me expand.  The original work to make NIC drivers SMP-safe
focused on just putting everything under one lock.  The lock was acquired in
if_start and held through the transmission of the packet, it was held in
if_ioctl all the way through whatever action was taken, and it was held in
the interrupt handler over all of the RX and TX-complete processing.  This
worked fine up until the RX path called if_input() with the netisr path set
for direct dispatch.  In this mode, the code could flow up the stack and
then immediately back down into the if_start handler for the driver,
where it would try to acquire the same lock again.  Also, it meant that
forwarding packets between two interfaces would have the lock from the
RX context of one interface held into the TX context of the other interface.

Obviously, this was a big mess, so the "solution" was to just drop the
lock around the call to if_input().  No consideration was made for driver
state that might change during the lock release, nor was any consideration
made for the performance impact of dropping the lock on every packet and
then having to contend with top-half TX threads to pick it back up.  But
this became the quick-fix pattern.

As for the original question of why the RX path can operate unlocked, it's
fairly easy.  The RX machinery of most NICs is fairly separate from the TX
machinery, so little synchronization is needed.  When there's a state change
in the driver, in terms of an interface going down, a queue size changing,
or the driver being unloaded, it's up to the driver to pause and drain the
RX processing prior to this state change.  It worked well when I did it in
if_em, and it appears to work well with cxgbe.

Scott



Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Luigi Rizzo
On Mon, Aug 5, 2013 at 7:17 PM, Adrian Chadd  wrote:

> I'm travelling back to San Jose today; poke me tomorrow and I'll brain
> dump what I did in ath(4) and the lessons learnt.
>
> The TL;DR version - you don't want to grab an extra lock in the
> read/write paths as that slows things down. Reuse the same per-queue
> TX/RX lock and have:
>
> * a reset flag that is set when something is resetting; that says to
> the queue "don't bother processing anything, just dive out";
> * 'i am doing Tx / Rx' flags per queue that is set at the start of
> TX/RX servicing and finishes at the end; that way the reset code knows
> if there's something pending;
> * have the reset path grab each lock, set the 'reset' flag on each,
> then walk each queue again and make sure they're all marked as 'not
> doing TX/RX'. At that point the reset can occur, then the flag can be
> cleared, then TX/RX can resume.
>

so this is slightly different from what Bryan suggested (and you endorsed)
before, as in that case there was a single 'reset' flag IFF_DRV_RUNNING
protected by the 'core' lock, then a nested round on all tx and rx locks
to make sure that all customers have seen it.
In both cases the tx and rx paths only need the per-queue lock.

As i see it, having a per-queue reset flag removes the need for nesting
core + queue locks, but since this is only in the control path perhaps
it is not a big deal (and is better to have a single place to look at to
tell whether or not we should bail out).

cheers
luigi


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Adrian Chadd
.. and I bet it's not a design pattern, and this is total conjecture on my part:

* the original drivers weren't SMP safe;
* noone really sat down and figured out how to correctly synchronise
all of this stuff;
* people did the minimum amount of work to keep the driver from
immediately crashing, but didn't really think things through at a
larger scale.

Almost every driver is this way Luigi. :-)



-adrian


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Adrian Chadd
I'm travelling back to San Jose today; poke me tomorrow and I'll brain
dump what I did in ath(4) and the lessons learnt.

The TL;DR version - you don't want to grab an extra lock in the
read/write paths as that slows things down. Reuse the same per-queue
TX/RX lock and have:

* a reset flag that is set when something is resetting; that says to
the queue "don't bother processing anything, just dive out";
* 'i am doing Tx / Rx' flags per queue that is set at the start of
TX/RX servicing and finishes at the end; that way the reset code knows
if there's something pending;
* have the reset path grab each lock, set the 'reset' flag on each,
then walk each queue again and make sure they're all marked as 'not
doing TX/RX'. At that point the reset can occur, then the flag can be
cleared, then TX/RX can resume.
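
Sketched in C with hypothetical foo_* names and per-queue flag words:

/* Queue side: bail out on reset, else mark the queue busy. */
static void
foo_txq_service(struct foo_txq *txq)
{
        mtx_lock(&txq->mtx);
        if (txq->flags & FOO_Q_RESET) {
                mtx_unlock(&txq->mtx);          /* just dive out */
                return;
        }
        txq->flags |= FOO_Q_BUSY;
        /* ... TX servicing; may drop/retake txq->mtx internally ... */
        txq->flags &= ~FOO_Q_BUSY;
        wakeup(txq);
        mtx_unlock(&txq->mtx);
}

/* Reset side: mark every queue, then wait for the busy flags to clear. */
static void
foo_reset(struct foo_softc *sc)
{
        struct foo_txq *txq;
        int i;

        for (i = 0; i < sc->nqueues; i++) {
                txq = &sc->txq[i];
                mtx_lock(&txq->mtx);
                txq->flags |= FOO_Q_RESET;
                mtx_unlock(&txq->mtx);
        }
        for (i = 0; i < sc->nqueues; i++) {
                txq = &sc->txq[i];
                mtx_lock(&txq->mtx);
                while (txq->flags & FOO_Q_BUSY)
                        msleep(txq, &txq->mtx, 0, "fooqrst", 0);
                mtx_unlock(&txq->mtx);
        }
        foo_chip_reset(sc);             /* all queues quiesced */
        for (i = 0; i < sc->nqueues; i++) {
                txq = &sc->txq[i];
                mtx_lock(&txq->mtx);
                txq->flags &= ~FOO_Q_RESET;
                mtx_unlock(&txq->mtx);
        }
}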



-adrian

On 5 August 2013 10:13, Navdeep Parhar  wrote:
> On 08/05/13 09:15, Luigi Rizzo wrote:
>> On Mon, Aug 5, 2013 at 5:46 PM, Adrian Chadd  wrote:
>>
>>> On 5 August 2013 07:59, Bryan Venteicher 
>>> wrote:
>>>
 What I've done in my drivers is:
   * Lock the core mutex
   * Clear IFF_DRV_RUNNING
   * Lock/unlock each queue's lock
>>>
>>> .. and I think that's the only sane way of doing it.
>>>
>>
>> yeah, this was also the solution we had in mind, i was surprised
>> not to find this pattern in the drivers i have looked at.
>>
>> Also there are drivers (chelsio ?) which do not seem to have locks on the
>> receive interrupt handlers ?
>
> This is correct.  cxgbe(4) does not have any locks on rx, just a "state"
> for each rx queue that's maintained with atomic ops.
>
> Regards,
> Navdeep
>
>
>>
>> Does anyone know how linux copes with the same problem ?
>>
>> They seem to have an rtnl_lock() which is a global lock for all
>> configuration
>> of netdevices (would replace our per-interface 'core lock' above),
>> but i am totally unclear on how individual tx threads and interrupt handlers
>> acknowledge that they have read the change in status.
>>
>> cheers
>> luigi
>>
>


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Navdeep Parhar
On 08/05/13 09:15, Luigi Rizzo wrote:
> On Mon, Aug 5, 2013 at 5:46 PM, Adrian Chadd  wrote:
> 
>> On 5 August 2013 07:59, Bryan Venteicher 
>> wrote:
>>
>>> What I've done in my drivers is:
>>>   * Lock the core mutex
>>>   * Clear IFF_DRV_RUNNING
>>>   * Lock/unlock each queue's lock
>>
>> .. and I think that's the only sane way of doing it.
>>
> 
> yeah, this was also the solution we had in mind, i was surprised
> not to find this pattern in the drivers i have looked at.
> 
> Also there are drivers (chelsio ?) which do not seem to have locks on the
> receive interrupt handlers ?

This is correct.  cxgbe(4) does not have any locks on rx, just a "state"
for each rx queue that's maintained with atomic ops.
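
The shape of that approach, with illustrative names only (the real state
machine lives in the cxgbe source):

enum { RXQ_IDLE, RXQ_BUSY, RXQ_DISABLED };  /* rxq->state: volatile u_int */

static void
foo_rx_intr(void *arg)
{
        struct foo_rxq *rxq = arg;

        /* Claim the queue; if it's disabled or already busy, leave. */
        if (!atomic_cmpset_int(&rxq->state, RXQ_IDLE, RXQ_BUSY))
                return;
        foo_rxeof(rxq);
        atomic_cmpset_int(&rxq->state, RXQ_BUSY, RXQ_IDLE);
}

/* Control path: park the queue in DISABLED before reconfiguring. */
static void
foo_rxq_disable(struct foo_rxq *rxq)
{
        while (!atomic_cmpset_int(&rxq->state, RXQ_IDLE, RXQ_DISABLED))
                pause("fooqdis", 1);
}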

Regards,
Navdeep


> 
> Does anyone know how linux copes with the same problem ?
> 
> They seem to have an rtnl_lock() which is a global lock for all
> configuration
> of netdevices (would replace our per-interface 'core lock' above),
> but i am totally unclear on how individual tx threads and interrupt handlers
> acknowledge that they have read the change in status.
> 
> cheers
> luigi
> 



Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Luigi Rizzo
On Mon, Aug 5, 2013 at 5:46 PM, Adrian Chadd  wrote:

> On 5 August 2013 07:59, Bryan Venteicher 
> wrote:
>
> > What I've done in my drivers is:
> >   * Lock the core mutex
> >   * Clear IFF_DRV_RUNNING
> >   * Lock/unlock each queue's lock
>
> .. and I think that's the only sane way of doing it.
>

yeah, this was also the solution we had in mind, i was surprised
not to find this pattern in the drivers i have looked at.

Also there are drivers (chelsio ?) which do not seem to have locks on the
receive interrupt handlers ?

Does anyone know how linux copes with the same problem ?

They seem to have an rtnl_lock() which is a global lock for all
configuration
of netdevices (would replace our per-interface 'core lock' above),
but i am totally unclear on how individual tx threads and interrupt handlers
acknowledge that they have read the change in status.

cheers
luigi


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Adrian Chadd
On 5 August 2013 07:59, Bryan Venteicher  wrote:

> What I've done in my drivers is:
>   * Lock the core mutex
>   * Clear IFF_DRV_RUNNING
>   * Lock/unlock each queue's lock

.. and I think that's the only sane way of doing it.

I'm going to (soon) propose something similar for cxgbe/ixgbe as we
use these NICs at work, then feed this experiment back into the
network stack so we can have a unified way of doing this.

You may also want to synchronize against the driver TX/RX/core locks
and state when doing things like, say, halting DMA in preparation for
multicast reprogramming on some hardware; or even doing a chip reset.

I had to hand-roll this for ath(4) to make it completely correct - any
kind of overlapping reset, reset during TX, reset during RX etc would
cause all kinds of instability and random-crap-scribbled-everywhere
issues. So yes, this is a larger scale issue that needs to be solved.


-adrian


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Bryan Venteicher


- Original Message -
> i am slightly unclear of what mechanisms we use to prevent races
> between interface being reconfigured (up/down/multicast setting, etc,
> all causing reinitialization of the rx and tx rings) and
> 
> i) packets from the host stack being sent out;
> ii) interrupts from the network card being processed.
> 
> I think in the old times IFF_DRV_RUNNING was used for this purpose,
> but now it is not enough.
> Acquiring the "core lock" in the NIC does not seem enough, either,
> because newer drivers, especially multiqueue ones, have per-queue
> rx and tx locks.
> 

What I've done in my drivers is:
  * Lock the core mutex
  * Clear IFF_DRV_RUNNING
  * Lock/unlock each queue's lock

The various Rx/Tx queue functions check for IFF_DRV_RUNNING after
(re)acquiring their queue lock. See vtnet_stop_rendezvous() at
[1] for an example.
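
A minimal sketch of that rendezvous (hypothetical foo_* names; the real
thing is vtnet_stop_rendezvous() at [1]):

static void
foo_stop(struct foo_softc *sc)
{
        struct ifnet *ifp = sc->ifp;
        int i;

        mtx_assert(&sc->core_mtx, MA_OWNED);

        /* New queue work will now bail out... */
        ifp->if_drv_flags &= ~IFF_DRV_RUNNING;

        /*
         * ...and cycling each queue lock guarantees that anyone who
         * was already inside a queue handler has since left it.
         */
        for (i = 0; i < sc->nrxq; i++) {
                mtx_lock(&sc->rxq[i].mtx);
                mtx_unlock(&sc->rxq[i].mtx);
        }
        for (i = 0; i < sc->ntxq; i++) {
                mtx_lock(&sc->txq[i].mtx);
                mtx_unlock(&sc->txq[i].mtx);
        }
        /* Safe to reinitialize rings/hardware from here on. */
}

/* Queue side: re-check the flag after (re)acquiring the queue lock. */
static void
foo_rxq_intr(struct foo_rxq *rxq)
{
        mtx_lock(&rxq->mtx);
        if ((rxq->sc->ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) {
                mtx_unlock(&rxq->mtx);
                return;
        }
        /* ... normal RX processing ... */
        mtx_unlock(&rxq->mtx);
}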

> Does anyone know if there is a generic mechanism, or each driver
> reimplements its own way ?
> 

We desperately need a saner ifnet/driver interface. I think andre@ 
had some previous work in this area (and additional plans as well?).
IMO, there's a lot to like on what DragonflyBSD has done in this area.

[1] - 
http://svnweb.freebsd.org/base/user/bryanv/vtnetmq/sys/dev/virtio/network/if_vtnet.c?revision=252451&view=markup

> thanks
> luigi
> 


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Scott Long

On Aug 5, 2013, at 2:23 AM, Luigi Rizzo  wrote:

> i am slightly unclear of what mechanisms we use to prevent races
> between interface being reconfigured (up/down/multicast setting, etc,
> all causing reinitialization of the rx and tx rings) and
> 
> i) packets from the host stack being sent out;
> ii) interrupts from the network card being processed.
> 
> I think in the old times IFF_DRV_RUNNING was used for this purpose,
> but now it is not enough.
> Acquiring the "core lock" in the NIC does not seem enough, either,
> because newer drivers, especially multiqueue ones, have per-queue
> rx and tx locks.
> 
> Does anyone know if there is a generic mechanism, or each driver
> reimplements its own way ?
> 

I'll speak to the RX side of the question.  Several years ago I modified the
if_em driver to use a fast interrupt handler that would signal actual processing
in a taskqueue thread.  The result was a system with no more latency than
the classic ithread model, but with the ability to allow RX processing to be
halted, drained, and restarted via an explicit API.  This in turn was extended
to cause the RX processing to be safely paused during the control events
that you described above.
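
In sketch form with today's taskqueue(9) API (foo_* names and register
helpers are hypothetical; enqueue/block/drain are the real primitives):

static int
foo_intr_filter(void *arg)              /* fast handler */
{
        struct foo_softc *sc = arg;

        FOO_DISABLE_INTR(sc);           /* mask until the task has run */
        taskqueue_enqueue(sc->tq, &sc->rx_task);
        return (FILTER_HANDLED);
}

static void
foo_rx_task(void *context, int pending)
{
        struct foo_softc *sc = context;

        foo_rxeof(sc);                  /* the actual RX processing */
        FOO_ENABLE_INTR(sc);
}

/* Control path: halt and drain RX around a state change. */
static void
foo_rx_pause(struct foo_softc *sc)
{
        FOO_DISABLE_INTR(sc);
        taskqueue_block(sc->tq);               /* no new task execution... */
        taskqueue_drain(sc->tq, &sc->rx_task); /* ...and wait out the current one */
}

static void
foo_rx_resume(struct foo_softc *sc)
{
        taskqueue_unblock(sc->tq);
        FOO_ENABLE_INTR(sc);
}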

The system worked well on many fronts, but unfortunately I was unable to
actively maintain it, and it was slowly garbage collected over time.  I think
that it could have been extended without much effort to cover TX-complete
processing.  TX dispatch is a different matter, but I don't think that it
would be hard to have the if_transmit/if_start path respond to control
synchronization events.

Scott




[net] protecting interfaces from races between control and data ?

2013-08-05 Thread Luigi Rizzo
i am slightly unclear of what mechanisms we use to prevent races
between interface being reconfigured (up/down/multicast setting, etc,
all causing reinitialization of the rx and tx rings) and

i) packets from the host stack being sent out;
ii) interrupts from the network card being processed.

I think in the old times IFF_DRV_RUNNING was used for this purpose,
but now it is not enough.
Acquiring the "core lock" in the NIC does not seem enough, either,
because newer drivers, especially multiqueue ones, have per-queue
rx and tx locks.

Does anyone know if there is a generic mechanism, or each driver
reimplements its own way ?

thanks
luigi