Re: ifq serialisation, was Re: taskctx and revisiting if_start serialisation

2015-12-09 Thread Mark Kettenis
> Date: Tue, 8 Dec 2015 21:58:49 +1000
> From: David Gwynne 
> 
> On Sun, Dec 06, 2015 at 02:00:16PM +1000, David Gwynne wrote:
> > the current code for serialising if_start calls for mpsafe nics does what 
> > it says.
> 
> as per mpi@s suggestion, this makes the ifq code responsible for
> the task serialisation.
> 
> all the machinery is there, but this only provides a minimal step
> toward coordination of the oactive flag. the only concession is that
> if a driver wants to clear oactive and run its start routine again,
> it can call ifq_restart() to have that serialised with the stack's
> calls to the start routine.
> 
> if there's a desire for a driver to run all of its txeof serialised
> with its start routine, we'll figure out the best way to provide
> that when the problem really occurs.
> 
> if_start_barrier is now called ifq_barrier too.

Bleah.  I just spent two days in bed with the flu.  I still need to
write down my conclusions from the experiments I did with forwarding
during n2k15.  But one of the things I noticed was that the taskctx
serialization reduced the forwarding performance by something like
10%.  I'd like to see what happens with this version of the diff.  I
think I can recreate the setup that I used during n2k15 with em(4)
instead of ix(4) at home.  But it won't happen until after this weekend.

> [the full patch was quoted here; it appears in dlg's original
> message below]

ifq serialisation, was Re: taskctx and revisiting if_start serialisation

2015-12-08 Thread David Gwynne
On Sun, Dec 06, 2015 at 02:00:16PM +1000, David Gwynne wrote:
> the current code for serialising if_start calls for mpsafe nics does what it 
> says.

as per mpi@s suggestion, this makes the ifq code responsible for
the task serialisation.

all the machinery is there, but this only provides a minimal step
toward coordination of the oactive flag. the only concession is that
if a driver wants to clear oactive and run its start routine again,
it can call ifq_restart() to have that serialised with the stack's
calls to the start routine.

if there's a desire for a driver to run all of its txeof serialised
with its start routine, we'll figure out the best way to provide
that when the problem really occurs.

if_start_barrier is now called ifq_barrier too.
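
for illustration, this is roughly the driver-side txeof pattern the
diff below converges on (compare the myx and bnx hunks); xx_softc,
the sc_arpcom field name and the reclaim step are placeholders, not
code from the diff:

void
xx_txeof(struct xx_softc *sc)
{
	struct ifnet *ifp = &sc->sc_arpcom.ac_if;

	/* reclaim completed tx descriptors here; this part may run
	 * concurrently with the start routine */

	/*
	 * if start marked the queue full, ask the ifq serialiser to
	 * clear oactive and run the start routine again on our
	 * behalf, so the flag change cannot race a concurrent space
	 * check in start.
	 */
	if (ifq_is_oactive(&ifp->if_snd))
		ifq_restart(&ifp->if_snd);
}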

Index: dev/pci/if_myx.c
===
RCS file: /cvs/src/sys/dev/pci/if_myx.c,v
retrieving revision 1.90
diff -u -p -r1.90 if_myx.c
--- dev/pci/if_myx.c	3 Dec 2015 12:45:56 -	1.90
+++ dev/pci/if_myx.c	8 Dec 2015 11:31:09 -
@@ -1208,7 +1208,7 @@ myx_up(struct myx_softc *sc)
ifq_clr_oactive(&ifp->if_snd);
SET(ifp->if_flags, IFF_RUNNING);
myx_iff(sc);
-   myx_start(ifp);
+   if_start(ifp);
 
return;
 
@@ -1330,6 +1330,8 @@ myx_down(struct myx_softc *sc)
int  s;
int  ring;
 
+   CLR(ifp->if_flags, IFF_RUNNING);
+
bus_dmamap_sync(sc->sc_dmat, map, 0, map->dm_mapsize,
BUS_DMASYNC_POSTREAD|BUS_DMASYNC_POSTWRITE);
sc->sc_linkdown = sts->ms_linkdown;
@@ -1362,8 +1364,7 @@ myx_down(struct myx_softc *sc)
}
 
ifq_clr_oactive(&ifp->if_snd);
-   CLR(ifp->if_flags, IFF_RUNNING);
-   if_start_barrier(ifp);
+   ifq_barrier(&ifp->if_snd);
 
for (ring = 0; ring < 2; ring++) {
struct myx_rx_ring *mrr = &sc->sc_rx_ring[ring];
@@ -1702,9 +1703,8 @@ myx_txeof(struct myx_softc *sc, u_int32_
sc->sc_tx_ring_cons = idx;
sc->sc_tx_cons = cons;
 
-   ifq_clr_oactive(&ifp->if_snd);
-   if (!ifq_empty(&ifp->if_snd))
-   if_start(ifp);
+   if (ifq_is_oactive(&ifp->if_snd))
+   ifq_restart(&ifp->if_snd);
 }
 
 void
Index: dev/pci/if_bnx.c
===
RCS file: /cvs/src/sys/dev/pci/if_bnx.c,v
retrieving revision 1.118
diff -u -p -r1.118 if_bnx.c
--- dev/pci/if_bnx.c	5 Dec 2015 16:23:37 -	1.118
+++ dev/pci/if_bnx.c	8 Dec 2015 11:31:09 -
@@ -871,6 +871,7 @@ bnx_attachhook(void *xsc)
ifp = &sc->arpcom.ac_if;
ifp->if_softc = sc;
ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
+   ifp->if_xflags = IFXF_MPSAFE;
ifp->if_ioctl = bnx_ioctl;
ifp->if_start = bnx_start;
ifp->if_watchdog = bnx_watchdog;
@@ -4573,20 +4574,14 @@ bnx_tx_intr(struct bnx_softc *sc)
 
used = atomic_sub_int_nv(&sc->used_tx_bd, freed);
 
+   sc->tx_cons = sw_tx_cons;
+
/* Clear the TX timeout timer. */
if (used == 0)
ifp->if_timer = 0;
 
-   /* Clear the tx hardware queue full flag. */
-   if (used < sc->max_tx_bd) {
-   DBRUNIF(ifq_is_oactive(&ifp->if_snd),
-   printf("%s: Open TX chain! %d/%d (used/total)\n",
-   sc->bnx_dev.dv_xname, used,
-   sc->max_tx_bd));
-   ifq_clr_oactive(&ifp->if_snd);
-   }
-
-   sc->tx_cons = sw_tx_cons;
+   if (ifq_is_oactive(&ifp->if_snd))
+   ifq_restart(&ifp->if_snd);
 }
 
 //
@@ -4880,10 +4875,8 @@ bnx_start(struct ifnet *ifp)
int used;
u_int16_t   tx_prod, tx_chain_prod;
 
-   /* If there's no link or the transmit queue is empty then just exit. */
-   if (!sc->bnx_link || IFQ_IS_EMPTY(&ifp->if_snd)) {
-   DBPRINT(sc, BNX_INFO_SEND,
-   "%s(): No link or transmit queue empty.\n", __FUNCTION__);
+   if (!sc->bnx_link) {
+   ifq_purge(&ifp->if_snd);
goto bnx_start_exit;
}
 
Index: net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.423
diff -u -p -r1.423 if.c
--- net/if.c	8 Dec 2015 10:06:12 -	1.423
+++ net/if.c	8 Dec 2015 11:31:09 -
@@ -1,4 +1,4 @@
-/* $OpenBSD: if.c,v 1.423 2015/12/08 10:06:12 dlg Exp $*/
+/* $OpenBSD: if.c,v 1.422 2015/12/05 10:07:55 tedu Exp $   */
 /* $NetBSD: if.c,v 1.35 1996/05/07 05:26:04 thorpej Exp $  */
 
 /*
@@ -153,7 +153,6 @@ void if_input_process(void *);
 void   ifa_print_all(void);
 #endif
 
-void   if_start_mpsafe(struct ifnet *ifp);
 void   if_start_locked(struct ifnet *ifp);
 
 /*
@@ -512,7 +511,7 @@ if_attach_common(struct ifnet *ifp)
TAILQ_INIT(&ifp->if_addrlist);
TAILQ_INIT(&ifp->if_maddrlist);
 
-   ifq_init(&ifp->if_snd);
+   

Re: taskctx and revisiting if_start serialisation

2015-12-07 Thread David Gwynne

> On 7 Dec 2015, at 11:34 PM, Martin Pieuchot  wrote:
> 
> On 07/12/15(Mon) 19:37, David Gwynne wrote:
>> On Sun, Dec 06, 2015 at 03:51:26PM +0100, Martin Pieuchot wrote:
>>> On 06/12/15(Sun) 14:00, David Gwynne wrote:
 [...]
 the idea is you have a taskctx, which represents a serialising
 context for tasks. tasks are submitted to the taskctx, and the code
 will try to run the tasks immediately rather than defer them to a
 thread. if there is contention on the context, the contending cpu
 yields after queueing the task because the other cpu is responsible
 for running all pending tasks to completion.
 
 it also simplifies the barrier operations a lot.
 
 the diff below implements a generic taskctx framework, and cuts the
 mpsafe if_start() implementation over to it.
>>> 
>>> I'm not sure this should be implemented as a generic framework.  I'd
>>> suggest to keep it specific to if_start().  As you say above the least
>>> worst mechanism we have is currently taskqs, but maybe this could/will
>>> change?
>> 
>> im in two minds about where the ctx code should sit. i obviously
>> lean toward keeping it with taskqs, but i also cant come up with
>> another use case for it. i also havent tried very hard to think of
>> one.
> 
> Even if you have another use case for it I find this generic framework
> confusing.  It is built on top of taskq, but it's not part of the taskq
> API.  So putting the documentation in the same manual page just
> confuses me.  Your task API is simple and people use it easily.  I
> think it should stay like that.
> 
> Now we need a way to serialize start and txeof and you came up with a
> solution...
> 
>> are you asking if taskqs will get better, or if something better
>> than taskqs will come along?
> 
> ...I'm just saying that the "implementation" of your serialisation
> mechanism should not matter.  I'm fine with having a context for tasks
> but I'd suggest that this should be abstracted in a simple API instead
> of offering a framework to build upon.

ill tweak it in the morning.

> Could you make it such that you don't need a task in myx_softc?

sure.

the end extreme would be having an if_txeof() function in if.c that wraps a lot 
of this up.

however, i dont want to dictate that start and txeof have to be serialised. if 
you're careful it is possible to operate both the start and txeof side of a 
ring concurrently. the only coordination necessary is around the oactive 
handling.

if some drivers want to completely serialise both start and txeof they should 
be free to do so, but if they want to make it as concurrent as possible then 
they should be able to do what myx is doing and only coordinate oactive.
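
a rough sketch of the start side under that model (hypothetical
driver: xx_tx_ring_space() and xx_encap() are made-up helpers, but
the oactive handling is the point):

void
xx_start(struct ifnet *ifp)
{
	struct xx_softc *sc = ifp->if_softc;
	struct mbuf *m;

	for (;;) {
		/* ring space is the only state shared with txeof */
		if (xx_tx_ring_space(sc) == 0) {
			/*
			 * this runs inside the ifq serialiser, so a
			 * txeof clearing oactive via ifq_restart()
			 * cannot interleave with this check-and-set.
			 */
			ifq_set_oactive(&ifp->if_snd);
			break;
		}

		IFQ_DEQUEUE(&ifp->if_snd, m);
		if (m == NULL)
			break;

		xx_encap(sc, m);	/* map and post m to the tx ring */
	}
}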

ill tinker with it tomorrow though.

> 
>>> I cannot understand what's happening by reading the myx(4) chunk itself.
>>> So I believe the interface needs to be polished.  Would it be possible
>>> to implement this serialization without introducing if_restart()?
>> 
>> that is by far my least favourite bit of this diff. in my defense,
>> i wrote it while my mother was trying to have a conversation with
>> me, so it didn't get my full attention. ive torn if_restart out and
>> implemented it completely in myx below.
>> 
 myx is also changed to only clr oactive from within the taskctx
 serialiser, thereby avoiding the race, but keeps the bulk of txeof
 outside the serialiser so it can run concurrently with start.
 
 other nics are free to serialise start and txeof within the
 ifq_serializer if they want, or not, it is up to them.
 
 thoughts? tests? opinions on messy .h files?
>>> 
>>> It appears to me that you have unrelated changes: if_enter/if_leave.
>> 
>> oops, yes. i suck at juggling diffs.  maybe we should call if.c
>> full and split it up ;)
> 
> I think we should start with that because I'm afraid we're already
> cooking spaghetti.
> 
> My question is: is the barrier and mpsafe code related to a queue or to
> an interface?  In other words, does it make sense to serialize on a
> context in the sending queue?  What about multiple queues?

it's per queue.

ifnet and ifqueue are mixed together atm cos we've long had a 1:1 mapping 
between them in our stack. right now im trying to make the code mpsafe. keeping 
the api change simple means passing an ifp when an ifq would technically be 
more correct. making it work for multiple queues is something to do properly in 
the future.
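
concretely, that means the serialising state hangs off the ifqueue
rather than the ifnet, roughly like this (a sketch based on the
task_dispatch(&ifp->if_snd.ifq_serializer, ...) call in the taskctx
version of the diff, with the mbuf queue details elided; not the
actual struct):

struct ifqueue {
	/* ... mbuf queue, qlen, oactive flag ... */
	struct taskctx	 ifq_serializer;	/* per-queue serialising context */
};

so each transmit queue of a multi-queue nic would get its own
serialiser for free once the stack grows per-queue ifqueues.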

ill try to be more careful about this now though, to avoid confusion, by at 
least keeping the data structures grouped properly.

dlg


Re: taskctx and revisiting if_start serialisation

2015-12-07 Thread Martin Pieuchot
On 07/12/15(Mon) 19:37, David Gwynne wrote:
> On Sun, Dec 06, 2015 at 03:51:26PM +0100, Martin Pieuchot wrote:
> > On 06/12/15(Sun) 14:00, David Gwynne wrote:
> > > [...]
> > > the idea is you have a taskctx, which represents a serialising
> > > context for tasks. tasks are submitted to the taskctx, and the code
> > > will try to run the tasks immediately rather than defer them to a
> > > thread. if there is contention on the context, the contending cpu
> > > yields after queueing the task because the other cpu is responsible
> > > for running all pending tasks to completion.
> > > 
> > > it also simplifies the barrier operations a lot.
> > > 
> > > the diff below implements a generic taskctx framework, and cuts the
> > > mpsafe if_start() implementation over to it.
> > 
> > I'm not sure this should be implemented as a generic framework.  I'd
> > suggest to keep it specific to if_start().  As you say above the least
> > worst mechanism we have is currently taskqs, but maybe this could/will
> > change?
> 
> im in two minds about where the ctx code should sit. i obviously
> lean toward keeping it with taskqs, but i also cant come up with
> another use case for it. i also havent tried very hard to think of
> one.

Even if you have another use case for it I find this generic framework
confusing.  It is built on top of taskq, but it's not part of the taskq
API.  So putting the documentation in the same manual page just confuses
me.  Your task API is simple and people use it easily.  I think it should
stay like that.

Now we need a way to serialize start and txeof and you came up with a
solution...

> are you asking if taskqs will get better, or if something better
> than taskqs will come along?

...I'm just saying that the "implementation" of your serialisation
mechanism should not matter.  I'm fine with having a context for tasks
but I'd suggest that this should be abstracted in a simple API instead
of offering a framework to build upon.

Could you make it such that you don't need a task in myx_softc?

> > I cannot understand what's happening by reading the myx(4) chunk itself.
> > So I believe the interface needs to be polished.  Would it be possible
> > to implement this serialization without introducing if_restart()?
> 
> that is by far my least favourite bit of this diff. in my defense,
> i wrote it while my mother was trying to have a conversation with
> me, so it didn't get my full attention. ive torn if_restart out and
> implemented it completely in myx below.
> 
> > > myx is also changed to only clr oactive from within the taskctx
> > > serialiser, thereby avoiding the race, but keeps the bulk of txeof
> > > outside the serialiser so it can run concurrently with start.
> > > 
> > > other nics are free to serialise start and txeof within the
> > > ifq_serializer if they want, or not, it is up to them.
> > >
> > > thoughts? tests? opinions on messy .h files?
> > 
> > It appears to me that you have unrelated changes: if_enter/if_leave.
> 
> oops, yes. i suck at juggling diffs.  maybe we should call if.c
> full and split it up ;)

I think we should start with that because I'm afraid we're already
cooking spaghetti.

My question is: is the barrier and mpsafe code related to a queue or to
an interface?  In other words, does it make sense to serialize on a
context in the sending queue?  What about multiple queues?

> Index: sys/net/if.c
> ===
> RCS file: /cvs/src/sys/net/if.c,v
> retrieving revision 1.422
> diff -u -p -r1.422 if.c
> --- sys/net/if.c  5 Dec 2015 10:07:55 -   1.422
> +++ sys/net/if.c  7 Dec 2015 06:30:42 -
> @@ -153,8 +153,9 @@ void  if_input_process(void *);
>  void ifa_print_all(void);
>  #endif
>  
> -void if_start_mpsafe(struct ifnet *ifp);
> -void if_start_locked(struct ifnet *ifp);
> +void if_start_mpsafe(struct ifnet *);
> +void if_start_locked(struct ifnet *);
> +void if_start_task(void *);
>  
>  /*
>   * interface index map
> @@ -513,6 +514,7 @@ if_attach_common(struct ifnet *ifp)
>   TAILQ_INIT(&ifp->if_maddrlist);
>  
>   ifq_init(&ifp->if_snd);
> + task_set(&ifp->if_start_ctx, if_start_task, ifp);
>  
>   ifp->if_addrhooks = malloc(sizeof(*ifp->if_addrhooks),
>   M_TEMP, M_WAITOK);
> @@ -557,66 +559,29 @@ if_start_locked(struct ifnet *ifp)
>   KERNEL_UNLOCK();
>  }
>  
> -static inline unsigned int
> -ifq_enter(struct ifqueue *ifq)
> -{
> - return (atomic_inc_int_nv(&ifq->ifq_serializer) == 1);
> -}
> -
> -static inline unsigned int
> -ifq_leave(struct ifqueue *ifq)
> +void
> +if_start_mpsafe(struct ifnet *ifp)
>  {
> - if (atomic_cas_uint(&ifq->ifq_serializer, 1, 0) == 1)
> - return (1);
> -
> - ifq->ifq_serializer = 1;
> -
> - return (0);
> + task_dispatch(&ifp->if_snd.ifq_serializer, &ifp->if_start_ctx);
>  }
>  
>  void
> -if_start_mpsafe(struct ifnet *ifp)
> +if_start_task(void *p)
>  {
> - struct ifqueue *ifq = &ifp->if_snd;
> + struct ifnet *ifp = p;
>  
> - if 
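
the diff is truncated here in the archive. for reference, the
serialising dispatch that replaces the ifq_enter()/ifq_leave() pair
above works roughly like this sketch; it is not the committed code
(the work-list helpers and struct layout are hypothetical), only the
ownership-count algorithm is taken from the hunks in this thread:

struct taskctx {
	struct task_list	 tc_worklist;	/* pending tasks, locking elided */
	unsigned int		 tc_serializer;	/* dispatch ownership count */
};

void
task_dispatch(struct taskctx *tc, struct task *t)
{
	/* publish the work before trying to take the serialiser */
	taskctx_enqueue(tc, t);			/* hypothetical helper */

	/* the first caller in becomes the owner; contenders yield,
	 * knowing the owner will run their task for them */
	if (atomic_inc_int_nv(&tc->tc_serializer) != 1)
		return;

	for (;;) {
		/* the owner runs all pending work to completion */
		while ((t = taskctx_dequeue(tc)) != NULL)	/* hypothetical */
			(*t->t_func)(t->t_arg);

		/* nobody contended while we ran, so we are done */
		if (atomic_cas_uint(&tc->tc_serializer, 1, 0) == 1)
			break;

		/* a contender queued work and bumped the count while
		 * we ran; collapse the count and go around again */
		tc->tc_serializer = 1;
	}
}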

Re: taskctx and revisiting if_start serialisation

2015-12-07 Thread David Gwynne
On Sun, Dec 06, 2015 at 03:51:26PM +0100, Martin Pieuchot wrote:
> On 06/12/15(Sun) 14:00, David Gwynne wrote:
> > the current code for serialising if_start calls for mpsafe nics does
> > what it says.
> > 
> > however, kettenis realised it doesnt help us much when we're trying
> > to coordinate between the start and txeof side of a driver when
> > setting or clearing oactive. in particular, a start routine can
> > figure out there's no more space, and then set oactive. txeof could
> > be running on another cpu emptying the ring and clearing it. if
> > that clear runs in between the other cpus space check and
> > ifq_set_oactive, then the nic will be marked full and the stack
> > wont ever call start on it again.
> > 
> > so it can be argued that start and txeof should be serialised.
> > indeed, other platforms do exactly that.
> > 
> > the least worst mechanism we have for doing that is taskqs. however,
> > all my experiments deferring start to a taskq end up significantly
> > hurting performance.
> > [...] 
> > while toying with ideas on how to solve kettenis' oactive problem,
> > i came up with the following.
> > 
> > it combines tasks with direct dispatch, and borrows the current
> > ifq_serialiser/pool/scsi serialisation algorithm.
> 
> I like the idea.

\o/

> > the idea is you have a taskctx, which represents a serialising
> > context for tasks. tasks are submitted to the taskctx, and the code
> > will try to run the tasks immediately rather than defer them to a
> > thread. if there is contention on the context, the contending cpu
> > yields after queueing the task because the other cpu is responsible
> > for running all pending tasks to completion.
> > 
> > it also simplifies the barrier operations a lot.
> > 
> > the diff below implements a generic taskctx framework, and cuts the
> > mpsafe if_start() implementation over to it.
> 
> I'm not sure this should be implemented as a generic framework.  I'd
> suggest to keep it specific to if_start().  As you say above the least
> worst mechanism we have is currently taskqs, but maybe this could/will
> change?

im in two minds about where the ctx code should sit. i obviously
lean toward keeping it with taskqs, but i also cant come up with
another use case for it. i also havent tried very hard to think of
one.

are you asking if taskqs will get better, or if something better
than taskqs will come along?

> I cannot understand what's happening by reading the myx(4) chunk itself.
> So I believe the interface needs to be polished.  Would it be possible
> to implement this serialization without introducing if_restart()?

that is by far my least favourite bit of this diff. in my defense,
i wrote it while my mother was trying to have a conversation with
me, so it didn't get my full attention. ive torn if_restart out and
implemented it completely in myx below.

> > myx is also changed to only clr oactive from within the taskctx
> > serialiser, thereby avoiding the race, but keeps the bulk of txeof
> > outside the serialiser so it can run concurrently with start.
> > 
> > other nics are free to serialise start and txeof within the
> > ifq_serializer if they want, or not, it is up to them.
> >
> > thoughts? tests? opinions on messy .h files?
> 
> It appears to me that you have unrelated changes: if_enter/if_leave.

oops, yes. i suck at juggling diffs.  maybe we should call if.c
full and split it up ;)
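
the taskctx_barrier() added to the man page diff below can be built
almost for free on top of the serialising context. a minimal sketch,
assuming task_dispatch() behaves as described above; the mutex and
sleep/wakeup details are illustrative, not code from this diff:

struct taskctx_barrier_state {
	struct mutex	 mtx;
	unsigned int	 done;
};

static void
taskctx_barrier_task(void *p)
{
	struct taskctx_barrier_state *b = p;

	mtx_enter(&b->mtx);
	b->done = 1;
	mtx_leave(&b->mtx);
	wakeup(b);
}

void
taskctx_barrier(struct taskctx *tc)
{
	struct taskctx_barrier_state b;
	struct task t = TASK_INITIALIZER(taskctx_barrier_task, &b);

	mtx_init(&b.mtx, IPL_NONE);
	b.done = 0;

	/* runs the task directly if the context is uncontended,
	 * otherwise the current owner runs it after all previously
	 * submitted work, which is exactly the barrier semantic */
	task_dispatch(tc, &t);

	mtx_enter(&b.mtx);
	while (!b.done)
		msleep(&b, &b.mtx, PWAIT, "tctxbar", 0);
	mtx_leave(&b.mtx);
}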

Index: share/man/man9/Makefile
===
RCS file: /cvs/src/share/man/man9/Makefile,v
retrieving revision 1.262
diff -u -p -r1.262 Makefile
--- share/man/man9/Makefile 25 Nov 2015 03:09:57 -  1.262
+++ share/man/man9/Makefile 7 Dec 2015 06:30:42 -
@@ -397,9 +397,14 @@ MLINKS+=systrace.9 systrace_redirect.9 \
systrace.9 systrace_fork.9 systrace.9 systrace_exit.9
 MLINKS+=task_add.9 taskq_create.9 \
task_add.9 taskq_destroy.9 \
+   task_add.9 taskq_barrier.9 \
task_add.9 task_set.9 \
task_add.9 task_del.9 \
-   task_add.9 TASK_INITIALIZER.9
+   task_add.9 TASK_INITIALIZER.9 \
+   task_add.9 taskctx_init.9 \
+   task_add.9 TASKCTX_INITIALIZER.9 \
+   task_add.9 taskctx_barrier.9 \
+   task_add.9 task_dispatch.9
 MLINKS+=time.9 boottime.9 time.9 mono_time.9 time.9 runtime.9
 MLINKS+=timeout.9 timeout_add.9 timeout.9 timeout_set.9 \
timeout.9 timeout_pending.9 timeout.9 timeout_del.9 \
Index: share/man/man9/task_add.9
===
RCS file: /cvs/src/share/man/man9/task_add.9,v
retrieving revision 1.16
diff -u -p -r1.16 task_add.9
--- share/man/man9/task_add.9   14 Sep 2015 15:14:55 -  1.16
+++ share/man/man9/task_add.9   7 Dec 2015 06:30:42 -
@@ -18,15 +18,23 @@
 .Dt TASK_ADD 9
 .Os
 .Sh NAME
+.Nm task_set ,
+.Nm TASK_INITIALIZER ,
 .Nm taskq_create ,
 .Nm taskq_destroy ,
-.Nm task_set ,
+.Nm taskq_barrier ,
 .Nm task_add ,
 .Nm task_del ,
-.Nm TASK_INITIALIZER

Re: taskctx and revisiting if_start serialisation

2015-12-07 Thread Hrvoje Popovski
On 6.12.2015. 15:56, Hrvoje Popovski wrote:
> On 6.12.2015. 5:00, David Gwynne wrote:
>> the current code for serialising if_start calls for mpsafe nics does what it 
>> says.
>>
>> [...]
>>
>> thoughts? tests? opinions on messy .h files?
> 
> 
> Hi,
> 
> after applying these patches on top of cvs source from a few hours ago
> (no additional patches for ix and em) it seems that something isn't
> right...
> 
> freshly rebooted box, sending 2Mpps over ix (82599) and i'm getting
> around 50kpps on the receiver... over x540 around 100kpps


With the latest patch the 50kpps and 100kpps problem is gone.



Re: taskctx and revisiting if_start serialisation

2015-12-06 Thread Hrvoje Popovski
On 6.12.2015. 5:00, David Gwynne wrote:
> the current code for serialising if_start calls for mpsafe nics does what it 
> says.
> 
> however, kettenis realised it doesnt help us much when we're trying
> to coordinate between the start and txeof side of a driver when
> setting or clearing oactive. in particular, a start routine can
> figure out there's no more space, and then set oactive. txeof could
> be running on another cpu emptying the ring and clearing it. if
> that clear runs in between the other cpus space check and
> ifq_set_oactive, then the nic will be marked full and the stack
> wont ever call start on it again.
> 
> so it can be argued that start and txeof should be serialised.
> indeed, other platforms do exactly that.
> 
> the least worst mechanism we have for doing that is taskqs. however,
> all my experiments deferring start to a taskq end up significantly
> hurting performance.
> 
> dragonfly appears to have some of the semantics we want. according
> to sephe, start and txeof are serialised, but they can be directly
> called from anywhere. however, if one cpu is trying to run start
> while the other is in txeof, it figures it out and makes the other
> cpu run txeof on the first cpus behalf. the first cpu then simply
> returns cos it knows the other cpu will end up doing the work.
> 
> the implementation is tied very much to that specific situation,
> and its hard for me to grok cos im not familiar with their locking
> infrastructure.
> 
> the dfly code has the (slight) caveat that you cant run txeof and
> start concurrently, it forces them to be serialised.
> 
> while toying with ideas on how to solve kettenis' oactive problem,
> i came up with the following.
> 
> it combines tasks with direct dispatch, and borrows the current
> ifq_serialiser/pool/scsi serialisation algorithm.
> 
> the idea is you have a taskctx, which represents a serialising
> context for tasks. tasks are submitted to the taskctx, and the code
> will try to run the tasks immediately rather than defer them to a
> thread. if there is contention on the context, the contending cpu
> yields after queueing the task because the other cpu is responsible
> for running all pending tasks to completion.
> 
> it also simplifies the barrier operations a lot.
> 
> the diff below implements a generic taskctx framework, and cuts the
> mpsafe if_start() implementation over to it.
> 
> myx is also changed to only clr oactive from within the taskctx
> serialiser, thereby avoiding the race, but keeps the bulk of txeof
> outside the serialiser so it can run concurrently with start.
> 
> other nics are free to serialise start and txeof within the
> ifq_serializer if they want, or not, it is up to them.
> 
> thoughts? tests? opinions on messy .h files?


Hi,

after applying these patches on top of cvs source from a few hours ago
(no additional patches for ix and em) it seems that something isn't
right...

freshly rebooted box, sending 2Mpps over ix (82599) and i'm getting
around 50kpps on the receiver... over x540 around 100kpps
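
the oactive race dlg describes in the quoted text above, written out
as an interleaving for clarity (illustrative only):

/*
 *   cpu0 (start)                     cpu1 (txeof)
 *   ------------                     ------------
 *   sees the ring is full
 *                                    empties the ring
 *                                    ifq_clr_oactive()
 *   ifq_set_oactive()
 *
 * oactive is now set with an empty ring and nothing left to clear
 * it, so the stack never calls the start routine for this nic again.
 */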






Re: taskctx and revisiting if_start serialisation

2015-12-06 Thread Hrvoje Popovski
On 6.12.2015. 15:56, Hrvoje Popovski wrote:
> On 6.12.2015. 5:00, David Gwynne wrote:
>> the current code for serialising if_start calls for mpsafe nics does what it 
>> says.
>>
>> [...]
>>
>> thoughts? tests? opinions on messy .h files?
> 
> 
> Hi,
> 
> after applying these patches on top of cvs source from a few hours ago
> (no additional patches for ix and em) it seems that something isn't
> right...
> 
> freshly rebooted box, sending 2Mpps over ix (82599) and i'm getting
> around 50kpps on the receiver... over x540 around 100kpps


on the other box i have bge (5270) and everything seems fine... sending
1Mpps and getting about 650kpps




Re: taskctx and revisiting if_start serialisation

2015-12-06 Thread Norman Golisz
Hi,

I can't comment on the code itself, but there's

>  The
>  taskq
>  API provides a mechanism to defer work to a process context.
>  .Pp
> +The
> +taskctx
> +API provides a mechanism to serialise work in a single context.
> +A taskctx guarantees that all work submitted to it will not run
> +concurrently and can therefore provide exclusive access to a resource.
> +It attempts to run the submitted work immediately, unless another
> +another CPU is already running work in the taskctx.
   ^^^
one "another" too many.