> On 14 Nov 2017, at 23:17, Martin Pieuchot <[email protected]> wrote:
> 
> On 14/11/17(Tue) 14:42, David Gwynne wrote:
>> this replaces the single mbuf_queue and task in struct ifnet with
>> a new ifiqueue structure modelled on ifqueues.
> 
> The name is confusing, should we rename ifqueues 'ifoqueue' then?

yes.

> 
>> the main motivation behind this was to show mpsafe input counters.
> 
> I like that you're sharing the whole direction where you're going.
> Nice to know we don't need to address input counters :)
> 
>> ifiqueues, like ifqueues, allow a driver to configure multiple
>> queues. this in turn allows a driver with multiple rx rings to
>> have them queue and account for packets independently.
> 
> How does a driver decide to use multiple queues?  Aren't we missing
> the interrupt on !CPU0 bits?  Once that's done how do we split traffic?

i don't know how to answer how a driver decides to use multiple queues, but i
can show how it can use them once the decision is made. something like this:

        if_attach_queues(ifp, sc->num_queues);
        if_attach_iqueues(ifp, sc->num_queues);
        for (i = 0; i < sc->num_queues; i++) {
                struct ifqueue *ifq = ifp->if_ifqs[i];
                struct tx_ring *txr = &sc->tx_rings[i];
                struct ifiqueue *ifiq = ifp->if_iqs[i];
                struct rx_ring *rxr = &sc->rx_rings[i];

                ifq->ifq_softc = txr;
                txr->ifq = ifq;

                ifiq->ifiq_softc = rxr;
                rxr->ifiq = ifiq;
        }


> 
>> ifiq_input generally replaces if_input. if_input now simply queues
>> on ifnets builtin ifiqueue (ifp->if_rcv) to support the current set
>> of drivers that only configure a single rx context/ring.
> 
> So we always use at least `ifp->if_rcv'.  Can't we add an argument to
> if_attach_common() rather than adding a new interface?

do you want to make if_rcv optional rather than unconditional?

> 
>> ifiq counters are updated and read using the same semantic as percpu
>> counters, ie, there's a generation number updated before and after
>> a counter update. this means writers don't block, but a reader may
>> loop a couple of times waiting for a writer to finish.
> 
> Why do we need yet another copy of counters_enter()/leave()?  Something
> feels wrong.  I *know* it is not per-CPU memory, but I wish we could do
> better.  Otherwise we will end up with per data structure counter API.

i got rid of this in the next diff i sent.

if you're saying it would be nice to factor generation-based counter
updates/reads out, i agree. apart from this there hasn't been a need for it
though.

> 
>> loop a couple of times waiting for a writer to finish. readers call
>> yield(), which causes splasserts to fire if if_getdata is still
>> holding the kernel lock.
> 
> You mean the NET_LOCK(), well this should be easily removed.  tb@ has a
> diff that should go in then you can remove the NET_LOCK()/UNLOCK() dance
> around if_getdata().

it's moving to^W^R^R^Red to RLOCK?

> 
>> ifiq_input is set up to interact with the interface rx ring moderation
>> code (ie, if_rxring code). you pass what the current rxr watermark
>> is, and it will look at the backlog of packets on that ifiq to
>> determine if the ring is producing too fast. if it is, it'll return
>> 1 to slow down packet reception. i have a later diff that adds
>> functionality to the if_rxring bits so a driver can say a ring
>> is livelocked, rather than relying on the system to detect it. so
>> drv_rxeof would have the following at the end of it:
>> 
>>      if (ifiq_input(rxr->ifiq, &ml, if_rxr_cwm(&rxr->rx_ring)))
>>              if_rxr_livelocked(&rxr->rx_ring);
>>      drv_rx_refill(rxr);
> 
> This magic it worth discussing in a diff doing only that :)

i'll send if_rxr_livelocked and if_rxr_cwm out today.

> 
>> ive run with that on a hacked up ix(4) that runs with 8 rx rings,
>> and this works pretty well. if you're doing a single stream, you
>> see one rx ring grow the number of descs it will handle, but if you
>> flood it with a range of traffic you'll see that one ring scale
>> down and balance out with the rest of the rings. it turns out you
>> can still get reasonable throughput even if the ifiqs are dynamically
>> scaling themselves to only 100 packets. however, the interactivity
>> of the system improves a lot.
> 
> So you're now introducing MCLGETI(9) back with support for multiple
> ring.  Nice.

yes :D :D

> 
>> currently if one interface is being DoSed, it'll end up with 8192
>> packets on if_inputqueue. that takes a while to process those packets,
>> which blocks the processing of packets from the other interface.
>> by scaling the input queues to relatively small counts, softnet can
>> service packets from other interfaces sooner.
> 
> I see a lot of good stuff in this diff.  I like your direction.  I'm
> not sure how/when you plan to add support for multiple input rings.
> 
> Now I'm afraid of a single diff doing refactoring + iqueues + counter +
> MCLGETI :o)

the update to this does not include counters, and rxring moderation is 
implemented outside this.

dlg

> 
>> 
>> Index: if.c
>> ===================================================================
>> RCS file: /cvs/src/sys/net/if.c,v
>> retrieving revision 1.527
>> diff -u -p -r1.527 if.c
>> --- if.c     14 Nov 2017 04:08:11 -0000      1.527
>> +++ if.c     14 Nov 2017 04:18:41 -0000
>> @@ -156,7 +156,6 @@ int      if_group_egress_build(void);
>> 
>> void if_watchdog_task(void *);
>> 
>> -void        if_input_process(void *);
>> void if_netisr(void *);
>> 
>> #ifdef DDB
>> @@ -437,8 +436,6 @@ if_attachsetup(struct ifnet *ifp)
>> 
>>      ifidx = ifp->if_index;
>> 
>> -    mq_init(&ifp->if_inputqueue, 8192, IPL_NET);
>> -    task_set(ifp->if_inputtask, if_input_process, (void *)ifidx);
>>      task_set(ifp->if_watchdogtask, if_watchdog_task, (void *)ifidx);
>>      task_set(ifp->if_linkstatetask, if_linkstate_task, (void *)ifidx);
>> 
>> @@ -563,6 +560,30 @@ if_attach_queues(struct ifnet *ifp, unsi
>> }
>> 
>> void
>> +if_attach_iqueues(struct ifnet *ifp, unsigned int niqs)
>> +{
>> +    struct ifiqueue **map;
>> +    struct ifiqueue *ifiq;
>> +    unsigned int i;
>> +
>> +    KASSERT(niqs != 0);
>> +
>> +    map = mallocarray(niqs, sizeof(*map), M_DEVBUF, M_WAITOK);
>> +
>> +    ifp->if_rcv.ifiq_softc = NULL;
>> +    map[0] = &ifp->if_rcv;
>> +
>> +    for (i = 1; i < niqs; i++) {
>> +            ifiq = malloc(sizeof(*ifiq), M_DEVBUF, M_WAITOK|M_ZERO);
>> +            ifiq_init(ifiq, ifp, i);
>> +            map[i] = ifiq;
>> +    }
>> +
>> +    ifp->if_iqs = map;
>> +    ifp->if_niqs = niqs;
>> +}
>> +
>> +void
>> if_attach_common(struct ifnet *ifp)
>> {
>>      KASSERT(ifp->if_ioctl != NULL);
>> @@ -587,6 +608,12 @@ if_attach_common(struct ifnet *ifp)
>>      ifp->if_ifqs = ifp->if_snd.ifq_ifqs;
>>      ifp->if_nifqs = 1;
>> 
>> +    ifiq_init(&ifp->if_rcv, ifp, 0);
>> +
>> +    ifp->if_rcv.ifiq_ifiqs[0] = &ifp->if_rcv;
>> +    ifp->if_iqs = ifp->if_rcv.ifiq_ifiqs;
>> +    ifp->if_niqs = 1;
>> +
>>      ifp->if_addrhooks = malloc(sizeof(*ifp->if_addrhooks),
>>          M_TEMP, M_WAITOK);
>>      TAILQ_INIT(ifp->if_addrhooks);
>> @@ -605,8 +632,6 @@ if_attach_common(struct ifnet *ifp)
>>          M_TEMP, M_WAITOK|M_ZERO);
>>      ifp->if_linkstatetask = malloc(sizeof(*ifp->if_linkstatetask),
>>          M_TEMP, M_WAITOK|M_ZERO);
>> -    ifp->if_inputtask = malloc(sizeof(*ifp->if_inputtask),
>> -        M_TEMP, M_WAITOK|M_ZERO);
>>      ifp->if_llprio = IFQ_DEFPRIO;
>> 
>>      SRPL_INIT(&ifp->if_inputs);
>> @@ -694,47 +719,7 @@ if_enqueue(struct ifnet *ifp, struct mbu
>> void
>> if_input(struct ifnet *ifp, struct mbuf_list *ml)
>> {
>> -    struct mbuf *m;
>> -    size_t ibytes = 0;
>> -#if NBPFILTER > 0
>> -    caddr_t if_bpf;
>> -#endif
>> -
>> -    if (ml_empty(ml))
>> -            return;
>> -
>> -    MBUF_LIST_FOREACH(ml, m) {
>> -            m->m_pkthdr.ph_ifidx = ifp->if_index;
>> -            m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
>> -            ibytes += m->m_pkthdr.len;
>> -    }
>> -
>> -    ifp->if_ipackets += ml_len(ml);
>> -    ifp->if_ibytes += ibytes;
>> -
>> -#if NBPFILTER > 0
>> -    if_bpf = ifp->if_bpf;
>> -    if (if_bpf) {
>> -            struct mbuf_list ml0;
>> -
>> -            ml_init(&ml0);
>> -            ml_enlist(&ml0, ml);
>> -            ml_init(ml);
>> -
>> -            while ((m = ml_dequeue(&ml0)) != NULL) {
>> -                    if (bpf_mtap_ether(if_bpf, m, BPF_DIRECTION_IN))
>> -                            m_freem(m);
>> -                    else
>> -                            ml_enqueue(ml, m);
>> -            }
>> -
>> -            if (ml_empty(ml))
>> -                    return;
>> -    }
>> -#endif
>> -
>> -    if (mq_enlist(&ifp->if_inputqueue, ml) == 0)
>> -            task_add(net_tq(ifp->if_index), ifp->if_inputtask);
>> +    ifiq_input(&ifp->if_rcv, ml, 2048);
>> }
>> 
>> int
>> @@ -789,6 +774,24 @@ if_input_local(struct ifnet *ifp, struct
>>      return (0);
>> }
>> 
>> +int
>> +if_output_local(struct ifnet *ifp, struct mbuf *m, sa_family_t af)
>> +{
>> +    struct ifiqueue *ifiq;
>> +    unsigned int flow = 0;
>> +
>> +    m->m_pkthdr.ph_family = af;
>> +    m->m_pkthdr.ph_ifidx = ifp->if_index;
>> +    m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
>> +
>> +    if (ISSET(m->m_pkthdr.ph_flowid, M_FLOWID_VALID))
>> +            flow = m->m_pkthdr.ph_flowid & M_FLOWID_MASK;
>> +
>> +    ifiq = ifp->if_iqs[flow % ifp->if_niqs];
>> +
>> +    return (ifiq_enqueue(ifiq, m) == 0 ? 0 : ENOBUFS);
>> +}
>> +
>> struct ifih {
>>      SRPL_ENTRY(ifih)          ifih_next;
>>      int                     (*ifih_input)(struct ifnet *, struct mbuf *,
>> @@ -873,26 +876,18 @@ if_ih_remove(struct ifnet *ifp, int (*in
>> }
>> 
>> void
>> -if_input_process(void *xifidx)
>> +if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
>> {
>> -    unsigned int ifidx = (unsigned long)xifidx;
>> -    struct mbuf_list ml;
>>      struct mbuf *m;
>> -    struct ifnet *ifp;
>>      struct ifih *ifih;
>>      struct srp_ref sr;
>>      int s;
>> 
>> -    ifp = if_get(ifidx);
>> -    if (ifp == NULL)
>> +    if (ml_empty(ml))
>>              return;
>> 
>> -    mq_delist(&ifp->if_inputqueue, &ml);
>> -    if (ml_empty(&ml))
>> -            goto out;
>> -
>>      if (!ISSET(ifp->if_xflags, IFXF_CLONED))
>> -            add_net_randomness(ml_len(&ml));
>> +            add_net_randomness(ml_len(ml));
>> 
>>      /*
>>       * We grab the NET_LOCK() before processing any packet to
>> @@ -908,7 +903,7 @@ if_input_process(void *xifidx)
>>       */
>>      NET_RLOCK();
>>      s = splnet();
>> -    while ((m = ml_dequeue(&ml)) != NULL) {
>> +    while ((m = ml_dequeue(ml)) != NULL) {
>>              /*
>>               * Pass this mbuf to all input handlers of its
>>               * interface until it is consumed.
>> @@ -924,8 +919,6 @@ if_input_process(void *xifidx)
>>      }
>>      splx(s);
>>      NET_RUNLOCK();
>> -out:
>> -    if_put(ifp);
>> }
>> 
>> void
>> @@ -1033,10 +1026,6 @@ if_detach(struct ifnet *ifp)
>>      ifp->if_ioctl = if_detached_ioctl;
>>      ifp->if_watchdog = NULL;
>> 
>> -    /* Remove the input task */
>> -    task_del(net_tq(ifp->if_index), ifp->if_inputtask);
>> -    mq_purge(&ifp->if_inputqueue);
>> -
>>      /* Remove the watchdog timeout & task */
>>      timeout_del(ifp->if_slowtimo);
>>      task_del(net_tq(ifp->if_index), ifp->if_watchdogtask);
>> @@ -1090,7 +1079,6 @@ if_detach(struct ifnet *ifp)
>>      free(ifp->if_slowtimo, M_TEMP, sizeof(*ifp->if_slowtimo));
>>      free(ifp->if_watchdogtask, M_TEMP, sizeof(*ifp->if_watchdogtask));
>>      free(ifp->if_linkstatetask, M_TEMP, sizeof(*ifp->if_linkstatetask));
>> -    free(ifp->if_inputtask, M_TEMP, sizeof(*ifp->if_inputtask));
>> 
>>      for (i = 0; (dp = domains[i]) != NULL; i++) {
>>              if (dp->dom_ifdetach && ifp->if_afdata[dp->dom_family])
>> @@ -1113,6 +1101,17 @@ if_detach(struct ifnet *ifp)
>>              free(ifp->if_ifqs, M_DEVBUF,
>>                  sizeof(struct ifqueue *) * ifp->if_nifqs);
>>      }
>> +
>> +    for (i = 0; i < ifp->if_niqs; i++)
>> +            ifiq_destroy(ifp->if_iqs[i]);
>> +    if (ifp->if_iqs != ifp->if_rcv.ifiq_ifiqs) {
>> +            for (i = 1; i < ifp->if_niqs; i++) {
>> +                    free(ifp->if_iqs[i], M_DEVBUF,
>> +                        sizeof(struct ifiqueue));
>> +            }
>> +            free(ifp->if_iqs, M_DEVBUF,
>> +                sizeof(struct ifiqueue *) * ifp->if_niqs);
>> +    }
>> }
>> 
>> /*
>> @@ -2280,11 +2279,21 @@ if_getdata(struct ifnet *ifp, struct if_
>> 
>>      *data = ifp->if_data;
>> 
>> +    NET_UNLOCK();
>> +
>>      for (i = 0; i < ifp->if_nifqs; i++) {
>>              struct ifqueue *ifq = ifp->if_ifqs[i];
>> 
>>              ifq_add_data(ifq, data);
>>      }
>> +
>> +    for (i = 0; i < ifp->if_niqs; i++) {
>> +            struct ifiqueue *ifiq = ifp->if_iqs[i];
>> +
>> +            ifiq_add_data(ifiq, data);
>> +    }
>> +
>> +    NET_LOCK();
>> }
>> 
>> /*
>> Index: if_var.h
>> ===================================================================
>> RCS file: /cvs/src/sys/net/if_var.h,v
>> retrieving revision 1.83
>> diff -u -p -r1.83 if_var.h
>> --- if_var.h 31 Oct 2017 22:05:12 -0000      1.83
>> +++ if_var.h 14 Nov 2017 04:18:41 -0000
>> @@ -140,8 +140,6 @@ struct ifnet {                           /* and the 
>> entries */
>>      struct  task *if_linkstatetask; /* task to do route updates */
>> 
>>      /* procedure handles */
>> -    struct mbuf_queue if_inputqueue;
>> -    struct task *if_inputtask;      /* input task */
>>      SRPL_HEAD(, ifih) if_inputs;    /* input routines (dequeue) */
>> 
>>                                      /* output routine (enqueue) */
>> @@ -164,6 +162,10 @@ struct ifnet {                          /* and the 
>> entries */
>>      void    (*if_qstart)(struct ifqueue *);
>>      unsigned int if_nifqs;
>> 
>> +    struct  ifiqueue if_rcv;        /* rx/input queue */
>> +    struct  ifiqueue **if_iqs;      /* pointer to the array of iqs */
>> +    unsigned int if_niqs;
>> +
>>      struct sockaddr_dl *if_sadl;    /* pointer to our sockaddr_dl */
>> 
>>      void    *if_afdata[AF_MAX];
>> @@ -303,7 +305,9 @@ void     if_start(struct ifnet *);
>> int  if_enqueue_try(struct ifnet *, struct mbuf *);
>> int  if_enqueue(struct ifnet *, struct mbuf *);
>> void if_input(struct ifnet *, struct mbuf_list *);
>> +void        if_input_process(struct ifnet *, struct mbuf_list *);
>> int  if_input_local(struct ifnet *, struct mbuf *, sa_family_t);
>> +int if_output_local(struct ifnet *, struct mbuf *, sa_family_t);
>> void if_rtrequest_dummy(struct ifnet *, int, struct rtentry *);
>> void p2p_rtrequest(struct ifnet *, int, struct rtentry *);
>> 
>> Index: ifq.c
>> ===================================================================
>> RCS file: /cvs/src/sys/net/ifq.c,v
>> retrieving revision 1.14
>> diff -u -p -r1.14 ifq.c
>> --- ifq.c    14 Nov 2017 04:08:11 -0000      1.14
>> +++ ifq.c    14 Nov 2017 04:18:41 -0000
>> @@ -16,15 +16,22 @@
>>  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>>  */
>> 
>> +#include "bpfilter.h"
>> +
>> #include <sys/param.h>
>> #include <sys/systm.h>
>> #include <sys/socket.h>
>> #include <sys/mbuf.h>
>> #include <sys/proc.h>
>> +#include <sys/atomic.h>
>> 
>> #include <net/if.h>
>> #include <net/if_var.h>
>> 
>> +#if NBPFILTER > 0
>> +#include <net/bpf.h>
>> +#endif
>> +
>> /*
>>  * priq glue
>>  */
>> @@ -457,6 +464,223 @@ ifq_mfreeml(struct ifqueue *ifq, struct 
>>      ifq->ifq_len -= ml_len(ml);
>>      ifq->ifq_qdrops += ml_len(ml);
>>      ml_enlist(&ifq->ifq_free, ml);
>> +}
>> +
>> +/*
>> + * ifiq
>> + */
>> +
>> +struct ifiq_ref {
>> +    unsigned int gen;
>> +}; /* __upunused */
>> +
>> +static void ifiq_process(void *);
>> +
>> +void
>> +ifiq_init(struct ifiqueue *ifiq, struct ifnet *ifp, unsigned int idx)
>> +{
>> +    ifiq->ifiq_if = ifp;
>> +    ifiq->ifiq_softnet = net_tq(ifp->if_index); /* + idx */
>> +    ifiq->ifiq_softc = NULL;
>> +
>> +    mtx_init(&ifiq->ifiq_mtx, IPL_NET);
>> +    ml_init(&ifiq->ifiq_ml);
>> +    task_set(&ifiq->ifiq_task, ifiq_process, ifiq);
>> +
>> +    ifiq->ifiq_qdrops = 0;
>> +    ifiq->ifiq_packets = 0;
>> +    ifiq->ifiq_bytes = 0;
>> +    ifiq->ifiq_qdrops = 0;
>> +    ifiq->ifiq_errors = 0;
>> +
>> +    ifiq->ifiq_idx = idx;
>> +}
>> +
>> +void
>> +ifiq_destroy(struct ifiqueue *ifiq)
>> +{
>> +    if (!task_del(ifiq->ifiq_softnet, &ifiq->ifiq_task)) {
>> +            int netlocked = (rw_status(&netlock) == RW_WRITE);
>> +
>> +            if (netlocked) /* XXXSMP breaks atomicity */
>> +                    NET_UNLOCK();
> 
> This isn't call with the NET_LOCK() held. 
> 
>> +
>> +            taskq_barrier(ifiq->ifiq_softnet);
>> +
>> +            if (netlocked)
>> +                    NET_LOCK();
>> +    }
>> +
>> +    /* don't need to lock because this is the last use of the ifiq */
>> +    ml_purge(&ifiq->ifiq_ml);
>> +}
>> +
>> +static inline void
>> +ifiq_enter(struct ifiq_ref *ref, struct ifiqueue *ifiq)
>> +{
>> +    ref->gen = ++ifiq->ifiq_gen;
>> +    membar_producer();
>> +}
>> +
>> +static inline void
>> +ifiq_leave(struct ifiq_ref *ref, struct ifiqueue *ifiq)
>> +{
>> +    membar_producer();
>> +    ifiq->ifiq_gen = ++ref->gen;
>> +}
>> +
>> +static inline void
>> +ifiq_count(struct ifiqueue *ifiq, uint64_t packets, uint64_t bytes)
>> +{
>> +    struct ifiq_ref ref;
>> +
>> +    ifiq_enter(&ref, ifiq);
>> +    ifiq->ifiq_packets += packets;
>> +    ifiq->ifiq_bytes += bytes;
>> +    ifiq_leave(&ref, ifiq);
>> +}
>> +
>> +static inline void
>> +ifiq_qdrop(struct ifiqueue *ifiq, struct mbuf_list *ml)
>> +{
>> +    struct ifiq_ref ref;
>> +    unsigned int qdrops;
>> +
>> +    qdrops = ml_purge(ml);
>> +
>> +    ifiq_enter(&ref, ifiq);
>> +    ifiq->ifiq_qdrops += qdrops;
>> +    ifiq_leave(&ref, ifiq);
>> +}
>> +
>> +int
>> +ifiq_input(struct ifiqueue *ifiq, struct mbuf_list *ml, unsigned int cwm)
>> +{
>> +    struct ifnet *ifp = ifiq->ifiq_if;
>> +    struct mbuf *m;
>> +    uint64_t bytes = 0;
>> +#if NBPFILTER > 0
>> +    caddr_t if_bpf;
>> +#endif
>> +    int rv;
>> +
>> +    if (ml_empty(ml))
>> +            return (0);
>> +
>> +    MBUF_LIST_FOREACH(ml, m) {
>> +            m->m_pkthdr.ph_ifidx = ifp->if_index;
>> +            m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
>> +            bytes += m->m_pkthdr.len;
>> +    }
>> +
>> +    ifiq_count(ifiq, ml_len(ml), bytes);
> 
> No need to add a function which is used only once.
> 
>> +
>> +#if NBPFILTER > 0
>> +    if_bpf = ifp->if_bpf;
>> +    if (if_bpf) {
>> +            struct mbuf_list ml0 = *ml;
>> +
>> +            ml_init(ml);
>> +
>> +            while ((m = ml_dequeue(&ml0)) != NULL) {
>> +                    if (bpf_mtap_ether(if_bpf, m, BPF_DIRECTION_IN))
>> +                            m_freem(m);
>> +                    else
>> +                            ml_enqueue(ml, m);
>> +            }
>> +
>> +            if (ml_empty(ml))
>> +                    return (0);
>> +    }
>> +#endif
>> +
>> +    if (ifiq_len(ifiq) >= cwm * 5) {
>> +            /* the backlog is way too high, so drop these packets */
>> +            ifiq_qdrop(ifiq, ml);
> 
> This function is also used only once.
> 
>> +            return (1); /* tell the caller to slow down */
>> +    }
>> +
>> +    /* tell the caller to slow down if the backlog is getting high */
>> +    rv = (ifiq_len(ifiq) >= cwm * 3);
>> +
>> +    mtx_enter(&ifiq->ifiq_mtx);
>> +    ml_enlist(&ifiq->ifiq_ml, ml);
>> +    mtx_leave(&ifiq->ifiq_mtx);
>> +
>> +    task_add(ifiq->ifiq_softnet, &ifiq->ifiq_task);
>> +
>> +    return (rv);
>> +}
>> +
>> +void
>> +ifiq_add_data(struct ifiqueue *ifiq, struct if_data *data)
>> +{
>> +    unsigned int enter, leave;
>> +    uint64_t packets, bytes, qdrops;
>> +
>> +    enter = ifiq->ifiq_gen;
>> +    for (;;) {
>> +            /* the generation number is odd during an update */
>> +            while (enter & 1) {
>> +                    yield();
>> +                    enter = ifiq->ifiq_gen;
>> +            }
>> +
>> +            membar_consumer();
>> +            packets = ifiq->ifiq_packets;
>> +            bytes = ifiq->ifiq_bytes;
>> +            qdrops = ifiq->ifiq_qdrops;
>> +            membar_consumer();
>> +
>> +            leave = ifiq->ifiq_gen;
>> +
>> +            if (enter == leave)
>> +                    break;
>> +
>> +            enter = leave;
>> +    }
>> +
>> +    data->ifi_ipackets += packets;
>> +    data->ifi_ibytes += bytes;
>> +    data->ifi_iqdrops += qdrops;
>> +}
>> +
>> +void
>> +ifiq_barrier(struct ifiqueue *ifiq)
>> +{
>> +    if (!task_del(ifiq->ifiq_softnet, &ifiq->ifiq_task))
>> +            taskq_barrier(ifiq->ifiq_softnet);
>> +}
>> +
>> +int
>> +ifiq_enqueue(struct ifiqueue *ifiq, struct mbuf *m)
>> +{
>> +    /* this can be called from anywhere at any time, so must lock */
>> +
>> +    mtx_enter(&ifiq->ifiq_mtx);
>> +    ml_enqueue(&ifiq->ifiq_ml, m);
>> +    mtx_leave(&ifiq->ifiq_mtx);
>> +
>> +    task_add(ifiq->ifiq_softnet, &ifiq->ifiq_task);
>> +
>> +    return (0);
>> +}
>> +
>> +static void
>> +ifiq_process(void *arg)
>> +{
>> +    struct ifiqueue *ifiq = arg;
>> +    struct mbuf_list ml;
>> +
>> +    if (ifiq_empty(ifiq))
>> +            return;
>> +
>> +    mtx_enter(&ifiq->ifiq_mtx);
>> +    ml = ifiq->ifiq_ml;
>> +    ml_init(&ifiq->ifiq_ml);
>> +    mtx_leave(&ifiq->ifiq_mtx);
>> +
>> +    if_input_process(ifiq->ifiq_if, &ml);
>> }
>> 
>> /*
>> Index: ifq.h
>> ===================================================================
>> RCS file: /cvs/src/sys/net/ifq.h,v
>> retrieving revision 1.15
>> diff -u -p -r1.15 ifq.h
>> --- ifq.h    14 Nov 2017 04:08:11 -0000      1.15
>> +++ ifq.h    14 Nov 2017 04:18:41 -0000
>> @@ -69,6 +69,34 @@ struct ifqueue {
>>      unsigned int             ifq_idx;
>> };
>> 
>> +struct ifiqueue {
>> +    struct ifnet            *ifiq_if;
>> +    struct taskq            *ifiq_softnet;
>> +    union {
>> +            void                    *_ifiq_softc;
>> +            struct ifiqueue         *_ifiq_ifiqs[1];
>> +    } _ifiq_ptr;
>> +#define ifiq_softc           _ifiq_ptr._ifiq_softc
>> +#define ifiq_ifiqs           _ifiq_ptr._ifiq_ifiqs
>> +
>> +    struct mutex             ifiq_mtx;
>> +    struct mbuf_list         ifiq_ml;
>> +    struct task              ifiq_task;
>> +
>> +    /* counters */
>> +    unsigned int             ifiq_gen;
>> +
>> +    uint64_t                 ifiq_packets;
>> +    uint64_t                 ifiq_bytes;
>> +    uint64_t                 ifiq_qdrops;
>> +    uint64_t                 ifiq_errors;
>> +    uint64_t                 ifiq_mcasts;
>> +    uint64_t                 ifiq_noproto;
>> +
>> +    /* properties */
>> +    unsigned int             ifiq_idx;
>> +};
>> +
>> #ifdef _KERNEL
>> 
>> #define IFQ_MAXLEN           256
>> @@ -432,6 +460,18 @@ ifq_idx(struct ifqueue *ifq, unsigned in
>> #define IFQ_ASSERT_SERIALIZED(_ifq)  KASSERT(ifq_is_serialized(_ifq))
>> 
>> extern const struct ifq_ops * const ifq_priq_ops;
>> +
>> +/* ifiq */
>> +
>> +void                 ifiq_init(struct ifiqueue *, struct ifnet *, unsigned 
>> int);
>> +void                 ifiq_destroy(struct ifiqueue *);
>> +int          ifiq_input(struct ifiqueue *, struct mbuf_list *,
>> +                 unsigned int);
>> +int          ifiq_enqueue(struct ifiqueue *, struct mbuf *);
>> +void                 ifiq_add_data(struct ifiqueue *, struct if_data *);
>> +
>> +#define     ifiq_len(_ifiq)                 ml_len(&(_ifiq)->ifiq_ml)
>> +#define     ifiq_empty(_ifiq)               ml_empty(&(_ifiq)->ifiq_ml)
>> 
>> #endif /* _KERNEL */
>> 
>> Index: if_loop.c
>> ===================================================================
>> RCS file: /cvs/src/sys/net/if_loop.c,v
>> retrieving revision 1.83
>> diff -u -p -r1.83 if_loop.c
>> --- if_loop.c        31 Oct 2017 22:05:12 -0000      1.83
>> +++ if_loop.c        14 Nov 2017 04:18:41 -0000
>> @@ -241,12 +241,7 @@ looutput(struct ifnet *ifp, struct mbuf 
>>      if ((m->m_flags & M_LOOP) == 0)
>>              return (if_input_local(ifp, m, dst->sa_family));
>> 
>> -    m->m_pkthdr.ph_family = dst->sa_family;
>> -    if (mq_enqueue(&ifp->if_inputqueue, m))
>> -            return ENOBUFS;
>> -    task_add(net_tq(ifp->if_index), ifp->if_inputtask);
>> -
>> -    return (0);
>> +    return (if_output_local(ifp, m, dst->sa_family));
>> }
>> 
>> void
>> 
