On Fri, Apr 12, 2019 at 2:12 PM Ilya Maximets <[email protected]>
wrote:
> On 11.04.2019 17:00, David Marchand wrote:
> > We currently poll all available queues based on the max queue count
> > exchanged with the vhost peer and rely on the vhost library in DPDK to
> > check the vring status beneath.
> > This can lead to some overhead when we have a lot of unused queues.
> >
> > To improve the situation, we can skip the disabled queues.
> > On rxq state notifications, we bump the netdev's change_seq number so
> > that the pmd thread main loop can periodically refresh its cached
> > queue states.
> >
> > $ ovs-appctl dpif-netdev/pmd-rxq-show
> > pmd thread numa_id 0 core_id 1:
> > isolated : true
> > port: dpdk0 queue-id: 0 pmd usage: 0 % polling: enabled
> > pmd thread numa_id 0 core_id 2:
> > isolated : true
> > port: vhost1 queue-id: 0 pmd usage: 0 % polling: disabled
> > port: vhost3 queue-id: 0 pmd usage: 0 % polling: disabled
> > pmd thread numa_id 0 core_id 15:
> > isolated : true
> > port: dpdk1 queue-id: 0 pmd usage: 0 % polling: enabled
> > pmd thread numa_id 0 core_id 16:
> > isolated : true
> > port: vhost0 queue-id: 0 pmd usage: 0 % polling: disabled
> > port: vhost2 queue-id: 0 pmd usage: 0 % polling: disabled
> >
> > $ while true; do
> > ovs-appctl dpif-netdev/pmd-rxq-show |awk '
> > /polling: / {
> > tot++;
> > if ($NF == "enabled") {
> > en++;
> > }
> > }
> > END {
> > print "total: " tot ", enabled: " en
> > }'
> > sleep 1
> > done
> >
> > total: 6, enabled: 2
> > total: 6, enabled: 2
> > ...
> >
> > # Started vm, virtio devices are bound to kernel driver which enables
> > # F_MQ + all queue pairs
> > total: 6, enabled: 2
> > total: 66, enabled: 66
> > ...
> >
> > # Unbound vhost0 and vhost1 from the kernel driver
> > total: 66, enabled: 66
> > total: 66, enabled: 34
> > ...
> >
> > # Configured kernel bound devices to use only 1 queue pair
> > total: 66, enabled: 34
> > total: 66, enabled: 19
> > total: 66, enabled: 4
> > ...
> >
> > # While rebooting the vm
> > total: 66, enabled: 4
> > total: 66, enabled: 2
> > ...
> > total: 66, enabled: 66
> > ...
> >
> > # After shutting down the vm
> > total: 66, enabled: 66
> > total: 66, enabled: 2
> >
> > Signed-off-by: David Marchand <[email protected]>
> > ---
>
> Hi.
> One important issue:
> You need to enable queue #0 in case there were no state changes.
> We're doing this for Tx queues in 'dpdk_vhost_reconfigure_helper'.
> This is because, if F_PROTOCOL_FEATURES is not negotiated, vrings are
> created in the enabled state. This is required to support older QEMU or
> legacy drivers.
>
Ah, OK, that explains the txq0 special treatment.
Thanks.
> I'm working on a DPDK patch to trigger 'vring_state_changed' in this
> case; however, it could take a long time or might never be accepted
> into LTS.
> I'd also prefer not to check for negotiated features in OVS code.
> I think this kind of stuff should be hidden from the application by
> the vhost library.
>
Ok, we will have to keep this workaround anyway.
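For v2, I will mirror the existing txq0 default in
'dpdk_vhost_reconfigure_helper'. A minimal sketch, using the naming from
this patch (the final form may differ):

    /* If F_PROTOCOL_FEATURES was not negotiated, no vring_state_changed()
     * callback ever fires, so keep the first rx queue pollable by default,
     * same reasoning as the existing txq0 treatment. */
    dev->rx_q[0].enabled = true;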
>
> More comments inline.
>
> Also, you might want to add Ian to the Cc list when sending userspace
> datapath related patches.
>
I dropped the previous Cc: list by mistake.
Sure, I will keep him in Cc.
> > lib/dpif-netdev.c | 27 ++++++++++++++++++++
> > lib/netdev-dpdk.c | 71 +++++++++++++++++++++++++++++++++++++++++----------
> > lib/netdev-provider.h | 5 ++++
> > lib/netdev.c | 10 ++++++++
> > lib/netdev.h | 1 +
> > 5 files changed, 101 insertions(+), 13 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 4d6d0c3..c95612f 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -591,6 +591,8 @@ struct polled_queue {
> > struct dp_netdev_rxq *rxq;
> > odp_port_t port_no;
> > bool emc_enabled;
> > + bool enabled;
> > + uint64_t change_seq;
> > };
> >
> > /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
> > @@ -1171,6 +1173,9 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
> > } else {
> > ds_put_format(reply, "%s", "NOT AVAIL");
> > }
> > + ds_put_format(reply, " polling: %s",
> > + (netdev_rxq_enabled(list[i].rxq->rx))
> > + ? "enabled" : "disabled");
> > ds_put_cstr(reply, "\n");
> > }
> > ovs_mutex_unlock(&pmd->port_mutex);
> > @@ -5198,6 +5203,11 @@ dpif_netdev_run(struct dpif *dpif)
> > }
> >
> > for (i = 0; i < port->n_rxq; i++) {
> > +
> > + if (!netdev_rxq_enabled(port->rxqs[i].rx)) {
> > + continue;
> > + }
> > +
> > if (dp_netdev_process_rxq_port(non_pmd,
> > &port->rxqs[i],
> > port->port_no)) {
> > @@ -5371,6 +5381,9 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
> > poll_list[i].rxq = poll->rxq;
> > poll_list[i].port_no = poll->rxq->port->port_no;
> > poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
> > + poll_list[i].enabled = netdev_rxq_enabled(poll->rxq->rx);
> > + poll_list[i].change_seq =
> > + netdev_get_change_seq(poll->rxq->port->netdev);
> > i++;
> > }
> >
> > @@ -5436,6 +5449,10 @@ reload:
> >
> > for (i = 0; i < poll_cnt; i++) {
> >
> > + if (!poll_list[i].enabled) {
> > + continue;
> > + }
> > +
> > if (poll_list[i].emc_enabled) {
> > atomic_read_relaxed(&pmd->dp->emc_insert_min,
> > &pmd->ctx.emc_insert_min);
> > @@ -5472,6 +5489,16 @@ reload:
> > if (reload) {
> > break;
> > }
> > +
> > + for (i = 0; i < poll_cnt; i++) {
> > + uint64_t current_seq =
> > +                netdev_get_change_seq(poll_list[i].rxq->port->netdev);
> > + if (poll_list[i].change_seq != current_seq) {
> > + poll_list[i].change_seq = current_seq;
> > + poll_list[i].enabled =
> > +                netdev_rxq_enabled(poll_list[i].rxq->rx);
> > + }
> > + }
> > }
> > pmd_perf_end_iteration(s, rx_packets, tx_packets,
> > pmd_perf_metrics_enabled(pmd));
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 47153dc..9088ff4 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -317,6 +317,10 @@ struct dpdk_mp {
> > struct ovs_list list_node OVS_GUARDED_BY(dpdk_mp_mutex);
> > };
> >
> > +struct dpdk_rx_queue {
> > + bool enabled;
> > +};
>
> I'm wondering if we need to mimic tx_q here.
> The rxq enabled state is vhost specific. We could use an array called
> 'vhost_rxq_enabled' and drop the structure. Freeing this array should
> happen in '*vhost_destruct', not in 'common_destruct'. Clearing could
> be done with a 'memset' in this case.
>
> What do you think?
>
I had something similar in my first attempt and dropped it, thinking I
would align with tx_q.
But indeed, tx_q is generic, while rx_q is really vhost specific.
We have no reason for a struct at the moment, so let's go with a simple
array.
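Roughly what I have in mind for v2, as a sketch only (exact comments and
placement still to be polished):

    /* In struct netdev_dpdk, next to 'tx_q': per-queue vhost rx enabled
     * state, updated from the vring_state_changed() callback. */
    bool *vhost_rxq_enabled;

    /* In vhost_common_construct(): */
    dev->vhost_rxq_enabled = dpdk_rte_mzalloc(OVS_VHOST_MAX_QUEUE_NUM *
                                              sizeof *dev->vhost_rxq_enabled);
    if (!dev->vhost_rxq_enabled) {
        return ENOMEM;
    }

    /* In netdev_dpdk_vhost_destruct(), before common_destruct(): */
    rte_free(dev->vhost_rxq_enabled);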
> > +
> > /* There should be one 'struct dpdk_tx_queue' created for
> > * each cpu core. */
> > struct dpdk_tx_queue {
> > @@ -424,6 +428,8 @@ struct netdev_dpdk {
> > OVSRCU_TYPE(struct ingress_policer *) ingress_policer;
> > uint32_t policer_rate;
> > uint32_t policer_burst;
> > +
> > + struct dpdk_rx_queue *rx_q;
>
> Comment would be nice here.
>
Sure.
> > );
> >
> > PADDED_MEMBERS(CACHE_LINE_SIZE,
> > @@ -1109,6 +1115,12 @@ netdev_dpdk_alloc(void)
> > return NULL;
> > }
> >
> > +static struct dpdk_rx_queue *
> > +netdev_dpdk_alloc_rxq(unsigned int n_rxqs)
> > +{
> > + return dpdk_rte_mzalloc(n_rxqs * sizeof(struct dpdk_rx_queue));
> > +}
> > +
> > static struct dpdk_tx_queue *
> > netdev_dpdk_alloc_txq(unsigned int n_txqs)
> > {
> > @@ -1235,6 +1247,10 @@ vhost_common_construct(struct netdev *netdev)
> > int socket_id = rte_lcore_to_socket_id(rte_get_master_lcore());
> > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> >
> > + dev->rx_q = netdev_dpdk_alloc_rxq(OVS_VHOST_MAX_QUEUE_NUM);
>
> Do we need a function for this? Allocating in place, you'll be able to
> use the more OVS-style sizeof call:
>
>     dev->rx_q = dpdk_rte_mzalloc(OVS_VHOST_MAX_QUEUE_NUM
>                                  * sizeof *dev->rx_q);
>
Ok.
> > + if (!dev->rx_q) {
> > + return ENOMEM;
> > + }
> > dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
> > if (!dev->tx_q) {
>
> Need to free 'rx_q' before return.
>
Indeed.
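Will fix the error path, along these lines (sketch, reusing the array
name discussed above):

    dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
    if (!dev->tx_q) {
        /* Do not leak the rx state array when the txq allocation fails. */
        rte_free(dev->vhost_rxq_enabled);
        return ENOMEM;
    }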
> > return ENOMEM;
> > @@ -1354,6 +1370,7 @@ common_destruct(struct netdev_dpdk *dev)
> > OVS_REQUIRES(dpdk_mutex)
> > OVS_EXCLUDED(dev->mutex)
> > {
> > + rte_free(dev->rx_q);
> > rte_free(dev->tx_q);
> > dpdk_mp_put(dev->dpdk_mp);
> >
> > @@ -2201,6 +2218,14 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
> > }
> >
> > static int
> > +netdev_dpdk_vhost_rxq_enabled(struct netdev_rxq *rxq)
> > +{
> > + struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> > +
> > + return dev->rx_q[rxq->queue_id].enabled;
> > +}
> > +
> > +static int
> > netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch,
> > int *qfill)
> > {
> > @@ -3531,6 +3556,17 @@ new_device(int vid)
> >
> > /* Clears mapping for all available queues of vhost interface. */
> > static void
> > +netdev_dpdk_rxq_map_clear(struct netdev_dpdk *dev)
>
> This isn't a 'map' actually.
>
I'll go with a simple memset as you suggested.
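i.e. roughly this in destroy_device(), assuming the bool array discussed
above (sketch):

    /* The device is gone, so no queue will receive traffic anymore;
     * a single memset replaces the per-queue loop. */
    memset(dev->vhost_rxq_enabled, 0,
           dev->up.n_rxq * sizeof *dev->vhost_rxq_enabled);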
> > + OVS_REQUIRES(dev->mutex)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < dev->up.n_rxq; i++) {
> > + dev->rx_q[i].enabled = false;
> > + }
> > +}
> > +
> > +static void
> > netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
> > OVS_REQUIRES(dev->mutex)
> > {
> > @@ -3563,6 +3599,7 @@ destroy_device(int vid)
> > ovs_mutex_lock(&dev->mutex);
> > dev->vhost_reconfigured = false;
> > ovsrcu_index_set(&dev->vid, -1);
> > + netdev_dpdk_rxq_map_clear(dev);
> > netdev_dpdk_txq_map_clear(dev);
> >
> > netdev_change_seq_changed(&dev->up);
> > @@ -3597,24 +3634,30 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
> > struct netdev_dpdk *dev;
> > bool exists = false;
> > int qid = queue_id / VIRTIO_QNUM;
> > + bool is_rx = (queue_id % VIRTIO_QNUM) == VIRTIO_TXQ;
> > char ifname[IF_NAME_SZ];
> >
> > rte_vhost_get_ifname(vid, ifname, sizeof ifname);
> >
> > - if (queue_id % VIRTIO_QNUM == VIRTIO_TXQ) {
> > - return 0;
> > - }
> > -
> > ovs_mutex_lock(&dpdk_mutex);
> > LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> > ovs_mutex_lock(&dev->mutex);
> > if (nullable_string_is_equal(ifname, dev->vhost_id)) {
> > - if (enable) {
> > - dev->tx_q[qid].map = qid;
> > + if (is_rx) {
> > + bool enabled = dev->rx_q[qid].enabled;
> > +
> > + dev->rx_q[qid].enabled = enable != 0;
> > + if (enabled ^ dev->rx_q[qid].enabled) {
>
> Bit operations on a boolean don't seem like good practice. It's better
> to just use '!='.
>
Ok.
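Will rewrite that part like this (sketch, with the array rename applied):

    if (is_rx) {
        bool old_state = dev->vhost_rxq_enabled[qid];

        dev->vhost_rxq_enabled[qid] = enable != 0;
        if (old_state != dev->vhost_rxq_enabled[qid]) {
            /* Let the pmd threads refresh their cached queue states. */
            netdev_change_seq_changed(&dev->up);
        }
    }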
> > + netdev_change_seq_changed(&dev->up);
> > + }
> > } else {
> > - dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED;
> > + if (enable) {
> > + dev->tx_q[qid].map = qid;
> > + } else {
> > + dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED;
> > + }
> > + netdev_dpdk_remap_txqs(dev);
> > }
> > - netdev_dpdk_remap_txqs(dev);
> > exists = true;
> > ovs_mutex_unlock(&dev->mutex);
> > break;
> > @@ -3624,9 +3667,9 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
> > ovs_mutex_unlock(&dpdk_mutex);
> >
> > if (exists) {
> > -        VLOG_INFO("State of queue %d ( tx_qid %d ) of vhost device '%s' "
> > - "changed to \'%s\'", queue_id, qid, ifname,
> > - (enable == 1) ? "enabled" : "disabled");
> > +        VLOG_INFO("State of queue %d ( %s_qid %d ) of vhost device '%s' "
> > +                  "changed to \'%s\'", queue_id, is_rx == true ? "rx" : "tx",
> > + qid, ifname, (enable == 1) ? "enabled" : "disabled");
> > } else {
> > VLOG_INFO("vHost Device '%s' not found", ifname);
> > return -1;
> > @@ -4297,7 +4340,8 @@ static const struct netdev_class dpdk_vhost_class = {
> > .get_stats = netdev_dpdk_vhost_get_stats,
> > .get_status = netdev_dpdk_vhost_user_get_status,
> > .reconfigure = netdev_dpdk_vhost_reconfigure,
> > - .rxq_recv = netdev_dpdk_vhost_rxq_recv
> > + .rxq_recv = netdev_dpdk_vhost_rxq_recv,
> > + .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
> > };
> >
> > static const struct netdev_class dpdk_vhost_client_class = {
> > @@ -4311,7 +4355,8 @@ static const struct netdev_class dpdk_vhost_client_class = {
> > .get_stats = netdev_dpdk_vhost_get_stats,
> > .get_status = netdev_dpdk_vhost_user_get_status,
> > .reconfigure = netdev_dpdk_vhost_client_reconfigure,
> > - .rxq_recv = netdev_dpdk_vhost_rxq_recv
> > + .rxq_recv = netdev_dpdk_vhost_rxq_recv,
> > + .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
> > };
> >
> > void
> > diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
> > index fb0c27e..39354db 100644
> > --- a/lib/netdev-provider.h
> > +++ b/lib/netdev-provider.h
> > @@ -789,6 +789,11 @@ struct netdev_class {
> > void (*rxq_destruct)(struct netdev_rxq *);
> > void (*rxq_dealloc)(struct netdev_rxq *);
> >
> > +    /* A netdev can report if a queue won't get traffic and should be excluded
> > +     * from polling (no callback implicitly means that the queue is enabled).
> > +     */
> > + int (*rxq_enabled)(struct netdev_rxq *);
> > +
> > /* Attempts to receive a batch of packets from 'rx'. In 'batch', the
> > * caller supplies 'packets' as the pointer to the beginning of an array
> > * of NETDEV_MAX_BURST pointers to dp_packet. If successful, the
> > diff --git a/lib/netdev.c b/lib/netdev.c
> > index 7d7ecf6..88db6c7 100644
> > --- a/lib/netdev.c
> > +++ b/lib/netdev.c
> > @@ -682,6 +682,16 @@ netdev_rxq_close(struct netdev_rxq *rx)
> > }
> > }
> >
> > +int netdev_rxq_enabled(struct netdev_rxq *rx)
> > +{
> > + bool enabled = true;
> > +
> > + if (rx->netdev->netdev_class->rxq_enabled) {
> > + enabled = rx->netdev->netdev_class->rxq_enabled(rx);
> > + }
> > + return enabled;
>
> You're mixing and converting 'int' and 'bool' all the time.
> I think it's better to change the function prototype to return 'bool'.
>
Yes.
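Will switch to bool end to end. A sketch of what v2 could look like
(provider callback, header and helper; the vhost implementation would
return bool accordingly):

    /* lib/netdev-provider.h */
    bool (*rxq_enabled)(struct netdev_rxq *);

    /* lib/netdev.h */
    bool netdev_rxq_enabled(struct netdev_rxq *);

    /* lib/netdev.c */
    bool
    netdev_rxq_enabled(struct netdev_rxq *rx)
    {
        bool enabled = true;

        if (rx->netdev->netdev_class->rxq_enabled) {
            enabled = rx->netdev->netdev_class->rxq_enabled(rx);
        }
        return enabled;
    }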
> > +}
> > +
> > /* Attempts to receive a batch of packets from 'rx'. 'batch' should point to
> > * the beginning of an array of NETDEV_MAX_BURST pointers to dp_packet. If
> > * successful, this function stores pointers to up to NETDEV_MAX_BURST
> > diff --git a/lib/netdev.h b/lib/netdev.h
> > index d94817f..859f5ef 100644
> > --- a/lib/netdev.h
> > +++ b/lib/netdev.h
> > @@ -183,6 +183,7 @@ enum netdev_pt_mode netdev_get_pt_mode(const struct netdev *);
> > /* Packet reception. */
> > int netdev_rxq_open(struct netdev *, struct netdev_rxq **, int id);
> > void netdev_rxq_close(struct netdev_rxq *);
> > +int netdev_rxq_enabled(struct netdev_rxq *);
> >
> > const char *netdev_rxq_get_name(const struct netdev_rxq *);
> > int netdev_rxq_get_queue_id(const struct netdev_rxq *);
> >
>
Thanks a lot for the review, I will send a v2 on Monday if all goes well.
--
David Marchand
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev