> -----Original Message-----
> From: Ilya Maximets [mailto:[email protected]]
> Sent: Thursday, January 18, 2018 6:18 AM
> To: Jan Scheurich <[email protected]>; [email protected]
> Cc: [email protected]; Stokes, Ian <[email protected]>; O Mahony,
> Billy <[email protected]>
> Subject: Re: [PATCH v7 1/3] netdev: Add rxq callback function rxq_length()
> 
> On 18.01.2018 02:21, Jan Scheurich wrote:
> > Thanks for the review. Answers inline.
> > Regards, Jan
> >
> >
> >> From: Ilya Maximets [mailto:[email protected]]
> >> Sent: Wednesday, 17 January, 2018 11:47
> >> Subject: Re: [PATCH v7 1/3] netdev: Add rxq callback function
> >> rxq_length()
> >>
> >> On 16.01.2018 04:51, Jan Scheurich wrote:
> >>> If implememented, this function returns the number of packets in an
> >>> rx queue of the netdev. If not implemented, it returns -1.
> >>
> >> To be conform with other netdev functions it should return meaningful
> >> error codes. As 'rte_eth_rx_queue_count' could return different
> >> errors like -EINVAL or -ENOTSUP, 'netdev_rxq_length' itself should
> >> return -EOPNOTSUPP if not implemented.
> >
> > OK.
> >
> >>
> >>>
> >>> This function will be used in the upcoming commit for PMD
> >>> performance metrics to supervise the rx queue fill level for DPDK
> vhostuser ports.
> >>>
> >>> Signed-off-by: Jan Scheurich <[email protected]>
> >>> ---
> >>>  lib/netdev-bsd.c      |  1 +
> >>>  lib/netdev-dpdk.c     | 36 +++++++++++++++++++++++++++++++-----
> >>>  lib/netdev-dummy.c    |  1 +
> >>>  lib/netdev-linux.c    |  1 +
> >>>  lib/netdev-provider.h |  3 +++
> >>>  lib/netdev-vport.c    |  1 +
> >>>  lib/netdev.c          |  9 +++++++++
> >>>  lib/netdev.h          |  1 +
> >>>  8 files changed, 48 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c index
> >>> 05974c1..8d1771e 100644
> >>> --- a/lib/netdev-bsd.c
> >>> +++ b/lib/netdev-bsd.c
> >>> @@ -1546,6 +1546,7 @@ netdev_bsd_update_flags(struct netdev
> *netdev_, enum netdev_flags off,
> >>>      netdev_bsd_rxq_recv,                             \
> >>>      netdev_bsd_rxq_wait,                             \
> >>>      netdev_bsd_rxq_drain,                            \
> >>> +    NULL, /* rxq_length */                           \
> >>>                                                       \
> >>>      NO_OFFLOAD_API                                   \
> >>>  }
> >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> >>> ccda3fc..4200556 100644
> >>> --- a/lib/netdev-dpdk.c
> >>> +++ b/lib/netdev-dpdk.c
> >>> @@ -1839,6 +1839,27 @@ netdev_dpdk_rxq_recv(struct netdev_rxq
> *rxq, struct dp_packet_batch *batch)
> >>>      return 0;
> >>>  }
> >>>
> >>> +static int
> >>> +netdev_dpdk_vhost_rxq_length(struct netdev_rxq *rxq) {
> >>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> >>> +    int qid = rxq->queue_id;
> >>> +
> >>
> >> We must make all the checks as in rxq_recv() function before calling
> >> 'rte_vhost_rx_queue_count'. Otherwise we may crash here if device
> >> will be occasionally disconnected:
> >>
> >>     int vid = netdev_dpdk_get_vid(dev);
> >>
> >>     if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured
> >>                      || !(dev->flags & NETDEV_UP))) {
> >>         return -EAGAIN;
> >>     }
> >>
> >> Not sure about -EAGAIN, but we need to return some negative errno.
> >
> > OK. Not necessary for our use case, as it will only be called by the PMD
> after having received a full batch of 32 packets, but in general I agree those
> checks are needed.
> 
> It's necessary because vhost device could be disconnected between
> rxq_recv() and rxq_length(). In this case we will call
> rte_vhost_rx_queue_count() with vid == -1. This will produce access to the
> random memory inside dpdk and likely a segmentation fault.
> 
> See commit daf22bf7a826 ("netdev-dpdk: Fix calling vhost API with negative
> vid.") for a example of a similar issue. And I'm taking this opportunity to 
> recall
> that you should retrieve the vid only once.

 [[BO'M]] Is there not also the possibility that the vhost device gets 
disconnected between the call to get_vid() and rxq_recv()? 

Also, given these required calls to get_vid (which afaik requires some slow 
memory fencing) wouldn't that argue for the original approach where the rxq len 
is returned from rxq_recv(). As the call to rxq_length()  would be made once 
per batch once the queue is not being drained rxq_recv() the overhead could be 
significant.

> 
> >
> >>
> >>> +    /* The DPDK API returns a uint32_t which often has invalid bits in 
> >>> the
> >>> +     * upper 16-bits. Need to restrict the value uint16_t. */
> >>> +    return rte_vhost_rx_queue_count(netdev_dpdk_get_vid(dev),
> >>> +                                    qid * VIRTIO_QNUM + VIRTIO_TXQ)
> >>> +                & UINT16_MAX;
> >>> +}
> >>> +
> >>> +static int
> >>> +netdev_dpdk_rxq_length(struct netdev_rxq *rxq) {
> >>> +    struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq);
> >>> +
> >>
> >> Same here:
> >>
> >>     struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> >>
> >>     if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
> >>         return -EAGAIN;
> >>     }
> >>
> >>> +    return rte_eth_rx_queue_count(rx->port_id, rxq->queue_id); }
> >>> +
> >>>  static inline int
> >>>  netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf
> **pkts,
> >>>                      int cnt, bool may_steal) @@ -3580,7 +3601,7 @@
> >>> unlock:
> >>>                            GET_CARRIER, GET_STATS,                        
> >>>   \
> >>>                            GET_CUSTOM_STATS,
>         \
> >>>                            GET_FEATURES, GET_STATUS,           \
> >>> -                          RECONFIGURE, RXQ_RECV)              \
> >>> +                          RECONFIGURE, RXQ_RECV, RXQ_LENGTH)  \
> >>>  {                                                             \
> >>>      NAME,                                                     \
> >>>      true,                       /* is_pmd */                  \
> >>> @@ -3649,6 +3670,7 @@ unlock:
> >>>      RXQ_RECV,                                                 \
> >>>      NULL,                       /* rx_wait */                 \
> >>>      NULL,                       /* rxq_drain */               \
> >>> +    RXQ_LENGTH,                                               \
> >>>      NO_OFFLOAD_API                                            \
> >>>  }
> >>>
> >>> @@ -3667,7 +3689,8 @@ static const struct netdev_class dpdk_class =
> >>>          netdev_dpdk_get_features,
> >>>          netdev_dpdk_get_status,
> >>>          netdev_dpdk_reconfigure,
> >>> -        netdev_dpdk_rxq_recv);
> >>> +        netdev_dpdk_rxq_recv,
> >>> +        netdev_dpdk_rxq_length);
> >>>
> >>>  static const struct netdev_class dpdk_ring_class =
> >>>      NETDEV_DPDK_CLASS(
> >>> @@ -3684,7 +3707,8 @@ static const struct netdev_class
> dpdk_ring_class =
> >>>          netdev_dpdk_get_features,
> >>>          netdev_dpdk_get_status,
> >>>          netdev_dpdk_reconfigure,
> >>> -        netdev_dpdk_rxq_recv);
> >>> +        netdev_dpdk_rxq_recv,
> >>> +        NULL);
> >>>
> >>>  static const struct netdev_class dpdk_vhost_class =
> >>>      NETDEV_DPDK_CLASS(
> >>> @@ -3701,7 +3725,8 @@ static const struct netdev_class
> dpdk_vhost_class =
> >>>          NULL,
> >>>          NULL,
> >>>          netdev_dpdk_vhost_reconfigure,
> >>> -        netdev_dpdk_vhost_rxq_recv);
> >>> +        netdev_dpdk_vhost_rxq_recv,
> >>> +        netdev_dpdk_vhost_rxq_length);
> >>>  static const struct netdev_class dpdk_vhost_client_class =
> >>>      NETDEV_DPDK_CLASS(
> >>>          "dpdkvhostuserclient",
> >>> @@ -3717,7 +3742,8 @@ static const struct netdev_class
> dpdk_vhost_client_class =
> >>>          NULL,
> >>>          NULL,
> >>>          netdev_dpdk_vhost_client_reconfigure,
> >>> -        netdev_dpdk_vhost_rxq_recv);
> >>> +        netdev_dpdk_vhost_rxq_recv,
> >>> +        netdev_dpdk_vhost_rxq_length);
> >>>
> >>>  void
> >>>  netdev_dpdk_register(void)
> >>> diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c index
> >>> 4246af3..7e2c0a2 100644
> >>> --- a/lib/netdev-dummy.c
> >>> +++ b/lib/netdev-dummy.c
> >>> @@ -1457,6 +1457,7 @@ netdev_dummy_update_flags(struct netdev
> *netdev_,
> >>>      netdev_dummy_rxq_recv,                                      \
> >>>      netdev_dummy_rxq_wait,                                      \
> >>>      netdev_dummy_rxq_drain,                                     \
> >>> +    NULL,                       /* rxq_length */                \
> >>>                                                                  \
> >>>      NO_OFFLOAD_API                                              \
> >>>  }
> >>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index
> >>> 37143b8..8b19890 100644
> >>> --- a/lib/netdev-linux.c
> >>> +++ b/lib/netdev-linux.c
> >>> @@ -2890,6 +2890,7 @@ netdev_linux_update_flags(struct netdev
> *netdev_, enum netdev_flags off,
> >>>      netdev_linux_rxq_recv,                                      \
> >>>      netdev_linux_rxq_wait,                                      \
> >>>      netdev_linux_rxq_drain,                                     \
> >>> +    NULL,                       /* rxq_length */                \
> >>>                                                                  \
> >>>      FLOW_OFFLOAD_API                                            \
> >>>  }
> >>> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h index
> >>> 25bd671..297644a 100644
> >>> --- a/lib/netdev-provider.h
> >>> +++ b/lib/netdev-provider.h
> >>> @@ -801,6 +801,9 @@ struct netdev_class {
> >>>      /* Discards all packets waiting to be received from 'rx'. */
> >>>      int (*rxq_drain)(struct netdev_rxq *rx);
> >>>
> >>> +    /* Retrieve the number of packets present in an rx queue. */
> >>
> >> Comment should be extended. See below.
> >>
> >> In addition we need to mark here that function is not thread-safe and
> >> could not be used while device reconfiguration.
> >
> > I'm curious: is there any difference regarding thread-safeness and use
> during device configuration compared to the other netdev rxq functions?
> Why would it be necessary to list these constraints for this function but not
> the others?
> 
> Thread safety in general and particulary for netdev_rxq_*() functions
> described in corresponding section at the top of lib/netdev.h . Thread safety
> of netdev_send() in details described near to send() function definition in
> lib/netdev-provider.h because it has it's own thread-safe related argument.
> 
> Comment near to reconfigure() in lib/netdev-provider.h states that we must
> not use rxq_recv() and send() simultaneously with it.
> 
> I think, we should not describe the thread-safety and dependency with
> reconfigure() for rxq_length() in it's own comment, but update comments
> listed above with this new function.
> 
> >
> > And why should this read-only function be a priori thread-unsafe? And in
> which sense: concurrent invocations of this function? Or concurrent
> invocation of this function with rxq_recv() and rxq_drain()?
> 
> At least it could return wrong result in case of concurrent invocation with
> rxq_recv().
> 
> >
> > Should the netdev.h and the netdev_provider.h not rather have a general
> disclaimer that the rxq functions are only to be called from a single thread
> assigned "polling" the rx queue, or alternatively be protected by a lock on 
> the
> netdev user side?
> 
> lib/netdev.h already has general thread-safety disclaimer.
> 
> >
> >>
> >>> +    int (*rxq_length)(struct netdev_rxq *rx);
> >>> +
> >>>      /* ## -------------------------------- ## */
> >>>      /* ## netdev flow offloading functions ## */
> >>>      /* ## -------------------------------- ## */ diff --git
> >>> a/lib/netdev-vport.c b/lib/netdev-vport.c index 478ed90..1e7bc96
> >>> 100644
> >>> --- a/lib/netdev-vport.c
> >>> +++ b/lib/netdev-vport.c
> >>> @@ -944,6 +944,7 @@ netdev_vport_get_ifindex(const struct netdev
> *netdev_)
> >>>      NULL,                   /* rx_recv */                   \
> >>>      NULL,                   /* rx_wait */                   \
> >>>      NULL,                   /* rx_drain */                  \
> >>> +    NULL,                   /* rx_length */                 \
> >>>                                                              \
> >>>      NETDEV_FLOW_OFFLOAD_API
> >>>
> >>> diff --git a/lib/netdev.c b/lib/netdev.c index be05dc6..063c318
> >>> 100644
> >>> --- a/lib/netdev.c
> >>> +++ b/lib/netdev.c
> >>> @@ -724,6 +724,15 @@ netdev_rxq_drain(struct netdev_rxq *rx)
> >>>              : 0);
> >>>  }
> >>>
> >>> +/* Retrieve the number of packets present in an rx queue. */
> >>
> >> Comment should clearly declare what kind of result should be treated
> >> as an error, and what is the result in case of success. You may use
> >> description for 'netdev_get_ifindex' as a reference.
> >> Something like:
> >>
> >> /* Returns the number of packets present in an rx queue, if
> >> successful, as a
> >>  * positive number.  On failure, returns a negative errno value.
> >>  *
> >>  * Some network devices may not implement support for this function.
> >> In such
> >>  * cases this function will always return -EOPNOTSUPP. */
> >
> > OK.
> >>
> >>> +int
> >>> +netdev_rxq_length(struct netdev_rxq *rx) {
> >>> +    return (rx->netdev->netdev_class->rxq_length
> >>> +            ? rx->netdev->netdev_class->rxq_length(rx)
> >>> +            : -1);
> >>> +}
> >>> +
> >>>  /* Configures the number of tx queues of 'netdev'. Returns 0 if
> successful,
> >>>   * otherwise a positive errno value.
> >>>   *
> >>> diff --git a/lib/netdev.h b/lib/netdev.h index ff1b604..edd41b1
> >>> 100644
> >>> --- a/lib/netdev.h
> >>> +++ b/lib/netdev.h
> >>> @@ -178,6 +178,7 @@ int netdev_rxq_get_queue_id(const struct
> >>> netdev_rxq *);  int netdev_rxq_recv(struct netdev_rxq *rx, struct
> >>> dp_packet_batch *);  void netdev_rxq_wait(struct netdev_rxq *);  int
> >>> netdev_rxq_drain(struct netdev_rxq *);
> >>> +int netdev_rxq_length(struct netdev_rxq *rx);
> >>
> >> argument's name not needed here.
> >
> > OK.
> >
> >>
> >>>
> >>>  /* Packet transmission. */
> >>>  int netdev_send(struct netdev *, int qid, struct dp_packet_batch *,
> >>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to