Re: [ovs-dev] [PATCH v7 1/3] netdev: Add rxq callback function rxq_length()

Ilya Maximets Thu, 18 Jan 2018 03:31:15 -0800

On 18.01.2018 13:51, O Mahony, Billy wrote:
> 
> 
>> -----Original Message-----
>> From: Ilya Maximets [mailto:[email protected]]
>> Sent: Thursday, January 18, 2018 6:18 AM
>> To: Jan Scheurich <[email protected]>; [email protected]
>> Cc: [email protected]; Stokes, Ian <[email protected]>; O Mahony,
>> Billy <[email protected]>
>> Subject: Re: [PATCH v7 1/3] netdev: Add rxq callback function rxq_length()
>>
>> On 18.01.2018 02:21, Jan Scheurich wrote:
>>> Thanks for the review. Answers inline.
>>> Regards, Jan
>>>
>>>
>>>> From: Ilya Maximets [mailto:[email protected]]
>>>> Sent: Wednesday, 17 January, 2018 11:47
>>>> Subject: Re: [PATCH v7 1/3] netdev: Add rxq callback function rxq_length()
>>>>
>>>> On 16.01.2018 04:51, Jan Scheurich wrote:
>>>>> If implemented, this function returns the number of packets in an
>>>>> rx queue of the netdev. If not implemented, it returns -1.
>>>>
>>>> To conform with other netdev functions, it should return meaningful
>>>> error codes. As 'rte_eth_rx_queue_count' can return different
>>>> errors like -EINVAL or -ENOTSUP, 'netdev_rxq_length' itself should
>>>> return -EOPNOTSUPP if not implemented.
>>>
>>> OK.
>>>
>>>>
>>>>>
>>>>> This function will be used in the upcoming commit for PMD
>>>>> performance metrics to supervise the rx queue fill level for DPDK
>>>>> vhostuser ports.
>>>>>
>>>>> Signed-off-by: Jan Scheurich <[email protected]>
>>>>> ---
>>>>>  lib/netdev-bsd.c      |  1 +
>>>>>  lib/netdev-dpdk.c     | 36 +++++++++++++++++++++++++++++++-----
>>>>>  lib/netdev-dummy.c    |  1 +
>>>>>  lib/netdev-linux.c    |  1 +
>>>>>  lib/netdev-provider.h |  3 +++
>>>>>  lib/netdev-vport.c    |  1 +
>>>>>  lib/netdev.c          |  9 +++++++++
>>>>>  lib/netdev.h          |  1 +
>>>>>  8 files changed, 48 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
>>>>> index 05974c1..8d1771e 100644
>>>>> --- a/lib/netdev-bsd.c
>>>>> +++ b/lib/netdev-bsd.c
>>>>> @@ -1546,6 +1546,7 @@ netdev_bsd_update_flags(struct netdev *netdev_, enum netdev_flags off,
>>>>>      netdev_bsd_rxq_recv,                             \
>>>>>      netdev_bsd_rxq_wait,                             \
>>>>>      netdev_bsd_rxq_drain,                            \
>>>>> +    NULL, /* rxq_length */                           \
>>>>>                                                       \
>>>>>      NO_OFFLOAD_API                                   \
>>>>>  }
>>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>>> index ccda3fc..4200556 100644
>>>>> --- a/lib/netdev-dpdk.c
>>>>> +++ b/lib/netdev-dpdk.c
>>>>> @@ -1839,6 +1839,27 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch)
>>>>>      return 0;
>>>>>  }
>>>>>
>>>>> +static int
>>>>> +netdev_dpdk_vhost_rxq_length(struct netdev_rxq *rxq)
>>>>> +{
>>>>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
>>>>> +    int qid = rxq->queue_id;
>>>>> +
>>>>
>>>> We must perform all the same checks as in rxq_recv() before calling
>>>> 'rte_vhost_rx_queue_count'. Otherwise we may crash here if the device
>>>> happens to be disconnected in the meantime:
>>>>
>>>>     int vid = netdev_dpdk_get_vid(dev);
>>>>
>>>>     if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured
>>>>                      || !(dev->flags & NETDEV_UP))) {
>>>>         return -EAGAIN;
>>>>     }
>>>>
>>>> Not sure about -EAGAIN, but we need to return some negative errno.
>>>
>>> OK. Not necessary for our use case, as it will only be called by the PMD
>>> after having received a full batch of 32 packets, but in general I agree
>>> those checks are needed.
>>
>> It's necessary because the vhost device could be disconnected between
>> rxq_recv() and rxq_length(). In this case we would call
>> rte_vhost_rx_queue_count() with vid == -1. This would access random memory
>> inside DPDK and likely cause a segmentation fault.
>>
>> See commit daf22bf7a826 ("netdev-dpdk: Fix calling vhost API with negative
>> vid.") for an example of a similar issue. I'd also take this opportunity to
>> remind you that you should retrieve the vid only once.
> 
>  [[BO'M]] Is there not also the possibility that the vhost device gets 
> disconnected between the call to get_vid() and rxq_recv()?


You mean a disconnect between netdev_dpdk_get_vid(dev) and
rte_vhost_dequeue_burst(vid)?
There is no issue in this case, because 'destroy_device()' will wait for other
threads to quiesce. This means that the device structure inside DPDK will not
be freed while we're inside netdev_rxq_recv(). We can safely call any rte_vhost
API with the old vid as long as the device has not been freed inside DPDK.

> 
> Also, given these required calls to get_vid() (which AFAIK require some slow
> memory fencing), wouldn't that argue for the original approach where the rxq
> length is returned from rxq_recv()? As rxq_length() would be called once per
> batch whenever rxq_recv() does not drain the queue, the overhead could be
> significant.

I'm not sure (I hope that Jan tested the performance of this version), but I
feel that 'rte_vhost_rx_queue_count()' is the heavier operation.
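Billy's overhead argument depends on how often the PMD makes the extra call. The following hedged sketch models the call pattern Jan describes (query the length only after a full batch) so that the number of extra queries can be counted. `pending_after_recv()`, `rxq_length_fn`, the example callbacks and `MAX_BURST` are hypothetical names; 32 is the batch size mentioned in the thread. The sketch says nothing about the actual cost of `rte_vhost_rx_queue_count()`, which would still have to be measured.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Batch size mentioned in the thread (a full batch of 32 packets). */
#define MAX_BURST 32

/* Hypothetical callback type standing in for netdev_rxq_length(). */
typedef int (*rxq_length_fn)(void *rxq);

/* Example providers: one reports 7 queued packets, one is unsupported. */
static int
example_rxq_length(void *rxq)
{
    (void) rxq;
    return 7;
}

static int
example_rxq_length_unsupported(void *rxq)
{
    (void) rxq;
    return -EOPNOTSUPP;
}

/* After a receive call returned 'n_rx' packets, returns how many packets are
 * still queued, or -1 if unknown.  The length is queried only after a full
 * batch; a partial batch means the queue was drained by that rxq_recv().
 * '*n_queries' counts the extra calls, i.e. the per-batch overhead. */
static int
pending_after_recv(void *rxq, int n_rx, rxq_length_fn rxq_length,
                   int *n_queries)
{
    int len;

    assert(n_queries != NULL);
    if (n_rx < MAX_BURST) {
        return 0;
    }

    (*n_queries)++;
    len = rxq_length(rxq);
    return len < 0 ? -1 : len;   /* Negative errno: length unknown. */
}
```

Under this pattern the extra cost is paid only while the queue keeps delivering full batches, which is exactly the overloaded case the metric is meant to observe.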

> 
>>
>>>
>>>>
>>>>> +    /* The DPDK API returns a uint32_t which often has invalid bits in the
>>>>> +     * upper 16 bits. Need to restrict the value to uint16_t. */
>>>>> +    return rte_vhost_rx_queue_count(netdev_dpdk_get_vid(dev),
>>>>> +                                    qid * VIRTIO_QNUM + VIRTIO_TXQ)
>>>>> +                & UINT16_MAX;
>>>>> +}
>>>>> +
>>>>> +static int
>>>>> +netdev_dpdk_rxq_length(struct netdev_rxq *rxq)
>>>>> +{
>>>>> +    struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq);
>>>>> +
>>>>
>>>> Same here:
>>>>
>>>>     struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
>>>>
>>>>     if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
>>>>         return -EAGAIN;
>>>>     }
>>>>
>>>>> +    return rte_eth_rx_queue_count(rx->port_id, rxq->queue_id);
>>>>> +}
>>>>> +
>>>>>  static inline int
>>>>>  netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
>>>>>                      int cnt, bool may_steal)
>>>>> @@ -3580,7 +3601,7 @@ unlock:
>>>>>                            GET_CARRIER, GET_STATS,             \
>>>>>                            GET_CUSTOM_STATS,                   \
>>>>>                            GET_FEATURES, GET_STATUS,           \
>>>>> -                          RECONFIGURE, RXQ_RECV)              \
>>>>> +                          RECONFIGURE, RXQ_RECV, RXQ_LENGTH)  \
>>>>>  {                                                             \
>>>>>      NAME,                                                     \
>>>>>      true,                       /* is_pmd */                  \
>>>>> @@ -3649,6 +3670,7 @@ unlock:
>>>>>      RXQ_RECV,                                                 \
>>>>>      NULL,                       /* rx_wait */                 \
>>>>>      NULL,                       /* rxq_drain */               \
>>>>> +    RXQ_LENGTH,                                               \
>>>>>      NO_OFFLOAD_API                                            \
>>>>>  }
>>>>>
>>>>> @@ -3667,7 +3689,8 @@ static const struct netdev_class dpdk_class =
>>>>>          netdev_dpdk_get_features,
>>>>>          netdev_dpdk_get_status,
>>>>>          netdev_dpdk_reconfigure,
>>>>> -        netdev_dpdk_rxq_recv);
>>>>> +        netdev_dpdk_rxq_recv,
>>>>> +        netdev_dpdk_rxq_length);
>>>>>
>>>>>  static const struct netdev_class dpdk_ring_class =
>>>>>      NETDEV_DPDK_CLASS(
>>>>> @@ -3684,7 +3707,8 @@ static const struct netdev_class dpdk_ring_class =
>>>>>          netdev_dpdk_get_features,
>>>>>          netdev_dpdk_get_status,
>>>>>          netdev_dpdk_reconfigure,
>>>>> -        netdev_dpdk_rxq_recv);
>>>>> +        netdev_dpdk_rxq_recv,
>>>>> +        NULL);
>>>>>
>>>>>  static const struct netdev_class dpdk_vhost_class =
>>>>>      NETDEV_DPDK_CLASS(
>>>>> @@ -3701,7 +3725,8 @@ static const struct netdev_class dpdk_vhost_class =
>>>>>          NULL,
>>>>>          NULL,
>>>>>          netdev_dpdk_vhost_reconfigure,
>>>>> -        netdev_dpdk_vhost_rxq_recv);
>>>>> +        netdev_dpdk_vhost_rxq_recv,
>>>>> +        netdev_dpdk_vhost_rxq_length);
>>>>>  static const struct netdev_class dpdk_vhost_client_class =
>>>>>      NETDEV_DPDK_CLASS(
>>>>>          "dpdkvhostuserclient",
>>>>> @@ -3717,7 +3742,8 @@ static const struct netdev_class dpdk_vhost_client_class =
>>>>>          NULL,
>>>>>          NULL,
>>>>>          netdev_dpdk_vhost_client_reconfigure,
>>>>> -        netdev_dpdk_vhost_rxq_recv);
>>>>> +        netdev_dpdk_vhost_rxq_recv,
>>>>> +        netdev_dpdk_vhost_rxq_length);
>>>>>
>>>>>  void
>>>>>  netdev_dpdk_register(void)
>>>>> diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
>>>>> index 4246af3..7e2c0a2 100644
>>>>> --- a/lib/netdev-dummy.c
>>>>> +++ b/lib/netdev-dummy.c
>>>>> @@ -1457,6 +1457,7 @@ netdev_dummy_update_flags(struct netdev *netdev_,
>>>>>      netdev_dummy_rxq_recv,                                      \
>>>>>      netdev_dummy_rxq_wait,                                      \
>>>>>      netdev_dummy_rxq_drain,                                     \
>>>>> +    NULL,                       /* rxq_length */                \
>>>>>                                                                  \
>>>>>      NO_OFFLOAD_API                                              \
>>>>>  }
>>>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>>>>> index 37143b8..8b19890 100644
>>>>> --- a/lib/netdev-linux.c
>>>>> +++ b/lib/netdev-linux.c
>>>>> @@ -2890,6 +2890,7 @@ netdev_linux_update_flags(struct netdev *netdev_, enum netdev_flags off,
>>>>>      netdev_linux_rxq_recv,                                      \
>>>>>      netdev_linux_rxq_wait,                                      \
>>>>>      netdev_linux_rxq_drain,                                     \
>>>>> +    NULL,                       /* rxq_length */                \
>>>>>                                                                  \
>>>>>      FLOW_OFFLOAD_API                                            \
>>>>>  }
>>>>> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
>>>>> index 25bd671..297644a 100644
>>>>> --- a/lib/netdev-provider.h
>>>>> +++ b/lib/netdev-provider.h
>>>>> @@ -801,6 +801,9 @@ struct netdev_class {
>>>>>      /* Discards all packets waiting to be received from 'rx'. */
>>>>>      int (*rxq_drain)(struct netdev_rxq *rx);
>>>>>
>>>>> +    /* Retrieve the number of packets present in an rx queue. */
>>>>
>>>> The comment should be extended; see below.
>>>>
>>>> In addition, we need to note here that the function is not thread-safe
>>>> and must not be used during device reconfiguration.
>>>
>>> I'm curious: is there any difference regarding thread safety and use
>>> during device configuration compared to the other netdev rxq functions?
>>> Why would it be necessary to list these constraints for this function but
>>> not the others?
>>
>> Thread safety in general, and particularly for the netdev_rxq_*() functions,
>> is described in the corresponding section at the top of lib/netdev.h. Thread
>> safety of netdev_send() is described in detail near the send() function
>> definition in lib/netdev-provider.h, because it has its own thread-safety
>> related argument.
>>
>> The comment near reconfigure() in lib/netdev-provider.h states that we must
>> not use rxq_recv() and send() simultaneously with it.
>>
>> I think we should not describe the thread safety and the dependency on
>> reconfigure() for rxq_length() in its own comment, but rather update the
>> comments listed above to cover this new function.
>>
>>>
>>> And why should this read-only function be a priori thread-unsafe? And in
>>> which sense: concurrent invocations of this function? Or concurrent
>>> invocation of this function with rxq_recv() and rxq_drain()?
>>
>> At the very least, it could return a wrong result if invoked concurrently
>> with rxq_recv().
>>
>>>
>>> Shouldn't netdev.h and netdev-provider.h rather have a general disclaimer
>>> that the rxq functions may only be called from the single thread assigned
>>> to poll the rx queue, or alternatively must be protected by a lock on the
>>> netdev user side?
>>
>> lib/netdev.h already has a general thread-safety disclaimer.
>>
>>>
>>>>
>>>>> +    int (*rxq_length)(struct netdev_rxq *rx);
>>>>> +
>>>>>      /* ## -------------------------------- ## */
>>>>>      /* ## netdev flow offloading functions ## */
>>>>>      /* ## -------------------------------- ## */
>>>>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>>>>> index 478ed90..1e7bc96 100644
>>>>> --- a/lib/netdev-vport.c
>>>>> +++ b/lib/netdev-vport.c
>>>>> @@ -944,6 +944,7 @@ netdev_vport_get_ifindex(const struct netdev *netdev_)
>>>>>      NULL,                   /* rx_recv */                   \
>>>>>      NULL,                   /* rx_wait */                   \
>>>>>      NULL,                   /* rx_drain */                  \
>>>>> +    NULL,                   /* rx_length */                 \
>>>>>                                                              \
>>>>>      NETDEV_FLOW_OFFLOAD_API
>>>>>
>>>>> diff --git a/lib/netdev.c b/lib/netdev.c
>>>>> index be05dc6..063c318 100644
>>>>> --- a/lib/netdev.c
>>>>> +++ b/lib/netdev.c
>>>>> @@ -724,6 +724,15 @@ netdev_rxq_drain(struct netdev_rxq *rx)
>>>>>              : 0);
>>>>>  }
>>>>>
>>>>> +/* Retrieve the number of packets present in an rx queue. */
>>>>
>>>> The comment should clearly state which results are to be treated as
>>>> errors and what is returned on success. You may use the description of
>>>> 'netdev_get_ifindex' as a reference.
>>>> Something like:
>>>>
>>>> /* Returns the number of packets present in an rx queue, if successful, as a
>>>>  * non-negative number.  On failure, returns a negative errno value.
>>>>  *
>>>>  * Some network devices may not implement support for this function.  In such
>>>>  * cases this function will always return -EOPNOTSUPP. */
>>>
>>> OK.
>>>>
>>>>> +int
>>>>> +netdev_rxq_length(struct netdev_rxq *rx)
>>>>> +{
>>>>> +    return (rx->netdev->netdev_class->rxq_length
>>>>> +            ? rx->netdev->netdev_class->rxq_length(rx)
>>>>> +            : -1);
>>>>> +}
>>>>> +
>>>>>  /* Configures the number of tx queues of 'netdev'. Returns 0 if
>> successful,
>>>>>   * otherwise a positive errno value.
>>>>>   *
>>>>> diff --git a/lib/netdev.h b/lib/netdev.h
>>>>> index ff1b604..edd41b1 100644
>>>>> --- a/lib/netdev.h
>>>>> +++ b/lib/netdev.h
>>>>> @@ -178,6 +178,7 @@ int netdev_rxq_get_queue_id(const struct netdev_rxq *);
>>>>>  int netdev_rxq_recv(struct netdev_rxq *rx, struct dp_packet_batch *);
>>>>>  void netdev_rxq_wait(struct netdev_rxq *);
>>>>>  int netdev_rxq_drain(struct netdev_rxq *);
>>>>> +int netdev_rxq_length(struct netdev_rxq *rx);
>>>>
>>>> The argument name is not needed here.
>>>
>>> OK.
>>>
>>>>
>>>>>
>>>>>  /* Packet transmission. */
>>>>>  int netdev_send(struct netdev *, int qid, struct dp_packet_batch *,
>>>>>
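The netdev.c wrapper quoted above still returns -1 when a provider leaves the callback NULL. The behaviour requested in review (return -EOPNOTSUPP instead, documented as in the suggested comment) can be modelled in isolation as below. The `*_model` structs and `example_provider_rxq_length()` are hypothetical reductions of the OVS types, not the real declarations; only the dispatch logic mirrors the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct rxq_model;

/* Reduced model of struct netdev_class: only the new callback. */
struct class_model {
    /* NULL if the provider does not implement the callback. */
    int (*rxq_length)(struct rxq_model *rx);
};

/* Reduced models of struct netdev and struct netdev_rxq. */
struct netdev_model {
    const struct class_model *netdev_class;
};

struct rxq_model {
    struct netdev_model *netdev;
};

/* Example provider callback reporting 12 queued packets. */
static int
example_provider_rxq_length(struct rxq_model *rx)
{
    (void) rx;
    return 12;
}

/* Returns the number of packets present in an rx queue, if successful, as a
 * non-negative number.  On failure, returns a negative errno value.
 *
 * Some network devices may not implement support for this function.  In such
 * cases this function will always return -EOPNOTSUPP. */
static int
netdev_rxq_length_model(struct rxq_model *rx)
{
    assert(rx != NULL);
    return (rx->netdev->netdev_class->rxq_length
            ? rx->netdev->netdev_class->rxq_length(rx)
            : -EOPNOTSUPP);
}
```

Because every failure is a negative errno and success is never negative, callers can test `len < 0` uniformly, whether the provider lacks the callback or the callback itself fails.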
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
