On 18.01.2018 13:51, O Mahony, Billy wrote:
>
>
>> -----Original Message-----
>> From: Ilya Maximets [mailto:[email protected]]
>> Sent: Thursday, January 18, 2018 6:18 AM
>> To: Jan Scheurich <[email protected]>; [email protected]
>> Cc: [email protected]; Stokes, Ian <[email protected]>; O Mahony,
>> Billy <[email protected]>
>> Subject: Re: [PATCH v7 1/3] netdev: Add rxq callback function rxq_length()
>>
>> On 18.01.2018 02:21, Jan Scheurich wrote:
>>> Thanks for the review. Answers inline.
>>> Regards, Jan
>>>
>>>
>>>> From: Ilya Maximets [mailto:[email protected]]
>>>> Sent: Wednesday, 17 January, 2018 11:47
>>>> Subject: Re: [PATCH v7 1/3] netdev: Add rxq callback function
>>>> rxq_length()
>>>>
>>>> On 16.01.2018 04:51, Jan Scheurich wrote:
>>>>> If implememented, this function returns the number of packets in an
>>>>> rx queue of the netdev. If not implemented, it returns -1.
>>>>
>>>> To be conform with other netdev functions it should return meaningful
>>>> error codes. As 'rte_eth_rx_queue_count' could return different
>>>> errors like -EINVAL or -ENOTSUP, 'netdev_rxq_length' itself should
>>>> return -EOPNOTSUPP if not implemented.
>>>
>>> OK.
>>>
>>>>
>>>>>
>>>>> This function will be used in the upcoming commit for PMD
>>>>> performance metrics to supervise the rx queue fill level for DPDK
>> vhostuser ports.
>>>>>
>>>>> Signed-off-by: Jan Scheurich <[email protected]>
>>>>> ---
>>>>> lib/netdev-bsd.c | 1 +
>>>>> lib/netdev-dpdk.c | 36 +++++++++++++++++++++++++++++++-----
>>>>> lib/netdev-dummy.c | 1 +
>>>>> lib/netdev-linux.c | 1 +
>>>>> lib/netdev-provider.h | 3 +++
>>>>> lib/netdev-vport.c | 1 +
>>>>> lib/netdev.c | 9 +++++++++
>>>>> lib/netdev.h | 1 +
>>>>> 8 files changed, 48 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c index
>>>>> 05974c1..8d1771e 100644
>>>>> --- a/lib/netdev-bsd.c
>>>>> +++ b/lib/netdev-bsd.c
>>>>> @@ -1546,6 +1546,7 @@ netdev_bsd_update_flags(struct netdev
>> *netdev_, enum netdev_flags off,
>>>>> netdev_bsd_rxq_recv, \
>>>>> netdev_bsd_rxq_wait, \
>>>>> netdev_bsd_rxq_drain, \
>>>>> + NULL, /* rxq_length */ \
>>>>> \
>>>>> NO_OFFLOAD_API \
>>>>> }
>>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
>>>>> ccda3fc..4200556 100644
>>>>> --- a/lib/netdev-dpdk.c
>>>>> +++ b/lib/netdev-dpdk.c
>>>>> @@ -1839,6 +1839,27 @@ netdev_dpdk_rxq_recv(struct netdev_rxq
>> *rxq, struct dp_packet_batch *batch)
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +static int
>>>>> +netdev_dpdk_vhost_rxq_length(struct netdev_rxq *rxq) {
>>>>> + struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
>>>>> + int qid = rxq->queue_id;
>>>>> +
>>>>
>>>> We must make all the checks as in rxq_recv() function before calling
>>>> 'rte_vhost_rx_queue_count'. Otherwise we may crash here if device
>>>> will be occasionally disconnected:
>>>>
>>>> int vid = netdev_dpdk_get_vid(dev);
>>>>
>>>> if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured
>>>> || !(dev->flags & NETDEV_UP))) {
>>>> return -EAGAIN;
>>>> }
>>>>
>>>> Not sure about -EAGAIN, but we need to return some negative errno.
>>>
>>> OK. Not necessary for our use case, as it will only be called by the PMD
>> after having received a full batch of 32 packets, but in general I agree
>> those
>> checks are needed.
>>
>> It's necessary because vhost device could be disconnected between
>> rxq_recv() and rxq_length(). In this case we will call
>> rte_vhost_rx_queue_count() with vid == -1. This will produce access to the
>> random memory inside dpdk and likely a segmentation fault.
>>
>> See commit daf22bf7a826 ("netdev-dpdk: Fix calling vhost API with negative
>> vid.") for a example of a similar issue. And I'm taking this opportunity to
>> recall
>> that you should retrieve the vid only once.
>
> [[BO'M]] Is there not also the possibility that the vhost device gets
> disconnected between the call to get_vid() and rxq_recv()?
You mean disconnect between netdev_dpdk_get_vid(dev) and
rte_vhost_dequeue_burst(vid) ?
There is no issue in this case, because 'destroy_device()' will wait for other
threads to
quiesce. This means that device structure inside dpdk will not be freed while
we're inside
netdev_rxq_recv(). We can safely call any rte_vhost API for the old vid until
device not
freed inside dpdk.
>
> Also, given these required calls to get_vid (which afaik requires some slow
> memory fencing) wouldn't that argue for the original approach where the rxq
> len is returned from rxq_recv(). As the call to rxq_length() would be made
> once per batch once the queue is not being drained rxq_recv() the overhead
> could be significant.
I'm not sure (I hope that Jan tested the performance of this version), but I
feel that
'rte_vhost_rx_queue_count()' is more heavy operation.
>
>>
>>>
>>>>
>>>>> + /* The DPDK API returns a uint32_t which often has invalid bits in
>>>>> the
>>>>> + * upper 16-bits. Need to restrict the value uint16_t. */
>>>>> + return rte_vhost_rx_queue_count(netdev_dpdk_get_vid(dev),
>>>>> + qid * VIRTIO_QNUM + VIRTIO_TXQ)
>>>>> + & UINT16_MAX;
>>>>> +}
>>>>> +
>>>>> +static int
>>>>> +netdev_dpdk_rxq_length(struct netdev_rxq *rxq) {
>>>>> + struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq);
>>>>> +
>>>>
>>>> Same here:
>>>>
>>>> struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
>>>>
>>>> if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
>>>> return -EAGAIN;
>>>> }
>>>>
>>>>> + return rte_eth_rx_queue_count(rx->port_id, rxq->queue_id); }
>>>>> +
>>>>> static inline int
>>>>> netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf
>> **pkts,
>>>>> int cnt, bool may_steal) @@ -3580,7 +3601,7 @@
>>>>> unlock:
>>>>> GET_CARRIER, GET_STATS,
>>>>> \
>>>>> GET_CUSTOM_STATS,
>> \
>>>>> GET_FEATURES, GET_STATUS, \
>>>>> - RECONFIGURE, RXQ_RECV) \
>>>>> + RECONFIGURE, RXQ_RECV, RXQ_LENGTH) \
>>>>> { \
>>>>> NAME, \
>>>>> true, /* is_pmd */ \
>>>>> @@ -3649,6 +3670,7 @@ unlock:
>>>>> RXQ_RECV, \
>>>>> NULL, /* rx_wait */ \
>>>>> NULL, /* rxq_drain */ \
>>>>> + RXQ_LENGTH, \
>>>>> NO_OFFLOAD_API \
>>>>> }
>>>>>
>>>>> @@ -3667,7 +3689,8 @@ static const struct netdev_class dpdk_class =
>>>>> netdev_dpdk_get_features,
>>>>> netdev_dpdk_get_status,
>>>>> netdev_dpdk_reconfigure,
>>>>> - netdev_dpdk_rxq_recv);
>>>>> + netdev_dpdk_rxq_recv,
>>>>> + netdev_dpdk_rxq_length);
>>>>>
>>>>> static const struct netdev_class dpdk_ring_class =
>>>>> NETDEV_DPDK_CLASS(
>>>>> @@ -3684,7 +3707,8 @@ static const struct netdev_class
>> dpdk_ring_class =
>>>>> netdev_dpdk_get_features,
>>>>> netdev_dpdk_get_status,
>>>>> netdev_dpdk_reconfigure,
>>>>> - netdev_dpdk_rxq_recv);
>>>>> + netdev_dpdk_rxq_recv,
>>>>> + NULL);
>>>>>
>>>>> static const struct netdev_class dpdk_vhost_class =
>>>>> NETDEV_DPDK_CLASS(
>>>>> @@ -3701,7 +3725,8 @@ static const struct netdev_class
>> dpdk_vhost_class =
>>>>> NULL,
>>>>> NULL,
>>>>> netdev_dpdk_vhost_reconfigure,
>>>>> - netdev_dpdk_vhost_rxq_recv);
>>>>> + netdev_dpdk_vhost_rxq_recv,
>>>>> + netdev_dpdk_vhost_rxq_length);
>>>>> static const struct netdev_class dpdk_vhost_client_class =
>>>>> NETDEV_DPDK_CLASS(
>>>>> "dpdkvhostuserclient",
>>>>> @@ -3717,7 +3742,8 @@ static const struct netdev_class
>> dpdk_vhost_client_class =
>>>>> NULL,
>>>>> NULL,
>>>>> netdev_dpdk_vhost_client_reconfigure,
>>>>> - netdev_dpdk_vhost_rxq_recv);
>>>>> + netdev_dpdk_vhost_rxq_recv,
>>>>> + netdev_dpdk_vhost_rxq_length);
>>>>>
>>>>> void
>>>>> netdev_dpdk_register(void)
>>>>> diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c index
>>>>> 4246af3..7e2c0a2 100644
>>>>> --- a/lib/netdev-dummy.c
>>>>> +++ b/lib/netdev-dummy.c
>>>>> @@ -1457,6 +1457,7 @@ netdev_dummy_update_flags(struct netdev
>> *netdev_,
>>>>> netdev_dummy_rxq_recv, \
>>>>> netdev_dummy_rxq_wait, \
>>>>> netdev_dummy_rxq_drain, \
>>>>> + NULL, /* rxq_length */ \
>>>>> \
>>>>> NO_OFFLOAD_API \
>>>>> }
>>>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index
>>>>> 37143b8..8b19890 100644
>>>>> --- a/lib/netdev-linux.c
>>>>> +++ b/lib/netdev-linux.c
>>>>> @@ -2890,6 +2890,7 @@ netdev_linux_update_flags(struct netdev
>> *netdev_, enum netdev_flags off,
>>>>> netdev_linux_rxq_recv, \
>>>>> netdev_linux_rxq_wait, \
>>>>> netdev_linux_rxq_drain, \
>>>>> + NULL, /* rxq_length */ \
>>>>> \
>>>>> FLOW_OFFLOAD_API \
>>>>> }
>>>>> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h index
>>>>> 25bd671..297644a 100644
>>>>> --- a/lib/netdev-provider.h
>>>>> +++ b/lib/netdev-provider.h
>>>>> @@ -801,6 +801,9 @@ struct netdev_class {
>>>>> /* Discards all packets waiting to be received from 'rx'. */
>>>>> int (*rxq_drain)(struct netdev_rxq *rx);
>>>>>
>>>>> + /* Retrieve the number of packets present in an rx queue. */
>>>>
>>>> Comment should be extended. See below.
>>>>
>>>> In addition we need to mark here that function is not thread-safe and
>>>> could not be used while device reconfiguration.
>>>
>>> I'm curious: is there any difference regarding thread-safeness and use
>> during device configuration compared to the other netdev rxq functions?
>> Why would it be necessary to list these constraints for this function but not
>> the others?
>>
>> Thread safety in general and particulary for netdev_rxq_*() functions
>> described in corresponding section at the top of lib/netdev.h . Thread safety
>> of netdev_send() in details described near to send() function definition in
>> lib/netdev-provider.h because it has it's own thread-safe related argument.
>>
>> Comment near to reconfigure() in lib/netdev-provider.h states that we must
>> not use rxq_recv() and send() simultaneously with it.
>>
>> I think, we should not describe the thread-safety and dependency with
>> reconfigure() for rxq_length() in it's own comment, but update comments
>> listed above with this new function.
>>
>>>
>>> And why should this read-only function be a priori thread-unsafe? And in
>> which sense: concurrent invocations of this function? Or concurrent
>> invocation of this function with rxq_recv() and rxq_drain()?
>>
>> At least it could return wrong result in case of concurrent invocation with
>> rxq_recv().
>>
>>>
>>> Should the netdev.h and the netdev_provider.h not rather have a general
>> disclaimer that the rxq functions are only to be called from a single thread
>> assigned "polling" the rx queue, or alternatively be protected by a lock on
>> the
>> netdev user side?
>>
>> lib/netdev.h already has general thread-safety disclaimer.
>>
>>>
>>>>
>>>>> + int (*rxq_length)(struct netdev_rxq *rx);
>>>>> +
>>>>> /* ## -------------------------------- ## */
>>>>> /* ## netdev flow offloading functions ## */
>>>>> /* ## -------------------------------- ## */ diff --git
>>>>> a/lib/netdev-vport.c b/lib/netdev-vport.c index 478ed90..1e7bc96
>>>>> 100644
>>>>> --- a/lib/netdev-vport.c
>>>>> +++ b/lib/netdev-vport.c
>>>>> @@ -944,6 +944,7 @@ netdev_vport_get_ifindex(const struct netdev
>> *netdev_)
>>>>> NULL, /* rx_recv */ \
>>>>> NULL, /* rx_wait */ \
>>>>> NULL, /* rx_drain */ \
>>>>> + NULL, /* rx_length */ \
>>>>> \
>>>>> NETDEV_FLOW_OFFLOAD_API
>>>>>
>>>>> diff --git a/lib/netdev.c b/lib/netdev.c index be05dc6..063c318
>>>>> 100644
>>>>> --- a/lib/netdev.c
>>>>> +++ b/lib/netdev.c
>>>>> @@ -724,6 +724,15 @@ netdev_rxq_drain(struct netdev_rxq *rx)
>>>>> : 0);
>>>>> }
>>>>>
>>>>> +/* Retrieve the number of packets present in an rx queue. */
>>>>
>>>> Comment should clearly declare what kind of result should be treated
>>>> as an error, and what is the result in case of success. You may use
>>>> description for 'netdev_get_ifindex' as a reference.
>>>> Something like:
>>>>
>>>> /* Returns the number of packets present in an rx queue, if
>>>> successful, as a
>>>> * positive number. On failure, returns a negative errno value.
>>>> *
>>>> * Some network devices may not implement support for this function.
>>>> In such
>>>> * cases this function will always return -EOPNOTSUPP. */
>>>
>>> OK.
>>>>
>>>>> +int
>>>>> +netdev_rxq_length(struct netdev_rxq *rx) {
>>>>> + return (rx->netdev->netdev_class->rxq_length
>>>>> + ? rx->netdev->netdev_class->rxq_length(rx)
>>>>> + : -1);
>>>>> +}
>>>>> +
>>>>> /* Configures the number of tx queues of 'netdev'. Returns 0 if
>> successful,
>>>>> * otherwise a positive errno value.
>>>>> *
>>>>> diff --git a/lib/netdev.h b/lib/netdev.h index ff1b604..edd41b1
>>>>> 100644
>>>>> --- a/lib/netdev.h
>>>>> +++ b/lib/netdev.h
>>>>> @@ -178,6 +178,7 @@ int netdev_rxq_get_queue_id(const struct
>>>>> netdev_rxq *); int netdev_rxq_recv(struct netdev_rxq *rx, struct
>>>>> dp_packet_batch *); void netdev_rxq_wait(struct netdev_rxq *); int
>>>>> netdev_rxq_drain(struct netdev_rxq *);
>>>>> +int netdev_rxq_length(struct netdev_rxq *rx);
>>>>
>>>> argument's name not needed here.
>>>
>>> OK.
>>>
>>>>
>>>>>
>>>>> /* Packet transmission. */
>>>>> int netdev_send(struct netdev *, int qid, struct dp_packet_batch *,
>>>>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev