[dpdk-dev] [PATCH 3/3] maintainers: add stable mailing list
Signed-off-by: Yuanhan Liu
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3df1754..076e86c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -35,6 +35,7 @@ F: scripts/test-build.sh
 Stable Branches
 ---
 T: git://dpdk.org/dpdk-stable
+M: stable@dpdk.org

 Security Issues
 ---
--
1.9.0
[dpdk-dev] [PATCH 2/3] maintainers: update virtio section name
Signed-off-by: Yuanhan Liu
---

Hmm.., maybe we could separate the vhost lib and the virtio PMD into two
different sections?

---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index cc1ab68..3df1754 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -364,7 +364,7 @@ M: Sony Chacko
 F: drivers/net/qede/
 F: doc/guides/nics/qede.rst

-RedHat virtio
+Virtio PMD and vhost lib
 M: Yuanhan Liu
 T: git://dpdk.org/next/dpdk-next-virtio
 F: drivers/net/virtio/
--
1.9.0
[dpdk-dev] [PATCH 1/3] maintainers: update virtio maintainer
Huawei left the DPDK team months ago, and he hasn't shown up since then.
Remove him.

Cc: Huawei Xie
Signed-off-by: Yuanhan Liu
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 26d9590..cc1ab68 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -365,7 +365,6 @@ F: drivers/net/qede/
 F: doc/guides/nics/qede.rst

 RedHat virtio
-M: Huawei Xie
 M: Yuanhan Liu
 T: git://dpdk.org/next/dpdk-next-virtio
 F: drivers/net/virtio/
--
1.9.0
[dpdk-dev] [PATCH 0/3] maintainers: minor updates to virtio and stable
---
Yuanhan Liu (3):
  maintainers: update virtio maintainer
  maintainers: update virtio section name
  maintainers: add stable mailing list

 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--
1.9.0
[dpdk-dev] [dpdk-announce] DPDK 16.07.2 released
Ajit Khaparde (1):
      net/bnxt: fix crash when closing
Beilei Xing (1):
      net/i40e: fix floating VEB
Bernard Iremonger (1):
      app/testpmd: fix DCB configuration
David Marchand (1):
      ethdev: fix vendor id in debug message
E. Scott Daniels (1):
      ethdev: prevent duplicate event callback
Eric Kinzie (1):
      net/bonding: validate speed after link up
Ferruh Yigit (4):
      net/ring: fix ring device creation via devargs
      net/bnx2x: fix build with icc
      kni: fix build with kernel 4.8
      kni: fix build with kernel 4.9
Gowrishankar Muthukrishnan (1):
      examples/ip_pipeline: fix plugin loading
Igor Ryzhov (1):
      pci: fix probing error if no driver found
Jasvinder Singh (1):
      examples/qos_sched: fix dequeue from ring
Jeff Guo (1):
      net/i40e: fix hash filter on X722
Jianbo Liu (2):
      eal/arm: fix file descriptor leak when getting CPU features
      eal/ppc: fix file descriptor leak when getting CPU features
Jingjing Wu (2):
      net/i40e: fix DCB configuration
      doc: add limitations for i40e PMD
John Daley (5):
      net/enic: fix flow director
      net/enic: fix crash with removed flow director filters
      net/enic: fix multi-queue Rx performance
      net/enic: fix crash on MTU update or Rx queue reconfigure
      net/enic: fix max packet length check
John W. Linville (4):
      net/ena: improve safety of string handling
      net/bnxt: ensure entry length is unsigned
      net/i40e: do not use VSI before NULL check
      net/bnxt: fix bit shift size
Kamil Rytarowski (1):
      net/thunderx: fix Tx checksum handling
Michael Qiu (2):
      examples/tep_term: fix L4 length
      examples/tep_term: fix packet length with multi-segments
Mohammad Abdul Awal (1):
      app/testpmd: fix RSS hash key size
Nelio Laranjeiro (1):
      net/mlx5: fix Rx checksum macros
Nelson Escobar (3):
      net/enic: revert truncated packets counter fix
      net/enic: document how to configure vNIC parameters
      net/enic: fix Rx queue index when not using Rx scatter
Nipun Gupta (1):
      mempool: fix leak if populate fails
Nélio Laranjeiro (7):
      net/mlx5: fix Rx function selection
      net/mlx5: fix hash key size retrieval
      net/mlx5: support Mellanox OFED 3.4
      net/mlx5: re-factorize functions
      net/mlx5: fix inline logic
      net/mlx5: fix link speed capability information
      net/mlx5: fix support for newer link speeds
Olga Shern (1):
      net/mlx5: fix link status report
Olivier Gournet (1):
      net/mlx5: fix initialization in secondary process
Pablo de Lara (3):
      app/test: fix hash multiwriter sequence
      hash: fix unlimited cuckoo path
      hash: fix bucket size usage
Piotr Azarewicz (1):
      examples/l2fwd-crypto: fix verify with decrypt in chain
Qi Zhang (4):
      net/ixgbe: fix out of order Rx read
      net/fm10k: fix out of order Rx read
      net/i40e: fix Rx hang when disable LLDP
      net/i40e: fix out of order Rx read
Qiming Yang (3):
      net/i40e: fix link status change interrupt
      net/i40e: fix VF bonded device link down
      net/i40e: fixed build error with icc
Rasesh Mody (3):
      net/bnx2x: fix maximum PF queues
      net/bnx2x: fix socket id for slowpath memory
      net/qede/base: fix 32-bit build
Raslan Darawsheh (2):
      net/mlx5: fix removing VLAN filter
      net/mlx5: fix handling of small mbuf sizes
Reshma Pattan (2):
      pdump: fix created directory permissions
      app/procinfo: free xstats memory upon failure
Sagi Grimberg (1):
      net/mlx5: fix possible NULL dereference in Rx path
Sergio Gonzalez Monroy (1):
      examples/ipsec-secgw: check SP only when setup
Wei Dai (3):
      lpm: fix freeing unused sub-table on rule delete
      mempool: fix search of maximum contiguous pages
      lpm: fix freeing memory
Wenzhuo Lu (6):
      app/testpmd: fix DCB configuration
      app/testpmd: fix PF/VF check of flow director
      net/ixgbe: fix flow director mask
      app/testpmd: fix flow director mask
      app/testpmd: fix flow director endianness
      net/ixgbe: fix VF registers
Xiao Wang (2):
      net/fm10k: fix Rx checksum flags
      net/fm10k: fix VF Tx queue initialization
Yaacov Hazan (3):
      net/mlx5: fix inconsistent return value in flow director
      net/mlx5: refactor allocation of flow director queues
      net/mlx5: fix flow director drop mode
Yong Wang (1):
      net/vmxnet3: fix mbuf release on reset/stop
Yuanhan Liu (2):
      net/virtio: revert fix restart
      version: 16.07.2
Zhihong Wang (1):
      vhost: fix Windows VM hang
[dpdk-dev] [PATCH v2] virtio: tx with can_push when VERSION_1 is set
On Wed, Nov 30, 2016 at 09:18:42AM +, Pierre Pfister (ppfister) wrote:
> Current virtio driver advertises VERSION_1 support,
> but does not handle device's VERSION_1 support when
> sending packets (it looks for ANY_LAYOUT feature,
> which is absent).
>
> This patch enables 'can_push' in tx path when VERSION_1
> is advertised by the device.
>
> This significantly improves small packets forwarding rate
> towards devices advertising VERSION_1 feature.
>
> Signed-off-by: Pierre Pfister

From the virtio spec point of view, VERSION_1 implies ANY_LAYOUT, so this
patch is right, and I think we could apply it. No objections?

	--yliu

> ---
>  drivers/net/virtio/virtio_rxtx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 22d97a4..1e5a6b9 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -1015,7 +1015,8 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  	}
>
>  	/* optimize ring usage */
> -	if (vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) &&
> +	if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
> +	     vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) &&
>  	    rte_mbuf_refcnt_read(txm) == 1 &&
>  	    RTE_MBUF_DIRECT(txm) &&
>  	    txm->nb_segs == 1 &&
> --
> 2.7.4 (Apple Git-66)
[dpdk-dev] [PATCH] doc: introduce PVP reference benchmark
On Thu, Nov 24, 2016 at 08:35:51AM +0100, Maxime Coquelin wrote:
>
> On 11/24/2016 06:07 AM, Yuanhan Liu wrote:
> > First of all, thanks for the doc! It's a great one.
> Thanks.
> I would be interested to know if you have other tuning I don't mention
> in this doc.

I was thinking we may need to document the performance impact of some
features. Say, we observed that indirect descriptors may be good for some
cases while bad for others. Also, the non-mergeable Rx path outperforms
the mergeable Rx path. If a user cares about performance and is certain
that all packets fit into a typical MTU, he may likely want to disable
the mergeable feature, which is enabled by default.

Maybe we could start a new doc, or maybe we could add a new section here?

	--yliu
[dpdk-dev] [PATCH] tools: add tags and cscope index file generation support
On Sun, Nov 27, 2016 at 05:42:42AM +0530, Jerin Jacob wrote:
> This script generates cscope, gtags, and tags
> index files based on the EAL environment
> (architecture and OS (linux/bsd)).
>
> Selection of the architecture and OS environment
> is based on the DPDK configuration target (T=).
>
> example usage:
> make tags T=x86_64-native-linuxapp-gcc
> make cscope T=x86_64-native-linuxapp-gcc
> make gtags T=x86_64-native-linuxapp-gcc
>
> Signed-off-by: Jerin Jacob

It's handy. Thanks!

Reviewed-by: Yuanhan Liu

	--yliu
[dpdk-dev] dpdk/vpp and cross-version migration for vhost
On Thu, Nov 24, 2016 at 09:30:49AM +, Kevin Traynor wrote:
> On 11/24/2016 06:31 AM, Yuanhan Liu wrote:
> > On Tue, Nov 22, 2016 at 04:53:05PM +0200, Michael S. Tsirkin wrote:
> >>>> You keep assuming that you have the VM started first and
> >>>> figure out things afterwards, but this does not work.
> >>>>
> >>>> Think about a cluster of machines. You want to start a VM in
> >>>> a way that will ensure compatibility with all hosts
> >>>> in a cluster.
> >>>
> >>> I see. I was more considering the case when the dst
> >>> host (including the qemu and dpdk combo) is given, and
> >>> then determining whether it will be a successful migration
> >>> or not.
> >>>
> >>> And you are asking that we need to know which host could
> >>> be a good candidate before starting the migration. In such
> >>> a case, we indeed need some inputs from both the qemu and
> >>> vhost-user backend.
> >>>
> >>> For DPDK, I think it could be simple: just as you said, it
> >>> could be either a tiny script, or even a macro defined in
> >>> the source code file (we extend it every time we add a
> >>> new feature) to let libvirt read it. Or something else.
> >>
> >> There's the issue of APIs that tweak features as Maxime
> >> suggested.
> >
> > Yes, it's a good point.
> >
> >> Maybe the only thing to do is to deprecate it,
> >
> > Looks like so.
> >
> >> but I feel some way for the application to pass info into
> >> the guest might be beneficial.
> >
> > The two APIs are just for tweaking the feature bits DPDK supports
> > before any device gets connected. It's another way to disable some
> > features (the other obvious way is through QEMU command lines).
> >
> > IMO, it's a bit handy only in a case like: we have a bunch of VMs.
> > Instead of disabling something through qemu one by one, we could
> > disable it once in DPDK.
> >
> > But I doubt the usefulness of it. It's only used in DPDK's vhost
> > example after all. It is used neither in the vhost pmd nor in OVS.
>
> rte_vhost_feature_disable() is currently used in OVS, lib/netdev-dpdk.c

Hmmm. I must have checked very old code ...

> netdev_dpdk_vhost_class_init(void)
> {
>     static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>
>     /* This function can be called for different classes. The
>      * initialization needs to be done only once */
>     if (ovsthread_once_start(&once)) {
>         rte_vhost_driver_callback_register(&virtio_net_device_ops);
>         rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4
>                                   | 1ULL << VIRTIO_NET_F_HOST_TSO6
>                                   | 1ULL << VIRTIO_NET_F_CSUM);

I saw the commit that introduced this change, but it tells no reason why
it was added.

    commit 362ca39639ae871806be5ae97d55e1cbb14afd92
    Author: mweglicx
    Date:   Thu Apr 14 17:40:06 2016 +0100

        Update relevant artifacts to add support for DPDK 16.04.

        Following changes are applied:
        - INSTALL.DPDK.md: CONFIG_RTE_BUILD_COMBINE_LIBS step has been
          removed because it is no longer present in DPDK configuration
          (combined library is created by default),
        - INSTALL.DPDK.md: VHost Cuse configuration is updated,
        - netdev-dpdk.c: Link speed definition is changed in DPDK and
          netdev_dpdk_get_features is updated accordingly,
        - netdev-dpdk.c: TSO and checksum offload has been disabled for
          vhostuser device.
        - .travis/linux-build.sh: DPDK version is updated and legacy flags
          have been removed in configuration.

        Signed-off-by: Michal Weglicki
        Signed-off-by: Panu Matilainen
        Acked-by: Daniele Di Proietto

	--yliu
[dpdk-dev] dpdk/vpp and cross-version migration for vhost
On Tue, Nov 22, 2016 at 04:53:05PM +0200, Michael S. Tsirkin wrote:
> > > You keep assuming that you have the VM started first and
> > > figure out things afterwards, but this does not work.
> > >
> > > Think about a cluster of machines. You want to start a VM in
> > > a way that will ensure compatibility with all hosts
> > > in a cluster.
> >
> > I see. I was more considering the case when the dst
> > host (including the qemu and dpdk combo) is given, and
> > then determining whether it will be a successful migration
> > or not.
> >
> > And you are asking that we need to know which host could
> > be a good candidate before starting the migration. In such
> > a case, we indeed need some inputs from both the qemu and
> > vhost-user backend.
> >
> > For DPDK, I think it could be simple: just as you said, it
> > could be either a tiny script, or even a macro defined in
> > the source code file (we extend it every time we add a
> > new feature) to let libvirt read it. Or something else.
>
> There's the issue of APIs that tweak features as Maxime
> suggested.

Yes, it's a good point.

> Maybe the only thing to do is to deprecate it,

Looks like so.

> but I feel some way for the application to pass info into
> the guest might be beneficial.

The two APIs are just for tweaking the feature bits DPDK supports before
any device gets connected. It's another way to disable some features
(the other obvious way is through QEMU command lines).

IMO, it's a bit handy only in a case like: we have a bunch of VMs.
Instead of disabling something through qemu one by one, we could disable
it once in DPDK.

But I doubt the usefulness of it. It's only used in DPDK's vhost example
after all. It is used neither in the vhost pmd nor in OVS.

> > > If you don't, the guest visible interface will change
> > > and you won't be able to migrate.
> > >
> > > It does not make sense to discuss feature bits specifically
> > > since that is not the only part of the interface.
> > > For example, the max ring size supported might change.
> >
> > I don't quite understand why we have to consider the max ring
> > size here? Isn't it a virtio device attribute, for which QEMU could
> > provide such compatibility information?
> >
> > I mean, DPDK is supposed to support varying vring sizes; it's QEMU
> > that gives a specific value.
>
> If the backend supports s/g of any size up to 2^16, there's no issue.

I don't know about others, but I see no issues in DPDK.

> ATM some backends might be assuming up to 1K s/g since
> QEMU never supported bigger ones. We might classify this
> as a bug, or not, and add a feature flag.
>
> But it's just an example. There might be more values at issue
> in the future.

Yeah, maybe. But we could analyze them one by one.

> > > Let me describe how it works in qemu/libvirt.
> > > When you install a VM, you can specify a compatibility
> > > level (aka "machine type"), and you can query the supported
> > > compatibility levels. Management uses that to find the supported
> > > compatibility and stores the compatibility in XML that is migrated
> > > with the VM. There's also a way to find the latest level, which is
> > > the default unless overridden by the user; again this level is
> > > recorded and then
> > > - management can make sure the migration destination is compatible
> > > - management can avoid migration to hosts without that support
> >
> > Thanks for the info, it helps.
> >
> > ...
> >
> > > > > > As version here is an opaque string for libvirt and qemu,
> > > > > > anything can be used - but I suggest either a list
> > > > > > of values defining the interface, e.g.
> > > > > > any_layout=on,max_ring=256
> > > > > > or a version including the name and vendor of the backend,
> > > > > > e.g. "org.dpdk.v4.5.6".
> >
> > The version scheme may not be ideal here. Assume a QEMU is supposed
> > to work with a specific DPDK version; however, the user may disable
> > some newer features through the qemu command line, so that it could
> > also work with an older DPDK version. Using the version scheme will
> > not allow us to do such a migration to an older DPDK version. The MTU
> > is a lively example here? (when the MTU feature is provided by QEMU
> > but is actually disabled by the user, it could also work with an
> > older DPDK without MTU support).
> >
> > --yliu
>
> OK, so does a list of values look better to you then?

Yes, if there is no better way. And I think it may be better to not list
all those features literally. Instead, using a number should be better,
say, features=0xdeadbeef.

Listing the feature names means we have to come to an agreement among all
the components involved here (QEMU, libvirt, DPDK, VPP, and maybe more
backends), so that we all use the exact same feature names. Though it may
not be a big deal, it lacks some flexibility. A feature bitmask will not
have this issue.

	--yliu

> > > > > > Note that typically the list of supported versions can only be
> > > > > > extended, not shrunk. Also, if the host/guest
[dpdk-dev] Proposal for a new Committer model
On Wed, Nov 23, 2016 at 03:19:19PM -0500, Neil Horman wrote:
> On Wed, Nov 23, 2016 at 11:41:20PM +0800, Yuanhan Liu wrote:
> > On Wed, Nov 23, 2016 at 09:11:54AM -0500, Neil Horman wrote:
> > > > Could we define some of the potential subtrees now and look to
> > > > introduce them in this release cycle? EAL and the Core libs, as
> > > > suggested by Thomas, seem like 2 obvious ones.
> > > >
> > > Sure, I'd suggest the following:
> >
> > I would pull the git history to see which components are in
> > active status in the last release (or even, in the last few
> > releases), and try to make a sub-tree if the corresponding
> > component is hot.
> >
> > # the 2nd column shows how many patches are prefixed with a related component
> > [yliu at yliu-dev ~/dpdk]$ git log --oneline v16.07..v16.11 | awk '{print $2}' | \
> >     sort | uniq -c | sort -nr | head -30 | nl
> >  1  52 doc:
> >  2  40 net/ixgbe/base:
> >  3  38 app/test:
> >  4  37 kni:
> >  5  27 vhost:
> >  6  27 net/virtio:
> >  7  27 net/mlx5:
> >  8  26 app/testpmd:
> >  9  25 net/i40e:
> > 10  23 net/pcap:
> > 11  22 net/bnxt:
> > 12  20 net/enic:
> > 13  18 net/qede:
> > 14  17 net/thunderx:
> > 15  16 net/qede/base:
> > 16  16 eal:
> > 17  15 net/ixgbe:
> > 18  14 net:
> > 19  14 crypto/qat:
> > 20  13 scripts:
> > 21  13 net/bnx2x:
> > 22  12 net/i40e/base:
> > 23  12 examples/ipsec-secgw:
> > 24  11 mbuf:
> > 25  11 hash:
> > 26  10 lib:
> > 27  10 examples/ip_pipeline:
> > 28  10 ethdev:
> > 29   9 pci:
> > 30   7 net/vmxnet3:
> > ...
> > 46   3 pdump:
> > 47   3 net/virtio_user:
> > 48   3 net/ring:
> > 49   3 net/nfp:
> > 50   3 net/mlx:
> > 51   3 net/ena:
> > 52   3 net/e1000:
> > 53   3 net/bonding:
> > ...
> > 56   2 sched:
> > 57   2 port:
> > ...
> > 65   1 timer:
> > 66   1 remove
> > 67   1 pmdinfogen:
> > 68   1 net/igb:
> > 69   1 net/enic/base:
> > 70   1 meter:
> > ...
> > 84   1 cfgfile:
> > 85   1 app/procinfo:
> > 86   1 app/proc_info:
> > 87   1 acl:
> >
> > Something obvious is that:
> >
> > - "doc" deserves a sub-tree, and John is a perfect committer for
> >   that if he's willing to.
> >
> > - generally, I'd agree with Neil that most (if not all) pmds may need
> >   a sub-tree. While some others may not, for example, net/ring and
> >   net/pcap.
> >
> No, that's the opposite of what I think. I think all net pmds should
> flow through a single subtree, all crypto pmds through another, etc.

I misunderstood it. I thought you were suggesting creating a sub-tree
for most (or all) pmds. Some of my comments don't apply then. But yes,
we have already done that: we have next-net and next-crypto.

> > For those non-active pmds, I think it's okay to let the generic
> > pmd committer cover them.
> >
> Not sure what you're getting at here. Low volume pmds (or any library)
> can still go through a subtree. The goal is to fragment the commit work
> so one person doesn't have to do it all.

> > - it's not that wise to me to list all the components we have so far
> >   and make a sub-tree for each of them.
> >
> I think you misunderstood the organization of my last note. I agree
> with you here. When I listed the core and listed several libraries
> under it, my intent was to create a core subtree that accepted patches
> for all of those libraries.

> > For example, some components like librte_{port, pdump, cfgfile, acl,
> > and etc} just have a few (or even just one) commits in the last
> > release. It makes no sense to me to introduce a tree for each of them.
> >
> Yes, this is what I was saying in my last note.

> > Another thought is we could also create sub-trees based on category
> > but not on components like Neil suggested, especially since EAL looks
> > way too big to be maintained in one tree. Instead, it could be
> > something like:
> >
> > - a tree for BSD
> >
> This gets tricky, because then seve
[dpdk-dev] [PATCH] doc: introduce PVP reference benchmark
First of all, thanks for the doc! It's a great one.

On Wed, Nov 23, 2016 at 10:00:06PM +0100, Maxime Coquelin wrote:
> +Qemu build
> +~~~~~~~~~~
> +
> +    .. code-block:: console
> +
> +        git clone git://dpdk.org/dpdk
> +        cd dpdk
> +        export RTE_SDK=$PWD
> +        make install T=x86_64-native-linuxapp-gcc DESTDIR=install

It's actually the DPDK build.

I will take a closer look at it, and also render it to see how it looks,
when I get back to the office next week.

	--yliu

> +
> +DPDK build
> +~~~~~~~~~~
> +
> +    .. code-block:: console
> +
> +        git clone git://dpdk.org/dpdk
> +        cd dpdk
> +        export RTE_SDK=$PWD
> +        make install T=x86_64-native-linuxapp-gcc DESTDIR=install
> +
[dpdk-dev] Proposal for a new Committer model
On Wed, Nov 23, 2016 at 09:11:54AM -0500, Neil Horman wrote:
> > Could we define some of the potential subtrees now and look to
> > introduce them in this release cycle? EAL and the Core libs, as
> > suggested by Thomas, seem like 2 obvious ones.
> >
> Sure, I'd suggest the following:

I would pull the git history to see which components are in active status
in the last release (or even, in the last few releases), and try to make
a sub-tree if the corresponding component is hot.

# the 2nd column shows how many patches are prefixed with a related component
[yliu at yliu-dev ~/dpdk]$ git log --oneline v16.07..v16.11 | awk '{print $2}' | \
    sort | uniq -c | sort -nr | head -30 | nl
 1  52 doc:
 2  40 net/ixgbe/base:
 3  38 app/test:
 4  37 kni:
 5  27 vhost:
 6  27 net/virtio:
 7  27 net/mlx5:
 8  26 app/testpmd:
 9  25 net/i40e:
10  23 net/pcap:
11  22 net/bnxt:
12  20 net/enic:
13  18 net/qede:
14  17 net/thunderx:
15  16 net/qede/base:
16  16 eal:
17  15 net/ixgbe:
18  14 net:
19  14 crypto/qat:
20  13 scripts:
21  13 net/bnx2x:
22  12 net/i40e/base:
23  12 examples/ipsec-secgw:
24  11 mbuf:
25  11 hash:
26  10 lib:
27  10 examples/ip_pipeline:
28  10 ethdev:
29   9 pci:
30   7 net/vmxnet3:
...
46   3 pdump:
47   3 net/virtio_user:
48   3 net/ring:
49   3 net/nfp:
50   3 net/mlx:
51   3 net/ena:
52   3 net/e1000:
53   3 net/bonding:
...
56   2 sched:
57   2 port:
...
65   1 timer:
66   1 remove
67   1 pmdinfogen:
68   1 net/igb:
69   1 net/enic/base:
70   1 meter:
...
84   1 cfgfile:
85   1 app/procinfo:
86   1 app/proc_info:
87   1 acl:

Something obvious is that:

- "doc" deserves a sub-tree, and John is a perfect committer for that
  if he's willing to.

- generally, I'd agree with Neil that most (if not all) pmds may need
  a sub-tree. While some others may not, for example, net/ring and
  net/pcap. For those non-active pmds, I think it's okay to let the
  generic pmd committer cover them.

- it's not that wise to me to list all the components we have so far
  and make a sub-tree for each of them. For example, some components
  like librte_{port, pdump, cfgfile, acl, and etc} just have a few
  (or even just one) commits in the last release. It makes no sense
  to me to introduce a tree for each of them.

Another thought is we could also create sub-trees based on category but
not on components like Neil suggested, especially since EAL looks way
too big to be maintained in one tree. Instead, it could be something
like:

- a tree for BSD
- a tree for ARM (and some other trees for other platforms)
- a tree for mem related (mempool, mbuf, hugepage, etc)
- a tree for BUS
- ...

Last but not least, I think it's generally good to have more and more
trees in the end. But I don't think it's a good idea to go radically
and create all those trees at once (say, in one release). Something I
would like to suggest is one or two (or a bit more) per release. For
example, if I remember them well, we got the next-net tree at 16.04,
next-virtio (including vhost) at 16.07, and a recent one, next-crypto,
at 16.11.

	--yliu

> * net-pmds:
>   - all network pmds located under drivers/net
>   - librte_net
>   - librte_ether
>   - librte_ip_frag
>   - librte_pdump
>   - librte_port
> * crypto-pmds:
>   - all crypto pmds located under drivers/crypto
>   - librte_cryptodev
> * eal:
>   - librte_eal
> * core:
>   - librte_cfgfile
>   - librte_cmdline
>   - librte_compat
>   - librte_kvargs
>   - librte_kni
>   - librte_compat
> * misc:
>   - librte_acl
>   - librte_distributor
>   - librte_hash
>   - librte_jobstats
>   - librte_lpm
>   - librte_meter
>   - librte_pipeline
>   - librte_power
>   - librte_reorder
>   - librte_ring
>   - librte_sched
>   - librte_table
>   - librte_timer
>   - librte_vhost
>
> That's just a rough stab mind you, but perhaps it would get the ball
> rolling. I'd be willing to take maintainership of one of these subtrees
> if there is consensus around my doing so.
[dpdk-dev] dpdk/vpp and cross-version migration for vhost
On Thu, Nov 17, 2016 at 07:37:09PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 17, 2016 at 05:49:36PM +0800, Yuanhan Liu wrote:
> > On Thu, Nov 17, 2016 at 09:47:09AM +0100, Maxime Coquelin wrote:
> > >
> > >
> > > On 11/17/2016 09:29 AM, Yuanhan Liu wrote:
> > > > As usual, sorry for the late response :/
> > > >
> > > > On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote:
> > > > > Hi!
> > > > > So it looks like we face a problem with cross-version
> > > > > migration when using vhost. It's not new but became more
> > > > > acute with the advent of vhost user.
> > > > >
> > > > > For users to be able to migrate between different versions
> > > > > of the hypervisor, the interface exposed to guests
> > > > > by the hypervisor must stay unchanged.
> > > > >
> > > > > The problem is that a qemu device is connected
> > > > > to a backend in another process, so the interface
> > > > > exposed to guests depends on the capabilities of that
> > > > > process.
> > > > >
> > > > > Specifically, for the vhost user interface based on virtio,
> > > > > this includes the "host features" bitmap that defines the
> > > > > interface, as well as more host values such as the max ring
> > > > > size. Adding new features/changing values to this interface
> > > > > is required to make progress, but on the other hand we need
> > > > > the ability to get the old host features to be compatible.
> > > >
> > > > It looks like the same issue as vhost-user reconnect to me.
> > > > For example,
> > > >
> > > > - start dpdk 16.07 & qemu 2.5
> > > > - kill dpdk
> > > > - start dpdk 16.11
> > > >
> > > > Though DPDK 16.11 has more features compared to dpdk 16.07
> > > > (say, indirect), the above should work, because qemu saves the
> > > > negotiated features before the disconnect and stores them back
> > > > after the reconnection.
> > > >
> > > > commit a463215b087c41d7ca94e51aa347cde523831873
> > > > Author: Marc-André Lureau
> > > > Date:   Mon Jun 6 18:45:05 2016 +0200
> > > >
> > > >     vhost-net: save & restore vhost-user acked features
> > > >
> > > >     The initial vhost-user connection sets the features to be
> > > >     negotiated with the driver. Renegotiation isn't possible
> > > >     without device reset.
> > > >
> > > >     To handle reconnection of vhost-user backend, ensure the
> > > >     same set of features are provided, and reuse already acked
> > > >     features.
> > > >
> > > >     Signed-off-by: Marc-André Lureau
> > > >
> > > > So we could do something similar for vhost-user? I mean, save
> > > > the acked features before migration and store them back after
> > > > it. This should be able to keep the compatibility. If the user
> > > > downgrades the DPDK version, it also could be easily detected,
> > > > and we then exit with an error to the user: migration failed
> > > > due to incompatible vhost features.
> > > >
> > > > Just some rough thoughts. Does it make any sense?
> > >
> > > My understanding is that the management tool has to know whether
> > > versions are compatible before initiating the migration:
> >
> > Makes sense. How about getting and restoring the acked features
> > through qemu command lines then, say, through the monitor interface?
> >
> > With that, it would be something like:
> >
> > - start the vhost-user backend (DPDK, VPP, or whatever) & qemu in
> >   the src host
> > - read the acked features (through the monitor interface)
> > - start the vhost-user backend in the dst host
> > - start qemu in the dst host with the just-queried acked features
> >
> > QEMU then is expected to use this feature set for the later
> > vhost-user feature negotiation. Exit if feature compatibility is
> > broken.
> >
> > Thoughts?
> >
> > --yliu
>
> You keep assuming that you have the VM started first and
> figure out things afterwards, but this does not work.
>
> Think about a cluster of machines. You want to start a VM in
> a way that will ensure compatibility with all hosts
> in a cluster.

I see. I was more considering the case when the dst host (including the
qemu and dpdk combo) is given, and then determining whether it will be a
successful migration or not.
[dpdk-dev] 16.07.2 stable patches review and test
Hi all,

Here is a list of patches targeted for the 16.07.2 release. Please help
with reviewing and testing. The planned date for the final release is
Nov 30th. Before that, please shout if anyone has objections to these
patches being applied, or if I missed some important fixes.

These patches are located at branch 16.07 of the dpdk-stable repo:
    http://dpdk.org/browse/dpdk-stable/

Please also note that this will be the last stable release for 16.07.

Thanks.

	--yliu

---
Adrien Mazarguil (1):
      net/mlx5: fix Rx VLAN offload capability report
Ajit Khaparde (1):
      net/bnxt: fix crash when closing
Beilei Xing (1):
      net/i40e: fix floating VEB
Bernard Iremonger (1):
      app/testpmd: fix DCB configuration
David Marchand (1):
      ethdev: fix vendor id in debug message
E. Scott Daniels (1):
      ethdev: prevent duplicate event callback
Eric Kinzie (1):
      net/bonding: validate speed after link up
Ferruh Yigit (4):
      net/ring: fix ring device creation via devargs
      net/bnx2x: fix build with icc
      kni: fix build with kernel 4.8
      kni: fix build with kernel 4.9
Gowrishankar Muthukrishnan (1):
      examples/ip_pipeline: fix plugin loading
Igor Ryzhov (1):
      pci: fix probing error if no driver found
Jasvinder Singh (1):
      examples/qos_sched: fix dequeue from ring
Jeff Guo (1):
      net/i40e: fix hash filter on X722
Jianbo Liu (2):
      eal/arm: fix file descriptor leak when getting CPU features
      eal/ppc: fix file descriptor leak when getting CPU features
Jingjing Wu (2):
      net/i40e: fix DCB configuration
      doc: add limitations for i40e PMD
John Daley (5):
      net/enic: fix flow director
      net/enic: fix crash with removed flow director filters
      net/enic: fix multi-queue Rx performance
      net/enic: fix crash on MTU update or Rx queue reconfigure
      net/enic: fix max packet length check
John W. Linville (4):
      net/ena: improve safety of string handling
      net/bnxt: ensure entry length is unsigned
      net/i40e: do not use VSI before NULL check
      net/bnxt: fix bit shift size
Kamil Rytarowski (1):
      net/thunderx: fix Tx checksum handling
Michael Qiu (2):
      examples/tep_term: fix L4 length
      examples/tep_term: fix packet length with multi-segments
Mohammad Abdul Awal (1):
      app/testpmd: fix RSS hash key size
Nelio Laranjeiro (1):
      net/mlx5: fix Rx checksum macros
Nelson Escobar (3):
      net/enic: revert truncated packets counter fix
      net/enic: document how to configure vNIC parameters
      net/enic: fix Rx queue index when not using Rx scatter
Nipun Gupta (1):
      mempool: fix leak if populate fails
Nélio Laranjeiro (7):
      net/mlx5: fix Rx function selection
      net/mlx5: fix hash key size retrieval
      net/mlx5: support Mellanox OFED 3.4
      net/mlx5: re-factorize functions
      net/mlx5: fix inline logic
      net/mlx5: fix link speed capability information
      net/mlx5: fix support for newer link speeds
Olga Shern (1):
      net/mlx5: fix link status report
Olivier Gournet (1):
      net/mlx5: fix initialization in secondary process
Pablo de Lara (3):
      app/test: fix hash multiwriter sequence
      hash: fix unlimited cuckoo path
      hash: fix bucket size usage
Piotr Azarewicz (1):
      examples/l2fwd-crypto: fix verify with decrypt in chain
Qi Zhang (4):
      net/ixgbe: fix out of order Rx read
      net/fm10k: fix out of order Rx read
      net/i40e: fix Rx hang when disable LLDP
      net/i40e: fix out of order Rx read
Qiming Yang (2):
      net/i40e: fix link status change interrupt
      net/i40e: fix VF bonded device link down
Rasesh Mody (3):
      net/bnx2x: fix maximum PF queues
      net/bnx2x: fix socket id for slowpath memory
      net/qede/base: fix 32-bit build
Raslan Darawsheh (2):
      net/mlx5: fix removing VLAN filter
      net/mlx5: fix handling of small mbuf sizes
Reshma Pattan (2):
      pdump: fix created directory permissions
      app/procinfo: free xstats memory upon failure
Sagi Grimberg (1):
      net/mlx5: fix possible NULL dereference in Rx path
Sergio Gonzalez Monroy (1):
      examples/ipsec-secgw: check SP only when setup
Wei Dai (3):
      lpm: fix freeing unused sub-table on rule delete
      mempool: fix search of maximum contiguous pages
      lpm: fix freeing memory
Wenzhuo Lu (6):
      app/testpmd: fix DCB configuration
      app/testpmd: fix PF/VF check of flow director
      net/ixgbe: fix flow director mask
      app/testpmd: fix flow director mask
      app/testpmd: fix flow director endianness
      net/ixgbe: fix VF registers
Xiao Wang (2):
      net/fm10k: fix Rx checksum flags
      net/fm10k: fix VF Tx queue initialization
Yaacov Hazan (3):
      net/mlx5: fix inconsistent return value in flow director
      net/mlx5: refactor allocation of flow director queues
      net/mlx5: fix flow director drop mode
Yong Wang (1):
      net/vmxnet3: fix mbuf release on reset/stop
Yuanhan Liu (1
[dpdk-dev] dpdk/vpp and cross-version migration for vhost
On Thu, Nov 17, 2016 at 09:47:09AM +0100, Maxime Coquelin wrote: > > > On 11/17/2016 09:29 AM, Yuanhan Liu wrote: > >As usual, sorry for the late response :/ > > > >On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote: > >>Hi! > >>So it looks like we face a problem with cross-version > >>migration when using vhost. It's not new but became more > >>acute with the advent of vhost user. > >> > >>For users to be able to migrate between different versions > >>of the hypervisor the interface exposed to guests > >>by hypervisor must stay unchanged. > >> > >>The problem is that a qemu device is connected > >>to a backend in another process, so the interface > >>exposed to guests depends on the capabilities of that > >>process. > >> > >>Specifically, for vhost user interface based on virtio, this includes > >>the "host features" bitmap that defines the interface, as well as more > >>host values such as the max ring size. Adding new features/changing > >>values to this interface is required to make progress, but on the other > >>hand we need ability to get the old host features to be compatible. > > > >It looks like the same issue as vhost-user reconnect to me. For example, > > > >- start dpdk 16.07 & qemu 2.5 > >- kill dpdk > >- start dpdk 16.11 > > > >Though DPDK 16.11 has more features compared to dpdk 16.07 (say, indirect), > >the above should work, because qemu saves the negotiated features before the > >disconnect and restores them after the reconnection. > > > >commit a463215b087c41d7ca94e51aa347cde523831873 > >Author: Marc-André Lureau > >Date: Mon Jun 6 18:45:05 2016 +0200 > > > >vhost-net: save & restore vhost-user acked features > > > >The initial vhost-user connection sets the features to be negotiated > >with the driver. Renegotiation isn't possible without device reset. > > > >To handle reconnection of vhost-user backend, ensure the same set of > >features are provided, and reuse already acked features. > > > >Signed-off-by: Marc-André 
Lureau > > > > > >So could we do something similar for vhost-user? I mean, save the acked features > >before migration and restore them after it. This should be able to > >keep the compatibility. If the user downgrades the DPDK version, it could also > >be easily detected, and then exit with an error to the user: migration > >failed due to incompatible vhost features. > > > >Just some rough thoughts. Does that make sense? > > My understanding is that the management tool has to know whether > versions are compatible before initiating the migration: Makes sense. How about getting and restoring the acked features through qemu command lines then, say, through the monitor interface? With that, it would be something like: - start the vhost-user backend (DPDK, VPP, or whatever) & qemu in the src host - read the acked features (through the monitor interface) - start the vhost-user backend in the dst host - start qemu in the dst host with the acked features just queried QEMU is then expected to use this feature set for the later vhost-user feature negotiation, and to exit if feature compatibility is broken. Thoughts? --yliu > 1. The downtime could be unpredictable if a VM has to move from host > to host multiple times, which is problematic, especially for NFV. > 2. If migration is not possible, maybe the management tool would > prefer not to interrupt the VM on the current host. > > I have little experience with migration though, so I could be mistaken. > > Thanks, > Maxime > > > > > --yliu > >> > >>To solve this problem within qemu, qemu has a versioning system based on > >>a machine type concept which fundamentally is a version string, by > >>specifying that string one can get hardware compatible with a previous > >>qemu version. QEMU also reports the latest version and list of versions > >>supported so libvirt records the version at VM creation and then is > >>careful to use this machine version whenever it migrates a VM. > >> > >>One might wonder how is this solved with a kernel vhost backend. 
The > >>answer is that it mostly isn't - instead an assumption is made, that > >>qemu versions are deployed together with the kernel - this is generally > >>true for downstreams. Thus whenever qemu gains a new feature, it is > >>already supported by the kernel as well. However, if one attempts > >>migration with a new qemu fr
[dpdk-dev] dpdk/vpp and cross-version migration for vhost
As usual, sorry for the late response :/ On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote: > Hi! > So it looks like we face a problem with cross-version > migration when using vhost. It's not new but became more > acute with the advent of vhost user. > > For users to be able to migrate between different versions > of the hypervisor the interface exposed to guests > by hypervisor must stay unchanged. > > The problem is that a qemu device is connected > to a backend in another process, so the interface > exposed to guests depends on the capabilities of that > process. > > Specifically, for vhost user interface based on virtio, this includes > the "host features" bitmap that defines the interface, as well as more > host values such as the max ring size. Adding new features/changing > values to this interface is required to make progress, but on the other > hand we need ability to get the old host features to be compatible. It looks like the same issue as vhost-user reconnect to me. For example, - start dpdk 16.07 & qemu 2.5 - kill dpdk - start dpdk 16.11 Though DPDK 16.11 has more features compared to dpdk 16.07 (say, indirect), the above should work, because qemu saves the negotiated features before the disconnect and restores them after the reconnection. commit a463215b087c41d7ca94e51aa347cde523831873 Author: Marc-André Lureau Date: Mon Jun 6 18:45:05 2016 +0200 vhost-net: save & restore vhost-user acked features The initial vhost-user connection sets the features to be negotiated with the driver. Renegotiation isn't possible without device reset. To handle reconnection of vhost-user backend, ensure the same set of features are provided, and reuse already acked features. Signed-off-by: Marc-André Lureau So could we do something similar for vhost-user? I mean, save the acked features before migration and restore them after it. This should be able to keep the compatibility. 
If the user downgrades the DPDK version, it could also be easily detected, and then exit with an error to the user: migration failed due to incompatible vhost features. Just some rough thoughts. Does that make sense? --yliu > > To solve this problem within qemu, qemu has a versioning system based on > a machine type concept which fundamentally is a version string, by > specifying that string one can get hardware compatible with a previous > qemu version. QEMU also reports the latest version and list of versions > supported so libvirt records the version at VM creation and then is > careful to use this machine version whenever it migrates a VM. > > One might wonder how is this solved with a kernel vhost backend. The > answer is that it mostly isn't - instead an assumption is made, that > qemu versions are deployed together with the kernel - this is generally > true for downstreams. Thus whenever qemu gains a new feature, it is > already supported by the kernel as well. However, if one attempts > migration with a new qemu from a system with a new to old kernel, one > would get a failure. > > In the world where we have multiple userspace backends, with some of > these supplied by ISVs, this seems non-realistic. > > IMO we need to support vhost backend versioning, ideally > in a way that will also work for vhost kernel backends. > > So I'd like to get some input from both backend and management > developers on what a good solution would look like. > > If we want to emulate the qemu solution, this involves adding the > concept of interface versions to dpdk. For example, dpdk could supply a > file (or utility printing?) with list of versions: latest and versions > supported. libvirt could read that and > - store latest version at vm creation > - pass it around with the vm > - pass it to qemu > > From here, qemu could pass this over the vhost-user channel, > thus making sure it's initialized with the correct > compatible interface. 
> > As version here is an opaque string for libvirt and qemu, > anything can be used - but I suggest either a list > of values defining the interface, e.g. > any_layout=on,max_ring=256 > or a version including the name and vendor of the backend, > e.g. "org.dpdk.v4.5.6". > > Note that typically the list of supported versions can only be > extended, not shrunk. Also, if the host/guest interface > does not change, don't change the current version as > this just creates work for everyone. > > Thoughts? Would this work well for management? dpdk? vpp? > > Thanks! > > -- > MST
[dpdk-dev] [PATCH v2 10/10] net/virtio: fix multiple queue enabling
On Sat, Nov 05, 2016 at 05:41:05PM +0800, Yuanhan Liu wrote: > When queue number shrinks to 1 from X, the following code stops us > sending the multiple queue ctrl message: > > if (nb_queues > 1) { > if (virtio_set_multiple_queues(dev, nb_queues) != 0) > return -EINVAL; > } > > This ends up with still X queues being enabled, which is obviously > wrong. Fix it by removing the check. > > Fixes: 823ad647950a ("virtio: support multiple queues") > > Signed-off-by: Yuanhan Liu This breaks the virtio-user case, where the ctrl queue is not enabled by default. Here is an updated patch that fixes this. --yliu --- From 22502943764a99b1398d40f0110f8ce28323323a Mon Sep 17 00:00:00 2001 From: Yuanhan Liu <yuanhan@linux.intel.com> Date: Sat, 5 Nov 2016 16:53:27 +0800 Subject: [PATCH v3 10/10] net/virtio: fix multiple queue enabling When queue number shrinks to 1 from X, the following code stops us sending the multiple queue ctrl message: if (nb_queues > 1) { if (virtio_set_multiple_queues(dev, nb_queues) != 0) return -EINVAL; } This ends up with still X queues being enabled, which is obviously wrong. Fix it by replacing the check with a check on whether multiqueue is enabled. Fixes: 823ad647950a ("virtio: support multiple queues") Signed-off-by: Yuanhan Liu --- v3: - fix the virtio-user case, where the ctrl queue is disabled by default. Thus virtio_set_multiple_queues fails. 
--- drivers/net/virtio/virtio_ethdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index d70bd00..18da98f 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1472,6 +1472,7 @@ virtio_dev_start(struct rte_eth_dev *dev) uint16_t nb_queues, i; struct virtnet_rx *rxvq; struct virtnet_tx *txvq __rte_unused; + struct virtio_hw *hw = dev->data->dev_private; /* check if lsc interrupt feature is enabled */ if (dev->data->dev_conf.intr_conf.lsc) { @@ -1494,7 +1495,7 @@ virtio_dev_start(struct rte_eth_dev *dev) *vhost backend will have no chance to be waked up */ nb_queues = RTE_MAX(dev->data->nb_rx_queues, dev->data->nb_tx_queues); - if (nb_queues > 1) { + if (hw->max_queue_pairs > 1) { if (virtio_set_multiple_queues(dev, nb_queues) != 0) return -EINVAL; } -- 1.9.0
[dpdk-dev] [PATCH v2 09/10] net/virtio: fix less queues being enabled issue
From the virtio spec's point of view, multiple queues are always enabled/disabled in queue pairs. DPDK somehow allows the case where the Tx and Rx queue numbers differ. Currently, the virtio PMD gets the queue pair number from the nb_rx_queues field, which could be an issue when the Tx queue number > Rx queue number. Say, 2 Tx queues and 1 Rx queue: this ends up with only 1 queue pair being enabled, which is wrong. The fix is straightforward: just pick the bigger number and enable that many queue pairs. Fixes: 823ad647950a ("virtio: support multiple queues") Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 5ec3e0e..d70bd00 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1493,7 +1493,7 @@ virtio_dev_start(struct rte_eth_dev *dev) *Otherwise the tap backend might already stop its queue due to fullness. *vhost backend will have no chance to be waked up */ - nb_queues = dev->data->nb_rx_queues; + nb_queues = RTE_MAX(dev->data->nb_rx_queues, dev->data->nb_tx_queues); if (nb_queues > 1) { if (virtio_set_multiple_queues(dev, nb_queues) != 0) return -EINVAL; @@ -1501,7 +1501,7 @@ virtio_dev_start(struct rte_eth_dev *dev) PMD_INIT_LOG(DEBUG, "nb_queues=%d", nb_queues); - for (i = 0; i < nb_queues; i++) { + for (i = 0; i < dev->data->nb_rx_queues; i++) { rxvq = dev->data->rx_queues[i]; virtqueue_notify(rxvq->vq); } -- 1.9.0
[dpdk-dev] [PATCH v2 08/10] net/virtio: remove started field
The "hw->started" field was introduced to stop touching queues on restart. We never touch queues on restart any more, thus it's safe to remove this flag. Signed-off-by: Yuanhan Liu Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 15 ++- drivers/net/virtio/virtio_pci.h | 1 - 2 files changed, 2 insertions(+), 14 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 15c48b9..5ec3e0e 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -602,7 +602,6 @@ virtio_dev_close(struct rte_eth_dev *dev) if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR); vtpci_reset(hw); - hw->started = 0; virtio_dev_free_mbufs(dev); virtio_free_queues(hw); } @@ -1345,17 +1344,14 @@ static int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev) { struct rte_pci_device *pci_dev; - struct virtio_hw *hw = eth_dev->data->dev_private; PMD_INIT_FUNC_TRACE(); if (rte_eal_process_type() == RTE_PROC_SECONDARY) return -EPERM; - if (hw->started == 1) { - virtio_dev_stop(eth_dev); - virtio_dev_close(eth_dev); - } + virtio_dev_stop(eth_dev); + virtio_dev_close(eth_dev); pci_dev = eth_dev->pci_dev; eth_dev->dev_ops = NULL; @@ -1474,7 +1470,6 @@ static int virtio_dev_start(struct rte_eth_dev *dev) { uint16_t nb_queues, i; - struct virtio_hw *hw = dev->data->dev_private; struct virtnet_rx *rxvq; struct virtnet_tx *txvq __rte_unused; @@ -1494,12 +1489,6 @@ virtio_dev_start(struct rte_eth_dev *dev) /* Initialize Link state */ virtio_dev_link_update(dev, 0); - /* On restart after stop do not touch queues */ - if (hw->started) - return 0; - - hw->started = 1; - /*Notify the backend *Otherwise the tap backend might already stop its queue due to fullness. 
*vhost backend will have no chance to be waked up diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index f63f76c..de271bf 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -252,7 +252,6 @@ struct virtio_hw { uint16_t vtnet_hdr_size; uint8_t vlan_strip; uint8_t use_msix; - uint8_t started; uint8_t modern; uint8_t use_simple_rxtx; uint8_t mac_addr[ETHER_ADDR_LEN]; -- 1.9.0
[dpdk-dev] [PATCH v2 07/10] net/virtio: complete init stage at the right place
Invoking vtpci_reinit_complete() at port start stage doesn't make any sense; instead, it should be done at the end of the dev init stage. So move it there. Signed-off-by: Yuanhan Liu Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index a3e2aa9..15c48b9 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1277,6 +1277,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features) ret = virtio_alloc_queues(eth_dev); if (ret < 0) return ret; + vtpci_reinit_complete(hw); if (pci_dev) PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x", @@ -1497,8 +1498,6 @@ virtio_dev_start(struct rte_eth_dev *dev) if (hw->started) return 0; - vtpci_reinit_complete(hw); - hw->started = 1; /*Notify the backend -- 1.9.0
[dpdk-dev] [PATCH v2 06/10] net/virtio: move queue configure code to proper place
The only remaining code in virtio_dev_rxtx_start() actually does queue configure/setup work, so move it to the corresponding queue_setup callback. Once that is done, virtio_dev_rxtx_start() becomes an empty function and is removed. Signed-off-by: Yuanhan Liu Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 2 - drivers/net/virtio/virtio_ethdev.h | 2 - drivers/net/virtio/virtio_rxtx.c | 186 - 3 files changed, 78 insertions(+), 112 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 82dcc97..a3e2aa9 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1497,8 +1497,6 @@ virtio_dev_start(struct rte_eth_dev *dev) if (hw->started) return 0; - /* Do final configuration before rx/tx engine starts */ - virtio_dev_rxtx_start(dev); vtpci_reinit_complete(hw); hw->started = 1; diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 8a3fa6d..27d9a19 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -78,8 +78,6 @@ void virtio_dev_cq_start(struct rte_eth_dev *dev); /* * RX/TX function prototypes */ -void virtio_dev_rxtx_start(struct rte_eth_dev *dev); - int virtio_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id, uint16_t nb_rx_desc, unsigned int socket_id, const struct rte_eth_rxconf *rx_conf, diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 24129d6..22d97a4 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -388,112 +388,6 @@ virtio_dev_cq_start(struct rte_eth_dev *dev) } } -void -virtio_dev_rxtx_start(struct rte_eth_dev *dev) -{ - /* -* Start receive and transmit vrings -* -Setup vring structure for all queues -* -Initialize descriptor for the rx vring -* -Allocate blank mbufs for the each rx descriptor -* -*/ - uint16_t i; - uint16_t desc_idx; - struct virtio_hw *hw = dev->data->dev_private; 
- - PMD_INIT_FUNC_TRACE(); - - /* Start rx vring. */ - for (i = 0; i < dev->data->nb_rx_queues; i++) { - struct virtnet_rx *rxvq = dev->data->rx_queues[i]; - struct virtqueue *vq = rxvq->vq; - int error, nbufs; - struct rte_mbuf *m; - - if (rxvq->mpool == NULL) { - rte_exit(EXIT_FAILURE, - "Cannot allocate mbufs for rx virtqueue"); - } - - /* Allocate blank mbufs for the each rx descriptor */ - nbufs = 0; - error = ENOSPC; - - if (hw->use_simple_rxtx) { - for (desc_idx = 0; desc_idx < vq->vq_nentries; -desc_idx++) { - vq->vq_ring.avail->ring[desc_idx] = desc_idx; - vq->vq_ring.desc[desc_idx].flags = - VRING_DESC_F_WRITE; - } - } - - memset(&rxvq->fake_mbuf, 0, sizeof(rxvq->fake_mbuf)); - for (desc_idx = 0; desc_idx < RTE_PMD_VIRTIO_RX_MAX_BURST; -desc_idx++) { - vq->sw_ring[vq->vq_nentries + desc_idx] = - &rxvq->fake_mbuf; - } - - while (!virtqueue_full(vq)) { - m = rte_mbuf_raw_alloc(rxvq->mpool); - if (m == NULL) - break; - - /** - * Enqueue allocated buffers* - ***/ - if (hw->use_simple_rxtx) - error = virtqueue_enqueue_recv_refill_simple(vq, m); - else - error = virtqueue_enqueue_recv_refill(vq, m); - - if (error) { - rte_pktmbuf_free(m); - break; - } - nbufs++; - } - - vq_update_avail_idx(vq); - - PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs); - - VIRTQUEUE_DUMP(vq); - } - - /* Start tx vring. */ - for (i = 0; i < dev->data->nb_tx_queues; i++) { - struct virtnet_tx *txvq = dev->data->tx_queues[i]; - struct virtqueue *vq = txvq->vq; - - if (hw->use_simple_rxtx) { -
[dpdk-dev] [PATCH v2 05/10] net/virtio: initiate vring at init stage
virtio_dev_vring_start() actually does the vring initialization job, which should be done at the dev init stage, as stated in great detail in the former commit. So move it there, and rename it to virtio_init_vring(). Signed-off-by: Yuanhan Liu Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 32 +++- drivers/net/virtio/virtio_rxtx.c | 32 2 files changed, 31 insertions(+), 33 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 67a175d..82dcc97 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -308,6 +308,35 @@ virtio_get_nr_vq(struct virtio_hw *hw) return nr_vq; } +static void +virtio_init_vring(struct virtqueue *vq) +{ + int size = vq->vq_nentries; + struct vring *vr = &vq->vq_ring; + uint8_t *ring_mem = vq->vq_ring_virt_mem; + + PMD_INIT_FUNC_TRACE(); + + /* +* Reinitialise since virtio port might have been stopped and restarted +*/ + memset(ring_mem, 0, vq->vq_ring_size); + vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN); + vq->vq_used_cons_idx = 0; + vq->vq_desc_head_idx = 0; + vq->vq_avail_idx = 0; + vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1); + vq->vq_free_cnt = vq->vq_nentries; + memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries); + + vring_desc_init(vr->desc, size); + + /* +* Disable device(host) interrupting guest +*/ + virtqueue_disable_intr(vq); +} + static int virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) { @@ -371,7 +400,6 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) vq->hw = hw; vq->vq_queue_index = vtpci_queue_idx; vq->vq_nentries = vq_size; - vq->vq_free_cnt = vq_size; /* * Reserve a memzone for vring elements @@ -402,6 +430,8 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%" PRIx64, (uint64_t)(uintptr_t)mz->addr); + virtio_init_vring(vq); + if (sz_hdr_mz) { 
snprintf(vq_hdr_name, sizeof(vq_hdr_name), "port%d_vq%d_hdr", dev->data->port_id, vtpci_queue_idx); diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 6e7ff27..24129d6 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -378,42 +378,12 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, vq_update_avail_ring(vq, head_idx); } -static void -virtio_dev_vring_start(struct virtqueue *vq) -{ - int size = vq->vq_nentries; - struct vring *vr = &vq->vq_ring; - uint8_t *ring_mem = vq->vq_ring_virt_mem; - - PMD_INIT_FUNC_TRACE(); - - /* -* Reinitialise since virtio port might have been stopped and restarted -*/ - memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size); - vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN); - vq->vq_used_cons_idx = 0; - vq->vq_desc_head_idx = 0; - vq->vq_avail_idx = 0; - vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1); - vq->vq_free_cnt = vq->vq_nentries; - memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries); - - vring_desc_init(vr->desc, size); - - /* -* Disable device(host) interrupting guest -*/ - virtqueue_disable_intr(vq); -} - void virtio_dev_cq_start(struct rte_eth_dev *dev) { struct virtio_hw *hw = dev->data->dev_private; if (hw->cvq && hw->cvq->vq) { - virtio_dev_vring_start(hw->cvq->vq); VIRTQUEUE_DUMP((struct virtqueue *)hw->cvq->vq); } } @@ -441,7 +411,6 @@ virtio_dev_rxtx_start(struct rte_eth_dev *dev) int error, nbufs; struct rte_mbuf *m; - virtio_dev_vring_start(vq); if (rxvq->mpool == NULL) { rte_exit(EXIT_FAILURE, "Cannot allocate mbufs for rx virtqueue"); @@ -499,7 +468,6 @@ virtio_dev_rxtx_start(struct rte_eth_dev *dev) struct virtnet_tx *txvq = dev->data->tx_queues[i]; struct virtqueue *vq = txvq->vq; - virtio_dev_vring_start(vq); if (hw->use_simple_rxtx) { uint16_t mid_idx = vq->vq_nentries >> 1; -- 1.9.0
[dpdk-dev] [PATCH v2 04/10] net/virtio: allocate queue at init stage
Queue allocation should be done once, since the queue-related info (such as the vring address) is only communicated to the vhost-user backend once, unless the virtio device is reset. That means, if you allocate queues again after the vhost-user negotiation, the vhost-user backend will not be informed any more, leading to a state where the vring info mismatches between the virtio PMD driver and the vhost backend: the driver switches to the new address that has just been allocated, while the vhost backend still sticks to the old address that was assigned in the init stage. Unfortunately, that is exactly how the virtio driver is coded so far: queue allocation is done at queue_setup stage (when rte_eth_tx/rx_queue_setup is invoked). This is wrong, because queue_setup can be invoked several times. For example, $ start_testpmd.sh ... --txq=1 --rxq=1 ... > port stop 0 > port config all txq 1 # just trigger the queue_setup callback again > port config all rxq 1 > port start 0 The right way is to allocate the queues in the init stage, so that the vring info stays persistent with the vhost-user backend. Besides that, we should allocate the max queue pairs the device supports, not the number of queue pairs configured first, to make the following case work. $ start_testpmd.sh ... --txq=1 --rxq=1 ... > port stop 0 > port config all txq 2 > port config all rxq 2 > port start 0 Since the allocation is moved to the init stage, the free should also be moved from rx/tx_queue_release to the dev close stage. That leaves rx/tx_queue_release() with nothing to do; it becomes an empty implementation. 
Signed-off-by: Yuanhan Liu --- v2: - free queues on dev close --- drivers/net/virtio/virtio_ethdev.c | 162 + drivers/net/virtio/virtio_ethdev.h | 14 drivers/net/virtio/virtio_pci.h| 2 + drivers/net/virtio/virtio_rxtx.c | 83 +-- drivers/net/virtio/virtqueue.h | 1 - 5 files changed, 111 insertions(+), 151 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 5a2c14b..67a175d 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -280,28 +280,36 @@ virtio_set_multiple_queues(struct rte_eth_dev *dev, uint16_t nb_queues) return 0; } -void -virtio_dev_queue_release(struct virtqueue *vq) +static void +virtio_dev_queue_release(void *queue __rte_unused) { - struct virtio_hw *hw; + /* do nothing */ +} - if (vq) { - hw = vq->hw; - if (vq->configured) - hw->vtpci_ops->del_queue(hw, vq); +static int +virtio_get_queue_type(struct virtio_hw *hw, uint16_t vtpci_queue_idx) +{ + if (vtpci_queue_idx == hw->max_queue_pairs * 2) + return VTNET_CQ; + else if (vtpci_queue_idx % 2 == 0) + return VTNET_RQ; + else + return VTNET_TQ; +} - rte_free(vq->sw_ring); - rte_free(vq); - } +static uint16_t +virtio_get_nr_vq(struct virtio_hw *hw) +{ + uint16_t nr_vq = hw->max_queue_pairs * 2; + + if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) + nr_vq += 1; + + return nr_vq; } -int virtio_dev_queue_setup(struct rte_eth_dev *dev, - int queue_type, - uint16_t queue_idx, - uint16_t vtpci_queue_idx, - uint16_t nb_desc, - unsigned int socket_id, - void **pvq) +static int +virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) { char vq_name[VIRTQUEUE_MAX_NAME_SZ]; char vq_hdr_name[VIRTQUEUE_MAX_NAME_SZ]; @@ -314,6 +322,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, struct virtqueue *vq; size_t sz_hdr_mz = 0; void *sw_ring = NULL; + int queue_type = virtio_get_queue_type(hw, vtpci_queue_idx); int ret; PMD_INIT_LOG(DEBUG, "setting up queue: %u", vtpci_queue_idx); @@ -351,18 +360,18 @@ int 
virtio_dev_queue_setup(struct rte_eth_dev *dev, sz_hdr_mz = PAGE_SIZE; } - vq = rte_zmalloc_socket(vq_name, size, RTE_CACHE_LINE_SIZE, socket_id); + vq = rte_zmalloc_socket(vq_name, size, RTE_CACHE_LINE_SIZE, + SOCKET_ID_ANY); if (vq == NULL) { PMD_INIT_LOG(ERR, "can not allocate vq"); return -ENOMEM; } + hw->vqs[vtpci_queue_idx] = vq; + vq->hw = hw; vq->vq_queue_index = vtpci_queue_idx; vq->vq_nentries = vq_size; - - if (nb_desc == 0 || nb_desc > vq_size) - nb_desc = vq_size; - vq->vq_free_cnt = nb_desc; + vq->vq_free_cnt = vq_size; /* * Reserve a memzone for vring elements @@ -372,7 +381,8 @@ int virti
[dpdk-dev] [PATCH v2 03/10] net/virtio: simplify queue allocation
Let rxq/txq/cq be the union field of the virtqueue struct. This simplifies the vq allocation a bit: we don't need to calculate the vq_size any more based on the queue type. Signed-off-by: Yuanhan Liu Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 18 +++--- drivers/net/virtio/virtqueue.h | 7 +++ 2 files changed, 14 insertions(+), 11 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index d082df5..5a2c14b 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -312,7 +312,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, struct virtnet_tx *txvq = NULL; struct virtnet_ctl *cvq = NULL; struct virtqueue *vq; - size_t sz_vq, sz_q = 0, sz_hdr_mz = 0; + size_t sz_hdr_mz = 0; void *sw_ring = NULL; int ret; @@ -337,25 +337,21 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, snprintf(vq_name, sizeof(vq_name), "port%d_vq%d", dev->data->port_id, vtpci_queue_idx); - sz_vq = RTE_ALIGN_CEIL(sizeof(*vq) + + size = RTE_ALIGN_CEIL(sizeof(*vq) + vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE); - if (queue_type == VTNET_RQ) { - sz_q = sz_vq + sizeof(*rxvq); - } else if (queue_type == VTNET_TQ) { - sz_q = sz_vq + sizeof(*txvq); + if (queue_type == VTNET_TQ) { /* * For each xmit packet, allocate a virtio_net_hdr * and indirect ring elements */ sz_hdr_mz = vq_size * sizeof(struct virtio_tx_region); } else if (queue_type == VTNET_CQ) { - sz_q = sz_vq + sizeof(*cvq); /* Allocate a page for control vq command, data and status */ sz_hdr_mz = PAGE_SIZE; } - vq = rte_zmalloc_socket(vq_name, sz_q, RTE_CACHE_LINE_SIZE, socket_id); + vq = rte_zmalloc_socket(vq_name, size, RTE_CACHE_LINE_SIZE, socket_id); if (vq == NULL) { PMD_INIT_LOG(ERR, "can not allocate vq"); return -ENOMEM; @@ -425,14 +421,14 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, } vq->sw_ring = sw_ring; - rxvq = (struct virtnet_rx *)RTE_PTR_ADD(vq, sz_vq); + rxvq = &vq->rxq; rxvq->vq = vq; 
rxvq->port_id = dev->data->port_id; rxvq->queue_id = queue_idx; rxvq->mz = mz; *pvq = rxvq; } else if (queue_type == VTNET_TQ) { - txvq = (struct virtnet_tx *)RTE_PTR_ADD(vq, sz_vq); + txvq = &vq->txq; txvq->vq = vq; txvq->port_id = dev->data->port_id; txvq->queue_id = queue_idx; @@ -442,7 +438,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, *pvq = txvq; } else if (queue_type == VTNET_CQ) { - cvq = (struct virtnet_ctl *)RTE_PTR_ADD(vq, sz_vq); + cvq = &vq->cq; cvq->vq = vq; cvq->mz = mz; cvq->virtio_net_hdr_mz = hdr_mz; diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h index ef0027b..bbeb2f2 100644 --- a/drivers/net/virtio/virtqueue.h +++ b/drivers/net/virtio/virtqueue.h @@ -44,6 +44,7 @@ #include "virtio_pci.h" #include "virtio_ring.h" #include "virtio_logs.h" +#include "virtio_rxtx.h" struct rte_mbuf; @@ -191,6 +192,12 @@ struct virtqueue { void *vq_ring_virt_mem; /**< linear address of vring*/ unsigned int vq_ring_size; + union { + struct virtnet_rx rxq; + struct virtnet_tx txq; + struct virtnet_ctl cq; + }; + phys_addr_t vq_ring_mem; /**< physical address of vring, * or virtual address for virtio_user. */ -- 1.9.0
[dpdk-dev] [PATCH v2 01/10] net/virtio: revert fix restart
This reverts commit 9a0615af7746 ("virtio: fix restart"); a conflict is manually addressed. Kyle reported an issue with the above commit, qemu-kvm: Guest moved used index from 5 to 1 with the following steps, 1) Start my virtio interfaces 2) Send some traffic into/out of the interfaces 3) Stop the interfaces 4) Start the interfaces 5) Send some more traffic And here are some quotes from Kyle's analysis, Prior to the patch, if an interface were stopped then started, without restarting the application, the queues would be left as-is, because hw->started would be set to 1. Now, calling stop sets hw->started to 0, which means the next call to start will "touch the queues". This is the unintended side-effect that causes the problem. We should not touch the queues once the init is done, otherwise, the vring state of the virtio PMD driver and vhost-user would be inconsistent, leading to issues like the one above. Thus this patch is reverted. Fixes: 9a0615af7746 ("virtio: fix restart") Cc: Jianfeng Tan Cc: Reported-by: Kyle Larose Signed-off-by: Yuanhan Liu Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index b388134..5815875 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -550,9 +550,6 @@ virtio_dev_close(struct rte_eth_dev *dev) PMD_INIT_LOG(DEBUG, "virtio_dev_close"); - if (hw->started == 1) - virtio_dev_stop(dev); - if (hw->cvq) virtio_dev_queue_release(hw->cvq->vq); @@ -560,6 +557,7 @@ virtio_dev_close(struct rte_eth_dev *dev) if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR); vtpci_reset(hw); + hw->started = 0; virtio_dev_free_mbufs(dev); virtio_free_queues(dev); } @@ -1296,15 +1294,17 @@ static int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev) { struct rte_pci_device *pci_dev; + struct virtio_hw *hw = eth_dev->data->dev_private; 
PMD_INIT_FUNC_TRACE(); if (rte_eal_process_type() == RTE_PROC_SECONDARY) return -EPERM; - /* Close it anyway since there's no way to know if closed */ - virtio_dev_close(eth_dev); - + if (hw->started == 1) { + virtio_dev_stop(eth_dev); + virtio_dev_close(eth_dev); + } pci_dev = eth_dev->pci_dev; eth_dev->dev_ops = NULL; @@ -1543,12 +1543,9 @@ static void virtio_dev_stop(struct rte_eth_dev *dev) { struct rte_eth_link link; - struct virtio_hw *hw = dev->data->dev_private; PMD_INIT_LOG(DEBUG, "stop"); - hw->started = 0; - if (dev->data->dev_conf.intr_conf.lsc) rte_intr_disable(&dev->pci_dev->intr_handle); -- 1.9.0
[dpdk-dev] [PATCH v2 00/10] net/virtio: fix queue reconfigure issue
This patchset fixes a few issues related to virtio queue reconfiguration: increasing or shrinking the queue number. The major issue, and the reasoning behind the fix, is described at length in patch 4 "net/virtio: allocate queue at init stage". Those bugs cannot be fixed with just a few lines of code, because the current driver init logic is quite wrong, so I need to change quite a few places to make it right. Meanwhile, I have done my best to keep the changes as minimal as possible, so that fewer changes risk breaking something else; it also makes review easier. v2: - fix two more minor issues regarding queue enabling; see patches 9 and 10. - refined the commit log a bit. Thanks. --yliu --- Yuanhan Liu (10): net/virtio: revert fix restart net/virtio: simplify queue memzone name net/virtio: simplify queue allocation net/virtio: allocate queue at init stage net/virtio: initiate vring at init stage net/virtio: move queue configure code to proper place net/virtio: complete init stage at the right place net/virtio: remove started field net/virtio: fix less queues being enabled issue net/virtio: fix multiple queue enabling drivers/net/virtio/virtio_ethdev.c | 248 +-- drivers/net/virtio/virtio_ethdev.h | 16 -- drivers/net/virtio/virtio_pci.h| 3 +- drivers/net/virtio/virtio_rxtx.c | 291 - drivers/net/virtio/virtqueue.h | 7 + 5 files changed, 237 insertions(+), 328 deletions(-) -- 1.9.0
[dpdk-dev] [PATCH 4/8] net/virtio: allocate queue at init stage
On Fri, Nov 04, 2016 at 08:30:00PM +, Kevin Traynor wrote: > On 11/04/2016 03:21 PM, Kevin Traynor wrote: > > On 11/03/2016 04:09 PM, Yuanhan Liu wrote: > >> Queue allocation should be done once, since the queue related info (such > >> as vring addreess) will only be informed to the vhost-user backend once > >> without virtio device reset. > >> > >> That means, if you allocate queues again after the vhost-user negotiation, > >> the vhost-user backend will not be informed any more. Leading to a state > >> that the vring info mismatches between virtio PMD driver and vhost-backend: > >> the driver switches to the new address has just been allocated, while the > >> vhost-backend still sticks to the old address has been assigned in the init > >> stage. > >> > >> Unfortunately, that is exactly how the virtio driver is coded so far: queue > >> allocation is done at queue_setup stage (when rte_eth_tx/rx_queue_setup is > >> invoked). This is wrong, because queue_setup can be invoked several times. > >> For example, > >> > >> $ start_testpmd.sh ... --txq=1 --rxq=1 ... > >> > port stop 0 > >> > port config all txq 1 # just trigger the queue_setup callback again > >> > port config all rxq 1 > >> > port start 0 > >> > >> The right way to do is allocate the queues in the init stage, so that the > >> vring info could be persistent with the vhost-user backend. > >> > >> Besides that, we should allocate max_queue pairs the device supports, but > >> not nr queue pairs firstly configured, to make following case work. > >> > >> $ start_testpmd.sh ... --txq=1 --rxq=1 ... > >> > port stop 0 > >> > port config all txq 2 > >> > port config all rxq 2 > >> > port start 0 > > > > hi Yuanhan, firstly - thanks for this patchset. It is certainly needed > > to fix the silent failure after increase num q's. > > > > I tried a few tests and I'm seeing an issue. 
I can stop the port, > > increase the number of queues and traffic is ok, but if I try to > > decrease the number of queues it hangs on port start. I'm running head > > of the master with your patches in the guest and 16.07 in the host. > > > > $ testpmd -c 0x5f -n 4 --socket-mem 1024 -- --burst=64 -i > > --disable-hw-vlan --rxq=2 --txq=2 --rxd=256 --txd=256 --forward-mode=io > >> port stop all > >> port config all rxq 1 > >> port config all txq 1 > >> port start all > > Configuring Port 0 (socket 0) > > (hang here) > > > > I've tested a few different scenarios and anytime the queues are > > decreased from the previous number the hang occurs. > > > > I can debug further but wanted to report early as maybe issue is an > > obvious one? Kevin, thanks for testing! Hmm, it's a case I missed: I was thinking/testing more about increasing (but not shrinking) the queue size :( > virtio_dev_start() is getting stuck as soon as it needs to send a That's because the connection is closed (for a bad reason, see details below). You could figure it out quickly from the vhost log: testpmd> VHOST_CONFIG: read message VHOST_USER_GET_VRING_BASE PMD: Connection closed Those messages showed up immediately after you executed "port start all". It's actually triggered by the "rx_queue_release" callback, which in turn invokes the "del_queue" virtio-pci method. QEMU then resets the device once it gets that message. Thus, you saw the above log. Since we now allocate queues once, it no longer makes sense to free those queues and invoke the "del_queue" method in the rx/tx_queue_release callbacks. The queue_release callback will be invoked when we shrink the queue size. With that change, this case works like a charm. I'm about to catch a train soon, but I will try to re-post v2 today, with another minor fix I noticed while checking this issue: we should also send the VIRTIO_NET_CTRL_MQ message when the queue number shrinks from 2 to 1. --yliu
[dpdk-dev] [PATCH v2] net/virtio: cache Rx/Tx offload ability check
It's not a good idea to do the check of whether Rx/Tx offload is enabled at the data path. Instead, we could do the check at init stage and store the result, so that we could avoid the check again and again at the critical datapath. Cc: Olivier Matz Signed-off-by: Yuanhan Liu --- v2: - rebase on top of the bug fix patches - define rx/tx_offload as uint8_t instead of int drivers/net/virtio/virtio_ethdev.c | 19 +++ drivers/net/virtio/virtio_pci.h| 2 ++ drivers/net/virtio/virtio_rxtx.c | 31 +-- 3 files changed, 26 insertions(+), 26 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 1505f67..2adae58 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1188,6 +1188,22 @@ rx_func_get(struct rte_eth_dev *eth_dev) eth_dev->rx_pkt_burst = _recv_pkts; } +static inline int +rx_offload_enabled(struct virtio_hw *hw) +{ + return vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM) || + vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4) || + vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO6); +} + +static inline int +tx_offload_enabled(struct virtio_hw *hw) +{ + return vtpci_with_feature(hw, VIRTIO_NET_F_CSUM) || + vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO4) || + vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO6); +} + /* reset device and renegotiate features if needed */ static int virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features) @@ -1209,6 +1225,9 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features) if (virtio_negotiate_features(hw, req_features) < 0) return -1; + hw->tx_offload = tx_offload_enabled(hw); + hw->rx_offload = rx_offload_enabled(hw); + /* If host does not support status then disable LSC */ if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC; diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index de271bf..7d1dd9b 100644 --- a/drivers/net/virtio/virtio_pci.h +++ 
b/drivers/net/virtio/virtio_pci.h @@ -254,6 +254,8 @@ struct virtio_hw { uint8_t use_msix; uint8_t modern; uint8_t use_simple_rxtx; + uint8_t tx_offload; + uint8_t rx_offload; uint8_t mac_addr[ETHER_ADDR_LEN]; uint32_tnotify_off_multiplier; uint8_t *isr; diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 4cb2ce7..7571191 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -250,14 +250,6 @@ virtio_tso_fix_cksum(struct rte_mbuf *m) } } -static inline int -tx_offload_enabled(struct virtio_hw *hw) -{ - return vtpci_with_feature(hw, VIRTIO_NET_F_CSUM) || - vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO4) || - vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO6); -} - static inline void virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, uint16_t needed, int use_indirect, int can_push) @@ -270,9 +262,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, uint16_t head_idx, idx; uint16_t head_size = vq->hw->vtnet_hdr_size; struct virtio_net_hdr *hdr; - int offload; - offload = tx_offload_enabled(vq->hw); head_idx = vq->vq_desc_head_idx; idx = head_idx; dxp = >vq_descx[idx]; @@ -286,7 +276,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, hdr = (struct virtio_net_hdr *) rte_pktmbuf_prepend(cookie, head_size); /* if offload disabled, it is not zeroed below, do it now */ - if (offload == 0) + if (vq->hw->tx_offload == 0) memset(hdr, 0, head_size); } else if (use_indirect) { /* setup tx ring slot to point to indirect @@ -318,7 +308,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, } /* Checksum Offload / TSO */ - if (offload) { + if (vq->hw->tx_offload) { if (cookie->ol_flags & PKT_TX_TCP_SEG) cookie->ol_flags |= PKT_TX_TCP_CKSUM; @@ -735,14 +725,6 @@ virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr) return 0; } -static inline int -rx_offload_enabled(struct virtio_hw *hw) -{ - return 
vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM) || - vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4) || - vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO6); -} - #define VIRTIO_MBUF_BURST_SZ 64 #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring
[dpdk-dev] [PATCH 1/8] net/virtio: revert "virtio: fix restart"
On Fri, Nov 04, 2016 at 09:09:22AM +0100, Maxime Coquelin wrote: > > > On 11/04/2016 03:00 AM, Yuanhan Liu wrote: > >On Thu, Nov 03, 2016 at 09:36:52PM +0100, Maxime Coquelin wrote: > >>Hi Yuanhan, > >> > >>On 11/03/2016 05:09 PM, Yuanhan Liu wrote: > >>>This reverts commit 9a0615af7746 ("virtio: fix restart"); conflict is > >>>manually addressed. > >>> > >>>Kyle reported an issue with above commit > >>> > >>> qemu-kvm: Guest moved used index from 5 to 1 > >>> > >>>with following steps, > >>> > >>> 1) Start my virtio interfaces > >>> 2) Send some traffic into/out of the interfaces > >>> 3) Stop the interfaces > >>> 4) Start the interfaces > >>> 5) Send some more traffic > >>> > >>>And here are some quotes from Kyle's analysis, > >>> > >>> Prior to the patch, if an interface were stopped then started, without > >>> restarting the application, the queues would be left as-is, because > >>> hw->started would be set to 1. Now, calling stop sets hw->started to 0, > >>> which means the next call to start will "touch the queues". This is the > >>> unintended side-effect that causes the problem. > >> > >>Maybe a good idea to explain what is the problem the revert aims to fix. > > > >It aims to fix the issue, by "not touching the queues" on restart. > > > >>It does not seem to be clearly stated in the commit message. > > > >I was thinking the quote from Kyle is enough. How about following supplement: > > > >We should not touch the queues once the init is done, otherwise, the > >vring state of virtio PMD driver and vhost-user would be inconsistent, > >leading some issue like above. > > > >Thus this patch is reverted. > > > >Better now? > Yes, this is much clearer from my PoV. Fixed, and series applied to dpdk-next-virtio. Thanks for the review! --yliu
[dpdk-dev] [PATCH 3/3] vhost: update comments
On Fri, Nov 04, 2016 at 01:28:45PM +, Mcnamara, John wrote: > > > > -Original Message- > > From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com] > > Sent: Wednesday, November 2, 2016 3:15 AM > > To: dev at dpdk.org > > Cc: Mcnamara, John ; Yuanhan Liu > > > > Subject: [PATCH 3/3] vhost: update comments > > > > vhost-cuse is removed, update corresponding comments that are still > > referencing it. > > > > Signed-off-by: Yuanhan Liu > > > > ... > > > > /* > > - * Backend-specific cleanup. Defined by vhost-cuse and vhost-user. > > + * Backend-specific cleanup. > > + * > > + * TODO: fix it; we have one backend now > > */ > > void vhost_backend_cleanup(struct virtio_net *dev); > > In general it is not a good idea to leave TODOs/Fixmes around unless they have a better explanation of why they are there. Agreed. In this case, if you look at the deleted line, you will find we had such a function implemented in both vhost-user and vhost-cuse. Now we have only one. So we could just rename it to vhost_user_cleanup(), or even remove it completely with some small refactoring. Obviously, that needs to be done in another patch; thus the TODO was added. --yliu > > Apart from that: > > > Acked-by: John McNamara >
[dpdk-dev] [PATCH RESEND v2] net/i40e: fix floating VEB issue
From: Beilei Xing
Turning off S-TAG identification will impact floating VEB: VFs can't communicate with each other. This patch fixes the issue by checking whether floating VEB is enabled; S-TAG identification will be turned off only when floating VEB is disabled. Fixes: 4d61120d5ce7 ("net/i40e: fix dropping packets with ethertype 0x88A8") Signed-off-by: Beilei Xing Acked-by: Jingjing Wu --- v2 changes: * Modify the limitation description in i40e.rst :: Seems that Beilei failed to send this patch out. Since I have sent a few :: patches out successfully, I'm sending it for her. doc/guides/nics/i40e.rst | 6 ++ drivers/net/i40e/i40e_ethdev.c | 12 +++- 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst index c0163fc..5780268 100644 --- a/doc/guides/nics/i40e.rst +++ b/doc/guides/nics/i40e.rst @@ -453,3 +453,9 @@ To work around this issue, ``ethtool -s autoneg on`` should be set first and then the link can be brought up through ``ifconfig up``. NOTE: requires Linux kernel i40e driver version >= 1.4.X + +Receive packets with Ethertype 0x88A8 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Due to the FW limitation, PF can receive packets with Ethertype 0x88A8 +only when floating VEB is disabled.
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index bb81b15..631a93f 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -1121,11 +1121,13 @@ eth_i40e_dev_init(struct rte_eth_dev *dev) /* Disable double vlan by default */ i40e_vsi_config_double_vlan(vsi, FALSE); - /* Disable S-TAG identification by default */ - ret = I40E_READ_REG(hw, I40E_PRT_L2TAGSEN); - if (ret & I40E_L2_TAGS_S_TAG_MASK) { - ret &= ~I40E_L2_TAGS_S_TAG_MASK; - I40E_WRITE_REG(hw, I40E_PRT_L2TAGSEN, ret); + /* Disable S-TAG identification when floating_veb is disabled */ + if (!pf->floating_veb) { + ret = I40E_READ_REG(hw, I40E_PRT_L2TAGSEN); + if (ret & I40E_L2_TAGS_S_TAG_MASK) { + ret &= ~I40E_L2_TAGS_S_TAG_MASK; + I40E_WRITE_REG(hw, I40E_PRT_L2TAGSEN, ret); + } } if (!vsi->max_macaddrs) -- 2.5.0
[dpdk-dev] [PATCH RESEND v2] net/i40e: fix floating VEB issue
On Fri, Nov 04, 2016 at 05:33:20PM +0800, Yuanhan Liu wrote: > Turning off S-TAG identification will impact floating VEB, > VFs can't communicate with each other. > This patch fixes this issue by judging whether floating > VEB is enabled, S-TAG identification will be turned off > only when floating VEB is disabled. > > Fixes: 4d61120d5ce7 ("net/i40e: fix dropping packets with ethertype 0x88A8") > > Signed-off-by: Beilei Xing > Acked-by: Jingjing Wu > --- > v2 changes: > * Modify the limitation description in i40e.rst > > Seems that Beilei failed to send this patches out; while I have sent out few > patches out successfully. I'm sending it for her. Please ignore this patch. The original patch was in a wrong format, which turned the authorship over to me. I will resend it soon. --yliu
[dpdk-dev] [PATCH RESEND v2] net/i40e: fix floating VEB issue
Turning off S-TAG identification will impact floating VEB: VFs can't communicate with each other. This patch fixes the issue by checking whether floating VEB is enabled; S-TAG identification will be turned off only when floating VEB is disabled. Fixes: 4d61120d5ce7 ("net/i40e: fix dropping packets with ethertype 0x88A8") Signed-off-by: Beilei Xing Acked-by: Jingjing Wu --- v2 changes: * Modify the limitation description in i40e.rst Seems that Beilei failed to send this patch out; since I have sent a few patches out successfully, I'm sending it for her. doc/guides/nics/i40e.rst | 6 ++ drivers/net/i40e/i40e_ethdev.c | 12 +++- 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst index c0163fc..5780268 100644 --- a/doc/guides/nics/i40e.rst +++ b/doc/guides/nics/i40e.rst @@ -453,3 +453,9 @@ To work around this issue, ``ethtool -s autoneg on`` should be set first and then the link can be brought up through ``ifconfig up``. NOTE: requires Linux kernel i40e driver version >= 1.4.X + +Receive packets with Ethertype 0x88A8 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Due to the FW limitation, PF can receive packets with Ethertype 0x88A8 +only when floating VEB is disabled.
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index bb81b15..631a93f 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -1121,11 +1121,13 @@ eth_i40e_dev_init(struct rte_eth_dev *dev) /* Disable double vlan by default */ i40e_vsi_config_double_vlan(vsi, FALSE); - /* Disable S-TAG identification by default */ - ret = I40E_READ_REG(hw, I40E_PRT_L2TAGSEN); - if (ret & I40E_L2_TAGS_S_TAG_MASK) { - ret &= ~I40E_L2_TAGS_S_TAG_MASK; - I40E_WRITE_REG(hw, I40E_PRT_L2TAGSEN, ret); + /* Disable S-TAG identification when floating_veb is disabled */ + if (!pf->floating_veb) { + ret = I40E_READ_REG(hw, I40E_PRT_L2TAGSEN); + if (ret & I40E_L2_TAGS_S_TAG_MASK) { + ret &= ~I40E_L2_TAGS_S_TAG_MASK; + I40E_WRITE_REG(hw, I40E_PRT_L2TAGSEN, ret); + } } if (!vsi->max_macaddrs) -- 2.5.0
[dpdk-dev] [PATCH] net/virtio: cache Rx/Tx offload enabled check
It's not a good idea to do Rx/Tx offload enabled check at the data path. Instead, we could do the check at init stage and store the result, so that we could avoid the check again and again at the critical datapath. Cc: Olivier Matz Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 19 +++ drivers/net/virtio/virtio_pci.h| 2 ++ drivers/net/virtio/virtio_rxtx.c | 30 -- 3 files changed, 25 insertions(+), 26 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index b388134..708c0e0 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1141,6 +1141,22 @@ rx_func_get(struct rte_eth_dev *eth_dev) eth_dev->rx_pkt_burst = _recv_pkts; } +static inline int +rx_offload_enabled(struct virtio_hw *hw) +{ + return vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM) || + vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4) || + vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO6); +} + +static inline int +tx_offload_enabled(struct virtio_hw *hw) +{ + return vtpci_with_feature(hw, VIRTIO_NET_F_CSUM) || + vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO4) || + vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO6); +} + /* reset device and renegotiate features if needed */ static int virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features) @@ -1161,6 +1177,9 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features) if (virtio_negotiate_features(hw, req_features) < 0) return -1; + hw->tx_offload = tx_offload_enabled(hw); + hw->rx_offload = rx_offload_enabled(hw); + /* If host does not support status then disable LSC */ if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC; diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index 0c5ed31..442eed2 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -264,6 +264,8 @@ struct virtio_hw { struct virtio_net_config *dev_cfg; const 
struct virtio_pci_ops *vtpci_ops; void*virtio_user_dev; + int tx_offload; + int rx_offload; }; /* diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index b4c4aa4..ba46595 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -250,14 +250,6 @@ virtio_tso_fix_cksum(struct rte_mbuf *m) } } -static inline int -tx_offload_enabled(struct virtio_hw *hw) -{ - return vtpci_with_feature(hw, VIRTIO_NET_F_CSUM) || - vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO4) || - vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO6); -} - static inline void virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, uint16_t needed, int use_indirect, int can_push) @@ -270,9 +262,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, uint16_t head_idx, idx; uint16_t head_size = vq->hw->vtnet_hdr_size; struct virtio_net_hdr *hdr; - int offload; - offload = tx_offload_enabled(vq->hw); head_idx = vq->vq_desc_head_idx; idx = head_idx; dxp = >vq_descx[idx]; @@ -286,7 +276,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, hdr = (struct virtio_net_hdr *) rte_pktmbuf_prepend(cookie, head_size); /* if offload disabled, it is not zeroed below, do it now */ - if (offload == 0) + if (vq->hw->tx_offload == 0) memset(hdr, 0, head_size); } else if (use_indirect) { /* setup tx ring slot to point to indirect @@ -318,7 +308,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, } /* Checksum Offload / TSO */ - if (offload) { + if (vq->hw->tx_offload) { if (cookie->ol_flags & PKT_TX_TCP_SEG) cookie->ol_flags |= PKT_TX_TCP_CKSUM; @@ -798,14 +788,6 @@ virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr) return 0; } -static inline int -rx_offload_enabled(struct virtio_hw *hw) -{ - return vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM) || - vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4) || - vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO6); -} - #define 
VIRTIO_MBUF_BURST_SZ 64 #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc)) uint16_t @@ -821,7 +803,6 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) int error; uint32_t i, nb_enqueued; uint32_t h
[dpdk-dev] [PATCH v2] scripts: show fixes with release version of bug
On Fri, Sep 30, 2016 at 04:18:42PM +0200, Thomas Monjalon wrote: > This script can help to find commits to backport in stable branches. > > Fixes are found if there is the word "fix" in the headline or > if there is a tag Fixes: or Reverts: in the message. > Chained fixes of fixes are explored to find the oldest origin. > Fixes of not released bugs are ignored. Thomas, that saves my life! It helps a lot when I'm trying to pick commits for 16.07.2. So, Reviewed-by: Yuanhan Liu Thanks for the patch (and sorry for late reply). --yliu
[dpdk-dev] [PATCH 1/8] net/virtio: revert "virtio: fix restart"
On Thu, Nov 03, 2016 at 09:36:52PM +0100, Maxime Coquelin wrote: > Hi Yuanhan, > > On 11/03/2016 05:09 PM, Yuanhan Liu wrote: > >This reverts commit 9a0615af7746 ("virtio: fix restart"); conflict is > >manually addressed. > > > >Kyle reported an issue with above commit > > > >qemu-kvm: Guest moved used index from 5 to 1 > > > >with following steps, > > > >1) Start my virtio interfaces > >2) Send some traffic into/out of the interfaces > >3) Stop the interfaces > >4) Start the interfaces > >5) Send some more traffic > > > >And here are some quotes from Kyle's analysis, > > > >Prior to the patch, if an interface were stopped then started, without > >restarting the application, the queues would be left as-is, because > >hw->started would be set to 1. Now, calling stop sets hw->started to 0, > >which means the next call to start will "touch the queues". This is the > >unintended side-effect that causes the problem. > > Maybe a good idea to explain what is the problem the revert aims to fix. It aims to fix the issue, by "not touching the queues" on restart. > It does not seem to be clearly stated in the commit message. I was thinking the quote from Kyle is enough. How about following supplement: We should not touch the queues once the init is done, otherwise, the vring state of virtio PMD driver and vhost-user would be inconsistent, leading some issue like above. Thus this patch is reverted. Better now? --yliu
[dpdk-dev] [PATCH 3/8] net/virtio: simplify queue allocation
On Thu, Nov 03, 2016 at 09:48:03PM +0100, Maxime Coquelin wrote: > > > On 11/03/2016 05:09 PM, Yuanhan Liu wrote: > >Let rxq/txq/cq be the union field of the virtqueue struct. This would > >simplifies the vq allocation a bit: we don't need calculate the vq_size > >any more based on the queue time. > s/time/type/ ? Oops... will fix it. Thanks. --yliu > > > >Signed-off-by: Yuanhan Liu > >--- > > drivers/net/virtio/virtio_ethdev.c | 18 +++--- > > drivers/net/virtio/virtqueue.h | 7 +++ > > 2 files changed, 14 insertions(+), 11 deletions(-) > > Other than that: > Reviewed-by: Maxime Coquelin > > Thanks, > Maxime
[dpdk-dev] [PATCH 4/8] net/virtio: allocate queue at init stage
On Thu, Nov 03, 2016 at 10:11:43PM +0100, Maxime Coquelin wrote: > > > On 11/03/2016 05:09 PM, Yuanhan Liu wrote: > >Queue allocation should be done once, since the queue related info (such > >as vring addreess) will only be informed to the vhost-user backend once > >without virtio device reset. > > > >That means, if you allocate queues again after the vhost-user negotiation, > >the vhost-user backend will not be informed any more. Leading to a state > >that the vring info mismatches between virtio PMD driver and vhost-backend: > >the driver switches to the new address has just been allocated, while the > >vhost-backend still sticks to the old address has been assigned in the init > >stage. > > > >Unfortunately, that is exactly how the virtio driver is coded so far: queue > >allocation is done at queue_setup stage (when rte_eth_tx/rx_queue_setup is > >invoked). This is wrong, because queue_setup can be invoked several times. > >For example, > > > >$ start_testpmd.sh ... --txq=1 --rxq=1 ... > >> port stop 0 > >> port config all txq 1 # just trigger the queue_setup callback again > >> port config all rxq 1 > >> port start 0 > > > >The right way to do is allocate the queues in the init stage, so that the > >vring info could be persistent with the vhost-user backend. > > > >Besides that, we should allocate max_queue pairs the device supports, but > >not nr queue pairs firstly configured, to make following case work. > I understand, but how much memory overhead does that represent? We are allocating max queue pairs the device supports, but not the virtio-net spec supports, which, as you stated, would be too much. I don't know the typical value of the queue pairs being used in production, but normally, I would assume it be small, something like 2, or 4. It's 1 by default after all. So I think it will not be an issue. > Have you considered performing a device reset when queue number is > changed? Not a good idea and clean solution to me. --yliu
[dpdk-dev] [PATCH 8/8] net/virtio: remove started field
The "hw->started" field was introduced to stop touching queues on restart. We never touch queues on restart any more, thus it's safe to remove this flag. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 15 ++- drivers/net/virtio/virtio_pci.h| 1 - 2 files changed, 2 insertions(+), 14 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index c147909..1505f67 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -607,7 +607,6 @@ virtio_dev_close(struct rte_eth_dev *dev) if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR); vtpci_reset(hw); - hw->started = 0; virtio_dev_free_mbufs(dev); virtio_free_queues(dev); } @@ -1350,17 +1349,14 @@ static int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev) { struct rte_pci_device *pci_dev; - struct virtio_hw *hw = eth_dev->data->dev_private; PMD_INIT_FUNC_TRACE(); if (rte_eal_process_type() == RTE_PROC_SECONDARY) return -EPERM; - if (hw->started == 1) { - virtio_dev_stop(eth_dev); - virtio_dev_close(eth_dev); - } + virtio_dev_stop(eth_dev); + virtio_dev_close(eth_dev); pci_dev = eth_dev->pci_dev; eth_dev->dev_ops = NULL; @@ -1479,7 +1475,6 @@ static int virtio_dev_start(struct rte_eth_dev *dev) { uint16_t nb_queues, i; - struct virtio_hw *hw = dev->data->dev_private; struct virtnet_rx *rxvq; struct virtnet_tx *txvq __rte_unused; @@ -1499,12 +1494,6 @@ virtio_dev_start(struct rte_eth_dev *dev) /* Initialize Link state */ virtio_dev_link_update(dev, 0); - /* On restart after stop do not touch queues */ - if (hw->started) - return 0; - - hw->started = 1; - /* Notify the backend * Otherwise the tap backend might already stop its queue due to fullness.
*vhost backend will have no chance to be waked up diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index f63f76c..de271bf 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -252,7 +252,6 @@ struct virtio_hw { uint16_tvtnet_hdr_size; uint8_t vlan_strip; uint8_t use_msix; - uint8_t started; uint8_t modern; uint8_t use_simple_rxtx; uint8_t mac_addr[ETHER_ADDR_LEN]; -- 1.9.0
[dpdk-dev] [PATCH 7/8] net/virtio: complete init stage at the right place
Invoking vtpci_reinit_complete() at the port start stage doesn't make any sense; instead, it should be done at the end of the dev init stage. So move it there. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 37459e7..c147909 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1282,6 +1282,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features) ret = virtio_alloc_queues(eth_dev); if (ret < 0) return ret; + vtpci_reinit_complete(hw); if (pci_dev) PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x", @@ -1502,8 +1503,6 @@ virtio_dev_start(struct rte_eth_dev *dev) if (hw->started) return 0; - vtpci_reinit_complete(hw); - hw->started = 1; /*Notify the backend -- 1.9.0
[dpdk-dev] [PATCH 6/8] net/virtio: move queue configure code to proper place
The only piece of code of virtio_dev_rxtx_start() is actually doing queue configure/setup work. So, move it to corresponding queue_setup callback. Once that is done, virtio_dev_rxtx_start() becomes an empty function, thus it's being removed. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 2 - drivers/net/virtio/virtio_ethdev.h | 2 - drivers/net/virtio/virtio_rxtx.c | 187 - 3 files changed, 79 insertions(+), 112 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index b80b6f5..37459e7 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1502,8 +1502,6 @@ virtio_dev_start(struct rte_eth_dev *dev) if (hw->started) return 0; - /* Do final configuration before rx/tx engine starts */ - virtio_dev_rxtx_start(dev); vtpci_reinit_complete(hw); hw->started = 1; diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 5db8632..1396c6e 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -78,8 +78,6 @@ void virtio_dev_cq_start(struct rte_eth_dev *dev); /* * RX/TX function prototypes */ -void virtio_dev_rxtx_start(struct rte_eth_dev *dev); - void virtio_dev_queue_release(struct virtqueue *vq); int virtio_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id, diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index bce0663..4cb2ce7 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -388,112 +388,6 @@ virtio_dev_cq_start(struct rte_eth_dev *dev) } } -void -virtio_dev_rxtx_start(struct rte_eth_dev *dev) -{ - /* -* Start receive and transmit vrings -* -Setup vring structure for all queues -* -Initialize descriptor for the rx vring -* -Allocate blank mbufs for the each rx descriptor -* -*/ - uint16_t i; - uint16_t desc_idx; - struct virtio_hw *hw = dev->data->dev_private; - - PMD_INIT_FUNC_TRACE(); - - /* Start rx vring. 
*/ - for (i = 0; i < dev->data->nb_rx_queues; i++) { - struct virtnet_rx *rxvq = dev->data->rx_queues[i]; - struct virtqueue *vq = rxvq->vq; - int error, nbufs; - struct rte_mbuf *m; - - if (rxvq->mpool == NULL) { - rte_exit(EXIT_FAILURE, - "Cannot allocate mbufs for rx virtqueue"); - } - - /* Allocate blank mbufs for the each rx descriptor */ - nbufs = 0; - error = ENOSPC; - - if (hw->use_simple_rxtx) { - for (desc_idx = 0; desc_idx < vq->vq_nentries; -desc_idx++) { - vq->vq_ring.avail->ring[desc_idx] = desc_idx; - vq->vq_ring.desc[desc_idx].flags = - VRING_DESC_F_WRITE; - } - } - - memset(&rxvq->fake_mbuf, 0, sizeof(rxvq->fake_mbuf)); - for (desc_idx = 0; desc_idx < RTE_PMD_VIRTIO_RX_MAX_BURST; -desc_idx++) { - vq->sw_ring[vq->vq_nentries + desc_idx] = - &rxvq->fake_mbuf; - } - - while (!virtqueue_full(vq)) { - m = rte_mbuf_raw_alloc(rxvq->mpool); - if (m == NULL) - break; - - /* Enqueue allocated buffers */ - if (hw->use_simple_rxtx) - error = virtqueue_enqueue_recv_refill_simple(vq, m); - else - error = virtqueue_enqueue_recv_refill(vq, m); - - if (error) { - rte_pktmbuf_free(m); - break; - } - nbufs++; - } - - vq_update_avail_idx(vq); - - PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs); - - VIRTQUEUE_DUMP(vq); - } - - /* Start tx vring. */ - for (i = 0; i < dev->data->nb_tx_queues; i++) { - struct virtnet_tx *txvq = dev->data->tx_queues[i]; - struct virtqueue *vq = txvq->vq; - - if (hw->use_simple_rxtx) { - uint16_t mid_idx = vq->vq_nentries >> 1; - - for (desc_idx = 0; desc_idx < mid_idx; de
[dpdk-dev] [PATCH 5/8] net/virtio: initiate vring at init stage
virtio_dev_vring_start() is actually doing the vring initialization job. And the vring initialization job should be done at the dev init stage, as stated in great detail in the former commit. So move it there, and rename it to virtio_init_vring(). Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 32 +++- drivers/net/virtio/virtio_rxtx.c | 32 2 files changed, 31 insertions(+), 33 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 253bcb5..b80b6f5 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -306,6 +306,35 @@ virtio_get_queue_type(struct virtio_hw *hw, uint16_t vtpci_queue_idx) return VTNET_TQ; } +static void +virtio_init_vring(struct virtqueue *vq) +{ + int size = vq->vq_nentries; + struct vring *vr = &vq->vq_ring; + uint8_t *ring_mem = vq->vq_ring_virt_mem; + + PMD_INIT_FUNC_TRACE(); + + /* +* Reinitialise since virtio port might have been stopped and restarted +*/ + memset(ring_mem, 0, vq->vq_ring_size); + vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN); + vq->vq_used_cons_idx = 0; + vq->vq_desc_head_idx = 0; + vq->vq_avail_idx = 0; + vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1); + vq->vq_free_cnt = vq->vq_nentries; + memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries); + + vring_desc_init(vr->desc, size); + + /* +* Disable device(host) interrupting guest +*/ + virtqueue_disable_intr(vq); +} + static int virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) { @@ -369,7 +398,6 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) vq->hw = hw; vq->vq_queue_index = vtpci_queue_idx; vq->vq_nentries = vq_size; - vq->vq_free_cnt = vq_size; /* * Reserve a memzone for vring elements @@ -400,6 +428,8 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%" PRIx64, (uint64_t)(uintptr_t)mz->addr); + virtio_init_vring(vq); + if (sz_hdr_mz) {
snprintf(vq_hdr_name, sizeof(vq_hdr_name), "port%d_vq%d_hdr", dev->data->port_id, vtpci_queue_idx); diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index fb703d2..bce0663 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -378,42 +378,12 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, vq_update_avail_ring(vq, head_idx); } -static void -virtio_dev_vring_start(struct virtqueue *vq) -{ - int size = vq->vq_nentries; - struct vring *vr = &vq->vq_ring; - uint8_t *ring_mem = vq->vq_ring_virt_mem; - - PMD_INIT_FUNC_TRACE(); - - /* -* Reinitialise since virtio port might have been stopped and restarted -*/ - memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size); - vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN); - vq->vq_used_cons_idx = 0; - vq->vq_desc_head_idx = 0; - vq->vq_avail_idx = 0; - vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1); - vq->vq_free_cnt = vq->vq_nentries; - memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries); - - vring_desc_init(vr->desc, size); - - /* -* Disable device(host) interrupting guest -*/ - virtqueue_disable_intr(vq); -} - void virtio_dev_cq_start(struct rte_eth_dev *dev) { struct virtio_hw *hw = dev->data->dev_private; if (hw->cvq && hw->cvq->vq) { - virtio_dev_vring_start(hw->cvq->vq); VIRTQUEUE_DUMP((struct virtqueue *)hw->cvq->vq); } } @@ -441,7 +411,6 @@ virtio_dev_rxtx_start(struct rte_eth_dev *dev) int error, nbufs; struct rte_mbuf *m; - virtio_dev_vring_start(vq); if (rxvq->mpool == NULL) { rte_exit(EXIT_FAILURE, "Cannot allocate mbufs for rx virtqueue"); @@ -499,7 +468,6 @@ virtio_dev_rxtx_start(struct rte_eth_dev *dev) struct virtnet_tx *txvq = dev->data->tx_queues[i]; struct virtqueue *vq = txvq->vq; - virtio_dev_vring_start(vq); if (hw->use_simple_rxtx) { uint16_t mid_idx = vq->vq_nentries >> 1; -- 1.9.0
[dpdk-dev] [PATCH 4/8] net/virtio: allocate queue at init stage
Queue allocation should be done once, since the queue-related info (such as the vring address) is communicated to the vhost-user backend only once, unless the virtio device is reset. That means, if you allocate queues again after the vhost-user negotiation, the vhost-user backend will not be informed any more, leading to a state where the vring info mismatches between the virtio PMD driver and the vhost backend: the driver switches to the newly allocated address, while the vhost backend still sticks to the old address assigned at the init stage. Unfortunately, that is exactly how the virtio driver is coded so far: queue allocation is done at the queue_setup stage (when rte_eth_tx/rx_queue_setup is invoked). This is wrong, because queue_setup can be invoked several times. For example, $ start_testpmd.sh ... --txq=1 --rxq=1 ... > port stop 0 > port config all txq 1 # just trigger the queue_setup callback again > port config all rxq 1 > port start 0 The right way is to allocate the queues at the init stage, so that the vring info stays persistent with the vhost-user backend. Besides that, we should allocate the max number of queue pairs the device supports, not the number of queue pairs configured initially, to make the following case work. $ start_testpmd.sh ... --txq=1 --rxq=1 ... 
> port stop 0 > port config all txq 2 > port config all rxq 2 > port start 0 Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 105 +++-- drivers/net/virtio/virtio_ethdev.h | 8 --- drivers/net/virtio/virtio_pci.h| 2 + drivers/net/virtio/virtio_rxtx.c | 38 +++--- 4 files changed, 85 insertions(+), 68 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 5a2c14b..253bcb5 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -295,13 +295,19 @@ virtio_dev_queue_release(struct virtqueue *vq) } } -int virtio_dev_queue_setup(struct rte_eth_dev *dev, - int queue_type, - uint16_t queue_idx, - uint16_t vtpci_queue_idx, - uint16_t nb_desc, - unsigned int socket_id, - void **pvq) +static int +virtio_get_queue_type(struct virtio_hw *hw, uint16_t vtpci_queue_idx) +{ + if (vtpci_queue_idx == hw->max_queue_pairs * 2) + return VTNET_CQ; + else if (vtpci_queue_idx % 2 == 0) + return VTNET_RQ; + else + return VTNET_TQ; +} + +static int +virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) { char vq_name[VIRTQUEUE_MAX_NAME_SZ]; char vq_hdr_name[VIRTQUEUE_MAX_NAME_SZ]; @@ -314,6 +320,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, struct virtqueue *vq; size_t sz_hdr_mz = 0; void *sw_ring = NULL; + int queue_type = virtio_get_queue_type(hw, vtpci_queue_idx); int ret; PMD_INIT_LOG(DEBUG, "setting up queue: %u", vtpci_queue_idx); @@ -351,18 +358,18 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, sz_hdr_mz = PAGE_SIZE; } - vq = rte_zmalloc_socket(vq_name, size, RTE_CACHE_LINE_SIZE, socket_id); + vq = rte_zmalloc_socket(vq_name, size, RTE_CACHE_LINE_SIZE, + SOCKET_ID_ANY); if (vq == NULL) { PMD_INIT_LOG(ERR, "can not allocate vq"); return -ENOMEM; } + hw->vqs[vtpci_queue_idx] = vq; + vq->hw = hw; vq->vq_queue_index = vtpci_queue_idx; vq->vq_nentries = vq_size; - - if (nb_desc == 0 || nb_desc > vq_size) - nb_desc = vq_size; - vq->vq_free_cnt = nb_desc; + 
vq->vq_free_cnt = vq_size; /* * Reserve a memzone for vring elements @@ -372,7 +379,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, PMD_INIT_LOG(DEBUG, "vring_size: %d, rounded_vring_size: %d", size, vq->vq_ring_size); - mz = rte_memzone_reserve_aligned(vq_name, vq->vq_ring_size, socket_id, + mz = rte_memzone_reserve_aligned(vq_name, vq->vq_ring_size, +SOCKET_ID_ANY, 0, VIRTIO_PCI_VRING_ALIGN); if (mz == NULL) { if (rte_errno == EEXIST) @@ -396,7 +404,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, snprintf(vq_hdr_name, sizeof(vq_hdr_name), "port%d_vq%d_hdr", dev->data->port_id, vtpci_queue_idx); hdr_mz = rte_memzone_reserve_aligned(vq_hdr_name, sz_hdr_mz, -socket_id, 0, +SOCKET_ID_ANY, 0,
[dpdk-dev] [PATCH 3/8] net/virtio: simplify queue allocation
Let rxq/txq/cq be the union field of the virtqueue struct. This simplifies the vq allocation a bit: we no longer need to calculate the vq_size based on the queue type. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 18 +++--- drivers/net/virtio/virtqueue.h | 7 +++ 2 files changed, 14 insertions(+), 11 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index d082df5..5a2c14b 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -312,7 +312,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, struct virtnet_tx *txvq = NULL; struct virtnet_ctl *cvq = NULL; struct virtqueue *vq; - size_t sz_vq, sz_q = 0, sz_hdr_mz = 0; + size_t sz_hdr_mz = 0; void *sw_ring = NULL; int ret; @@ -337,25 +337,21 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, snprintf(vq_name, sizeof(vq_name), "port%d_vq%d", dev->data->port_id, vtpci_queue_idx); - sz_vq = RTE_ALIGN_CEIL(sizeof(*vq) + + size = RTE_ALIGN_CEIL(sizeof(*vq) + vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE); - if (queue_type == VTNET_RQ) { - sz_q = sz_vq + sizeof(*rxvq); - } else if (queue_type == VTNET_TQ) { - sz_q = sz_vq + sizeof(*txvq); + if (queue_type == VTNET_TQ) { /* * For each xmit packet, allocate a virtio_net_hdr * and indirect ring elements */ sz_hdr_mz = vq_size * sizeof(struct virtio_tx_region); } else if (queue_type == VTNET_CQ) { - sz_q = sz_vq + sizeof(*cvq); /* Allocate a page for control vq command, data and status */ sz_hdr_mz = PAGE_SIZE; } - vq = rte_zmalloc_socket(vq_name, sz_q, RTE_CACHE_LINE_SIZE, socket_id); + vq = rte_zmalloc_socket(vq_name, size, RTE_CACHE_LINE_SIZE, socket_id); if (vq == NULL) { PMD_INIT_LOG(ERR, "can not allocate vq"); return -ENOMEM; @@ -425,14 +421,14 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, } vq->sw_ring = sw_ring; - rxvq = (struct virtnet_rx *)RTE_PTR_ADD(vq, sz_vq); + rxvq = &vq->rxq; rxvq->vq = vq; rxvq->port_id = 
dev->data->port_id; rxvq->queue_id = queue_idx; rxvq->mz = mz; *pvq = rxvq; } else if (queue_type == VTNET_TQ) { - txvq = (struct virtnet_tx *)RTE_PTR_ADD(vq, sz_vq); + txvq = &vq->txq; txvq->vq = vq; txvq->port_id = dev->data->port_id; txvq->queue_id = queue_idx; @@ -442,7 +438,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, *pvq = txvq; } else if (queue_type == VTNET_CQ) { - cvq = (struct virtnet_ctl *)RTE_PTR_ADD(vq, sz_vq); + cvq = &vq->cq; cvq->vq = vq; cvq->mz = mz; cvq->virtio_net_hdr_mz = hdr_mz; diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h index ef0027b..bbeb2f2 100644 --- a/drivers/net/virtio/virtqueue.h +++ b/drivers/net/virtio/virtqueue.h @@ -44,6 +44,7 @@ #include "virtio_pci.h" #include "virtio_ring.h" #include "virtio_logs.h" +#include "virtio_rxtx.h" struct rte_mbuf; @@ -191,6 +192,12 @@ struct virtqueue { void *vq_ring_virt_mem; /**< linear address of vring*/ unsigned int vq_ring_size; + union { + struct virtnet_rx rxq; + struct virtnet_tx txq; + struct virtnet_ctl cq; + }; + phys_addr_t vq_ring_mem; /**< physical address of vring, * or virtual address for virtio_user. */ -- 1.9.0
[dpdk-dev] [PATCH 2/8] net/virtio: simplify queue memzone name
Instead of setting up a queue memzone name like "port0_rxq0", "port0_txq0", it could be simplified a bit to something like "port0_vq0", "port0_vq1" ... Meanwhile, the code is also simplified a bit. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 5815875..d082df5 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -312,7 +312,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, struct virtnet_tx *txvq = NULL; struct virtnet_ctl *cvq = NULL; struct virtqueue *vq; - const char *queue_names[] = {"rvq", "txq", "cvq"}; size_t sz_vq, sz_q = 0, sz_hdr_mz = 0; void *sw_ring = NULL; int ret; @@ -335,8 +334,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, return -EINVAL; } - snprintf(vq_name, sizeof(vq_name), "port%d_%s%d", -dev->data->port_id, queue_names[queue_type], queue_idx); + snprintf(vq_name, sizeof(vq_name), "port%d_vq%d", +dev->data->port_id, vtpci_queue_idx); sz_vq = RTE_ALIGN_CEIL(sizeof(*vq) + vq_size * sizeof(struct vq_desc_extra), @@ -398,9 +397,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, (uint64_t)(uintptr_t)mz->addr); if (sz_hdr_mz) { - snprintf(vq_hdr_name, sizeof(vq_hdr_name), "port%d_%s%d_hdr", -dev->data->port_id, queue_names[queue_type], -queue_idx); + snprintf(vq_hdr_name, sizeof(vq_hdr_name), "port%d_vq%d_hdr", +dev->data->port_id, vtpci_queue_idx); hdr_mz = rte_memzone_reserve_aligned(vq_hdr_name, sz_hdr_mz, socket_id, 0, RTE_CACHE_LINE_SIZE); -- 1.9.0
[dpdk-dev] [PATCH 1/8] net/virtio: revert "virtio: fix restart"
This reverts commit 9a0615af7746 ("virtio: fix restart"); a conflict is manually addressed. Kyle reported an issue with the above commit, "qemu-kvm: Guest moved used index from 5 to 1", with the following steps: 1) Start my virtio interfaces 2) Send some traffic into/out of the interfaces 3) Stop the interfaces 4) Start the interfaces 5) Send some more traffic And here are some quotes from Kyle's analysis: Prior to the patch, if an interface were stopped then started, without restarting the application, the queues would be left as-is, because hw->started would be set to 1. Now, calling stop sets hw->started to 0, which means the next call to start will "touch the queues". This is the unintended side-effect that causes the problem. Fixes: 9a0615af7746 ("virtio: fix restart") Cc: Jianfeng Tan Cc: Reported-by: Kyle Larose Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index b388134..5815875 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -550,9 +550,6 @@ virtio_dev_close(struct rte_eth_dev *dev) PMD_INIT_LOG(DEBUG, "virtio_dev_close"); - if (hw->started == 1) - virtio_dev_stop(dev); - if (hw->cvq) virtio_dev_queue_release(hw->cvq->vq); @@ -560,6 +557,7 @@ virtio_dev_close(struct rte_eth_dev *dev) if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR); vtpci_reset(hw); + hw->started = 0; virtio_dev_free_mbufs(dev); virtio_free_queues(dev); } @@ -1296,15 +1294,17 @@ static int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev) { struct rte_pci_device *pci_dev; + struct virtio_hw *hw = eth_dev->data->dev_private; PMD_INIT_FUNC_TRACE(); if (rte_eal_process_type() == RTE_PROC_SECONDARY) return -EPERM; - /* Close it anyway since there's no way to know if closed */ - virtio_dev_close(eth_dev); - + if (hw->started == 1) { + virtio_dev_stop(eth_dev); + 
virtio_dev_close(eth_dev); + } pci_dev = eth_dev->pci_dev; eth_dev->dev_ops = NULL; @@ -1543,12 +1543,9 @@ static void virtio_dev_stop(struct rte_eth_dev *dev) { struct rte_eth_link link; - struct virtio_hw *hw = dev->data->dev_private; PMD_INIT_LOG(DEBUG, "stop"); - hw->started = 0; - if (dev->data->dev_conf.intr_conf.lsc) rte_intr_disable(&dev->pci_dev->intr_handle); -- 1.9.0
[dpdk-dev] [PATCH 0/8] net/virtio: fix queue reconfigure issue
This patchset fixes an issue (virtio/vhost become broken when queue setup is done two or more times; check patch 2 for details) reported by Ilya at the late stage of the last release. I sensed it would be a fair amount of code work, so I delayed it to this release. However, I failed to make the fix in the early stage of this release, so we have come to the same situation again: we are at the late stage of v16.11 :( Honestly, though, I don't want to miss it again in this release, as it is hard to backport such a fix to a stable release. It's not a bug that can be fixed by a few lines of code. The virtio init logic is wrong enough that I need to change quite a few places to fix it. Meanwhile, I have done my best to keep the changes as minimal as possible. Besides that, given that v16.11 will be an LTS release and this is such a severe bug that it should be fixed ASAP (at least, we should make it work in an LTS release), I tried hard today to put this patchset together, with the hope that we can fix this severe issue in this release. The issue is described at length in patch 4 "net/virtio: allocate queue at init stage". Again, it's not a bug that can be fixed by one patch, which is why you see no "Fixes" tag in any of these patches. All the patches below work together to fix this issue. Thanks. --yliu --- Yuanhan Liu (8): net/virtio: revert "virtio: fix restart" net/virtio: simplify queue memzone name net/virtio: simplify queue allocation net/virtio: allocate queue at init stage net/virtio: initiate vring at init stage net/virtio: move queue configure code to proper place net/virtio: complete init stage at the right place net/virtio: remove started field drivers/net/virtio/virtio_ethdev.c | 182 --- drivers/net/virtio/virtio_ethdev.h | 10 -- drivers/net/virtio/virtio_pci.h| 3 +- drivers/net/virtio/virtio_rxtx.c | 247 ++--- drivers/net/virtio/virtqueue.h | 7 ++ 5 files changed, 208 insertions(+), 241 deletions(-) -- 1.9.0
[dpdk-dev] [PATCH 3/3] vhost: update comments
vhost-cuse is removed, update corresponding comments that are still referencing it. Signed-off-by: Yuanhan Liu --- examples/tep_termination/main.c | 7 ++- examples/vhost/main.c | 4 +--- examples/vhost_xen/main.c | 3 +-- lib/librte_vhost/rte_virtio_net.h | 5 ++--- lib/librte_vhost/vhost.c | 9 - lib/librte_vhost/vhost.h | 4 +++- 6 files changed, 13 insertions(+), 19 deletions(-) diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c index 622f248..1d6d463 100644 --- a/examples/tep_termination/main.c +++ b/examples/tep_termination/main.c @@ -1151,8 +1151,7 @@ print_stats(void) } /** - * Main function, does initialisation and calls the per-lcore functions. The CUSE - * device is also registered here to handle the IOCTLs. + * Main function, does initialisation and calls the per-lcore functions. */ int main(int argc, char *argv[]) @@ -1253,14 +1252,12 @@ main(int argc, char *argv[]) } rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_MRG_RXBUF); - /* Register CUSE device to handle IOCTLs. */ ret = rte_vhost_driver_register((char *)_basename, 0); if (ret != 0) - rte_exit(EXIT_FAILURE, "CUSE device setup failure.\n"); + rte_exit(EXIT_FAILURE, "failed to register vhost driver.\n"); rte_vhost_driver_callback_register(_net_device_ops); - /* Start CUSE session. */ rte_vhost_driver_session_start(); return 0; diff --git a/examples/vhost/main.c b/examples/vhost/main.c index 91000e8..0709859 100644 --- a/examples/vhost/main.c +++ b/examples/vhost/main.c @@ -1409,8 +1409,7 @@ create_mbuf_pool(uint16_t nr_port, uint32_t nr_switch_core, uint32_t mbuf_size, } /* - * Main function, does initialisation and calls the per-lcore functions. The CUSE - * device is also registered here to handle the IOCTLs. + * Main function, does initialisation and calls the per-lcore functions. */ int main(int argc, char *argv[]) @@ -1531,7 +1530,6 @@ main(int argc, char *argv[]) rte_vhost_driver_callback_register(_net_device_ops); - /* Start CUSE session. 
*/ rte_vhost_driver_session_start(); return 0; diff --git a/examples/vhost_xen/main.c b/examples/vhost_xen/main.c index 2e40357..f4dbaa4 100644 --- a/examples/vhost_xen/main.c +++ b/examples/vhost_xen/main.c @@ -1407,8 +1407,7 @@ print_stats(void) int init_virtio_net(struct virtio_net_device_ops const * const ops); /* - * Main function, does initialisation and calls the per-lcore functions. The CUSE - * device is also registered here to handle the IOCTLs. + * Main function, does initialisation and calls the per-lcore functions. */ int main(int argc, char *argv[]) diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h index c53ff64..926039c 100644 --- a/lib/librte_vhost/rte_virtio_net.h +++ b/lib/librte_vhost/rte_virtio_net.h @@ -123,9 +123,8 @@ int rte_vhost_get_numa_node(int vid); uint32_t rte_vhost_get_queue_num(int vid); /** - * Get the virtio net device's ifname. For vhost-cuse, ifname is the - * path of the char device. For vhost-user, ifname is the vhost-user - * socket file path. + * Get the virtio net device's ifname, which is the vhost-user socket + * file path. * * @param vid * virtio-net device ID diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index d8116ff..31825b8 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -227,9 +227,8 @@ reset_device(struct virtio_net *dev) } /* - * Function is called from the CUSE open function. The device structure is - * initialised and a new entry is added to the device configuration linked - * list. + * Invoked when there is a new vhost-user connection established (when + * there is a new virtio device being attached). */ int vhost_new_device(void) @@ -261,8 +260,8 @@ vhost_new_device(void) } /* - * Function is called from the CUSE release function. This function will - * cleanup the device and remove it from device configuration linked list. + * Invoked when there is the vhost-user connection is broken (when + * the virtio device is being detached). 
*/ void vhost_destroy_device(int vid) diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index acec772..22564f1 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -284,7 +284,9 @@ void vhost_set_ifname(int, const char *if_name, unsigned int if_len); void vhost_enable_dequeue_zero_copy(int vid); /* - * Backend-specific cleanup. Defined by vhost-cuse and vhost-user. + * Backend-specific cleanup. + * + * TODO: fix it; we have one backend now */ void vhost_backend_cleanup(struct virtio_net *dev); -- 1.9.0
[dpdk-dev] [PATCH 2/3] doc: update the vhost sample guide
For the vhost-switch sample, the old guide spends too many words on vhost-cuse, mainly because vhost-cuse was invented before vhost-user. Now that vhost-cuse is removed, most of the doc is useless. Instead of amending one piece here and there, this patch simply removes most of the doc and replaces it with a simple test guide. For the tep_term sample, the change mainly removes the parts that reference vhost-cuse. Signed-off-by: Yuanhan Liu --- The test is simplified a bit (from using two virtio-net devices to one), meaning a few figures no longer apply; thus, the vhost_net_sample and tx_dpdk_testpmd figures are removed. And apparently, the vhost_net_arch figure also becomes useless. Since I have no time to draw one for vhost-user, I also removed the two remaining figures: qemu_virtio_net and virtio_linux_vhost, which show the architecture of a typical virtio net device and the Linux kernel vhost-net. --- doc/guides/sample_app_ug/img/qemu_virtio_net.png | Bin 31557 -> 0 bytes doc/guides/sample_app_ug/img/tx_dpdk_testpmd.png | Bin 76019 -> 0 bytes doc/guides/sample_app_ug/img/vhost_net_arch.png| Bin 154920 -> 0 bytes .../sample_app_ug/img/vhost_net_sample_app.png | Bin 23800 -> 0 bytes .../sample_app_ug/img/virtio_linux_vhost.png | Bin 30290 -> 0 bytes doc/guides/sample_app_ug/index.rst | 10 - doc/guides/sample_app_ug/tep_termination.rst | 54 +- doc/guides/sample_app_ug/vhost.rst | 869 +++-- 8 files changed, 128 insertions(+), 805 deletions(-) delete mode 100644 doc/guides/sample_app_ug/img/qemu_virtio_net.png delete mode 100644 doc/guides/sample_app_ug/img/tx_dpdk_testpmd.png delete mode 100644 doc/guides/sample_app_ug/img/vhost_net_arch.png delete mode 100644 doc/guides/sample_app_ug/img/vhost_net_sample_app.png delete mode 100644 doc/guides/sample_app_ug/img/virtio_linux_vhost.png diff --git a/doc/guides/sample_app_ug/img/qemu_virtio_net.png b/doc/guides/sample_app_ug/img/qemu_virtio_net.png deleted file mode 100644 index a852c1662fe978e1dc785f4771ff285172ae5e86..
GIT binary patch literal 0 literal 31557 (binary PNG data omitted)
[dpdk-dev] [PATCH 1/3] doc: update vhost programming guide
vhost-cuse has been removed in this release. Update the doc, with the vhost-cuse part being removed. Signed-off-by: Yuanhan Liu --- doc/guides/prog_guide/vhost_lib.rst | 62 + 1 file changed, 7 insertions(+), 55 deletions(-) diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index 573a318..4f997d4 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -46,26 +46,8 @@ vhost library should be able to: * Know all the necessary information about the vring: Information such as where the available ring is stored. Vhost defines some - messages to tell the backend all the information it needs to know how to - manipulate the vring. - -Currently, there are two ways to pass these messages and as a result there are -two Vhost implementations in DPDK: *vhost-cuse* (where the character devices -are in user space) and *vhost-user*. - -Vhost-cuse creates a user space character device and hook to a function ioctl, -so that all ioctl commands that are sent from the frontend (QEMU) will be -captured and handled. - -Vhost-user creates a Unix domain socket file through which messages are -passed. - -.. Note:: - - Since DPDK v2.2, the majority of the development effort has gone into - enhancing vhost-user, such as multiple queue, live migration, and - reconnect. Thus, it is strongly advised to use vhost-user instead of - vhost-cuse. + messages (passed through a Unix domain socket file) to tell the backend all + the information it needs to know how to manipulate the vring. Vhost API Overview @@ -75,11 +57,10 @@ The following is an overview of the Vhost API functions: * ``rte_vhost_driver_register(path, flags)`` - This function registers a vhost driver into the system. For vhost-cuse, a - ``/dev/path`` character device file will be created. For vhost-user server - mode, a Unix domain socket file ``path`` will be created. + This function registers a vhost driver into the system. 
``path`` specifies + the Unix domain socket file path. - Currently supported flags are (these are valid for vhost-user only): + Currently supported flags are: - ``RTE_VHOST_USER_CLIENT`` @@ -171,35 +152,8 @@ The following is an overview of the Vhost API functions: default. -Vhost Implementations -- - -Vhost-cuse implementation -~ - -When vSwitch registers the vhost driver, it will register a cuse device driver -into the system and creates a character device file. This cuse driver will -receive vhost open/release/IOCTL messages from the QEMU simulator. - -When the open call is received, the vhost driver will create a vhost device -for the virtio device in the guest. - -When the ``VHOST_SET_MEM_TABLE`` ioctl is received, vhost searches the memory -region to find the starting user space virtual address that maps the memory of -the guest virtual machine. Through this virtual address and the QEMU pid, -vhost can find the file QEMU uses to map the guest memory. Vhost maps this -file into its address space, in this way vhost can fully access the guest -physical memory, which means vhost could access the shared virtio ring and the -guest physical address specified in the entry of the ring. - -The guest virtual machine tells the vhost whether the virtio device is ready -for processing or is de-activated through the ``VHOST_NET_SET_BACKEND`` -message. The registered callback from vSwitch will be called. - -When the release call is made, vhost will destroy the device. - -Vhost-user implementation -~ +Vhost-user Implementations +-- Vhost-user uses Unix domain sockets for passing messages. This means the DPDK vhost-user implementation has two options: @@ -246,8 +200,6 @@ For ``VHOST_SET_MEM_TABLE`` message, QEMU will send information for each memory region and its file descriptor in the ancillary data of the message. The file descriptor is used to map that region. 
-There is no ``VHOST_NET_SET_BACKEND`` message as in vhost-cuse to signal -whether the virtio device is ready or stopped. Instead, ``VHOST_SET_VRING_KICK`` is used as the signal to put the vhost device into the data plane, and ``VHOST_GET_VRING_BASE`` is used as the signal to remove the vhost device from the data plane. -- 1.9.0
[dpdk-dev] [PATCH 0/3] vhost: comments and doc update due to vhost-cuse removal
Here is a small patchset of updating vhost programming and sample guide and comments, due to the removal of vhost-cuse. --- Yuanhan Liu (3): doc: update vhost programming guide doc: update the vhost sample guide vhost: update comments doc/guides/prog_guide/vhost_lib.rst| 62 +- doc/guides/sample_app_ug/img/qemu_virtio_net.png | Bin 31557 -> 0 bytes doc/guides/sample_app_ug/img/tx_dpdk_testpmd.png | Bin 76019 -> 0 bytes doc/guides/sample_app_ug/img/vhost_net_arch.png| Bin 154920 -> 0 bytes .../sample_app_ug/img/vhost_net_sample_app.png | Bin 23800 -> 0 bytes .../sample_app_ug/img/virtio_linux_vhost.png | Bin 30290 -> 0 bytes doc/guides/sample_app_ug/index.rst | 10 - doc/guides/sample_app_ug/tep_termination.rst | 54 +- doc/guides/sample_app_ug/vhost.rst | 869 +++-- examples/tep_termination/main.c| 7 +- examples/vhost/main.c | 4 +- examples/vhost_xen/main.c | 3 +- lib/librte_vhost/rte_virtio_net.h | 5 +- lib/librte_vhost/vhost.c | 9 +- lib/librte_vhost/vhost.h | 4 +- 15 files changed, 148 insertions(+), 879 deletions(-) delete mode 100644 doc/guides/sample_app_ug/img/qemu_virtio_net.png delete mode 100644 doc/guides/sample_app_ug/img/tx_dpdk_testpmd.png delete mode 100644 doc/guides/sample_app_ug/img/vhost_net_arch.png delete mode 100644 doc/guides/sample_app_ug/img/vhost_net_sample_app.png delete mode 100644 doc/guides/sample_app_ug/img/virtio_linux_vhost.png -- 1.9.0
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On Tue, Nov 01, 2016 at 10:39:35AM +0100, Thomas Monjalon wrote:
> 2016-11-01 16:15, Yuanhan Liu:
> > On Fri, Oct 28, 2016 at 09:58:51AM +0200, Maxime Coquelin wrote:
> > > Agree, what we need is to be able to disable Virtio PMD features
> > > without having to rebuild the PMD.
> >
> > I want this feature (or more precisely, ability) long times ago.
> > For example, I'd wish there is an option like "force_legacy" when
> > both legacy and modern exist.
>
> You can change the behaviour of the driver with a run-time parameter
> as a struct rte_devargs.

Thanks for the tip! Yeah, it's a workable solution, not an ideal one though.

	--yliu
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On Fri, Oct 28, 2016 at 09:58:51AM +0200, Maxime Coquelin wrote: > >From my experience, and as Michael pointed out, the best mode for small > >packets is obviously > >ANY_LAYOUT so it uses a single descriptor per packet. > Of course, having a single descriptor is in theory the best way. > But, in current Virtio PMD implementation, with no offload supported, we > never access the virtio header at transmit time, it is allocated and > zeroed at startup. > > For ANY_LAYOUT case, the virtio header is prepended to the packet, and > need to be zeroed at packet transmit time. The performance impact is > quite important, as show the measurements I made one month ago (txonly): > - 2 descs per packet: 11.6Mpps > - 1 desc per packet: 9.6Mpps > > As Michael suggested, I tried to replace the memset by direct > fields assignation, but it only recovers a small part of the drop. > > What I suggested is to introduce a new feature, so that we can skip the > virtio header when no offload is negotiated. > > Maybe you have other ideas? > > >So, disabling indirect descriptors may give you better pps for 64 bytes > >packets, but that doesn't mean you should not implement, or enable, it in > >your driver. That just means that the guest is not taking the right > >decision, and uses indirect while it should actually use any_layout. > +1, it really depends on the use-case. > > > >Given virtio/vhost design (most decision comes from the guest), the host > >should be liberal in what it accepts, and not try to influence guest > >implementation by carefully picking the features it supports. Otherwise > >guests will never get a chance to make the right decisions either. > Agree, what we need is to be able to disable Virtio PMD features > without having to rebuild the PMD. I want this feature (or more precisely, ability) long times ago. For example, I'd wish there is an option like "force_legacy" when both legacy and modern exist. > It will certainly require an new API change to add this option. 
I don't think it will work. Firstly, generally speaking, as discussed
before, it's not a good idea to introduce PMD-specific APIs. Secondly,
even if it were okay to do so, you would need to let the application
(say, testpmd) invoke it. Once you have done that, you need to introduce
some CLI options to make sure that part is invoked. And as stated before,
it's also not a good idea to add virtio PMD-specific CLI options to
testpmd.

For this case, however, I think it makes more sense if testpmd provides
some commands to enable/disable the csum and TSO stuff. With that, we
could enable them dynamically. It does have "tso set/show" commands,
though: it just doesn't look like the right interface to me for
enabling/disabling TSO.

	--yliu
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On Thu, Oct 27, 2016 at 12:35:11PM +0200, Maxime Coquelin wrote: > > > On 10/27/2016 12:33 PM, Yuanhan Liu wrote: > >On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote: > >>Hi Zhihong, > >> > >>On 10/27/2016 11:00 AM, Wang, Zhihong wrote: > >>>Hi Maxime, > >>> > >>>Seems indirect desc feature is causing serious performance > >>>degradation on Haswell platform, about 20% drop for both > >>>mrg=on and mrg=off (--txqflags=0xf00, non-vector version), > >>>both iofwd and macfwd. > >>I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge > >>platform, and didn't faced such a drop. > > > >I was actually wondering that may be the cause. I tested it with > >my IvyBridge server as well, I saw no drop. > > > >Maybe you should find a similar platform (Haswell) and have a try? > Yes, that's why I asked Zhihong whether he could test Txonly in guest to > see if issue is reproducible like this. I have no Haswell box, otherwise I could do a quick test for you. IIRC, he tried to disable the indirect_desc feature, then the performance recovered. So, it's likely the indirect_desc is the culprit here. > I will be easier for me to find an Haswell machine if it has not to be > connected back to back to and HW/SW packet generator. Makes sense. --yliu > > Thanks, > Maxime > > > > > --yliu > > > >>Have you tried to pass indirect_desc=off to qemu cmdline to see if you > >>recover the performance? > >> > >>Yuanhan, which platform did you use when you tested it with zero copy? > >> > >>> > >>>I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz. > >>> > >>>Could you please verify if this is true in your test? > >>I'll try -rc1/-rc2 on my platform, and let you know. 
> >> > >>Thanks, > >>Maxime > >> > >>> > >>> > >>>Thanks > >>>Zhihong > >>> > >>>>-Original Message- > >>>>From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com] > >>>>Sent: Monday, October 17, 2016 10:15 PM > >>>>To: Yuanhan Liu > >>>>Cc: Wang, Zhihong ; Xie, Huawei > >>>>; dev at dpdk.org; vkaplans at redhat.com; > >>>>mst at redhat.com; stephen at networkplumber.org > >>>>Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support > >>>>to the TX path > >>>> > >>>> > >>>> > >>>>On 10/17/2016 03:21 PM, Yuanhan Liu wrote: > >>>>>On Mon, Oct 17, 2016 at 01:23:23PM +0200, Maxime Coquelin wrote: > >>>>>>>On my side, I just setup 2 Windows 2016 VMs, and confirm the issue. > >>>>>>>I'll continue the investigation early next week. > >>>>>> > >>>>>>The root cause is identified. > >>>>>>When INDIRECT_DESC feature is negotiated, Windows guest uses indirect > >>>>>>for both Tx and Rx descriptors, whereas Linux guests (Virtio PMD & > >>>>>>virtio-net kernel driver) use indirect only for Tx. > >>>>>>I'll implement indirect support for the Rx path in vhost lib, but the > >>>>>>change will be too big for -rc release. > >>>>>>I propose in the mean time to disable INDIRECT_DESC feature in vhost > >>>>>>lib, we can still enable it locally for testing. > >>>>>> > >>>>>>Yuanhan, is it ok for you? > >>>>> > >>>>>That's okay. > >>>>I'll send a patch to disable it then. > >>>> > >>>>> > >>>>>> > >>>>>>>Has anyone already tested Windows guest with vhost-net, which also > >>>>has > >>>>>>>indirect descs support? > >>>>>> > >>>>>>I tested and confirm it works with vhost-net. > >>>>> > >>>>>I'm a bit confused then. IIRC, vhost-net also doesn't support indirect > >>>>>for Rx path, right? > >>>> > >>>>No, it does support it actually. > >>>>I thought it didn't support too, I misread the Kernel implementation of > >>>>vhost-net and virtio-net. Acutally, virtio-net makes use of indirect > >>>>in Rx path when mergeable buffers is disabled. 
> >>>> > >>>>The confusion certainly comes from me, sorry about that. > >>>> > >>>>Maxime
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote: > Hi Zhihong, > > On 10/27/2016 11:00 AM, Wang, Zhihong wrote: > >Hi Maxime, > > > >Seems indirect desc feature is causing serious performance > >degradation on Haswell platform, about 20% drop for both > >mrg=on and mrg=off (--txqflags=0xf00, non-vector version), > >both iofwd and macfwd. > I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge > platform, and didn't faced such a drop. I was actually wondering that may be the cause. I tested it with my IvyBridge server as well, I saw no drop. Maybe you should find a similar platform (Haswell) and have a try? --yliu > Have you tried to pass indirect_desc=off to qemu cmdline to see if you > recover the performance? > > Yuanhan, which platform did you use when you tested it with zero copy? > > > > >I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz. > > > >Could you please verify if this is true in your test? > I'll try -rc1/-rc2 on my platform, and let you know. > > Thanks, > Maxime > > > > > > >Thanks > >Zhihong > > > >>-Original Message- > >>From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com] > >>Sent: Monday, October 17, 2016 10:15 PM > >>To: Yuanhan Liu > >>Cc: Wang, Zhihong ; Xie, Huawei > >>; dev at dpdk.org; vkaplans at redhat.com; > >>mst at redhat.com; stephen at networkplumber.org > >>Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support > >>to the TX path > >> > >> > >> > >>On 10/17/2016 03:21 PM, Yuanhan Liu wrote: > >>>On Mon, Oct 17, 2016 at 01:23:23PM +0200, Maxime Coquelin wrote: > >>>>>On my side, I just setup 2 Windows 2016 VMs, and confirm the issue. > >>>>>I'll continue the investigation early next week. > >>>> > >>>>The root cause is identified. > >>>>When INDIRECT_DESC feature is negotiated, Windows guest uses indirect > >>>>for both Tx and Rx descriptors, whereas Linux guests (Virtio PMD & > >>>>virtio-net kernel driver) use indirect only for Tx. 
> >>>>I'll implement indirect support for the Rx path in vhost lib, but the > >>>>change will be too big for -rc release. > >>>>I propose in the mean time to disable INDIRECT_DESC feature in vhost > >>>>lib, we can still enable it locally for testing. > >>>> > >>>>Yuanhan, is it ok for you? > >>> > >>>That's okay. > >>I'll send a patch to disable it then. > >> > >>> > >>>> > >>>>>Has anyone already tested Windows guest with vhost-net, which also > >>has > >>>>>indirect descs support? > >>>> > >>>>I tested and confirm it works with vhost-net. > >>> > >>>I'm a bit confused then. IIRC, vhost-net also doesn't support indirect > >>>for Rx path, right? > >> > >>No, it does support it actually. > >>I thought it didn't support too, I misread the Kernel implementation of > >>vhost-net and virtio-net. Acutally, virtio-net makes use of indirect > >>in Rx path when mergeable buffers is disabled. > >> > >>The confusion certainly comes from me, sorry about that. > >> > >>Maxime
[dpdk-dev] [PATCH v7 3/7] vhost: simplify mergeable Rx vring reservation
On Wed, Oct 26, 2016 at 12:08:49AM +0200, Thomas Monjalon wrote:
> 2016-10-14 17:34, Yuanhan Liu:
> > -static inline uint32_t __attribute__((always_inline))
> > +static inline int __attribute__((always_inline))
> >  copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > -			    uint16_t end_idx, struct rte_mbuf *m,
> > -			    struct buf_vector *buf_vec)
> > +			    struct rte_mbuf *m, struct buf_vector *buf_vec,
> > +			    uint16_t num_buffers)
> >  {
> >  	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
> >  	uint32_t vec_idx = 0;
> > -	uint16_t start_idx = vq->last_used_idx;
> > -	uint16_t cur_idx = start_idx;
> > +	uint16_t cur_idx = vq->last_used_idx;
> >  	uint64_t desc_addr;
> >  	uint32_t desc_chain_head;
> >  	uint32_t desc_chain_len;
> > @@ -394,21 +393,21 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >  	struct rte_mbuf *hdr_mbuf;
> >
> >  	if (unlikely(m == NULL))
> > -		return 0;
> > +		return -1;
> >
> >  	LOG_DEBUG(VHOST_DATA, "(%d) current index %d | end index %d\n",
> >  		dev->vid, cur_idx, end_idx);
>
> There is a build error:
> lib/librte_vhost/virtio_net.c:399:22: error: 'end_idx' undeclared

Oops... you know, my robot has been broken since the holiday :(
I just did a quick fix. Hopefully, it will start working again...

> It is probably trivial and could be fixed directly in the already applied
> commit in next-virtio.

Yes, and FYI, here is the overall diff I made to fix this bug.
	--yliu

---
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index b784dba..eed0b1c 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -443,9 +443,6 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct rte_mbuf *m,
 	if (unlikely(m == NULL))
 		return -1;
 
-	LOG_DEBUG(VHOST_DATA, "(%d) current index %d | end index %d\n",
-		dev->vid, cur_idx, end_idx);
-
 	desc_addr = gpa_to_vva(dev, buf_vec[vec_idx].buf_addr);
 	if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr)
 		return -1;
@@ -555,6 +552,10 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 			break;
 		}
 
+		LOG_DEBUG(VHOST_DATA, "(%d) current index %d | end index %d\n",
+			dev->vid, vq->last_avail_idx,
+			vq->last_avail_idx + num_buffers);
+
 		if (copy_mbuf_to_desc_mergeable(dev, pkts[pkt_idx],
 						buf_vec, num_buffers) < 0) {
 			vq->shadow_used_idx -= num_buffers;
[dpdk-dev] [PATCH v2 1/3] drivers: add name alias registration for rte_driver
On Mon, Oct 24, 2016 at 12:22:21PM -0400, Jan Blunck wrote:
> This adds infrastructure for drivers to allow being requested by an alias
> so that a renamed driver can still get loaded by its legacy name.
>
> Signed-off-by: Jan Blunck
> Reviewed-by: Maxime Coquelin
> Tested-by: Pablo de Lara

Series Reviewed-by: Yuanhan Liu

	--yliu
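The patch under review lets a renamed driver still be found by its legacy name. The core of such an alias mechanism can be sketched as below; the struct and table are hypothetical stand-ins chosen for illustration, not the actual `rte_driver` layout or registration macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: each driver records its current name plus an
 * optional legacy alias, and a lookup by either name resolves to the
 * same driver. */
struct drv {
	const char *name;
	const char *alias;	/* legacy name, may be NULL */
};

static const struct drv drivers[] = {
	{ "net_vhost", "eth_vhost" },
	{ "net_ring",  "eth_ring"  },
	{ "net_null",  NULL        },
};

const struct drv *find_driver(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
		if (strcmp(drivers[i].name, name) == 0)
			return &drivers[i];
		/* fall back to the legacy alias, so old --vdev args keep
		 * working after a rename */
		if (drivers[i].alias != NULL &&
		    strcmp(drivers[i].alias, name) == 0)
			return &drivers[i];
	}
	return NULL;
}
```

With a table like this, a user who still passes `--vdev eth_vhost0` resolves to the same driver as `net_vhost0`, which is exactly the compatibility the thread below argues should have landed before the rename.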
[dpdk-dev] [PATCH 0/2] vhost: Add support to indirect descriptors on Rx path
On Tue, Oct 18, 2016 at 05:35:37PM +0200, Maxime Coquelin wrote:
> Indirect descriptor feature has been enabled in v16.11-rc1, but only the TX
> path was implemented.
> However, some configurations expect it to be supported for the Rx path:
>  - Windows guests with and without mergeable buffers enabled
>  - Linux guests with the kernel driver with mergeable buffers disabled
>
> This series adds support for indirect descs on the Rx path, and has been
> tested with the following configurations:
>  - Windows guest with indirect ON, mergeable ON/OFF
>  - Linux guest with kernel driver with indirect ON/OFF, mergeable ON/OFF
>  - Linux guest with Virtio PMD with mergeable ON/OFF
>
> The performance degradation measured with/without this series with Virtio PMD
> (which doesn't use indirect descs for the Rx path) is around 1%.
>
> The series is based on top of "vhost: optimize mergeable Rx path" v7.

Series applied to dpdk-next-virtio. Thanks.

	--yliu

>
> Maxime Coquelin (2):
>   vhost: Add indirect desc support to Rx mergeable path
>   vhost: Add indirect desc support to the Rx no-mergeable path
>
>  lib/librte_vhost/virtio_net.c | 73 ++-
>  1 file changed, 51 insertions(+), 22 deletions(-)
>
> --
> 2.7.4
[dpdk-dev] [PATCH v7 0/7] vhost: optimize mergeable Rx path
Applied to dpdk-next-virtio. And thanks for testing and reviewing.

	--yliu

On Fri, Oct 14, 2016 at 05:34:31PM +0800, Yuanhan Liu wrote:
> This is a new set of patches to optimize the mergeable Rx code path.
> No refactoring (rewrite) was made this time. It just applies some
> findings from Zhihong (kudos to him!) that could improve the mergeable
> Rx path on the old code.
>
> The two major factors that could improve the performance greatly are:
>
> - copy the virtio header together with the packet data. This could remove
>   the bubbles between the two copies to optimize the cache access.
>
>   This is implemented in patch 2 "vhost: optimize cache access"
>
> - shadow the used ring updates and flush them at once
>
>   The basic idea is to update the used ring in a local buffer and flush
>   the entries to the virtio used ring at once in the end. Again, this is
>   for optimizing the cache access.
>
>   This is implemented in patch 5 "vhost: shadow used ring update"
>
> The two optimizations could yield 40+% performance in micro testing
> and 20+% in PVP case testing with 64B packet size.
>
> Besides that, there are some tiny optimizations, such as prefetching the
> avail ring (patch 6) and retrieving the avail head once (patch 7).
>
> Note: the shadow used ring technique could also be applied to the non-mrg
> Rx path (and even the dequeue path). I didn't do that for two reasons:
>
> - we already update the used ring in batch in both paths; it's just not
>   shadowed first.
>
> - it's a bit too late to make many changes at this stage: RC1 is out.
>
> Please help testing.
>
> Thanks.
>
> 	--yliu
>
> Cc: Jianbo Liu
> ---
> Yuanhan Liu (4):
>   vhost: simplify mergeable Rx vring reservation
>   vhost: use last avail idx for avail ring reservation
>   vhost: prefetch avail ring
>   vhost: retrieve avail head once
>
> Zhihong Wang (3):
>   vhost: remove useless volatile
>   vhost: optimize cache access
>   vhost: shadow used ring update
>
>  lib/librte_vhost/vhost.c      |  13 ++-
>  lib/librte_vhost/vhost.h      |   5 +-
>  lib/librte_vhost/vhost_user.c |  23 +++--
>  lib/librte_vhost/virtio_net.c | 193 +-
>  4 files changed, 149 insertions(+), 85 deletions(-)
>
> --
> 1.9.0
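The "shadow used ring" idea from the cover letter above — stage used-ring updates in a local buffer, then flush them to the shared ring in one pass — can be reduced to a few lines. This is a simplified sketch for illustration only: the real vring layout, index wrap handling and memory barriers are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal sketch of the shadow used ring technique: used-ring entries
 * are staged in a private buffer and copied to the shared ring once per
 * burst, so the shared cache lines (and the published index) are
 * touched once per burst instead of once per packet. */
#define RING_SIZE 256

struct used_elem { uint32_t id; uint32_t len; };

struct vq {
	struct used_elem used[RING_SIZE];	/* shared with the guest */
	volatile uint16_t used_idx;		/* published index */
	struct used_elem shadow[RING_SIZE];	/* private staging buffer */
	uint16_t shadow_idx;
};

void shadow_used_add(struct vq *vq, uint32_t id, uint32_t len)
{
	/* cheap: writes touch only private memory */
	vq->shadow[vq->shadow_idx].id = id;
	vq->shadow[vq->shadow_idx].len = len;
	vq->shadow_idx++;
}

void flush_shadow_used(struct vq *vq)
{
	uint16_t i, idx = vq->used_idx;

	for (i = 0; i < vq->shadow_idx; i++)
		vq->used[(idx + i) & (RING_SIZE - 1)] = vq->shadow[i];
	/* a write barrier would go here before publishing the new index */
	vq->used_idx = (uint16_t)(idx + vq->shadow_idx);
	vq->shadow_idx = 0;
}
```

Until `flush_shadow_used()` runs, the guest sees no change; the whole burst becomes visible at once when the index is published, which is what patch 5 in the series implements for the real vring.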
[dpdk-dev] [PATCH] drivers: make driver names consistent
On Tue, Oct 18, 2016 at 06:46:40PM +0200, Thomas Monjalon wrote:
> 2016-10-18 22:18, Yuanhan Liu:
> > On Tue, Oct 18, 2016 at 03:42:54PM +0200, Thomas Monjalon wrote:
> > > 2016-10-18 21:06, Yuanhan Liu:
> > > > On Tue, Oct 18, 2016 at 02:50:16PM +0200, Jan Blunck wrote:
> > > > > From my understanding this is a massive API breakage. This forces all
> > > > > existing users of the virtual PMDs to change with zero benefit. Even
> > > > > if that isn't enough it also makes it impossible to switch between
> > > > > releases by recompiling.
> > > > >
> > > > > Can we please revert these changes and work on some aliasing support
> > > > > for the PMDs to fix it long term?
> > > >
> > > > +1. Aliasing is also something I would suggest before making such
> > > > renames.
> > >
> > > It is a brutal change, yes.
> > > It was announced in 16.07 release notes though.
> >
> > Yes, but it still took me a while (by running git bisect) to figure out
> > what went wrong: I wasn't aware of such a note, so I was thinking maybe
> > something was broken.
> >
> > Later I also got quite a few similar complaints. It may also have taken
> > them a while to find out what happened.
> >
> > Anyway, my point is, for this kind of change, we should have added the
> > alias support first.
>
> Yes.
>
> > If that had been done, then the announcement would not be needed at all?
>
> The announcement would be needed to remove the aliases, later.

Why do we have to remove an alias? What's wrong if they stay there
forever? I think the naming will be switched (maybe slowly) once we have
updated the doc (so that it states "net_vhost" only, not "eth_vhost").
After a few releases, the names should have switched over smoothly, and
we should not need to care about those aliases any more.

	--yliu
[dpdk-dev] [PATCH] vhost: fix use after free issue
Fix the coverity USE_AFTER_FREE issue.

Fixes: a277c7159876 ("vhost: refactor code structure")
Coverity issue: 137884

Reported-by: John McNamara
Signed-off-by: Yuanhan Liu
---
 lib/librte_vhost/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 967cb65..aaa9c27 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -250,8 +250,8 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 		vsocket->connfd = -1;
 		close(connfd);
 		*remove = 1;
-		free(conn);
 		vhost_destroy_device(conn->vid);
+		free(conn);
 
 		if (vsocket->reconnect)
 			vhost_user_create_client(vsocket);

--
1.9.0
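The bug fixed above is the classic ordering mistake: `conn` was freed one line before `conn->vid` was read. A minimal sketch of the corrected teardown follows; the names mirror the patch, but the destroy function is a hypothetical stand-in for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the corrected teardown ordering: the last use of conn
 * (reading conn->vid) must happen before free(conn). Reading it after
 * the free, as the pre-patch code did, is undefined behaviour. */
struct conn { int vid; };

static int destroyed_vid = -1;

/* stand-in for vhost_destroy_device() */
static void vhost_destroy_device_stub(int vid)
{
	destroyed_vid = vid;
}

void teardown(struct conn *conn)
{
	vhost_destroy_device_stub(conn->vid);	/* use first ... */
	free(conn);				/* ... then free */
}
```

Tools like Coverity (which reported this instance) and AddressSanitizer catch the reversed order, but keeping "last use, then free" adjacent in the source is the simplest defence.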
[dpdk-dev] [PATCH v3] vhost: Only access header if offloading is supported in dequeue path
On Fri, Oct 14, 2016 at 10:07:07AM +0200, Maxime Coquelin wrote:
> If offloading features are not negotiated, parsing the virtio header
> is not needed.
>
> Micro-benchmark with testpmd shows that the gain is +4% with indirect
> descriptors, +1% when using direct descriptors.
>
> Signed-off-by: Maxime Coquelin

Applied to dpdk-next-virtio. Thanks.

	--yliu
[dpdk-dev] [PATCH] drivers: make driver names consistent
On Tue, Oct 18, 2016 at 03:42:54PM +0200, Thomas Monjalon wrote:
> 2016-10-18 21:06, Yuanhan Liu:
> > On Tue, Oct 18, 2016 at 02:50:16PM +0200, Jan Blunck wrote:
> > > From my understanding this is a massive API breakage. This forces all
> > > existing users of the virtual PMDs to change with zero benefit. Even
> > > if that isn't enough it also makes it impossible to switch between
> > > releases by recompiling.
> > >
> > > Can we please revert these changes and work on some aliasing support
> > > for the PMDs to fix it long term?
> >
> > +1. Aliasing is also something I would suggest before making such renames.
>
> It is a brutal change, yes.
> It was announced in 16.07 release notes though.

Yes, but it still took me a while (by running git bisect) to figure out
what went wrong: I wasn't aware of such a note, so I was thinking maybe
something was broken.

Later I also got quite a few similar complaints. It may also have taken
them a while to find out what happened.

Anyway, my point is, for this kind of change, we should have added the
alias support first. If that had been done, then the announcement would
not be needed at all?

	--yliu

>
> We can try to make this change more progressive by keeping old names
> as aliases for some time.
> Is there a volunteer to work on vdev name aliases,
> with the target of integrating them in RC2 or RC3?
[dpdk-dev] [PATCH] drivers: make driver names consistent
On Tue, Oct 18, 2016 at 02:50:16PM +0200, Jan Blunck wrote: > >From my understanding this is a massive API breakage. This forces all > existing users of the virtual PMDs to change with zero benefit. Even > if that isn't enough it also makes it impossible to switch between > releases by recompiling. > > Can we please revert these changes and work on some aliasing support > for the PMDs to fix it long term? +1. Aliasing is also something I would suggest before making such renames. --yliu > > Thanks, > Jan > > > On Fri, Sep 16, 2016 at 11:58 AM, Thomas Monjalon > wrote: > > 2016-08-24 22:37, Mcnamara, John: > >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara > >> > > >> > ... > >> > > >> > -$RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev > >> > 'eth_pcap0,rx_pcap=/path/to/ file_rx.pcap,tx_pcap=/path/to/file_tx.pcap' > >> > -- --port-topology=chained > >> > +$RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev > >> > 'net_pcap0,rx_pcap=/path/to/ file_rx.pcap,tx_pcap=/path/to/file_tx.pcap' > >> > -- --port-topology=chained > >> > >> > >> I know that this is an existing issue but there shouldn't be a space in > >> "/path/to/ file". Perhaps you could fix that (in a number of places) as > >> part > >> of this patch. You could probably leave out the "/path/to/" part > >> altogether as > >> it may be clearer, see below. > >> > >> Also, could you wrap the long code lines in the sections that you change at > >> 80 chars using "\" to keep them on the page in the PDF docs, like: > >> > >> $RTE_TARGET/app/testpmd -c '0xf' -n 4 \ > >> --vdev > >> 'net_pcap0,rx_pcap=/path/to/file_rx.pcap,tx_pcap=/path/to/file_tx.pcap' \ > >> -- --port-topology=chained > >> > >> Or without the path part: > >> > >> $RTE_TARGET/app/testpmd -c '0xf' -n 4 \ > >> --vdev 'net_pcap0,rx_pcap=file_rx.pcap,tx_pcap=file_tx.pcap' \ > >> -- --port-topology=chained > > > > Applied with above comments fixed and release notes updated, thanks.
[dpdk-dev] [PATCH] vhost: disable indirect descriptors feature
On Tue, Oct 18, 2016 at 10:13:54AM +0200, Maxime Coquelin wrote: > >>Thanks to Zhihong series you reworked, the changes to be done for > >>mergeable buffers case is greatly simplified. > >>I'll send the series later today. > > > >Do you mean the v6 from Zhihong? Unluckily, it will not be merged. That > >series has been simplified to not rewrite the enqueue from scratch. See > >V7. > No, I meant the v7. Oh, glad to know that the rework helps here :) --yliu > > > > >For this patch, I think you should also update (or remove?) the related > >section in the release note. > Yes, sure. > > Thanks, > Maxime
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On Mon, Oct 17, 2016 at 01:23:23PM +0200, Maxime Coquelin wrote: > >On my side, I just setup 2 Windows 2016 VMs, and confirm the issue. > >I'll continue the investigation early next week. > > The root cause is identified. > When INDIRECT_DESC feature is negotiated, Windows guest uses indirect > for both Tx and Rx descriptors, whereas Linux guests (Virtio PMD & > virtio-net kernel driver) use indirect only for Tx. > > I'll implement indirect support for the Rx path in vhost lib, but the > change will be too big for -rc release. > I propose in the mean time to disable INDIRECT_DESC feature in vhost > lib, we can still enable it locally for testing. > > Yuanhan, is it ok for you? That's okay. > > >Has anyone already tested Windows guest with vhost-net, which also has > >indirect descs support? > > I tested and confirm it works with vhost-net. I'm a bit confused then. IIRC, vhost-net also doesn't support indirect for Rx path, right? --yliu
[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
On Thu, Oct 13, 2016 at 11:23:44AM +0200, Maxime Coquelin wrote:
> I was going to re-run some PVP benchmarks with 0% pkt loss, as I had
> some strange results last week.
>
> Problem is that your series no longer applies cleanly due to
> next-virtio's master branch history rewrite.
> Any chance you send me a rebased version so that I can apply the series?

I think it's pointless to do that now: it won't be merged after all. We
have refactored out the new series; please help review it if you have
time :)

BTW, apologies that I forgot to include your Reviewed-by for the first
patch. I intended to do that ...

	--yliu
[dpdk-dev] [PATCH v7 7/7] vhost: retrieve avail head once
There is no need to retrieve the latest avail head every time we enqueue
a packet in the mergeable Rx path by

    avail_idx = *((volatile uint16_t *)&vq->avail->idx);

Instead, we could just retrieve it once at the beginning of the enqueue
path. This could diminish the cache penalty slightly, because the virtio
driver could be updating it while vhost is reading it (for each packet).

Signed-off-by: Yuanhan Liu
Signed-off-by: Zhihong Wang
---
 lib/librte_vhost/virtio_net.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 12a037b..b784dba 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -387,10 +387,10 @@ fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx,
  */
 static inline int
 reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size,
-			    struct buf_vector *buf_vec, uint16_t *num_buffers)
+			    struct buf_vector *buf_vec, uint16_t *num_buffers,
+			    uint16_t avail_head)
 {
 	uint16_t cur_idx;
-	uint16_t avail_idx;
 	uint32_t vec_idx = 0;
 	uint16_t tries = 0;
@@ -401,8 +401,7 @@ reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size,
 	cur_idx = vq->last_avail_idx;
 
 	while (size > 0) {
-		avail_idx = *((volatile uint16_t *)&vq->avail->idx);
-		if (unlikely(cur_idx == avail_idx))
+		if (unlikely(cur_idx == avail_head))
 			return -1;
 
 		if (unlikely(fill_vec_buf(vq, cur_idx, &vec_idx, buf_vec,
@@ -523,6 +522,7 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	uint32_t pkt_idx = 0;
 	uint16_t num_buffers;
 	struct buf_vector buf_vec[BUF_VECTOR_MAX];
+	uint16_t avail_head;
 
 	LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
 	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
@@ -542,11 +542,12 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
 
 	vq->shadow_used_idx = 0;
+	avail_head = *((volatile uint16_t *)&vq->avail->idx);
 	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
 		uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
 
 		if (unlikely(reserve_avail_buf_mergeable(vq, pkt_len, buf_vec,
-						&num_buffers) < 0)) {
+						&num_buffers, avail_head) < 0)) {
 			LOG_DEBUG(VHOST_DATA,
 				"(%d) failed to get enough desc from vring\n",
 				dev->vid);

--
1.9.0
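The essence of the patch above is hoisting a volatile read out of the per-packet loop: the avail index is updated concurrently by the virtio driver, so every read may cost a cache miss. A reduced sketch (the ring is collapsed to just the two indexes involved; descriptor handling is omitted):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of "retrieve avail head once": read the driver-updated avail
 * index a single time per burst instead of once per packet, bounding
 * the shared-cache-line traffic to one read per burst. */
struct ring {
	volatile uint16_t avail_idx;	/* written by the virtio driver */
	uint16_t last_avail_idx;	/* consumer's private cursor */
};

unsigned int process_burst(struct ring *r, unsigned int count)
{
	uint16_t avail_head = r->avail_idx;	/* one volatile read per burst */
	unsigned int done = 0;

	while (done < count && r->last_avail_idx != avail_head) {
		/* a real implementation would consume one descriptor here */
		r->last_avail_idx++;
		done++;
	}
	return done;
}
```

Packets made available by the driver after the snapshot are simply picked up on the next burst, so nothing is lost; only the per-packet re-read (and its potential cache miss) is removed.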
[dpdk-dev] [PATCH v7 6/7] vhost: prefetch avail ring
Signed-off-by: Yuanhan Liu
Signed-off-by: Zhihong Wang
---
 lib/librte_vhost/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 2bdc2fe..12a037b 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -539,6 +539,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	if (count == 0)
 		return 0;
 
+	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
 	vq->shadow_used_idx = 0;
 	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
 		uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;

--
1.9.0
[dpdk-dev] [PATCH v7 5/7] vhost: shadow used ring update
From: Zhihong Wang <zhihong.w...@intel.com>

The basic idea is to shadow the used ring update: update them into a
local buffer first, and then flush them all to the virtio used vring
at once in the end.

And since we do avail ring reservation before enqueuing data, we would
know which and how many descs will be used. Which means we could update
the shadow used ring at the reservation time. It also introduces another
slight advantage: we don't need to access the desc->flag any more inside
copy_mbuf_to_desc_mergeable().

Signed-off-by: Zhihong Wang
Signed-off-by: Yuanhan Liu
---
 lib/librte_vhost/vhost.c      |  13 +++-
 lib/librte_vhost/vhost.h      |   3 +
 lib/librte_vhost/vhost_user.c |  23 +--
 lib/librte_vhost/virtio_net.c | 138 +-
 4 files changed, 113 insertions(+), 64 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 469117a..d8116ff 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -121,9 +121,18 @@ static void
 free_device(struct virtio_net *dev)
 {
 	uint32_t i;
+	struct vhost_virtqueue *rxq, *txq;
 
-	for (i = 0; i < dev->virt_qp_nb; i++)
-		rte_free(dev->virtqueue[i * VIRTIO_QNUM]);
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		rxq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ];
+		txq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ];
+
+		rte_free(rxq->shadow_used_ring);
+		rte_free(txq->shadow_used_ring);
+
+		/* rxq and txq are allocated together as queue-pair */
+		rte_free(rxq);
+	}
 
 	rte_free(dev);
 }
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 17c557f..acec772 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -105,6 +105,9 @@ struct vhost_virtqueue {
 	uint16_t		last_zmbuf_idx;
 	struct zcopy_mbuf	*zmbufs;
 	struct zcopy_mbuf_list	zmbuf_list;
+
+	struct vring_used_elem	*shadow_used_ring;
+	uint16_t		shadow_used_idx;
 } __rte_cache_aligned;
 
 /* Old kernels have no such macro defined */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 3074227..6b83c15 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -198,6 +198,15 @@ vhost_user_set_vring_num(struct virtio_net *dev,
 		}
 	}
 
+	vq->shadow_used_ring = rte_malloc(NULL,
+				vq->size * sizeof(struct vring_used_elem),
+				RTE_CACHE_LINE_SIZE);
+	if (!vq->shadow_used_ring) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"failed to allocate memory for shadow used ring.\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -711,6 +720,8 @@ static int
 vhost_user_get_vring_base(struct virtio_net *dev,
 			  struct vhost_vring_state *state)
 {
+	struct vhost_virtqueue *vq = dev->virtqueue[state->index];
+
 	/* We have to stop the queue (virtio) if it is running. */
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
 		dev->flags &= ~VIRTIO_DEV_RUNNING;
@@ -718,7 +729,7 @@ vhost_user_get_vring_base(struct virtio_net *dev,
 	}
 
 	/* Here we are safe to get the last used index */
-	state->num = dev->virtqueue[state->index]->last_used_idx;
+	state->num = vq->last_used_idx;
 
 	RTE_LOG(INFO, VHOST_CONFIG,
 		"vring base idx:%d file:%d\n", state->index, state->num);
@@ -727,13 +738,15 @@ vhost_user_get_vring_base(struct virtio_net *dev,
 	 * sent and only sent in vhost_vring_stop.
 	 * TODO: cleanup the vring, it isn't usable since here.
 	 */
-	if (dev->virtqueue[state->index]->kickfd >= 0)
-		close(dev->virtqueue[state->index]->kickfd);
+	if (vq->kickfd >= 0)
+		close(vq->kickfd);
 
-	dev->virtqueue[state->index]->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
 
 	if (dev->dequeue_zero_copy)
-		free_zmbufs(dev->virtqueue[state->index]);
+		free_zmbufs(vq);
+	rte_free(vq->shadow_used_ring);
+	vq->shadow_used_ring = NULL;
 
 	return 0;
 }
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index b5ba633..2bdc2fe 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -91,6 +91,56 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb)
 	return (is_tx ^ (idx & 1)) == 0 && idx < qp_nb * VIRTIO_QNUM;
 }
 
+static inline void __attribute__((always_inline))
+do_flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
+
[dpdk-dev] [PATCH v7 3/7] vhost: simplify mergeable Rx vring reservation
Let it return "num_buffers" we reserved, so that we could re-use it with copy_mbuf_to_desc_mergeable() directly, instead of calculating it again there. Meanwhile, the return type of copy_mbuf_to_desc_mergeable is changed to "int". -1 will be returned on error. Signed-off-by: Yuanhan Liu Signed-off-by: Zhihong Wang --- lib/librte_vhost/virtio_net.c | 41 + 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index d4fc62a..1a40c91 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -336,7 +336,7 @@ fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx, */ static inline int reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size, - uint16_t *end, struct buf_vector *buf_vec) + struct buf_vector *buf_vec, uint16_t *num_buffers) { uint16_t cur_idx; uint16_t avail_idx; @@ -370,19 +370,18 @@ reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size, return -1; } - *end = cur_idx; + *num_buffers = cur_idx - vq->last_used_idx; return 0; } -static inline uint32_t __attribute__((always_inline)) +static inline int __attribute__((always_inline)) copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, - uint16_t end_idx, struct rte_mbuf *m, - struct buf_vector *buf_vec) + struct rte_mbuf *m, struct buf_vector *buf_vec, + uint16_t num_buffers) { struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; uint32_t vec_idx = 0; - uint16_t start_idx = vq->last_used_idx; - uint16_t cur_idx = start_idx; + uint16_t cur_idx = vq->last_used_idx; uint64_t desc_addr; uint32_t desc_chain_head; uint32_t desc_chain_len; @@ -394,21 +393,21 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, struct rte_mbuf *hdr_mbuf; if (unlikely(m == NULL)) - return 0; + return -1; LOG_DEBUG(VHOST_DATA, "(%d) current index %d | end index %d\n", dev->vid, cur_idx, end_idx); desc_addr = gpa_to_vva(dev,
buf_vec[vec_idx].buf_addr); if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr) - return 0; + return -1; hdr_mbuf = m; hdr_addr = desc_addr; hdr_phys_addr = buf_vec[vec_idx].buf_addr; rte_prefetch0((void *)(uintptr_t)hdr_addr); - virtio_hdr.num_buffers = end_idx - start_idx; + virtio_hdr.num_buffers = num_buffers; LOG_DEBUG(VHOST_DATA, "(%d) RX: num merge buffers %d\n", dev->vid, virtio_hdr.num_buffers); @@ -440,7 +439,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, desc_addr = gpa_to_vva(dev, buf_vec[vec_idx].buf_addr); if (unlikely(!desc_addr)) - return 0; + return -1; /* Prefetch buffer address. */ rte_prefetch0((void *)(uintptr_t)desc_addr); @@ -489,7 +488,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, offsetof(struct vring_used, ring[used_idx]), sizeof(vq->used->ring[used_idx])); - return end_idx - start_idx; + return 0; } static inline uint32_t __attribute__((always_inline)) @@ -497,8 +496,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf **pkts, uint32_t count) { struct vhost_virtqueue *vq; - uint32_t pkt_idx = 0, nr_used = 0; - uint16_t end; + uint32_t pkt_idx = 0; + uint16_t num_buffers; struct buf_vector buf_vec[BUF_VECTOR_MAX]; LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__); @@ -519,22 +518,24 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, for (pkt_idx = 0; pkt_idx < count; pkt_idx++) { uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen; - if (unlikely(reserve_avail_buf_mergeable(vq, pkt_len, - &end, buf_vec) < 0)) { + if (unlikely(reserve_avail_buf_mergeable(vq, pkt_len, buf_vec, + &num_buffers) < 0)) { LOG_DEBUG(VHOST_DATA, "(%d) failed to get enough desc from vring\n", dev->vid); break; } - nr_used = copy_mbuf_
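The `*num_buffers = cur_idx - vq->last_used_idx` computation above relies on unsigned 16-bit wrap-around: both indices are free-running counters, so the subtraction stays correct even after one of them wraps past zero. A minimal sketch of that arithmetic (hypothetical helper name, not the DPDK code):

```c
#include <stdint.h>

/* Distance between two free-running 16-bit ring indices. The cast is
 * what makes wrap-around work: the subtraction is done modulo 2^16. */
static uint16_t
nr_reserved(uint16_t last_used_idx, uint16_t cur_idx)
{
    return (uint16_t)(cur_idx - last_used_idx);
}
```

This is why the indices never need to be masked against the ring size before subtracting; only accesses into the ring are masked.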
[dpdk-dev] [PATCH v7 2/7] vhost: optimize cache access
From: Zhihong Wang <zhihong.w...@intel.com> This patch reorders the code to delay the virtio header write to improve cache access efficiency for cases where the mrg_rxbuf feature is turned on. CPU pipeline stall cycles can be significantly reduced. Virtio header write and mbuf data copy are both remote store operations which take a long time to finish. It's a good idea to put them together to remove bubbles in between, to let as many remote store instructions as possible go into the store buffer at the same time to hide latency, and to let the H/W prefetcher go to work as early as possible. On a Haswell machine, about 100 cycles can be saved per packet by this patch alone. Taking 64B packet traffic as an example, this means about 60% efficiency improvement for the enqueue operation. Signed-off-by: Zhihong Wang Signed-off-by: Yuanhan Liu --- lib/librte_vhost/virtio_net.c | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 812e5d3..d4fc62a 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -390,6 +390,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, uint32_t desc_offset, desc_avail; uint32_t cpy_len; uint16_t desc_idx, used_idx; + uint64_t hdr_addr, hdr_phys_addr; + struct rte_mbuf *hdr_mbuf; if (unlikely(m == NULL)) return 0; @@ -401,17 +403,15 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr) return 0; - rte_prefetch0((void *)(uintptr_t)desc_addr); + hdr_mbuf = m; + hdr_addr = desc_addr; + hdr_phys_addr = buf_vec[vec_idx].buf_addr; + rte_prefetch0((void *)(uintptr_t)hdr_addr); virtio_hdr.num_buffers = end_idx - start_idx; LOG_DEBUG(VHOST_DATA, "(%d) RX: num merge buffers %d\n", dev->vid, virtio_hdr.num_buffers); - virtio_enqueue_offload(m, &virtio_hdr.hdr); - copy_virtio_net_hdr(dev, desc_addr, virtio_hdr); - vhost_log_write(dev,
buf_vec[vec_idx].buf_addr, dev->vhost_hlen); - PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0); - desc_avail = buf_vec[vec_idx].buf_len - dev->vhost_hlen; desc_offset = dev->vhost_hlen; desc_chain_head = buf_vec[vec_idx].desc_idx; @@ -456,6 +456,16 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, mbuf_avail = rte_pktmbuf_data_len(m); } + if (hdr_addr) { + virtio_enqueue_offload(hdr_mbuf, &virtio_hdr.hdr); + copy_virtio_net_hdr(dev, hdr_addr, virtio_hdr); + vhost_log_write(dev, hdr_phys_addr, dev->vhost_hlen); + PRINT_PACKET(dev, (uintptr_t)hdr_addr, +dev->vhost_hlen, 0); + + hdr_addr = 0; + } + cpy_len = RTE_MIN(desc_avail, mbuf_avail); rte_memcpy((void *)((uintptr_t)(desc_addr + desc_offset)), rte_pktmbuf_mtod_offset(m, void *, mbuf_offset), -- 1.9.0
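The reordering this patch performs can be sketched in isolation. The names below are illustrative, not the vhost API; the point is only the store ordering: remember where the header goes, do the bookkeeping, then issue the header store right next to the first data copy so both remote stores sit back to back in the store buffer:

```c
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a guest descriptor buffer (hypothetical). */
struct fake_desc { uint8_t buf[64]; };

static void
enqueue_delayed_hdr(struct fake_desc *d, uint16_t num_buffers,
                    const uint8_t *pkt, size_t len)
{
    /* Record the header location instead of writing it immediately. */
    uint8_t *hdr_addr = d->buf;

    /* ...descriptor-chain setup work would happen here, with no
     * remote store issued yet... */

    /* Issue the header store and the data copy together. */
    memcpy(hdr_addr, &num_buffers, sizeof(num_buffers));
    memcpy(d->buf + sizeof(num_buffers), pkt, len);
}
```

In the real patch the deferred state is `hdr_mbuf`/`hdr_addr`/`hdr_phys_addr`, and the header is flushed inside the copy loop on the first iteration (`if (hdr_addr) { ... hdr_addr = 0; }`).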
[dpdk-dev] [PATCH v7 1/7] vhost: remove useless volatile
From: Zhihong Wang

last_used_idx is a local var; there is no need to decorate it with "volatile". Signed-off-by: Zhihong Wang --- lib/librte_vhost/vhost.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 53dbf33..17c557f 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -85,7 +85,7 @@ struct vhost_virtqueue { uint32_t size; uint16_t last_avail_idx; - volatile uint16_t last_used_idx; + uint16_t last_used_idx; #define VIRTIO_INVALID_EVENTFD (-1) #define VIRTIO_UNINITIALIZED_EVENTFD (-2) -- 1.9.0
[dpdk-dev] [PATCH v7 0/7] vhost: optimize mergeable Rx path
This is a new set of patches to optimize the mergeable Rx code path. No refactoring (rewrite) was made this time. It just applies some findings from Zhihong (kudos to him!) that could improve the mergeable Rx path on the old code. The two major factors that could improve the performance greatly are: - copy the virtio header together with the packet data. This removes the bubbles between the two copies and optimizes the cache access. This is implemented in patch 2 "vhost: optimize cache access" - shadow the used ring updates and flush them at once. The basic idea is to update the used ring in a local buffer and flush it to the virtio used ring at once in the end. Again, this is for optimizing the cache access. This is implemented in patch 5 "vhost: shadow used ring update" The two optimizations could yield 40+% performance boost in micro testing and 20+% in PVP case testing with 64B packet size. Besides that, there are some tiny optimizations, such as prefetching the avail ring (patch 6) and retrieving the avail head once (patch 7). Note: the shadow used ring technique could also be applied to the non-mrg Rx path (and even the dequeue path). I didn't do that for two reasons: - we already update the used ring in batch in both paths: it's just not shadowed first. - it's a bit too late to make many changes at this stage: RC1 is out. Please help testing. Thanks. --yliu Cc: Jianbo Liu --- Yuanhan Liu (4): vhost: simplify mergeable Rx vring reservation vhost: use last avail idx for avail ring reservation vhost: prefetch avail ring vhost: retrieve avail head once Zhihong Wang (3): vhost: remove useless volatile vhost: optimize cache access vhost: shadow used ring update lib/librte_vhost/vhost.c | 13 ++- lib/librte_vhost/vhost.h | 5 +- lib/librte_vhost/vhost_user.c | 23 +++-- lib/librte_vhost/virtio_net.c | 193 +- 4 files changed, 149 insertions(+), 85 deletions(-) -- 1.9.0
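The shadow used ring idea from patch 5 can be sketched roughly as follows. The structures and names here are illustrative, not the actual vhost implementation: updates are recorded in a private array and copied to the shared ring in a single pass, so the cache lines of the shared ring are touched once instead of per packet.

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8

/* A used-ring element as in the virtio spec (illustrative layout). */
struct used_elem { uint32_t id; uint32_t len; };

struct fake_vq {
    struct used_elem used[RING_SIZE];    /* shared with the guest */
    struct used_elem shadow[RING_SIZE];  /* local shadow copy */
    uint16_t shadow_count;
    uint16_t used_idx;
};

/* Record an update locally; the shared ring is not touched here. */
static void
shadow_used_update(struct fake_vq *vq, uint32_t id, uint32_t len)
{
    vq->shadow[vq->shadow_count].id  = id;
    vq->shadow[vq->shadow_count].len = len;
    vq->shadow_count++;
}

/* Flush all pending updates to the shared ring in one pass. */
static void
flush_shadow_used_ring(struct fake_vq *vq)
{
    for (uint16_t i = 0; i < vq->shadow_count; i++)
        vq->used[(vq->used_idx + i) % RING_SIZE] = vq->shadow[i];
    vq->used_idx = (vq->used_idx + vq->shadow_count) % RING_SIZE;
    vq->shadow_count = 0;
}
```

The real code additionally handles dirty-page logging for live migration and the memory barrier before publishing the new used index; those are omitted here.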
[dpdk-dev] 16.07.1 stable patches review and test
Hi, I have applied most of the bug-fixing patches (listed below) to the 16.07 stable branch at http://dpdk.org/browse/dpdk-stable/ Please help reviewing and testing. The planned date for the final release is 26th, Oct. Before that, please shout if anyone has objections to these patches being applied. Thanks. --yliu --- Alejandro Lucero (1): net/nfp: fix copying MAC address Aleksey Katargin (1): table: fix symbol exports Alex Zelezniak (1): net/ixgbe: fix VF reset to apply to correct VF Ali Volkan Atli (1): net/e1000: fix returned number of available Rx descriptors Arek Kusztal (1): app/test: fix verification of digest for GCM Beilei Xing (2): net/i40e: fix dropping packets with ethertype 0x88A8 net/i40e: fix parsing QinQ packets type Bruce Richardson (1): net/mlx: fix debug build with gcc 6.1 Christian Ehrhardt (1): examples/ip_pipeline: fix Python interpreter Deepak Kumar Jain (2): crypto/null: fix key size increment value crypto/qat: fix FreeBSD build Dror Birkman (1): net/pcap: fix memory leak in jumbo frames Ferruh Yigit (2): app/testpmd: fix help of MTU set command pmdinfogen: fix clang build Gary Mussar (1): tools: fix virtio interface name when binding Gowrishankar Muthukrishnan (1): examples/ip_pipeline: fix lcore mapping for ppc64 Hiroyuki Mikita (1): sched: fix releasing enqueued packets James Poole (1): app/testpmd: fix timeout in Rx queue flushing Jianfeng Tan (3): net/virtio_user: fix first queue pair without multiqueue net/virtio_user: fix wrong sequence of messages net/virtio_user: fix error management during init Jim Harris (1): contigmem: zero all pages during mmap John Daley (1): net/enic: fix bad L4 checksum flag on ICMP packets Karmarkar Suyash (1): timer: fix lag delay Maciej Czekaj (1): mem: fix crash on hugepage mapping error Nelson Escobar (1): net/enic: fix freeing memory for descriptor ring Olivier Matz (4): app/testpmd: fix crash when mempool allocation fails tools: fix json output of pmdinfo mbuf: fix error handling on pool creation mem:
fix build with -O1 Pablo de Lara (3): hash: fix ring size hash: fix false zero signature key hit lookup crypto: fix build with icc Qi Zhang (1): net/i40e/base: fix UDP packet header Rich Lane (1): net/i40e: fix null pointer dereferences when using VMDq+RSS Weiliang Luo (1): mempool: fix corruption due to invalid handler Xiao Wang (5): net/fm10k: fix MAC address removal from switch net/ixgbe/base: fix pointer check net/ixgbe/base: fix check for NACK net/ixgbe/base: fix possible corruption of shadow RAM net/ixgbe/base: fix skipping PHY config Yangchao Zhou (1): pci: fix memory leak when detaching device Yury Kylulin (2): net/ixgbe: fix mbuf leak during Rx queue release net/i40e: fix mbuf leak during Rx queue release Zhiyong Yang (1): net/virtio: fix xstats name
[dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support
On Thu, Oct 13, 2016 at 04:16:11PM +0200, Olivier Matz wrote: > +/* When doing TSO, the IP length is not included in the pseudo header > + * checksum of the packet given to the PMD, but for virtio it is > + * expected. > + */ > +static void > +virtio_tso_fix_cksum(struct rte_mbuf *m) > +{ > + /* common case: header is not fragmented */ > + if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len + > + m->l4_len)) { > + struct ipv4_hdr *iph; > + struct ipv6_hdr *ip6h; > + struct tcp_hdr *th; > + uint16_t prev_cksum, new_cksum, ip_len, ip_paylen; > + uint32_t tmp; ... > + } else { As discussed just now, if you drop the else part, you could add my ACK for the whole virtio changes, and Reviewed-by for all mbuf and other changes. Thomas, please pick them up yourself directly, since they depend on other patches that will be picked (or have already been picked?) by you. Thanks. --yliu
[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
On Thu, Oct 13, 2016 at 05:45:21PM +0200, Olivier Matz wrote:
> >> If you have a packet split like this:
> >>
> >>        mbuf segment 1              mbuf segment 2
> >>   --------------------------   ---------------------------
> >>   | Ethernet header | IP hea|   |der | TCP header | data
> >>   --------------------------   ---------------------------
> >>                            ^
> >>                           iph
> >
> >Thanks, that's clear. How could you be able to access the tcp header
> >from the first mbuf then? I mean, how is the following code supposed
> >to work?
> >
> >    prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >            m->l2_len + m->l3_len + 16);
>
> Oh I see... Sorry there was a confusion on my side with another (internal)
> macro that browses the segments if the offset is not in the first one.
>
> If you agree, let's add the code without the else part, I'll fix it for the
> rc2.

Good. That's okay to me.

--yliu
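For reference, the byte-by-byte access style debated in this thread, reading a field that may straddle a segment boundary, can be sketched with a hypothetical segment chain. This is not the rte_mbuf API, just the underlying idea: walk segments per byte instead of casting a struct pointer into the first segment.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical segment chain; stand-in for chained mbufs. */
struct seg {
    const uint8_t *data;
    size_t len;
    const struct seg *next;
};

/* Read one byte at a logical offset, walking segments as needed. */
static uint8_t
read_u8(const struct seg *s, size_t off)
{
    while (off >= s->len) {
        off -= s->len;
        s = s->next;
    }
    return s->data[off];
}

/* A big-endian 16-bit field (e.g. a TCP checksum) fetched byte-wise,
 * so it is safe even when the field spans two segments. */
static uint16_t
read_be16(const struct seg *s, size_t off)
{
    return (uint16_t)((read_u8(s, off) << 8) | read_u8(s, off + 1));
}
```

A pointer cast into segment 1 would read past its buffer whenever the field crosses the split; the byte-wise form never does.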
[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
On Thu, Oct 13, 2016 at 05:15:24PM +0200, Olivier MATZ wrote:
> On 10/13/2016 05:01 PM, Yuanhan Liu wrote:
> >On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote:
> >>On 10/13/2016 04:16 PM, Yuanhan Liu wrote:
> >>>On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
> >>>>On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
> >>>>>On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
> >>>>>>+/* When doing TSO, the IP length is not included in the pseudo header
> >>>>>>+ * checksum of the packet given to the PMD, but for virtio it is
> >>>>>>+ * expected.
> >>>>>>+ */
> >>>>>>+static void
> >>>>>>+virtio_tso_fix_cksum(struct rte_mbuf *m)
> >>>>>>+{
> >>>>>>+ /* common case: header is not fragmented */
> >>>>>>+ if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> >>>>>>+ m->l4_len)) {
> >>>>>...
> >>>>>>+ /* replace it in the packet */
> >>>>>>+ th->cksum = new_cksum;
> >>>>>>+ } else {
> >>>>>...
> >>>>>>+ /* replace it in the packet */
> >>>>>>+ *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>>>>>+ m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
> >>>>>>+ *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>>>>>+ m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
> >>>>>>+ }
> >>>>>
> >>>>>The tcp header will always be in the mbuf, right? Otherwise, you can't
> >>>>>update the cksum field here. What's the point of introducing the "else
> >>>>>clause" then?
> >>>>
> >>>>Sorry, I don't see the problem you're pointing out here.
> >>>>
> >>>>What I want to solve here is to support the cases where the mbuf is
> >>>>segmented in the middle of the network header (which is probably a rare
> >>>>case).
> >>>
> >>>How it's gonna segmented?
> >>
> >>The mbuf is given by the application. So if the application generates a
> >>segmented mbuf, it should work.
> >>
> >>This could happen for instance if the application uses mbuf clones to share
> >>the IP/TCP/data part of the mbuf and prepend a specific Ethernet/vlan for
> >>different destination.
> >>
> >>>>In the "else" part, I only access the mbuf byte by byte using the
> >>>>rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy
> >>>>the header in a linear buffer, fix the checksum, then copy it again in the
> >>>>packet, but there is no mbuf helpers to do these copies for now.
> >>>
> >>>In the "else" clause, the ip header is still in the mbuf, right?
> >>>Why do you have to access it the way like:
> >>>
> >>>    ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>>            m->l2_len) >> 4;
> >>>
> >>>Why can't you just use
> >>>
> >>>    iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> >>>    iph->version_ihl;
> >>
> >>AFAIK, there is no requirement that each network header has to be contiguous
> >>in a mbuf segment.
> >>
> >>Of course, a split in the middle of a network header probably never
> >>happens... but we never knows, as it is not forbidden. I think the code
> >>should be robust enough to avoid accesses to wrong addresses.
> >>
> >>Hope it's clear enough :)
> >
> >Thanks, but not really. Maybe let me ask this way: what wrong would
> >happen if we use
> >    iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> >to access the IP header? Is it about the endian?
>
> If you have a packet split like this:
>
>        mbuf segment 1              mbuf segment 2
>   --------------------------   ---------------------------
>   | Ethernet header | IP hea|   |der | TCP header | data
>   --------------------------   ---------------------------
>                            ^
>                           iph

Thanks, that's clear. How could you be able to access the tcp header from the first mbuf then? I mean, how is the following code supposed to work?

    prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
            m->l2_len + m->l3_len + 16);

> The IP header is not contiguous. So accessing to the end of the structure
> will access to a wrong location.
>
> >One more question is do you have any case to trigger the "else" clause?
> > No, but I think it may happen. A piece of untested code is not trusted though ... --yliu
[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote: > >In the "else" clause, the ip header is still in the mbuf, right? > >Why do you have to access it the way like: > > > > ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *, > > m->l2_len) >> 4; > > > >Why can't you just use > > > > iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len); > > iph->version_ihl ; > > AFAIK, there is no requirement that each network header has to be contiguous > in a mbuf segment. > > Of course, a split in the middle of a network header probably never > happens... but we never knows, as it is not forbidden. I think the code > should be robust enough to avoid accesses to wrong addresses. One more question is do you have any case to trigger the "else" clause? --yliu
[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote: > > > On 10/13/2016 04:16 PM, Yuanhan Liu wrote: > >On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote: > >> > >> > >>On 10/13/2016 10:18 AM, Yuanhan Liu wrote: > >>>On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote: > >>>>+/* When doing TSO, the IP length is not included in the pseudo header > >>>>+ * checksum of the packet given to the PMD, but for virtio it is > >>>>+ * expected. > >>>>+ */ > >>>>+static void > >>>>+virtio_tso_fix_cksum(struct rte_mbuf *m) > >>>>+{ > >>>>+ /* common case: header is not fragmented */ > >>>>+ if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len + > >>>>+ m->l4_len)) { > >>>... > >>>>+ /* replace it in the packet */ > >>>>+ th->cksum = new_cksum; > >>>>+ } else { > >>>... > >>>>+ /* replace it in the packet */ > >>>>+ *rte_pktmbuf_mtod_offset(m, uint8_t *, > >>>>+ m->l2_len + m->l3_len + 16) = new_cksum.u8[0]; > >>>>+ *rte_pktmbuf_mtod_offset(m, uint8_t *, > >>>>+ m->l2_len + m->l3_len + 17) = new_cksum.u8[1]; > >>>>+ } > >>> > >>>The tcp header will always be in the mbuf, right? Otherwise, you can't > >>>update the cksum field here. What's the point of introducing the "else > >>>clause" then? > >> > >>Sorry, I don't see the problem you're pointing out here. > >> > >>What I want to solve here is to support the cases where the mbuf is > >>segmented in the middle of the network header (which is probably a rare > >>case). > > > >How it's gonna segmented? > > The mbuf is given by the application. So if the application generates a > segmented mbuf, it should work. > > This could happen for instance if the application uses mbuf clones to share > the IP/TCP/data part of the mbuf and prepend a specific Ethernet/vlan for > different destination. > > > >>In the "else" part, I only access the mbuf byte by byte using the > >>rte_pktmbuf_mtod_offset() accessor. 
An alternative would have been to copy > >>the header in a linear buffer, fix the checksum, then copy it again in the > >>packet, but there is no mbuf helpers to do these copies for now. > > > >In the "else" clause, the ip header is still in the mbuf, right? > >Why do you have to access it the way like: > > > > ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *, > > m->l2_len) >> 4; > > > >Why can't you just use > > > > iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len); > > iph->version_ihl ; > > AFAIK, there is no requirement that each network header has to be contiguous > in a mbuf segment. > > Of course, a split in the middle of a network header probably never > happens... but we never knows, as it is not forbidden. I think the code > should be robust enough to avoid accesses to wrong addresses. > > Hope it's clear enough :) Thanks, but not really. Maybe let me ask this way: what wrong would happen if we use iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len); to access the IP header? Is it about the endian? --yliu
[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote: > > > On 10/13/2016 10:18 AM, Yuanhan Liu wrote: > >On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote: > >>+/* When doing TSO, the IP length is not included in the pseudo header > >>+ * checksum of the packet given to the PMD, but for virtio it is > >>+ * expected. > >>+ */ > >>+static void > >>+virtio_tso_fix_cksum(struct rte_mbuf *m) > >>+{ > >>+ /* common case: header is not fragmented */ > >>+ if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len + > >>+ m->l4_len)) { > >... > >>+ /* replace it in the packet */ > >>+ th->cksum = new_cksum; > >>+ } else { > >... > >>+ /* replace it in the packet */ > >>+ *rte_pktmbuf_mtod_offset(m, uint8_t *, > >>+ m->l2_len + m->l3_len + 16) = new_cksum.u8[0]; > >>+ *rte_pktmbuf_mtod_offset(m, uint8_t *, > >>+ m->l2_len + m->l3_len + 17) = new_cksum.u8[1]; > >>+ } > > > >The tcp header will always be in the mbuf, right? Otherwise, you can't > >update the cksum field here. What's the point of introducing the "else > >clause" then? > > Sorry, I don't see the problem you're pointing out here. > > What I want to solve here is to support the cases where the mbuf is > segmented in the middle of the network header (which is probably a rare > case). How it's gonna segmented? > > In the "else" part, I only access the mbuf byte by byte using the > rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy > the header in a linear buffer, fix the checksum, then copy it again in the > packet, but there is no mbuf helpers to do these copies for now. In the "else" clause, the ip header is still in the mbuf, right? Why do you have to access it the way like: ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *, m->l2_len) >> 4; Why can't you just use iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len); iph->version_ihl ; Sorry, I'm just a bit lost. --yliu
[dpdk-dev] [PATCH v2 10/12] virtio: add Tx checksum offload support
On Mon, Oct 03, 2016 at 11:00:21AM +0200, Olivier Matz wrote: > + /* Checksum Offload */ > + switch (cookie->ol_flags & PKT_TX_L4_MASK) { > + case PKT_TX_UDP_CKSUM: > + hdr->csum_start = cookie->l2_len + cookie->l3_len; > + hdr->csum_offset = 6; > + hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; > + break; > + > + case PKT_TX_TCP_CKSUM: > + hdr->csum_start = cookie->l2_len + cookie->l3_len; > + hdr->csum_offset = 16; I would suggest using "offsetof(...)" here, instead of some magic number like 16. --yliu
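For illustration, with a struct mirroring the DPDK `struct tcp_hdr` layout of that era (an assumption on my part, copied from `rte_tcp.h`), `offsetof()` yields the same 16 while staying self-documenting:

```c
#include <stddef.h>
#include <stdint.h>

/* Layout mirroring DPDK's struct tcp_hdr (rte_tcp.h); reproduced here
 * for illustration only. */
struct tcp_hdr {
    uint16_t src_port;
    uint16_t dst_port;
    uint32_t sent_seq;
    uint32_t recv_ack;
    uint8_t  data_off;
    uint8_t  tcp_flags;
    uint16_t rx_win;
    uint16_t cksum;
    uint16_t tcp_urp;
} __attribute__((packed));

/* Replaces the magic number 16 and survives any layout change. */
#define TCP_CKSUM_OFFSET offsetof(struct tcp_hdr, cksum)
```

The same applies to the UDP case: `offsetof(struct udp_hdr, dgram_cksum)` would replace the magic 6.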
[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote: > +/* When doing TSO, the IP length is not included in the pseudo header > + * checksum of the packet given to the PMD, but for virtio it is > + * expected. > + */ > +static void > +virtio_tso_fix_cksum(struct rte_mbuf *m) > +{ > + /* common case: header is not fragmented */ > + if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len + > + m->l4_len)) { ... > + /* replace it in the packet */ > + th->cksum = new_cksum; > + } else { ... > + /* replace it in the packet */ > + *rte_pktmbuf_mtod_offset(m, uint8_t *, > + m->l2_len + m->l3_len + 16) = new_cksum.u8[0]; > + *rte_pktmbuf_mtod_offset(m, uint8_t *, > + m->l2_len + m->l3_len + 17) = new_cksum.u8[1]; > + } The tcp header will always be in the mbuf, right? Otherwise, you can't update the cksum field here. What's the point of introducing the "else clause" then? --yliu
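The adjustment `virtio_tso_fix_cksum()` performs amounts to one's-complement 16-bit addition: folding the IP payload length into the precomputed pseudo-header checksum. A generic sketch of that arithmetic, not the actual DPDK helper:

```c
#include <stdint.h>

/* One's-complement 16-bit addition with end-around carry -- the
 * arithmetic used to fold an extra field (here, the IP payload length)
 * into an existing Internet-checksum partial sum. */
static uint16_t
cksum_add(uint16_t cksum, uint16_t value)
{
    uint32_t tmp = (uint32_t)cksum + value;

    tmp = (tmp & 0xffff) + (tmp >> 16);   /* fold the carry back in */
    return (uint16_t)tmp;
}
```

One fold suffices here: the sum of two 16-bit values is at most 0x1FFFE, so folding once can never produce a new carry.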
[dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device in configure callback
On Wed, Oct 12, 2016 at 06:01:25PM +0200, Olivier MATZ wrote: > Hello Yuanhan, > > On 10/12/2016 04:41 PM, Yuanhan Liu wrote: > >On Mon, Oct 03, 2016 at 11:00:14AM +0200, Olivier Matz wrote: > >>@@ -1344,6 +1347,7 @@ virtio_dev_configure(struct rte_eth_dev *dev) > >> { > >>const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode; > >>struct virtio_hw *hw = dev->data->dev_private; > >>+ uint64_t req_features; > >>int ret; > >> > >>PMD_INIT_LOG(DEBUG, "configure"); > >>@@ -1353,6 +1357,14 @@ virtio_dev_configure(struct rte_eth_dev *dev) > >>return -EINVAL; > >>} > >> > >>+ req_features = VIRTIO_PMD_GUEST_FEATURES; > >>+ /* if request features changed, reinit the device */ > >>+ if (req_features != hw->req_guest_features) { > >>+ ret = virtio_init_device(dev, req_features); > >>+ if (ret < 0) > >>+ return ret; > >>+ } > > > >Why do you have to reset virtio here? This doesn't make too much sense > >to me. > > > >IIUC, you want to make sure those TSO related features being unset at > >init time, and enable it (by doing reset) when it's asked to be enabled > >(by rte_eth_dev_configure)? > > > >Why not always setting those features? We could do the actual offloads > >when: > > > >- those features have been negotiated > > > >- they are enabled through rte_eth_dev_configure > > > >With that, I think we could avoid the reset here? > > It would work for TX, since you decide to use or not the feature. But I think this won't work for RX: if you negotiate LRO at init, the host may send you large packets, even if LRO is disabled in dev_configure. I see. Thanks. Besides, I think you should return an error when LRO is not negotiated after the reset (say, when it's disabled through qemu command line)? --yliu
[dpdk-dev] [dpdk-stable] [PATCH v6 1/6] vhost: fix windows vm hang
On Mon, Sep 19, 2016 at 10:00:12PM -0400, Zhihong Wang wrote: > This patch fixes a Windows VM compatibility issue in DPDK 16.07 vhost code > which causes the guest to hang once any packets are enqueued when mrg_rxbuf > is turned on, by setting the right id and len in the used ring. > > As defined in virtio spec 0.95 and 1.0, in each used ring element, id means > index of start of used descriptor chain, and len means total length of the > descriptor chain which was written to. While in 16.07 code, index of the > last descriptor is assigned to id, and the length of the last descriptor is > assigned to len. > > How to test? > > 1. Start testpmd in the host with a vhost port. > > 2. Start a Windows VM image with qemu and connect to the vhost port. > > 3. Start io forwarding with tx_first in host testpmd. > > For 16.07 code, the Windows VM will hang once any packets are enqueued. > > Cc: > Signed-off-by: Zhihong Wang Applied to dpdk-next-virtio (this patch only). Thanks. --yliu
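The fixed semantics can be sketched as follows (structures hypothetical, mirroring the virtio spec): for a used-ring element, `id` is the index of the head of the descriptor chain and `len` is the total number of bytes written into the whole chain, not the index or length of the last descriptor, which is what the buggy 16.07 code stored.

```c
#include <stdint.h>

/* Used-ring element as defined by virtio spec 0.95/1.0 (illustrative). */
struct vring_used_elem { uint32_t id; uint32_t len; };

/* Build the element for a chain whose head descriptor index is
 * head_idx, given the bytes written into each of the n descriptors. */
static struct vring_used_elem
make_used_elem(uint32_t head_idx, const uint32_t *desc_written, int n)
{
    struct vring_used_elem e = { head_idx, 0 };

    for (int i = 0; i < n; i++)
        e.len += desc_written[i];   /* total, not just the last one */
    return e;
}
```

Storing the last descriptor's index and length instead confuses drivers (such as the Windows virtio driver) that walk the chain from `id` and trust `len` for the total bytes received.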
[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
On Wed, Oct 12, 2016 at 12:22:08PM +, Wang, Zhihong wrote: > > > >> > 3. How many percentage of drop are you seeing? > > > The testing result: > > > size (bytes) improvement (%) > > > 64 3.92 > > > 128 11.51 > > > 256 24.16 > > > 512 -13.79 > > > 1024-22.51 > > > 1500-12.22 > > > A correction is that performance is dropping if byte size is larger than > > > 512. > > > > I have thought of this twice. Unfortunately, I think I may need NACK this > > series. > > > > Merging two code path into one is really good: as you stated, it improves > > the maintainability. But only if we see no performance regression on both > > path after the refactor. Unfortunately, that's not the case here: it hurts > > the performance for one code path (non-mrg Rx). > > > > That makes me think we may should not do the code path merge at all. I think > > that also aligns with what you have said before (internally): we could do > > the > > merge if it gives comparable performance before and after that. > > > > Besides that, I don't quite like the way you did in patch 2 (rewrite > > enqueue): > > you made a lot of changes in one patch. That means if something wrong > > happened, > > it is hard to narrow down which change introduces that regression. Badly, > > that's exactly what we met here. Weeks have been passed, I see no progress. > > > > That's the reason we like the idea of "one patch only does one thing, an > > atomic thing". > > > Yuanhan, folks, > > Thanks for the analysis. I disagree here though. > > I analyze, develop, benchmark on x86 platforms, where this patch > works great. Yes, that's great effort! With your hard work, we know what the bottleneck is and how it could be improved. However, you don't have to do a code refactor (merge two code paths into one) to apply those improvements.
From what I know, in this patchset, there are two factors that could improve the performance: - copy hdr together with packet data - shadow used ring update and update at once The overall performance boost I got with your v6 patchset with mrg-Rx code path is about 27% (in PVP case). And I have just applied the 1st optimization, it yields about 20% boost. The rest could be covered if we apply the 2nd optimization (I guess). That would be a clean way to optimize vhost mergeable Rx path: - you don't touch non-mrg Rx path (well, you may be able to apply the shadow_used_ring trick to it as well) This would at least make sure we will have no such performance regression issue reported by ARM guys. - you don't refactor the code The rewrite from scratch could introduce other issues, besides the performance regression. We may just not know it yet. Make sense to you? If you agree, I think we could still make it in this release: they would be some small changes after all. For example, below is a patch that applies the 1st optimization tip on top of dpdk-next-virtio --yliu --- diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 8a151af..0ddb5af 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -379,7 +379,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, uint16_t end_idx, struct rte_mbuf *m, struct buf_vector *buf_vec) { - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; + struct virtio_net_hdr_mrg_rxbuf *virtio_hdr; uint32_t vec_idx = 0; uint16_t start_idx = vq->last_used_idx; uint16_t cur_idx = start_idx; @@ -388,6 +388,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, uint32_t desc_offset, desc_avail; uint32_t cpy_len; uint16_t desc_idx, used_idx; + uint16_t num_buffers = end_idx - start_idx; + int hdr_copied = 0; if (unlikely(m == NULL)) return 0; @@ -399,16 +401,11 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct
vhost_virtqueue *vq, if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr) return 0; - rte_prefetch0((void *)(uintptr_t)desc_addr); + virtio_hdr = (struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr; + rte_prefetch0(virtio_hdr); - virtio_hdr.num_buffers = end_idx - start_idx; LOG_DEBUG(VHOST_DATA, "(%d) RX: num merge buffers %d\n", - dev->vid, virtio_hdr.num_buffers); - - virtio_enqueue_offload(m, &virtio_hdr.hdr); - copy_virtio_net_hdr(dev, desc_addr, virtio_hdr); - vhost_log_write(dev, buf_vec[vec_idx].buf_addr, dev->vhost_hlen); - PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0); + dev->vid, num_buffers); desc_avail = buf_vec[vec_idx].buf_len - dev->vhost_hlen; desc_offset =
[dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device in configure callback
On Mon, Oct 03, 2016 at 11:00:14AM +0200, Olivier Matz wrote:
> @@ -1344,6 +1347,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>  {
>  	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
>  	struct virtio_hw *hw = dev->data->dev_private;
> +	uint64_t req_features;
>  	int ret;
> 
>  	PMD_INIT_LOG(DEBUG, "configure");
> @@ -1353,6 +1357,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>  		return -EINVAL;
>  	}
> 
> +	req_features = VIRTIO_PMD_GUEST_FEATURES;
> +	/* if request features changed, reinit the device */
> +	if (req_features != hw->req_guest_features) {
> +		ret = virtio_init_device(dev, req_features);
> +		if (ret < 0)
> +			return ret;
> +	}

Why do you have to reset virtio here? This doesn't make too much sense to me.

IIUC, you want to make sure those TSO related features are unset at init time, and enable them (by doing a reset) when they are asked to be enabled (by rte_eth_dev_configure)?

Why not always set those features? We could do the actual offloads when:

- those features have been negotiated
- they are enabled through rte_eth_dev_configure

With that, I think we could avoid the reset here?

	--yliu
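The alternative suggested in the reply above — negotiate the features once at init and gate the actual offload on both conditions — could look roughly like this. A minimal sketch: `virtio_hw_sketch`, `tso_active`, and the bit value are all illustrative stand-ins, not the virtio PMD's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_F_HOST_TSO4 11  /* illustrative feature bit number */

/* Stand-in for the PMD's per-device state. */
struct virtio_hw_sketch {
	uint64_t guest_features;   /* negotiated with the device at init   */
	uint64_t enabled_offloads; /* what the app asked for via configure */
};

/* Offload is applied only when the feature was negotiated AND the
 * application enabled it -- no device reset needed in between. */
static int tso_active(const struct virtio_hw_sketch *hw)
{
	uint64_t bit = 1ULL << VIRTIO_NET_F_HOST_TSO4;

	return (hw->guest_features & bit) && (hw->enabled_offloads & bit);
}
```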
[dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
On Wed, Oct 05, 2016 at 03:27:47PM +0200, Maxime Coquelin wrote:
> >  	/* Update offload features */
> > -	if (virtio_rx_offload(rxm, hdr) < 0) {
> > +	if ((features & VIRTIO_NET_F_GUEST_CSUM) &&
> 
> s/VIRTIO_NET_F_GUEST_CSUM/(1u << VIRTIO_NET_F_GUEST_CSUM)/

There is a helper function for that: vtpci_with_feature.

	--yliu
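The point of the review comment: `VIRTIO_NET_F_GUEST_CSUM` is a bit *number*, so it must be shifted into a mask before being ANDed against the feature word. A helper in the spirit of `vtpci_with_feature` hides the shift; this standalone version is a sketch, not DPDK's actual code (the bit number is the one from the virtio spec):

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_F_GUEST_CSUM 1  /* bit number, per the virtio spec */

/* Same idea as DPDK's vtpci_with_feature(): test a negotiated feature
 * by bit number instead of open-coding the shift at every call site. */
static inline int with_feature(uint64_t features, unsigned int bit)
{
	return (features & (1ULL << bit)) != 0;
}
```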
[dpdk-dev] [PATCH v3 0/7] vhost: add dequeue zero copy support
On Sun, Oct 09, 2016 at 03:27:53PM +0800, Yuanhan Liu wrote:
> This patch set enables vhost dequeue zero copy. The majority work goes
> to patch 4: "vhost: add dequeue zero copy".

Applied to dpdk-next-virtio.

	--yliu

> The basic idea of dequeue zero copy is, instead of copying data from the
> desc buf, here we let the mbuf reference the desc buf addr directly.
>
> The major issue behind that is how and when to update the used ring.
> You could check the commit log of patch 4 for more details.
>
> Patch 5 introduces a new flag, RTE_VHOST_USER_DEQUEUE_ZERO_COPY, to enable
> dequeue zero copy, which is disabled by default.
>
> The performance gain is quite impressive. For a simple dequeue workload
> (running rxonly in vhost-pmd and running txonly in guest testpmd), it yields
> 50+% performance boost for packet size 1500B. For VM2VM iperf test case,
> it's even better: about 70% boost.
>
> For small packets, the performance is worse (it's expected, as the extra
> overhead introduced by zero copy outweighs the benefits from saving few
> bytes copy).
>
> v3: - rebase: mainly for removing conflicts with the Tx indirect patch
>     - don't update last_used_idx twice for zero-copy mode
>     - handle two missing "Tx -> dequeue" renames in log and usage
>
> v2: - renamed "tx zero copy" to "dequeue zero copy", to reduce confusion
>     - handle the case that a desc buf might cross 2 host phys pages
>     - use MAP_POPULATE to let kernel populate the page tables
>     - updated release note
>     - doc-ed the limitations for the vm2nic case
>     - merge 2 continuous guest phys memory regions
>     - and few more trivial changes, please see them in the corresponding
>       patches
>
> ---
> Yuanhan Liu (7):
>   vhost: simplify memory regions handling
>   vhost: get guest/host physical address mappings
>   vhost: introduce last avail idx for dequeue
>   vhost: add dequeue zero copy
>   vhost: add a flag to enable dequeue zero copy
>   examples/vhost: add an option to enable dequeue zero copy
>   net/vhost: add an option to enable dequeue zero copy
>
>  doc/guides/prog_guide/vhost_lib.rst    |  35 +++-
>  doc/guides/rel_notes/release_16_11.rst |  13 ++
>  drivers/net/vhost/rte_eth_vhost.c      |  13 ++
>  examples/vhost/main.c                  |  19 +-
>  lib/librte_vhost/rte_virtio_net.h      |   1 +
>  lib/librte_vhost/socket.c              |   5 +
>  lib/librte_vhost/vhost.c               |  12 ++
>  lib/librte_vhost/vhost.h               | 102 ---
>  lib/librte_vhost/vhost_user.c          | 315 ++---
>  lib/librte_vhost/virtio_net.c          | 196 +---
>  10 files changed, 549 insertions(+), 162 deletions(-)
>
> --
> 1.9.0
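The core idea from the cover letter above — let the mbuf reference the desc buf address directly, and delay the used-ring update until the buffer is no longer in use — can be mocked up as below. This is only a toy illustration of the concept (the real patches deal with mbuf external buffers, guest-physical address translation, and the actual used-ring timing); all names here are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative mbuf that can either own its data or point into the
 * guest's descriptor buffer (the zero-copy case). */
struct mbuf_sketch {
	void    *data;      /* where the payload lives              */
	uint32_t len;
	uint16_t desc_idx;  /* which desc this mbuf borrows, if any */
	int      zero_copy;
};

/* Zero-copy path: the mbuf just references the desc buffer.  The used
 * ring entry for desc_idx must NOT be published until this mbuf is
 * freed, since the guest would otherwise reuse the buffer while the
 * host (or the NIC) is still reading it. */
static void attach_zero_copy(struct mbuf_sketch *m, void *desc_buf,
			     uint32_t len, uint16_t desc_idx)
{
	m->data = desc_buf;
	m->len = len;
	m->desc_idx = desc_idx;
	m->zero_copy = 1;
}
```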
[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Tue, Oct 11, 2016 at 02:57:49PM +0800, Yuanhan Liu wrote:
> > > > > > There was an example: the vhost enqueue optimization patchset from
> > > > > > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)

Though it doesn't matter now, I have verified it yesterday (with and
without memset): the drop could be up to 30+%. This is to let you know
that it could behave badly if memset is not inlined.

> > > > > > on my Ivybridge server: it has no such issue on his server though.
> > > > > >
> > > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > > >
> > > > > > 	--yliu
> > > > >
> > > > > I'd say that's weird. what's your config? any chance you
> > > > > are using an old compiler?
> > > >
> > > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > > > he said the memset is not well optimized for Ivybridge server.
> > >
> > > The dst is remote in that case. It's fine on Haswell but has complication
> > > in Ivy Bridge which (wasn't supposed to but) causes serious frontend
> > > issue.
> > >
> > > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> >
> > So try something like this then:

Yes, I saw memset is inlined when this diff is applied.

I have another concern though: it's a trick that could get gcc to do the
inlining, and I am not quite sure that holds with other compilers (i.e.
clang, icc, or even older gcc).

For this case, I think I would still prefer some trick like

	*(struct ..*) = {0, };

Or even, we could introduce rte_memset(). IIRC, that has been proposed
somehow before?

	--yliu
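The two zeroing styles debated in this thread can be put side by side. A minimal sketch, using a stand-in struct with the virtio-net header's field layout (the helper names are made up): the struct assignment compiles to a few plain stores the compiler can always inline, whereas `memset` may or may not be inlined depending on compiler, flags, and target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field layout of struct virtio_net_hdr (virtio spec), 10 bytes. */
struct virtio_net_hdr_sketch {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

static void clear_hdr_memset(struct virtio_net_hdr_sketch *hdr)
{
	memset(hdr, 0, sizeof(*hdr));	/* may end up as a libc call */
}

static void clear_hdr_assign(struct virtio_net_hdr_sketch *hdr)
{
	/* Compound-literal assignment: compiler emits inline stores. */
	*hdr = (struct virtio_net_hdr_sketch){ 0 };
}
```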
[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
On Thu, Sep 22, 2016 at 01:47:45PM +0800, Jianbo Liu wrote:
> On 22 September 2016 at 10:29, Yuanhan Liu wrote:
> > On Wed, Sep 21, 2016 at 08:54:11PM +0800, Jianbo Liu wrote:
> >> >> > My setup consists of one host running a guest.
> >> >> > The guest generates as much 64bytes packets as possible using
> >> >>
> >> >> Have you tested with other different packet size?
> >> >> My testing shows that performance is dropping when packet size is more
> >> >> than 256.
> >> >
> >> > Hi Jianbo,
> >> >
> >> > Thanks for reporting this.
> >> >
> >> >  1. Are you running the vector frontend with mrg_rxbuf=off?
> >> >
> Yes, my testing is mrg_rxbuf=off, but not vector frontend PMD.
>
> >> >  2. Could you please specify what CPU you're running? Is it Haswell
> >> >     or Ivy Bridge?
> >> >
> It's an ARM server.
>
> >> >  3. How many percentage of drop are you seeing?
>
> The testing result:
>
>   size (bytes)   improvement (%)
>   64               3.92
>   128             11.51
>   256             24.16
>   512            -13.79
>   1024           -22.51
>   1500           -12.22
>
> A correction is that performance is dropping if byte size is larger than 512.

I have thought of this twice. Unfortunately, I think I may need to NACK
this series.

Merging the two code paths into one is really good: as you stated, it
improves the maintainability. But that only holds if we see no
performance regression on either path after the refactor. Unfortunately,
that's not the case here: it hurts the performance of one code path
(non-mrg Rx).

That makes me think we perhaps should not do the code path merge at all.
I think that also aligns with what you have said before (internally): we
could do the merge if it gives comparable performance before and after
it.

Besides that, I don't quite like the way you did it in patch 2 (rewrite
enqueue): you made a lot of changes in one patch. That means if
something wrong happened, it is hard to narrow down which change
introduced that regression. Sadly, that's exactly what we met here.
Weeks have passed, and I see no progress.

That's the reason we like the idea of "one patch only does one thing, an
atomic thing".

So I will apply the first patch (it's a bug fixing patch) and ask you to
refactor the rest, without the code path merge.

I think we could still have a code base with good maintainability if we
introduce more common helper functions that can be used on both Rx
paths, or even on the Tx path (such as update_used_ring, or
shadow_used_ring).

It's a bit late for too many changes for v16.11. I think you could just
apply patch 6 (vhost: optimize cache access) to the old mrg-Rx code
path, if that also helps the performance? Let us handle the rest in the
next release, such as the shadow used ring.

Thanks.

	--yliu
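A common helper like the `update_used_ring` suggested above could be shared by both Rx paths without merging them: the used-ring bookkeeping lives in one place while each path keeps its own hot loop. A rough standalone sketch — the names and structure layout are illustrative, not the actual DPDK structures.

```c
#include <assert.h>
#include <stdint.h>

#define VQ_SIZE 256

struct used_elem { uint32_t id; uint32_t len; };

/* Illustrative stand-in for a vhost virtqueue. */
struct vq_sketch {
	struct used_elem ring[VQ_SIZE];
	uint16_t last_used_idx;	/* free-running; ring slot is idx % size */
};

/* One shared helper both the mrg and non-mrg Rx paths can call, instead
 * of duplicating the used-ring bookkeeping in each: the paths stay
 * separate (no performance coupling) but share the maintenance burden. */
static void update_used_ring(struct vq_sketch *vq, uint32_t desc_idx,
			     uint32_t len)
{
	uint16_t slot = vq->last_used_idx % VQ_SIZE;

	vq->ring[slot].id = desc_idx;
	vq->ring[slot].len = len;
	vq->last_used_idx++;
}
```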
[dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support
On Tue, Oct 11, 2016 at 02:14:10PM +0200, Olivier MATZ wrote:
> Hi Yuanhan,
>
> On 10/11/2016 01:35 PM, Yuanhan Liu wrote:
> > Hi,
> >
> > Firstly, apologize for such a late review. It's been forgotten :(
> >
> > BTW, please feel free to ping me in future if I make no response
> > within one or two weeks!
> >
> > I haven't reviewed it carefully yet (something I will do tomorrow).
> > Before that, a few quick questions.
> >
> > Firstly, would you write down some test steps? Honestly, I'm not
> > quite sure how that works without the TCP/IP stack.
>
> Not sure I'm getting your question.
> The test plan described in the cover letter works without any dpdk tcp/ip
> stack. It uses testpmd, which is able to bridge packets and ask for TCP
> segmentation.

Oops, I thought the patch list was the end of the cover letter :(

It looks like a great doc after a first glimpse. I will look at your
code tomorrow.

Thanks.

	--yliu
[dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support
Hi,

Firstly, apologize for such a late review. It's been forgotten :(

BTW, please feel free to ping me in future if I make no response within
one or two weeks!

I haven't reviewed it carefully yet (something I will do tomorrow).
Before that, a few quick questions.

Firstly, would you write down some test steps? Honestly, I'm not quite
sure how that works without the TCP/IP stack.

On Mon, Oct 03, 2016 at 11:00:11AM +0200, Olivier Matz wrote:
> This patchset, targeted for 16.11, introduces the support of rx and tx
> offload in virtio pmd. To achieve this, some new mbuf flags must be
> introduced, as discussed in [1].
>
> It applies on top of:
> - software packet type [2]
> - testpmd enhancements [3]

I didn't do the search. Have the two got merged?

	--yliu

> The new mbuf checksum flags are backward compatible for current
> applications that assume that unknown_csum = good_csum (since there
> was only a bad_csum flag). But if the patchset is integrated, we
> should consider updating the PMDs to match the new API for 16.11.
> [1] http://dpdk.org/ml/archives/dev/2016-May/039920.html
> [2] http://dpdk.org/ml/archives/dev/2016-October/048073.html
> [3] http://dpdk.org/ml/archives/dev/2016-September/046443.html
>
> changes v1 -> v2
> - change mbuf checksum calculation static inline
> - fix checksum calculation for protocol where csum=0 means no csum
> - move mbuf checksum calculation in librte_net
> - use RTE_MIN() to set max rx/tx queue
> - rebase on top of head
>
> Olivier Matz (12):
>   virtio: move device initialization in a function
>   virtio: setup and start cq in configure callback
>   virtio: reinitialize the device in configure callback
>   net: add function to calculate a checksum in a mbuf
>   mbuf: add new Rx checksum mbuf flags
>   app/testpmd: fix checksum stats in csum engine
>   mbuf: new flag for LRO
>   app/testpmd: display lro segment size
>   virtio: add Rx checksum offload support
>   virtio: add Tx checksum offload support
>   virtio: add Lro support
>   virtio: add Tso support
>
>  app/test-pmd/csumonly.c                |   8 +-
>  doc/guides/rel_notes/release_16_11.rst |  16 ++
>  drivers/net/virtio/virtio_ethdev.c     | 182 +-
>  drivers/net/virtio/virtio_ethdev.h     |  18 +--
>  drivers/net/virtio/virtio_pci.h        |   4 +-
>  drivers/net/virtio/virtio_rxtx.c       | 270 ++---
>  drivers/net/virtio/virtqueue.h         |   1 +
>  lib/librte_mbuf/rte_mbuf.c             |  18 ++-
>  lib/librte_mbuf/rte_mbuf.h             |  58 ++-
>  lib/librte_net/rte_ip.h                |  60
>  10 files changed, 526 insertions(+), 109 deletions(-)
>
> Test plan
> =========
>
> (not fully replayed on v2, but no major change)
>
> Platform description
> --------------------
>
> [ASCII diagram garbled in the archive. It showed: the guest (dpdk) with
> two ixgbe ports (port0/port1, passed through via directio/vfio) plus a
> virtio port; on the host (linux, vhost-net), the virtio port connects
> to tap0 (1.1.1.1, in netns ns-tap) and the ixgbe ports loop back to
> ixgbe2 (1.1.1.2, in netns ns-ixgbe); flow2 is the reverse of flow1.]
>
> flow1:
>   host -(ixgbe)-> guest -(virtio)-> host
>   1.1.1.2 -> 1.1.1.1
>
> flow2:
>   host -(virtio)-> guest -(ixgbe)-> host
>   1.1.1.2 -> 1.1.1.1
>
> Host configuration
> ------------------
>
> Start qemu with:
>
> - a ne2k management interface to avoid any conflict with dpdk
> - 2 ixgbe interfaces given to the vm through vfio
> - a virtio net device, connected to a tap interface through vhost-net
>
>   /usr/bin/qemu-system-x86_64 -k fr -daemonize --enable-kvm -m 1G -cpu host \
>     -smp 3 -serial telnet::40564,server,nowait -serial null \
>     -qmp tcp::44340,server,nowait -monitor telnet::49229,server,nowait \
>     -device ne2k_pci,mac=de:ad:de:01:02:03,netdev=user.0,addr=03 \
>     -netdev user,id=user.0,hostfwd=tcp::34965-:22 \
>     -device vfio-pci,host=:04:00.0 -device vfio-pci,host=:04:00.1 \
>     -netdev type=tap,id=vhostnet0,script=no,vhost=on,queues=8 \
>     -device virtio-net-pci,netdev=vhostnet0,ioeventfd=on,mq=on,vectors=17 \
>     -hda "/path/to/ubuntu-14.04-template.qcow2" \
>     -snapshot -vga none -display none
>
> Move the tap interface in a netns, and configure it:
>
>   ip netns add ns-tap
>   ip netns exec ns-tap ip l set lo up
>   ip link set tap0 netns ns-tap
>   ip netns exec ns-tap ip l set tap0 down
>   ip netns exec ns-tap ip l set addr 02:00:00:00:00:01 dev tap0
>   ip netns exec ns-tap ip l set tap0 up
>   ip netns exec ns-tap ip a a 1.1.1.1/24 dev tap0
>   ip netns exec ns-tap arp -s 1.1.1.2
[dpdk-dev] [PATCH v2] vhost: Only access header if offloading is supported in dequeue path
On Tue, Oct 11, 2016 at 09:45:27AM +0200, Maxime Coquelin wrote:
> @@ -684,12 +699,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>  		  struct rte_mempool *mbuf_pool)
>  {
>  	struct vring_desc *desc;
> -	uint64_t desc_addr;
> +	uint64_t desc_addr = 0;
>  	uint32_t desc_avail, desc_offset;
>  	uint32_t mbuf_avail, mbuf_offset;
>  	uint32_t cpy_len;
>  	struct rte_mbuf *cur = m, *prev = m;
> -	struct virtio_net_hdr *hdr;
> +	struct virtio_net_hdr *hdr = NULL;
>  	/* A counter to avoid desc dead loop chain */
>  	uint32_t nr_desc = 1;
> 
> @@ -698,12 +713,14 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>  	    (desc->flags & VRING_DESC_F_INDIRECT))
>  		return -1;
> 
> -	desc_addr = gpa_to_vva(dev, desc->addr);
> -	if (unlikely(!desc_addr))
> -		return -1;
> +	if (virtio_net_with_host_offload(dev)) {
> +		desc_addr = gpa_to_vva(dev, desc->addr);
> +		if (unlikely(!desc_addr))
> +			return -1;
> 
> -	hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
> -	rte_prefetch0(hdr);
> +		hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
> +		rte_prefetch0(hdr);
> +	}
> 
>  	/*
>  	 * A virtio driver normally uses at least 2 desc buffers
> @@ -720,18 +737,24 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>  		if (unlikely(!desc_addr))
>  			return -1;
> 
> -		rte_prefetch0((void *)(uintptr_t)desc_addr);
> -
>  		desc_offset = 0;
>  		desc_avail  = desc->len;
>  		nr_desc    += 1;
> -
> -		PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
>  	} else {
> +		if (!desc_addr) {
> +			desc_addr = gpa_to_vva(dev, desc->addr);
> +			if (unlikely(!desc_addr))
> +				return -1;
> +		}
> +

I think this piece of code makes things a bit complex. I think what you
want to achieve is, besides saving the hdr prefetch, to save one call to
gpa_to_vva() for the non-ANY_LAYOUT case. Does that matter too much? How
about just saving the hdr prefetch?

	if (virtio_net_with_host_offload(dev)) {
		hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
		rte_prefetch0(hdr);
	}

	--yliu

>  		desc_avail  = desc->len - dev->vhost_hlen;
>  		desc_offset = dev->vhost_hlen;
>  	}
> 
> +	rte_prefetch0((void *)(uintptr_t)(desc_addr + desc_offset));
> +
> +	PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), desc_avail, 0);
> +
>  	mbuf_offset = 0;
>  	mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
>  	while (1) {
> @@ -795,7 +818,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>  	prev->data_len = mbuf_offset;
>  	m->pkt_len    += mbuf_offset;
> 
> -	if (hdr->flags != 0 || hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE)
> +	if (virtio_net_with_host_offload(dev))
>  		vhost_dequeue_offload(hdr, m);
> 
>  	return 0;
> --
> 2.7.4
[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Mon, Oct 10, 2016 at 07:39:59AM +0300, Michael S. Tsirkin wrote:
> > > > > > 1. why is this memset expensive?
> > > > >
> > > > > I don't have the exact answer, but just some rough thoughts:
> > > > >
> > > > > It's an external clib function: there is a call stack and the
> > > > > IP register will bounce back and forth.
> > > >
> > > > for memset 0?  gcc 5.3.1 on fedora happily inlines it.
> > >
> > > Good to know!
> > >
> > > > > overkill to use that for resetting 14 bytes structure.
> > > > >
> > > > > Some trick like
> > > > > 	*(struct virtio_net_hdr *)hdr = {0, };
> > > > >
> > > > > Or even
> > > > > 	hdr->xxx = 0;
> > > > > 	hdr->yyy = 0;
> > > > >
> > > > > should behave better.
> > > > >
> > > > > There was an example: the vhost enqueue optimization patchset from
> > > > > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> > > > > on my Ivybridge server: it has no such issue on his server though.
> > > > >
> > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > >
> > > > > 	--yliu
> > > >
> > > > I'd say that's weird. what's your config? any chance you
> > > > are using an old compiler?
> > >
> > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > > he said the memset is not well optimized for Ivybridge server.
> >
> > The dst is remote in that case. It's fine on Haswell but has complication
> > in Ivy Bridge which (wasn't supposed to but) causes serious frontend issue.
> >
> > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> 
> So try something like this then:

Yes, I saw memset is inlined when this diff is applied. So, mind sending
a formal patch? You might want to try a build at least: it doesn't build.

> Generally pointer chasing in vq->hw->vtnet_hdr_size can't be good
> for performance. Move fields used on data path into vq
> and use from there to avoid indirections?

Good suggestion!

	--yliu
[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Tue, Oct 11, 2016 at 08:39:54AM +0200, Maxime Coquelin wrote:
> 
> On 10/11/2016 08:04 AM, Yuanhan Liu wrote:
> >On Mon, Oct 10, 2016 at 04:54:39PM +0200, Maxime Coquelin wrote:
> >>
> >>On 10/10/2016 04:42 PM, Yuanhan Liu wrote:
> >>>On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote:
> >>>>>>>At that time, a packet always uses 2 descs. Since indirect desc is
> >>>>>>>enabled (by default) now, the assumption is not true then. What's
> >>>>>>>worse, it might even slow things a bit down. That should also be
> >>>>>>>part of the reason why performance is slightly worse than before.
> >>>>>>>
> >>>>>>>	--yliu
> >>>>>>
> >>>>>>I'm not sure I get what you are saying
> >>>>>>
> >>>>>>>commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> >>>>>>>Author: Yuanhan Liu
> >>>>>>>Date:   Mon May 2 17:46:17 2016 -0700
> >>>>>>>
> >>>>>>>    vhost: optimize dequeue for small packets
> >>>>>>>
> >>>>>>>    A virtio driver normally uses at least 2 desc buffers for Tx: the
> >>>>>>>    first for storing the header, and the others for storing the data.
> >>>>>>>
> >>>>>>>    Therefore, we could fetch the first data desc buf before the main
> >>>>>>>    loop, and do the copy first before the check of "are we done yet?".
> >>>>>>>    This could save one check for small packets that just have one data
> >>>>>>>    desc buffer and need one mbuf to store it.
> >>>>>>>
> >>>>>>>    Signed-off-by: Yuanhan Liu
> >>>>>>>    Acked-by: Huawei Xie
> >>>>>>>    Tested-by: Rich Lane
> >>>>>>
> >>>>>>This fast-paths the 2-descriptors format but it's not active
> >>>>>>for indirect descriptors. Is this what you mean?
> >>>>>
> >>>>>Yes. It's also not active when ANY_LAYOUT is actually turned on.
> >>>>>
> >>>>>>Should be a simple matter to apply this optimization for indirect.
> >>>>>
> >>>>>Might be.
> >>>>
> >>>>If I understand the code correctly, indirect descs also benefit from this
> >>>>optimization, or am I missing something?
> >>>
> >>>Aha..., you are right!
> >>
> >>The interesting thing is that the patch I sent on Thursday that removes
> >>header access when no offload has been negotiated[0] seems to reduce
> >>almost to zero the performance seen with indirect descriptors enabled.
> >
> >Didn't follow that.
> >
> >>I see this with 64 bytes packets using testpmd on both ends.
> >>
> >>When I did the patch, I would have expected the same gain with both
> >>modes, whereas I measured +1% for direct and +4% for indirect.
> >
> >IIRC, I did a test before (removing those offload code pieces), and the
> >performance was basically the same before and after that. Well, there
> >might be some small difference, say 1% as you said. But the result has
> >never been steady.
> >
> >Anyway, I think your patch is good to have: I just didn't see a v2.
> 
> I waited to gather some comments/feedback before sending the v2.
> I'll send it today or tomorrow.

Interesting, we have a deadlock then: I haven't looked at the code
carefully since you said there would be a v2, thus I'm waiting for it.
However, you are waiting for my review. :)

Anyway, I will take time to look at it shortly.

	--yliu
[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Mon, Oct 10, 2016 at 04:54:39PM +0200, Maxime Coquelin wrote:
> 
> On 10/10/2016 04:42 PM, Yuanhan Liu wrote:
> >On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote:
> >>>>>At that time, a packet always uses 2 descs. Since indirect desc is
> >>>>>enabled (by default) now, the assumption is not true then. What's
> >>>>>worse, it might even slow things a bit down. That should also be
> >>>>>part of the reason why performance is slightly worse than before.
> >>>>>
> >>>>>	--yliu
> >>>>
> >>>>I'm not sure I get what you are saying
> >>>>
> >>>>>commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> >>>>>Author: Yuanhan Liu
> >>>>>Date:   Mon May 2 17:46:17 2016 -0700
> >>>>>
> >>>>>    vhost: optimize dequeue for small packets
> >>>>>
> >>>>>    A virtio driver normally uses at least 2 desc buffers for Tx: the
> >>>>>    first for storing the header, and the others for storing the data.
> >>>>>
> >>>>>    Therefore, we could fetch the first data desc buf before the main
> >>>>>    loop, and do the copy first before the check of "are we done yet?".
> >>>>>    This could save one check for small packets that just have one data
> >>>>>    desc buffer and need one mbuf to store it.
> >>>>>
> >>>>>    Signed-off-by: Yuanhan Liu
> >>>>>    Acked-by: Huawei Xie
> >>>>>    Tested-by: Rich Lane
> >>>>
> >>>>This fast-paths the 2-descriptors format but it's not active
> >>>>for indirect descriptors. Is this what you mean?
> >>>
> >>>Yes. It's also not active when ANY_LAYOUT is actually turned on.
> >>>
> >>>>Should be a simple matter to apply this optimization for indirect.
> >>>
> >>>Might be.
> >>
> >>If I understand the code correctly, indirect descs also benefit from this
> >>optimization, or am I missing something?
> >
> >Aha..., you are right!
> 
> The interesting thing is that the patch I sent on Thursday that removes
> header access when no offload has been negotiated[0] seems to reduce
> almost to zero the performance seen with indirect descriptors enabled.

Didn't follow that.

> I see this with 64 bytes packets using testpmd on both ends.
> 
> When I did the patch, I would have expected the same gain with both
> modes, whereas I measured +1% for direct and +4% for indirect.

IIRC, I did a test before (removing those offload code pieces), and the
performance was basically the same before and after that. Well, there
might be some small difference, say 1% as you said. But the result has
never been steady.

Anyway, I think your patch is good to have: I just didn't see a v2.

	--yliu
[dpdk-dev] 17.02 Roadmap
On Mon, Oct 10, 2016 at 10:42:58PM +0200, Thomas Monjalon wrote:
> > Support New Device Types in Vhost-User: Support will be added to vhost-user
> > for new device types including vhost-scsi and vhost-blk.
> 
> Great!
> Is it only related to networking or should we expect some storage-related
> code or drivers to come up?

It's not only networking related. It just introduces a few more APIs to
export those buf infos (virt addr, phys addr, len, etc) to applications,
so that they can handle/parse the data in the way they want.

For example, for SPDK (https://01.org/spdk), they can use those APIs to
fetch guest data and parse it following the virtio-SCSI spec.

The dequeue path (guest Tx) might look like something below:

	rte_vhost_dequeue_vring_desc_burst(vid, queue_id, iov, count)
	{
		for (i = 0; i < count; i++) {
			desc_idx = get_next_desc();

			iov[i]->addr      = desc_virt_addr(desc[desc_idx]->addr);
			iov[i]->phys_addr = desc_phys_addr(desc[desc_idx]->addr);
			iov[i]->len       = desc[desc_idx]->len;
			iov[i]->desc      = desc_idx;
		}

		return i;
	}

	rte_vhost_update_used_ring(vid, queue_id, descs, count)
	{
		for (i = 0; i < count; i++) {
			used_idx = get_next_used_idx();
			vq->used->ring[used_idx] = descs[i];
		}

		vq->used->idx += i;
	}

And we would introduce similar APIs for the enqueue path.

	--yliu
[dpdk-dev] [PATCH V2 2/2] virtio: support IOMMU platform
On Fri, Oct 07, 2016 at 07:24:44AM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 28, 2016 at 04:25:12PM +0800, Jason Wang wrote:
> > Negotiate VIRTIO_F_IOMMU_PLATFORM to have IOMMU support.
> >
> > Signed-off-by: Jason Wang
> > ---
> > Changes from v1:
> > - remove unnecessary NEED_MAPPING flag
> 
> One thing we probably should do is enable this flag
> with VFIO but not with UIO or VFIO-noiommu.

Good suggestion. I think we could do that in another patch.

> > ---
> >  drivers/net/virtio/virtio_ethdev.h | 3 ++-
> >  drivers/net/virtio/virtio_pci.h    | 3 ++-
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
> > index 2ecec6e..04a06e2 100644
> > --- a/drivers/net/virtio/virtio_ethdev.h
> > +++ b/drivers/net/virtio/virtio_ethdev.h
> > @@ -63,7 +63,8 @@
> >  	 1u << VIRTIO_NET_F_CTRL_RX     |	\
> >  	 1u << VIRTIO_NET_F_CTRL_VLAN   |	\
> >  	 1u << VIRTIO_NET_F_MRG_RXBUF   |	\
> > -	 1ULL << VIRTIO_F_VERSION_1)
> > +	 1ULL << VIRTIO_F_VERSION_1     |	\
> > +	 1ULL << VIRTIO_F_IOMMU_PLATFORM )
> 
> Space before ) looks kind of ugly.

Yes, a bit. I will remove it while applying.

> >  #define VIRTIO_TRANSPORT_F_START 28
> > -#define VIRTIO_TRANSPORT_F_END   32
> > +#define VIRTIO_TRANSPORT_F_END   34
> 
> This seems unused. Drop it?

Indeed. I will submit a patch to remove both (_START and _END).

	--yliu
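On the `1u` vs `1ULL` mix visible in that macro: the transport-level feature bits sit at positions 32 and up (VERSION_1 is bit 32, IOMMU_PLATFORM bit 33), and shifting a 32-bit `1u` by 32 or more is undefined behavior in C, which is why those entries need `1ULL`. A standalone illustration — the bit numbers are taken from the virtio spec, the variable name is made up:

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_F_MRG_RXBUF   15  /* bit numbers per the virtio spec */
#define VIRTIO_F_VERSION_1       32
#define VIRTIO_F_IOMMU_PLATFORM  33

/* Bits below 32 fit in an unsigned int, but VERSION_1 and IOMMU_PLATFORM
 * do not: 1u << 32 is undefined behavior, so 1ULL is required there. */
static const uint64_t features =
	(1u   << VIRTIO_NET_F_MRG_RXBUF) |
	(1ULL << VIRTIO_F_VERSION_1)     |
	(1ULL << VIRTIO_F_IOMMU_PLATFORM);
```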