[dpdk-dev] [PATCH 3/3] maintainers: add stable mailing list
Signed-off-by: Yuanhan Liu
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3df1754..076e86c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -35,6 +35,7 @@ F: scripts/test-build.sh
 Stable Branches
 ---
 T: git://dpdk.org/dpdk-stable
+M: stable@dpdk.org

 Security Issues
 ---
--
1.9.0
[dpdk-dev] [PATCH 2/3] maintainers: update virtio section name
Signed-off-by: Yuanhan Liu
---

Hmm.., maybe we could separate the vhost lib and the virtio PMD into two
different sections?

---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index cc1ab68..3df1754 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -364,7 +364,7 @@ M: Sony Chacko
 F: drivers/net/qede/
 F: doc/guides/nics/qede.rst

-RedHat virtio
+Virtio PMD and vhost lib
 M: Yuanhan Liu
 T: git://dpdk.org/next/dpdk-next-virtio
 F: drivers/net/virtio/
--
1.9.0
[dpdk-dev] [PATCH 1/3] maintainers: update virtio maintainer
Huawei left the DPDK team months ago, and he hasn't shown up since then.
Remove him.

Cc: Huawei Xie
Signed-off-by: Yuanhan Liu
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 26d9590..cc1ab68 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -365,7 +365,6 @@ F: drivers/net/qede/
 F: doc/guides/nics/qede.rst

 RedHat virtio
-M: Huawei Xie
 M: Yuanhan Liu
 T: git://dpdk.org/next/dpdk-next-virtio
 F: drivers/net/virtio/
--
1.9.0
[dpdk-dev] [PATCH 0/3] maintainers: minor updates to virtio and stable
---
Yuanhan Liu (3):
  maintainers: update virtio maintainer
  maintainers: update virtio section name
  maintainers: add stable mailing list

 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--
1.9.0
[dpdk-dev] [dpdk-announce] DPDK 16.07.2 released
Ajit Khaparde (1):
      net/bnxt: fix crash when closing
Beilei Xing (1):
      net/i40e: fix floating VEB
Bernard Iremonger (1):
      app/testpmd: fix DCB configuration
David Marchand (1):
      ethdev: fix vendor id in debug message
E. Scott Daniels (1):
      ethdev: prevent duplicate event callback
Eric Kinzie (1):
      net/bonding: validate speed after link up
Ferruh Yigit (4):
      net/ring: fix ring device creation via devargs
      net/bnx2x: fix build with icc
      kni: fix build with kernel 4.8
      kni: fix build with kernel 4.9
Gowrishankar Muthukrishnan (1):
      examples/ip_pipeline: fix plugin loading
Igor Ryzhov (1):
      pci: fix probing error if no driver found
Jasvinder Singh (1):
      examples/qos_sched: fix dequeue from ring
Jeff Guo (1):
      net/i40e: fix hash filter on X722
Jianbo Liu (2):
      eal/arm: fix file descriptor leak when getting CPU features
      eal/ppc: fix file descriptor leak when getting CPU features
Jingjing Wu (2):
      net/i40e: fix DCB configuration
      doc: add limitations for i40e PMD
John Daley (5):
      net/enic: fix flow director
      net/enic: fix crash with removed flow director filters
      net/enic: fix multi-queue Rx performance
      net/enic: fix crash on MTU update or Rx queue reconfigure
      net/enic: fix max packet length check
John W. Linville (4):
      net/ena: improve safety of string handling
      net/bnxt: ensure entry length is unsigned
      net/i40e: do not use VSI before NULL check
      net/bnxt: fix bit shift size
Kamil Rytarowski (1):
      net/thunderx: fix Tx checksum handling
Michael Qiu (2):
      examples/tep_term: fix L4 length
      examples/tep_term: fix packet length with multi-segments
Mohammad Abdul Awal (1):
      app/testpmd: fix RSS hash key size
Nelio Laranjeiro (1):
      net/mlx5: fix Rx checksum macros
Nelson Escobar (3):
      net/enic: revert truncated packets counter fix
      net/enic: document how to configure vNIC parameters
      net/enic: fix Rx queue index when not using Rx scatter
Nipun Gupta (1):
      mempool: fix leak if populate fails
Nélio Laranjeiro (7):
      net/mlx5: fix Rx function selection
      net/mlx5: fix hash key size retrieval
      net/mlx5: support Mellanox OFED 3.4
      net/mlx5: re-factorize functions
      net/mlx5: fix inline logic
      net/mlx5: fix link speed capability information
      net/mlx5: fix support for newer link speeds
Olga Shern (1):
      net/mlx5: fix link status report
Olivier Gournet (1):
      net/mlx5: fix initialization in secondary process
Pablo de Lara (3):
      app/test: fix hash multiwriter sequence
      hash: fix unlimited cuckoo path
      hash: fix bucket size usage
Piotr Azarewicz (1):
      examples/l2fwd-crypto: fix verify with decrypt in chain
Qi Zhang (4):
      net/ixgbe: fix out of order Rx read
      net/fm10k: fix out of order Rx read
      net/i40e: fix Rx hang when disable LLDP
      net/i40e: fix out of order Rx read
Qiming Yang (3):
      net/i40e: fix link status change interrupt
      net/i40e: fix VF bonded device link down
      net/i40e: fixed build error with icc
Rasesh Mody (3):
      net/bnx2x: fix maximum PF queues
      net/bnx2x: fix socket id for slowpath memory
      net/qede/base: fix 32-bit build
Raslan Darawsheh (2):
      net/mlx5: fix removing VLAN filter
      net/mlx5: fix handling of small mbuf sizes
Reshma Pattan (2):
      pdump: fix created directory permissions
      app/procinfo: free xstats memory upon failure
Sagi Grimberg (1):
      net/mlx5: fix possible NULL dereference in Rx path
Sergio Gonzalez Monroy (1):
      examples/ipsec-secgw: check SP only when setup
Wei Dai (3):
      lpm: fix freeing unused sub-table on rule delete
      mempool: fix search of maximum contiguous pages
      lpm: fix freeing memory
Wenzhuo Lu (6):
      app/testpmd: fix DCB configuration
      app/testpmd: fix PF/VF check of flow director
      net/ixgbe: fix flow director mask
      app/testpmd: fix flow director mask
      app/testpmd: fix flow director endianness
      net/ixgbe: fix VF registers
Xiao Wang (2):
      net/fm10k: fix Rx checksum flags
      net/fm10k: fix VF Tx queue initialization
Yaacov Hazan (3):
      net/mlx5: fix inconsistent return value in flow director
      net/mlx5: refactor allocation of flow director queues
      net/mlx5: fix flow director drop mode
Yong Wang (1):
      net/vmxnet3: fix mbuf release on reset/stop
Yuanhan Liu (2):
      net/virtio: revert fix restart
      version: 16.07.2
Zhihong Wang (1):
      vhost: fix Windows VM hang
[dpdk-dev] [PATCH v2] virtio: tx with can_push when VERSION_1 is set
On Wed, Nov 30, 2016 at 09:18:42AM +, Pierre Pfister (ppfister) wrote:
> Current virtio driver advertises VERSION_1 support,
> but does not handle device's VERSION_1 support when
> sending packets (it looks for ANY_LAYOUT feature,
> which is absent).
>
> This patch enables 'can_push' in tx path when VERSION_1
> is advertised by the device.
>
> This significantly improves small packets forwarding rate
> towards devices advertising VERSION_1 feature.
>
> Signed-off-by: Pierre Pfister

From the virtio spec point of view, VERSION_1 implies ANY_LAYOUT, so this
patch is right, and I think we could apply it. No objections?

	--yliu

> ---
>  drivers/net/virtio/virtio_rxtx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 22d97a4..1e5a6b9 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -1015,7 +1015,8 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  	}
>
>  	/* optimize ring usage */
> -	if (vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) &&
> +	if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
> +	     vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) &&
>  	    rte_mbuf_refcnt_read(txm) == 1 &&
>  	    RTE_MBUF_DIRECT(txm) &&
>  	    txm->nb_segs == 1 &&
> --
> 2.7.4 (Apple Git-66)
[dpdk-dev] [PATCH] doc: introduce PVP reference benchmark
On Thu, Nov 24, 2016 at 08:35:51AM +0100, Maxime Coquelin wrote:
>
> On 11/24/2016 06:07 AM, Yuanhan Liu wrote:
> > First of all, thanks for the doc! It's a great one.
> Thanks.
> I would be interested to know if you have other tuning I don't mention
> in this doc.

I was thinking we may need to document the performance impact of some
features. Say, we observed that indirect descriptors may be good for some
cases while bad for others. Also, the non-mergeable Rx path outperforms
the mergeable Rx path. If a user cares about performance and is certain
that all packets fit into a typical MTU, he may likely want to disable
the mergeable feature, which is enabled by default.

Maybe we could start a new doc, or maybe we could add a new section here?

	--yliu
[dpdk-dev] [PATCH] tools: add tags and cscope index file generation support
On Sun, Nov 27, 2016 at 05:42:42AM +0530, Jerin Jacob wrote:
> This script generates cscope, gtags, and tags
> index files based on the EAL environment
> (architecture and OS (linux/bsd)).
>
> Selection of the architecture and OS environment
> is based on the DPDK configuration target (T=).
>
> example usage:
> make tags T=x86_64-native-linuxapp-gcc
> make cscope T=x86_64-native-linuxapp-gcc
> make gtags T=x86_64-native-linuxapp-gcc
>
> Signed-off-by: Jerin Jacob

It's handy. Thanks!

Reviewed-by: Yuanhan Liu

	--yliu
[dpdk-dev] dpdk/vpp and cross-version migration for vhost
On Thu, Nov 24, 2016 at 09:30:49AM +, Kevin Traynor wrote:
> On 11/24/2016 06:31 AM, Yuanhan Liu wrote:
> > On Tue, Nov 22, 2016 at 04:53:05PM +0200, Michael S. Tsirkin wrote:
> >>>> You keep assuming that you have the VM started first and
> >>>> figure out things afterwards, but this does not work.
> >>>>
> >>>> Think about a cluster of machines. You want to start a VM in
> >>>> a way that will ensure compatibility with all hosts
> >>>> in a cluster.
> >>>
> >>> I see. I was more considering the case when the dst
> >>> host (including the qemu and dpdk combo) is given, and
> >>> then determining whether it will be a successful migration
> >>> or not.
> >>>
> >>> And you are asking that we need to know which host could
> >>> be a good candidate before starting the migration. In such
> >>> a case, we indeed need some inputs from both the qemu and
> >>> vhost-user backend.
> >>>
> >>> For DPDK, I think it could be simple: just as you said, it
> >>> could be either a tiny script, or even a macro defined in
> >>> the source code file (we extend it every time we add a
> >>> new feature) to let libvirt read it. Or something else.
> >>
> >> There's the issue of APIs that tweak features as Maxime
> >> suggested.
> >
> > Yes, it's a good point.
> >
> >> Maybe the only thing to do is to deprecate it,
> >
> > Looks like so.
> >
> >> but I feel some way for the application to pass info into
> >> the guest might be beneficial.
> >
> > The two APIs are just for tweaking the feature bits DPDK supports
> > before any device gets connected. It's another way to disable some
> > features (the other obvious way is through QEMU command lines).
> >
> > IMO, it's a bit handy only in a case like: we have a bunch of VMs.
> > Instead of disabling something through qemu one by one, we could
> > disable it once in DPDK.
> >
> > But I doubt the usefulness of it. It's only used in DPDK's vhost
> > example after all. It is used neither in the vhost pmd nor in OVS.
>
> rte_vhost_feature_disable() is currently used in OVS, lib/netdev-dpdk.c

Hmmm. I must have checked very old code ...

> netdev_dpdk_vhost_class_init(void)
> {
>     static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>
>     /* This function can be called for different classes. The
>      * initialization needs to be done only once */
>     if (ovsthread_once_start(&once)) {
>         rte_vhost_driver_callback_register(&virtio_net_device_ops);
>         rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4
>                                   | 1ULL << VIRTIO_NET_F_HOST_TSO6
>                                   | 1ULL << VIRTIO_NET_F_CSUM);

I saw the commit that introduced this change, but it tells no reason why
it was added.

    commit 362ca39639ae871806be5ae97d55e1cbb14afd92
    Author: mweglicx
    Date:   Thu Apr 14 17:40:06 2016 +0100

        Update relevant artifacts to add support for DPDK 16.04.

        Following changes are applied:
        - INSTALL.DPDK.md: CONFIG_RTE_BUILD_COMBINE_LIBS step has been
          removed because it is no longer present in DPDK configuration
          (combined library is created by default),
        - INSTALL.DPDK.md: VHost Cuse configuration is updated,
        - netdev-dpdk.c: Link speed definition is changed in DPDK and
          netdev_dpdk_get_features is updated accordingly,
        - netdev-dpdk.c: TSO and checksum offload has been disabled for
          vhostuser device.
        - .travis/linux-build.sh: DPDK version is updated and legacy flags
          have been removed in configuration.

        Signed-off-by: Michal Weglicki
        Signed-off-by: Panu Matilainen
        Acked-by: Daniele Di Proietto

	--yliu
[dpdk-dev] dpdk/vpp and cross-version migration for vhost
On Tue, Nov 22, 2016 at 04:53:05PM +0200, Michael S. Tsirkin wrote:
> > > You keep assuming that you have the VM started first and
> > > figure out things afterwards, but this does not work.
> > >
> > > Think about a cluster of machines. You want to start a VM in
> > > a way that will ensure compatibility with all hosts
> > > in a cluster.
> >
> > I see. I was more considering the case when the dst
> > host (including the qemu and dpdk combo) is given, and
> > then determining whether it will be a successful migration
> > or not.
> >
> > And you are asking that we need to know which host could
> > be a good candidate before starting the migration. In such
> > a case, we indeed need some inputs from both the qemu and
> > vhost-user backend.
> >
> > For DPDK, I think it could be simple: just as you said, it
> > could be either a tiny script, or even a macro defined in
> > the source code file (we extend it every time we add a
> > new feature) to let libvirt read it. Or something else.
>
> There's the issue of APIs that tweak features as Maxime
> suggested.

Yes, it's a good point.

> Maybe the only thing to do is to deprecate it,

Looks like so.

> but I feel some way for the application to pass info into
> the guest might be beneficial.

The two APIs are just for tweaking the feature bits DPDK supports before
any device gets connected. It's another way to disable some features
(the other obvious way is through QEMU command lines).

IMO, it's a bit handy only in a case like: we have a bunch of VMs.
Instead of disabling something through qemu one by one, we could disable
it once in DPDK.

But I doubt the usefulness of it. It's only used in DPDK's vhost example
after all. It is used neither in the vhost pmd nor in OVS.

> > > If you don't, the guest visible interface will change
> > > and you won't be able to migrate.
> > >
> > > It does not make sense to discuss feature bits specifically
> > > since that is not the only part of the interface.
> > > For example, the max ring size supported might change.
> >
> > I don't quite understand why we have to consider the max ring
> > size here? Isn't it a virtio device attribute, for which QEMU could
> > provide such compatibility information?
> >
> > I mean, DPDK is supposed to support varying vring sizes; it's QEMU
> > that gives a specific value.
>
> If the backend supports s/g of any size up to 2^16, there's no issue.

I don't know about others, but I see no issues in DPDK.

> ATM some backends might be assuming up to 1K s/g since
> QEMU never supported bigger ones. We might classify this
> as a bug, or not, and add a feature flag.
>
> But it's just an example. There might be more values at issue
> in the future.

Yeah, maybe. But we could analyze them one by one.

> > > Let me describe how it works in qemu/libvirt.
> > > When you install a VM, you can specify a compatibility
> > > level (aka "machine type"), and you can query the supported
> > > compatibility levels. Management uses that to find the supported
> > > compatibility and stores the compatibility in XML that is migrated
> > > with the VM. There's also a way to find the latest level, which is
> > > the default unless overridden by the user; again this level is
> > > recorded and then
> > > - management can make sure the migration destination is compatible
> > > - management can avoid migration to hosts without that support
> >
> > Thanks for the info, it helps.
> >
> > ...
> >
> > > > > > As version here is an opaque string for libvirt and qemu,
> > > > > > anything can be used - but I suggest either a list
> > > > > > of values defining the interface, e.g.
> > > > > > any_layout=on,max_ring=256
> > > > > > or a version including the name and vendor of the backend,
> > > > > > e.g. "org.dpdk.v4.5.6".
> >
> > The version scheme may not be ideal here. Assume a QEMU is supposed
> > to work with a specific DPDK version; however, the user may disable
> > some newer features through the qemu command line, so that it could
> > also work with an older DPDK version. Using the version scheme will
> > not allow us to do such a migration to an older DPDK version. The MTU
> > is a lively example here? (when the MTU feature is provided by QEMU
> > but is actually disabled by the user, it could also work with an
> > older DPDK without MTU support).
> >
> > --yliu
>
> OK, so does a list of values look better to you then?

Yes, if there is no better way. And I think it may be better to not list
all those features literally. Instead, using a number should be better,
say, features=0xdeadbeef.

Listing the feature names means we have to come to an agreement among all
the components involved here (QEMU, libvirt, DPDK, VPP, and maybe more
backends), so that we all use the exact same feature names. Though it may
not be a big deal, it lacks some flexibility. A feature bitmask will not
have this issue.

	--yliu

> > > > > > Note that typically the list of supported versions can only be
> > > > > > extended, not shrunk. Also, if the host/guest
[dpdk-dev] Proposal for a new Committer model
On Wed, Nov 23, 2016 at 03:19:19PM -0500, Neil Horman wrote:
> On Wed, Nov 23, 2016 at 11:41:20PM +0800, Yuanhan Liu wrote:
> > On Wed, Nov 23, 2016 at 09:11:54AM -0500, Neil Horman wrote:
> > > > Could we define some of the potential subtrees now and look to
> > > > introduce them in this release cycle? EAL and the Core libs, as
> > > > suggested by Thomas, seem like 2 obvious ones.
> > > >
> > > Sure, I'd suggest the following:
> >
> > I would pull the git history to see which components are in
> > active status in the last release (or even, in the last few
> > releases), and try to make a sub-tree if the corresponding
> > component is hot.
> >
> > # the 2nd column shows how many patches are prefixed with a related component
> > [yliu at yliu-dev ~/dpdk]$ git log --oneline v16.07..v16.11 | awk '{print $2}' | \
> >     sort | uniq -c | sort -nr | head -30 | nl
> >  1  52 doc:
> >  2  40 net/ixgbe/base:
> >  3  38 app/test:
> >  4  37 kni:
> >  5  27 vhost:
> >  6  27 net/virtio:
> >  7  27 net/mlx5:
> >  8  26 app/testpmd:
> >  9  25 net/i40e:
> > 10  23 net/pcap:
> > 11  22 net/bnxt:
> > 12  20 net/enic:
> > 13  18 net/qede:
> > 14  17 net/thunderx:
> > 15  16 net/qede/base:
> > 16  16 eal:
> > 17  15 net/ixgbe:
> > 18  14 net:
> > 19  14 crypto/qat:
> > 20  13 scripts:
> > 21  13 net/bnx2x:
> > 22  12 net/i40e/base:
> > 23  12 examples/ipsec-secgw:
> > 24  11 mbuf:
> > 25  11 hash:
> > 26  10 lib:
> > 27  10 examples/ip_pipeline:
> > 28  10 ethdev:
> > 29   9 pci:
> > 30   7 net/vmxnet3:
> > ...
> > 46   3 pdump:
> > 47   3 net/virtio_user:
> > 48   3 net/ring:
> > 49   3 net/nfp:
> > 50   3 net/mlx:
> > 51   3 net/ena:
> > 52   3 net/e1000:
> > 53   3 net/bonding:
> > ...
> > 56   2 sched:
> > 57   2 port:
> > ...
> > 65   1 timer:
> > 66   1 remove
> > 67   1 pmdinfogen:
> > 68   1 net/igb:
> > 69   1 net/enic/base:
> > 70   1 meter:
> > ...
> > 84   1 cfgfile:
> > 85   1 app/procinfo:
> > 86   1 app/proc_info:
> > 87   1 acl:
> >
> > Something obvious is that:
> >
> > - "doc" deserves a sub-tree, and John is a perfect committer for
> >   that if he's willing to.
> >
> > - generally, I'd agree with Neil that most (if not all) pmds may need
> >   a sub-tree. While some others may not, for example, net/ring and
> >   net/pcap.
> >
> No, that's the opposite of what I think. I think all net pmds should
> flow through a single subtree, all crypto pmds through another, etc.

I misunderstood it. I thought you were suggesting creating a sub-tree
for most (or all) pmds. Some of my comments don't apply then. But yes,
we have already done that: we have next-net and next-crypto.

> > For those non-active pmds, I think it's okay to let the generic
> > pmd committer cover them.
> >
> Not sure what you're getting at here. Low volume pmds (or any library)
> can still go through a subtree. The goal is to fragment the commit work
> so one person doesn't have to do it all.

> > - it's not that wise to me to list all the components we have so far
> >   and make a sub-tree for each of them.
> >
> I think you misunderstood the organization of my last note. I agree
> with you here. When I listed the core and listed several libraries
> under it, my intent was to create a core subtree that accepted patches
> for all of those libraries.

> > For example, some components like librte_{port, pdump, cfgfile, acl,
> > and etc} just have a few (or even just one) commits in the last
> > release. It makes no sense to me to introduce a tree for each of them.
> >
> Yes, this is what I was saying in my last note.

> > Another thought is we could also create sub-trees based on category
> > but not on components like Neil suggested, especially since EAL looks
> > way too big to be maintained in one tree. Instead, it could be
> > something like:
> >
> > - a tree for BSD
> >
> This gets tricky, because then seve
[dpdk-dev] [PATCH] doc: introduce PVP reference benchmark
First of all, thanks for the doc! It's a great one.

On Wed, Nov 23, 2016 at 10:00:06PM +0100, Maxime Coquelin wrote:
> +Qemu build
> +~~~~~~~~~~
> +
> +    .. code-block:: console
> +
> +        git clone git://dpdk.org/dpdk
> +        cd dpdk
> +        export RTE_SDK=$PWD
> +        make install T=x86_64-native-linuxapp-gcc DESTDIR=install

It's actually the DPDK build.

I will take a closer look at it, and also render it to see how it looks,
when I get back to the office next week.

	--yliu

> +
> +DPDK build
> +~~~~~~~~~~
> +
> +    .. code-block:: console
> +
> +        git clone git://dpdk.org/dpdk
> +        cd dpdk
> +        export RTE_SDK=$PWD
> +        make install T=x86_64-native-linuxapp-gcc DESTDIR=install
> +
[dpdk-dev] Proposal for a new Committer model
On Wed, Nov 23, 2016 at 09:11:54AM -0500, Neil Horman wrote:
> > Could we define some of the potential subtrees now and look to
> > introduce them in this release cycle? EAL and the Core libs, as
> > suggested by Thomas, seem like 2 obvious ones.
> >
> Sure, I'd suggest the following:

I would pull the git history to see which components are in active status
in the last release (or even, in the last few releases), and try to make
a sub-tree if the corresponding component is hot.

# the 2nd column shows how many patches are prefixed with a related component
[yliu at yliu-dev ~/dpdk]$ git log --oneline v16.07..v16.11 | awk '{print $2}' | \
    sort | uniq -c | sort -nr | head -30 | nl
 1  52 doc:
 2  40 net/ixgbe/base:
 3  38 app/test:
 4  37 kni:
 5  27 vhost:
 6  27 net/virtio:
 7  27 net/mlx5:
 8  26 app/testpmd:
 9  25 net/i40e:
10  23 net/pcap:
11  22 net/bnxt:
12  20 net/enic:
13  18 net/qede:
14  17 net/thunderx:
15  16 net/qede/base:
16  16 eal:
17  15 net/ixgbe:
18  14 net:
19  14 crypto/qat:
20  13 scripts:
21  13 net/bnx2x:
22  12 net/i40e/base:
23  12 examples/ipsec-secgw:
24  11 mbuf:
25  11 hash:
26  10 lib:
27  10 examples/ip_pipeline:
28  10 ethdev:
29   9 pci:
30   7 net/vmxnet3:
...
46   3 pdump:
47   3 net/virtio_user:
48   3 net/ring:
49   3 net/nfp:
50   3 net/mlx:
51   3 net/ena:
52   3 net/e1000:
53   3 net/bonding:
...
56   2 sched:
57   2 port:
...
65   1 timer:
66   1 remove
67   1 pmdinfogen:
68   1 net/igb:
69   1 net/enic/base:
70   1 meter:
...
84   1 cfgfile:
85   1 app/procinfo:
86   1 app/proc_info:
87   1 acl:

Something obvious is that:

- "doc" deserves a sub-tree, and John is a perfect committer for that
  if he's willing to.

- generally, I'd agree with Neil that most (if not all) pmds may need
  a sub-tree. While some others may not, for example, net/ring and
  net/pcap. For those non-active pmds, I think it's okay to let the
  generic pmd committer cover them.

- it's not that wise to me to list all the components we have so far
  and make a sub-tree for each of them. For example, some components
  like librte_{port, pdump, cfgfile, acl, and etc} just have a few
  (or even just one) commits in the last release. It makes no sense
  to me to introduce a tree for each of them.

Another thought is we could also create sub-trees based on category but
not on components like Neil suggested, especially since EAL looks way
too big to be maintained in one tree. Instead, it could be something
like:

- a tree for BSD
- a tree for ARM (and some other trees for other platforms)
- a tree for mem related (mempool, mbuf, hugepage, etc)
- a tree for BUS
- ...

Last but not least, I think it's generally good to have more and more
trees in the end. But I don't think it's a good idea to go radically
and create all those trees at once (say, in one release). Something I
would like to suggest is one or two (or a bit more) per release. For
example, if I remember them well, we got the next-net tree at 16.04,
next-virtio (including vhost) at 16.07, and a recent one, next-crypto,
at 16.11.

	--yliu

> * net-pmds:
>   - all network pmds located under drivers/net
>   - librte_net
>   - librte_ether
>   - librte_ip_frag
>   - librte_pdump
>   - librte_port
> * crypto-pmds:
>   - all crypto pmds located under drivers/crypto
>   - librte_cryptodev
> * eal:
>   - librte_eal
> * core:
>   - librte_cfgfile
>   - librte_cmdline
>   - librte_compat
>   - librte_kvargs
>   - librte_kni
>   - librte_compat
> * misc:
>   - librte_acl
>   - librte_distributor
>   - librte_hash
>   - librte_jobstats
>   - librte_lpm
>   - librte_meter
>   - librte_pipeline
>   - librte_power
>   - librte_reorder
>   - librte_ring
>   - librte_sched
>   - librte_table
>   - librte_timer
>   - librte_vhost
>
> That's just a rough stab mind you, but perhaps it would get the ball
> rolling. I'd be willing to take maintainership of one of these subtrees
> if there is consensus around my doing so.
[dpdk-dev] dpdk/vpp and cross-version migration for vhost
On Thu, Nov 17, 2016 at 07:37:09PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 17, 2016 at 05:49:36PM +0800, Yuanhan Liu wrote:
> > On Thu, Nov 17, 2016 at 09:47:09AM +0100, Maxime Coquelin wrote:
> > >
> > >
> > > On 11/17/2016 09:29 AM, Yuanhan Liu wrote:
> > > > As usual, sorry for the late response :/
> > > >
> > > > On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote:
> > > > > Hi!
> > > > > So it looks like we face a problem with cross-version
> > > > > migration when using vhost. It's not new but became more
> > > > > acute with the advent of vhost user.
> > > > >
> > > > > For users to be able to migrate between different versions
> > > > > of the hypervisor, the interface exposed to guests
> > > > > by the hypervisor must stay unchanged.
> > > > >
> > > > > The problem is that a qemu device is connected
> > > > > to a backend in another process, so the interface
> > > > > exposed to guests depends on the capabilities of that
> > > > > process.
> > > > >
> > > > > Specifically, for the vhost user interface based on virtio,
> > > > > this includes the "host features" bitmap that defines the
> > > > > interface, as well as more host values such as the max ring
> > > > > size. Adding new features/changing values to this interface
> > > > > is required to make progress, but on the other hand we need
> > > > > the ability to get the old host features to be compatible.
> > > >
> > > > It looks like the same issue as vhost-user reconnect to me.
> > > > For example,
> > > >
> > > > - start dpdk 16.07 & qemu 2.5
> > > > - kill dpdk
> > > > - start dpdk 16.11
> > > >
> > > > Though DPDK 16.11 has more features compared to dpdk 16.07
> > > > (say, indirect), the above should work, because qemu saves the
> > > > negotiated features before the disconnect and stores them back
> > > > after the reconnection.
> > > >
> > > > commit a463215b087c41d7ca94e51aa347cde523831873
> > > > Author: Marc-André Lureau
> > > > Date:   Mon Jun 6 18:45:05 2016 +0200
> > > >
> > > >     vhost-net: save & restore vhost-user acked features
> > > >
> > > >     The initial vhost-user connection sets the features to be
> > > >     negotiated with the driver. Renegotiation isn't possible
> > > >     without device reset.
> > > >
> > > >     To handle reconnection of vhost-user backend, ensure the
> > > >     same set of features are provided, and reuse already acked
> > > >     features.
> > > >
> > > >     Signed-off-by: Marc-André Lureau
> > > >
> > > > So we could do something similar for vhost-user? I mean, save
> > > > the acked features before migration and store them back after
> > > > it. This should be able to keep the compatibility. If the user
> > > > downgrades the DPDK version, it also could be easily detected,
> > > > and we then exit with an error to the user: migration failed
> > > > due to incompatible vhost features.
> > > >
> > > > Just some rough thoughts. Does it make any sense?
> > >
> > > My understanding is that the management tool has to know whether
> > > versions are compatible before initiating the migration:
> >
> > Makes sense. How about getting and restoring the acked features
> > through qemu command lines then, say, through the monitor interface?
> >
> > With that, it would be something like:
> >
> > - start the vhost-user backend (DPDK, VPP, or whatever) & qemu in
> >   the src host
> > - read the acked features (through the monitor interface)
> > - start the vhost-user backend in the dst host
> > - start qemu in the dst host with the just-queried acked features
> >
> > QEMU then is expected to use this feature set for the later
> > vhost-user feature negotiation. Exit if feature compatibility is
> > broken.
> >
> > Thoughts?
> >
> > --yliu
>
> You keep assuming that you have the VM started first and
> figure out things afterwards, but this does not work.
>
> Think about a cluster of machines. You want to start a VM in
> a way that will ensure compatibility with all hosts
> in a cluster.

I see. I was more considering the case when the dst host (including the
qemu and dpdk combo) is given, and then determining whether it will be a
successful migration or not.
[dpdk-dev] 16.07.2 stable patches review and test
Hi all,

Here is a list of patches targeted for the 16.07.2 release. Please help
with reviewing and testing. The planned date for the final release is
Nov 30th. Before that, please shout if anyone has objections to these
patches being applied, or if I missed some important fixes.

These patches are located at branch 16.07 of the dpdk-stable repo:
    http://dpdk.org/browse/dpdk-stable/

Please also note that this will be the last stable release for 16.07.

Thanks.

	--yliu

---
Adrien Mazarguil (1):
      net/mlx5: fix Rx VLAN offload capability report
Ajit Khaparde (1):
      net/bnxt: fix crash when closing
Beilei Xing (1):
      net/i40e: fix floating VEB
Bernard Iremonger (1):
      app/testpmd: fix DCB configuration
David Marchand (1):
      ethdev: fix vendor id in debug message
E. Scott Daniels (1):
      ethdev: prevent duplicate event callback
Eric Kinzie (1):
      net/bonding: validate speed after link up
Ferruh Yigit (4):
      net/ring: fix ring device creation via devargs
      net/bnx2x: fix build with icc
      kni: fix build with kernel 4.8
      kni: fix build with kernel 4.9
Gowrishankar Muthukrishnan (1):
      examples/ip_pipeline: fix plugin loading
Igor Ryzhov (1):
      pci: fix probing error if no driver found
Jasvinder Singh (1):
      examples/qos_sched: fix dequeue from ring
Jeff Guo (1):
      net/i40e: fix hash filter on X722
Jianbo Liu (2):
      eal/arm: fix file descriptor leak when getting CPU features
      eal/ppc: fix file descriptor leak when getting CPU features
Jingjing Wu (2):
      net/i40e: fix DCB configuration
      doc: add limitations for i40e PMD
John Daley (5):
      net/enic: fix flow director
      net/enic: fix crash with removed flow director filters
      net/enic: fix multi-queue Rx performance
      net/enic: fix crash on MTU update or Rx queue reconfigure
      net/enic: fix max packet length check
John W. Linville (4):
      net/ena: improve safety of string handling
      net/bnxt: ensure entry length is unsigned
      net/i40e: do not use VSI before NULL check
      net/bnxt: fix bit shift size
Kamil Rytarowski (1):
      net/thunderx: fix Tx checksum handling
Michael Qiu (2):
      examples/tep_term: fix L4 length
      examples/tep_term: fix packet length with multi-segments
Mohammad Abdul Awal (1):
      app/testpmd: fix RSS hash key size
Nelio Laranjeiro (1):
      net/mlx5: fix Rx checksum macros
Nelson Escobar (3):
      net/enic: revert truncated packets counter fix
      net/enic: document how to configure vNIC parameters
      net/enic: fix Rx queue index when not using Rx scatter
Nipun Gupta (1):
      mempool: fix leak if populate fails
Nélio Laranjeiro (7):
      net/mlx5: fix Rx function selection
      net/mlx5: fix hash key size retrieval
      net/mlx5: support Mellanox OFED 3.4
      net/mlx5: re-factorize functions
      net/mlx5: fix inline logic
      net/mlx5: fix link speed capability information
      net/mlx5: fix support for newer link speeds
Olga Shern (1):
      net/mlx5: fix link status report
Olivier Gournet (1):
      net/mlx5: fix initialization in secondary process
Pablo de Lara (3):
      app/test: fix hash multiwriter sequence
      hash: fix unlimited cuckoo path
      hash: fix bucket size usage
Piotr Azarewicz (1):
      examples/l2fwd-crypto: fix verify with decrypt in chain
Qi Zhang (4):
      net/ixgbe: fix out of order Rx read
      net/fm10k: fix out of order Rx read
      net/i40e: fix Rx hang when disable LLDP
      net/i40e: fix out of order Rx read
Qiming Yang (2):
      net/i40e: fix link status change interrupt
      net/i40e: fix VF bonded device link down
Rasesh Mody (3):
      net/bnx2x: fix maximum PF queues
      net/bnx2x: fix socket id for slowpath memory
      net/qede/base: fix 32-bit build
Raslan Darawsheh (2):
      net/mlx5: fix removing VLAN filter
      net/mlx5: fix handling of small mbuf sizes
Reshma Pattan (2):
      pdump: fix created directory permissions
      app/procinfo: free xstats memory upon failure
Sagi Grimberg (1):
      net/mlx5: fix possible NULL dereference in Rx path
Sergio Gonzalez Monroy (1):
      examples/ipsec-secgw: check SP only when setup
Wei Dai (3):
      lpm: fix freeing unused sub-table on rule delete
      mempool: fix search of maximum contiguous pages
      lpm: fix freeing memory
Wenzhuo Lu (6):
      app/testpmd: fix DCB configuration
      app/testpmd: fix PF/VF check of flow director
      net/ixgbe: fix flow director mask
      app/testpmd: fix flow director mask
      app/testpmd: fix flow director endianness
      net/ixgbe: fix VF registers
Xiao Wang (2):
      net/fm10k: fix Rx checksum flags
      net/fm10k: fix VF Tx queue initialization
Yaacov Hazan (3):
      net/mlx5: fix inconsistent return value in flow director
      net/mlx5: refactor allocation of flow director queues
      net/mlx5: fix flow director drop mode
Yong Wang (1):
      net/vmxnet3: fix mbuf release on reset/stop
Yuanhan Liu (1
[dpdk-dev] dpdk/vpp and cross-version migration for vhost
On Thu, Nov 17, 2016 at 09:47:09AM +0100, Maxime Coquelin wrote: > > > On 11/17/2016 09:29 AM, Yuanhan Liu wrote: > >As usual, sorry for the late response :/ > > > >On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote: > >>Hi! > >>So it looks like we face a problem with cross-version > >>migration when using vhost. It's not new but became more > >>acute with the advent of vhost user. > >> > >>For users to be able to migrate between different versions > >>of the hypervisor the interface exposed to guests > >>by hypervisor must stay unchanged. > >> > >>The problem is that a qemu device is connected > >>to a backend in another process, so the interface > >>exposed to guests depends on the capabilities of that > >>process. > >> > >>Specifically, for vhost user interface based on virtio, this includes > >>the "host features" bitmap that defines the interface, as well as more > >>host values such as the max ring size. Adding new features/changing > >>values to this interface is required to make progress, but on the other > >>hand we need ability to get the old host features to be compatible. > > > >It looks like the same issue as vhost-user reconnect to me. For example, > > > >- start dpdk 16.07 & qemu 2.5 > >- kill dpdk > >- start dpdk 16.11 > > > >Though DPDK 16.11 has more features compared to dpdk 16.07 (say, indirect), > >the above should work, because qemu saves the negotiated features before the > >disconnect and restores them after the reconnection. > > > >commit a463215b087c41d7ca94e51aa347cde523831873 > >Author: Marc-André Lureau > >Date: Mon Jun 6 18:45:05 2016 +0200 > > > >vhost-net: save & restore vhost-user acked features > > > >The initial vhost-user connection sets the features to be negotiated > >with the driver. Renegotiation isn't possible without device reset. > > > >To handle reconnection of vhost-user backend, ensure the same set of > >features are provided, and reuse already acked features. > > > >Signed-off-by: Marc-André 
Lureau > > > > > >So could we do something similar for vhost-user? I mean, save the acked features > >before migration and restore them after it. This should be able to > >keep the compatibility. If the user downgrades the DPDK version, it could also > >be easily detected, and then exit with an error to the user: migration > >failed due to incompatible vhost features. > > > >Just some rough thoughts. Does that make sense? > > My understanding is that the management tool has to know whether > versions are compatible before initiating the migration: Makes sense. How about getting and restoring the acked features through qemu command lines then, say, through the monitor interface? With that, it would be something like: - start the vhost-user backend (DPDK, VPP, or whatever) & qemu in the src host - read the acked features (through the monitor interface) - start the vhost-user backend in the dst host - start qemu in the dst host with the acked features just queried QEMU is then expected to use this feature set for the later vhost-user feature negotiation, and to exit if feature compatibility is broken. Thoughts? --yliu > 1. The downtime could be unpredictable if a VM has to move from host > to host multiple times, which is problematic, especially for NFV. > 2. If migration is not possible, maybe the management tool would > prefer not to interrupt the VM on the current host. > > I have little experience with migration though, so I could be mistaken. > > Thanks, > Maxime > > > > > --yliu > >> > >>To solve this problem within qemu, qemu has a versioning system based on > >>a machine type concept which fundamentally is a version string, by > >>specifying that string one can get hardware compatible with a previous > >>qemu version. QEMU also reports the latest version and list of versions > >>supported so libvirt records the version at VM creation and then is > >>careful to use this machine version whenever it migrates a VM. > >> > >>One might wonder how is this solved with a kernel vhost backend. 
The > >>answer is that it mostly isn't - instead an assumption is made, that > >>qemu versions are deployed together with the kernel - this is generally > >>true for downstreams. Thus whenever qemu gains a new feature, it is > >>already supported by the kernel as well. However, if one attempts > >>migration with a new qemu fr
[dpdk-dev] dpdk/vpp and cross-version migration for vhost
As usual, sorry for the late response :/ On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote: > Hi! > So it looks like we face a problem with cross-version > migration when using vhost. It's not new but became more > acute with the advent of vhost user. > > For users to be able to migrate between different versions > of the hypervisor the interface exposed to guests > by hypervisor must stay unchanged. > > The problem is that a qemu device is connected > to a backend in another process, so the interface > exposed to guests depends on the capabilities of that > process. > > Specifically, for vhost user interface based on virtio, this includes > the "host features" bitmap that defines the interface, as well as more > host values such as the max ring size. Adding new features/changing > values to this interface is required to make progress, but on the other > hand we need ability to get the old host features to be compatible. It looks like the same issue as vhost-user reconnect to me. For example, - start dpdk 16.07 & qemu 2.5 - kill dpdk - start dpdk 16.11 Though DPDK 16.11 has more features compared to dpdk 16.07 (say, indirect), the above should work, because qemu saves the negotiated features before the disconnect and restores them after the reconnection. commit a463215b087c41d7ca94e51aa347cde523831873 Author: Marc-André Lureau Date: Mon Jun 6 18:45:05 2016 +0200 vhost-net: save & restore vhost-user acked features The initial vhost-user connection sets the features to be negotiated with the driver. Renegotiation isn't possible without device reset. To handle reconnection of vhost-user backend, ensure the same set of features are provided, and reuse already acked features. Signed-off-by: Marc-André Lureau So could we do something similar for vhost-user? I mean, save the acked features before migration and restore them after it. This should be able to keep the compatibility. 
If the user downgrades the DPDK version, it could also be easily detected, and then exit with an error to the user: migration failed due to incompatible vhost features. Just some rough thoughts. Does that make sense? --yliu > > To solve this problem within qemu, qemu has a versioning system based on > a machine type concept which fundamentally is a version string, by > specifying that string one can get hardware compatible with a previous > qemu version. QEMU also reports the latest version and list of versions > supported so libvirt records the version at VM creation and then is > careful to use this machine version whenever it migrates a VM. > > One might wonder how is this solved with a kernel vhost backend. The > answer is that it mostly isn't - instead an assumption is made, that > qemu versions are deployed together with the kernel - this is generally > true for downstreams. Thus whenever qemu gains a new feature, it is > already supported by the kernel as well. However, if one attempts > migration with a new qemu from a system with a new to old kernel, one > would get a failure. > > In the world where we have multiple userspace backends, with some of > these supplied by ISVs, this seems non-realistic. > > IMO we need to support vhost backend versioning, ideally > in a way that will also work for vhost kernel backends. > > So I'd like to get some input from both backend and management > developers on what a good solution would look like. > > If we want to emulate the qemu solution, this involves adding the > concept of interface versions to dpdk. For example, dpdk could supply a > file (or utility printing?) with list of versions: latest and versions > supported. libvirt could read that and > - store latest version at vm creation > - pass it around with the vm > - pass it to qemu > > From here, qemu could pass this over the vhost-user channel, > thus making sure it's initialized with the correct > compatible interface. 
> > As version here is an opaque string for libvirt and qemu, > anything can be used - but I suggest either a list > of values defining the interface, e.g. > any_layout=on,max_ring=256 > or a version including the name and vendor of the backend, > e.g. "org.dpdk.v4.5.6". > > Note that typically the list of supported versions can only be > extended, not shrunk. Also, if the host/guest interface > does not change, don't change the current version as > this just creates work for everyone. > > Thoughts? Would this work well for management? dpdk? vpp? > > Thanks! > > -- > MST
[dpdk-dev] [PATCH v2 10/10] net/virtio: fix multiple queue enabling
On Sat, Nov 05, 2016 at 05:41:05PM +0800, Yuanhan Liu wrote: > When queue number shrinks to 1 from X, the following code stops us > sending the multiple queue ctrl message: > > if (nb_queues > 1) { > if (virtio_set_multiple_queues(dev, nb_queues) != 0) > return -EINVAL; > } > > This ends up with still X queues being enabled, which is obviously > wrong. Fix it by removing the check. > > Fixes: 823ad647950a ("virtio: support multiple queues") > > Signed-off-by: Yuanhan Liu This breaks the virtio-user case, where the ctrl queue is not enabled by default. Here is an updated patch that fixes this. --yliu --- From 22502943764a99b1398d40f0110f8ce28323323a Mon Sep 17 00:00:00 2001 From: Yuanhan Liu <yuanhan@linux.intel.com> Date: Sat, 5 Nov 2016 16:53:27 +0800 Subject: [PATCH v3 10/10] net/virtio: fix multiple queue enabling When queue number shrinks to 1 from X, the following code stops us sending the multiple queue ctrl message: if (nb_queues > 1) { if (virtio_set_multiple_queues(dev, nb_queues) != 0) return -EINVAL; } This ends up with still X queues being enabled, which is obviously wrong. Fix it by replacing the check with a check on whether multiqueue is enabled. Fixes: 823ad647950a ("virtio: support multiple queues") Signed-off-by: Yuanhan Liu --- v3: - fix the virtio-user case, where the ctrl queue is disabled by default. Thus virtio_set_multiple_queues fails. 
--- drivers/net/virtio/virtio_ethdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index d70bd00..18da98f 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1472,6 +1472,7 @@ virtio_dev_start(struct rte_eth_dev *dev) uint16_t nb_queues, i; struct virtnet_rx *rxvq; struct virtnet_tx *txvq __rte_unused; + struct virtio_hw *hw = dev->data->dev_private; /* check if lsc interrupt feature is enabled */ if (dev->data->dev_conf.intr_conf.lsc) { @@ -1494,7 +1495,7 @@ virtio_dev_start(struct rte_eth_dev *dev) *vhost backend will have no chance to be waked up */ nb_queues = RTE_MAX(dev->data->nb_rx_queues, dev->data->nb_tx_queues); - if (nb_queues > 1) { + if (hw->max_queue_pairs > 1) { if (virtio_set_multiple_queues(dev, nb_queues) != 0) return -EINVAL; } -- 1.9.0
[dpdk-dev] [PATCH v2 09/10] net/virtio: fix less queues being enabled issue
From the virtio spec's point of view, multiple queues are always enabled/disabled in queue pairs. DPDK somehow allows the case where the Tx and Rx queue numbers differ. Currently, the virtio PMD gets the queue pair number from the nb_rx_queues field, which could be an issue when the Tx queue number > Rx queue number. Say, 2 Tx queues and 1 Rx queue: this ends up with only 1 queue pair being enabled, which is wrong. The fix is straightforward: just pick the bigger number and enable that many queue pairs. Fixes: 823ad647950a ("virtio: support multiple queues") Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 5ec3e0e..d70bd00 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1493,7 +1493,7 @@ virtio_dev_start(struct rte_eth_dev *dev) *Otherwise the tap backend might already stop its queue due to fullness. *vhost backend will have no chance to be waked up */ - nb_queues = dev->data->nb_rx_queues; + nb_queues = RTE_MAX(dev->data->nb_rx_queues, dev->data->nb_tx_queues); if (nb_queues > 1) { if (virtio_set_multiple_queues(dev, nb_queues) != 0) return -EINVAL; @@ -1501,7 +1501,7 @@ virtio_dev_start(struct rte_eth_dev *dev) PMD_INIT_LOG(DEBUG, "nb_queues=%d", nb_queues); - for (i = 0; i < nb_queues; i++) { + for (i = 0; i < dev->data->nb_rx_queues; i++) { rxvq = dev->data->rx_queues[i]; virtqueue_notify(rxvq->vq); } -- 1.9.0
[dpdk-dev] [PATCH v2 08/10] net/virtio: remove started field
The "hw->started" field was introduced to stop touching queues on restart. We never touch queues on restart any more, thus it's safe to remove this flag. Signed-off-by: Yuanhan Liu Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 15 ++- drivers/net/virtio/virtio_pci.h | 1 - 2 files changed, 2 insertions(+), 14 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 15c48b9..5ec3e0e 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -602,7 +602,6 @@ virtio_dev_close(struct rte_eth_dev *dev) if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR); vtpci_reset(hw); - hw->started = 0; virtio_dev_free_mbufs(dev); virtio_free_queues(hw); } @@ -1345,17 +1344,14 @@ static int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev) { struct rte_pci_device *pci_dev; - struct virtio_hw *hw = eth_dev->data->dev_private; PMD_INIT_FUNC_TRACE(); if (rte_eal_process_type() == RTE_PROC_SECONDARY) return -EPERM; - if (hw->started == 1) { - virtio_dev_stop(eth_dev); - virtio_dev_close(eth_dev); - } + virtio_dev_stop(eth_dev); + virtio_dev_close(eth_dev); pci_dev = eth_dev->pci_dev; eth_dev->dev_ops = NULL; @@ -1474,7 +1470,6 @@ static int virtio_dev_start(struct rte_eth_dev *dev) { uint16_t nb_queues, i; - struct virtio_hw *hw = dev->data->dev_private; struct virtnet_rx *rxvq; struct virtnet_tx *txvq __rte_unused; @@ -1494,12 +1489,6 @@ virtio_dev_start(struct rte_eth_dev *dev) /* Initialize Link state */ virtio_dev_link_update(dev, 0); - /* On restart after stop do not touch queues */ - if (hw->started) - return 0; - - hw->started = 1; - /*Notify the backend *Otherwise the tap backend might already stop its queue due to fullness. 
*vhost backend will have no chance to be waked up diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index f63f76c..de271bf 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -252,7 +252,6 @@ struct virtio_hw { uint16_t vtnet_hdr_size; uint8_t vlan_strip; uint8_t use_msix; - uint8_t started; uint8_t modern; uint8_t use_simple_rxtx; uint8_t mac_addr[ETHER_ADDR_LEN]; -- 1.9.0
[dpdk-dev] [PATCH v2 07/10] net/virtio: complete init stage at the right place
Invoking vtpci_reinit_complete() at port start stage doesn't make any sense; instead, it should be done at the end of the dev init stage. So move it there. Signed-off-by: Yuanhan Liu Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index a3e2aa9..15c48b9 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1277,6 +1277,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features) ret = virtio_alloc_queues(eth_dev); if (ret < 0) return ret; + vtpci_reinit_complete(hw); if (pci_dev) PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x", @@ -1497,8 +1498,6 @@ virtio_dev_start(struct rte_eth_dev *dev) if (hw->started) return 0; - vtpci_reinit_complete(hw); - hw->started = 1; /*Notify the backend -- 1.9.0
[dpdk-dev] [PATCH v2 06/10] net/virtio: move queue configure code to proper place
The only remaining code in virtio_dev_rxtx_start() actually does queue configure/setup work, so move it to the corresponding queue_setup callback. Once that is done, virtio_dev_rxtx_start() becomes an empty function and is removed. Signed-off-by: Yuanhan Liu Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 2 - drivers/net/virtio/virtio_ethdev.h | 2 - drivers/net/virtio/virtio_rxtx.c | 186 - 3 files changed, 78 insertions(+), 112 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 82dcc97..a3e2aa9 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1497,8 +1497,6 @@ virtio_dev_start(struct rte_eth_dev *dev) if (hw->started) return 0; - /* Do final configuration before rx/tx engine starts */ - virtio_dev_rxtx_start(dev); vtpci_reinit_complete(hw); hw->started = 1; diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 8a3fa6d..27d9a19 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -78,8 +78,6 @@ void virtio_dev_cq_start(struct rte_eth_dev *dev); /* * RX/TX function prototypes */ -void virtio_dev_rxtx_start(struct rte_eth_dev *dev); - int virtio_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id, uint16_t nb_rx_desc, unsigned int socket_id, const struct rte_eth_rxconf *rx_conf, diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 24129d6..22d97a4 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -388,112 +388,6 @@ virtio_dev_cq_start(struct rte_eth_dev *dev) } } -void -virtio_dev_rxtx_start(struct rte_eth_dev *dev) -{ - /* -* Start receive and transmit vrings -* -Setup vring structure for all queues -* -Initialize descriptor for the rx vring -* -Allocate blank mbufs for the each rx descriptor -* -*/ - uint16_t i; - uint16_t desc_idx; - struct virtio_hw *hw = dev->data->dev_private; 
- - PMD_INIT_FUNC_TRACE(); - - /* Start rx vring. */ - for (i = 0; i < dev->data->nb_rx_queues; i++) { - struct virtnet_rx *rxvq = dev->data->rx_queues[i]; - struct virtqueue *vq = rxvq->vq; - int error, nbufs; - struct rte_mbuf *m; - - if (rxvq->mpool == NULL) { - rte_exit(EXIT_FAILURE, - "Cannot allocate mbufs for rx virtqueue"); - } - - /* Allocate blank mbufs for the each rx descriptor */ - nbufs = 0; - error = ENOSPC; - - if (hw->use_simple_rxtx) { - for (desc_idx = 0; desc_idx < vq->vq_nentries; -desc_idx++) { - vq->vq_ring.avail->ring[desc_idx] = desc_idx; - vq->vq_ring.desc[desc_idx].flags = - VRING_DESC_F_WRITE; - } - } - - memset(&rxvq->fake_mbuf, 0, sizeof(rxvq->fake_mbuf)); - for (desc_idx = 0; desc_idx < RTE_PMD_VIRTIO_RX_MAX_BURST; -desc_idx++) { - vq->sw_ring[vq->vq_nentries + desc_idx] = - &rxvq->fake_mbuf; - } - - while (!virtqueue_full(vq)) { - m = rte_mbuf_raw_alloc(rxvq->mpool); - if (m == NULL) - break; - - /** - * Enqueue allocated buffers* - ***/ - if (hw->use_simple_rxtx) - error = virtqueue_enqueue_recv_refill_simple(vq, m); - else - error = virtqueue_enqueue_recv_refill(vq, m); - - if (error) { - rte_pktmbuf_free(m); - break; - } - nbufs++; - } - - vq_update_avail_idx(vq); - - PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs); - - VIRTQUEUE_DUMP(vq); - } - - /* Start tx vring. */ - for (i = 0; i < dev->data->nb_tx_queues; i++) { - struct virtnet_tx *txvq = dev->data->tx_queues[i]; - struct virtqueue *vq = txvq->vq; - - if (hw->use_simple_rxtx) { -
[dpdk-dev] [PATCH v2 05/10] net/virtio: initiate vring at init stage
virtio_dev_vring_start() actually does the vring initialization job, which should be done at the dev init stage, as stated in great detail in the former commit. So move it there, and rename it to virtio_init_vring(). Signed-off-by: Yuanhan Liu Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 32 +++- drivers/net/virtio/virtio_rxtx.c | 32 2 files changed, 31 insertions(+), 33 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 67a175d..82dcc97 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -308,6 +308,35 @@ virtio_get_nr_vq(struct virtio_hw *hw) return nr_vq; } +static void +virtio_init_vring(struct virtqueue *vq) +{ + int size = vq->vq_nentries; + struct vring *vr = &vq->vq_ring; + uint8_t *ring_mem = vq->vq_ring_virt_mem; + + PMD_INIT_FUNC_TRACE(); + + /* +* Reinitialise since virtio port might have been stopped and restarted +*/ + memset(ring_mem, 0, vq->vq_ring_size); + vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN); + vq->vq_used_cons_idx = 0; + vq->vq_desc_head_idx = 0; + vq->vq_avail_idx = 0; + vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1); + vq->vq_free_cnt = vq->vq_nentries; + memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries); + + vring_desc_init(vr->desc, size); + + /* +* Disable device(host) interrupting guest +*/ + virtqueue_disable_intr(vq); +} + static int virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) { @@ -371,7 +400,6 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) vq->hw = hw; vq->vq_queue_index = vtpci_queue_idx; vq->vq_nentries = vq_size; - vq->vq_free_cnt = vq_size; /* * Reserve a memzone for vring elements @@ -402,6 +430,8 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%" PRIx64, (uint64_t)(uintptr_t)mz->addr); + virtio_init_vring(vq); + if (sz_hdr_mz) { 
snprintf(vq_hdr_name, sizeof(vq_hdr_name), "port%d_vq%d_hdr", dev->data->port_id, vtpci_queue_idx); diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 6e7ff27..24129d6 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -378,42 +378,12 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, vq_update_avail_ring(vq, head_idx); } -static void -virtio_dev_vring_start(struct virtqueue *vq) -{ - int size = vq->vq_nentries; - struct vring *vr = &vq->vq_ring; - uint8_t *ring_mem = vq->vq_ring_virt_mem; - - PMD_INIT_FUNC_TRACE(); - - /* -* Reinitialise since virtio port might have been stopped and restarted -*/ - memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size); - vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN); - vq->vq_used_cons_idx = 0; - vq->vq_desc_head_idx = 0; - vq->vq_avail_idx = 0; - vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1); - vq->vq_free_cnt = vq->vq_nentries; - memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries); - - vring_desc_init(vr->desc, size); - - /* -* Disable device(host) interrupting guest -*/ - virtqueue_disable_intr(vq); -} - void virtio_dev_cq_start(struct rte_eth_dev *dev) { struct virtio_hw *hw = dev->data->dev_private; if (hw->cvq && hw->cvq->vq) { - virtio_dev_vring_start(hw->cvq->vq); VIRTQUEUE_DUMP((struct virtqueue *)hw->cvq->vq); } } @@ -441,7 +411,6 @@ virtio_dev_rxtx_start(struct rte_eth_dev *dev) int error, nbufs; struct rte_mbuf *m; - virtio_dev_vring_start(vq); if (rxvq->mpool == NULL) { rte_exit(EXIT_FAILURE, "Cannot allocate mbufs for rx virtqueue"); @@ -499,7 +468,6 @@ virtio_dev_rxtx_start(struct rte_eth_dev *dev) struct virtnet_tx *txvq = dev->data->tx_queues[i]; struct virtqueue *vq = txvq->vq; - virtio_dev_vring_start(vq); if (hw->use_simple_rxtx) { uint16_t mid_idx = vq->vq_nentries >> 1; -- 1.9.0
[dpdk-dev] [PATCH v2 04/10] net/virtio: allocate queue at init stage
Queue allocation should be done once, since the queue-related info (such as the vring address) is only communicated to the vhost-user backend once, unless the virtio device is reset. That means, if you allocate queues again after the vhost-user negotiation, the vhost-user backend will not be informed any more, leading to a state where the vring info mismatches between the virtio PMD driver and the vhost backend: the driver switches to the new address that has just been allocated, while the vhost backend still sticks to the old address that was assigned in the init stage. Unfortunately, that is exactly how the virtio driver is coded so far: queue allocation is done at queue_setup stage (when rte_eth_tx/rx_queue_setup is invoked). This is wrong, because queue_setup can be invoked several times. For example, $ start_testpmd.sh ... --txq=1 --rxq=1 ... > port stop 0 > port config all txq 1 # just trigger the queue_setup callback again > port config all rxq 1 > port start 0 The right way is to allocate the queues in the init stage, so that the vring info stays persistent with the vhost-user backend. Besides that, we should allocate the max queue pairs the device supports, not the number of queue pairs configured first, to make the following case work. $ start_testpmd.sh ... --txq=1 --rxq=1 ... > port stop 0 > port config all txq 2 > port config all rxq 2 > port start 0 Since the allocation is moved to the init stage, the free should also be moved from rx/tx_queue_release to the dev close stage. That leaves rx/tx_queue_release() with nothing to do; it becomes an empty implementation. 
Signed-off-by: Yuanhan Liu --- v2: - free queues on dev close --- drivers/net/virtio/virtio_ethdev.c | 162 + drivers/net/virtio/virtio_ethdev.h | 14 drivers/net/virtio/virtio_pci.h| 2 + drivers/net/virtio/virtio_rxtx.c | 83 +-- drivers/net/virtio/virtqueue.h | 1 - 5 files changed, 111 insertions(+), 151 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 5a2c14b..67a175d 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -280,28 +280,36 @@ virtio_set_multiple_queues(struct rte_eth_dev *dev, uint16_t nb_queues) return 0; } -void -virtio_dev_queue_release(struct virtqueue *vq) +static void +virtio_dev_queue_release(void *queue __rte_unused) { - struct virtio_hw *hw; + /* do nothing */ +} - if (vq) { - hw = vq->hw; - if (vq->configured) - hw->vtpci_ops->del_queue(hw, vq); +static int +virtio_get_queue_type(struct virtio_hw *hw, uint16_t vtpci_queue_idx) +{ + if (vtpci_queue_idx == hw->max_queue_pairs * 2) + return VTNET_CQ; + else if (vtpci_queue_idx % 2 == 0) + return VTNET_RQ; + else + return VTNET_TQ; +} - rte_free(vq->sw_ring); - rte_free(vq); - } +static uint16_t +virtio_get_nr_vq(struct virtio_hw *hw) +{ + uint16_t nr_vq = hw->max_queue_pairs * 2; + + if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) + nr_vq += 1; + + return nr_vq; } -int virtio_dev_queue_setup(struct rte_eth_dev *dev, - int queue_type, - uint16_t queue_idx, - uint16_t vtpci_queue_idx, - uint16_t nb_desc, - unsigned int socket_id, - void **pvq) +static int +virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) { char vq_name[VIRTQUEUE_MAX_NAME_SZ]; char vq_hdr_name[VIRTQUEUE_MAX_NAME_SZ]; @@ -314,6 +322,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, struct virtqueue *vq; size_t sz_hdr_mz = 0; void *sw_ring = NULL; + int queue_type = virtio_get_queue_type(hw, vtpci_queue_idx); int ret; PMD_INIT_LOG(DEBUG, "setting up queue: %u", vtpci_queue_idx); @@ -351,18 +360,18 @@ int 
virtio_dev_queue_setup(struct rte_eth_dev *dev, sz_hdr_mz = PAGE_SIZE; } - vq = rte_zmalloc_socket(vq_name, size, RTE_CACHE_LINE_SIZE, socket_id); + vq = rte_zmalloc_socket(vq_name, size, RTE_CACHE_LINE_SIZE, + SOCKET_ID_ANY); if (vq == NULL) { PMD_INIT_LOG(ERR, "can not allocate vq"); return -ENOMEM; } + hw->vqs[vtpci_queue_idx] = vq; + vq->hw = hw; vq->vq_queue_index = vtpci_queue_idx; vq->vq_nentries = vq_size; - - if (nb_desc == 0 || nb_desc > vq_size) - nb_desc = vq_size; - vq->vq_free_cnt = nb_desc; + vq->vq_free_cnt = vq_size; /* * Reserve a memzone for vring elements @@ -372,7 +381,8 @@ int virti
[dpdk-dev] [PATCH v2 03/10] net/virtio: simplify queue allocation
Let rxq/txq/cq be the union field of the virtqueue struct. This simplifies the vq allocation a bit: we don't need to calculate the vq_size any more based on the queue type. Signed-off-by: Yuanhan Liu Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 18 +++--- drivers/net/virtio/virtqueue.h | 7 +++ 2 files changed, 14 insertions(+), 11 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index d082df5..5a2c14b 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -312,7 +312,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, struct virtnet_tx *txvq = NULL; struct virtnet_ctl *cvq = NULL; struct virtqueue *vq; - size_t sz_vq, sz_q = 0, sz_hdr_mz = 0; + size_t sz_hdr_mz = 0; void *sw_ring = NULL; int ret; @@ -337,25 +337,21 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, snprintf(vq_name, sizeof(vq_name), "port%d_vq%d", dev->data->port_id, vtpci_queue_idx); - sz_vq = RTE_ALIGN_CEIL(sizeof(*vq) + + size = RTE_ALIGN_CEIL(sizeof(*vq) + vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE); - if (queue_type == VTNET_RQ) { - sz_q = sz_vq + sizeof(*rxvq); - } else if (queue_type == VTNET_TQ) { - sz_q = sz_vq + sizeof(*txvq); + if (queue_type == VTNET_TQ) { /* * For each xmit packet, allocate a virtio_net_hdr * and indirect ring elements */ sz_hdr_mz = vq_size * sizeof(struct virtio_tx_region); } else if (queue_type == VTNET_CQ) { - sz_q = sz_vq + sizeof(*cvq); /* Allocate a page for control vq command, data and status */ sz_hdr_mz = PAGE_SIZE; } - vq = rte_zmalloc_socket(vq_name, sz_q, RTE_CACHE_LINE_SIZE, socket_id); + vq = rte_zmalloc_socket(vq_name, size, RTE_CACHE_LINE_SIZE, socket_id); if (vq == NULL) { PMD_INIT_LOG(ERR, "can not allocate vq"); return -ENOMEM; @@ -425,14 +421,14 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, } vq->sw_ring = sw_ring; - rxvq = (struct virtnet_rx *)RTE_PTR_ADD(vq, sz_vq); + rxvq = &vq->rxq; rxvq->vq = vq; 
rxvq->port_id = dev->data->port_id; rxvq->queue_id = queue_idx; rxvq->mz = mz; *pvq = rxvq; } else if (queue_type == VTNET_TQ) { - txvq = (struct virtnet_tx *)RTE_PTR_ADD(vq, sz_vq); + txvq = &vq->txq; txvq->vq = vq; txvq->port_id = dev->data->port_id; txvq->queue_id = queue_idx; @@ -442,7 +438,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, *pvq = txvq; } else if (queue_type == VTNET_CQ) { - cvq = (struct virtnet_ctl *)RTE_PTR_ADD(vq, sz_vq); + cvq = &vq->cq; cvq->vq = vq; cvq->mz = mz; cvq->virtio_net_hdr_mz = hdr_mz; diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h index ef0027b..bbeb2f2 100644 --- a/drivers/net/virtio/virtqueue.h +++ b/drivers/net/virtio/virtqueue.h @@ -44,6 +44,7 @@ #include "virtio_pci.h" #include "virtio_ring.h" #include "virtio_logs.h" +#include "virtio_rxtx.h" struct rte_mbuf; @@ -191,6 +192,12 @@ struct virtqueue { void *vq_ring_virt_mem; /**< linear address of vring*/ unsigned int vq_ring_size; + union { + struct virtnet_rx rxq; + struct virtnet_tx txq; + struct virtnet_ctl cq; + }; + phys_addr_t vq_ring_mem; /**< physical address of vring, * or virtual address for virtio_user. */ -- 1.9.0
[dpdk-dev] [PATCH v2 01/10] net/virtio: revert fix restart
This reverts commit 9a0615af7746 ("virtio: fix restart"); a conflict is manually addressed. Kyle reported an issue with the above commit, qemu-kvm: Guest moved used index from 5 to 1 with the following steps, 1) Start my virtio interfaces 2) Send some traffic into/out of the interfaces 3) Stop the interfaces 4) Start the interfaces 5) Send some more traffic And here are some quotes from Kyle's analysis, Prior to the patch, if an interface were stopped then started, without restarting the application, the queues would be left as-is, because hw->started would be set to 1. Now, calling stop sets hw->started to 0, which means the next call to start will "touch the queues". This is the unintended side-effect that causes the problem. We should not touch the queues once the init is done, otherwise, the vring state of the virtio PMD driver and vhost-user would be inconsistent, leading to issues like the one above. Thus this patch is reverted. Fixes: 9a0615af7746 ("virtio: fix restart") Cc: Jianfeng Tan Cc: Reported-by: Kyle Larose Signed-off-by: Yuanhan Liu Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index b388134..5815875 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -550,9 +550,6 @@ virtio_dev_close(struct rte_eth_dev *dev) PMD_INIT_LOG(DEBUG, "virtio_dev_close"); - if (hw->started == 1) - virtio_dev_stop(dev); - if (hw->cvq) virtio_dev_queue_release(hw->cvq->vq); @@ -560,6 +557,7 @@ virtio_dev_close(struct rte_eth_dev *dev) if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR); vtpci_reset(hw); + hw->started = 0; virtio_dev_free_mbufs(dev); virtio_free_queues(dev); } @@ -1296,15 +1294,17 @@ static int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev) { struct rte_pci_device *pci_dev; + struct virtio_hw *hw = eth_dev->data->dev_private; 
PMD_INIT_FUNC_TRACE(); if (rte_eal_process_type() == RTE_PROC_SECONDARY) return -EPERM; - /* Close it anyway since there's no way to know if closed */ - virtio_dev_close(eth_dev); - + if (hw->started == 1) { + virtio_dev_stop(eth_dev); + virtio_dev_close(eth_dev); + } pci_dev = eth_dev->pci_dev; eth_dev->dev_ops = NULL; @@ -1543,12 +1543,9 @@ static void virtio_dev_stop(struct rte_eth_dev *dev) { struct rte_eth_link link; - struct virtio_hw *hw = dev->data->dev_private; PMD_INIT_LOG(DEBUG, "stop"); - hw->started = 0; - if (dev->data->dev_conf.intr_conf.lsc) rte_intr_disable(&dev->pci_dev->intr_handle); -- 1.9.0
[dpdk-dev] [PATCH v2 00/10] net/virtio: fix queue reconfigure issue
This patchset fixes a few issues related to virtio queue reconfiguration: increasing or shrinking the queue number. The major issue, and the reasoning behind the fix, is described at length in patch 4 "net/virtio: allocate queue at init stage". Those bugs cannot be fixed with just a few lines of code, because the current driver init logic is quite wrong, so I need to change quite a few places to make it right. Meanwhile, I have done my best to keep the changes as minimal as possible, so that fewer changes risk breaking something else; it also makes review easier. v2: - fix two more minor issues regarding queue enabling; see patches 9 and 10. - refined the commit log a bit. Thanks. --yliu --- Yuanhan Liu (10): net/virtio: revert fix restart net/virtio: simplify queue memzone name net/virtio: simplify queue allocation net/virtio: allocate queue at init stage net/virtio: initiate vring at init stage net/virtio: move queue configure code to proper place net/virtio: complete init stage at the right place net/virtio: remove started field net/virtio: fix less queues being enabled issue net/virtio: fix multiple queue enabling drivers/net/virtio/virtio_ethdev.c | 248 +-- drivers/net/virtio/virtio_ethdev.h | 16 -- drivers/net/virtio/virtio_pci.h| 3 +- drivers/net/virtio/virtio_rxtx.c | 291 - drivers/net/virtio/virtqueue.h | 7 + 5 files changed, 237 insertions(+), 328 deletions(-) -- 1.9.0
[dpdk-dev] [PATCH 4/8] net/virtio: allocate queue at init stage
On Fri, Nov 04, 2016 at 08:30:00PM +, Kevin Traynor wrote: > On 11/04/2016 03:21 PM, Kevin Traynor wrote: > > On 11/03/2016 04:09 PM, Yuanhan Liu wrote: > >> Queue allocation should be done once, since the queue related info (such > >> as vring addreess) will only be informed to the vhost-user backend once > >> without virtio device reset. > >> > >> That means, if you allocate queues again after the vhost-user negotiation, > >> the vhost-user backend will not be informed any more. Leading to a state > >> that the vring info mismatches between virtio PMD driver and vhost-backend: > >> the driver switches to the new address has just been allocated, while the > >> vhost-backend still sticks to the old address has been assigned in the init > >> stage. > >> > >> Unfortunately, that is exactly how the virtio driver is coded so far: queue > >> allocation is done at queue_setup stage (when rte_eth_tx/rx_queue_setup is > >> invoked). This is wrong, because queue_setup can be invoked several times. > >> For example, > >> > >> $ start_testpmd.sh ... --txq=1 --rxq=1 ... > >> > port stop 0 > >> > port config all txq 1 # just trigger the queue_setup callback again > >> > port config all rxq 1 > >> > port start 0 > >> > >> The right way to do is allocate the queues in the init stage, so that the > >> vring info could be persistent with the vhost-user backend. > >> > >> Besides that, we should allocate max_queue pairs the device supports, but > >> not nr queue pairs firstly configured, to make following case work. > >> > >> $ start_testpmd.sh ... --txq=1 --rxq=1 ... > >> > port stop 0 > >> > port config all txq 2 > >> > port config all rxq 2 > >> > port start 0 > > > > hi Yuanhan, firstly - thanks for this patchset. It is certainly needed > > to fix the silent failure after increase num q's. > > > > I tried a few tests and I'm seeing an issue. 
I can stop the port, > > increase the number of queues and traffic is ok, but if I try to > > decrease the number of queues it hangs on port start. I'm running head > > of the master with your patches in the guest and 16.07 in the host. > > > > $ testpmd -c 0x5f -n 4 --socket-mem 1024 -- --burst=64 -i > > --disable-hw-vlan --rxq=2 --txq=2 --rxd=256 --txd=256 --forward-mode=io > >> port stop all > >> port config all rxq 1 > >> port config all txq 1 > >> port start all > > Configuring Port 0 (socket 0) > > (hang here) > > > > I've tested a few different scenarios and anytime the queues are > > decreased from the previous number the hang occurs. > > > > I can debug further but wanted to report early as maybe issue is an > > obvious one? Kevin, thanks for testing! Hmm, it's a case I missed: I was thinking/testing more about increasing (but not shrinking) the queue size :( > virtio_dev_start() is getting stuck as soon as it needs to send a That's because the connection is closed (for a bad reason, see details below). You could figure it out quickly from the vhost log: testpmd> VHOST_CONFIG: read message VHOST_USER_GET_VRING_BASE PMD: Connection closed Those messages showed up immediately after you executed "port start all". It's actually triggered by the "rx_queue_release" callback, which in turn invokes the "del_queue" virtio-pci method. QEMU then resets the device once it gets that message. Thus, you saw the above log. Since we now allocate queues once, it no longer makes sense to free those queues and invoke the "del_queue" method in the rx/tx_queue_release callbacks. The queue_release callback will be invoked when we shrink the queue size. With that change, this case works like a charm. I'm about to catch a train soon, but I will try to re-post v2 today, with another minor fix I noticed while checking this issue: we should also send the VIRTIO_NET_CTRL_MQ message when the queue number shrinks from 2 to 1. --yliu
[dpdk-dev] [PATCH v2] net/virtio: cache Rx/Tx offload ability check
It's not a good idea to do the check of whether Rx/Tx offload is enabled at the data path. Instead, we could do the check at init stage and store the result, so that we could avoid the check again and again at the critical datapath. Cc: Olivier Matz Signed-off-by: Yuanhan Liu --- v2: - rebase on top of the bug fix patches - define rx/tx_offload as uint8_t instead of int drivers/net/virtio/virtio_ethdev.c | 19 +++ drivers/net/virtio/virtio_pci.h| 2 ++ drivers/net/virtio/virtio_rxtx.c | 31 +-- 3 files changed, 26 insertions(+), 26 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 1505f67..2adae58 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1188,6 +1188,22 @@ rx_func_get(struct rte_eth_dev *eth_dev) eth_dev->rx_pkt_burst = _recv_pkts; } +static inline int +rx_offload_enabled(struct virtio_hw *hw) +{ + return vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM) || + vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4) || + vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO6); +} + +static inline int +tx_offload_enabled(struct virtio_hw *hw) +{ + return vtpci_with_feature(hw, VIRTIO_NET_F_CSUM) || + vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO4) || + vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO6); +} + /* reset device and renegotiate features if needed */ static int virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features) @@ -1209,6 +1225,9 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features) if (virtio_negotiate_features(hw, req_features) < 0) return -1; + hw->tx_offload = tx_offload_enabled(hw); + hw->rx_offload = rx_offload_enabled(hw); + /* If host does not support status then disable LSC */ if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC; diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index de271bf..7d1dd9b 100644 --- a/drivers/net/virtio/virtio_pci.h +++ 
b/drivers/net/virtio/virtio_pci.h @@ -254,6 +254,8 @@ struct virtio_hw { uint8_t use_msix; uint8_t modern; uint8_t use_simple_rxtx; + uint8_t tx_offload; + uint8_t rx_offload; uint8_t mac_addr[ETHER_ADDR_LEN]; uint32_tnotify_off_multiplier; uint8_t *isr; diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 4cb2ce7..7571191 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -250,14 +250,6 @@ virtio_tso_fix_cksum(struct rte_mbuf *m) } } -static inline int -tx_offload_enabled(struct virtio_hw *hw) -{ - return vtpci_with_feature(hw, VIRTIO_NET_F_CSUM) || - vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO4) || - vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO6); -} - static inline void virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, uint16_t needed, int use_indirect, int can_push) @@ -270,9 +262,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, uint16_t head_idx, idx; uint16_t head_size = vq->hw->vtnet_hdr_size; struct virtio_net_hdr *hdr; - int offload; - offload = tx_offload_enabled(vq->hw); head_idx = vq->vq_desc_head_idx; idx = head_idx; dxp = >vq_descx[idx]; @@ -286,7 +276,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, hdr = (struct virtio_net_hdr *) rte_pktmbuf_prepend(cookie, head_size); /* if offload disabled, it is not zeroed below, do it now */ - if (offload == 0) + if (vq->hw->tx_offload == 0) memset(hdr, 0, head_size); } else if (use_indirect) { /* setup tx ring slot to point to indirect @@ -318,7 +308,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, } /* Checksum Offload / TSO */ - if (offload) { + if (vq->hw->tx_offload) { if (cookie->ol_flags & PKT_TX_TCP_SEG) cookie->ol_flags |= PKT_TX_TCP_CKSUM; @@ -735,14 +725,6 @@ virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr) return 0; } -static inline int -rx_offload_enabled(struct virtio_hw *hw) -{ - return 
vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM) || - vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4) || - vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO6); -} - #define VIRTIO_MBUF_BURST_SZ 64 #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring
[dpdk-dev] [PATCH 1/8] net/virtio: revert "virtio: fix restart"
On Fri, Nov 04, 2016 at 09:09:22AM +0100, Maxime Coquelin wrote: > > > On 11/04/2016 03:00 AM, Yuanhan Liu wrote: > >On Thu, Nov 03, 2016 at 09:36:52PM +0100, Maxime Coquelin wrote: > >>Hi Yuanhan, > >> > >>On 11/03/2016 05:09 PM, Yuanhan Liu wrote: > >>>This reverts commit 9a0615af7746 ("virtio: fix restart"); conflict is > >>>manually addressed. > >>> > >>>Kyle reported an issue with above commit > >>> > >>> qemu-kvm: Guest moved used index from 5 to 1 > >>> > >>>with following steps, > >>> > >>> 1) Start my virtio interfaces > >>> 2) Send some traffic into/out of the interfaces > >>> 3) Stop the interfaces > >>> 4) Start the interfaces > >>> 5) Send some more traffic > >>> > >>>And here are some quotes from Kyle's analysis, > >>> > >>> Prior to the patch, if an interface were stopped then started, without > >>> restarting the application, the queues would be left as-is, because > >>> hw->started would be set to 1. Now, calling stop sets hw->started to 0, > >>> which means the next call to start will "touch the queues". This is the > >>> unintended side-effect that causes the problem. > >> > >>Maybe a good idea to explain what is the problem the revert aims to fix. > > > >It aims to fix the issue, by "not touching the queues" on restart. > > > >>It does not seem to be clearly stated in the commit message. > > > >I was thinking the quote from Kyle is enough. How about following supplement: > > > >We should not touch the queues once the init is done, otherwise, the > >vring state of virtio PMD driver and vhost-user would be inconsistent, > >leading some issue like above. > > > >Thus this patch is reverted. > > > >Better now? > Yes, this is much clearer from my PoV. Fixed, and series applied to dpdk-next-virtio. Thanks for the review! --yliu
[dpdk-dev] [PATCH 3/3] vhost: update comments
On Fri, Nov 04, 2016 at 01:28:45PM +, Mcnamara, John wrote: > > > > -Original Message- > > From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com] > > Sent: Wednesday, November 2, 2016 3:15 AM > > To: dev at dpdk.org > > Cc: Mcnamara, John ; Yuanhan Liu > > > > Subject: [PATCH 3/3] vhost: update comments > > > > vhost-cuse is removed, update corresponding comments that are still > > referencing it. > > > > Signed-off-by: Yuanhan Liu > > > > ... > > > > /* > > - * Backend-specific cleanup. Defined by vhost-cuse and vhost-user. > > + * Backend-specific cleanup. > > + * > > + * TODO: fix it; we have one backend now > > */ > > void vhost_backend_cleanup(struct virtio_net *dev); > > In general it is not a good idea to leave TODOs/Fixmes around unless they have a better explanation of why they are there. Agreed. In this case, if you look at the deleted line, you will find we had such a function implemented in both vhost-user and vhost-cuse. Now we have only one. So we could just rename it to vhost_user_cleanup(), or even remove it completely with some small refactoring. Obviously, that needs to be done in another patch; thus the TODO was added. --yliu > > Apart from that: > > > Acked-by: John McNamara >
[dpdk-dev] [PATCH RESEND v2] net/i40e: fix floating VEB issue
From: Beilei Xing
Turning off S-TAG identification will impact floating VEB: VFs can't communicate with each other. This patch fixes the issue by checking whether floating VEB is enabled; S-TAG identification will be turned off only when floating VEB is disabled. Fixes: 4d61120d5ce7 ("net/i40e: fix dropping packets with ethertype 0x88A8") Signed-off-by: Beilei Xing Acked-by: Jingjing Wu --- v2 changes: * Modify the limitation description in i40e.rst :: Seems that Beilei failed to send this patch out. Since I have sent a few :: patches out successfully, I'm sending it for her. doc/guides/nics/i40e.rst | 6 ++ drivers/net/i40e/i40e_ethdev.c | 12 +++- 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst index c0163fc..5780268 100644 --- a/doc/guides/nics/i40e.rst +++ b/doc/guides/nics/i40e.rst @@ -453,3 +453,9 @@ To work around this issue, ``ethtool -s autoneg on`` should be set first and then the link can be brought up through ``ifconfig up``. NOTE: requires Linux kernel i40e driver version >= 1.4.X + +Receive packets with Ethertype 0x88A8 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Due to the FW limitation, PF can receive packets with Ethertype 0x88A8 +only when floating VEB is disabled.
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index bb81b15..631a93f 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -1121,11 +1121,13 @@ eth_i40e_dev_init(struct rte_eth_dev *dev) /* Disable double vlan by default */ i40e_vsi_config_double_vlan(vsi, FALSE); - /* Disable S-TAG identification by default */ - ret = I40E_READ_REG(hw, I40E_PRT_L2TAGSEN); - if (ret & I40E_L2_TAGS_S_TAG_MASK) { - ret &= ~I40E_L2_TAGS_S_TAG_MASK; - I40E_WRITE_REG(hw, I40E_PRT_L2TAGSEN, ret); + /* Disable S-TAG identification when floating_veb is disabled */ + if (!pf->floating_veb) { + ret = I40E_READ_REG(hw, I40E_PRT_L2TAGSEN); + if (ret & I40E_L2_TAGS_S_TAG_MASK) { + ret &= ~I40E_L2_TAGS_S_TAG_MASK; + I40E_WRITE_REG(hw, I40E_PRT_L2TAGSEN, ret); + } } if (!vsi->max_macaddrs) -- 2.5.0
[dpdk-dev] [PATCH RESEND v2] net/i40e: fix floating VEB issue
On Fri, Nov 04, 2016 at 05:33:20PM +0800, Yuanhan Liu wrote: > Turning off S-TAG identification will impact floating VEB, > VFs can't communicate with each other. > This patch fixes this issue by judging whether floating > VEB is enabled, S-TAG identification will be turned off > only when floating VEB is disabled. > > Fixes: 4d61120d5ce7 ("net/i40e: fix dropping packets with ethertype 0x88A8") > > Signed-off-by: Beilei Xing > Acked-by: Jingjing Wu > --- > v2 changes: > * Modify the limitation description in i40e.rst > > Seems that Beilei failed to send this patches out; while I have sent out few > patches out successfully. I'm sending it for her. Please ignore this patch. The original patch was in a wrong format, which turned the authorship over to me. I will resend it soon. --yliu
[dpdk-dev] [PATCH RESEND v2] net/i40e: fix floating VEB issue
Turning off S-TAG identification will impact floating VEB: VFs can't communicate with each other. This patch fixes the issue by checking whether floating VEB is enabled; S-TAG identification will be turned off only when floating VEB is disabled. Fixes: 4d61120d5ce7 ("net/i40e: fix dropping packets with ethertype 0x88A8") Signed-off-by: Beilei Xing Acked-by: Jingjing Wu --- v2 changes: * Modify the limitation description in i40e.rst Seems that Beilei failed to send this patch out; since I have sent a few patches out successfully, I'm sending it for her. doc/guides/nics/i40e.rst | 6 ++ drivers/net/i40e/i40e_ethdev.c | 12 +++- 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst index c0163fc..5780268 100644 --- a/doc/guides/nics/i40e.rst +++ b/doc/guides/nics/i40e.rst @@ -453,3 +453,9 @@ To work around this issue, ``ethtool -s autoneg on`` should be set first and then the link can be brought up through ``ifconfig up``. NOTE: requires Linux kernel i40e driver version >= 1.4.X + +Receive packets with Ethertype 0x88A8 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Due to the FW limitation, PF can receive packets with Ethertype 0x88A8 +only when floating VEB is disabled.
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index bb81b15..631a93f 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -1121,11 +1121,13 @@ eth_i40e_dev_init(struct rte_eth_dev *dev) /* Disable double vlan by default */ i40e_vsi_config_double_vlan(vsi, FALSE); - /* Disable S-TAG identification by default */ - ret = I40E_READ_REG(hw, I40E_PRT_L2TAGSEN); - if (ret & I40E_L2_TAGS_S_TAG_MASK) { - ret &= ~I40E_L2_TAGS_S_TAG_MASK; - I40E_WRITE_REG(hw, I40E_PRT_L2TAGSEN, ret); + /* Disable S-TAG identification when floating_veb is disabled */ + if (!pf->floating_veb) { + ret = I40E_READ_REG(hw, I40E_PRT_L2TAGSEN); + if (ret & I40E_L2_TAGS_S_TAG_MASK) { + ret &= ~I40E_L2_TAGS_S_TAG_MASK; + I40E_WRITE_REG(hw, I40E_PRT_L2TAGSEN, ret); + } } if (!vsi->max_macaddrs) -- 2.5.0
[dpdk-dev] [PATCH] net/virtio: cache Rx/Tx offload enabled check
It's not a good idea to do Rx/Tx offload enabled check at the data path. Instead, we could do the check at init stage and store the result, so that we could avoid the check again and again at the critical datapath. Cc: Olivier Matz Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 19 +++ drivers/net/virtio/virtio_pci.h| 2 ++ drivers/net/virtio/virtio_rxtx.c | 30 -- 3 files changed, 25 insertions(+), 26 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index b388134..708c0e0 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1141,6 +1141,22 @@ rx_func_get(struct rte_eth_dev *eth_dev) eth_dev->rx_pkt_burst = _recv_pkts; } +static inline int +rx_offload_enabled(struct virtio_hw *hw) +{ + return vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM) || + vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4) || + vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO6); +} + +static inline int +tx_offload_enabled(struct virtio_hw *hw) +{ + return vtpci_with_feature(hw, VIRTIO_NET_F_CSUM) || + vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO4) || + vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO6); +} + /* reset device and renegotiate features if needed */ static int virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features) @@ -1161,6 +1177,9 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features) if (virtio_negotiate_features(hw, req_features) < 0) return -1; + hw->tx_offload = tx_offload_enabled(hw); + hw->rx_offload = rx_offload_enabled(hw); + /* If host does not support status then disable LSC */ if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC; diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index 0c5ed31..442eed2 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -264,6 +264,8 @@ struct virtio_hw { struct virtio_net_config *dev_cfg; const 
struct virtio_pci_ops *vtpci_ops; void*virtio_user_dev; + int tx_offload; + int rx_offload; }; /* diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index b4c4aa4..ba46595 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -250,14 +250,6 @@ virtio_tso_fix_cksum(struct rte_mbuf *m) } } -static inline int -tx_offload_enabled(struct virtio_hw *hw) -{ - return vtpci_with_feature(hw, VIRTIO_NET_F_CSUM) || - vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO4) || - vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO6); -} - static inline void virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, uint16_t needed, int use_indirect, int can_push) @@ -270,9 +262,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, uint16_t head_idx, idx; uint16_t head_size = vq->hw->vtnet_hdr_size; struct virtio_net_hdr *hdr; - int offload; - offload = tx_offload_enabled(vq->hw); head_idx = vq->vq_desc_head_idx; idx = head_idx; dxp = >vq_descx[idx]; @@ -286,7 +276,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, hdr = (struct virtio_net_hdr *) rte_pktmbuf_prepend(cookie, head_size); /* if offload disabled, it is not zeroed below, do it now */ - if (offload == 0) + if (vq->hw->tx_offload == 0) memset(hdr, 0, head_size); } else if (use_indirect) { /* setup tx ring slot to point to indirect @@ -318,7 +308,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, } /* Checksum Offload / TSO */ - if (offload) { + if (vq->hw->tx_offload) { if (cookie->ol_flags & PKT_TX_TCP_SEG) cookie->ol_flags |= PKT_TX_TCP_CKSUM; @@ -798,14 +788,6 @@ virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr) return 0; } -static inline int -rx_offload_enabled(struct virtio_hw *hw) -{ - return vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM) || - vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4) || - vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO6); -} - #define 
VIRTIO_MBUF_BURST_SZ 64 #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc)) uint16_t @@ -821,7 +803,6 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) int error; uint32_t i, nb_enqueued; uint32_t h
[dpdk-dev] [PATCH v2] scripts: show fixes with release version of bug
On Fri, Sep 30, 2016 at 04:18:42PM +0200, Thomas Monjalon wrote: > This script can help to find commits to backport in stable branches. > > Fixes are found if there is the word "fix" in the headline or > if there is a tag Fixes: or Reverts: in the message. > Chained fixes of fixes are explored to find the oldest origin. > Fixes of not released bugs are ignored. Thomas, that saves my life! It helps a lot when I'm trying to pick commits for 16.07.2. So, Reviewed-by: Yuanhan Liu Thanks for the patch (and sorry for late reply). --yliu
[dpdk-dev] [PATCH 1/8] net/virtio: revert "virtio: fix restart"
On Thu, Nov 03, 2016 at 09:36:52PM +0100, Maxime Coquelin wrote: > Hi Yuanhan, > > On 11/03/2016 05:09 PM, Yuanhan Liu wrote: > >This reverts commit 9a0615af7746 ("virtio: fix restart"); conflict is > >manually addressed. > > > >Kyle reported an issue with above commit > > > >qemu-kvm: Guest moved used index from 5 to 1 > > > >with following steps, > > > >1) Start my virtio interfaces > >2) Send some traffic into/out of the interfaces > >3) Stop the interfaces > >4) Start the interfaces > >5) Send some more traffic > > > >And here are some quotes from Kyle's analysis, > > > >Prior to the patch, if an interface were stopped then started, without > >restarting the application, the queues would be left as-is, because > >hw->started would be set to 1. Now, calling stop sets hw->started to 0, > >which means the next call to start will "touch the queues". This is the > >unintended side-effect that causes the problem. > > Maybe a good idea to explain what is the problem the revert aims to fix. It aims to fix the issue, by "not touching the queues" on restart. > It does not seem to be clearly stated in the commit message. I was thinking the quote from Kyle is enough. How about following supplement: We should not touch the queues once the init is done, otherwise, the vring state of virtio PMD driver and vhost-user would be inconsistent, leading some issue like above. Thus this patch is reverted. Better now? --yliu
[dpdk-dev] [PATCH 3/8] net/virtio: simplify queue allocation
On Thu, Nov 03, 2016 at 09:48:03PM +0100, Maxime Coquelin wrote: > > > On 11/03/2016 05:09 PM, Yuanhan Liu wrote: > >Let rxq/txq/cq be the union field of the virtqueue struct. This would > >simplifies the vq allocation a bit: we don't need calculate the vq_size > >any more based on the queue time. > s/time/type/ ? Oops... will fix it. Thanks. --yliu > > > >Signed-off-by: Yuanhan Liu > >--- > > drivers/net/virtio/virtio_ethdev.c | 18 +++--- > > drivers/net/virtio/virtqueue.h | 7 +++ > > 2 files changed, 14 insertions(+), 11 deletions(-) > > Other than that: > Reviewed-by: Maxime Coquelin > > Thanks, > Maxime
[dpdk-dev] [PATCH 4/8] net/virtio: allocate queue at init stage
On Thu, Nov 03, 2016 at 10:11:43PM +0100, Maxime Coquelin wrote: > > > On 11/03/2016 05:09 PM, Yuanhan Liu wrote: > >Queue allocation should be done once, since the queue related info (such > >as vring addreess) will only be informed to the vhost-user backend once > >without virtio device reset. > > > >That means, if you allocate queues again after the vhost-user negotiation, > >the vhost-user backend will not be informed any more. Leading to a state > >that the vring info mismatches between virtio PMD driver and vhost-backend: > >the driver switches to the new address has just been allocated, while the > >vhost-backend still sticks to the old address has been assigned in the init > >stage. > > > >Unfortunately, that is exactly how the virtio driver is coded so far: queue > >allocation is done at queue_setup stage (when rte_eth_tx/rx_queue_setup is > >invoked). This is wrong, because queue_setup can be invoked several times. > >For example, > > > >$ start_testpmd.sh ... --txq=1 --rxq=1 ... > >> port stop 0 > >> port config all txq 1 # just trigger the queue_setup callback again > >> port config all rxq 1 > >> port start 0 > > > >The right way to do is allocate the queues in the init stage, so that the > >vring info could be persistent with the vhost-user backend. > > > >Besides that, we should allocate max_queue pairs the device supports, but > >not nr queue pairs firstly configured, to make following case work. > I understand, but how much memory overhead does that represent? We are allocating max queue pairs the device supports, but not the virtio-net spec supports, which, as you stated, would be too much. I don't know the typical value of the queue pairs being used in production, but normally, I would assume it be small, something like 2, or 4. It's 1 by default after all. So I think it will not be an issue. > Have you considered performing a device reset when queue number is > changed? Not a good idea and clean solution to me. --yliu
[dpdk-dev] [PATCH 8/8] net/virtio: remove started field
The "hw->started" field was introduced to stop touching queues on restart. We never touch queues on restart any more, thus it's safe to remove this flag. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 15 ++- drivers/net/virtio/virtio_pci.h| 1 - 2 files changed, 2 insertions(+), 14 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index c147909..1505f67 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -607,7 +607,6 @@ virtio_dev_close(struct rte_eth_dev *dev) if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR); vtpci_reset(hw); - hw->started = 0; virtio_dev_free_mbufs(dev); virtio_free_queues(dev); } @@ -1350,17 +1349,14 @@ static int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev) { struct rte_pci_device *pci_dev; - struct virtio_hw *hw = eth_dev->data->dev_private; PMD_INIT_FUNC_TRACE(); if (rte_eal_process_type() == RTE_PROC_SECONDARY) return -EPERM; - if (hw->started == 1) { - virtio_dev_stop(eth_dev); - virtio_dev_close(eth_dev); - } + virtio_dev_stop(eth_dev); + virtio_dev_close(eth_dev); pci_dev = eth_dev->pci_dev; eth_dev->dev_ops = NULL; @@ -1479,7 +1475,6 @@ static int virtio_dev_start(struct rte_eth_dev *dev) { uint16_t nb_queues, i; - struct virtio_hw *hw = dev->data->dev_private; struct virtnet_rx *rxvq; struct virtnet_tx *txvq __rte_unused; @@ -1499,12 +1494,6 @@ virtio_dev_start(struct rte_eth_dev *dev) /* Initialize Link state */ virtio_dev_link_update(dev, 0); - /* On restart after stop do not touch queues */ - if (hw->started) - return 0; - - hw->started = 1; - /* Notify the backend * Otherwise the tap backend might already stop its queue due to fullness.
*vhost backend will have no chance to be waked up diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index f63f76c..de271bf 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -252,7 +252,6 @@ struct virtio_hw { uint16_tvtnet_hdr_size; uint8_t vlan_strip; uint8_t use_msix; - uint8_t started; uint8_t modern; uint8_t use_simple_rxtx; uint8_t mac_addr[ETHER_ADDR_LEN]; -- 1.9.0
[dpdk-dev] [PATCH 7/8] net/virtio: complete init stage at the right place
Invoking vtpci_reinit_complete() at the port start stage doesn't make any sense; instead, it should be done at the end of the dev init stage. So move it there. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 37459e7..c147909 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1282,6 +1282,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features) ret = virtio_alloc_queues(eth_dev); if (ret < 0) return ret; + vtpci_reinit_complete(hw); if (pci_dev) PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x", @@ -1502,8 +1503,6 @@ virtio_dev_start(struct rte_eth_dev *dev) if (hw->started) return 0; - vtpci_reinit_complete(hw); - hw->started = 1; /*Notify the backend -- 1.9.0
[dpdk-dev] [PATCH 6/8] net/virtio: move queue configure code to proper place
The only piece of code of virtio_dev_rxtx_start() is actually doing queue configure/setup work. So, move it to corresponding queue_setup callback. Once that is done, virtio_dev_rxtx_start() becomes an empty function, thus it's being removed. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 2 - drivers/net/virtio/virtio_ethdev.h | 2 - drivers/net/virtio/virtio_rxtx.c | 187 - 3 files changed, 79 insertions(+), 112 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index b80b6f5..37459e7 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1502,8 +1502,6 @@ virtio_dev_start(struct rte_eth_dev *dev) if (hw->started) return 0; - /* Do final configuration before rx/tx engine starts */ - virtio_dev_rxtx_start(dev); vtpci_reinit_complete(hw); hw->started = 1; diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 5db8632..1396c6e 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -78,8 +78,6 @@ void virtio_dev_cq_start(struct rte_eth_dev *dev); /* * RX/TX function prototypes */ -void virtio_dev_rxtx_start(struct rte_eth_dev *dev); - void virtio_dev_queue_release(struct virtqueue *vq); int virtio_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id, diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index bce0663..4cb2ce7 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -388,112 +388,6 @@ virtio_dev_cq_start(struct rte_eth_dev *dev) } } -void -virtio_dev_rxtx_start(struct rte_eth_dev *dev) -{ - /* -* Start receive and transmit vrings -* -Setup vring structure for all queues -* -Initialize descriptor for the rx vring -* -Allocate blank mbufs for the each rx descriptor -* -*/ - uint16_t i; - uint16_t desc_idx; - struct virtio_hw *hw = dev->data->dev_private; - - PMD_INIT_FUNC_TRACE(); - - /* Start rx vring. 
*/ - for (i = 0; i < dev->data->nb_rx_queues; i++) { - struct virtnet_rx *rxvq = dev->data->rx_queues[i]; - struct virtqueue *vq = rxvq->vq; - int error, nbufs; - struct rte_mbuf *m; - - if (rxvq->mpool == NULL) { - rte_exit(EXIT_FAILURE, - "Cannot allocate mbufs for rx virtqueue"); - } - - /* Allocate blank mbufs for the each rx descriptor */ - nbufs = 0; - error = ENOSPC; - - if (hw->use_simple_rxtx) { - for (desc_idx = 0; desc_idx < vq->vq_nentries; -desc_idx++) { - vq->vq_ring.avail->ring[desc_idx] = desc_idx; - vq->vq_ring.desc[desc_idx].flags = - VRING_DESC_F_WRITE; - } - } - - memset(&rxvq->fake_mbuf, 0, sizeof(rxvq->fake_mbuf)); - for (desc_idx = 0; desc_idx < RTE_PMD_VIRTIO_RX_MAX_BURST; -desc_idx++) { - vq->sw_ring[vq->vq_nentries + desc_idx] = - &rxvq->fake_mbuf; - } - - while (!virtqueue_full(vq)) { - m = rte_mbuf_raw_alloc(rxvq->mpool); - if (m == NULL) - break; - - /* Enqueue allocated buffers */ - if (hw->use_simple_rxtx) - error = virtqueue_enqueue_recv_refill_simple(vq, m); - else - error = virtqueue_enqueue_recv_refill(vq, m); - - if (error) { - rte_pktmbuf_free(m); - break; - } - nbufs++; - } - - vq_update_avail_idx(vq); - - PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs); - - VIRTQUEUE_DUMP(vq); - } - - /* Start tx vring. */ - for (i = 0; i < dev->data->nb_tx_queues; i++) { - struct virtnet_tx *txvq = dev->data->tx_queues[i]; - struct virtqueue *vq = txvq->vq; - - if (hw->use_simple_rxtx) { - uint16_t mid_idx = vq->vq_nentries >> 1; - - for (desc_idx = 0; desc_idx < mid_idx; de
[dpdk-dev] [PATCH 5/8] net/virtio: initiate vring at init stage
virtio_dev_vring_start() is actually doing the vring initialization job. And the vring initialization job should be done at the dev init stage, as stated in great detail in the former commit. So move it there, and rename it to virtio_init_vring(). Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 32 +++- drivers/net/virtio/virtio_rxtx.c | 32 2 files changed, 31 insertions(+), 33 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 253bcb5..b80b6f5 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -306,6 +306,35 @@ virtio_get_queue_type(struct virtio_hw *hw, uint16_t vtpci_queue_idx) return VTNET_TQ; } +static void +virtio_init_vring(struct virtqueue *vq) +{ + int size = vq->vq_nentries; + struct vring *vr = &vq->vq_ring; + uint8_t *ring_mem = vq->vq_ring_virt_mem; + + PMD_INIT_FUNC_TRACE(); + + /* +* Reinitialise since virtio port might have been stopped and restarted +*/ + memset(ring_mem, 0, vq->vq_ring_size); + vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN); + vq->vq_used_cons_idx = 0; + vq->vq_desc_head_idx = 0; + vq->vq_avail_idx = 0; + vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1); + vq->vq_free_cnt = vq->vq_nentries; + memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries); + + vring_desc_init(vr->desc, size); + + /* +* Disable device(host) interrupting guest +*/ + virtqueue_disable_intr(vq); +} + static int virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) { @@ -369,7 +398,6 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) vq->hw = hw; vq->vq_queue_index = vtpci_queue_idx; vq->vq_nentries = vq_size; - vq->vq_free_cnt = vq_size; /* * Reserve a memzone for vring elements @@ -400,6 +428,8 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%" PRIx64, (uint64_t)(uintptr_t)mz->addr); + virtio_init_vring(vq); + if (sz_hdr_mz) {
snprintf(vq_hdr_name, sizeof(vq_hdr_name), "port%d_vq%d_hdr", dev->data->port_id, vtpci_queue_idx); diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index fb703d2..bce0663 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -378,42 +378,12 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, vq_update_avail_ring(vq, head_idx); } -static void -virtio_dev_vring_start(struct virtqueue *vq) -{ - int size = vq->vq_nentries; - struct vring *vr = &vq->vq_ring; - uint8_t *ring_mem = vq->vq_ring_virt_mem; - - PMD_INIT_FUNC_TRACE(); - - /* -* Reinitialise since virtio port might have been stopped and restarted -*/ - memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size); - vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN); - vq->vq_used_cons_idx = 0; - vq->vq_desc_head_idx = 0; - vq->vq_avail_idx = 0; - vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1); - vq->vq_free_cnt = vq->vq_nentries; - memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries); - - vring_desc_init(vr->desc, size); - - /* -* Disable device(host) interrupting guest -*/ - virtqueue_disable_intr(vq); -} - void virtio_dev_cq_start(struct rte_eth_dev *dev) { struct virtio_hw *hw = dev->data->dev_private; if (hw->cvq && hw->cvq->vq) { - virtio_dev_vring_start(hw->cvq->vq); VIRTQUEUE_DUMP((struct virtqueue *)hw->cvq->vq); } } @@ -441,7 +411,6 @@ virtio_dev_rxtx_start(struct rte_eth_dev *dev) int error, nbufs; struct rte_mbuf *m; - virtio_dev_vring_start(vq); if (rxvq->mpool == NULL) { rte_exit(EXIT_FAILURE, "Cannot allocate mbufs for rx virtqueue"); @@ -499,7 +468,6 @@ virtio_dev_rxtx_start(struct rte_eth_dev *dev) struct virtnet_tx *txvq = dev->data->tx_queues[i]; struct virtqueue *vq = txvq->vq; - virtio_dev_vring_start(vq); if (hw->use_simple_rxtx) { uint16_t mid_idx = vq->vq_nentries >> 1; -- 1.9.0
[dpdk-dev] [PATCH 4/8] net/virtio: allocate queue at init stage
Queue allocation should be done once, since the queue-related info (such as the vring address) is communicated to the vhost-user backend only once, unless the virtio device is reset. That means, if you allocate queues again after the vhost-user negotiation, the vhost-user backend will not be informed any more, leading to a state where the vring info mismatches between the virtio PMD driver and the vhost backend: the driver switches to the newly allocated address, while the vhost backend still sticks to the old address assigned at the init stage. Unfortunately, that is exactly how the virtio driver is coded so far: queue allocation is done at the queue_setup stage (when rte_eth_tx/rx_queue_setup is invoked). This is wrong, because queue_setup can be invoked several times. For example, $ start_testpmd.sh ... --txq=1 --rxq=1 ... > port stop 0 > port config all txq 1 # just trigger the queue_setup callback again > port config all rxq 1 > port start 0 The right way is to allocate the queues at the init stage, so that the vring info stays persistent with the vhost-user backend. Besides that, we should allocate the max number of queue pairs the device supports, not the number of queue pairs configured initially, to make the following case work. $ start_testpmd.sh ... --txq=1 --rxq=1 ... 
> port stop 0 > port config all txq 2 > port config all rxq 2 > port start 0 Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 105 +++-- drivers/net/virtio/virtio_ethdev.h | 8 --- drivers/net/virtio/virtio_pci.h| 2 + drivers/net/virtio/virtio_rxtx.c | 38 +++--- 4 files changed, 85 insertions(+), 68 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 5a2c14b..253bcb5 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -295,13 +295,19 @@ virtio_dev_queue_release(struct virtqueue *vq) } } -int virtio_dev_queue_setup(struct rte_eth_dev *dev, - int queue_type, - uint16_t queue_idx, - uint16_t vtpci_queue_idx, - uint16_t nb_desc, - unsigned int socket_id, - void **pvq) +static int +virtio_get_queue_type(struct virtio_hw *hw, uint16_t vtpci_queue_idx) +{ + if (vtpci_queue_idx == hw->max_queue_pairs * 2) + return VTNET_CQ; + else if (vtpci_queue_idx % 2 == 0) + return VTNET_RQ; + else + return VTNET_TQ; +} + +static int +virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) { char vq_name[VIRTQUEUE_MAX_NAME_SZ]; char vq_hdr_name[VIRTQUEUE_MAX_NAME_SZ]; @@ -314,6 +320,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, struct virtqueue *vq; size_t sz_hdr_mz = 0; void *sw_ring = NULL; + int queue_type = virtio_get_queue_type(hw, vtpci_queue_idx); int ret; PMD_INIT_LOG(DEBUG, "setting up queue: %u", vtpci_queue_idx); @@ -351,18 +358,18 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, sz_hdr_mz = PAGE_SIZE; } - vq = rte_zmalloc_socket(vq_name, size, RTE_CACHE_LINE_SIZE, socket_id); + vq = rte_zmalloc_socket(vq_name, size, RTE_CACHE_LINE_SIZE, + SOCKET_ID_ANY); if (vq == NULL) { PMD_INIT_LOG(ERR, "can not allocate vq"); return -ENOMEM; } + hw->vqs[vtpci_queue_idx] = vq; + vq->hw = hw; vq->vq_queue_index = vtpci_queue_idx; vq->vq_nentries = vq_size; - - if (nb_desc == 0 || nb_desc > vq_size) - nb_desc = vq_size; - vq->vq_free_cnt = nb_desc; + 
vq->vq_free_cnt = vq_size; /* * Reserve a memzone for vring elements @@ -372,7 +379,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, PMD_INIT_LOG(DEBUG, "vring_size: %d, rounded_vring_size: %d", size, vq->vq_ring_size); - mz = rte_memzone_reserve_aligned(vq_name, vq->vq_ring_size, socket_id, + mz = rte_memzone_reserve_aligned(vq_name, vq->vq_ring_size, +SOCKET_ID_ANY, 0, VIRTIO_PCI_VRING_ALIGN); if (mz == NULL) { if (rte_errno == EEXIST) @@ -396,7 +404,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, snprintf(vq_hdr_name, sizeof(vq_hdr_name), "port%d_vq%d_hdr", dev->data->port_id, vtpci_queue_idx); hdr_mz = rte_memzone_reserve_aligned(vq_hdr_name, sz_hdr_mz, -socket_id, 0, +SOCKET_ID_ANY, 0,
[dpdk-dev] [PATCH 3/8] net/virtio: simplify queue allocation
Let rxq/txq/cq be the union field of the virtqueue struct. This simplifies the vq allocation a bit: we no longer need to calculate the vq_size based on the queue type. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 18 +++--- drivers/net/virtio/virtqueue.h | 7 +++ 2 files changed, 14 insertions(+), 11 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index d082df5..5a2c14b 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -312,7 +312,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, struct virtnet_tx *txvq = NULL; struct virtnet_ctl *cvq = NULL; struct virtqueue *vq; - size_t sz_vq, sz_q = 0, sz_hdr_mz = 0; + size_t sz_hdr_mz = 0; void *sw_ring = NULL; int ret; @@ -337,25 +337,21 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, snprintf(vq_name, sizeof(vq_name), "port%d_vq%d", dev->data->port_id, vtpci_queue_idx); - sz_vq = RTE_ALIGN_CEIL(sizeof(*vq) + + size = RTE_ALIGN_CEIL(sizeof(*vq) + vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE); - if (queue_type == VTNET_RQ) { - sz_q = sz_vq + sizeof(*rxvq); - } else if (queue_type == VTNET_TQ) { - sz_q = sz_vq + sizeof(*txvq); + if (queue_type == VTNET_TQ) { /* * For each xmit packet, allocate a virtio_net_hdr * and indirect ring elements */ sz_hdr_mz = vq_size * sizeof(struct virtio_tx_region); } else if (queue_type == VTNET_CQ) { - sz_q = sz_vq + sizeof(*cvq); /* Allocate a page for control vq command, data and status */ sz_hdr_mz = PAGE_SIZE; } - vq = rte_zmalloc_socket(vq_name, sz_q, RTE_CACHE_LINE_SIZE, socket_id); + vq = rte_zmalloc_socket(vq_name, size, RTE_CACHE_LINE_SIZE, socket_id); if (vq == NULL) { PMD_INIT_LOG(ERR, "can not allocate vq"); return -ENOMEM; @@ -425,14 +421,14 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, } vq->sw_ring = sw_ring; - rxvq = (struct virtnet_rx *)RTE_PTR_ADD(vq, sz_vq); + rxvq = &vq->rxq; rxvq->vq = vq; rxvq->port_id = 
dev->data->port_id; rxvq->queue_id = queue_idx; rxvq->mz = mz; *pvq = rxvq; } else if (queue_type == VTNET_TQ) { - txvq = (struct virtnet_tx *)RTE_PTR_ADD(vq, sz_vq); + txvq = &vq->txq; txvq->vq = vq; txvq->port_id = dev->data->port_id; txvq->queue_id = queue_idx; @@ -442,7 +438,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, *pvq = txvq; } else if (queue_type == VTNET_CQ) { - cvq = (struct virtnet_ctl *)RTE_PTR_ADD(vq, sz_vq); + cvq = &vq->cq; cvq->vq = vq; cvq->mz = mz; cvq->virtio_net_hdr_mz = hdr_mz; diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h index ef0027b..bbeb2f2 100644 --- a/drivers/net/virtio/virtqueue.h +++ b/drivers/net/virtio/virtqueue.h @@ -44,6 +44,7 @@ #include "virtio_pci.h" #include "virtio_ring.h" #include "virtio_logs.h" +#include "virtio_rxtx.h" struct rte_mbuf; @@ -191,6 +192,12 @@ struct virtqueue { void *vq_ring_virt_mem; /**< linear address of vring*/ unsigned int vq_ring_size; + union { + struct virtnet_rx rxq; + struct virtnet_tx txq; + struct virtnet_ctl cq; + }; + phys_addr_t vq_ring_mem; /**< physical address of vring, * or virtual address for virtio_user. */ -- 1.9.0
[dpdk-dev] [PATCH 2/8] net/virtio: simplify queue memzone name
Instead of setting up a queue memzone name like "port0_rxq0", "port0_txq0", it could be simplified a bit to something like "port0_vq0", "port0_vq1" ... Meanwhile, the code is also simplified a bit. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 5815875..d082df5 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -312,7 +312,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, struct virtnet_tx *txvq = NULL; struct virtnet_ctl *cvq = NULL; struct virtqueue *vq; - const char *queue_names[] = {"rvq", "txq", "cvq"}; size_t sz_vq, sz_q = 0, sz_hdr_mz = 0; void *sw_ring = NULL; int ret; @@ -335,8 +334,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, return -EINVAL; } - snprintf(vq_name, sizeof(vq_name), "port%d_%s%d", -dev->data->port_id, queue_names[queue_type], queue_idx); + snprintf(vq_name, sizeof(vq_name), "port%d_vq%d", +dev->data->port_id, vtpci_queue_idx); sz_vq = RTE_ALIGN_CEIL(sizeof(*vq) + vq_size * sizeof(struct vq_desc_extra), @@ -398,9 +397,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, (uint64_t)(uintptr_t)mz->addr); if (sz_hdr_mz) { - snprintf(vq_hdr_name, sizeof(vq_hdr_name), "port%d_%s%d_hdr", -dev->data->port_id, queue_names[queue_type], -queue_idx); + snprintf(vq_hdr_name, sizeof(vq_hdr_name), "port%d_vq%d_hdr", +dev->data->port_id, vtpci_queue_idx); hdr_mz = rte_memzone_reserve_aligned(vq_hdr_name, sz_hdr_mz, socket_id, 0, RTE_CACHE_LINE_SIZE); -- 1.9.0
[dpdk-dev] [PATCH 1/8] net/virtio: revert "virtio: fix restart"
This reverts commit 9a0615af7746 ("virtio: fix restart"); a conflict is manually addressed. Kyle reported an issue with the above commit, "qemu-kvm: Guest moved used index from 5 to 1", with the following steps: 1) Start my virtio interfaces 2) Send some traffic into/out of the interfaces 3) Stop the interfaces 4) Start the interfaces 5) Send some more traffic And here are some quotes from Kyle's analysis: Prior to the patch, if an interface were stopped then started, without restarting the application, the queues would be left as-is, because hw->started would be set to 1. Now, calling stop sets hw->started to 0, which means the next call to start will "touch the queues". This is the unintended side-effect that causes the problem. Fixes: 9a0615af7746 ("virtio: fix restart") Cc: Jianfeng Tan Cc: Reported-by: Kyle Larose Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index b388134..5815875 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -550,9 +550,6 @@ virtio_dev_close(struct rte_eth_dev *dev) PMD_INIT_LOG(DEBUG, "virtio_dev_close"); - if (hw->started == 1) - virtio_dev_stop(dev); - if (hw->cvq) virtio_dev_queue_release(hw->cvq->vq); @@ -560,6 +557,7 @@ virtio_dev_close(struct rte_eth_dev *dev) if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR); vtpci_reset(hw); + hw->started = 0; virtio_dev_free_mbufs(dev); virtio_free_queues(dev); } @@ -1296,15 +1294,17 @@ static int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev) { struct rte_pci_device *pci_dev; + struct virtio_hw *hw = eth_dev->data->dev_private; PMD_INIT_FUNC_TRACE(); if (rte_eal_process_type() == RTE_PROC_SECONDARY) return -EPERM; - /* Close it anyway since there's no way to know if closed */ - virtio_dev_close(eth_dev); - + if (hw->started == 1) { + virtio_dev_stop(eth_dev); + 
virtio_dev_close(eth_dev); + } pci_dev = eth_dev->pci_dev; eth_dev->dev_ops = NULL; @@ -1543,12 +1543,9 @@ static void virtio_dev_stop(struct rte_eth_dev *dev) { struct rte_eth_link link; - struct virtio_hw *hw = dev->data->dev_private; PMD_INIT_LOG(DEBUG, "stop"); - hw->started = 0; - if (dev->data->dev_conf.intr_conf.lsc) rte_intr_disable(&dev->pci_dev->intr_handle); -- 1.9.0
[dpdk-dev] [PATCH 0/8] net/virtio: fix queue reconfigure issue
This patchset fixes an issue (virtio/vhost become broken when queue setup is done two or more times; check patch 2 for details) reported by Ilya at the late stage of the last release. I sensed it would be a fair amount of code work, so I delayed it to this release. However, I failed to make the fix in the early stage of this release, so we have come to the same situation again: we are at the late stage of v16.11 :( Honestly, though, I don't want to miss it again in this release, as it is hard to backport such a fix to a stable release. It's not a bug that can be fixed by a few lines of code. The virtio init logic is wrong enough that I need to change quite a few places to fix it. Meanwhile, I have done my best to keep the changes as minimal as possible. Besides that, given that v16.11 will be an LTS release and this is such a severe bug that it should be fixed ASAP (at least, we should make it work in an LTS release), I tried hard today to put this patchset together, with the hope that we can fix this severe issue in this release. The issue is described at length in patch 4 "net/virtio: allocate queue at init stage". Again, it's not a bug that can be fixed by one patch, which is why you see no "Fixes" tag in any of these patches. All the patches below work together to fix this issue. Thanks. --yliu --- Yuanhan Liu (8): net/virtio: revert "virtio: fix restart" net/virtio: simplify queue memzone name net/virtio: simplify queue allocation net/virtio: allocate queue at init stage net/virtio: initiate vring at init stage net/virtio: move queue configure code to proper place net/virtio: complete init stage at the right place net/virtio: remove started field drivers/net/virtio/virtio_ethdev.c | 182 --- drivers/net/virtio/virtio_ethdev.h | 10 -- drivers/net/virtio/virtio_pci.h| 3 +- drivers/net/virtio/virtio_rxtx.c | 247 ++--- drivers/net/virtio/virtqueue.h | 7 ++ 5 files changed, 208 insertions(+), 241 deletions(-) -- 1.9.0
[dpdk-dev] [PATCH 3/3] vhost: update comments
vhost-cuse is removed, update corresponding comments that are still referencing it. Signed-off-by: Yuanhan Liu --- examples/tep_termination/main.c | 7 ++- examples/vhost/main.c | 4 +--- examples/vhost_xen/main.c | 3 +-- lib/librte_vhost/rte_virtio_net.h | 5 ++--- lib/librte_vhost/vhost.c | 9 - lib/librte_vhost/vhost.h | 4 +++- 6 files changed, 13 insertions(+), 19 deletions(-) diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c index 622f248..1d6d463 100644 --- a/examples/tep_termination/main.c +++ b/examples/tep_termination/main.c @@ -1151,8 +1151,7 @@ print_stats(void) } /** - * Main function, does initialisation and calls the per-lcore functions. The CUSE - * device is also registered here to handle the IOCTLs. + * Main function, does initialisation and calls the per-lcore functions. */ int main(int argc, char *argv[]) @@ -1253,14 +1252,12 @@ main(int argc, char *argv[]) } rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_MRG_RXBUF); - /* Register CUSE device to handle IOCTLs. */ ret = rte_vhost_driver_register((char *)_basename, 0); if (ret != 0) - rte_exit(EXIT_FAILURE, "CUSE device setup failure.\n"); + rte_exit(EXIT_FAILURE, "failed to register vhost driver.\n"); rte_vhost_driver_callback_register(_net_device_ops); - /* Start CUSE session. */ rte_vhost_driver_session_start(); return 0; diff --git a/examples/vhost/main.c b/examples/vhost/main.c index 91000e8..0709859 100644 --- a/examples/vhost/main.c +++ b/examples/vhost/main.c @@ -1409,8 +1409,7 @@ create_mbuf_pool(uint16_t nr_port, uint32_t nr_switch_core, uint32_t mbuf_size, } /* - * Main function, does initialisation and calls the per-lcore functions. The CUSE - * device is also registered here to handle the IOCTLs. + * Main function, does initialisation and calls the per-lcore functions. */ int main(int argc, char *argv[]) @@ -1531,7 +1530,6 @@ main(int argc, char *argv[]) rte_vhost_driver_callback_register(_net_device_ops); - /* Start CUSE session. 
*/ rte_vhost_driver_session_start(); return 0; diff --git a/examples/vhost_xen/main.c b/examples/vhost_xen/main.c index 2e40357..f4dbaa4 100644 --- a/examples/vhost_xen/main.c +++ b/examples/vhost_xen/main.c @@ -1407,8 +1407,7 @@ print_stats(void) int init_virtio_net(struct virtio_net_device_ops const * const ops); /* - * Main function, does initialisation and calls the per-lcore functions. The CUSE - * device is also registered here to handle the IOCTLs. + * Main function, does initialisation and calls the per-lcore functions. */ int main(int argc, char *argv[]) diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h index c53ff64..926039c 100644 --- a/lib/librte_vhost/rte_virtio_net.h +++ b/lib/librte_vhost/rte_virtio_net.h @@ -123,9 +123,8 @@ int rte_vhost_get_numa_node(int vid); uint32_t rte_vhost_get_queue_num(int vid); /** - * Get the virtio net device's ifname. For vhost-cuse, ifname is the - * path of the char device. For vhost-user, ifname is the vhost-user - * socket file path. + * Get the virtio net device's ifname, which is the vhost-user socket + * file path. * * @param vid * virtio-net device ID diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index d8116ff..31825b8 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -227,9 +227,8 @@ reset_device(struct virtio_net *dev) } /* - * Function is called from the CUSE open function. The device structure is - * initialised and a new entry is added to the device configuration linked - * list. + * Invoked when there is a new vhost-user connection established (when + * there is a new virtio device being attached). */ int vhost_new_device(void) @@ -261,8 +260,8 @@ vhost_new_device(void) } /* - * Function is called from the CUSE release function. This function will - * cleanup the device and remove it from device configuration linked list. + * Invoked when there is the vhost-user connection is broken (when + * the virtio device is being detached). 
*/ void vhost_destroy_device(int vid) diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index acec772..22564f1 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -284,7 +284,9 @@ void vhost_set_ifname(int, const char *if_name, unsigned int if_len); void vhost_enable_dequeue_zero_copy(int vid); /* - * Backend-specific cleanup. Defined by vhost-cuse and vhost-user. + * Backend-specific cleanup. + * + * TODO: fix it; we have one backend now */ void vhost_backend_cleanup(struct virtio_net *dev); -- 1.9.0
[dpdk-dev] [PATCH 2/3] doc: update the vhost sample guide
For the vhost-switch sample, the old guide spends too many words on vhost-cuse, mainly because vhost-cuse was invented before vhost-user. Now that vhost-cuse is removed, most of the doc is useless. Instead of amending one piece here and there, this patch simply removes most of the doc and replaces it with a simple test guide. For the tep_term sample, the change mainly removes the parts that reference vhost-cuse. Signed-off-by: Yuanhan Liu --- The test is simplified a bit (from using two virtio-net devices to one), meaning a few figures no longer apply; thus, the vhost_net_sample and tx_dpdk_testpmd figures are removed. And apparently, the vhost_net_arch figure also becomes useless. Since I have no time to draw one for vhost-user, I also removed the two remaining figures: qemu_virtio_net and virtio_linux_vhost, which show the architecture of a typical virtio net device and the Linux kernel vhost-net. --- doc/guides/sample_app_ug/img/qemu_virtio_net.png | Bin 31557 -> 0 bytes doc/guides/sample_app_ug/img/tx_dpdk_testpmd.png | Bin 76019 -> 0 bytes doc/guides/sample_app_ug/img/vhost_net_arch.png| Bin 154920 -> 0 bytes .../sample_app_ug/img/vhost_net_sample_app.png | Bin 23800 -> 0 bytes .../sample_app_ug/img/virtio_linux_vhost.png | Bin 30290 -> 0 bytes doc/guides/sample_app_ug/index.rst | 10 - doc/guides/sample_app_ug/tep_termination.rst | 54 +- doc/guides/sample_app_ug/vhost.rst | 869 +++-- 8 files changed, 128 insertions(+), 805 deletions(-) delete mode 100644 doc/guides/sample_app_ug/img/qemu_virtio_net.png delete mode 100644 doc/guides/sample_app_ug/img/tx_dpdk_testpmd.png delete mode 100644 doc/guides/sample_app_ug/img/vhost_net_arch.png delete mode 100644 doc/guides/sample_app_ug/img/vhost_net_sample_app.png delete mode 100644 doc/guides/sample_app_ug/img/virtio_linux_vhost.png diff --git a/doc/guides/sample_app_ug/img/qemu_virtio_net.png b/doc/guides/sample_app_ug/img/qemu_virtio_net.png deleted file mode 100644 index a852c1662fe978e1dc785f4771ff285172ae5e86..
GIT binary patch literal 0 literal 31557 (binary PNG data omitted)
[dpdk-dev] [PATCH 1/3] doc: update vhost programming guide
vhost-cuse has been removed in this release. Update the doc, with the vhost-cuse part being removed. Signed-off-by: Yuanhan Liu --- doc/guides/prog_guide/vhost_lib.rst | 62 + 1 file changed, 7 insertions(+), 55 deletions(-) diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index 573a318..4f997d4 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -46,26 +46,8 @@ vhost library should be able to: * Know all the necessary information about the vring: Information such as where the available ring is stored. Vhost defines some - messages to tell the backend all the information it needs to know how to - manipulate the vring. - -Currently, there are two ways to pass these messages and as a result there are -two Vhost implementations in DPDK: *vhost-cuse* (where the character devices -are in user space) and *vhost-user*. - -Vhost-cuse creates a user space character device and hook to a function ioctl, -so that all ioctl commands that are sent from the frontend (QEMU) will be -captured and handled. - -Vhost-user creates a Unix domain socket file through which messages are -passed. - -.. Note:: - - Since DPDK v2.2, the majority of the development effort has gone into - enhancing vhost-user, such as multiple queue, live migration, and - reconnect. Thus, it is strongly advised to use vhost-user instead of - vhost-cuse. + messages (passed through a Unix domain socket file) to tell the backend all + the information it needs to know how to manipulate the vring. Vhost API Overview @@ -75,11 +57,10 @@ The following is an overview of the Vhost API functions: * ``rte_vhost_driver_register(path, flags)`` - This function registers a vhost driver into the system. For vhost-cuse, a - ``/dev/path`` character device file will be created. For vhost-user server - mode, a Unix domain socket file ``path`` will be created. + This function registers a vhost driver into the system. 
``path`` specifies + the Unix domain socket file path. - Currently supported flags are (these are valid for vhost-user only): + Currently supported flags are: - ``RTE_VHOST_USER_CLIENT`` @@ -171,35 +152,8 @@ The following is an overview of the Vhost API functions: default. -Vhost Implementations -- - -Vhost-cuse implementation -~ - -When vSwitch registers the vhost driver, it will register a cuse device driver -into the system and creates a character device file. This cuse driver will -receive vhost open/release/IOCTL messages from the QEMU simulator. - -When the open call is received, the vhost driver will create a vhost device -for the virtio device in the guest. - -When the ``VHOST_SET_MEM_TABLE`` ioctl is received, vhost searches the memory -region to find the starting user space virtual address that maps the memory of -the guest virtual machine. Through this virtual address and the QEMU pid, -vhost can find the file QEMU uses to map the guest memory. Vhost maps this -file into its address space, in this way vhost can fully access the guest -physical memory, which means vhost could access the shared virtio ring and the -guest physical address specified in the entry of the ring. - -The guest virtual machine tells the vhost whether the virtio device is ready -for processing or is de-activated through the ``VHOST_NET_SET_BACKEND`` -message. The registered callback from vSwitch will be called. - -When the release call is made, vhost will destroy the device. - -Vhost-user implementation -~ +Vhost-user Implementations +-- Vhost-user uses Unix domain sockets for passing messages. This means the DPDK vhost-user implementation has two options: @@ -246,8 +200,6 @@ For ``VHOST_SET_MEM_TABLE`` message, QEMU will send information for each memory region and its file descriptor in the ancillary data of the message. The file descriptor is used to map that region. 
-There is no ``VHOST_NET_SET_BACKEND`` message as in vhost-cuse to signal -whether the virtio device is ready or stopped. Instead, ``VHOST_SET_VRING_KICK`` is used as the signal to put the vhost device into the data plane, and ``VHOST_GET_VRING_BASE`` is used as the signal to remove the vhost device from the data plane. -- 1.9.0
[dpdk-dev] [PATCH 0/3] vhost: comments and doc update due to vhost-cuse removal
Here is a small patchset of updating vhost programming and sample guide and comments, due to the removal of vhost-cuse. --- Yuanhan Liu (3): doc: update vhost programming guide doc: update the vhost sample guide vhost: update comments doc/guides/prog_guide/vhost_lib.rst| 62 +- doc/guides/sample_app_ug/img/qemu_virtio_net.png | Bin 31557 -> 0 bytes doc/guides/sample_app_ug/img/tx_dpdk_testpmd.png | Bin 76019 -> 0 bytes doc/guides/sample_app_ug/img/vhost_net_arch.png| Bin 154920 -> 0 bytes .../sample_app_ug/img/vhost_net_sample_app.png | Bin 23800 -> 0 bytes .../sample_app_ug/img/virtio_linux_vhost.png | Bin 30290 -> 0 bytes doc/guides/sample_app_ug/index.rst | 10 - doc/guides/sample_app_ug/tep_termination.rst | 54 +- doc/guides/sample_app_ug/vhost.rst | 869 +++-- examples/tep_termination/main.c| 7 +- examples/vhost/main.c | 4 +- examples/vhost_xen/main.c | 3 +- lib/librte_vhost/rte_virtio_net.h | 5 +- lib/librte_vhost/vhost.c | 9 +- lib/librte_vhost/vhost.h | 4 +- 15 files changed, 148 insertions(+), 879 deletions(-) delete mode 100644 doc/guides/sample_app_ug/img/qemu_virtio_net.png delete mode 100644 doc/guides/sample_app_ug/img/tx_dpdk_testpmd.png delete mode 100644 doc/guides/sample_app_ug/img/vhost_net_arch.png delete mode 100644 doc/guides/sample_app_ug/img/vhost_net_sample_app.png delete mode 100644 doc/guides/sample_app_ug/img/virtio_linux_vhost.png -- 1.9.0
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On Tue, Nov 01, 2016 at 10:39:35AM +0100, Thomas Monjalon wrote:
> 2016-11-01 16:15, Yuanhan Liu:
> > On Fri, Oct 28, 2016 at 09:58:51AM +0200, Maxime Coquelin wrote:
> > > Agree, what we need is to be able to disable Virtio PMD features
> > > without having to rebuild the PMD.
> >
> > I want this feature (or more precisely, ability) long times ago.
> > For example, I'd wish there is an option like "force_legacy" when
> > both legacy and modern exist.
>
> You can change the behaviour of the driver with a run-time parameter
> as a struct rte_devargs.

Thanks for the tip! Yeah, it's a workable solution, not an ideal one though.

	--yliu
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On Fri, Oct 28, 2016 at 09:58:51AM +0200, Maxime Coquelin wrote: > >From my experience, and as Michael pointed out, the best mode for small > >packets is obviously > >ANY_LAYOUT so it uses a single descriptor per packet. > Of course, having a single descriptor is in theory the best way. > But, in current Virtio PMD implementation, with no offload supported, we > never access the virtio header at transmit time, it is allocated and > zeroed at startup. > > For ANY_LAYOUT case, the virtio header is prepended to the packet, and > need to be zeroed at packet transmit time. The performance impact is > quite important, as show the measurements I made one month ago (txonly): > - 2 descs per packet: 11.6Mpps > - 1 desc per packet: 9.6Mpps > > As Michael suggested, I tried to replace the memset by direct > fields assignation, but it only recovers a small part of the drop. > > What I suggested is to introduce a new feature, so that we can skip the > virtio header when no offload is negotiated. > > Maybe you have other ideas? > > >So, disabling indirect descriptors may give you better pps for 64 bytes > >packets, but that doesn't mean you should not implement, or enable, it in > >your driver. That just means that the guest is not taking the right > >decision, and uses indirect while it should actually use any_layout. > +1, it really depends on the use-case. > > > >Given virtio/vhost design (most decision comes from the guest), the host > >should be liberal in what it accepts, and not try to influence guest > >implementation by carefully picking the features it supports. Otherwise > >guests will never get a chance to make the right decisions either. > Agree, what we need is to be able to disable Virtio PMD features > without having to rebuild the PMD. I want this feature (or more precisely, ability) long times ago. For example, I'd wish there is an option like "force_legacy" when both legacy and modern exist. > It will certainly require an new API change to add this option. 
I don't think it will work. Firstly, generally speaking, as discussed
before, it's not a good idea to introduce PMD-specific APIs. Secondly,
even if it were okay to do so, you would need to let the application
(say, testpmd) invoke it. Once you have done that, you need to introduce
some CLI options to make sure that part is invoked. And as stated before,
it's also not a good idea to add virtio PMD-specific CLI options to
testpmd.

For this case, however, I think it makes more sense if testpmd provides
some commands to enable/disable the csum and TSO stuff. With that, we
could enable them dynamically. It does have "tso set/show" commands,
though: it just doesn't look like the right interface to me for
enabling/disabling TSO.

	--yliu
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On Thu, Oct 27, 2016 at 12:35:11PM +0200, Maxime Coquelin wrote: > > > On 10/27/2016 12:33 PM, Yuanhan Liu wrote: > >On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote: > >>Hi Zhihong, > >> > >>On 10/27/2016 11:00 AM, Wang, Zhihong wrote: > >>>Hi Maxime, > >>> > >>>Seems indirect desc feature is causing serious performance > >>>degradation on Haswell platform, about 20% drop for both > >>>mrg=on and mrg=off (--txqflags=0xf00, non-vector version), > >>>both iofwd and macfwd. > >>I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge > >>platform, and didn't faced such a drop. > > > >I was actually wondering that may be the cause. I tested it with > >my IvyBridge server as well, I saw no drop. > > > >Maybe you should find a similar platform (Haswell) and have a try? > Yes, that's why I asked Zhihong whether he could test Txonly in guest to > see if issue is reproducible like this. I have no Haswell box, otherwise I could do a quick test for you. IIRC, he tried to disable the indirect_desc feature, then the performance recovered. So, it's likely the indirect_desc is the culprit here. > I will be easier for me to find an Haswell machine if it has not to be > connected back to back to and HW/SW packet generator. Makes sense. --yliu > > Thanks, > Maxime > > > > > --yliu > > > >>Have you tried to pass indirect_desc=off to qemu cmdline to see if you > >>recover the performance? > >> > >>Yuanhan, which platform did you use when you tested it with zero copy? > >> > >>> > >>>I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz. > >>> > >>>Could you please verify if this is true in your test? > >>I'll try -rc1/-rc2 on my platform, and let you know. 
> >> > >>Thanks, > >>Maxime > >> > >>> > >>> > >>>Thanks > >>>Zhihong > >>> > >>>>-Original Message- > >>>>From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com] > >>>>Sent: Monday, October 17, 2016 10:15 PM > >>>>To: Yuanhan Liu > >>>>Cc: Wang, Zhihong ; Xie, Huawei > >>>>; dev at dpdk.org; vkaplans at redhat.com; > >>>>mst at redhat.com; stephen at networkplumber.org > >>>>Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support > >>>>to the TX path > >>>> > >>>> > >>>> > >>>>On 10/17/2016 03:21 PM, Yuanhan Liu wrote: > >>>>>On Mon, Oct 17, 2016 at 01:23:23PM +0200, Maxime Coquelin wrote: > >>>>>>>On my side, I just setup 2 Windows 2016 VMs, and confirm the issue. > >>>>>>>I'll continue the investigation early next week. > >>>>>> > >>>>>>The root cause is identified. > >>>>>>When INDIRECT_DESC feature is negotiated, Windows guest uses indirect > >>>>>>for both Tx and Rx descriptors, whereas Linux guests (Virtio PMD & > >>>>>>virtio-net kernel driver) use indirect only for Tx. > >>>>>>I'll implement indirect support for the Rx path in vhost lib, but the > >>>>>>change will be too big for -rc release. > >>>>>>I propose in the mean time to disable INDIRECT_DESC feature in vhost > >>>>>>lib, we can still enable it locally for testing. > >>>>>> > >>>>>>Yuanhan, is it ok for you? > >>>>> > >>>>>That's okay. > >>>>I'll send a patch to disable it then. > >>>> > >>>>> > >>>>>> > >>>>>>>Has anyone already tested Windows guest with vhost-net, which also > >>>>has > >>>>>>>indirect descs support? > >>>>>> > >>>>>>I tested and confirm it works with vhost-net. > >>>>> > >>>>>I'm a bit confused then. IIRC, vhost-net also doesn't support indirect > >>>>>for Rx path, right? > >>>> > >>>>No, it does support it actually. > >>>>I thought it didn't support too, I misread the Kernel implementation of > >>>>vhost-net and virtio-net. Acutally, virtio-net makes use of indirect > >>>>in Rx path when mergeable buffers is disabled. 
> >>>> > >>>>The confusion certainly comes from me, sorry about that. > >>>> > >>>>Maxime
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote: > Hi Zhihong, > > On 10/27/2016 11:00 AM, Wang, Zhihong wrote: > >Hi Maxime, > > > >Seems indirect desc feature is causing serious performance > >degradation on Haswell platform, about 20% drop for both > >mrg=on and mrg=off (--txqflags=0xf00, non-vector version), > >both iofwd and macfwd. > I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge > platform, and didn't faced such a drop. I was actually wondering that may be the cause. I tested it with my IvyBridge server as well, I saw no drop. Maybe you should find a similar platform (Haswell) and have a try? --yliu > Have you tried to pass indirect_desc=off to qemu cmdline to see if you > recover the performance? > > Yuanhan, which platform did you use when you tested it with zero copy? > > > > >I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz. > > > >Could you please verify if this is true in your test? > I'll try -rc1/-rc2 on my platform, and let you know. > > Thanks, > Maxime > > > > > > >Thanks > >Zhihong > > > >>-Original Message- > >>From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com] > >>Sent: Monday, October 17, 2016 10:15 PM > >>To: Yuanhan Liu > >>Cc: Wang, Zhihong ; Xie, Huawei > >>; dev at dpdk.org; vkaplans at redhat.com; > >>mst at redhat.com; stephen at networkplumber.org > >>Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support > >>to the TX path > >> > >> > >> > >>On 10/17/2016 03:21 PM, Yuanhan Liu wrote: > >>>On Mon, Oct 17, 2016 at 01:23:23PM +0200, Maxime Coquelin wrote: > >>>>>On my side, I just setup 2 Windows 2016 VMs, and confirm the issue. > >>>>>I'll continue the investigation early next week. > >>>> > >>>>The root cause is identified. > >>>>When INDIRECT_DESC feature is negotiated, Windows guest uses indirect > >>>>for both Tx and Rx descriptors, whereas Linux guests (Virtio PMD & > >>>>virtio-net kernel driver) use indirect only for Tx. 
> >>>>I'll implement indirect support for the Rx path in vhost lib, but the > >>>>change will be too big for -rc release. > >>>>I propose in the mean time to disable INDIRECT_DESC feature in vhost > >>>>lib, we can still enable it locally for testing. > >>>> > >>>>Yuanhan, is it ok for you? > >>> > >>>That's okay. > >>I'll send a patch to disable it then. > >> > >>> > >>>> > >>>>>Has anyone already tested Windows guest with vhost-net, which also > >>has > >>>>>indirect descs support? > >>>> > >>>>I tested and confirm it works with vhost-net. > >>> > >>>I'm a bit confused then. IIRC, vhost-net also doesn't support indirect > >>>for Rx path, right? > >> > >>No, it does support it actually. > >>I thought it didn't support too, I misread the Kernel implementation of > >>vhost-net and virtio-net. Acutally, virtio-net makes use of indirect > >>in Rx path when mergeable buffers is disabled. > >> > >>The confusion certainly comes from me, sorry about that. > >> > >>Maxime
[dpdk-dev] [PATCH v7 3/7] vhost: simplify mergeable Rx vring reservation
On Wed, Oct 26, 2016 at 12:08:49AM +0200, Thomas Monjalon wrote:
> 2016-10-14 17:34, Yuanhan Liu:
> > -static inline uint32_t __attribute__((always_inline))
> > +static inline int __attribute__((always_inline))
> >  copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > -			    uint16_t end_idx, struct rte_mbuf *m,
> > -			    struct buf_vector *buf_vec)
> > +			    struct rte_mbuf *m, struct buf_vector *buf_vec,
> > +			    uint16_t num_buffers)
> >  {
> >  	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
> >  	uint32_t vec_idx = 0;
> > -	uint16_t start_idx = vq->last_used_idx;
> > -	uint16_t cur_idx = start_idx;
> > +	uint16_t cur_idx = vq->last_used_idx;
> >  	uint64_t desc_addr;
> >  	uint32_t desc_chain_head;
> >  	uint32_t desc_chain_len;
> > @@ -394,21 +393,21 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >  	struct rte_mbuf *hdr_mbuf;
> >
> >  	if (unlikely(m == NULL))
> > -		return 0;
> > +		return -1;
> >
> >  	LOG_DEBUG(VHOST_DATA, "(%d) current index %d | end index %d\n",
> >  		dev->vid, cur_idx, end_idx);
>
> There is a build error:
> lib/librte_vhost/virtio_net.c:399:22: error: 'end_idx' undeclared

Oops... you know, my robot has been broken since the holiday :(
I just did a quick fix. Hopefully, it will start working again...

> It is probably trivial and could be fixed directly in the already applied
> commit in next-virtio.

Yes, and FYI, here is the overall diff I made to fix this bug.
	--yliu

---
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index b784dba..eed0b1c 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -443,9 +443,6 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct rte_mbuf *m,
 	if (unlikely(m == NULL))
 		return -1;
 
-	LOG_DEBUG(VHOST_DATA, "(%d) current index %d | end index %d\n",
-		dev->vid, cur_idx, end_idx);
-
 	desc_addr = gpa_to_vva(dev, buf_vec[vec_idx].buf_addr);
 	if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr)
 		return -1;
@@ -555,6 +552,10 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 			break;
 		}
 
+		LOG_DEBUG(VHOST_DATA, "(%d) current index %d | end index %d\n",
+			dev->vid, vq->last_avail_idx,
+			vq->last_avail_idx + num_buffers);
+
 		if (copy_mbuf_to_desc_mergeable(dev, pkts[pkt_idx],
 						buf_vec, num_buffers) < 0) {
 			vq->shadow_used_idx -= num_buffers;
[dpdk-dev] [PATCH v2 1/3] drivers: add name alias registration for rte_driver
On Mon, Oct 24, 2016 at 12:22:21PM -0400, Jan Blunck wrote:
> This adds infrastructure for drivers to allow being requested by an alias
> so that a renamed driver can still get loaded by its legacy name.
>
> Signed-off-by: Jan Blunck
> Reviewed-by: Maxime Coquelin
> Tested-by: Pablo de Lara

Series Reviewed-by: Yuanhan Liu

	--yliu
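The patch under review lets a renamed driver still be found by its legacy name. The core of such an alias mechanism can be sketched as below; the struct and table are hypothetical stand-ins chosen for illustration, not the actual `rte_driver` layout or registration macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: each driver records its current name plus an
 * optional legacy alias, and a lookup by either name resolves to the
 * same driver. */
struct drv {
	const char *name;
	const char *alias;	/* legacy name, may be NULL */
};

static const struct drv drivers[] = {
	{ "net_vhost", "eth_vhost" },
	{ "net_ring",  "eth_ring"  },
	{ "net_null",  NULL        },
};

const struct drv *find_driver(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
		if (strcmp(drivers[i].name, name) == 0)
			return &drivers[i];
		/* fall back to the legacy alias, so old --vdev args keep
		 * working after a rename */
		if (drivers[i].alias != NULL &&
		    strcmp(drivers[i].alias, name) == 0)
			return &drivers[i];
	}
	return NULL;
}
```

With a table like this, a user who still passes `--vdev eth_vhost0` resolves to the same driver as `net_vhost0`, which is exactly the compatibility the thread below argues should have landed before the rename.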
[dpdk-dev] [PATCH 0/2] vhost: Add support to indirect descriptors on Rx path
On Tue, Oct 18, 2016 at 05:35:37PM +0200, Maxime Coquelin wrote:
> Indirect descriptor feature has been enabled in v16.11-rc1, but only the TX
> path was implemented.
> However, some configurations expect it to be supported for the Rx path:
>  - Windows guests with and without mergeable buffers enabled
>  - Linux guests with the kernel driver with mergeable buffers disabled
>
> This series adds support for indirect descs on the Rx path, and has been
> tested with the following configurations:
>  - Windows guest with indirect ON, mergeable ON/OFF
>  - Linux guest with kernel driver with indirect ON/OFF, mergeable ON/OFF
>  - Linux guest with Virtio PMD with mergeable ON/OFF
>
> The performance degradation measured with/without this series with Virtio PMD
> (which doesn't use indirect descs for the Rx path) is around 1%.
>
> The series is based on top of "vhost: optimize mergeable Rx path" v7.

Series applied to dpdk-next-virtio. Thanks.

	--yliu

>
> Maxime Coquelin (2):
>   vhost: Add indirect desc support to Rx mergeable path
>   vhost: Add indirect desc support to the Rx no-mergeable path
>
>  lib/librte_vhost/virtio_net.c | 73 ++-
>  1 file changed, 51 insertions(+), 22 deletions(-)
>
> --
> 2.7.4
[dpdk-dev] [PATCH v7 0/7] vhost: optimize mergeable Rx path
Applied to dpdk-next-virtio. And thanks for testing and reviewing.

	--yliu

On Fri, Oct 14, 2016 at 05:34:31PM +0800, Yuanhan Liu wrote:
> This is a new set of patches to optimize the mergeable Rx code path.
> No refactoring (rewrite) was made this time. It just applies some
> findings from Zhihong (kudos to him!) that could improve the mergeable
> Rx path on the old code.
>
> The two major factors that could improve the performance greatly are:
>
> - copy the virtio header together with the packet data. This could remove
>   the bubbles between the two copies to optimize the cache access.
>
>   This is implemented in patch 2 "vhost: optimize cache access"
>
> - shadow the used ring updates and flush them at once
>
>   The basic idea is to update the used ring in a local buffer and flush
>   the entries to the virtio used ring at once in the end. Again, this is
>   for optimizing the cache access.
>
>   This is implemented in patch 5 "vhost: shadow used ring update"
>
> The two optimizations could yield 40+% performance in micro testing
> and 20+% in PVP case testing with 64B packet size.
>
> Besides that, there are some tiny optimizations, such as prefetching the
> avail ring (patch 6) and retrieving the avail head once (patch 7).
>
> Note: the shadow used ring technique could also be applied to the non-mrg
> Rx path (and even the dequeue path). I didn't do that for two reasons:
>
> - we already update the used ring in batch in both paths; it's just not
>   shadowed first.
>
> - it's a bit too late to make many changes at this stage: RC1 is out.
>
> Please help testing.
>
> Thanks.
>
> 	--yliu
>
> Cc: Jianbo Liu
> ---
> Yuanhan Liu (4):
>   vhost: simplify mergeable Rx vring reservation
>   vhost: use last avail idx for avail ring reservation
>   vhost: prefetch avail ring
>   vhost: retrieve avail head once
>
> Zhihong Wang (3):
>   vhost: remove useless volatile
>   vhost: optimize cache access
>   vhost: shadow used ring update
>
>  lib/librte_vhost/vhost.c      |  13 ++-
>  lib/librte_vhost/vhost.h      |   5 +-
>  lib/librte_vhost/vhost_user.c |  23 +++--
>  lib/librte_vhost/virtio_net.c | 193 +-
>  4 files changed, 149 insertions(+), 85 deletions(-)
>
> --
> 1.9.0
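The "shadow used ring" idea from the cover letter above — stage used-ring updates in a local buffer, then flush them to the shared ring in one pass — can be reduced to a few lines. This is a simplified sketch for illustration only: the real vring layout, index wrap handling and memory barriers are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal sketch of the shadow used ring technique: used-ring entries
 * are staged in a private buffer and copied to the shared ring once per
 * burst, so the shared cache lines (and the published index) are
 * touched once per burst instead of once per packet. */
#define RING_SIZE 256

struct used_elem { uint32_t id; uint32_t len; };

struct vq {
	struct used_elem used[RING_SIZE];	/* shared with the guest */
	volatile uint16_t used_idx;		/* published index */
	struct used_elem shadow[RING_SIZE];	/* private staging buffer */
	uint16_t shadow_idx;
};

void shadow_used_add(struct vq *vq, uint32_t id, uint32_t len)
{
	/* cheap: writes touch only private memory */
	vq->shadow[vq->shadow_idx].id = id;
	vq->shadow[vq->shadow_idx].len = len;
	vq->shadow_idx++;
}

void flush_shadow_used(struct vq *vq)
{
	uint16_t i, idx = vq->used_idx;

	for (i = 0; i < vq->shadow_idx; i++)
		vq->used[(idx + i) & (RING_SIZE - 1)] = vq->shadow[i];
	/* a write barrier would go here before publishing the new index */
	vq->used_idx = (uint16_t)(idx + vq->shadow_idx);
	vq->shadow_idx = 0;
}
```

Until `flush_shadow_used()` runs, the guest sees no change; the whole burst becomes visible at once when the index is published, which is what patch 5 in the series implements for the real vring.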
[dpdk-dev] [PATCH] drivers: make driver names consistent
On Tue, Oct 18, 2016 at 06:46:40PM +0200, Thomas Monjalon wrote:
> 2016-10-18 22:18, Yuanhan Liu:
> > On Tue, Oct 18, 2016 at 03:42:54PM +0200, Thomas Monjalon wrote:
> > > 2016-10-18 21:06, Yuanhan Liu:
> > > > On Tue, Oct 18, 2016 at 02:50:16PM +0200, Jan Blunck wrote:
> > > > > From my understanding this is a massive API breakage. This forces all
> > > > > existing users of the virtual PMDs to change with zero benefit. Even
> > > > > if that isn't enough it also makes it impossible to switch between
> > > > > releases by recompiling.
> > > > >
> > > > > Can we please revert these changes and work on some aliasing support
> > > > > for the PMDs to fix it long term?
> > > >
> > > > +1. Aliasing is also something I would suggest before making such
> > > > renames.
> > >
> > > It is a brutal change, yes.
> > > It was announced in 16.07 release notes though.
> >
> > Yes, but it still took me a while (by running git bisect) to figure out
> > what went wrong: I wasn't aware of such a note, so I was thinking maybe
> > something was broken.
> >
> > Later I also got quite a few similar complaints. It may also have taken
> > them a while to find out what happened.
> >
> > Anyway, my point is, for this kind of change, we should have added the
> > alias support first.
>
> Yes.
>
> > If that had been done, then the announcement would not be needed at all?
>
> The announcement would be needed to remove the aliases, later.

Why do we have to remove an alias? What's wrong if they stay there
forever? I think the naming will be switched (maybe slowly) once we have
updated the doc (so that it states "net_vhost" only, not "eth_vhost").
After a few releases, the names should have switched over smoothly, and
we should not need to care about those aliases any more.

	--yliu
[dpdk-dev] [PATCH] vhost: fix use after free issue
Fix the coverity USE_AFTER_FREE issue.

Fixes: a277c7159876 ("vhost: refactor code structure")
Coverity issue: 137884

Reported-by: John McNamara
Signed-off-by: Yuanhan Liu
---
 lib/librte_vhost/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 967cb65..aaa9c27 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -250,8 +250,8 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 		vsocket->connfd = -1;
 		close(connfd);
 		*remove = 1;
-		free(conn);
 		vhost_destroy_device(conn->vid);
+		free(conn);
 
 		if (vsocket->reconnect)
 			vhost_user_create_client(vsocket);

--
1.9.0
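The bug fixed above is the classic ordering mistake: `conn` was freed one line before `conn->vid` was read. A minimal sketch of the corrected teardown follows; the names mirror the patch, but the destroy function is a hypothetical stand-in for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the corrected teardown ordering: the last use of conn
 * (reading conn->vid) must happen before free(conn). Reading it after
 * the free, as the pre-patch code did, is undefined behaviour. */
struct conn { int vid; };

static int destroyed_vid = -1;

/* stand-in for vhost_destroy_device() */
static void vhost_destroy_device_stub(int vid)
{
	destroyed_vid = vid;
}

void teardown(struct conn *conn)
{
	vhost_destroy_device_stub(conn->vid);	/* use first ... */
	free(conn);				/* ... then free */
}
```

Tools like Coverity (which reported this instance) and AddressSanitizer catch the reversed order, but keeping "last use, then free" adjacent in the source is the simplest defence.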
[dpdk-dev] [PATCH v3] vhost: Only access header if offloading is supported in dequeue path
On Fri, Oct 14, 2016 at 10:07:07AM +0200, Maxime Coquelin wrote:
> If offloading features are not negotiated, parsing the virtio header
> is not needed.
>
> Micro-benchmark with testpmd shows that the gain is +4% with indirect
> descriptors, +1% when using direct descriptors.
>
> Signed-off-by: Maxime Coquelin

Applied to dpdk-next-virtio. Thanks.

	--yliu
[dpdk-dev] [PATCH] drivers: make driver names consistent
On Tue, Oct 18, 2016 at 03:42:54PM +0200, Thomas Monjalon wrote:
> 2016-10-18 21:06, Yuanhan Liu:
> > On Tue, Oct 18, 2016 at 02:50:16PM +0200, Jan Blunck wrote:
> > > From my understanding this is a massive API breakage. This forces all
> > > existing users of the virtual PMDs to change with zero benefit. Even
> > > if that isn't enough it also makes it impossible to switch between
> > > releases by recompiling.
> > >
> > > Can we please revert these changes and work on some aliasing support
> > > for the PMDs to fix it long term?
> >
> > +1. Aliasing is also something I would suggest before making such renames.
>
> It is a brutal change, yes.
> It was announced in 16.07 release notes though.

Yes, but it still took me a while (by running git bisect) to figure out
what went wrong: I wasn't aware of such a note, so I was thinking maybe
something was broken.

Later I also got quite a few similar complaints. It may also have taken
them a while to find out what happened.

Anyway, my point is, for this kind of change, we should have added the
alias support first. If that had been done, then the announcement would
not be needed at all?

	--yliu

>
> We can try to make this change more progressive by keeping old names
> as aliases for some time.
> Is there a volunteer to work on vdev name aliases,
> with the target of integrating them in RC2 or RC3?
[dpdk-dev] [PATCH] drivers: make driver names consistent
On Tue, Oct 18, 2016 at 02:50:16PM +0200, Jan Blunck wrote: > >From my understanding this is a massive API breakage. This forces all > existing users of the virtual PMDs to change with zero benefit. Even > if that isn't enough it also makes it impossible to switch between > releases by recompiling. > > Can we please revert these changes and work on some aliasing support > for the PMDs to fix it long term? +1. Aliasing is also something I would suggest before making such renames. --yliu > > Thanks, > Jan > > > On Fri, Sep 16, 2016 at 11:58 AM, Thomas Monjalon > wrote: > > 2016-08-24 22:37, Mcnamara, John: > >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara > >> > > >> > ... > >> > > >> > -$RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev > >> > 'eth_pcap0,rx_pcap=/path/to/ file_rx.pcap,tx_pcap=/path/to/file_tx.pcap' > >> > -- --port-topology=chained > >> > +$RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev > >> > 'net_pcap0,rx_pcap=/path/to/ file_rx.pcap,tx_pcap=/path/to/file_tx.pcap' > >> > -- --port-topology=chained > >> > >> > >> I know that this is an existing issue but there shouldn't be a space in > >> "/path/to/ file". Perhaps you could fix that (in a number of places) as > >> part > >> of this patch. You could probably leave out the "/path/to/" part > >> altogether as > >> it may be clearer, see below. > >> > >> Also, could you wrap the long code lines in the sections that you change at > >> 80 chars using "\" to keep them on the page in the PDF docs, like: > >> > >> $RTE_TARGET/app/testpmd -c '0xf' -n 4 \ > >> --vdev > >> 'net_pcap0,rx_pcap=/path/to/file_rx.pcap,tx_pcap=/path/to/file_tx.pcap' \ > >> -- --port-topology=chained > >> > >> Or without the path part: > >> > >> $RTE_TARGET/app/testpmd -c '0xf' -n 4 \ > >> --vdev 'net_pcap0,rx_pcap=file_rx.pcap,tx_pcap=file_tx.pcap' \ > >> -- --port-topology=chained > > > > Applied with above comments fixed and release notes updated, thanks.
[dpdk-dev] [PATCH] vhost: disable indirect descriptors feature
On Tue, Oct 18, 2016 at 10:13:54AM +0200, Maxime Coquelin wrote: > >>Thanks to Zhihong series you reworked, the changes to be done for > >>mergeable buffers case is greatly simplified. > >>I'll send the series later today. > > > >Do you mean the v6 from Zhihong? Unluckily, it will not be merged. That > >series has been simplified to not rewrite the enqueue from scratch. See > >V7. > No, I meant the v7. Oh, glad to know that the rework helps here :) --yliu > > > > >For this patch, I think you should also update (or remove?) the related > >section in the release note. > Yes, sure. > > Thanks, > Maxime
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On Mon, Oct 17, 2016 at 01:23:23PM +0200, Maxime Coquelin wrote: > >On my side, I just setup 2 Windows 2016 VMs, and confirm the issue. > >I'll continue the investigation early next week. > > The root cause is identified. > When INDIRECT_DESC feature is negotiated, Windows guest uses indirect > for both Tx and Rx descriptors, whereas Linux guests (Virtio PMD & > virtio-net kernel driver) use indirect only for Tx. > > I'll implement indirect support for the Rx path in vhost lib, but the > change will be too big for -rc release. > I propose in the mean time to disable INDIRECT_DESC feature in vhost > lib, we can still enable it locally for testing. > > Yuanhan, is it ok for you? That's okay. > > >Has anyone already tested Windows guest with vhost-net, which also has > >indirect descs support? > > I tested and confirm it works with vhost-net. I'm a bit confused then. IIRC, vhost-net also doesn't support indirect for Rx path, right? --yliu
[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
On Thu, Oct 13, 2016 at 11:23:44AM +0200, Maxime Coquelin wrote:
> I was going to re-run some PVP benchmarks with 0% pkt loss, as I had
> some strange results last week.
>
> Problem is that your series no longer applies cleanly due to
> next-virtio's master branch history rewrite.
> Any chance you send me a rebased version so that I can apply the series?

I think it's pointless to do that now: it won't be merged after all. We
have refactored out the new series; please help review it if you have
time :)

BTW, apologies that I forgot to include your Reviewed-by for the first
patch. I intended to do that ...

	--yliu
[dpdk-dev] [PATCH v7 7/7] vhost: retrieve avail head once
There is no need to retrieve the latest avail head every time we enqueue
a packet in the mergeable Rx path by

    avail_idx = *((volatile uint16_t *)&vq->avail->idx);

Instead, we could just retrieve it once at the beginning of the enqueue
path. This could diminish the cache penalty slightly, because the virtio
driver could be updating it while vhost is reading it (for each packet).

Signed-off-by: Yuanhan Liu
Signed-off-by: Zhihong Wang
---
 lib/librte_vhost/virtio_net.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 12a037b..b784dba 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -387,10 +387,10 @@ fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx,
  */
 static inline int
 reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size,
-			    struct buf_vector *buf_vec, uint16_t *num_buffers)
+			    struct buf_vector *buf_vec, uint16_t *num_buffers,
+			    uint16_t avail_head)
 {
 	uint16_t cur_idx;
-	uint16_t avail_idx;
 	uint32_t vec_idx = 0;
 	uint16_t tries = 0;
@@ -401,8 +401,7 @@ reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size,
 	cur_idx = vq->last_avail_idx;
 
 	while (size > 0) {
-		avail_idx = *((volatile uint16_t *)&vq->avail->idx);
-		if (unlikely(cur_idx == avail_idx))
+		if (unlikely(cur_idx == avail_head))
 			return -1;
 
 		if (unlikely(fill_vec_buf(vq, cur_idx, &vec_idx, buf_vec,
@@ -523,6 +522,7 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	uint32_t pkt_idx = 0;
 	uint16_t num_buffers;
 	struct buf_vector buf_vec[BUF_VECTOR_MAX];
+	uint16_t avail_head;
 
 	LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
 	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
@@ -542,11 +542,12 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
 
 	vq->shadow_used_idx = 0;
+	avail_head = *((volatile uint16_t *)&vq->avail->idx);
 	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
 		uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
 
 		if (unlikely(reserve_avail_buf_mergeable(vq, pkt_len, buf_vec,
-						&num_buffers) < 0)) {
+						&num_buffers, avail_head) < 0)) {
 			LOG_DEBUG(VHOST_DATA,
 				"(%d) failed to get enough desc from vring\n",
 				dev->vid);

--
1.9.0
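The essence of the patch above is hoisting a volatile read out of the per-packet loop: the avail index is updated concurrently by the virtio driver, so every read may cost a cache miss. A reduced sketch (the ring is collapsed to just the two indexes involved; descriptor handling is omitted):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of "retrieve avail head once": read the driver-updated avail
 * index a single time per burst instead of once per packet, bounding
 * the shared-cache-line traffic to one read per burst. */
struct ring {
	volatile uint16_t avail_idx;	/* written by the virtio driver */
	uint16_t last_avail_idx;	/* consumer's private cursor */
};

unsigned int process_burst(struct ring *r, unsigned int count)
{
	uint16_t avail_head = r->avail_idx;	/* one volatile read per burst */
	unsigned int done = 0;

	while (done < count && r->last_avail_idx != avail_head) {
		/* a real implementation would consume one descriptor here */
		r->last_avail_idx++;
		done++;
	}
	return done;
}
```

Packets made available by the driver after the snapshot are simply picked up on the next burst, so nothing is lost; only the per-packet re-read (and its potential cache miss) is removed.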
[dpdk-dev] [PATCH v7 6/7] vhost: prefetch avail ring
Signed-off-by: Yuanhan Liu
Signed-off-by: Zhihong Wang
---
 lib/librte_vhost/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 2bdc2fe..12a037b 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -539,6 +539,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	if (count == 0)
 		return 0;
 
+	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
 	vq->shadow_used_idx = 0;
 	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
 		uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;

--
1.9.0
[dpdk-dev] [PATCH v7 5/7] vhost: shadow used ring update
From: Zhihong Wang <zhihong.w...@intel.com>

The basic idea is to shadow the used ring update: update them into a
local buffer first, and then flush them all to the virtio used vring
at once in the end.

And since we do avail ring reservation before enqueuing data, we would
know which and how many descs will be used. Which means we could update
the shadow used ring at the reservation time. It also introduces another
slight advantage: we don't need to access the desc->flag any more inside
copy_mbuf_to_desc_mergeable().

Signed-off-by: Zhihong Wang
Signed-off-by: Yuanhan Liu
---
 lib/librte_vhost/vhost.c      |  13 +++-
 lib/librte_vhost/vhost.h      |   3 +
 lib/librte_vhost/vhost_user.c |  23 +--
 lib/librte_vhost/virtio_net.c | 138 +-
 4 files changed, 113 insertions(+), 64 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 469117a..d8116ff 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -121,9 +121,18 @@ static void
 free_device(struct virtio_net *dev)
 {
 	uint32_t i;
+	struct vhost_virtqueue *rxq, *txq;
 
-	for (i = 0; i < dev->virt_qp_nb; i++)
-		rte_free(dev->virtqueue[i * VIRTIO_QNUM]);
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		rxq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ];
+		txq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ];
+
+		rte_free(rxq->shadow_used_ring);
+		rte_free(txq->shadow_used_ring);
+
+		/* rxq and txq are allocated together as queue-pair */
+		rte_free(rxq);
+	}
 
 	rte_free(dev);
 }
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 17c557f..acec772 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -105,6 +105,9 @@ struct vhost_virtqueue {
 	uint16_t		last_zmbuf_idx;
 	struct zcopy_mbuf	*zmbufs;
 	struct zcopy_mbuf_list	zmbuf_list;
+
+	struct vring_used_elem	*shadow_used_ring;
+	uint16_t		shadow_used_idx;
 } __rte_cache_aligned;
 
 /* Old kernels have no such macro defined */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 3074227..6b83c15 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -198,6 +198,15 @@ vhost_user_set_vring_num(struct virtio_net *dev,
 		}
 	}
 
+	vq->shadow_used_ring = rte_malloc(NULL,
+				vq->size * sizeof(struct vring_used_elem),
+				RTE_CACHE_LINE_SIZE);
+	if (!vq->shadow_used_ring) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"failed to allocate memory for shadow used ring.\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -711,6 +720,8 @@ static int
 vhost_user_get_vring_base(struct virtio_net *dev,
 			  struct vhost_vring_state *state)
 {
+	struct vhost_virtqueue *vq = dev->virtqueue[state->index];
+
 	/* We have to stop the queue (virtio) if it is running. */
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
 		dev->flags &= ~VIRTIO_DEV_RUNNING;
@@ -718,7 +729,7 @@ vhost_user_get_vring_base(struct virtio_net *dev,
 	}
 
 	/* Here we are safe to get the last used index */
-	state->num = dev->virtqueue[state->index]->last_used_idx;
+	state->num = vq->last_used_idx;
 
 	RTE_LOG(INFO, VHOST_CONFIG,
 		"vring base idx:%d file:%d\n", state->index, state->num);
@@ -727,13 +738,15 @@ vhost_user_get_vring_base(struct virtio_net *dev,
 	 * sent and only sent in vhost_vring_stop.
 	 * TODO: cleanup the vring, it isn't usable since here.
 	 */
-	if (dev->virtqueue[state->index]->kickfd >= 0)
-		close(dev->virtqueue[state->index]->kickfd);
+	if (vq->kickfd >= 0)
+		close(vq->kickfd);
 
-	dev->virtqueue[state->index]->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
 
 	if (dev->dequeue_zero_copy)
-		free_zmbufs(dev->virtqueue[state->index]);
+		free_zmbufs(vq);
+	rte_free(vq->shadow_used_ring);
+	vq->shadow_used_ring = NULL;
 
 	return 0;
 }
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index b5ba633..2bdc2fe 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -91,6 +91,56 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb)
 	return (is_tx ^ (idx & 1)) == 0 && idx < qp_nb * VIRTIO_QNUM;
 }
 
+static inline void __attribute__((always_inline))
+do_flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
+
[dpdk-dev] [PATCH v7 3/7] vhost: simplify mergeable Rx vring reservation
Let it return "num_buffers" we reserved, so that we could re-use it with copy_mbuf_to_desc_mergeable() directly, instead of calculating it again there. Meanwhile, the return type of copy_mbuf_to_desc_mergeable is changed to "int". -1 will be returned on error. Signed-off-by: Yuanhan Liu Signed-off-by: Zhihong Wang --- lib/librte_vhost/virtio_net.c | 41 + 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index d4fc62a..1a40c91 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -336,7 +336,7 @@ fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx, */ static inline int reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size, - uint16_t *end, struct buf_vector *buf_vec) + struct buf_vector *buf_vec, uint16_t *num_buffers) { uint16_t cur_idx; uint16_t avail_idx; @@ -370,19 +370,18 @@ reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size, return -1; } - *end = cur_idx; + *num_buffers = cur_idx - vq->last_used_idx; return 0; } -static inline uint32_t __attribute__((always_inline)) +static inline int __attribute__((always_inline)) copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, - uint16_t end_idx, struct rte_mbuf *m, - struct buf_vector *buf_vec) + struct rte_mbuf *m, struct buf_vector *buf_vec, + uint16_t num_buffers) { struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; uint32_t vec_idx = 0; - uint16_t start_idx = vq->last_used_idx; - uint16_t cur_idx = start_idx; + uint16_t cur_idx = vq->last_used_idx; uint64_t desc_addr; uint32_t desc_chain_head; uint32_t desc_chain_len; @@ -394,21 +393,21 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, struct rte_mbuf *hdr_mbuf; if (unlikely(m == NULL)) - return 0; + return -1; LOG_DEBUG(VHOST_DATA, "(%d) current index %d | end index %d\n", dev->vid, cur_idx, end_idx); desc_addr = gpa_to_vva(dev,
buf_vec[vec_idx].buf_addr); if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr) - return 0; + return -1; hdr_mbuf = m; hdr_addr = desc_addr; hdr_phys_addr = buf_vec[vec_idx].buf_addr; rte_prefetch0((void *)(uintptr_t)hdr_addr); - virtio_hdr.num_buffers = end_idx - start_idx; + virtio_hdr.num_buffers = num_buffers; LOG_DEBUG(VHOST_DATA, "(%d) RX: num merge buffers %d\n", dev->vid, virtio_hdr.num_buffers); @@ -440,7 +439,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, desc_addr = gpa_to_vva(dev, buf_vec[vec_idx].buf_addr); if (unlikely(!desc_addr)) - return 0; + return -1; /* Prefetch buffer address. */ rte_prefetch0((void *)(uintptr_t)desc_addr); @@ -489,7 +488,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, offsetof(struct vring_used, ring[used_idx]), sizeof(vq->used->ring[used_idx])); - return end_idx - start_idx; + return 0; } static inline uint32_t __attribute__((always_inline)) @@ -497,8 +496,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf **pkts, uint32_t count) { struct vhost_virtqueue *vq; - uint32_t pkt_idx = 0, nr_used = 0; - uint16_t end; + uint32_t pkt_idx = 0; + uint16_t num_buffers; struct buf_vector buf_vec[BUF_VECTOR_MAX]; LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__); @@ -519,22 +518,24 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, for (pkt_idx = 0; pkt_idx < count; pkt_idx++) { uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen; - if (unlikely(reserve_avail_buf_mergeable(vq, pkt_len, - &end, buf_vec) < 0)) { + if (unlikely(reserve_avail_buf_mergeable(vq, pkt_len, buf_vec, + &num_buffers) < 0)) { LOG_DEBUG(VHOST_DATA, "(%d) failed to get enough desc from vring\n", dev->vid); break; } - nr_used = copy_mbuf_
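The `*num_buffers = cur_idx - vq->last_used_idx` computation above relies on unsigned 16-bit wrap-around: both indices are free-running counters, so the subtraction stays correct even after one of them wraps past zero. A minimal sketch of that arithmetic (hypothetical helper name, not the DPDK code):

```c
#include <stdint.h>

/* Distance between two free-running 16-bit ring indices. The cast is
 * what makes wrap-around work: the subtraction is done modulo 2^16. */
static uint16_t
nr_reserved(uint16_t last_used_idx, uint16_t cur_idx)
{
    return (uint16_t)(cur_idx - last_used_idx);
}
```

This is why the indices never need to be masked against the ring size before subtracting; only accesses into the ring are masked.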
[dpdk-dev] [PATCH v7 2/7] vhost: optimize cache access
From: Zhihong Wang <zhihong.w...@intel.com> This patch reorders the code to delay the virtio header write to improve cache access efficiency for cases where the mrg_rxbuf feature is turned on. CPU pipeline stall cycles can be significantly reduced. Virtio header write and mbuf data copy are both remote store operations which take a long time to finish. It's a good idea to put them together to remove bubbles in between, to let as many remote store instructions as possible go into the store buffer at the same time to hide latency, and to let the H/W prefetcher go to work as early as possible. On a Haswell machine, about 100 cycles can be saved per packet by this patch alone. Taking 64B packet traffic as an example, this means about 60% efficiency improvement for the enqueue operation. Signed-off-by: Zhihong Wang Signed-off-by: Yuanhan Liu --- lib/librte_vhost/virtio_net.c | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 812e5d3..d4fc62a 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -390,6 +390,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, uint32_t desc_offset, desc_avail; uint32_t cpy_len; uint16_t desc_idx, used_idx; + uint64_t hdr_addr, hdr_phys_addr; + struct rte_mbuf *hdr_mbuf; if (unlikely(m == NULL)) return 0; @@ -401,17 +403,15 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr) return 0; - rte_prefetch0((void *)(uintptr_t)desc_addr); + hdr_mbuf = m; + hdr_addr = desc_addr; + hdr_phys_addr = buf_vec[vec_idx].buf_addr; + rte_prefetch0((void *)(uintptr_t)hdr_addr); virtio_hdr.num_buffers = end_idx - start_idx; LOG_DEBUG(VHOST_DATA, "(%d) RX: num merge buffers %d\n", dev->vid, virtio_hdr.num_buffers); - virtio_enqueue_offload(m, &virtio_hdr.hdr); - copy_virtio_net_hdr(dev, desc_addr, virtio_hdr); - vhost_log_write(dev,
buf_vec[vec_idx].buf_addr, dev->vhost_hlen); - PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0); - desc_avail = buf_vec[vec_idx].buf_len - dev->vhost_hlen; desc_offset = dev->vhost_hlen; desc_chain_head = buf_vec[vec_idx].desc_idx; @@ -456,6 +456,16 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, mbuf_avail = rte_pktmbuf_data_len(m); } + if (hdr_addr) { + virtio_enqueue_offload(hdr_mbuf, &virtio_hdr.hdr); + copy_virtio_net_hdr(dev, hdr_addr, virtio_hdr); + vhost_log_write(dev, hdr_phys_addr, dev->vhost_hlen); + PRINT_PACKET(dev, (uintptr_t)hdr_addr, +dev->vhost_hlen, 0); + + hdr_addr = 0; + } + cpy_len = RTE_MIN(desc_avail, mbuf_avail); rte_memcpy((void *)((uintptr_t)(desc_addr + desc_offset)), rte_pktmbuf_mtod_offset(m, void *, mbuf_offset), -- 1.9.0
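The reordering this patch performs can be sketched in isolation. The names below are illustrative, not the vhost API; the point is only the store ordering: remember where the header goes, do the bookkeeping, then issue the header store right next to the first data copy so both remote stores sit back to back in the store buffer:

```c
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a guest descriptor buffer (hypothetical). */
struct fake_desc { uint8_t buf[64]; };

static void
enqueue_delayed_hdr(struct fake_desc *d, uint16_t num_buffers,
                    const uint8_t *pkt, size_t len)
{
    /* Record the header location instead of writing it immediately. */
    uint8_t *hdr_addr = d->buf;

    /* ...descriptor-chain setup work would happen here, with no
     * remote store issued yet... */

    /* Issue the header store and the data copy together. */
    memcpy(hdr_addr, &num_buffers, sizeof(num_buffers));
    memcpy(d->buf + sizeof(num_buffers), pkt, len);
}
```

In the real patch the deferred state is `hdr_mbuf`/`hdr_addr`/`hdr_phys_addr`, and the header is flushed inside the copy loop on the first iteration (`if (hdr_addr) { ... hdr_addr = 0; }`).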
[dpdk-dev] [PATCH v7 1/7] vhost: remove useless volatile
From: Zhihong Wang

last_used_idx is a local var; there is no need to decorate it with "volatile". Signed-off-by: Zhihong Wang --- lib/librte_vhost/vhost.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 53dbf33..17c557f 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -85,7 +85,7 @@ struct vhost_virtqueue { uint32_t size; uint16_t last_avail_idx; - volatile uint16_t last_used_idx; + uint16_t last_used_idx; #define VIRTIO_INVALID_EVENTFD (-1) #define VIRTIO_UNINITIALIZED_EVENTFD (-2) -- 1.9.0
[dpdk-dev] [PATCH v7 0/7] vhost: optimize mergeable Rx path
This is a new set of patches to optimize the mergeable Rx code path. No refactoring (rewrite) was made this time. It just applies some findings from Zhihong (kudos to him!) that could improve the mergeable Rx path on the old code. The two major factors that could improve the performance greatly are: - copy the virtio header together with the packet data. This removes the bubbles between the two copies and optimizes the cache access. This is implemented in patch 2 "vhost: optimize cache access" - shadow the used ring updates and flush them at once. The basic idea is to update the used ring in a local buffer and flush it to the virtio used ring at once in the end. Again, this is for optimizing the cache access. This is implemented in patch 5 "vhost: shadow used ring update" The two optimizations could yield 40+% performance boost in micro testing and 20+% in PVP case testing with 64B packet size. Besides that, there are some tiny optimizations, such as prefetching the avail ring (patch 6) and retrieving the avail head once (patch 7). Note: the shadow used ring technique could also be applied to the non-mrg Rx path (and even the dequeue path). I didn't do that for two reasons: - we already update the used ring in batch in both paths: it's just not shadowed first. - it's a bit too late to make many changes at this stage: RC1 is out. Please help testing. Thanks. --yliu Cc: Jianbo Liu --- Yuanhan Liu (4): vhost: simplify mergeable Rx vring reservation vhost: use last avail idx for avail ring reservation vhost: prefetch avail ring vhost: retrieve avail head once Zhihong Wang (3): vhost: remove useless volatile vhost: optimize cache access vhost: shadow used ring update lib/librte_vhost/vhost.c | 13 ++- lib/librte_vhost/vhost.h | 5 +- lib/librte_vhost/vhost_user.c | 23 +++-- lib/librte_vhost/virtio_net.c | 193 +- 4 files changed, 149 insertions(+), 85 deletions(-) -- 1.9.0
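The shadow used ring idea from patch 5 can be sketched roughly as follows. The structures and names here are illustrative, not the actual vhost implementation: updates are recorded in a private array and copied to the shared ring in a single pass, so the cache lines of the shared ring are touched once instead of per packet.

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8

/* A used-ring element as in the virtio spec (illustrative layout). */
struct used_elem { uint32_t id; uint32_t len; };

struct fake_vq {
    struct used_elem used[RING_SIZE];    /* shared with the guest */
    struct used_elem shadow[RING_SIZE];  /* local shadow copy */
    uint16_t shadow_count;
    uint16_t used_idx;
};

/* Record an update locally; the shared ring is not touched here. */
static void
shadow_used_update(struct fake_vq *vq, uint32_t id, uint32_t len)
{
    vq->shadow[vq->shadow_count].id  = id;
    vq->shadow[vq->shadow_count].len = len;
    vq->shadow_count++;
}

/* Flush all pending updates to the shared ring in one pass. */
static void
flush_shadow_used_ring(struct fake_vq *vq)
{
    for (uint16_t i = 0; i < vq->shadow_count; i++)
        vq->used[(vq->used_idx + i) % RING_SIZE] = vq->shadow[i];
    vq->used_idx = (vq->used_idx + vq->shadow_count) % RING_SIZE;
    vq->shadow_count = 0;
}
```

The real code additionally handles dirty-page logging for live migration and the memory barrier before publishing the new used index; those are omitted here.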
[dpdk-dev] 16.07.1 stable patches review and test
Hi, I have applied most of the bug-fixing patches (listed below) to the 16.07 stable branch at http://dpdk.org/browse/dpdk-stable/ Please help reviewing and testing. The planned date for the final release is 26th, Oct. Before that, please shout if anyone has objections to these patches being applied. Thanks. --yliu --- Alejandro Lucero (1): net/nfp: fix copying MAC address Aleksey Katargin (1): table: fix symbol exports Alex Zelezniak (1): net/ixgbe: fix VF reset to apply to correct VF Ali Volkan Atli (1): net/e1000: fix returned number of available Rx descriptors Arek Kusztal (1): app/test: fix verification of digest for GCM Beilei Xing (2): net/i40e: fix dropping packets with ethertype 0x88A8 net/i40e: fix parsing QinQ packets type Bruce Richardson (1): net/mlx: fix debug build with gcc 6.1 Christian Ehrhardt (1): examples/ip_pipeline: fix Python interpreter Deepak Kumar Jain (2): crypto/null: fix key size increment value crypto/qat: fix FreeBSD build Dror Birkman (1): net/pcap: fix memory leak in jumbo frames Ferruh Yigit (2): app/testpmd: fix help of MTU set command pmdinfogen: fix clang build Gary Mussar (1): tools: fix virtio interface name when binding Gowrishankar Muthukrishnan (1): examples/ip_pipeline: fix lcore mapping for ppc64 Hiroyuki Mikita (1): sched: fix releasing enqueued packets James Poole (1): app/testpmd: fix timeout in Rx queue flushing Jianfeng Tan (3): net/virtio_user: fix first queue pair without multiqueue net/virtio_user: fix wrong sequence of messages net/virtio_user: fix error management during init Jim Harris (1): contigmem: zero all pages during mmap John Daley (1): net/enic: fix bad L4 checksum flag on ICMP packets Karmarkar Suyash (1): timer: fix lag delay Maciej Czekaj (1): mem: fix crash on hugepage mapping error Nelson Escobar (1): net/enic: fix freeing memory for descriptor ring Olivier Matz (4): app/testpmd: fix crash when mempool allocation fails tools: fix json output of pmdinfo mbuf: fix error handling on pool creation mem:
fix build with -O1 Pablo de Lara (3): hash: fix ring size hash: fix false zero signature key hit lookup crypto: fix build with icc Qi Zhang (1): net/i40e/base: fix UDP packet header Rich Lane (1): net/i40e: fix null pointer dereferences when using VMDq+RSS Weiliang Luo (1): mempool: fix corruption due to invalid handler Xiao Wang (5): net/fm10k: fix MAC address removal from switch net/ixgbe/base: fix pointer check net/ixgbe/base: fix check for NACK net/ixgbe/base: fix possible corruption of shadow RAM net/ixgbe/base: fix skipping PHY config Yangchao Zhou (1): pci: fix memory leak when detaching device Yury Kylulin (2): net/ixgbe: fix mbuf leak during Rx queue release net/i40e: fix mbuf leak during Rx queue release Zhiyong Yang (1): net/virtio: fix xstats name
[dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support
On Thu, Oct 13, 2016 at 04:16:11PM +0200, Olivier Matz wrote: > +/* When doing TSO, the IP length is not included in the pseudo header > + * checksum of the packet given to the PMD, but for virtio it is > + * expected. > + */ > +static void > +virtio_tso_fix_cksum(struct rte_mbuf *m) > +{ > + /* common case: header is not fragmented */ > + if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len + > + m->l4_len)) { > + struct ipv4_hdr *iph; > + struct ipv6_hdr *ip6h; > + struct tcp_hdr *th; > + uint16_t prev_cksum, new_cksum, ip_len, ip_paylen; > + uint32_t tmp; ... > + } else { As discussed just now, if you drop the else part, you could add my ACK for the whole virtio changes, and Reviewed-by for all mbuf and other changes. Thomas, please pick them up yourself directly, since they depend on other patches that will be picked (or have already been picked?) by you. Thanks. --yliu
[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
On Thu, Oct 13, 2016 at 05:45:21PM +0200, Olivier Matz wrote:
> >> If you have a packet split like this:
> >>
> >>        mbuf segment 1              mbuf segment 2
> >>   --------------------------   ---------------------------
> >>   | Ethernet header | IP hea|   |der | TCP header | data
> >>   --------------------------   ---------------------------
> >>                            ^
> >>                           iph
> >
> >Thanks, that's clear. How could you be able to access the tcp header
> >from the first mbuf then? I mean, how is the following code supposed
> >to work?
> >
> >    prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >            m->l2_len + m->l3_len + 16);
>
> Oh I see... Sorry there was a confusion on my side with another (internal)
> macro that browses the segments if the offset is not in the first one.
>
> If you agree, let's add the code without the else part, I'll fix it for the
> rc2.

Good. That's okay to me.

--yliu
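For reference, the byte-by-byte access style debated in this thread, reading a field that may straddle a segment boundary, can be sketched with a hypothetical segment chain. This is not the rte_mbuf API, just the underlying idea: walk segments per byte instead of casting a struct pointer into the first segment.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical segment chain; stand-in for chained mbufs. */
struct seg {
    const uint8_t *data;
    size_t len;
    const struct seg *next;
};

/* Read one byte at a logical offset, walking segments as needed. */
static uint8_t
read_u8(const struct seg *s, size_t off)
{
    while (off >= s->len) {
        off -= s->len;
        s = s->next;
    }
    return s->data[off];
}

/* A big-endian 16-bit field (e.g. a TCP checksum) fetched byte-wise,
 * so it is safe even when the field spans two segments. */
static uint16_t
read_be16(const struct seg *s, size_t off)
{
    return (uint16_t)((read_u8(s, off) << 8) | read_u8(s, off + 1));
}
```

A pointer cast into segment 1 would read past its buffer whenever the field crosses the split; the byte-wise form never does.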
[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
On Thu, Oct 13, 2016 at 05:15:24PM +0200, Olivier MATZ wrote:
> On 10/13/2016 05:01 PM, Yuanhan Liu wrote:
> >On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote:
> >>On 10/13/2016 04:16 PM, Yuanhan Liu wrote:
> >>>On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
> >>>>On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
> >>>>>On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
> >>>>>>+/* When doing TSO, the IP length is not included in the pseudo header
> >>>>>>+ * checksum of the packet given to the PMD, but for virtio it is
> >>>>>>+ * expected.
> >>>>>>+ */
> >>>>>>+static void
> >>>>>>+virtio_tso_fix_cksum(struct rte_mbuf *m)
> >>>>>>+{
> >>>>>>+ /* common case: header is not fragmented */
> >>>>>>+ if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> >>>>>>+ m->l4_len)) {
> >>>>>...
> >>>>>>+ /* replace it in the packet */
> >>>>>>+ th->cksum = new_cksum;
> >>>>>>+ } else {
> >>>>>...
> >>>>>>+ /* replace it in the packet */
> >>>>>>+ *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>>>>>+ m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
> >>>>>>+ *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>>>>>+ m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
> >>>>>>+ }
> >>>>>
> >>>>>The tcp header will always be in the mbuf, right? Otherwise, you can't
> >>>>>update the cksum field here. What's the point of introducing the "else
> >>>>>clause" then?
> >>>>
> >>>>Sorry, I don't see the problem you're pointing out here.
> >>>>
> >>>>What I want to solve here is to support the cases where the mbuf is
> >>>>segmented in the middle of the network header (which is probably a rare
> >>>>case).
> >>>
> >>>How it's gonna segmented?
> >>
> >>The mbuf is given by the application. So if the application generates a
> >>segmented mbuf, it should work.
> >>
> >>This could happen for instance if the application uses mbuf clones to share
> >>the IP/TCP/data part of the mbuf and prepend a specific Ethernet/vlan for
> >>different destination.
> >>
> >>>>In the "else" part, I only access the mbuf byte by byte using the
> >>>>rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy
> >>>>the header in a linear buffer, fix the checksum, then copy it again in the
> >>>>packet, but there is no mbuf helpers to do these copies for now.
> >>>
> >>>In the "else" clause, the ip header is still in the mbuf, right?
> >>>Why do you have to access it the way like:
> >>>
> >>>    ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>>            m->l2_len) >> 4;
> >>>
> >>>Why can't you just use
> >>>
> >>>    iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> >>>    iph->version_ihl;
> >>
> >>AFAIK, there is no requirement that each network header has to be contiguous
> >>in a mbuf segment.
> >>
> >>Of course, a split in the middle of a network header probably never
> >>happens... but we never knows, as it is not forbidden. I think the code
> >>should be robust enough to avoid accesses to wrong addresses.
> >>
> >>Hope it's clear enough :)
> >
> >Thanks, but not really. Maybe let me ask this way: what wrong would
> >happen if we use
> >    iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> >to access the IP header? Is it about the endian?
>
> If you have a packet split like this:
>
>        mbuf segment 1              mbuf segment 2
>   --------------------------   ---------------------------
>   | Ethernet header | IP hea|   |der | TCP header | data
>   --------------------------   ---------------------------
>                            ^
>                           iph

Thanks, that's clear. How could you be able to access the tcp header from the first mbuf then? I mean, how is the following code supposed to work?

    prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
            m->l2_len + m->l3_len + 16);

> The IP header is not contiguous. So accessing to the end of the structure
> will access to a wrong location.
>
> >One more question is do you have any case to trigger the "else" clause?
> > No, but I think it may happen. A piece of untested code is not trusted though ... --yliu
[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote: > >In the "else" clause, the ip header is still in the mbuf, right? > >Why do you have to access it the way like: > > > > ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *, > > m->l2_len) >> 4; > > > >Why can't you just use > > > > iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len); > > iph->version_ihl ; > > AFAIK, there is no requirement that each network header has to be contiguous > in a mbuf segment. > > Of course, a split in the middle of a network header probably never > happens... but we never knows, as it is not forbidden. I think the code > should be robust enough to avoid accesses to wrong addresses. One more question is do you have any case to trigger the "else" clause? --yliu
[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote: > > > On 10/13/2016 04:16 PM, Yuanhan Liu wrote: > >On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote: > >> > >> > >>On 10/13/2016 10:18 AM, Yuanhan Liu wrote: > >>>On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote: > >>>>+/* When doing TSO, the IP length is not included in the pseudo header > >>>>+ * checksum of the packet given to the PMD, but for virtio it is > >>>>+ * expected. > >>>>+ */ > >>>>+static void > >>>>+virtio_tso_fix_cksum(struct rte_mbuf *m) > >>>>+{ > >>>>+ /* common case: header is not fragmented */ > >>>>+ if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len + > >>>>+ m->l4_len)) { > >>>... > >>>>+ /* replace it in the packet */ > >>>>+ th->cksum = new_cksum; > >>>>+ } else { > >>>... > >>>>+ /* replace it in the packet */ > >>>>+ *rte_pktmbuf_mtod_offset(m, uint8_t *, > >>>>+ m->l2_len + m->l3_len + 16) = new_cksum.u8[0]; > >>>>+ *rte_pktmbuf_mtod_offset(m, uint8_t *, > >>>>+ m->l2_len + m->l3_len + 17) = new_cksum.u8[1]; > >>>>+ } > >>> > >>>The tcp header will always be in the mbuf, right? Otherwise, you can't > >>>update the cksum field here. What's the point of introducing the "else > >>>clause" then? > >> > >>Sorry, I don't see the problem you're pointing out here. > >> > >>What I want to solve here is to support the cases where the mbuf is > >>segmented in the middle of the network header (which is probably a rare > >>case). > > > >How it's gonna segmented? > > The mbuf is given by the application. So if the application generates a > segmented mbuf, it should work. > > This could happen for instance if the application uses mbuf clones to share > the IP/TCP/data part of the mbuf and prepend a specific Ethernet/vlan for > different destination. > > > >>In the "else" part, I only access the mbuf byte by byte using the > >>rte_pktmbuf_mtod_offset() accessor. 
An alternative would have been to copy > >>the header in a linear buffer, fix the checksum, then copy it again in the > >>packet, but there is no mbuf helpers to do these copies for now. > > > >In the "else" clause, the ip header is still in the mbuf, right? > >Why do you have to access it the way like: > > > > ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *, > > m->l2_len) >> 4; > > > >Why can't you just use > > > > iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len); > > iph->version_ihl ; > > AFAIK, there is no requirement that each network header has to be contiguous > in a mbuf segment. > > Of course, a split in the middle of a network header probably never > happens... but we never knows, as it is not forbidden. I think the code > should be robust enough to avoid accesses to wrong addresses. > > Hope it's clear enough :) Thanks, but not really. Maybe let me ask this way: what wrong would happen if we use iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len); to access the IP header? Is it about the endian? --yliu
[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote: > > > On 10/13/2016 10:18 AM, Yuanhan Liu wrote: > >On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote: > >>+/* When doing TSO, the IP length is not included in the pseudo header > >>+ * checksum of the packet given to the PMD, but for virtio it is > >>+ * expected. > >>+ */ > >>+static void > >>+virtio_tso_fix_cksum(struct rte_mbuf *m) > >>+{ > >>+ /* common case: header is not fragmented */ > >>+ if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len + > >>+ m->l4_len)) { > >... > >>+ /* replace it in the packet */ > >>+ th->cksum = new_cksum; > >>+ } else { > >... > >>+ /* replace it in the packet */ > >>+ *rte_pktmbuf_mtod_offset(m, uint8_t *, > >>+ m->l2_len + m->l3_len + 16) = new_cksum.u8[0]; > >>+ *rte_pktmbuf_mtod_offset(m, uint8_t *, > >>+ m->l2_len + m->l3_len + 17) = new_cksum.u8[1]; > >>+ } > > > >The tcp header will always be in the mbuf, right? Otherwise, you can't > >update the cksum field here. What's the point of introducing the "else > >clause" then? > > Sorry, I don't see the problem you're pointing out here. > > What I want to solve here is to support the cases where the mbuf is > segmented in the middle of the network header (which is probably a rare > case). How it's gonna segmented? > > In the "else" part, I only access the mbuf byte by byte using the > rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy > the header in a linear buffer, fix the checksum, then copy it again in the > packet, but there is no mbuf helpers to do these copies for now. In the "else" clause, the ip header is still in the mbuf, right? Why do you have to access it the way like: ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *, m->l2_len) >> 4; Why can't you just use iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len); iph->version_ihl ; Sorry, I'm just a bit lost. --yliu
[dpdk-dev] [PATCH v2 10/12] virtio: add Tx checksum offload support
On Mon, Oct 03, 2016 at 11:00:21AM +0200, Olivier Matz wrote: > + /* Checksum Offload */ > + switch (cookie->ol_flags & PKT_TX_L4_MASK) { > + case PKT_TX_UDP_CKSUM: > + hdr->csum_start = cookie->l2_len + cookie->l3_len; > + hdr->csum_offset = 6; > + hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; > + break; > + > + case PKT_TX_TCP_CKSUM: > + hdr->csum_start = cookie->l2_len + cookie->l3_len; > + hdr->csum_offset = 16; I would suggest using "offsetof(...)" here, instead of some magic number like 16. --yliu
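For illustration, with a struct mirroring the DPDK `struct tcp_hdr` layout of that era (an assumption on my part, copied from `rte_tcp.h`), `offsetof()` yields the same 16 while staying self-documenting:

```c
#include <stddef.h>
#include <stdint.h>

/* Layout mirroring DPDK's struct tcp_hdr (rte_tcp.h); reproduced here
 * for illustration only. */
struct tcp_hdr {
    uint16_t src_port;
    uint16_t dst_port;
    uint32_t sent_seq;
    uint32_t recv_ack;
    uint8_t  data_off;
    uint8_t  tcp_flags;
    uint16_t rx_win;
    uint16_t cksum;
    uint16_t tcp_urp;
} __attribute__((packed));

/* Replaces the magic number 16 and survives any layout change. */
#define TCP_CKSUM_OFFSET offsetof(struct tcp_hdr, cksum)
```

The same applies to the UDP case: `offsetof(struct udp_hdr, dgram_cksum)` would replace the magic 6.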
[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote: > +/* When doing TSO, the IP length is not included in the pseudo header > + * checksum of the packet given to the PMD, but for virtio it is > + * expected. > + */ > +static void > +virtio_tso_fix_cksum(struct rte_mbuf *m) > +{ > + /* common case: header is not fragmented */ > + if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len + > + m->l4_len)) { ... > + /* replace it in the packet */ > + th->cksum = new_cksum; > + } else { ... > + /* replace it in the packet */ > + *rte_pktmbuf_mtod_offset(m, uint8_t *, > + m->l2_len + m->l3_len + 16) = new_cksum.u8[0]; > + *rte_pktmbuf_mtod_offset(m, uint8_t *, > + m->l2_len + m->l3_len + 17) = new_cksum.u8[1]; > + } The tcp header will always be in the mbuf, right? Otherwise, you can't update the cksum field here. What's the point of introducing the "else clause" then? --yliu
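The adjustment `virtio_tso_fix_cksum()` performs amounts to one's-complement 16-bit addition: folding the IP payload length into the precomputed pseudo-header checksum. A generic sketch of that arithmetic, not the actual DPDK helper:

```c
#include <stdint.h>

/* One's-complement 16-bit addition with end-around carry -- the
 * arithmetic used to fold an extra field (here, the IP payload length)
 * into an existing Internet-checksum partial sum. */
static uint16_t
cksum_add(uint16_t cksum, uint16_t value)
{
    uint32_t tmp = (uint32_t)cksum + value;

    tmp = (tmp & 0xffff) + (tmp >> 16);   /* fold the carry back in */
    return (uint16_t)tmp;
}
```

One fold suffices here: the sum of two 16-bit values is at most 0x1FFFE, so folding once can never produce a new carry.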
[dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device in configure callback
On Wed, Oct 12, 2016 at 06:01:25PM +0200, Olivier MATZ wrote: > Hello Yuanhan, > > On 10/12/2016 04:41 PM, Yuanhan Liu wrote: > >On Mon, Oct 03, 2016 at 11:00:14AM +0200, Olivier Matz wrote: > >>@@ -1344,6 +1347,7 @@ virtio_dev_configure(struct rte_eth_dev *dev) > >> { > >>const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode; > >>struct virtio_hw *hw = dev->data->dev_private; > >>+ uint64_t req_features; > >>int ret; > >> > >>PMD_INIT_LOG(DEBUG, "configure"); > >>@@ -1353,6 +1357,14 @@ virtio_dev_configure(struct rte_eth_dev *dev) > >>return -EINVAL; > >>} > >> > >>+ req_features = VIRTIO_PMD_GUEST_FEATURES; > >>+ /* if request features changed, reinit the device */ > >>+ if (req_features != hw->req_guest_features) { > >>+ ret = virtio_init_device(dev, req_features); > >>+ if (ret < 0) > >>+ return ret; > >>+ } > > > >Why do you have to reset virtio here? This doesn't make too much sense > >to me. > > > >IIUC, you want to make sure those TSO related features being unset at > >init time, and enable it (by doing reset) when it's asked to be enabled > >(by rte_eth_dev_configure)? > > > >Why not always setting those features? We could do the actual offloads > >when: > > > >- those features have been negotiated > > > >- they are enabled through rte_eth_dev_configure > > > >With that, I think we could avoid the reset here? > > It would work for TX, since you decide to use or not the feature. But I think this won't work for RX: if you negotiate LRO at init, the host may send you large packets, even if LRO is disabled in dev_configure. I see. Thanks. Besides, I think you should return an error when LRO is not negotiated after the reset (say, when it's disabled through qemu command line)? --yliu
[dpdk-dev] [dpdk-stable] [PATCH v6 1/6] vhost: fix windows vm hang
On Mon, Sep 19, 2016 at 10:00:12PM -0400, Zhihong Wang wrote: > This patch fixes a Windows VM compatibility issue in DPDK 16.07 vhost code > which causes the guest to hang once any packets are enqueued when mrg_rxbuf > is turned on, by setting the right id and len in the used ring. > > As defined in virtio spec 0.95 and 1.0, in each used ring element, id means > index of start of used descriptor chain, and len means total length of the > descriptor chain which was written to. While in 16.07 code, index of the > last descriptor is assigned to id, and the length of the last descriptor is > assigned to len. > > How to test? > > 1. Start testpmd in the host with a vhost port. > > 2. Start a Windows VM image with qemu and connect to the vhost port. > > 3. Start io forwarding with tx_first in host testpmd. > > For 16.07 code, the Windows VM will hang once any packets are enqueued. > > Cc: > Signed-off-by: Zhihong Wang Applied to dpdk-next-virtio (this patch only). Thanks. --yliu
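The fixed semantics can be sketched as follows (structures hypothetical, mirroring the virtio spec): for a used-ring element, `id` is the index of the head of the descriptor chain and `len` is the total number of bytes written into the whole chain, not the index or length of the last descriptor, which is what the buggy 16.07 code stored.

```c
#include <stdint.h>

/* Used-ring element as defined by virtio spec 0.95/1.0 (illustrative). */
struct vring_used_elem { uint32_t id; uint32_t len; };

/* Build the element for a chain whose head descriptor index is
 * head_idx, given the bytes written into each of the n descriptors. */
static struct vring_used_elem
make_used_elem(uint32_t head_idx, const uint32_t *desc_written, int n)
{
    struct vring_used_elem e = { head_idx, 0 };

    for (int i = 0; i < n; i++)
        e.len += desc_written[i];   /* total, not just the last one */
    return e;
}
```

Storing the last descriptor's index and length instead confuses drivers (such as the Windows virtio driver) that walk the chain from `id` and trust `len` for the total bytes received.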
[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
On Wed, Oct 12, 2016 at 12:22:08PM +, Wang, Zhihong wrote: > > > >> > 3. How many percentage of drop are you seeing? > > > The testing result: > > > size (bytes) improvement (%) > > > 64 3.92 > > > 128 11.51 > > > 256 24.16 > > > 512 -13.79 > > > 1024-22.51 > > > 1500-12.22 > > > A correction is that performance is dropping if byte size is larger than > > > 512. > > > > I have thought of this twice. Unfortunately, I think I may need NACK this > > series. > > > > Merging two code path into one is really good: as you stated, it improves > > the maintainability. But only if we see no performance regression on both > > path after the refactor. Unfortunately, that's not the case here: it hurts > > the performance for one code path (non-mrg Rx). > > > > That makes me think we may should not do the code path merge at all. I think > > that also aligns with what you have said before (internally): we could do > > the > > merge if it gives comparable performance before and after that. > > > > Besides that, I don't quite like the way you did in patch 2 (rewrite > > enqueue): > > you made a lot of changes in one patch. That means if something wrong > > happened, > > it is hard to narrow down which change introduces that regression. Badly, > > that's exactly what we met here. Weeks have been passed, I see no progress. > > > > That's the reason we like the idea of "one patch only does one thing, an > > atomic thing". > > > Yuanhan, folks, > > Thanks for the analysis. I disagree here though. > > I analyze, develop, benchmark on x86 platforms, where this patch > works great. Yes, that's great effort! With your hard work, we know what the bottleneck is and how it could be improved. However, you don't have to do a code refactor (merge two code paths into one) to apply those improvements.
From what I know, in this patchset, there are two factors that could improve the performance: - copy hdr together with packet data - shadow used ring update and update at once The overall performance boost I got with your v6 patchset with mrg-Rx code path is about 27% (in PVP case). And I have just applied the 1st optimization, it yields about 20% boost. The rest could be covered if we apply the 2nd optimization (I guess). That would be a clean way to optimize vhost mergeable Rx path: - you don't touch non-mrg Rx path (well, you may be able to apply the shadow_used_ring trick to it as well) This would at least make sure we will have no such performance regression issue reported by ARM guys. - you don't refactor the code The rewrite from scratch could introduce other issues, besides the performance regression. We may just not know it yet. Make sense to you? If you agree, I think we could still make it in this release: they would be some small changes after all. For example, below is a patch that applies the 1st optimization tip on top of dpdk-next-virtio --yliu --- diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 8a151af..0ddb5af 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -379,7 +379,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, uint16_t end_idx, struct rte_mbuf *m, struct buf_vector *buf_vec) { - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; + struct virtio_net_hdr_mrg_rxbuf *virtio_hdr; uint32_t vec_idx = 0; uint16_t start_idx = vq->last_used_idx; uint16_t cur_idx = start_idx; @@ -388,6 +388,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, uint32_t desc_offset, desc_avail; uint32_t cpy_len; uint16_t desc_idx, used_idx; + uint16_t num_buffers = end_idx - start_idx; + int hdr_copied = 0; if (unlikely(m == NULL)) return 0; @@ -399,16 +401,11 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct
vhost_virtqueue *vq, if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr) return 0; - rte_prefetch0((void *)(uintptr_t)desc_addr); + virtio_hdr = (struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr; + rte_prefetch0(virtio_hdr); - virtio_hdr.num_buffers = end_idx - start_idx; LOG_DEBUG(VHOST_DATA, "(%d) RX: num merge buffers %d\n", - dev->vid, virtio_hdr.num_buffers); - - virtio_enqueue_offload(m, &virtio_hdr.hdr); - copy_virtio_net_hdr(dev, desc_addr, virtio_hdr); - vhost_log_write(dev, buf_vec[vec_idx].buf_addr, dev->vhost_hlen); - PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0); + dev->vid, num_buffers); desc_avail = buf_vec[vec_idx].buf_len - dev->vhost_hlen; desc_offset =
[dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device in configure callback
On Mon, Oct 03, 2016 at 11:00:14AM +0200, Olivier Matz wrote:
> @@ -1344,6 +1347,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>  {
>  	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
>  	struct virtio_hw *hw = dev->data->dev_private;
> +	uint64_t req_features;
>  	int ret;
> 
>  	PMD_INIT_LOG(DEBUG, "configure");
> @@ -1353,6 +1357,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>  		return -EINVAL;
>  	}
> 
> +	req_features = VIRTIO_PMD_GUEST_FEATURES;
> +	/* if request features changed, reinit the device */
> +	if (req_features != hw->req_guest_features) {
> +		ret = virtio_init_device(dev, req_features);
> +		if (ret < 0)
> +			return ret;
> +	}

Why do you have to reset virtio here? This doesn't make too much sense to me.

IIUC, you want to make sure those TSO related features are unset at init time, and enable them (by doing a reset) when they are asked to be enabled (by rte_eth_dev_configure)?

Why not always set those features? We could do the actual offloads when:

- those features have been negotiated
- they are enabled through rte_eth_dev_configure

With that, I think we could avoid the reset here?

	--yliu
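The alternative suggested in the reply above — negotiate the features once at init and gate the actual offload on both conditions — could look roughly like this. A minimal sketch: `virtio_hw_sketch`, `tso_active`, and the bit value are all illustrative stand-ins, not the virtio PMD's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_F_HOST_TSO4 11  /* illustrative feature bit number */

/* Stand-in for the PMD's per-device state. */
struct virtio_hw_sketch {
	uint64_t guest_features;   /* negotiated with the device at init   */
	uint64_t enabled_offloads; /* what the app asked for via configure */
};

/* Offload is applied only when the feature was negotiated AND the
 * application enabled it -- no device reset needed in between. */
static int tso_active(const struct virtio_hw_sketch *hw)
{
	uint64_t bit = 1ULL << VIRTIO_NET_F_HOST_TSO4;

	return (hw->guest_features & bit) && (hw->enabled_offloads & bit);
}
```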
[dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
On Wed, Oct 05, 2016 at 03:27:47PM +0200, Maxime Coquelin wrote:
> >  	/* Update offload features */
> > -	if (virtio_rx_offload(rxm, hdr) < 0) {
> > +	if ((features & VIRTIO_NET_F_GUEST_CSUM) &&
> 
> s/VIRTIO_NET_F_GUEST_CSUM/(1u << VIRTIO_NET_F_GUEST_CSUM)/

There is a helper function for that: vtpci_with_feature.

	--yliu
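The point of the review comment: `VIRTIO_NET_F_GUEST_CSUM` is a bit *number*, so it must be shifted into a mask before being ANDed against the feature word. A helper in the spirit of `vtpci_with_feature` hides the shift; this standalone version is a sketch, not DPDK's actual code (the bit number is the one from the virtio spec):

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_F_GUEST_CSUM 1  /* bit number, per the virtio spec */

/* Same idea as DPDK's vtpci_with_feature(): test a negotiated feature
 * by bit number instead of open-coding the shift at every call site. */
static inline int with_feature(uint64_t features, unsigned int bit)
{
	return (features & (1ULL << bit)) != 0;
}
```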
[dpdk-dev] [PATCH v3 0/7] vhost: add dequeue zero copy support
On Sun, Oct 09, 2016 at 03:27:53PM +0800, Yuanhan Liu wrote:
> This patch set enables vhost dequeue zero copy. The majority work goes
> to patch 4: "vhost: add dequeue zero copy".

Applied to dpdk-next-virtio.

	--yliu

> The basic idea of dequeue zero copy is, instead of copying data from the
> desc buf, here we let the mbuf reference the desc buf addr directly.
>
> The major issue behind that is how and when to update the used ring.
> You could check the commit log of patch 4 for more details.
>
> Patch 5 introduces a new flag, RTE_VHOST_USER_DEQUEUE_ZERO_COPY, to enable
> dequeue zero copy, which is disabled by default.
>
> The performance gain is quite impressive. For a simple dequeue workload
> (running rxonly in vhost-pmd and running txonly in guest testpmd), it yields
> 50+% performance boost for packet size 1500B. For VM2VM iperf test case,
> it's even better: about 70% boost.
>
> For small packets, the performance is worse (it's expected, as the extra
> overhead introduced by zero copy outweighs the benefits from saving few
> bytes copy).
>
> v3: - rebase: mainly for removing conflicts with the Tx indirect patch
>     - don't update last_used_idx twice for zero-copy mode
>     - handle two missing "Tx -> dequeue" renames in log and usage
>
> v2: - renamed "tx zero copy" to "dequeue zero copy", to reduce confusion
>     - handle the case that a desc buf might cross 2 host phys pages
>     - use MAP_POPULATE to let kernel populate the page tables
>     - updated release note
>     - doc-ed the limitations for the vm2nic case
>     - merge 2 continuous guest phys memory regions
>     - and few more trivial changes, please see them in the corresponding
>       patches
>
> ---
> Yuanhan Liu (7):
>   vhost: simplify memory regions handling
>   vhost: get guest/host physical address mappings
>   vhost: introduce last avail idx for dequeue
>   vhost: add dequeue zero copy
>   vhost: add a flag to enable dequeue zero copy
>   examples/vhost: add an option to enable dequeue zero copy
>   net/vhost: add an option to enable dequeue zero copy
>
>  doc/guides/prog_guide/vhost_lib.rst    |  35 +++-
>  doc/guides/rel_notes/release_16_11.rst |  13 ++
>  drivers/net/vhost/rte_eth_vhost.c      |  13 ++
>  examples/vhost/main.c                  |  19 +-
>  lib/librte_vhost/rte_virtio_net.h      |   1 +
>  lib/librte_vhost/socket.c              |   5 +
>  lib/librte_vhost/vhost.c               |  12 ++
>  lib/librte_vhost/vhost.h               | 102 ---
>  lib/librte_vhost/vhost_user.c          | 315 ++---
>  lib/librte_vhost/virtio_net.c          | 196 +---
>  10 files changed, 549 insertions(+), 162 deletions(-)
>
> --
> 1.9.0
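The core idea from the cover letter above — let the mbuf reference the desc buf address directly, and delay the used-ring update until the buffer is no longer in use — can be mocked up as below. This is only a toy illustration of the concept (the real patches deal with mbuf external buffers, guest-physical address translation, and the actual used-ring timing); all names here are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative mbuf that can either own its data or point into the
 * guest's descriptor buffer (the zero-copy case). */
struct mbuf_sketch {
	void    *data;      /* where the payload lives              */
	uint32_t len;
	uint16_t desc_idx;  /* which desc this mbuf borrows, if any */
	int      zero_copy;
};

/* Zero-copy path: the mbuf just references the desc buffer.  The used
 * ring entry for desc_idx must NOT be published until this mbuf is
 * freed, since the guest would otherwise reuse the buffer while the
 * host (or the NIC) is still reading it. */
static void attach_zero_copy(struct mbuf_sketch *m, void *desc_buf,
			     uint32_t len, uint16_t desc_idx)
{
	m->data = desc_buf;
	m->len = len;
	m->desc_idx = desc_idx;
	m->zero_copy = 1;
}
```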
[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Tue, Oct 11, 2016 at 02:57:49PM +0800, Yuanhan Liu wrote:
> > > > > > There was an example: the vhost enqueue optimization patchset from
> > > > > > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)

Though it doesn't matter now, I have verified it yesterday (with and
without memset): the drop could be up to 30+%. This is to let you know
that it could behave badly if memset is not inlined.

> > > > > > on my Ivybridge server: it has no such issue on his server though.
> > > > > >
> > > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > > >
> > > > > > 	--yliu
> > > > >
> > > > > I'd say that's weird. what's your config? any chance you
> > > > > are using an old compiler?
> > > >
> > > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > > > he said the memset is not well optimized for Ivybridge server.
> > >
> > > The dst is remote in that case. It's fine on Haswell but has complication
> > > in Ivy Bridge which (wasn't supposed to but) causes serious frontend
> > > issue.
> > >
> > > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> >
> > So try something like this then:

Yes, I saw memset is inlined when this diff is applied.

I have another concern though: it's a trick that could get gcc to do the
inlining, and I am not quite sure that holds with other compilers (i.e.
clang, icc, or even older gcc).

For this case, I think I would still prefer some trick like

	*(struct ..*) = {0, };

Or even, we could introduce rte_memset(). IIRC, that has been proposed
somehow before?

	--yliu
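The two zeroing styles debated in this thread can be put side by side. A minimal sketch, using a stand-in struct with the virtio-net header's field layout (the helper names are made up): the struct assignment compiles to a few plain stores the compiler can always inline, whereas `memset` may or may not be inlined depending on compiler, flags, and target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field layout of struct virtio_net_hdr (virtio spec), 10 bytes. */
struct virtio_net_hdr_sketch {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

static void clear_hdr_memset(struct virtio_net_hdr_sketch *hdr)
{
	memset(hdr, 0, sizeof(*hdr));	/* may end up as a libc call */
}

static void clear_hdr_assign(struct virtio_net_hdr_sketch *hdr)
{
	/* Compound-literal assignment: compiler emits inline stores. */
	*hdr = (struct virtio_net_hdr_sketch){ 0 };
}
```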
[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
On Thu, Sep 22, 2016 at 01:47:45PM +0800, Jianbo Liu wrote:
> On 22 September 2016 at 10:29, Yuanhan Liu wrote:
> > On Wed, Sep 21, 2016 at 08:54:11PM +0800, Jianbo Liu wrote:
> >> >> > My setup consists of one host running a guest.
> >> >> > The guest generates as much 64bytes packets as possible using
> >> >>
> >> >> Have you tested with other different packet size?
> >> >> My testing shows that performance is dropping when packet size is more
> >> >> than 256.
> >> >
> >> > Hi Jianbo,
> >> >
> >> > Thanks for reporting this.
> >> >
> >> >  1. Are you running the vector frontend with mrg_rxbuf=off?
> >> >
> Yes, my testing is mrg_rxbuf=off, but not vector frontend PMD.
>
> >> >  2. Could you please specify what CPU you're running? Is it Haswell
> >> >     or Ivy Bridge?
> >> >
> It's an ARM server.
>
> >> >  3. How many percentage of drop are you seeing?
>
> The testing result:
>
>   size (bytes)   improvement (%)
>   64               3.92
>   128             11.51
>   256             24.16
>   512            -13.79
>   1024           -22.51
>   1500           -12.22
>
> A correction is that performance is dropping if byte size is larger than 512.

I have thought of this twice. Unfortunately, I think I may need to NACK
this series.

Merging the two code paths into one is really good: as you stated, it
improves the maintainability. But that only holds if we see no
performance regression on either path after the refactor. Unfortunately,
that's not the case here: it hurts the performance of one code path
(non-mrg Rx).

That makes me think we perhaps should not do the code path merge at all.
I think that also aligns with what you have said before (internally): we
could do the merge if it gives comparable performance before and after
it.

Besides that, I don't quite like the way you did it in patch 2 (rewrite
enqueue): you made a lot of changes in one patch. That means if
something wrong happened, it is hard to narrow down which change
introduced that regression. Sadly, that's exactly what we met here.
Weeks have passed, and I see no progress.

That's the reason we like the idea of "one patch only does one thing, an
atomic thing".

So I will apply the first patch (it's a bug fixing patch) and ask you to
refactor the rest, without the code path merge.

I think we could still have a code base with good maintainability if we
introduce more common helper functions that can be used on both Rx
paths, or even on the Tx path (such as update_used_ring, or
shadow_used_ring).

It's a bit late for too many changes for v16.11. I think you could just
apply patch 6 (vhost: optimize cache access) to the old mrg-Rx code
path, if that also helps the performance? Let us handle the rest in the
next release, such as the shadow used ring.

Thanks.

	--yliu
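A common helper like the `update_used_ring` suggested above could be shared by both Rx paths without merging them: the used-ring bookkeeping lives in one place while each path keeps its own hot loop. A rough standalone sketch — the names and structure layout are illustrative, not the actual DPDK structures.

```c
#include <assert.h>
#include <stdint.h>

#define VQ_SIZE 256

struct used_elem { uint32_t id; uint32_t len; };

/* Illustrative stand-in for a vhost virtqueue. */
struct vq_sketch {
	struct used_elem ring[VQ_SIZE];
	uint16_t last_used_idx;	/* free-running; ring slot is idx % size */
};

/* One shared helper both the mrg and non-mrg Rx paths can call, instead
 * of duplicating the used-ring bookkeeping in each: the paths stay
 * separate (no performance coupling) but share the maintenance burden. */
static void update_used_ring(struct vq_sketch *vq, uint32_t desc_idx,
			     uint32_t len)
{
	uint16_t slot = vq->last_used_idx % VQ_SIZE;

	vq->ring[slot].id = desc_idx;
	vq->ring[slot].len = len;
	vq->last_used_idx++;
}
```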
[dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support
On Tue, Oct 11, 2016 at 02:14:10PM +0200, Olivier MATZ wrote:
> Hi Yuanhan,
>
> On 10/11/2016 01:35 PM, Yuanhan Liu wrote:
> > Hi,
> >
> > Firstly, apologize for such a late review. It's been forgotten :(
> >
> > BTW, please feel free to ping me in future if I make no response
> > within one or two weeks!
> >
> > I haven't reviewed it carefully yet (something I will do tomorrow).
> > Before that, a few quick questions.
> >
> > Firstly, would you write down some test steps? Honestly, I'm not
> > quite sure how that works without the TCP/IP stack.
>
> Not sure I'm getting your question.
> The test plan described in the cover letter works without any dpdk tcp/ip
> stack. It uses testpmd, which is able to bridge packets and ask for TCP
> segmentation.

Oops, I thought the patch list was the end of the cover letter :(

It looks like a great doc after a first glimpse. I will look at your
code tomorrow.

Thanks.

	--yliu
[dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support
Hi,

Firstly, apologize for such a late review. It's been forgotten :(

BTW, please feel free to ping me in future if I make no response within
one or two weeks!

I haven't reviewed it carefully yet (something I will do tomorrow).
Before that, a few quick questions.

Firstly, would you write down some test steps? Honestly, I'm not quite
sure how that works without the TCP/IP stack.

On Mon, Oct 03, 2016 at 11:00:11AM +0200, Olivier Matz wrote:
> This patchset, targeted for 16.11, introduces the support of rx and tx
> offload in virtio pmd. To achieve this, some new mbuf flags must be
> introduced, as discussed in [1].
>
> It applies on top of:
> - software packet type [2]
> - testpmd enhancements [3]

I didn't do the search. Have the two got merged?

	--yliu

> The new mbuf checksum flags are backward compatible for current
> applications that assume that unknown_csum = good_csum (since there
> was only a bad_csum flag). But if the patchset is integrated, we
> should consider updating the PMDs to match the new API for 16.11.
> [1] http://dpdk.org/ml/archives/dev/2016-May/039920.html
> [2] http://dpdk.org/ml/archives/dev/2016-October/048073.html
> [3] http://dpdk.org/ml/archives/dev/2016-September/046443.html
>
> changes v1 -> v2
> - change mbuf checksum calculation static inline
> - fix checksum calculation for protocol where csum=0 means no csum
> - move mbuf checksum calculation in librte_net
> - use RTE_MIN() to set max rx/tx queue
> - rebase on top of head
>
> Olivier Matz (12):
>   virtio: move device initialization in a function
>   virtio: setup and start cq in configure callback
>   virtio: reinitialize the device in configure callback
>   net: add function to calculate a checksum in a mbuf
>   mbuf: add new Rx checksum mbuf flags
>   app/testpmd: fix checksum stats in csum engine
>   mbuf: new flag for LRO
>   app/testpmd: display lro segment size
>   virtio: add Rx checksum offload support
>   virtio: add Tx checksum offload support
>   virtio: add Lro support
>   virtio: add Tso support
>
>  app/test-pmd/csumonly.c                |   8 +-
>  doc/guides/rel_notes/release_16_11.rst |  16 ++
>  drivers/net/virtio/virtio_ethdev.c     | 182 +-
>  drivers/net/virtio/virtio_ethdev.h     |  18 +--
>  drivers/net/virtio/virtio_pci.h        |   4 +-
>  drivers/net/virtio/virtio_rxtx.c       | 270 ++---
>  drivers/net/virtio/virtqueue.h         |   1 +
>  lib/librte_mbuf/rte_mbuf.c             |  18 ++-
>  lib/librte_mbuf/rte_mbuf.h             |  58 ++-
>  lib/librte_net/rte_ip.h                |  60
>  10 files changed, 526 insertions(+), 109 deletions(-)
>
> Test plan
> =========
>
> (not fully replayed on v2, but no major change)
>
> Platform description
> --------------------
>
> [ASCII diagram garbled in the archive. It showed: the guest (dpdk) with
> two ixgbe ports (port0/port1, passed through via directio/vfio) plus a
> virtio port; on the host (linux, vhost-net), the virtio port connects
> to tap0 (1.1.1.1, in netns ns-tap) and the ixgbe ports loop back to
> ixgbe2 (1.1.1.2, in netns ns-ixgbe); flow2 is the reverse of flow1.]
>
> flow1:
>   host -(ixgbe)-> guest -(virtio)-> host
>   1.1.1.2 -> 1.1.1.1
>
> flow2:
>   host -(virtio)-> guest -(ixgbe)-> host
>   1.1.1.2 -> 1.1.1.1
>
> Host configuration
> ------------------
>
> Start qemu with:
>
> - a ne2k management interface to avoid any conflict with dpdk
> - 2 ixgbe interfaces given to the vm through vfio
> - a virtio net device, connected to a tap interface through vhost-net
>
>   /usr/bin/qemu-system-x86_64 -k fr -daemonize --enable-kvm -m 1G -cpu host \
>     -smp 3 -serial telnet::40564,server,nowait -serial null \
>     -qmp tcp::44340,server,nowait -monitor telnet::49229,server,nowait \
>     -device ne2k_pci,mac=de:ad:de:01:02:03,netdev=user.0,addr=03 \
>     -netdev user,id=user.0,hostfwd=tcp::34965-:22 \
>     -device vfio-pci,host=:04:00.0 -device vfio-pci,host=:04:00.1 \
>     -netdev type=tap,id=vhostnet0,script=no,vhost=on,queues=8 \
>     -device virtio-net-pci,netdev=vhostnet0,ioeventfd=on,mq=on,vectors=17 \
>     -hda "/path/to/ubuntu-14.04-template.qcow2" \
>     -snapshot -vga none -display none
>
> Move the tap interface in a netns, and configure it:
>
>   ip netns add ns-tap
>   ip netns exec ns-tap ip l set lo up
>   ip link set tap0 netns ns-tap
>   ip netns exec ns-tap ip l set tap0 down
>   ip netns exec ns-tap ip l set addr 02:00:00:00:00:01 dev tap0
>   ip netns exec ns-tap ip l set tap0 up
>   ip netns exec ns-tap ip a a 1.1.1.1/24 dev tap0
>   ip netns exec ns-tap arp -s 1.1.1.2
[dpdk-dev] [PATCH v2] vhost: Only access header if offloading is supported in dequeue path
On Tue, Oct 11, 2016 at 09:45:27AM +0200, Maxime Coquelin wrote:
> @@ -684,12 +699,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>  		  struct rte_mempool *mbuf_pool)
>  {
>  	struct vring_desc *desc;
> -	uint64_t desc_addr;
> +	uint64_t desc_addr = 0;
>  	uint32_t desc_avail, desc_offset;
>  	uint32_t mbuf_avail, mbuf_offset;
>  	uint32_t cpy_len;
>  	struct rte_mbuf *cur = m, *prev = m;
> -	struct virtio_net_hdr *hdr;
> +	struct virtio_net_hdr *hdr = NULL;
>  	/* A counter to avoid desc dead loop chain */
>  	uint32_t nr_desc = 1;
> 
> @@ -698,12 +713,14 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>  	    (desc->flags & VRING_DESC_F_INDIRECT))
>  		return -1;
> 
> -	desc_addr = gpa_to_vva(dev, desc->addr);
> -	if (unlikely(!desc_addr))
> -		return -1;
> +	if (virtio_net_with_host_offload(dev)) {
> +		desc_addr = gpa_to_vva(dev, desc->addr);
> +		if (unlikely(!desc_addr))
> +			return -1;
> 
> -	hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
> -	rte_prefetch0(hdr);
> +		hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
> +		rte_prefetch0(hdr);
> +	}
> 
>  	/*
>  	 * A virtio driver normally uses at least 2 desc buffers
> @@ -720,18 +737,24 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>  		if (unlikely(!desc_addr))
>  			return -1;
> 
> -		rte_prefetch0((void *)(uintptr_t)desc_addr);
> -
>  		desc_offset = 0;
>  		desc_avail  = desc->len;
>  		nr_desc    += 1;
> -
> -		PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
>  	} else {
> +		if (!desc_addr) {
> +			desc_addr = gpa_to_vva(dev, desc->addr);
> +			if (unlikely(!desc_addr))
> +				return -1;
> +		}
> +

I think this piece of code makes things a bit complex. I think what you
want to achieve is, besides saving the hdr prefetch, to save one call to
gpa_to_vva() for the non-ANY_LAYOUT case. Does that matter too much? How
about just saving the hdr prefetch?

	if (virtio_net_with_host_offload(dev)) {
		hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
		rte_prefetch0(hdr);
	}

	--yliu

>  		desc_avail  = desc->len - dev->vhost_hlen;
>  		desc_offset = dev->vhost_hlen;
>  	}
> 
> +	rte_prefetch0((void *)(uintptr_t)(desc_addr + desc_offset));
> +
> +	PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), desc_avail, 0);
> +
>  	mbuf_offset = 0;
>  	mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
>  	while (1) {
> @@ -795,7 +818,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>  	prev->data_len = mbuf_offset;
>  	m->pkt_len    += mbuf_offset;
> 
> -	if (hdr->flags != 0 || hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE)
> +	if (virtio_net_with_host_offload(dev))
>  		vhost_dequeue_offload(hdr, m);
> 
>  	return 0;
> --
> 2.7.4
[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Mon, Oct 10, 2016 at 07:39:59AM +0300, Michael S. Tsirkin wrote:
> > > > > > 1. why is this memset expensive?
> > > > >
> > > > > I don't have the exact answer, but just some rough thoughts:
> > > > >
> > > > > It's an external clib function: there is a call stack and the
> > > > > IP register will bounce back and forth.
> > > >
> > > > for memset 0?  gcc 5.3.1 on fedora happily inlines it.
> > >
> > > Good to know!
> > >
> > > > > overkill to use that for resetting 14 bytes structure.
> > > > >
> > > > > Some trick like
> > > > > 	*(struct virtio_net_hdr *)hdr = {0, };
> > > > >
> > > > > Or even
> > > > > 	hdr->xxx = 0;
> > > > > 	hdr->yyy = 0;
> > > > >
> > > > > should behave better.
> > > > >
> > > > > There was an example: the vhost enqueue optimization patchset from
> > > > > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> > > > > on my Ivybridge server: it has no such issue on his server though.
> > > > >
> > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > >
> > > > > 	--yliu
> > > >
> > > > I'd say that's weird. what's your config? any chance you
> > > > are using an old compiler?
> > >
> > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > > he said the memset is not well optimized for Ivybridge server.
> >
> > The dst is remote in that case. It's fine on Haswell but has complication
> > in Ivy Bridge which (wasn't supposed to but) causes serious frontend issue.
> >
> > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> 
> So try something like this then:

Yes, I saw memset is inlined when this diff is applied. So, mind sending
a formal patch? You might want to try a build at least: it doesn't build.

> Generally pointer chasing in vq->hw->vtnet_hdr_size can't be good
> for performance. Move fields used on data path into vq
> and use from there to avoid indirections?

Good suggestion!

	--yliu
[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Tue, Oct 11, 2016 at 08:39:54AM +0200, Maxime Coquelin wrote:
> 
> On 10/11/2016 08:04 AM, Yuanhan Liu wrote:
> >On Mon, Oct 10, 2016 at 04:54:39PM +0200, Maxime Coquelin wrote:
> >>
> >>On 10/10/2016 04:42 PM, Yuanhan Liu wrote:
> >>>On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote:
> >>>>>>>At that time, a packet always uses 2 descs. Since indirect desc is
> >>>>>>>enabled (by default) now, the assumption is not true then. What's
> >>>>>>>worse, it might even slow things a bit down. That should also be
> >>>>>>>part of the reason why performance is slightly worse than before.
> >>>>>>>
> >>>>>>>	--yliu
> >>>>>>
> >>>>>>I'm not sure I get what you are saying
> >>>>>>
> >>>>>>>commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> >>>>>>>Author: Yuanhan Liu
> >>>>>>>Date:   Mon May 2 17:46:17 2016 -0700
> >>>>>>>
> >>>>>>>    vhost: optimize dequeue for small packets
> >>>>>>>
> >>>>>>>    A virtio driver normally uses at least 2 desc buffers for Tx: the
> >>>>>>>    first for storing the header, and the others for storing the data.
> >>>>>>>
> >>>>>>>    Therefore, we could fetch the first data desc buf before the main
> >>>>>>>    loop, and do the copy first before the check of "are we done yet?".
> >>>>>>>    This could save one check for small packets that just have one data
> >>>>>>>    desc buffer and need one mbuf to store it.
> >>>>>>>
> >>>>>>>    Signed-off-by: Yuanhan Liu
> >>>>>>>    Acked-by: Huawei Xie
> >>>>>>>    Tested-by: Rich Lane
> >>>>>>
> >>>>>>This fast-paths the 2-descriptors format but it's not active
> >>>>>>for indirect descriptors. Is this what you mean?
> >>>>>
> >>>>>Yes. It's also not active when ANY_LAYOUT is actually turned on.
> >>>>>
> >>>>>>Should be a simple matter to apply this optimization for indirect.
> >>>>>
> >>>>>Might be.
> >>>>
> >>>>If I understand the code correctly, indirect descs also benefit from this
> >>>>optimization, or am I missing something?
> >>>
> >>>Aha..., you are right!
> >>
> >>The interesting thing is that the patch I sent on Thursday that removes
> >>header access when no offload has been negotiated[0] seems to reduce
> >>almost to zero the performance seen with indirect descriptors enabled.
> >
> >Didn't follow that.
> >
> >>I see this with 64 bytes packets using testpmd on both ends.
> >>
> >>When I did the patch, I would have expected the same gain with both
> >>modes, whereas I measured +1% for direct and +4% for indirect.
> >
> >IIRC, I did a test before (removing those offload code pieces), and the
> >performance was basically the same before and after that. Well, there
> >might be some small difference, say 1% as you said. But the result has
> >never been steady.
> >
> >Anyway, I think your patch is good to have: I just didn't see a v2.
> 
> I waited to gather some comments/feedback before sending the v2.
> I'll send it today or tomorrow.

Interesting, we have a deadlock then: I haven't looked at the code
carefully since you said there would be a v2, thus I'm waiting for it.
However, you are waiting for my review. :)

Anyway, I will take time to look at it shortly.

	--yliu
[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Mon, Oct 10, 2016 at 04:54:39PM +0200, Maxime Coquelin wrote:
> 
> On 10/10/2016 04:42 PM, Yuanhan Liu wrote:
> >On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote:
> >>>>>At that time, a packet always uses 2 descs. Since indirect desc is
> >>>>>enabled (by default) now, the assumption is not true then. What's
> >>>>>worse, it might even slow things a bit down. That should also be
> >>>>>part of the reason why performance is slightly worse than before.
> >>>>>
> >>>>>	--yliu
> >>>>
> >>>>I'm not sure I get what you are saying
> >>>>
> >>>>>commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> >>>>>Author: Yuanhan Liu
> >>>>>Date:   Mon May 2 17:46:17 2016 -0700
> >>>>>
> >>>>>    vhost: optimize dequeue for small packets
> >>>>>
> >>>>>    A virtio driver normally uses at least 2 desc buffers for Tx: the
> >>>>>    first for storing the header, and the others for storing the data.
> >>>>>
> >>>>>    Therefore, we could fetch the first data desc buf before the main
> >>>>>    loop, and do the copy first before the check of "are we done yet?".
> >>>>>    This could save one check for small packets that just have one data
> >>>>>    desc buffer and need one mbuf to store it.
> >>>>>
> >>>>>    Signed-off-by: Yuanhan Liu
> >>>>>    Acked-by: Huawei Xie
> >>>>>    Tested-by: Rich Lane
> >>>>
> >>>>This fast-paths the 2-descriptors format but it's not active
> >>>>for indirect descriptors. Is this what you mean?
> >>>
> >>>Yes. It's also not active when ANY_LAYOUT is actually turned on.
> >>>
> >>>>Should be a simple matter to apply this optimization for indirect.
> >>>
> >>>Might be.
> >>
> >>If I understand the code correctly, indirect descs also benefit from this
> >>optimization, or am I missing something?
> >
> >Aha..., you are right!
> 
> The interesting thing is that the patch I sent on Thursday that removes
> header access when no offload has been negotiated[0] seems to reduce
> almost to zero the performance seen with indirect descriptors enabled.

Didn't follow that.

> I see this with 64 bytes packets using testpmd on both ends.
> 
> When I did the patch, I would have expected the same gain with both
> modes, whereas I measured +1% for direct and +4% for indirect.

IIRC, I did a test before (removing those offload code pieces), and the
performance was basically the same before and after that. Well, there
might be some small difference, say 1% as you said. But the result has
never been steady.

Anyway, I think your patch is good to have: I just didn't see a v2.

	--yliu
[dpdk-dev] 17.02 Roadmap
On Mon, Oct 10, 2016 at 10:42:58PM +0200, Thomas Monjalon wrote:
> > Support New Device Types in Vhost-User: Support will be added to vhost-user
> > for new device types including vhost-scsi and vhost-blk.
> 
> Great!
> Is it only related to networking or should we expect some storage-related
> code or drivers to come up?

It's not only networking related. It just introduces a few more APIs to
export those buf infos (virt addr, phys addr, len, etc) to applications,
so that they can handle/parse the data in the way they want.

For example, for SPDK (https://01.org/spdk), they can use those APIs to
fetch guest data and parse it following the virtio-SCSI spec.

The dequeue path (guest Tx) might look like something below:

	rte_vhost_dequeue_vring_desc_burst(vid, queue_id, iov, count)
	{
		for (i = 0; i < count; i++) {
			desc_idx = get_next_desc();

			iov[i]->addr      = desc_virt_addr(desc[desc_idx]->addr);
			iov[i]->phys_addr = desc_phys_addr(desc[desc_idx]->addr);
			iov[i]->len       = desc[desc_idx]->len;
			iov[i]->desc      = desc_idx;
		}

		return i;
	}

	rte_vhost_update_used_ring(vid, queue_id, descs, count)
	{
		for (i = 0; i < count; i++) {
			used_idx = get_next_used_idx();
			vq->used->ring[used_idx] = descs[i];
		}

		vq->used->idx += i;
	}

And we would introduce similar APIs for the enqueue path.

	--yliu
[dpdk-dev] [PATCH V2 2/2] virtio: support IOMMU platform
On Fri, Oct 07, 2016 at 07:24:44AM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 28, 2016 at 04:25:12PM +0800, Jason Wang wrote:
> > Negotiate VIRTIO_F_IOMMU_PLATFORM to have IOMMU support.
> >
> > Signed-off-by: Jason Wang
> > ---
> > Changes from v1:
> > - remove unnecessary NEED_MAPPING flag
> 
> One thing we probably should do is enable this flag
> with VFIO but not with UIO or VFIO-noiommu.

Good suggestion. I think we could do that in another patch.

> > ---
> >  drivers/net/virtio/virtio_ethdev.h | 3 ++-
> >  drivers/net/virtio/virtio_pci.h    | 3 ++-
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
> > index 2ecec6e..04a06e2 100644
> > --- a/drivers/net/virtio/virtio_ethdev.h
> > +++ b/drivers/net/virtio/virtio_ethdev.h
> > @@ -63,7 +63,8 @@
> >  	 1u << VIRTIO_NET_F_CTRL_RX     |	\
> >  	 1u << VIRTIO_NET_F_CTRL_VLAN   |	\
> >  	 1u << VIRTIO_NET_F_MRG_RXBUF   |	\
> > -	 1ULL << VIRTIO_F_VERSION_1)
> > +	 1ULL << VIRTIO_F_VERSION_1     |	\
> > +	 1ULL << VIRTIO_F_IOMMU_PLATFORM )
> 
> Space before ) looks kind of ugly.

Yes, a bit. I will remove it while applying.

> >  #define VIRTIO_TRANSPORT_F_START 28
> > -#define VIRTIO_TRANSPORT_F_END   32
> > +#define VIRTIO_TRANSPORT_F_END   34
> 
> This seems unused. Drop it?

Indeed. I will submit a patch to remove both (_START and _END).

	--yliu
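On the `1u` vs `1ULL` mix visible in that macro: the transport-level feature bits sit at positions 32 and up (VERSION_1 is bit 32, IOMMU_PLATFORM bit 33), and shifting a 32-bit `1u` by 32 or more is undefined behavior in C, which is why those entries need `1ULL`. A standalone illustration — the bit numbers are taken from the virtio spec, the variable name is made up:

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_F_MRG_RXBUF   15  /* bit numbers per the virtio spec */
#define VIRTIO_F_VERSION_1       32
#define VIRTIO_F_IOMMU_PLATFORM  33

/* Bits below 32 fit in an unsigned int, but VERSION_1 and IOMMU_PLATFORM
 * do not: 1u << 32 is undefined behavior, so 1ULL is required there. */
static const uint64_t features =
	(1u   << VIRTIO_NET_F_MRG_RXBUF) |
	(1ULL << VIRTIO_F_VERSION_1)     |
	(1ULL << VIRTIO_F_IOMMU_PLATFORM);
```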