Re: [PATCH V3 0/7] mdev based hardware virtio offloading support

2019-10-14 Thread Jason Wang


On 2019/10/15 at 1:49 AM, Stefan Hajnoczi wrote:

On Fri, Oct 11, 2019 at 04:15:50PM +0800, Jason Wang wrote:

There is hardware that can do virtio datapath offloading while having
its own control path. This patch series tries to implement an mdev-based
unified API to support using the kernel virtio driver to drive those
devices. This is done by introducing a new mdev transport for virtio
(virtio_mdev) that registers itself as a new kind of mdev driver. It
then provides a unified way for the kernel virtio driver to talk with
the mdev device implementation.

Though the series only contains kernel driver support, the goal is to
make the transport generic enough to support userspace drivers. This
means vhost-mdev[1] could be built on top as well by reusing the
transport.

A sample driver is also implemented which simulates a virtio-net
loopback ethernet device on top of vringh + workqueue. This could be
used as a reference implementation for real hardware drivers.

Considering the mdev framework only supports VFIO devices and drivers
right now, this series also extends it to support other types. This is
done by introducing a class id for the device and pairing it with the
id_table claimed by the driver. On top of that, this series also
decouples device-specific parent ops from the common ones.

I was curious so I took a quick look and posted comments.

I guess this driver runs inside the guest since it registers virtio
devices?



It could run in either the guest or the host, but the main focus is to
run in the host so that we can use virtio drivers in containers.





If this is used with physical PCI devices that support datapath
offloading then how are physical devices presented to the guest without
SR-IOV?



We will do control path mediation through vhost-mdev[1] and
vhost-vfio[2]. Then we will present a fully virtio-compatible ethernet
device to the guest.


SR-IOV is not a must; any mdev device that implements the API defined in
patch 5 can be used by this framework.


Thanks

[1] https://lkml.org/lkml/2019/9/26/15

[2] https://patchwork.ozlabs.org/cover/984763/




Stefan

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V3 6/7] virtio: introduce a mdev based transport

2019-10-14 Thread Jason Wang


On 2019/10/15 at 1:39 AM, Stefan Hajnoczi wrote:

On Fri, Oct 11, 2019 at 04:15:56PM +0800, Jason Wang wrote:

+struct virtio_mdev_device {
+   struct virtio_device vdev;
+   struct mdev_device *mdev;
+   unsigned long version;
+
+   struct virtqueue **vqs;
+   /* The lock to protect virtqueue list */
+   spinlock_t lock;
+   struct list_head virtqueues;

Is this a list of struct virtio_mdev_vq_info?  Please document the
actual type in a comment.



Ok.



+static int virtio_mdev_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+   struct virtqueue *vqs[],
+   vq_callback_t *callbacks[],
+   const char * const names[],
+   const bool *ctx,
+   struct irq_affinity *desc)
+{
+   struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+   struct mdev_device *mdev = vm_get_mdev(vdev);
+   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
+   struct virtio_mdev_callback cb;
+   int i, err, queue_idx = 0;
+
+   vm_dev->vqs = kmalloc_array(queue_idx, sizeof(*vm_dev->vqs),
+   GFP_KERNEL);

kmalloc_array(0, ...)?  I would have expected nvqs instead of queue_idx
(0).

What is the purpose of vm_dev->vqs, and does anything ever access it?



It's useless, will remove it.

Thanks


Re: [PATCH V3 5/7] mdev: introduce virtio device and its device ops

2019-10-14 Thread Jason Wang


On 2019/10/15 at 1:23 AM, Stefan Hajnoczi wrote:

On Fri, Oct 11, 2019 at 04:15:55PM +0800, Jason Wang wrote:

+ * @set_vq_cb: Set the interrut calback function for

s/interrut/interrupt/

s/calback/callback/



Fixed.

Thanks


Re: [PATCH RFC v4 0/5] vhost: ring format independence

2019-10-14 Thread Jason Wang


On 2019/10/13 at 7:41 PM, Michael S. Tsirkin wrote:

This adds infrastructure required for supporting
multiple ring formats.

The idea is as follows: we convert descriptors to an
independent format first, and process that converting to
iov later.

The point is that we have a tight loop that fetches
descriptors, which is good for cache utilization.
This will also allow all kinds of batching tricks -
e.g. it seems possible to keep SMAP disabled while
we are fetching multiple descriptors.

Based on a microbenchmark, this already seems to perform
exactly the same as the original code.
Lightly tested.
More testing would be very much appreciated.

To use new code:
echo 1 > /sys/module/vhost_test/parameters/newcode
or
echo 1 > /sys/module/vhost_net/parameters/newcode

Changes from v3:
 - fixed error handling in case of indirect descriptors
 - add BUG_ON to detect buffer overflow in case of bugs
 in response to comment by Jason Wang
 - minor code tweaks

Changes from v2:
- fixed indirect descriptor batching
 reported by Jason Wang

Changes from v1:
- typo fixes



I've just done a quick benchmark with testpmd + vhost_net txonly.

With a 256 queue size there is no difference, but with a 1024 queue size
a 1% PPS regression was found.


Thanks





Michael S. Tsirkin (5):
   vhost: option to fetch descriptors through an independent struct
   vhost/test: add an option to test new code
   vhost: batching fetches
   vhost/net: add an option to test new code
   vhost: last descriptor must have NEXT clear

  drivers/vhost/net.c   |  32 -
  drivers/vhost/test.c  |  19 ++-
  drivers/vhost/vhost.c | 328 +-
  drivers/vhost/vhost.h |  20 ++-
  4 files changed, 385 insertions(+), 14 deletions(-)



Re: [PATCH V3 0/7] mdev based hardware virtio offloading support

2019-10-14 Thread Stefan Hajnoczi
On Fri, Oct 11, 2019 at 04:15:50PM +0800, Jason Wang wrote:
> There is hardware that can do virtio datapath offloading while having
> its own control path. This patch series tries to implement an mdev-based
> unified API to support using the kernel virtio driver to drive those
> devices. This is done by introducing a new mdev transport for virtio
> (virtio_mdev) that registers itself as a new kind of mdev driver. It
> then provides a unified way for the kernel virtio driver to talk with
> the mdev device implementation.
> 
> Though the series only contains kernel driver support, the goal is to
> make the transport generic enough to support userspace drivers. This
> means vhost-mdev[1] could be built on top as well by reusing the
> transport.
> 
> A sample driver is also implemented which simulates a virtio-net
> loopback ethernet device on top of vringh + workqueue. This could be
> used as a reference implementation for real hardware drivers.
> 
> Considering the mdev framework only supports VFIO devices and drivers
> right now, this series also extends it to support other types. This is
> done by introducing a class id for the device and pairing it with the
> id_table claimed by the driver. On top of that, this series also
> decouples device-specific parent ops from the common ones.

I was curious so I took a quick look and posted comments.

I guess this driver runs inside the guest since it registers virtio
devices?

If this is used with physical PCI devices that support datapath
offloading then how are physical devices presented to the guest without
SR-IOV?

Stefan



Re: [PATCH V3 6/7] virtio: introduce a mdev based transport

2019-10-14 Thread Stefan Hajnoczi
On Fri, Oct 11, 2019 at 04:15:56PM +0800, Jason Wang wrote:
> +struct virtio_mdev_device {
> + struct virtio_device vdev;
> + struct mdev_device *mdev;
> + unsigned long version;
> +
> + struct virtqueue **vqs;
> + /* The lock to protect virtqueue list */
> + spinlock_t lock;
> + struct list_head virtqueues;

Is this a list of struct virtio_mdev_vq_info?  Please document the
actual type in a comment.

> +static int virtio_mdev_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> + struct virtqueue *vqs[],
> + vq_callback_t *callbacks[],
> + const char * const names[],
> + const bool *ctx,
> + struct irq_affinity *desc)
> +{
> + struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
> + struct mdev_device *mdev = vm_get_mdev(vdev);
> + const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
> + struct virtio_mdev_callback cb;
> + int i, err, queue_idx = 0;
> +
> + vm_dev->vqs = kmalloc_array(queue_idx, sizeof(*vm_dev->vqs),
> + GFP_KERNEL);

kmalloc_array(0, ...)?  I would have expected nvqs instead of queue_idx
(0).

What is the purpose of vm_dev->vqs, and does anything ever access it?



Re: [PATCH V3 5/7] mdev: introduce virtio device and its device ops

2019-10-14 Thread Stefan Hajnoczi
On Fri, Oct 11, 2019 at 04:15:55PM +0800, Jason Wang wrote:
> + * @set_vq_cb:   Set the interrut calback function for

s/interrut/interrupt/

s/calback/callback/



[PATCH 01/25] crypto: virtio - implement missing support for output IVs

2019-10-14 Thread Ard Biesheuvel
In order to allow for CBC to be chained, which is something that the
CTS template relies upon, implementations of CBC need to pass the
IV to be used for subsequent invocations via the IV buffer. This was
not yet implemented for virtio-crypto, so implement it now.

Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver")
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Gonglei 
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ard Biesheuvel 
---
 drivers/crypto/virtio/virtio_crypto_algs.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c 
b/drivers/crypto/virtio/virtio_crypto_algs.c
index 42d19205166b..65ec10800137 100644
--- a/drivers/crypto/virtio/virtio_crypto_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_algs.c
@@ -437,6 +437,11 @@ __virtio_crypto_ablkcipher_do_req(struct 
virtio_crypto_sym_request *vc_sym_req,
goto free;
}
memcpy(iv, req->info, ivsize);
+   if (!vc_sym_req->encrypt)
+   scatterwalk_map_and_copy(req->info, req->src,
+req->nbytes - AES_BLOCK_SIZE,
+AES_BLOCK_SIZE, 0);
+
sg_init_one(&iv_sg, iv, ivsize);
sgs[num_out++] = &iv_sg;
vc_sym_req->iv = iv;
@@ -563,6 +568,10 @@ static void virtio_crypto_ablkcipher_finalize_req(
struct ablkcipher_request *req,
int err)
 {
+   if (vc_sym_req->encrypt)
+   scatterwalk_map_and_copy(req->info, req->dst,
+req->nbytes - AES_BLOCK_SIZE,
+AES_BLOCK_SIZE, 0);
crypto_finalize_ablkcipher_request(vc_sym_req->base.dataq->engine,
   req, err);
kzfree(vc_sym_req->iv);
-- 
2.20.1



[PATCH 03/25] crypto: virtio - switch to skcipher API

2019-10-14 Thread Ard Biesheuvel
Commit 7a7ffe65c8c5 ("crypto: skcipher - Add top-level skcipher interface")
dated 20 August 2015 introduced the new skcipher API, which is supposed to
replace both blkcipher and ablkcipher. While all consumers of the API have
been converted long ago, some producers of the ablkcipher remain, forcing
us to keep the ablkcipher support routines alive, along with the matching
code to expose [a]blkciphers via the skcipher API.

So switch this driver to the skcipher API, allowing us to finally drop the
blkcipher code in the near future.

Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Gonglei 
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ard Biesheuvel 
---
 drivers/crypto/virtio/virtio_crypto_algs.c   | 187 ++--
 drivers/crypto/virtio/virtio_crypto_common.h |   2 +-
 2 files changed, 92 insertions(+), 97 deletions(-)

diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c 
b/drivers/crypto/virtio/virtio_crypto_algs.c
index 82b316b2f537..4b71e80951b7 100644
--- a/drivers/crypto/virtio/virtio_crypto_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_algs.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -16,10 +17,10 @@
 #include "virtio_crypto_common.h"
 
 
-struct virtio_crypto_ablkcipher_ctx {
+struct virtio_crypto_skcipher_ctx {
struct crypto_engine_ctx enginectx;
struct virtio_crypto *vcrypto;
-   struct crypto_tfm *tfm;
+   struct crypto_skcipher *tfm;
 
struct virtio_crypto_sym_session_info enc_sess_info;
struct virtio_crypto_sym_session_info dec_sess_info;
@@ -30,8 +31,8 @@ struct virtio_crypto_sym_request {
 
/* Cipher or aead */
uint32_t type;
-   struct virtio_crypto_ablkcipher_ctx *ablkcipher_ctx;
-   struct ablkcipher_request *ablkcipher_req;
+   struct virtio_crypto_skcipher_ctx *skcipher_ctx;
+   struct skcipher_request *skcipher_req;
uint8_t *iv;
/* Encryption? */
bool encrypt;
@@ -41,7 +42,7 @@ struct virtio_crypto_algo {
uint32_t algonum;
uint32_t service;
unsigned int active_devs;
-   struct crypto_alg algo;
+   struct skcipher_alg algo;
 };
 
 /*
@@ -49,9 +50,9 @@ struct virtio_crypto_algo {
  * and crypto algorithms registion.
  */
 static DEFINE_MUTEX(algs_lock);
-static void virtio_crypto_ablkcipher_finalize_req(
+static void virtio_crypto_skcipher_finalize_req(
struct virtio_crypto_sym_request *vc_sym_req,
-   struct ablkcipher_request *req,
+   struct skcipher_request *req,
int err);
 
 static void virtio_crypto_dataq_sym_callback
@@ -59,7 +60,7 @@ static void virtio_crypto_dataq_sym_callback
 {
struct virtio_crypto_sym_request *vc_sym_req =
container_of(vc_req, struct virtio_crypto_sym_request, base);
-   struct ablkcipher_request *ablk_req;
+   struct skcipher_request *ablk_req;
int error;
 
/* Finish the encrypt or decrypt process */
@@ -79,8 +80,8 @@ static void virtio_crypto_dataq_sym_callback
error = -EIO;
break;
}
-   ablk_req = vc_sym_req->ablkcipher_req;
-   virtio_crypto_ablkcipher_finalize_req(vc_sym_req,
+   ablk_req = vc_sym_req->skcipher_req;
+   virtio_crypto_skcipher_finalize_req(vc_sym_req,
ablk_req, error);
}
 }
@@ -110,8 +111,8 @@ virtio_crypto_alg_validate_key(int key_len, uint32_t *alg)
return 0;
 }
 
-static int virtio_crypto_alg_ablkcipher_init_session(
-   struct virtio_crypto_ablkcipher_ctx *ctx,
+static int virtio_crypto_alg_skcipher_init_session(
+   struct virtio_crypto_skcipher_ctx *ctx,
uint32_t alg, const uint8_t *key,
unsigned int keylen,
int encrypt)
@@ -200,8 +201,8 @@ static int virtio_crypto_alg_ablkcipher_init_session(
return 0;
 }
 
-static int virtio_crypto_alg_ablkcipher_close_session(
-   struct virtio_crypto_ablkcipher_ctx *ctx,
+static int virtio_crypto_alg_skcipher_close_session(
+   struct virtio_crypto_skcipher_ctx *ctx,
int encrypt)
 {
struct scatterlist outhdr, status_sg, *sgs[2];
@@ -261,8 +262,8 @@ static int virtio_crypto_alg_ablkcipher_close_session(
return 0;
 }
 
-static int virtio_crypto_alg_ablkcipher_init_sessions(
-   struct virtio_crypto_ablkcipher_ctx *ctx,
+static int virtio_crypto_alg_skcipher_init_sessions(
+   struct virtio_crypto_skcipher_ctx *ctx,
const uint8_t *key, unsigned int keylen)
 {
uint32_t alg;
@@ -278,30 +279,30 @@ static int virtio_crypto_alg_ablkcipher_init_sessions(
goto bad_key;
 
/* Create encryption session */
-   ret = virtio_crypto_alg_ablkcipher_init_session(ctx,
+   ret = virtio_crypto_alg_skcipher_init_session(ctx,
alg, key, 

[PATCH 02/25] crypto: virtio - deal with unsupported input sizes

2019-10-14 Thread Ard Biesheuvel
Return -EINVAL for input sizes that are not a multiple of the AES
block size, since they are not supported by our CBC chaining mode.

While at it, remove the pr_err() that reports unsupported key sizes
being used: we shouldn't spam the kernel log with that.

Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver")
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Gonglei 
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ard Biesheuvel 
---
 drivers/crypto/virtio/virtio_crypto_algs.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c 
b/drivers/crypto/virtio/virtio_crypto_algs.c
index 65ec10800137..82b316b2f537 100644
--- a/drivers/crypto/virtio/virtio_crypto_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_algs.c
@@ -105,8 +105,6 @@ virtio_crypto_alg_validate_key(int key_len, uint32_t *alg)
*alg = VIRTIO_CRYPTO_CIPHER_AES_CBC;
break;
default:
-   pr_err("virtio_crypto: Unsupported key length: %d\n",
-   key_len);
return -EINVAL;
}
return 0;
@@ -489,6 +487,11 @@ static int virtio_crypto_ablkcipher_encrypt(struct 
ablkcipher_request *req)
/* Use the first data virtqueue as default */
struct data_queue *data_vq = &vcrypto->data_vq[0];
 
+   if (!req->nbytes)
+   return 0;
+   if (req->nbytes % AES_BLOCK_SIZE)
+   return -EINVAL;
+
vc_req->dataq = data_vq;
vc_req->alg_cb = virtio_crypto_dataq_sym_callback;
vc_sym_req->ablkcipher_ctx = ctx;
@@ -509,6 +512,11 @@ static int virtio_crypto_ablkcipher_decrypt(struct 
ablkcipher_request *req)
/* Use the first data virtqueue as default */
struct data_queue *data_vq = &vcrypto->data_vq[0];
 
+   if (!req->nbytes)
+   return 0;
+   if (req->nbytes % AES_BLOCK_SIZE)
+   return -EINVAL;
+
vc_req->dataq = data_vq;
vc_req->alg_cb = virtio_crypto_dataq_sym_callback;
vc_sym_req->ablkcipher_ctx = ctx;
-- 
2.20.1



Re: [PATCH 1/2] dma-mapping: Add dma_addr_is_phys_addr()

2019-10-14 Thread Robin Murphy

On 14/10/2019 05:51, David Gibson wrote:

On Fri, Oct 11, 2019 at 06:25:18PM -0700, Ram Pai wrote:

From: Thiago Jung Bauermann 

In order to safely use the DMA API, virtio needs to know whether DMA
addresses are in fact physical addresses and for that purpose,
dma_addr_is_phys_addr() is introduced.

cc: Benjamin Herrenschmidt 
cc: David Gibson 
cc: Michael Ellerman 
cc: Paul Mackerras 
cc: Michael Roth 
cc: Alexey Kardashevskiy 
cc: Paul Burton 
cc: Robin Murphy 
cc: Bartlomiej Zolnierkiewicz 
cc: Marek Szyprowski 
cc: Christoph Hellwig 
Suggested-by: Michael S. Tsirkin 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 


The change itself looks ok, so

Reviewed-by: David Gibson 

However, I would like to see the commit message (and maybe the inline
comments) expanded a bit on what the distinction here is about.  Some
of the text from the next patch would be suitable, about DMA addresses
usually being in a different address space but not in the case of
bounce buffering.


Right, this needs a much tighter definition. "DMA address happens to be 
a valid physical address" is true of various IOMMU setups too, but I 
can't believe it's meaningful in such cases.


If what you actually want is "DMA is direct or SWIOTLB" - i.e. "DMA 
address is physical address of DMA data (not necessarily the original 
buffer)" - wouldn't dma_is_direct() suffice?


Robin.


---
  arch/powerpc/include/asm/dma-mapping.h | 21 +
  arch/powerpc/platforms/pseries/Kconfig |  1 +
  include/linux/dma-mapping.h| 20 
  kernel/dma/Kconfig |  3 +++
  4 files changed, 45 insertions(+)

diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 565d6f7..f92c0a4b 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -5,6 +5,8 @@
  #ifndef _ASM_DMA_MAPPING_H
  #define _ASM_DMA_MAPPING_H
  
+#include 

+
  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
  {
/* We don't handle the NULL dev case for ISA for now. We could
@@ -15,4 +17,23 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
return NULL;
  }
  
+#ifdef CONFIG_ARCH_HAS_DMA_ADDR_IS_PHYS_ADDR

+/**
+ * dma_addr_is_phys_addr - check whether a device DMA address is a physical
+ * address
+ * @dev:   device to check
+ *
+ * Returns %true if any DMA address for this device happens to also be a valid
+ * physical address (not necessarily of the same page).
+ */
+static inline bool dma_addr_is_phys_addr(struct device *dev)
+{
+   /*
+* Secure guests always use the SWIOTLB, therefore DMA addresses are
+* actually the physical address of the bounce buffer.
+*/
+   return is_secure_guest();
+}
+#endif
+
  #endif/* _ASM_DMA_MAPPING_H */
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 9e35cdd..0108150 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -152,6 +152,7 @@ config PPC_SVM
select SWIOTLB
select ARCH_HAS_MEM_ENCRYPT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+   select ARCH_HAS_DMA_ADDR_IS_PHYS_ADDR
help
 There are certain POWER platforms which support secure guests using
 the Protected Execution Facility, with the help of an Ultravisor
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f7d1eea..6df5664 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -693,6 +693,26 @@ static inline bool dma_addressing_limited(struct device 
*dev)
dma_get_required_mask(dev);
  }
  
+#ifndef CONFIG_ARCH_HAS_DMA_ADDR_IS_PHYS_ADDR

+/**
+ * dma_addr_is_phys_addr - check whether a device DMA address is a physical
+ * address
+ * @dev:   device to check
+ *
+ * Returns %true if any DMA address for this device happens to also be a valid
+ * physical address (not necessarily of the same page).
+ */
+static inline bool dma_addr_is_phys_addr(struct device *dev)
+{
+   /*
+* Except in very specific setups, DMA addresses exist in a different
+* address space from CPU physical addresses and cannot be directly used
+* to reference system memory.
+*/
+   return false;
+}
+#endif
+
  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent);
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 9decbba..6209b46 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -51,6 +51,9 @@ config ARCH_HAS_DMA_MMAP_PGPROT
  config ARCH_HAS_FORCE_DMA_UNENCRYPTED
bool
  
+config ARCH_HAS_DMA_ADDR_IS_PHYS_ADDR

+   bool
+
  config DMA_NONCOHERENT_CACHE_SYNC
bool
  




Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

2019-10-14 Thread Stefano Garzarella
On Mon, Oct 14, 2019 at 04:21:35PM +0800, Jason Wang wrote:
> On 2019/10/14 at 4:17 PM, Stefan Hajnoczi wrote:
> > SO_VM_SOCKETS_BUFFER_SIZE might have been useful for VMCI-specific
> > applications, but we should use SO_RCVBUF and SO_SNDBUF for portable
> > applications in the future.  Those socket options also work with other
> > address families.
> > 

I think hyperv_transport started to use it in this patch:
ac383f58f3c9  hv_sock: perf: Allow the socket buffer size options to influence
  the actual socket buffers


> > I guess these sockopts are bypassed by AF_VSOCK because it doesn't use
> > the common skb queuing code in net/core/sock.c:(.  But one day we might
> > migrate to it...
> > 
> > Stefan
> 
> 
> +1, we should really consider reusing the existing socket mechanism instead
> of re-inventing the wheel.

+1, I totally agree. I'll go this way.

Guys, thank you all for your suggestions!

Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

2019-10-14 Thread Jason Wang


On 2019/10/14 at 4:17 PM, Stefan Hajnoczi wrote:

SO_VM_SOCKETS_BUFFER_SIZE might have been useful for VMCI-specific
applications, but we should use SO_RCVBUF and SO_SNDBUF for portable
applications in the future.  Those socket options also work with other
address families.

I guess these sockopts are bypassed by AF_VSOCK because it doesn't use
the common skb queuing code in net/core/sock.c:(.  But one day we might
migrate to it...

Stefan



+1, we should really consider reusing the existing socket mechanism
instead of re-inventing the wheel.


Thanks


Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

2019-10-14 Thread Stefan Hajnoczi
On Fri, Oct 11, 2019 at 03:40:48PM +0200, Stefano Garzarella wrote:
> On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin  wrote:
> > On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > >
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > >
> > > > > This patch mitigates this issue by copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of the last packet queued,
> > > > > in order to avoid wasting memory.
> > > > >
> > > > > Reviewed-by: Stefan Hajnoczi 
> > > > > Signed-off-by: Stefano Garzarella 
> > > >
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > >
> > >
> > > Since I'm back from holidays, I'm restarting this thread to figure out
> > > how to address the issue completely.
> > >
> > > I did a better analysis of the credit mechanism that we implemented in
> > > virtio-vsock to get a clearer view, and I'll share it with you:
> > >
> > > This issue affects only the "host->guest" path. In this case, when the
> > > host wants to send a packet to the guest, it uses a "free" buffer
> > > allocated by the guest (4 KB).
> > > The "free" buffers available to the host are shared between all
> > > sockets; the credit mechanism, instead, is per-socket, I think to
> > > avoid the starvation of other sockets.
> > > The guest re-fills the "free" queue when the available buffers are
> > > less than half.
> > >
> > > Each peer has these variables in the per-socket state:
> > >/* local vars */
> > >buf_alloc/* max bytes usable by this socket
> > >[exposed to the other peer] */
> > >fwd_cnt  /* increased when RX packet is consumed by the
> > >user space [exposed to the other peer] */
> > >tx_cnt /* increased when TX packet is sent to the 
> > > other peer */
> > >
> > >/* remote vars  */
> > >peer_buf_alloc   /* peer's buf_alloc */
> > >peer_fwd_cnt /* peer's fwd_cnt */
> > >
> > > When a peer sends a packet, it increases the 'tx_cnt'; when the
> > > receiver consumes the packet (copies it to the user-space buffer), it
> > > increases the 'fwd_cnt'.
> > > Note: increments are made considering the payload length and not the
> > > buffer length.
> > >
> > > The values of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> > > all packet headers or with an explicit CREDIT_UPDATE packet.
> > >
> > > The local 'buf_alloc' value can be modified by the user space using
> > > setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> > >
> > > Before sending a packet, the peer checks the space available:
> > >   credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> > > and it will send up to credit_available bytes to the other peer.
> > >
> > > Possible solutions considering Michael's advice:
> > > 1. Use the buffer length instead of the payload length when we increment
> > >the counters:
> > >   - This approach will account precisely the memory used per socket.
> > >   - This requires changes in both guest and host.
> > >   - It is not compatible with old drivers, so a feature should be 
> > > negotiated.
> > > 2. Decrease the advertised 'buf_alloc', taking into account the bytes
> > >queued in the socket queue but not yet used (e.g. 256 bytes used of
> > >the 4K available in the buffer).
> > >   - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> > >   - This should be compatible also with old drivers.
> > >
> > > Maybe the second is less invasive, but will it be too tricky?
> > > Any other advice or suggestions?
> > >
> > > Thanks in advance,
> > > Stefano
> >
> > OK let me try to clarify.  The idea is this:
> >
> > Let's say we queue a buffer of 4K, and we copy if len < 128 bytes.  This
> > means that in the worst case (128-byte packets), each byte of credit in
> > the socket uses up 4K/128 = 32 bytes of kernel memory.
> > to also account for the