Re: [PATCH linux-next v3 2/6] vdpa: Introduce query of device config layout

2021-06-24 Thread Jason Wang


在 2021/6/24 下午3:59, Parav Pandit 写道:



From: Jason Wang 
Sent: Thursday, June 24, 2021 12:35 PM


Consider we had a mature set of virtio specific uAPI for config space.
It would be a burden if we need an unnecessary translation layer of
netlink in the middle:

[vDPA parent (virtio_net_config)] <-> [netlink
(VDPA_ATTR_DEV_NET_XX)] <-> [userspace

(VDPA_ATTR_DEV_NET_XX)]

<-> [ user (virtio_net_config)]

This translation is not there. We show relevant net config fields as

VDPA_ATTR_DEV_NET individually.

It is not a binary dump which is harder for users to parse and make any use

of it.


The is done implicitly, user needs to understand the semantic of
virtio_net_config and map the individual fields to the vdpa tool sub-
command.

Mostly not virtio_net_config is for the producer and consumer sw entities.
Here user doesn't know about such layout and where its located.
User only sets config params that gets set in the config space.
(without understanding what is config layout and its location).




It is only one level of translation from virtio_net_config (kernel) -> netlink

vdpa fields.

It is similar to 'struct netdevice' -> rtnl info fields.


I think not, the problem is that the netdevice is not a part of uAPI but
virtio_net_config is.

Virtio_net_config is a UAPI for sw consumption.
That way yes, netlink can also do it, however it requires side channel 
communicate what is valid.




If we make netlink simply a transport, it would be much easier. And we

had

the chance to unify the logic of build_config() and set_config() in the

driver.

How? We need bit mask to tell that out of 21 fields which fields to update

and which not.

And that is further mixed with offset and length.


So set_config() could be called from userspace, so did build_config().
The only difference is:

1) they're using different transport, ioctl vs netlink
2) build_config() is only expected to be called by the management tool

If qemu works well via set_config ioctl, netlink should work as well.


mlx5 set_config is noop.
vdpa_set_config() need to return an error code. I don't
vp_vdpa.c blindly writes the config as its passthrough.
Parsing which fields to write and which not, using offset and length is a messy 
code with typecast and compare old values etc.



I don't see why it needs typecast, virtio_net_config is also uABI, you 
can deference the fields directly.






Btw, what happens if management tool tries to modify the mac of vDPA
when the device is already used by the driver?

At present it allows modifying, but it should be improved in future to fail if 
device is in use.



This is something we need to fix I think. Or if it's really useful to 
allowing the attributes to be modified after the device is created.


Why not simply allow the config to be built only at device creation?





And actually, it's not the binary blob since uapi clearly define the
format (e.g struct virtio_net_config), can we find a way to use that?
E.g introduce device/net specific command and passing the blob with
length and negotiated features.

Length may change in future, mostly expand. And parsing based on

length

is not such a clean way.


Length is only for legal checking. The config is self contained with:


Unlikely. When structure size increases later, the parsing will change based

on the length.

Because older kernel would return shorter length with older iproute2 tool.


This is fine since the older kernel only support less features. The only
possible issue if the old iproute 2 runs on new kernel. With the current
proposal, it may cause some config fields can't not be showed.


Not showing is ok.
But the code is messy to typecast on size.


I think it might be useful to introduce a command to simply dump the
config space.



So user space always have to deal and have nasty parsing/typecasting

based on the length.

Such nasty parsing is not required for netlink interface.



That's how userspace (Qemu) is expected to work now. The userspace
should determine the semantic of the fields based on the features.

Differentiate config fields doesn't help much, e.g userspace still need
to differ LINK_UP and ANNOUNCE for the status field.

Yes, this parsing is from constant size u16 status.



[..]


Its not about performance. By the time 1st call is made, features got

updated and it is out of sync with config.

1) get config
2) get device id
3) get features


This requires using features from 3rd netlink output to decode output of

1st netlink output.

Which is a bit odd of netlink.
Other netlink nla_put() probably sending whole structure doesn’t need to

do it.


Well, we can pack them all into a single skb isn't it? (probably with a
config len).


You want to pack features and config both in the single nla_put()?
If so, it isn't necessary. There are more examples in kernel that adds 
individual fields instead of nla_put(blob).
I wouldn’t follow those nla_put() callers.



No, a single skb not single nla_put().

Actually git grep told 

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-24 Thread Jason Wang


在 2021/6/24 下午5:16, Yongji Xie 写道:

On Thu, Jun 24, 2021 at 4:14 PM Jason Wang  wrote:


在 2021/6/24 下午12:46, Yongji Xie 写道:

So we need to deal with both FEATURES_OK and reset, but probably not
DRIVER_OK.


OK, I see. Thanks for the explanation. One more question is how about
clearing the corresponding status bit in get_status() rather than
making set_status() fail. Since the spec recommends this way for
validation which is done in virtio_dev_remove() and
virtio_finalize_features().

Thanks,
Yongji


I think you can. Or it would be even better that we just don't set the
bit during set_status().


Yes, that's what I mean.


I just realize that in vdpa_reset() we had:

static inline void vdpa_reset(struct vdpa_device *vdev)
{
  const struct vdpa_config_ops *ops = vdev->config;

  vdev->features_valid = false;
  ops->set_status(vdev, 0);
}

We probably need to add the synchronization here. E.g re-read with a
timeout.


Looks like the timeout is already in set_status().



Do you mean the VDUSE's implementation?



  Do we really need a
duplicated one here?



1) this is the timeout at the vDPA layer instead of the VDUSE layer.
2) it really depends on what's the meaning of the timeout for set_status 
of VDUSE.


Do we want:

2a) for set_status(): relay the message to userspace and wait for the 
userspace to quiescence the datapath


or

2b) for set_status(): simply relay the message to userspace, reply is no 
needed. Userspace will use a command to update the status when the 
datapath is stop. The the status could be fetched via get_stats().


2b looks more spec complaint.


And how to handle failure? Adding a return value
to virtio_config_ops->reset() and passing the error to the upper
layer?



Something like this.

Thanks




Thanks,
Yongji



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 13/16] dt-bindings: arm: Add virtio transport for SCMI

2021-06-24 Thread Rob Herring
On Fri, 11 Jun 2021 17:59:34 +0100, Cristian Marussi wrote:
> From: Igor Skalkin 
> 
> Document the properties for arm,scmi-virtio compatible nodes.
> The backing virtio SCMI device is described in patch [1].
> 
> While doing that, make shmem property required only for pre-existing
> mailbox and smc transports, since virtio-scmi does not need it.
> 
> [1] https://lists.oasis-open.org/archives/virtio-comment/202102/msg00018.html
> 
> CC: Rob Herring 
> CC: devicet...@vger.kernel.org
> Signed-off-by: Igor Skalkin 
> [ Peter: Adapted patch for submission to upstream. ]
> Co-developed-by: Peter Hilber 
> Signed-off-by: Peter Hilber 
> [ Cristian: converted to yaml format, moved shmen required property. ]
> Co-developed-by: Cristian Marussi 
> Signed-off-by: Cristian Marussi 
> ---
> v3 --> V4
> - convertd to YAML
> - make shmem required only for pre-existing mailbox and smc transport
> - updated VirtIO specification patch message reference
> - dropped virtio-mmio SCMI device example since really not pertinent to
>   virtio-scmi dt bindings transport: it is not even referenced in SCMI
>   virtio DT node since they are enumerated by VirtIO subsystem and there
>   could be PCI based SCMI devices anyway.
> ---
>  Documentation/devicetree/bindings/firmware/arm,scmi.yaml | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 

Reviewed-by: Rob Herring 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/3] kthread: allow caller to pass in user_struct

2021-06-24 Thread Mike Christie
On 6/23/21 11:34 PM, kernel test robot wrote:
>>> kernel/kthread.c:466:6: warning: function 'kthread_create_for_user' might 
>>> be a candidate for 'gnu_printf' format attribute 
>>> [-Wsuggest-attribute=format]
>  466 |  namefmt, args);
>  |  ^~~
> 

Will add a __printf(4, 5).


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 3/3] vhost: pass kthread user to check RLIMIT_NPROC

2021-06-24 Thread Mike Christie
On 6/24/21 3:26 AM, kernel test robot wrote:
>>> drivers/vhost/vhost.c:599:57: sparse: sparse: dereference of noderef 
>>> expression
> vim +599 drivers/vhost/vhost.c
> 
>581
>582/* Caller should have device mutex */
>583long vhost_dev_set_owner(struct vhost_dev *dev)
>584{
>585struct task_struct *worker;
>586int err;
>587
>588/* Is there an owner already? */
>589if (vhost_dev_has_owner(dev)) {
>590err = -EBUSY;
>591goto err_mm;
>592}
>593
>594vhost_attach_mm(dev);
>595
>596dev->kcov_handle = kcov_common_handle();
>597if (dev->use_worker) {
>598worker = kthread_create_for_user(vhost_worker, 
> dev,
>  > 599 
> current->real_cred->user,
>600 "vhost-%d", 
> current->pid);

It looks like I should be doing something like get_uid(current_user())
then a free_uid() when doing using the user_struct.

Will fix.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] drm/qxl: Remove empty qxl_gem_prime_mmap()

2021-06-24 Thread Gerd Hoffmann
On Thu, Jun 24, 2021 at 11:05:00AM +0200, Thomas Zimmermann wrote:
> The function qxl_gem_prime_mmap() returns an error. The two callers
> of gem_prime_mmap are drm_fbdev_fb_mmap() and drm_gem_dmabuf_mmap(),
> which both already handle NULL-callbacks with an error code. So clear
> gem_prime_mmap in qxl and remove qxl_gem_prime_mmap().

Reviewed-by: Gerd Hoffmann 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] drm/qxl: Remove empty qxl_gem_prime_mmap()

2021-06-24 Thread Thomas Zimmermann
The function qxl_gem_prime_mmap() returns an error. The two callers
of gem_prime_mmap are drm_fbdev_fb_mmap() and drm_gem_dmabuf_mmap(),
which both already handle NULL-callbacks with an error code. So clear
gem_prime_mmap in qxl and remove qxl_gem_prime_mmap().

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/qxl/qxl_drv.c   | 1 -
 drivers/gpu/drm/qxl/qxl_drv.h   | 2 --
 drivers/gpu/drm/qxl/qxl_prime.c | 6 --
 3 files changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
index 854e6c5a563f..b3d75ea7e6b3 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.c
+++ b/drivers/gpu/drm/qxl/qxl_drv.c
@@ -281,7 +281,6 @@ static struct drm_driver qxl_driver = {
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
.gem_prime_import_sg_table = qxl_gem_prime_import_sg_table,
-   .gem_prime_mmap = qxl_gem_prime_mmap,
.fops = _fops,
.ioctls = qxl_ioctls,
.irq_handler = qxl_irq_handler,
diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index dd6abee55f56..f95885a8bd2b 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -434,8 +434,6 @@ struct drm_gem_object *qxl_gem_prime_import_sg_table(
 int qxl_gem_prime_vmap(struct drm_gem_object *obj, struct dma_buf_map *map);
 void qxl_gem_prime_vunmap(struct drm_gem_object *obj,
  struct dma_buf_map *map);
-int qxl_gem_prime_mmap(struct drm_gem_object *obj,
-   struct vm_area_struct *vma);
 
 /* qxl_irq.c */
 int qxl_irq_init(struct qxl_device *qdev);
diff --git a/drivers/gpu/drm/qxl/qxl_prime.c b/drivers/gpu/drm/qxl/qxl_prime.c
index 0628d1cc91fe..4a10cb0a413b 100644
--- a/drivers/gpu/drm/qxl/qxl_prime.c
+++ b/drivers/gpu/drm/qxl/qxl_prime.c
@@ -73,9 +73,3 @@ void qxl_gem_prime_vunmap(struct drm_gem_object *obj,
 
qxl_bo_vunmap(bo);
 }
-
-int qxl_gem_prime_mmap(struct drm_gem_object *obj,
-  struct vm_area_struct *area)
-{
-   return -ENOSYS;
-}
-- 
2.32.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 3/3] vhost: pass kthread user to check RLIMIT_NPROC

2021-06-24 Thread kernel test robot
Hi Mike,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on vhost/linux-next]
[also build test WARNING on linux/master linus/master v5.13-rc7]
[cannot apply to next-20210623]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Mike-Christie/kthread-pass-in-user-and-check-RLIMIT_NPROC/20210624-110925
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: riscv-randconfig-s032-20210622 (attached as .config)
compiler: riscv64-linux-gcc (GCC) 9.3.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.3-341-g8af24329-dirty
# 
https://github.com/0day-ci/linux/commit/daae0f4bb5ef7264d67cab20da37192754f885b8
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Mike-Christie/kthread-pass-in-user-and-check-RLIMIT_NPROC/20210624-110925
git checkout daae0f4bb5ef7264d67cab20da37192754f885b8
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 
CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=riscv 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)
>> drivers/vhost/vhost.c:599:57: sparse: sparse: dereference of noderef 
>> expression

vim +599 drivers/vhost/vhost.c

   581  
   582  /* Caller should have device mutex */
   583  long vhost_dev_set_owner(struct vhost_dev *dev)
   584  {
   585  struct task_struct *worker;
   586  int err;
   587  
   588  /* Is there an owner already? */
   589  if (vhost_dev_has_owner(dev)) {
   590  err = -EBUSY;
   591  goto err_mm;
   592  }
   593  
   594  vhost_attach_mm(dev);
   595  
   596  dev->kcov_handle = kcov_common_handle();
   597  if (dev->use_worker) {
   598  worker = kthread_create_for_user(vhost_worker, dev,
 > 599   
 > current->real_cred->user,
   600   "vhost-%d", 
current->pid);
   601  if (IS_ERR(worker)) {
   602  err = PTR_ERR(worker);
   603  goto err_worker;
   604  }
   605  
   606  dev->worker = worker;
   607  wake_up_process(worker); /* avoid contributing to 
loadavg */
   608  
   609  err = vhost_attach_cgroups(dev);
   610  if (err)
   611  goto err_cgroup;
   612  }
   613  
   614  err = vhost_dev_alloc_iovecs(dev);
   615  if (err)
   616  goto err_cgroup;
   617  
   618  return 0;
   619  err_cgroup:
   620  if (dev->worker) {
   621  kthread_stop(dev->worker);
   622  dev->worker = NULL;
   623  }
   624  err_worker:
   625  vhost_detach_mm(dev);
   626  dev->kcov_handle = 0;
   627  err_mm:
   628  return err;
   629  }
   630  EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
   631  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-24 Thread Jason Wang


在 2021/6/24 下午12:46, Yongji Xie 写道:

So we need to deal with both FEATURES_OK and reset, but probably not
DRIVER_OK.


OK, I see. Thanks for the explanation. One more question is how about
clearing the corresponding status bit in get_status() rather than
making set_status() fail. Since the spec recommends this way for
validation which is done in virtio_dev_remove() and
virtio_finalize_features().

Thanks,
Yongji



I think you can. Or it would be even better that we just don't set the 
bit during set_status().


I just realize that in vdpa_reset() we had:

static inline void vdpa_reset(struct vdpa_device *vdev)
{
    const struct vdpa_config_ops *ops = vdev->config;

    vdev->features_valid = false;
    ops->set_status(vdev, 0);
}

We probably need to add the synchronization here. E.g re-read with a 
timeout.


Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

RE: [PATCH linux-next v3 2/6] vdpa: Introduce query of device config layout

2021-06-24 Thread Parav Pandit


> From: Jason Wang 
> Sent: Thursday, June 24, 2021 12:35 PM
> 
> >> Consider we had a mature set of virtio specific uAPI for config space.
> >> It would be a burden if we need an unnecessary translation layer of
> >> netlink in the middle:
> >>
> >> [vDPA parent (virtio_net_config)] <-> [netlink
> >> (VDPA_ATTR_DEV_NET_XX)] <-> [userspace
> (VDPA_ATTR_DEV_NET_XX)]
> >>> <-> [ user (virtio_net_config)]
> > This translation is not there. We show relevant net config fields as
> VDPA_ATTR_DEV_NET individually.
> > It is not a binary dump which is harder for users to parse and make any use
> of it.
> 
> 
> The is done implicitly, user needs to understand the semantic of
> virtio_net_config and map the individual fields to the vdpa tool sub-
> command.
Mostly not virtio_net_config is for the producer and consumer sw entities.
Here user doesn't know about such layout and where its located.
User only sets config params that gets set in the config space.
(without understanding what is config layout and its location).

> 
> 
> >
> > It is only one level of translation from virtio_net_config (kernel) -> 
> > netlink
> vdpa fields.
> > It is similar to 'struct netdevice' -> rtnl info fields.
> 
> 
> I think not, the problem is that the netdevice is not a part of uAPI but
> virtio_net_config is.
Virtio_net_config is a UAPI for sw consumption.
That way yes, netlink can also do it, however it requires side channel 
communicate what is valid.

> 
> 
> >
> >> If we make netlink simply a transport, it would be much easier. And we
> had
> >> the chance to unify the logic of build_config() and set_config() in the
> driver.
> > How? We need bit mask to tell that out of 21 fields which fields to update
> and which not.
> > And that is further mixed with offset and length.
> 
> 
> So set_config() could be called from userspace, so did build_config().
> The only difference is:
> 
> 1) they're using different transport, ioctl vs netlink
> 2) build_config() is only expected to be called by the management tool
> 
> If qemu works well via set_config ioctl, netlink should work as well.
> 
mlx5 set_config is noop.
vdpa_set_config() need to return an error code. I don't 
vp_vdpa.c blindly writes the config as its passthrough.
Parsing which fields to write and which not, using offset and length is a messy 
code with typecast and compare old values etc.

> Btw, what happens if management tool tries to modify the mac of vDPA
> when the device is already used by the driver?
At present it allows modifying, but it should be improved in future to fail if 
device is in use.

>  And actually, it's not the binary blob since uapi clearly define the
>  format (e.g struct virtio_net_config), can we find a way to use that?
>  E.g introduce device/net specific command and passing the blob with
>  length and negotiated features.
> >>> Length may change in future, mostly expand. And parsing based on
> length
> >> is not such a clean way.
> >>
> >>
> >> Length is only for legal checking. The config is self contained with:
> >>
> > Unlikely. When structure size increases later, the parsing will change based
> on the length.
> > Because older kernel would return shorter length with older iproute2 tool.
> 
> 
> This is fine since the older kernel only support less features. The only
> possible issue if the old iproute 2 runs on new kernel. With the current
> proposal, it may cause some config fields can't not be showed.
> 
Not showing is ok.
But the code is messy to typecast on size.

> I think it might be useful to introduce a command to simply dump the
> config space.
> 
> 
> > So user space always have to deal and have nasty parsing/typecasting
> based on the length.
Such nasty parsing is not required for netlink interface.

> 
> 
> That's how userspace (Qemu) is expected to work now. The userspace
> should determine the semantic of the fields based on the features.
> 
> Differentiate config fields doesn't help much, e.g userspace still need
> to differ LINK_UP and ANNOUNCE for the status field.
Yes, this parsing is from constant size u16 status.
> 
> 
[..]

>
> > Its not about performance. By the time 1st call is made, features got
> updated and it is out of sync with config.
> >
> >> 1) get config
> >> 2) get device id
> >> 3) get features
> >>
> > This requires using features from 3rd netlink output to decode output of
> 1st netlink output.
> > Which is a bit odd of netlink.
> > Other netlink nla_put() probably sending whole structure doesn’t need to
> do it.
> 
> 
> Well, we can pack them all into a single skb isn't it? (probably with a
> config len).
> 
You want to pack features and config both in the single nla_put()?
If so, it isn't necessary. There are more examples in kernel that adds 
individual fields instead of nla_put(blob).
I wouldn’t follow those nla_put() callers.

> 
> >
> >> For build config, it's only one
> >>
> >> 1) build config
> >>
> >>
> >>> I prefer to follow rest of the kernel style to return self contained
> 

Re: [PATCH 0/3] kthread: pass in user and check RLIMIT_NPROC

2021-06-24 Thread Michael S. Tsirkin
On Wed, Jun 23, 2021 at 10:08:01PM -0500, Mike Christie wrote:
> The vhost driver will create a kthread when userspace does a
> VHOST_SET_OWNER ioctl, but the thread is charged to the kthreadd thread.
> We can then end up violating the userspace process's RLIMIT_NPROC. This
> patchset allows drivers to pass in the user to charge/check.
> 
> The patches were made over Linus's current tree.
> 


Makes sense I guess.

Acked-by: Michael S. Tsirkin 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH linux-next v3 2/6] vdpa: Introduce query of device config layout

2021-06-24 Thread Jason Wang


在 2021/6/24 下午2:29, Parav Pandit 写道:



From: Jason Wang 
Sent: Thursday, June 24, 2021 11:13 AM

在 2021/6/23 下午12:22, Parav Pandit 写道:

From: Jason Wang 
Sent: Wednesday, June 23, 2021 9:39 AM

在 2021/6/22 下午10:03, Parav Pandit 写道:

Is it better to use a separate enum for net specific attributes?


Yes, because they are only net specific.
I guess it is related to your below question.


Another question (sorry if it has been asked before). Can we simply
return the config (binary) to the userspace, then usespace can use
the existing uAPI like virtio_net_config plus the feature to
explain the

config?

We did discuss in v2.
Usually returning the whole blob and parsing is not desired via netlink.
Returning individual fields give the full flexibility to return only
the valid

fields.

Otherwise we need to implement another bitmask too to tell which
fields

from the struct are valid and share with user space.

Returning individual fields is the widely used approach.

The main concerns are:

1) The blob will be self contained if it was passed with the
negotiated features, so we don't need bitmask.

Which fields of the struct are valid is told by additional fields.

2) Using individual fields means it must duplicate the config fields
of every virtio devices


Mostly no. if there are common config fields across two device types,
they would be named as
VDPA_ATTR_DEV_CFG_*
Net specific will be,
VDPA_ATTR_DEV_NET_CFG_*
Block specific, will be,
VDPA_ATTR_DEV_BLK_CFG_*


I meant it looks like VDPA_ATTR_DEV_NET will duplicate all the fields of:

struct virtio_net_config;

And VDPA_ATTR_DEV_BLOCK will duplicate all the fields of

struct virtio_blk_config; which has ~21 fields.

And we had a plenty of other types of virtio devices.

Consider we had a mature set of virtio specific uAPI for config space.
It would be a burden if we need an unnecessary translation layer of netlink in
the middle:

[vDPA parent (virtio_net_config)] <-> [netlink (VDPA_ATTR_DEV_NET_XX)]
<-> [userspace (VDPA_ATTR_DEV_NET_XX)]

<-> [ user (virtio_net_config)]

This translation is not there. We show relevant net config fields as 
VDPA_ATTR_DEV_NET individually.
It is not a binary dump which is harder for users to parse and make any use of 
it.



The is done implicitly, user needs to understand the semantic of 
virtio_net_config and map the individual fields to the vdpa tool 
sub-command.





It is only one level of translation from virtio_net_config (kernel) -> netlink 
vdpa fields.
It is similar to 'struct netdevice' -> rtnl info fields.



I think not, the problem is that the netdevice is not a part of uAPI but 
virtio_net_config is.






If we make netlink simply a transport, it would be much easier. And we had
the chance to unify the logic of build_config() and set_config() in the driver.

How? We need bit mask to tell that out of 21 fields which fields to update and 
which not.
And that is further mixed with offset and length.



So set_config() could be called from userspace, so did build_config(). 
The only difference is:


1) they're using different transport, ioctl vs netlink
2) build_config() is only expected to be called by the management tool

If qemu works well via set_config ioctl, netlink should work as well.

Btw, what happens if management tool tries to modify the mac of vDPA 
when the device is already used by the driver?








And actually, it's not the binary blob since uapi clearly define the
format (e.g struct virtio_net_config), can we find a way to use that?
E.g introduce device/net specific command and passing the blob with
length and negotiated features.

Length may change in future, mostly expand. And parsing based on length

is not such a clean way.


Length is only for legal checking. The config is self contained with:


Unlikely. When structure size increases later, the parsing will change based on 
the length.
Because older kernel would return shorter length with older iproute2 tool.



This is fine since the older kernel only support less features. The only 
possible issue if the old iproute 2 runs on new kernel. With the current 
proposal, it may cause some config fields can't not be showed.


I think it might be useful to introduce a command to simply dump the 
config space.




So user space always have to deal and have nasty parsing/typecasting based on 
the length.



That's how userspace (Qemu) is expected to work now. The userspace 
should determine the semantic of the fields based on the features.


Differentiate config fields doesn't help much, e.g userspace still need 
to differ LINK_UP and ANNOUNCE for the status field.






1) device id
2) features



Parsing fields require knowledge of features as well and application needs

to make multiple netlink calls to parse the config space.


I think we don't care about the performance in this case. It's about three
netlink calls:


Its not about performance. By the time 1st call is made, features got updated 
and it is out of sync with config.


1) get 

RE: [PATCH linux-next v3 2/6] vdpa: Introduce query of device config layout

2021-06-24 Thread Parav Pandit


> From: Jason Wang 
> Sent: Thursday, June 24, 2021 11:13 AM
> 
> 在 2021/6/23 下午12:22, Parav Pandit 写道:
> >
> >> From: Jason Wang 
> >> Sent: Wednesday, June 23, 2021 9:39 AM
> >>
> >> 在 2021/6/22 下午10:03, Parav Pandit 写道:
>  Is it better to use a separate enum for net specific attributes?
> 
> >>> Yes, because they are only net specific.
> >>> I guess it is related to your below question.
> >>>
>  Another question (sorry if it has been asked before). Can we simply
>  return the config (binary) to the userspace, then usespace can use
>  the existing uAPI like virtio_net_config plus the feature to
>  explain the
> >> config?
> >>> We did discuss in v2.
> >>> Usually returning the whole blob and parsing is not desired via netlink.
> >>> Returning individual fields give the full flexibility to return only
> >>> the valid
> >> fields.
> >>> Otherwise we need to implement another bitmask too to tell which
> >>> fields
> >> from the struct are valid and share with user space.
> >>> Returning individual fields is the widely used approach.
> >>
> >> The main concerns are:
> >>
> >> 1) The blob will be self contained if it was passed with the
> >> negotiated features, so we don't need bitmask.
> > Which fields of the struct are valid is told by additional fields.
> >> 2) Using individual fields means it must duplicate the config fields
> >> of every virtio devices
> >>
> > Mostly no. if there are common config fields across two device types,
> > they would be named as
> > VDPA_ATTR_DEV_CFG_*
> > Net specific will be,
> > VDPA_ATTR_DEV_NET_CFG_*
> > Block specific, will be,
> > VDPA_ATTR_DEV_BLK_CFG_*
> 
> 
> I meant it looks like VDPA_ATTR_DEV_NET will duplicate all the fields of:
> 
> struct virtio_net_config;
> 
> And VDPA_ATTR_DEV_BLOCK will duplicate all the fields of
> 
> struct virtio_blk_config; which has ~21 fields.
> 
> And we had a plenty of other types of virtio devices.
> 
> Consider we had a mature set of virtio specific uAPI for config space.
> It would be a burden if we need an unnecessary translation layer of netlink in
> the middle:
> 
> [vDPA parent (virtio_net_config)] <-> [netlink (VDPA_ATTR_DEV_NET_XX)]
> <-> [userspace (VDPA_ATTR_DEV_NET_XX)] 

>> <-> [ user (virtio_net_config)]
This translation is not there. We show relevant net config fields as 
VDPA_ATTR_DEV_NET individually.
It is not a binary dump which is harder for users to parse and make any use of 
it.

It is only one level of translation from virtio_net_config (kernel) -> netlink 
vdpa fields.
It is similar to 'struct netdevice' -> rtnl info fields.

> 
> If we make netlink simply a transport, it would be much easier. And we had
> the chance to unify the logic of build_config() and set_config() in the 
> driver.
How? We need bit mask to tell that out of 21 fields which fields to update and 
which not.
And that is further mixed with offset and length.

> 
> 
> >
> >> And actually, it's not the binary blob since uapi clearly define the
> >> format (e.g struct virtio_net_config), can we find a way to use that?
> >> E.g introduce device/net specific command and passing the blob with
> >> length and negotiated features.
> > Length may change in future, mostly expand. And parsing based on length
> is not such a clean way.
> 
> 
> Length is only for legal checking. The config is self contained with:
> 
Unlikely. When structure size increases later, the parsing will change based on 
the length.
Because older kernel would return shorter length with older iproute2 tool.
So user space always have to deal and have nasty parsing/typecasting based on 
the length.

> 1) device id
> 2) features
> 
> 
> > Parsing fields require knowledge of features as well and application needs
> to make multiple netlink calls to parse the config space.
> 
> 
> I think we don't care about the performance in this case. It's about three
> netlink calls:
> 
Its not about performance. By the time 1st call is made, features got updated 
and it is out of sync with config.

> 1) get config
> 2) get device id
> 3) get features
> 
This requires using features from 3rd netlink output to decode output of 1st 
netlink output.
Which is a bit odd of netlink.
Other netlink nla_put() probably sending whole structure doesn’t need to do it.

> For build config, it's only one
> 
> 1) build config
> 
> 
> > I prefer to follow rest of the kernel style to return self contained
> invidividual fields.
> 
> 
> But I saw a lot of kernel codes choose to use e.g nla_put() directly with
> module specific structure.
> 
It might be self-contained structure that probably has not found the need to 
expand.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization