Re: [RFC 0/3] cpuidle: add poll_source API and virtio vq polling

2021-07-20 Thread Jason Wang


在 2021/7/14 上午12:19, Stefan Hajnoczi 写道:

These patches are not polished yet but I would like request feedback on this
approach and share performance results with you.

Idle CPUs tentatively enter a busy wait loop before halting when the cpuidle
haltpoll driver is enabled inside a virtual machine. This reduces wakeup
latency for events that occur soon after the vCPU becomes idle.

This patch series extends the cpuidle busy wait loop with the new poll_source
API so drivers can participate in polling. Such polling-aware drivers disable
their device's irq during the busy wait loop to avoid the cost of interrupts.
This reduces latency further than regular cpuidle haltpoll, which still relies
on irqs.

Virtio drivers are modified to use the poll_source API so all virtio device
types get this feature. The following virtio-blk fio benchmark results show the
improvement:

  IOPS (numjobs=4, iodepth=1, 4 virtqueues)
before   poll_source  io_poll
4k randread167102  186049 (+11%)  186654 (+11%)
4k randwrite   162204  181214 (+11%)  181850 (+12%)
4k randrw  159520  177071 (+11%)  177928 (+11%)

The comparison against io_poll shows that cpuidle poll_source achieves
equivalent performance to the block layer's io_poll feature (which I
implemented in a separate patch series [1]).

The advantage of poll_source is that applications do not need to explicitly set
the RWF_HIPRI I/O request flag. The poll_source approach is attractive because
few applications actually use RWF_HIPRI and it takes advantage of CPU cycles we
would have spent in cpuidle haltpoll anyway.

The current series does not improve virtio-net. I haven't investigated deeply,
but it is possible that NAPI and poll_source do not combine. See the final
patch for a starting point on making the two work together.

I have not tried this on bare metal but it might help there too. The cost of
disabling a device's irq must be less than the savings from avoiding irq
handling for this optimization to make sense.

[1] 
https://lore.kernel.org/linux-block/20210520141305.355961-1-stefa...@redhat.com/



Hi Stefan:

Some questions:

1) What's the advantages of introducing polling at virtio level instead 
of doing it at each subsystems? Polling in virtio level may only work 
well if all (or most) of the devices are virtio
2) What's the advantages of using cpuidle instead of using a thread (and 
leverage the scheduler)?

3) Any reason it's virtio_pci specific not a general virtio one?

Thanks

(Btw, do we need to cc scheduler guys?)




Stefan Hajnoczi (3):
   cpuidle: add poll_source API
   virtio: add poll_source virtqueue polling
   softirq: participate in cpuidle polling

  drivers/cpuidle/Makefile   |   1 +
  drivers/virtio/virtio_pci_common.h |   7 ++
  include/linux/interrupt.h  |   2 +
  include/linux/poll_source.h|  53 +++
  include/linux/virtio.h |   2 +
  include/linux/virtio_config.h  |   2 +
  drivers/cpuidle/poll_source.c  | 102 +
  drivers/cpuidle/poll_state.c   |   6 ++
  drivers/virtio/virtio.c|  34 ++
  drivers/virtio/virtio_pci_common.c |  86 
  drivers/virtio/virtio_pci_modern.c |   2 +
  kernel/softirq.c   |  14 
  12 files changed, 311 insertions(+)
  create mode 100644 include/linux/poll_source.h
  create mode 100644 drivers/cpuidle/poll_source.c



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2 3/4] fuse: add per-file DAX flag

2021-07-20 Thread Vivek Goyal
On Tue, Jul 20, 2021 at 03:19:50PM +0800, JeffleXu wrote:
> 
> 
> On 7/20/21 2:41 AM, Vivek Goyal wrote:
> > On Fri, Jul 16, 2021 at 06:47:52PM +0800, Jeffle Xu wrote:
> >> Add one flag for fuse_attr.flags indicating if DAX shall be enabled for
> >> this file.
> >>
> >> When the per-file DAX flag changes for an *opened* file, the state of
> >> the file won't be updated until this file is closed and reopened later.
> >>
> >> Signed-off-by: Jeffle Xu 
> > 
> > [..]
> >> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> >> index 36ed092227fa..90c9df10d37a 100644
> >> --- a/include/uapi/linux/fuse.h
> >> +++ b/include/uapi/linux/fuse.h
> >> @@ -184,6 +184,9 @@
> >>   *
> >>   *  7.34
> >>   *  - add FUSE_SYNCFS
> >> + *
> >> + *  7.35
> >> + *  - add FUSE_ATTR_DAX
> >>   */
> >>  
> >>  #ifndef _LINUX_FUSE_H
> >> @@ -449,8 +452,10 @@ struct fuse_file_lock {
> >>   * fuse_attr flags
> >>   *
> >>   * FUSE_ATTR_SUBMOUNT: Object is a submount root
> >> + * FUSE_ATTR_DAX: Enable DAX for this file in per-file DAX mode
> >>   */
> >>  #define FUSE_ATTR_SUBMOUNT  (1 << 0)
> >> +#define FUSE_ATTR_DAX (1 << 1)
> > 
> > Generic fuse changes (addition of FUSE_ATTR_DAX) should probably in
> > a separate patch. 
> 
> Got it.
> 
> > 
> > I am not clear on one thing. If we are planning to rely on persistent
> > inode attr (FS_XFLAG_DAX as per Documentation/filesystems/dax.rst), then
> > why fuse server needs to communicate the state of that attr using a 
> > flag? Can client directly query it?  I am not sure where at these
> > attrs stored and if fuse protocol currently supports it.
> 
> There are two issues.
> 
> 1. FUSE server side: Algorithm of deciding whether DAX is enabled for a
> file.
> 
> As I previously replied in [1], FUSE server must enable DAX if the
> backend file is flagged with FS_XFLAG_DAX, to make the FS_XFLAG_DAX
> previously set by FUSE client effective.
> 
> But I will argue that FUSE server also has the flexibility of the
> algorithm implementation. Even if guest queries FS_XFLAG_DAX by
> GETFLAGS/FSGETXATTR ioctl, FUSE server can still enable DAX when the
> backend file is not FS_XFLAG_DAX flagged.
> 
> 
> 2. The protocol between server and client.
> 
> extending LOOKUP vs. LOOKUP + GETFLAGS/FSGETXATTR ioctl
> 
> As I said in [1], client can directly query the FS_XFLAG_DAX flag, but
> there will be one more round trip.
> 
> 
> [1]
> https://lore.kernel.org/linux-fsdevel/031efb1d-7c0d-35fb-c147-dcc3b6cac...@linux.alibaba.com/T/#m3f3407158b2c028694c85d82d0d6bd0387f4e24e
> 
> > 
> > What about flag STATX_ATTR_DAX. We probably should report that too
> > in stat if we are using dax on the inode?
> > 
> 
> VFS will automatically report STATX_ATTR_DAX if inode is in DAX mode,
> e.g., in vfs_getattr_nosec().

Good to know. Given user will know which files are using dax and 
which ones are not, it is even more important to define semantics
properly. In what cases DAX will be driven by FS_XFLAGS_DAX attr
and in what cases DAX will completely be driven by server.

May be we should divide it in two patch series. First patch series
implements "-o dax=inode" and server follows FS_XFLAGS_DAX attr
and reports during lookup/getattr/. 

And once that is merged this can be ehanced with "-o dax=server" where
server is free to choose what files dax should be used on. Only if
this is still needed.

Vivek

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 3/4] fuse: add per-file DAX flag

2021-07-20 Thread Vivek Goyal
On Tue, Jul 20, 2021 at 02:51:34PM +0800, JeffleXu wrote:
> 
> 
> On 7/20/21 3:44 AM, Vivek Goyal wrote:
> > On Fri, Jul 16, 2021 at 06:47:52PM +0800, Jeffle Xu wrote:
> >> Add one flag for fuse_attr.flags indicating if DAX shall be enabled for
> >> this file.
> >>
> >> When the per-file DAX flag changes for an *opened* file, the state of
> >> the file won't be updated until this file is closed and reopened later.
> >>
> >> Signed-off-by: Jeffle Xu 
> >> ---
> >>  fs/fuse/dax.c | 21 +
> >>  fs/fuse/file.c|  4 ++--
> >>  fs/fuse/fuse_i.h  |  5 +++--
> >>  fs/fuse/inode.c   |  5 -
> >>  include/uapi/linux/fuse.h |  5 +
> >>  5 files changed, 31 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> >> index a478e824c2d0..0e862119757a 100644
> >> --- a/fs/fuse/dax.c
> >> +++ b/fs/fuse/dax.c
> >> @@ -1341,7 +1341,7 @@ static const struct address_space_operations 
> >> fuse_dax_file_aops  = {
> >>.invalidatepage = noop_invalidatepage,
> >>  };
> >>  
> >> -static bool fuse_should_enable_dax(struct inode *inode)
> >> +static bool fuse_should_enable_dax(struct inode *inode, unsigned int 
> >> flags)
> >>  {
> >>struct fuse_conn *fc = get_fuse_conn(inode);
> >>unsigned int mode;
> >> @@ -1354,18 +1354,31 @@ static bool fuse_should_enable_dax(struct inode 
> >> *inode)
> >>if (mode == FUSE_DAX_MOUNT_NEVER)
> >>return false;
> >>  
> >> -  return true;
> >> +  if (mode == FUSE_DAX_MOUNT_ALWAYS)
> >> +  return true;
> >> +
> >> +  WARN_ON(mode != FUSE_DAX_MOUNT_INODE);
> >> +  return flags & FUSE_ATTR_DAX;
> >>  }
> >>  
> >> -void fuse_dax_inode_init(struct inode *inode)
> >> +void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
> >>  {
> >> -  if (!fuse_should_enable_dax(inode))
> >> +  if (!fuse_should_enable_dax(inode, flags))
> >>return;
> >>  
> >>inode->i_flags |= S_DAX;
> >>inode->i_data.a_ops = _dax_file_aops;
> >>  }
> >>  
> >> +void fuse_dax_dontcache(struct inode *inode, bool newdax)
> >> +{
> >> +  struct fuse_conn *fc = get_fuse_conn(inode);
> >> +
> >> +  if (fc->dax && fc->dax->mode == FUSE_DAX_MOUNT_INODE &&
> >> +  IS_DAX(inode) != newdax)
> >> +  d_mark_dontcache(inode);
> >> +}
> >> +
> > 
> > This capability to mark an inode dontcache should probably be in a
> > separate patch. These seem to logically two functionalities. One is
> > enabling DAX on an inode. And second is making sure how soon you
> > see the effect of that change and hence marking inode dontcache.
> 
> OK, sounds reasonable.
> 
> > 
> > Not sure how useful this is. In cache=none mode we should get rid of
> > inode ASAP. In cache=auto mode we will get rid of after 1 second (or
> > after a user specified timeout). So only place this seems to be
> > useful is cache=always.
> 
> Actually dontcache here is used to avoid dynamic switching between DAX
> and non-DAX state while file is opened. The complexity of dynamic
> switching is that, you have to clear the address_space, since page cache
> and DAX entry can not coexist in the address space. Besides,
> inode->a_ops also needs to be changed dynamically.
> 
> With dontcache, dynamic switching is no longer needed and the DAX state
> will be decided only when inode (in memory) is initialized. The downside
> is that the new DAX state won't be updated until the file is closed and
> reopened later.
> 
> 'cache=none' only invalidates dentry, while the inode (in memory) is
> still there (with address_space uncleared and a_ops unchanged).

Aha.., that's a good point.
> 
> The dynamic switching may be done, though it's not such straightforward.
> Currently, ext4/xfs are all implemented in this dontcache way, i.e., the
> new DAX state won't be seen until the file is closed and reopened later.

Got it. Agreed that dontcache seems reasonable if file's DAX state
has changed. Keep it in separate patch though with proper commit
logs.

Also, please copy virtiofs list (virtio...@redhat.com) when you post
patches next time.

Vivek

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 0/4] virtiofs,fuse: support per-file DAX

2021-07-20 Thread Vivek Goyal
On Tue, Jul 20, 2021 at 01:25:11PM +0800, JeffleXu wrote:
> 
> 
> On 7/20/21 5:30 AM, Vivek Goyal wrote:
> > On Fri, Jul 16, 2021 at 06:47:49PM +0800, Jeffle Xu wrote:
> >> This patchset adds support of per-file DAX for virtiofs, which is
> >> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> >>
> >> There are three related scenarios:
> >> 1. Alloc inode: get per-file DAX flag from fuse_attr.flags. (patch 3)
> >> 2. Per-file DAX flag changes when the file has been opened. (patch 3)
> >> In this case, the dentry and inode are all marked as DONT_CACHE, and
> >> the DAX state won't be updated until the file is closed and reopened
> >> later.
> >> 3. Users can change the per-file DAX flag inside the guest by chattr(1).
> >> (patch 4)
> >> 4. Create new files under directories with DAX enabled. When creating
> >> new files in ext4/xfs on host, the new created files will inherit the
> >> per-file DAX flag from the directory, and thus the new created files in
> >> virtiofs will also inherit the per-file DAX flag if the fuse server
> >> derives fuse_attr.flags from the underlying ext4/xfs inode's per-file
> >> DAX flag.
> > 
> > Thinking little bit more about this from requirement perspective. I think
> > we are trying to address two use cases here.
> > 
> > A. Client does not know which files DAX should be used on. Only server
> >knows it and server passes this information to client. I suspect
> >that's your primary use case.
> 
> Yes, this is the starting point of this patch set.
> 
> > 
> > B. Client is driving which files are supposed to be using DAX. This is
> >exactly same as the model ext4/xfs are using by storing a persistent
> >flag on inode. 
> > 
> > Current patches seem to be a hybrid of both approach A and B. 
> > 
> > If we were to implement B, then fuse client probably needs to have the
> > capability to query FS_XFLAG_DAX on inode and decide whether to
> > enable DAX or not. (Without extra round trip). Or know it indirectly
> > by extending GETATTR and requesting this explicitly.
> 
> If guest queries if the file is DAX capable or not by an extra GETATTR,
> I'm afraid this will add extra round trip.
> 
> Or if we add the DAX flag (ATTR_DAX) by extending LOOKUP, as done by
> this patch set, then the FUSE server needs to set ATTR_DAX according to
> the FS_XFLAG_DAX of the backend files, *to make the FS_XFLAG_DAX flag
> previously set by FUSE client work*. Then it becomes a *mandatory*
> requirement when implementing FUSE server. i.e., it becomes part of the
> FUSE protocol rather than implementation specific. FUSE server can still
> implement some other algorithm deciding whether to set ATTR_DAX or not,
> though it must set ATTR_DAX once the backend file is flagged with
> FS_XFLAG_DAX.
> 
> Besides, as you said, FUSE server needs to add one extra
> GETFLAGS/FSGETXATTR ioctl per LOOKUP in this case. To eliminate this
> cost, we could negotiate the per-file DAX capability during FUSE_INIT.
> Only when the per-file DAX capability is negotiated, will the FUSE
> server do extra GETFLAGS/FSGETXATTR ioctl and set ATTR_DAX flag when
> doing LOOKUP.
> 
> 
> Personally, I prefer the second way, i.e., by extending LOOKUP (adding
> ATTR_DAX), to eliminate the extra round trip.

Negotiating a fuse feature say FUSE_FS_XFLAG_DAX makes sense. If
client is mounted with "-o dax=inode", then client will negotitate
this feature and if server does not support it, mount can fail.

But this probably will be binding on server that it needs to return
the state of FS_XFLAG_DAX attr on inode upon lookup/getattr. I don't
this will allow server to implement its own separate policy which
does not follow FS_XFLAG_DAX xattr. 

IOW, I don't think server can choose to implement its own policy
for enabling dax for "-o dax=inode".

If there is really a need to for something new where server needs
to dynamically decide which inodes should use dax (and not use
FS_XFLAG_FS), I guess that probably should be a separate mount
option say "-o dax=server" and it negotiates a separate feature
say FUSE_DAX_SERVER. Once that's negotiated, now both client
and server know that DAX will be used on files as determined
by server and not necessarily by using file attr FS_XFLAG_DAX.

So is "dax=inode" enough for your needs? What's your requirement,
can you give little bit of more details.

Thanks
Vivek

> 
> > 
> > If we were only implementing A, then server does not have a way to 
> > tell client to enable DAX. Server can either look at FS_XFLAG_DAX
> > and decide to enable DAX or use some other property. Given querying
> > FS_XFLAG_DAX will be an extra ioctl() on every inode lookup/getattr,
> > it probably will be a server option. But enabling on server does not
> > mean client will enable it.
> 
> As I said previously, it can be negotiated whether this per-file DAX
> capability is needed. Guest can advertise this capability when '-o
> dax=inode' is configured.
> 
> > 
> > I think my primary concern with this patch right now is 

Re: [PATCH] MAINTAINERS: Update for VMCI driver

2021-07-20 Thread Greg KH
On Tue, Jul 20, 2021 at 06:04:15PM +, Vishnu Dasa wrote:
> 
> On Jul 20, 2021, at 3:39 AM, Greg KH 
> mailto:gre...@linuxfoundation.org>> wrote:
> 
> Also I would need an ack from Vishnu.
> 
> Acked-by: Vishnu Dasa mailto:d...@vmware.com>>

A maintainer that sends html email isn't going to work very well :(

Please fix.

greg k-h
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] MAINTAINERS: Update for VMCI driver

2021-07-20 Thread Vishnu Dasa

On Jul 20, 2021, at 3:39 AM, Greg KH 
mailto:gre...@linuxfoundation.org>> wrote:

Also I would need an ack from Vishnu.

Acked-by: Vishnu Dasa mailto:d...@vmware.com>>

Thanks,
Vishnu

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC 1/3] cpuidle: add poll_source API

2021-07-20 Thread Stefan Hajnoczi
On Mon, Jul 19, 2021 at 06:03:55PM -0300, Marcelo Tosatti wrote:
> Hi Stefan,
> 
> On Tue, Jul 13, 2021 at 05:19:04PM +0100, Stefan Hajnoczi wrote:
> > Introduce an API for adding cpuidle poll callbacks:
> > 
> >   struct poll_source_ops {
> >   void (*start)(struct poll_source *src);
> >   void (*stop)(struct poll_source *src);
> >   void (*poll)(struct poll_source *src);
> >   };
> > 
> >   int poll_source_register(struct poll_source *src);
> >   int poll_source_unregister(struct poll_source *src);
> > 
> > When cpuidle enters the poll state it invokes ->start() and then invokes
> > ->poll() repeatedly from the busy wait loop. Finally ->stop() is invoked
> > when the busy wait loop finishes.
> > 
> > The ->poll() function should check for activity and cause
> > TIF_NEED_RESCHED to be set in order to stop the busy wait loop.
> > 
> > This API is intended to be used by drivers that can cheaply poll for
> > events. Participating in cpuidle polling allows them to avoid interrupt
> > latencies during periods where the CPU is going to poll anyway.
> > 
> > Note that each poll_source is bound to a particular CPU. The API is
> > mainly intended to by used by drivers that have multiple queues with irq
> > affinity.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  drivers/cpuidle/Makefile  |  1 +
> >  include/linux/poll_source.h   | 53 +++
> >  drivers/cpuidle/poll_source.c | 99 +++
> >  drivers/cpuidle/poll_state.c  |  6 +++
> >  4 files changed, 159 insertions(+)
> >  create mode 100644 include/linux/poll_source.h
> >  create mode 100644 drivers/cpuidle/poll_source.c
> > 
> > diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
> > index 26bbc5e74123..994f72d6fe95 100644
> > --- a/drivers/cpuidle/Makefile
> > +++ b/drivers/cpuidle/Makefile
> > @@ -7,6 +7,7 @@ obj-y += cpuidle.o driver.o governor.o sysfs.o governors/
> >  obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
> >  obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o
> >  obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o
> > +obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_source.o
> >  obj-$(CONFIG_HALTPOLL_CPUIDLE)   += cpuidle-haltpoll.o
> >  
> >  
> > ##
> > diff --git a/include/linux/poll_source.h b/include/linux/poll_source.h
> > new file mode 100644
> > index ..ccfb424e170b
> > --- /dev/null
> > +++ b/include/linux/poll_source.h
> > @@ -0,0 +1,53 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > +/*
> > + * poll_source.h - cpuidle busy waiting API
> > + */
> > +#ifndef __LINUX_POLLSOURCE_H__
> > +#define __LINUX_POLLSOURCE_H__
> > +
> > +#include 
> > +
> > +struct poll_source;
> > +
> > +struct poll_source_ops {
> > +   void (*start)(struct poll_source *src);
> > +   void (*stop)(struct poll_source *src);
> > +   void (*poll)(struct poll_source *src);
> > +};
> > +
> > +struct poll_source {
> > +   const struct poll_source_ops *ops;
> > +   struct list_head node;
> > +   int cpu;
> > +};
> > +
> > +/**
> > + * poll_source_register - Add a poll_source for a CPU
> > + */
> > +#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_ARCH_HAS_CPU_RELAX)
> > +int poll_source_register(struct poll_source *src);
> > +#else
> > +static inline int poll_source_register(struct poll_source *src)
> > +{
> > +   return 0;
> > +}
> > +#endif
> > +
> > +/**
> > + * poll_source_unregister - Remove a previously registered poll_source
> > + */
> > +#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_ARCH_HAS_CPU_RELAX)
> > +int poll_source_unregister(struct poll_source *src);
> > +#else
> > +static inline int poll_source_unregister(struct poll_source *src)
> > +{
> > +   return 0;
> > +}
> > +#endif
> > +
> > +/* Used by the cpuidle driver */
> > +void poll_source_start(void);
> > +void poll_source_run_once(void);
> > +void poll_source_stop(void);
> > +
> > +#endif /* __LINUX_POLLSOURCE_H__ */
> > diff --git a/drivers/cpuidle/poll_source.c b/drivers/cpuidle/poll_source.c
> > new file mode 100644
> > index ..46100e5a71e4
> > --- /dev/null
> > +++ b/drivers/cpuidle/poll_source.c
> > @@ -0,0 +1,99 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * poll_source.c - cpuidle busy waiting API
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +/* The per-cpu list of registered poll sources */
> > +DEFINE_PER_CPU(struct list_head, poll_source_list);
> > +
> > +/* Called from idle task with TIF_POLLING_NRFLAG set and irqs enabled */
> > +void poll_source_start(void)
> > +{
> > +   struct poll_source *src;
> > +
> > +   list_for_each_entry(src, this_cpu_ptr(_source_list), node)
> > +   src->ops->start(src);
> > +}
> > +
> > +/* Called from idle task with TIF_POLLING_NRFLAG set and irqs enabled */
> > +void poll_source_run_once(void)
> > +{
> > +   struct poll_source *src;
> > +
> > +   list_for_each_entry(src, 

[PATCH] MAINTAINERS: Update for VMCI driver

2021-07-20 Thread Jorgen Hansen
Add maintainer info for the VMware VMCI driver.

Signed-off-by: Jorgen Hansen 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a61f4f3..7e7c6fa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -19792,6 +19792,14 @@ L: net...@vger.kernel.org
 S: Supported
 F: drivers/ptp/ptp_vmw.c
 
+VMWARE VMCI DRIVER
+M: Jorgen Hansen 
+M: Vishnu Dasa 
+M: "VMware, Inc." 
+L: linux-ker...@vger.kernel.org
+S: Maintained
+F: drivers/misc/vmw_vmci/
+
 VMWARE VMMOUSE SUBDRIVER
 M: "VMware Graphics" 
 M: "VMware, Inc." 
-- 
2.6.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] MAINTAINERS: Update for VMCI driver

2021-07-20 Thread Greg KH
On Tue, Jul 20, 2021 at 03:29:01AM -0700, Jorgen Hansen wrote:
> Add maintainer info for the VMware VMCI driver.
> 
> Signed-off-by: Jorgen Hansen 
> ---
>  MAINTAINERS | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a61f4f3..7e7c6fa 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -19792,6 +19792,14 @@ L:   net...@vger.kernel.org
>  S:   Supported
>  F:   drivers/ptp/ptp_vmw.c
>  
> +VMWARE VMCI DRIVER
> +M:   Jorgen Hansen 
> +M:   Vishnu Dasa 
> +M:   "VMware, Inc." 

Please do not use generic aliases as they provide no personal
responsibility.  Just stick with real people.

Also I would need an ack from Vishnu.

thanks,

greg k-h
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 3/4] fuse: add per-file DAX flag

2021-07-20 Thread JeffleXu



On 7/20/21 2:51 PM, JeffleXu wrote:
> 
> 
> On 7/20/21 3:44 AM, Vivek Goyal wrote:
>> On Fri, Jul 16, 2021 at 06:47:52PM +0800, Jeffle Xu wrote:
>>> Add one flag for fuse_attr.flags indicating if DAX shall be enabled for
>>> this file.
>>>
>>> When the per-file DAX flag changes for an *opened* file, the state of
>>> the file won't be updated until this file is closed and reopened later.
>>>
>>> Signed-off-by: Jeffle Xu 
>>> ---
>>>  fs/fuse/dax.c | 21 +
>>>  fs/fuse/file.c|  4 ++--
>>>  fs/fuse/fuse_i.h  |  5 +++--
>>>  fs/fuse/inode.c   |  5 -
>>>  include/uapi/linux/fuse.h |  5 +
>>>  5 files changed, 31 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
>>> index a478e824c2d0..0e862119757a 100644
>>> --- a/fs/fuse/dax.c
>>> +++ b/fs/fuse/dax.c
>>> @@ -1341,7 +1341,7 @@ static const struct address_space_operations 
>>> fuse_dax_file_aops  = {
>>> .invalidatepage = noop_invalidatepage,
>>>  };
>>>  
>>> -static bool fuse_should_enable_dax(struct inode *inode)
>>> +static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
>>>  {
>>> struct fuse_conn *fc = get_fuse_conn(inode);
>>> unsigned int mode;
>>> @@ -1354,18 +1354,31 @@ static bool fuse_should_enable_dax(struct inode 
>>> *inode)
>>> if (mode == FUSE_DAX_MOUNT_NEVER)
>>> return false;
>>>  
>>> -   return true;
>>> +   if (mode == FUSE_DAX_MOUNT_ALWAYS)
>>> +   return true;
>>> +
>>> +   WARN_ON(mode != FUSE_DAX_MOUNT_INODE);
>>> +   return flags & FUSE_ATTR_DAX;
>>>  }
>>>  
>>> -void fuse_dax_inode_init(struct inode *inode)
>>> +void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
>>>  {
>>> -   if (!fuse_should_enable_dax(inode))
>>> +   if (!fuse_should_enable_dax(inode, flags))
>>> return;
>>>  
>>> inode->i_flags |= S_DAX;
>>> inode->i_data.a_ops = _dax_file_aops;
>>>  }
>>>  
>>> +void fuse_dax_dontcache(struct inode *inode, bool newdax)
>>> +{
>>> +   struct fuse_conn *fc = get_fuse_conn(inode);
>>> +
>>> +   if (fc->dax && fc->dax->mode == FUSE_DAX_MOUNT_INODE &&
>>> +   IS_DAX(inode) != newdax)
>>> +   d_mark_dontcache(inode);
>>> +}
>>> +
>>
>> This capability to mark an inode dontcache should probably be in a
>> separate patch. These seem to logically two functionalities. One is
>> enabling DAX on an inode. And second is making sure how soon you
>> see the effect of that change and hence marking inode dontcache.
> 
> OK, sounds reasonable.
> 
>>
>> Not sure how useful this is. In cache=none mode we should get rid of
>> inode ASAP. In cache=auto mode we will get rid of after 1 second (or
>> after a user specified timeout). So only place this seems to be
>> useful is cache=always.
> 
> Actually dontcache here is used to avoid dynamic switching between DAX
> and non-DAX state while file is opened. The complexity of dynamic
> switching is that, you have to clear the address_space, since page cache
> and DAX entry can not coexist in the address space. Besides,
> inode->a_ops also needs to be changed dynamically.

FYI some context on the complication of switching DAX state dynamically,
when Ira Weiny was working on per-file DAX of ext4/xfs.

> At LSF/MM we discussed the difficulties of switching the DAX state of
a file with active mappings / page cache.  It was thought the races
could be avoided by limiting DAX state flips to 0-length files.
>
> However, this turns out to not be true.[3][5] This is because address
space operations (a_ops) may be in use at any time the inode is referenced.
> from Ira Weiny

[1]
https://patchwork.kernel.org/project/xfs/cover/20200407182958.568475-1-ira.we...@intel.com/
[2] https://lore.kernel.org/lkml/20200305155144.ga5...@lst.de/
[3] https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
[4] https://lore.kernel.org/lkml/20200403182904.GP80283@magnolia/

> 
> With dontcache, dynamic switching is no longer needed and the DAX state
> will be decided only when inode (in memory) is initialized. The downside
> is that the new DAX state won't be updated until the file is closed and
> reopened later.
> 
> 'cache=none' only invalidates dentry, while the inode (in memory) is
> still there (with address_space uncleared and a_ops unchanged).
> 
> The dynamic switching may be done, though it's not such straightforward.
> Currently, ext4/xfs are all implemented in this dontcache way, i.e., the
> new DAX state won't be seen until the file is closed and reopened later.
> 

-- 
Thanks,
Jeffle
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH -next v2 resend] drm/bochs: Fix missing pci_disable_device() on error in bochs_pci_probe()

2021-07-20 Thread Thomas Zimmermann

Hi

Am 15.07.21 um 15:28 schrieb Yang Yingliang:

Replace pci_enable_device() with pcim_enable_device(),
pci_disable_device() will be called in release automatically.

v2:
   use pcim_enable_device()

Signed-off-by: Yang Yingliang 
Reported-by: Hulk Robot 
Acked-by: Thomas Zimmermann 


Thanks, I'll merge it into drm-misc-next as v3. I also had to update the 
path to the bochs driver meanwhile.


Best regards
Thomas


---
  drivers/gpu/drm/bochs/bochs_drv.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bochs/bochs_drv.c 
b/drivers/gpu/drm/bochs/bochs_drv.c
index c828cadbabff..8065c9537237 100644
--- a/drivers/gpu/drm/bochs/bochs_drv.c
+++ b/drivers/gpu/drm/bochs/bochs_drv.c
@@ -118,7 +118,7 @@ static int bochs_pci_probe(struct pci_dev *pdev,
if (IS_ERR(dev))
return PTR_ERR(dev);
  
-	ret = pci_enable_device(pdev);

+   ret = pcim_enable_device(pdev);
if (ret)
goto err_free_dev;
  



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer



OpenPGP_signature
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v1] vdpa/vdpa_sim: Use the negotiated features when calling vringh_init_iotlb

2021-07-20 Thread Stefano Garzarella

On Tue, Jul 20, 2021 at 10:14:35AM +0300, Eli Cohen wrote:

On Tue, Jul 20, 2021 at 08:57:40AM +0200, Stefano Garzarella wrote:

On Tue, Jul 20, 2021 at 08:25:33AM +0300, Eli Cohen wrote:
> When calling vringh_init_iotlb(), use the negotiated features which
> might be different than the supported features.
>
> Fixes: 2c53d0f64c06f ("vdpasim: vDPA device simulator")
> Signed-off-by: Eli Cohen 
> ---
> v0 --> v1:
> Update "Fixes" line
>
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c 
b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 14e024de5cbf..89a474c7a096 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -66,7 +66,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, 
unsigned int idx)
> {
>struct vdpasim_virtqueue *vq = >vqs[idx];
>
> -  vringh_init_iotlb(>vring, vdpasim->dev_attr.supported_features,
> +  vringh_init_iotlb(>vring, vdpasim->features,
>  VDPASIM_QUEUE_MAX, false,
>  (struct vring_desc *)(uintptr_t)vq->desc_addr,
>  (struct vring_avail *)
> @@ -86,7 +86,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
>vq->device_addr = 0;
>vq->cb = NULL;
>vq->private = NULL;
> -  vringh_init_iotlb(>vring, vdpasim->dev_attr.supported_features,
> +  vringh_init_iotlb(>vring, vdpasim->features,

vdpasim_vq_reset() is called while resetting the device in vdpasim_reset()
where we also set `vdpasim->features = 0` after resetting the vqs, so maybe
it's better to use the supported features here, since the negotiated ones
are related to the previous instance.



I don't think using supported features is valid. Better to make sure
vringh_init_iotlb() is called after the features have been negotiated.



I think the vringh_init_iotlb() call in vdpasim_vq_reset() is just used 
to clean up the `struct vringh`, then it will be initialized in 
vdpasim_queue_ready() when features have already been negotiated.


Maybe here we can pass 0 (to the features parameter) if we don't want to 
use the features supported by the device.


Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 3/4] fuse: add per-file DAX flag

2021-07-20 Thread JeffleXu



On 7/20/21 2:41 AM, Vivek Goyal wrote:
> On Fri, Jul 16, 2021 at 06:47:52PM +0800, Jeffle Xu wrote:
>> Add one flag for fuse_attr.flags indicating if DAX shall be enabled for
>> this file.
>>
>> When the per-file DAX flag changes for an *opened* file, the state of
>> the file won't be updated until this file is closed and reopened later.
>>
>> Signed-off-by: Jeffle Xu 
> 
> [..]
>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
>> index 36ed092227fa..90c9df10d37a 100644
>> --- a/include/uapi/linux/fuse.h
>> +++ b/include/uapi/linux/fuse.h
>> @@ -184,6 +184,9 @@
>>   *
>>   *  7.34
>>   *  - add FUSE_SYNCFS
>> + *
>> + *  7.35
>> + *  - add FUSE_ATTR_DAX
>>   */
>>  
>>  #ifndef _LINUX_FUSE_H
>> @@ -449,8 +452,10 @@ struct fuse_file_lock {
>>   * fuse_attr flags
>>   *
>>   * FUSE_ATTR_SUBMOUNT: Object is a submount root
>> + * FUSE_ATTR_DAX: Enable DAX for this file in per-file DAX mode
>>   */
>>  #define FUSE_ATTR_SUBMOUNT  (1 << 0)
>> +#define FUSE_ATTR_DAX   (1 << 1)
> 
> Generic fuse changes (addition of FUSE_ATTR_DAX) should probably in
> a separate patch. 

Got it.

> 
> I am not clear on one thing. If we are planning to rely on persistent
> inode attr (FS_XFLAG_DAX as per Documentation/filesystems/dax.rst), then
> why fuse server needs to communicate the state of that attr using a 
> flag? Can client directly query it?  I am not sure where at these
> attrs stored and if fuse protocol currently supports it.

There are two issues.

1. FUSE server side: Algorithm of deciding whether DAX is enabled for a
file.

As I previously replied in [1], FUSE server must enable DAX if the
backend file is flagged with FS_XFLAG_DAX, to make the FS_XFLAG_DAX
previously set by FUSE client effective.

But I will argue that FUSE server also has the flexibility of the
algorithm implementation. Even if guest queries FS_XFLAG_DAX by
GETFLAGS/FSGETXATTR ioctl, FUSE server can still enable DAX when the
backend file is not FS_XFLAG_DAX flagged.


2. The protocol between server and client.

extending LOOKUP vs. LOOKUP + GETFLAGS/FSGETXATTR ioctl

As I said in [1], client can directly query the FS_XFLAG_DAX flag, but
there will be one more round trip.


[1]
https://lore.kernel.org/linux-fsdevel/031efb1d-7c0d-35fb-c147-dcc3b6cac...@linux.alibaba.com/T/#m3f3407158b2c028694c85d82d0d6bd0387f4e24e

> 
> What about flag STATX_ATTR_DAX. We probably should report that too
> in stat if we are using dax on the inode?
> 

VFS will automatically report STATX_ATTR_DAX if inode is in DAX mode,
e.g., in vfs_getattr_nosec().



-- 
Thanks,
Jeffle
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1] vdpa/vdpa_sim: Use the negotiated features when calling vringh_init_iotlb

2021-07-20 Thread Stefano Garzarella

On Tue, Jul 20, 2021 at 08:25:33AM +0300, Eli Cohen wrote:

When calling vringh_init_iotlb(), use the negotiated features which
might be different than the supported features.

Fixes: 2c53d0f64c06f ("vdpasim: vDPA device simulator")
Signed-off-by: Eli Cohen 
---
v0 --> v1:
Update "Fixes" line

drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 14e024de5cbf..89a474c7a096 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -66,7 +66,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, 
unsigned int idx)
{
struct vdpasim_virtqueue *vq = >vqs[idx];

-   vringh_init_iotlb(>vring, vdpasim->dev_attr.supported_features,
+   vringh_init_iotlb(>vring, vdpasim->features,
  VDPASIM_QUEUE_MAX, false,
  (struct vring_desc *)(uintptr_t)vq->desc_addr,
  (struct vring_avail *)
@@ -86,7 +86,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
vq->device_addr = 0;
vq->cb = NULL;
vq->private = NULL;
-   vringh_init_iotlb(>vring, vdpasim->dev_attr.supported_features,
+   vringh_init_iotlb(>vring, vdpasim->features,


vdpasim_vq_reset() is called while resetting the device in 
vdpasim_reset() where we also set `vdpasim->features = 0` after 
resetting the vqs, so maybe it's better to use the supported features 
here, since the negotiated ones are related to the previous instance.


Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 3/4] fuse: add per-file DAX flag

2021-07-20 Thread JeffleXu



On 7/20/21 3:44 AM, Vivek Goyal wrote:
> On Fri, Jul 16, 2021 at 06:47:52PM +0800, Jeffle Xu wrote:
>> Add one flag for fuse_attr.flags indicating if DAX shall be enabled for
>> this file.
>>
>> When the per-file DAX flag changes for an *opened* file, the state of
>> the file won't be updated until this file is closed and reopened later.
>>
>> Signed-off-by: Jeffle Xu 
>> ---
>>  fs/fuse/dax.c | 21 +
>>  fs/fuse/file.c|  4 ++--
>>  fs/fuse/fuse_i.h  |  5 +++--
>>  fs/fuse/inode.c   |  5 -
>>  include/uapi/linux/fuse.h |  5 +
>>  5 files changed, 31 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
>> index a478e824c2d0..0e862119757a 100644
>> --- a/fs/fuse/dax.c
>> +++ b/fs/fuse/dax.c
>> @@ -1341,7 +1341,7 @@ static const struct address_space_operations 
>> fuse_dax_file_aops  = {
>>  .invalidatepage = noop_invalidatepage,
>>  };
>>  
>> -static bool fuse_should_enable_dax(struct inode *inode)
>> +static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
>>  {
>>  struct fuse_conn *fc = get_fuse_conn(inode);
>>  unsigned int mode;
>> @@ -1354,18 +1354,31 @@ static bool fuse_should_enable_dax(struct inode 
>> *inode)
>>  if (mode == FUSE_DAX_MOUNT_NEVER)
>>  return false;
>>  
>> -return true;
>> +if (mode == FUSE_DAX_MOUNT_ALWAYS)
>> +return true;
>> +
>> +WARN_ON(mode != FUSE_DAX_MOUNT_INODE);
>> +return flags & FUSE_ATTR_DAX;
>>  }
>>  
>> -void fuse_dax_inode_init(struct inode *inode)
>> +void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
>>  {
>> -if (!fuse_should_enable_dax(inode))
>> +if (!fuse_should_enable_dax(inode, flags))
>>  return;
>>  
>>  inode->i_flags |= S_DAX;
>>  inode->i_data.a_ops = _dax_file_aops;
>>  }
>>  
>> +void fuse_dax_dontcache(struct inode *inode, bool newdax)
>> +{
>> +struct fuse_conn *fc = get_fuse_conn(inode);
>> +
>> +if (fc->dax && fc->dax->mode == FUSE_DAX_MOUNT_INODE &&
>> +IS_DAX(inode) != newdax)
>> +d_mark_dontcache(inode);
>> +}
>> +
> 
> This capability to mark an inode dontcache should probably be in a
> separate patch. These seem to logically two functionalities. One is
> enabling DAX on an inode. And second is making sure how soon you
> see the effect of that change and hence marking inode dontcache.

OK, sounds reasonable.

> 
> Not sure how useful this is. In cache=none mode we should get rid of
> inode ASAP. In cache=auto mode we will get rid of after 1 second (or
> after a user specified timeout). So only place this seems to be
> useful is cache=always.

Actually dontcache here is used to avoid dynamic switching between DAX
and non-DAX state while file is opened. The complexity of dynamic
switching is that, you have to clear the address_space, since page cache
and DAX entry can not coexist in the address space. Besides,
inode->a_ops also needs to be changed dynamically.

With dontcache, dynamic switching is no longer needed and the DAX state
will be decided only when inode (in memory) is initialized. The downside
is that the new DAX state won't be updated until the file is closed and
reopened later.

'cache=none' only invalidates dentry, while the inode (in memory) is
still there (with address_space uncleared and a_ops unchanged).

The dynamic switching may be done, though it's not such straightforward.
Currently, ext4/xfs are all implemented in this dontcache way, i.e., the
new DAX state won't be seen until the file is closed and reopened later.

-- 
Thanks,
Jeffle
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization