Re: [PATCH v4 3/3] virtio_balloon: remove the unnecessary 0-initialization

2019-01-07 Thread Greg KH
On Tue, Jan 08, 2019 at 12:50:05PM +0800, Wei Wang wrote:
> The vb struct is now allocated with kzalloc(), so there is no need to
> 0-initialize this field a second time.
> 
> Signed-off-by: Wei Wang 
> Reviewed-by: Cornelia Huck 
> ---
>  drivers/virtio/virtio_balloon.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index d48c12c..048959a 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -925,7 +925,6 @@ static int virtballoon_probe(struct virtio_device *vdev)
> VIRTIO_BALLOON_CMD_ID_STOP);
>   vb->cmd_id_stop = cpu_to_virtio32(vb->vdev,
> VIRTIO_BALLOON_CMD_ID_STOP);
> - vb->num_free_page_blocks = 0;
>  	spin_lock_init(&vb->free_page_list_lock);
>  	INIT_LIST_HEAD(&vb->free_page_list);
>   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
> -- 
> 2.7.4
> 



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 2/3] virtio-balloon: improve update_balloon_size_func

2019-01-07 Thread Greg KH
On Tue, Jan 08, 2019 at 12:50:04PM +0800, Wei Wang wrote:
> There is no need to update the balloon actual register when there is no
> ballooning request. This patch avoids update_balloon_size when diff is 0.
> 
> Signed-off-by: Wei Wang 
> Reviewed-by: Cornelia Huck 
> Reviewed-by: Halil Pasic 
> Tested-by: Christian Borntraeger 
> ---
>  drivers/virtio/virtio_balloon.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 45d32f5..d48c12c 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -457,9 +457,12 @@ static void update_balloon_size_func(struct work_struct *work)
> update_balloon_size_work);
>   diff = towards_target(vb);
>  
> + if (!diff)
> + return;
> +
>   if (diff > 0)
>   diff -= fill_balloon(vb, diff);
> - else if (diff < 0)
> + else
>   diff += leak_balloon(vb, -diff);
>   update_balloon_size(vb);
>  
> -- 
> 2.7.4
> 



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.




Re: [PATCH v4 1/3] virtio-balloon: tweak config_changed implementation

2019-01-07 Thread Greg KH
On Tue, Jan 08, 2019 at 12:50:03PM +0800, Wei Wang wrote:
> virtio-ccw has deadlock issues with reading the config space inside the
> interrupt context, so we tweak the virtballoon_changed implementation
> by moving the config read operations into the related workqueue contexts.
> The config_read_bitmap is used as a flag to the workqueue callbacks
> about the related config fields that need to be read.
> 
> The cmd_id_received is also renamed to cmd_id_received_cache, and
> the value should be obtained via virtio_balloon_cmd_id_received.
> 
> Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> Reported-by: Christian Borntraeger 
> Signed-off-by: Wei Wang 
> Reviewed-by: Cornelia Huck 
> Reviewed-by: Halil Pasic 
> Tested-by: Christian Borntraeger 
> ---
>  drivers/virtio/virtio_balloon.c | 98 +++--
>  1 file changed, 65 insertions(+), 33 deletions(-)
> 



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.




Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-07 Thread Wei Wang

On 01/07/2019 09:49 PM, Christian Borntraeger wrote:


On 07.01.2019 08:01, Wei Wang wrote:

virtio-ccw has deadlock issues with reading the config space inside the
interrupt context, so we tweak the virtballoon_changed implementation
by moving the config read operations into the related workqueue contexts.
The config_read_bitmap is used as a flag to the workqueue callbacks
about the related config fields that need to be read.

The cmd_id_received is also renamed to cmd_id_received_cache, and
the value should be obtained via virtio_balloon_cmd_id_received.

Reported-by: Christian Borntraeger 
Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
Reviewed-by: Halil Pasic 

Together with
   virtio_pci: use queue idx instead of array idx to set up the vq
   virtio: don't allocate vqs when names[i] = NULL

Tested-by: Christian Borntraeger 


OK. I don't plan to send a new version of the above patches as no 
changes needed so far.


Michael, if the above two patches look good to you, please help add the
related tested-by and reviewed-by tags. Thanks.

Best,
Wei


[PATCH v4 3/3] virtio_balloon: remove the unnecessary 0-initialization

2019-01-07 Thread Wei Wang
The vb struct is now allocated with kzalloc(), so there is no need to
0-initialize this field a second time.

Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
---
 drivers/virtio/virtio_balloon.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index d48c12c..048959a 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -925,7 +925,6 @@ static int virtballoon_probe(struct virtio_device *vdev)
  VIRTIO_BALLOON_CMD_ID_STOP);
vb->cmd_id_stop = cpu_to_virtio32(vb->vdev,
  VIRTIO_BALLOON_CMD_ID_STOP);
-   vb->num_free_page_blocks = 0;
 	spin_lock_init(&vb->free_page_list_lock);
 	INIT_LIST_HEAD(&vb->free_page_list);
if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
-- 
2.7.4
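The rationale can be illustrated with a userspace analog: memory obtained from kzalloc() — like calloc() below — is already zero-filled, so an explicit `= 0` afterwards is redundant. A minimal sketch (the struct is a hypothetical stand-in, not the kernel's):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct virtio_balloon; only the
 * field the patch touches is kept. */
struct balloon_state {
    unsigned long num_free_page_blocks;
};

/* calloc() plays the role of kzalloc(): the returned memory is already
 * zero-filled, so writing "vb->num_free_page_blocks = 0" afterwards
 * would be redundant -- which is exactly what the patch removes. */
static struct balloon_state *balloon_state_alloc(void)
{
    return calloc(1, sizeof(struct balloon_state));
}
```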



[PATCH v4 0/3] virtio-balloon: tweak config_changed

2019-01-07 Thread Wei Wang
Since virtio-ccw doesn't work with accesses to the config space inside
an interrupt context, this patch series avoids that issue by moving the
config register accesses to the related workqueue contexts.

v3->v4 ChangeLog:
- change virtio32_to_cpu to cpu_to_virtio32 in send_cmd_id_start;
v2->v3 ChangeLog:
- rename cmd_id_received to cmd_id_received_cache, and have call sites
  read the latest value via virtio_balloon_cmd_id_received. (Still
  kept Cornelia and Halil's reviewed-by as it's a minor change)
- remove zeroing vb->num_free_page_blocks in probe since vb is
  allocated via kzalloc.
v1->v2 ChangeLog:
- add config_read_bitmap to indicate to the workqueue callbacks about
  the necessity of reading the related config fields.

Wei Wang (3):
  virtio-balloon: tweak config_changed implementation
  virtio-balloon: improve update_balloon_size_func
  virtio_balloon: remove the unnecessary 0-initialization

 drivers/virtio/virtio_balloon.c | 104 ++--
 1 file changed, 69 insertions(+), 35 deletions(-)

-- 
2.7.4
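The coalescing idea behind config_read_bitmap — queue the work only on the first request until the worker consumes the flag — can be sketched in userspace with C11 atomics (function names here are illustrative, not the kernel's; the kernel uses test_and_set_bit()/test_and_clear_bit()):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CONFIG_READ_CMD_ID 0

/* Set the bit; return true only if we are the one who set it (i.e. the
 * work still needs queueing), mirroring !test_and_set_bit() in the patch. */
static bool config_read_request(atomic_ulong *bitmap)
{
    unsigned long old = atomic_fetch_or(bitmap, 1UL << CONFIG_READ_CMD_ID);
    return !(old & (1UL << CONFIG_READ_CMD_ID));
}

/* Worker side: consume and clear the flag, as test_and_clear_bit() does. */
static bool config_read_consume(atomic_ulong *bitmap)
{
    unsigned long old = atomic_fetch_and(bitmap, ~(1UL << CONFIG_READ_CMD_ID));
    return old & (1UL << CONFIG_READ_CMD_ID);
}
```

Duplicate config-change interrupts between two worker runs thus collapse into a single config read.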



[PATCH v4 2/3] virtio-balloon: improve update_balloon_size_func

2019-01-07 Thread Wei Wang
There is no need to update the balloon actual register when there is no
ballooning request. This patch avoids update_balloon_size when diff is 0.

Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
Reviewed-by: Halil Pasic 
Tested-by: Christian Borntraeger 
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 45d32f5..d48c12c 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -457,9 +457,12 @@ static void update_balloon_size_func(struct work_struct *work)
  update_balloon_size_work);
diff = towards_target(vb);
 
+   if (!diff)
+   return;
+
if (diff > 0)
diff -= fill_balloon(vb, diff);
-   else if (diff < 0)
+   else
diff += leak_balloon(vb, -diff);
update_balloon_size(vb);
 
-- 
2.7.4
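A userspace model of the resulting control flow (stubs stand in for fill_balloon()/leak_balloon() and are assumed to satisfy the whole request; the return value models whether update_balloon_size() runs):

```c
#include <assert.h>

/* Stubs standing in for fill_balloon()/leak_balloon(). */
static long long fill_balloon_stub(long long pages) { return pages; }
static long long leak_balloon_stub(long long pages) { return pages; }

/* Model of the patched update_balloon_size_func(): returns 1 when
 * update_balloon_size() would run, 0 when the new early return fires. */
static int balloon_size_update_runs(long long diff)
{
    if (!diff)
        return 0;   /* no ballooning request: skip the actual-register write */
    if (diff > 0)
        diff -= fill_balloon_stub(diff);
    else
        diff += leak_balloon_stub(-diff);
    return 1;       /* update_balloon_size() runs */
}
```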



[PATCH v4 1/3] virtio-balloon: tweak config_changed implementation

2019-01-07 Thread Wei Wang
virtio-ccw has deadlock issues with reading the config space inside the
interrupt context, so we tweak the virtballoon_changed implementation
by moving the config read operations into the related workqueue contexts.
The config_read_bitmap is used as a flag to the workqueue callbacks
about the related config fields that need to be read.

The cmd_id_received is also renamed to cmd_id_received_cache, and
the value should be obtained via virtio_balloon_cmd_id_received.

Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Reported-by: Christian Borntraeger 
Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
Reviewed-by: Halil Pasic 
Tested-by: Christian Borntraeger 
---
 drivers/virtio/virtio_balloon.c | 98 +++--
 1 file changed, 65 insertions(+), 33 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 728ecd1..45d32f5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -61,6 +61,10 @@ enum virtio_balloon_vq {
VIRTIO_BALLOON_VQ_MAX
 };
 
+enum virtio_balloon_config_read {
+   VIRTIO_BALLOON_CONFIG_READ_CMD_ID = 0,
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
@@ -77,14 +81,20 @@ struct virtio_balloon {
/* Prevent updating balloon when it is being canceled. */
spinlock_t stop_update_lock;
bool stop_update;
+   /* Bitmap to indicate if reading the related config fields are needed */
+   unsigned long config_read_bitmap;
 
/* The list of allocated free pages, waiting to be given back to mm */
struct list_head free_page_list;
spinlock_t free_page_list_lock;
/* The number of free page blocks on the above list */
unsigned long num_free_page_blocks;
-   /* The cmd id received from host */
-   u32 cmd_id_received;
+   /*
+* The cmd id received from host.
+* Read it via virtio_balloon_cmd_id_received to get the latest value
+* sent from host.
+*/
+   u32 cmd_id_received_cache;
/* The cmd id that is actively in use */
__virtio32 cmd_id_active;
/* Buffer to store the stop sign */
@@ -390,37 +400,31 @@ static unsigned long return_free_pages_to_mm(struct virtio_balloon *vb,
return num_returned;
 }
 
+static void virtio_balloon_queue_free_page_work(struct virtio_balloon *vb)
+{
+   if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+   return;
+
+   /* No need to queue the work if the bit was already set. */
+	if (test_and_set_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
+			     &vb->config_read_bitmap))
+		return;
+
+	queue_work(vb->balloon_wq, &vb->report_free_page_work);
+}
+
 static void virtballoon_changed(struct virtio_device *vdev)
 {
struct virtio_balloon *vb = vdev->priv;
unsigned long flags;
-   s64 diff = towards_target(vb);
-
-   if (diff) {
-		spin_lock_irqsave(&vb->stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq,
-			   &vb->update_balloon_size_work);
-		spin_unlock_irqrestore(&vb->stop_update_lock, flags);
-   }
 
-   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
-   virtio_cread(vdev, struct virtio_balloon_config,
-			     free_page_report_cmd_id, &vb->cmd_id_received);
-   if (vb->cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) {
-   /* Pass ULONG_MAX to give back all the free pages */
-   return_free_pages_to_mm(vb, ULONG_MAX);
-   } else if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
-  vb->cmd_id_received !=
-  virtio32_to_cpu(vdev, vb->cmd_id_active)) {
-			spin_lock_irqsave(&vb->stop_update_lock, flags);
-			if (!vb->stop_update) {
-				queue_work(vb->balloon_wq,
-					   &vb->report_free_page_work);
-			}
-			spin_unlock_irqrestore(&vb->stop_update_lock, flags);
-   }
+	spin_lock_irqsave(&vb->stop_update_lock, flags);
+	if (!vb->stop_update) {
+		queue_work(system_freezable_wq,
+			   &vb->update_balloon_size_work);
+		virtio_balloon_queue_free_page_work(vb);
 	}
+	spin_unlock_irqrestore(&vb->stop_update_lock, flags);
 }
 
 static void update_balloon_size(struct virtio_balloon *vb)
@@ -527,6 +531,17 @@ static int init_vqs(struct virtio_balloon *vb)
return 0;
 }
 
+static u32 virtio_balloon_cmd_id_received(struct virtio_balloon *vb)
+{
+   if (test_and_clear_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
+  

Re: [PATCH v15 23/26] sched: early boot clock

2019-01-07 Thread Dominique Martinet
Pavel Tatashin wrote on Mon, Jan 07, 2019:
> I did exactly the same sequence on Kaby Lake CPU and could not
> reproduce it. What is your host CPU?

skylake consumer laptop CPU: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz

I don't have any kaby lake around; I have access to older servers though...
-- 
Dominique


Re: [PATCH v15 23/26] sched: early boot clock

2019-01-07 Thread Dominique Martinet
Pavel Tatashin wrote on Mon, Jan 07, 2019:
> I could not reproduce the problem. Did you suspend to memory between
> wake ups? Does this time jump happen every time, even if your laptop
> sleeps for a minute?

I'm not sure I understand "suspend to memory between the wake ups".
The full sequence is:
 - start a VM (just in case, I let it boot till the end)
 - suspend to memory (aka systemctl suspend) the host
 - after resuming the host, soft reboot the VM (login through
serial/ssh/whatever and reboot or in the qemu console 'system_reset')

I've just slept exactly one minute and reproduced again with the fedora
stock kernel now (4.19.13-300.fc29.x86_64) in the VM.

Interestingly I'm not getting the same offset between multiple reboots
now despite not suspending again; but if I don't suspend I cannot seem
to get it to give an offset at all (only tried for a few minutes; this
might not be true) ; OTOH I pushed my luck further and even with a five
seconds sleep I'm getting a noticeable offset on first VM reboot after
resume:

[0.00] Hypervisor detected: KVM
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[  179.362163] kvm-clock: cpu 0, msr 13c01001, primary cpu clock
[  179.362163] clocksource: kvm-clock: mask: 0x max_cycles: 
0x1cd42e4dffb, max_idle_ns: 881590591483 ns

Honestly not sure what more information I could give, I'll try on some
other hardware than my laptop (if I can get a server to resume after
suspend through ipmi or wake on lan); but I don't have anything I could
install ubuntu on to try their qemu's version... although I really don't
want to believe that's the difference...

Thanks,
-- 
Dominique


Re: [PATCH RFC 2/4] include/linux/compiler.h: allow memory operands

2019-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2019 at 05:54:27PM +, Will Deacon wrote:
> On Wed, Jan 02, 2019 at 03:57:54PM -0500, Michael S. Tsirkin wrote:
> > We don't really care whether the variable is in-register
> > or in-memory. Relax the constraint accordingly.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  include/linux/compiler.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > index 1ad367b4cd8d..6601d39e8c48 100644
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -154,7 +154,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> >  #ifndef OPTIMIZER_HIDE_VAR
> >  /* Make the optimizer believe the variable can be manipulated arbitrarily. */
> >  #define OPTIMIZER_HIDE_VAR(var)				\
> > -	__asm__ ("" : "=r" (var) : "0" (var))
> > +	__asm__ ("" : "=rm" (var) : "0" (var))
> >  #endif
> 
> I think this can break for architectures with write-back addressing modes
> such as arm, where the "m" constraint is assumed to be evaluated precisely
> once in the asm block.
> 
> Will

Thanks, I'll drop this patch.
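For context, OPTIMIZER_HIDE_VAR() is an empty asm statement that launders a value through a register so the compiler can no longer reason about it; the proposed "=rm" would also allow a memory operand, which is what breaks on write-back addressing modes. A userspace sketch of the existing register-only form (requires GCC/Clang extended asm):

```c
#include <assert.h>

/* Launder v through the empty asm so the optimizer forgets its value;
 * "=r"/"0" keeps it in a register, which is safe on all architectures.
 * The value itself is unchanged. */
static unsigned long hide_var(unsigned long v)
{
    __asm__("" : "=r"(v) : "0"(v));
    return v;
}
```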


Re: [RFC PATCH V3 0/5] Hi:

2019-01-07 Thread Dan Williams
On Mon, Jan 7, 2019 at 2:25 PM Michael S. Tsirkin  wrote:
>
> On Mon, Jan 07, 2019 at 01:39:15PM -0800, Dan Williams wrote:
> > On Mon, Jan 7, 2019 at 6:11 AM Michael S. Tsirkin  wrote:
> > >
> > > On Sun, Jan 06, 2019 at 11:15:20PM -0800, Dan Williams wrote:
> > > > On Sun, Jan 6, 2019 at 8:17 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote:
> > > > > >
> > > > > > On 2019/1/7 11:28 AM, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote:
> > > > > > > > On 2019/1/3 4:47 AM, Michael S. Tsirkin wrote:
> > > > > > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
> > > > > > > > > > This series tries to access virtqueue metadata through 
> > > > > > > > > > kernel virtual
> > > > > > > > > > address instead of copy_user() friends since they had too 
> > > > > > > > > > much
> > > > > > > > > > overheads like checks, spec barriers or even hardware 
> > > > > > > > > > feature
> > > > > > > > > > toggling.
> > > > > > > > > Will review, thanks!
> > > > > > > > > One questions that comes to mind is whether it's all about 
> > > > > > > > > bypassing
> > > > > > > > > stac/clac.  Could you please include a performance comparison 
> > > > > > > > > with
> > > > > > > > > nosmap?
> > > > > > > > >
> > > > > > > > On machine without SMAP (Sandy Bridge):
> > > > > > > >
> > > > > > > > Before: 4.8Mpps
> > > > > > > >
> > > > > > > > After: 5.2Mpps
> > > > > > > OK so would you say it's really unsafe versus safe accesses?
> > > > > > > Or would you say it's just a better written code?
> > > > > >
> > > > > >
> > > > > > It's the effect of removing speculation barrier.
> > > > >
> > > > >
> > > > > You mean __uaccess_begin_nospec introduced by
> > > > > commit 304ec1b050310548db33063e567123fae8fd0301
> > > > > ?
> > > > >
> > > > > So fundamentally we do access_ok checks when supplying
> > > > > the memory table to the kernel thread, and we should
> > > > > do the spec barrier there.
> > > > >
> > > > > Then we can just create and use a variant of uaccess macros that does
> > > > > not include the barrier?
> > > > >
> > > > > Or, how about moving the barrier into access_ok?
> > > > > This way repeated accesses with a single access_ok get a bit faster.
> > > > > CC Dan Williams on this idea.
> > > >
> > > > It would be interesting to see how expensive re-doing the address
> > > > limit check is compared to the speculation barrier. I.e. just switch
> > > > vhost_get_user() to use get_user() rather than __get_user(). That will
> > > > sanitize the pointer in the speculative path without a barrier.
> > >
> > > Hmm it's way cheaper even though IIRC it's measurable.
> > > Jason, would you like to try?
> > > Although frankly __get_user being slower than get_user feels very wrong.
> > > Not yet sure what to do exactly but would you agree?
> >
> > Agree. __get_user() being faster than get_user() defeats the whole
> > point of converting code paths to the access_ok() + __get_user()
> > pattern.
>
> Did you mean the reverse?

Hmm, no... I'll rephrase: __get_user() should have lower overhead than
get_user().
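The trade-off under discussion — a speculation barrier versus re-sanitizing the value on the speculative path — can be illustrated with the masking approach used by the kernel's array_index_nospec(): instead of a barrier, the index is clamped with a mask so an out-of-bounds value reads as 0 even under misspeculation. A simplified userspace sketch, not the kernel implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Clamp in the spirit of array_index_nospec(): produce an all-ones mask
 * when index < size and an all-zeroes mask otherwise, then AND it in.
 * Under misspeculation the masked index stays inside the array, so no
 * lfence-style barrier is needed on the load path. */
static size_t index_nospec(size_t index, size_t size)
{
    size_t mask = (size_t)0 - (size_t)(index < size); /* ~0 in range, 0 if not */
    return index & mask;
}
```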

Re: [PATCH RFC 3/4] barriers: convert a control to a data dependency

2019-01-07 Thread Peter Zijlstra
On Mon, Jan 07, 2019 at 08:36:36AM -0500, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 10:46:10AM +0100, Peter Zijlstra wrote:

> > How about naming the thing: dependent_ptr() ? That is without any (r)mb
> > implications at all. The address dependency is strictly weaker than an
> > rmb in that it will only order the two loads in question and not, like
> > rmb, any prior to any later load.
> 
> So I'm fine with this as it's enough for virtio, but I would like to point 
> out two things:
> 
> 1. E.g. on x86 both SMP and DMA variants can be NOPs but
> the mandatory one can't, so assuming we do not want
> it to be stronger than rmb then either we want
> smp_dependent_ptr(), dma_dependent_ptr(), dependent_ptr()
> or we just will specify that dependent_ptr() works for
> both DMA and SMP.

The latter; the construct simply generates dependent loads. It is up to
the CPU as to what all that works for.

> 2. Down the road, someone might want to order a store after a load.
> Address dependency does that for us too. Assuming we make
> dependent_ptr a NOP on x86, we will want an mb variant
> which isn't a NOP on x86. Will we want to rename
> dependent_ptr to dependent_ptr_rmb at that point?

Not sure; what is the actual overhead of the construct on x86 vs the
NOP?


Re: [PATCH RFC 3/4] barriers: convert a control to a data dependency

2019-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2019 at 04:54:23PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 08:36:36AM -0500, Michael S. Tsirkin wrote:
> > On Mon, Jan 07, 2019 at 10:46:10AM +0100, Peter Zijlstra wrote:
> 
> > > How about naming the thing: dependent_ptr() ? That is without any (r)mb
> > > implications at all. The address dependency is strictly weaker than an
> > > rmb in that it will only order the two loads in question and not, like
> > > rmb, any prior to any later load.
> > 
> > So I'm fine with this as it's enough for virtio, but I would like to point 
> > out two things:
> > 
> > 1. E.g. on x86 both SMP and DMA variants can be NOPs but
> > the mandatory one can't, so assuming we do not want
> > it to be stronger than rmb then either we want
> > smp_dependent_ptr(), dma_dependent_ptr(), dependent_ptr()
> > or we just will specify that dependent_ptr() works for
> > both DMA and SMP.
> 
> The latter; the construct simply generates dependent loads. It is up to
> the CPU as to what all that works for.

But not on intel right? On intel loads are ordered so it can be a nop.

> > 2. Down the road, someone might want to order a store after a load.
> > Address dependency does that for us too. Assuming we make
> > dependent_ptr a NOP on x86, we will want an mb variant
> > which isn't a NOP on x86. Will we want to rename
> > dependent_ptr to dependent_ptr_rmb at that point?
> 
> Not sure; what is the actual overhead of the construct on x86 vs the
> NOP?

I'll have to check. There's a pipeline stall almost for sure - that's
why we put it there after all :).

-- 
MST


Re: [PATCH RFC 1/2] virtio-net: bql support

2019-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2019 at 02:31:47PM +0800, Jason Wang wrote:
> 
> > On 2019/1/7 12:01 PM, Michael S. Tsirkin wrote:
> > > On Mon, Jan 07, 2019 at 11:51:55AM +0800, Jason Wang wrote:
> > > > On 2019/1/7 11:17 AM, Michael S. Tsirkin wrote:
> > > > > On Mon, Jan 07, 2019 at 10:14:37AM +0800, Jason Wang wrote:
> > > > > > On 2019/1/2 9:59 PM, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Jan 02, 2019 at 11:28:43AM +0800, Jason Wang wrote:
> > > > > > > > On 2018/12/31 2:45 AM, Michael S. Tsirkin wrote:
> > > > > > > > > On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
> > > > > > > > > > On 2018/12/26 11:19 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > > On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> > > > > > > > > > > > On 2018/12/6 6:54 AM, Michael S. Tsirkin wrote:
> > > > > > > > > > > > When use_napi is set, let's enable BQLs.  Note: some of 
> > > > > > > > > > > > the issues are
> > > > > > > > > > > > similar to wifi.  It's worth considering whether 
> > > > > > > > > > > > something similar to
> > > > > > > > > > > > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing 
> > > > > > > > > > > > shift") might be
> > > > > > > > > > > > beneficial.
> > > > > > > > > > > I've played a similar patch several days before. The 
> > > > > > > > > > > tricky part is the mode
> > > > > > > > > > > switching between napi and no napi. We should make sure 
> > > > > > > > > > > when the packet is
> > > > > > > > > > > sent and tracked by BQL, it should be consumed by BQL as 
> > > > > > > > > > > well. I did it by
> > > > > > > > > > > tracking it through skb->cb.  And deal with the freeze by 
> > > > > > > > > > > reset the BQL
> > > > > > > > > > > status. Patch attached.
> > > > > > > > > > > 
> > > > > > > > > > > But when testing with vhost-net, I don't see a very stable 
> > > > > > > > > > > performance,
> > > > > > > > > > So how about increasing TSQ pacing shift then?
> > > > > > > > > I can test this. But changing default TCP value is much more 
> > > > > > > > > than a
> > > > > > > > > virtio-net specific thing.
> > > > > > > > Well same logic as wifi applies. Unpredictable latencies related
> > > > > > > > to radio in one case, to host scheduler in the other.
> > > > > > > > 
> > > > > > > > > > > it was
> > > > > > > > > > > probably because we batch the used ring updating so tx 
> > > > > > > > > > > interrupt may come
> > > > > > > > > > > randomly. We probably need to implement time bounded 
> > > > > > > > > > > coalescing mechanism
> > > > > > > > > > > which could be configured from userspace.
> > > > > > > > > > I don't think it's reasonable to expect userspace to be 
> > > > > > > > > > that smart ...
> > > > > > > > > > Why do we need time bounded? used ring is always updated 
> > > > > > > > > > when ring
> > > > > > > > > > becomes empty.
> > > > > > > > > We don't add used immediately, which means BQL may not see the consumed 
> > > > > > > > > packet in time.
> > > > > > > > > And the delay varies based on the workload since we count 
> > > > > > > > > packets not bytes
> > > > > > > > > or time before doing the batched updating.
> > > > > > > > > 
> > > > > > > > > Thanks
> > > > > > > > Sorry I still don't get it.
> > > > > > > > When nothing is outstanding then we do update the used.
> > > > > > > > So if BQL stops userspace from sending packets then
> > > > > > > > we get an interrupt and packets start flowing again.
> > > > > > > Yes, but how about the cases of multiple flows. That's where I 
> > > > > > > see unstable
> > > > > > > results.
> > > > > > > 
> > > > > > > 
> > > > > > > > It might be suboptimal, we might need to tune it but I doubt 
> > > > > > > > running
> > > > > > > > timers is a solution, timer interrupts cause VM exits.
> > > > > > > Probably not a timer but a time counter (or event byte counter) 
> > > > > > > in vhost to
> > > > > > > add used and signal guest if it exceeds a value instead of 
> > > > > > > waiting the
> > > > > > > number of packets.
> > > > > > > 
> > > > > > > 
> > > > > > > Thanks
> > > > > > Well we already have VHOST_NET_WEIGHT - is it too big then?
> > > > > I'm not sure, it might be too big.
> > > > > 
> > > > > 
> > > > > > And maybe we should expose the "MORE" flag in the descriptor -
> > > > > > do you think that will help?
> > > > > > 
> > > > > I don't know. But how can a "more" flag help here?
> > > > > 
> > > > > Thanks
> > > > It sounds like we should be a bit more aggressive in updating used ring.
> > > > But if we just do it naively we will harm performance for sure as that
> > > > is how we are doing batching right now.
> > > 
> > > I agree but the problem is to balance the PPS and throughput. More 
> > > batching
> > > helps for PPS but may damage TCP throughput.
> > That is what more flag is supposed to be I think - it is only set if
> > there's a socket that actually needs the skb freed in order to go on.
> 
> 
> I'm not quite sure I get, but is this something similar to what you want?
> 
> 
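For reference, the BQL mechanism the thread keeps returning to can be modeled in userspace: the driver reports bytes queued at transmit time and bytes completed at interrupt time, and the in-flight count throttles the queue in between. This is a simplified model of what netdev_tx_sent_queue()/netdev_tx_completed_queue() drive in the kernel, not the kernel API itself:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified byte-queue-limit state; in the kernel this lives in struct dql. */
struct bql {
    unsigned long inflight; /* bytes queued but not yet completed */
    unsigned long limit;    /* throttle threshold */
};

static void bql_sent(struct bql *q, unsigned long bytes)
{
    q->inflight += bytes;
}

static void bql_completed(struct bql *q, unsigned long bytes)
{
    q->inflight -= bytes;
}

/* The stack stops queueing once the limit is reached; late used-ring
 * updates (the concern in this thread) delay bql_completed() and so
 * keep the queue throttled longer than necessary. */
static bool bql_should_stop(const struct bql *q)
{
    return q->inflight >= q->limit;
}
```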

Re: [PATCH 1/1] s390/virtio: handle find on invalid queue gracefully

2019-01-07 Thread Cornelia Huck
On Mon,  7 Jan 2019 13:31:46 +0100
Halil Pasic  wrote:

> A queue with a capacity of zero is clearly not a valid virtio queue.
> Some emulators report zero queue size if queried with an invalid queue
> index. Instead of crashing in this case let us just return -EINVAL. To

s/-EINVAL/-ENOENT/

> make that work properly, let us fix the notifier cleanup logic as well.
> 
> Signed-off-by: Halil Pasic 
> ---
> This patch is motivated by commit 86a5597 "virtio-balloon:
> VIRTIO_BALLOON_F_FREE_PAGE_HINT" (Wei Wang, 2018-08-27) which triggered
> the described scenario.  The emulator in question is the current QEMU.
> The problem we run into is the underflow in the following loop
> in  __vring_new_virtqueue():
> for (i = 0; i < vring.num-1; i++)
>   vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1)
> Namely vring.num is an unsigned int.
> 
> RFC --> v1:
> * Change error code from -EINVAL to -ENOENT, so we are in line with the
> other transports.
> * Push down the detection of the error into virtio_ccw_read_vq_conf().
> ---
>  drivers/s390/virtio/virtio_ccw.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Thanks, applied.
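The underflow described in the quoted notes is easy to demonstrate in isolation: with `vring.num == 0`, the unsigned expression `vring.num - 1` wraps to UINT_MAX instead of -1, so the descriptor-chaining loop would iterate about 4 billion times over memory that was never allocated. A minimal sketch:

```c
#include <assert.h>
#include <limits.h>

/* Number of iterations "for (i = 0; i < num - 1; i++)" attempts for a
 * given queue size. With num == 0 the unsigned subtraction wraps around
 * to UINT_MAX rather than yielding -1. */
static unsigned int desc_chain_iterations(unsigned int num)
{
    return num - 1; /* wraps to UINT_MAX when num == 0 */
}
```

Rejecting a zero-capacity queue up front, as the patch does, avoids ever reaching this loop.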


Re: [RFC PATCH V3 0/5] Hi:

2019-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2019 at 02:58:08PM +0800, Jason Wang wrote:
> 
> > On 2019/1/5 5:41 AM, Michael S. Tsirkin wrote:
> > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
> > > This series tries to access virtqueue metadata through kernel virtual
> > > address instead of copy_user() friends since they had too much
> > > overheads like checks, spec barriers or even hardware feature
> > > toggling.
> > 
> > I think it's a reasonable approach.
> > However I need to look at whether and which mmu notifiers are invoked before
> > writeback. Do you know?
> 
> 
> I don't know but just looking at the MMU notifier ops definition, there's no
> such callback if my understanding is correct.
> 
> Thanks

In that case how are you making sure used ring updates are written back?
If they aren't guest will crash ...

> 
> > 
> > > Test shows about 24% improvement on TX PPS. It should benefit other
> > > cases as well.
> > > 
> > > Changes from V2:
> > > - fix buggy range overlapping check
> > > - tear down MMU notifier during vhost ioctl to make sure invalidation
> > >request can read metadata userspace address and vq size without
> > >holding vq mutex.
> > > Changes from V1:
> > > - instead of pinning pages, use MMU notifier to invalidate vmaps and
> > >    remap during metadata prefetch
> > > - fix build warning on MIPS
> > > 
> > > Jason Wang (5):
> > >vhost: generalize adding used elem
> > >vhost: fine grain userspace memory accessors
> > >vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
> > >vhost: introduce helpers to get the size of metadata area
> > >vhost: access vq metadata through kernel virtual address
> > > 
> > >   drivers/vhost/net.c   |   4 +-
> > >   drivers/vhost/vhost.c | 416 +-
> > >   drivers/vhost/vhost.h |  15 +-
> > >   3 files changed, 384 insertions(+), 51 deletions(-)
> > > 
> > > -- 
> > > 2.17.1

Re: [RFC PATCH V3 0/5] Hi:

2019-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2019 at 02:50:17PM +0800, Jason Wang wrote:
> 
> On 2019/1/7 12:17 PM, Michael S. Tsirkin wrote:
> > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote:
> > > On 2019/1/7 11:28 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote:
> > > > > On 2019/1/3 4:47 AM, Michael S. Tsirkin wrote:
> > > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
> > > > > > > This series tries to access virtqueue metadata through kernel 
> > > > > > > virtual
> > > > > > > address instead of copy_user() friends since they had too much
> > > > > > > overheads like checks, spec barriers or even hardware feature
> > > > > > > toggling.
> > > > > > Will review, thanks!
> > > > > > One questions that comes to mind is whether it's all about bypassing
> > > > > > stac/clac.  Could you please include a performance comparison with
> > > > > > nosmap?
> > > > > > 
> > > > > On machine without SMAP (Sandy Bridge):
> > > > > 
> > > > > Before: 4.8Mpps
> > > > > 
> > > > > After: 5.2Mpps
> > > > OK so would you say it's really unsafe versus safe accesses?
> > > > Or would you say it's just a better written code?
> > > 
> > > It's the effect of removing speculation barrier.
> > 
> > You mean __uaccess_begin_nospec introduced by
> > commit 304ec1b050310548db33063e567123fae8fd0301
> > ?
> 
> Yes.
> 
> 
> > 
> > So fundamentally we do access_ok checks when supplying
> > the memory table to the kernel thread, and we should
> > do the spec barrier there.
> > 
> > Then we can just create and use a variant of uaccess macros that does
> > not include the barrier?
> 
> 
> The unsafe ones?

Fundamentally yes.


> 
> > 
> > Or, how about moving the barrier into access_ok?
> > This way repeated accesses with a single access_ok get a bit faster.
> > CC Dan Williams on this idea.
> 
> 
> The problem is, e.g for vhost control path. During mem table validation, we
> don't even want to access them there. So the spec barrier is not needed.

Again, the spec barrier is not needed as such at all. It's defence in depth.
And mem table init is a slow path. So we can stick a barrier there and it
won't be a problem for anyone.

> 
> > 
> > 
> > > > > On machine with SMAP (Broadwell):
> > > > > 
> > > > > Before: 5.0Mpps
> > > > > 
> > > > > After: 6.1Mpps
> > > > > 
> > > > > No smap: 7.5Mpps
> > > > > 
> > > > > 
> > > > > Thanks
> > > > no smap being before or after?
> > > > 
> > > Let me clarify:
> > > 
> > > 
> > > Before (SMAP on): 5.0Mpps
> > > 
> > > Before (SMAP off): 7.5Mpps
> > > 
> > > After (SMAP on): 6.1Mpps
> > > 
> > > 
> > > Thanks
> > How about after + smap off?
> 
> 
> After (SMAP off): 8.0Mpps
> 
> > 
> > And maybe we want a module option just for the vhost thread to keep smap
> > off generally since almost all it does is copy stuff from userspace into
> kernel anyway. Because what the above numbers show is that we really
> > really want a solution that isn't limited to just meta-data access,
> > and I really do not see how any such solution can not also be
> > used to make meta-data access fast.
> 
> 
> As we've discussed in another thread of the previous version, this requires lots
> of changes; the main issue is that SMAP state is not saved/restored on explicit
> schedule().

I wonder how expensive reading eflags can be?
If it's cheap we can just check EFLAGS.AC and rerun stac if needed.

> Even if it did, since vhost will call lots of net/block codes,
> any kind of uaccess in those codes needs understand this special request
> from vhost, e.g. you probably need to invent a new kind of iov iterator that
> does not touch SMAP at all. And I'm not sure this is the only thing we need
> to deal with.


Well we wanted to move packet processing from tun into vhost anyway right?

> 
> So I still prefer to:
> 
> 1) speedup the metadata access through vmap + MMU notifier
> 
> 2) speedup the datacopy with batched copy (unsafe ones or other new
> interfaces)
> 
> Thanks

I just guess once you do (2) you will want to rework (1) to use
the new interfaces. So all the effort you are now investing in (1)
will be wasted. Just my $.02.

-- 
MST

Re: [RFC PATCH V3 0/5] Hi:

2019-01-07 Thread Michael S. Tsirkin
On Sun, Jan 06, 2019 at 11:15:20PM -0800, Dan Williams wrote:
> On Sun, Jan 6, 2019 at 8:17 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote:
> > >
> > > > On 2019/1/7 11:28 AM, Michael S. Tsirkin wrote:
> > > > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote:
> > > > > > On 2019/1/3 4:47 AM, Michael S. Tsirkin wrote:
> > > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
> > > > > > > This series tries to access virtqueue metadata through kernel 
> > > > > > > virtual
> > > > > > > address instead of copy_user() friends since they had too much
> > > > > > > overheads like checks, spec barriers or even hardware feature
> > > > > > > toggling.
> > > > > > Will review, thanks!
> > > > > > One questions that comes to mind is whether it's all about bypassing
> > > > > > stac/clac.  Could you please include a performance comparison with
> > > > > > nosmap?
> > > > > >
> > > > > On machine without SMAP (Sandy Bridge):
> > > > >
> > > > > Before: 4.8Mpps
> > > > >
> > > > > After: 5.2Mpps
> > > > OK so would you say it's really unsafe versus safe accesses?
> > > > Or would you say it's just a better written code?
> > >
> > >
> > > It's the effect of removing speculation barrier.
> >
> >
> > You mean __uaccess_begin_nospec introduced by
> > commit 304ec1b050310548db33063e567123fae8fd0301
> > ?
> >
> > So fundamentally we do access_ok checks when supplying
> > the memory table to the kernel thread, and we should
> > do the spec barrier there.
> >
> > Then we can just create and use a variant of uaccess macros that does
> > not include the barrier?
> >
> > Or, how about moving the barrier into access_ok?
> > This way repeated accesses with a single access_ok get a bit faster.
> > CC Dan Williams on this idea.
> 
> It would be interesting to see how expensive re-doing the address
> limit check is compared to the speculation barrier. I.e. just switch
> vhost_get_user() to use get_user() rather than __get_user(). That will
> sanitize the pointer in the speculative path without a barrier.

Hmm it's way cheaper even though IIRC it's measurable.
Jason, would you like to try?
Although frankly __get_user being slower than get_user feels very wrong.
Not yet sure what to do exactly but would you agree?


> I recall we had a convert access_ok() discussion with this result here:
> 
> https://lkml.org/lkml/2018/1/17/929

Sorry let me try to clarify. IIUC speculating access_ok once
is harmless. As Linus said the problem is with "_subsequent_
accesses that can then be used to perturb the cache".

Thus:

1. if (!access_ok)
2.  return
3.  get_user
4. if (!access_ok)
5.  return
6.  get_user

Your proposal that Linus nacked was to effectively add a barrier after
lines 2 and 5 (also using the array_index_nospec trick for speed),
right? Unfortunately that needs a big API change.

I am asking about adding barrier_nospec within access_ok.
Thus effectively before lines 1 and 4.
access_ok will be slower but after all the point of access_ok is
to then access the same memory multiple times.
So we should be making __get_user faster and access_ok slower ...

> ...but it sounds like you are proposing a smaller scope fixup for the
> vhost use case? Something like barrier_nospec() in the success path
> for all vhost access_ok() checks and then a get_user() variant that
> disables the barrier.

Maybe we'll have to. Except I hope vhost won't end up being the
only user otherwise it will be hard to maintain.


-- 
MST

Re: [PATCH v3 0/3] virtio-balloon: tweak config_changed

2019-01-07 Thread Christian Borntraeger



On 07.01.2019 14:45, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 03:01:03PM +0800, Wei Wang wrote:
>> Since virtio-ccw doesn't work with accessing to the config space
>> inside an interrupt context, this patch series avoids that issue by
>> moving the config register accesses to the related workqueue contexts.
> 
> So is this enough to get ccw going again or do we also need
> the patches that deal with NULL VQ names?

This alone does not fix 4.20. You need the other ones as well. (The other patches
fix the memory corruption during boot; this patch fixes the deadlock when
actually changing the balloon size.)


> 
>> v2->v3 ChangeLog:
>> - rename cmd_id_received to cmd_id_received_cache, and have call sites
>>   read the latest value via virtio_balloon_cmd_id_received. (Still
>>   kept Cornelia and Halil's reviewed-by as it's a minor change)
>> - remove zeroing vb->num_free_page_blocks in probe since vb is
>>   allocated via kzalloc.
>> v1->v2 ChangeLog:
>> - add config_read_bitmap to indicate to the workqueue callbacks about
>>   the necessity of reading the related config fields.
>>
>> Wei Wang (3):
>>   virtio-balloon: tweak config_changed implementation
>>   virtio-balloon: improve update_balloon_size_func
>>   virtio_balloon: remove the unnecessary 0-initialization
>>
>>  drivers/virtio/virtio_balloon.c | 104 
>> ++--
>>  1 file changed, 69 insertions(+), 35 deletions(-)
>>
>> -- 
>> 2.7.4
> 



Re: [PATCH v3 3/3] virtio_balloon: remove the unnecessary 0-initialization

2019-01-07 Thread Cornelia Huck
On Mon,  7 Jan 2019 15:01:06 +0800
Wei Wang  wrote:

> We've changed to kzalloc the vb struct, so no need to 0-initialize
> this field one more time.
> 
> Signed-off-by: Wei Wang 
> ---
>  drivers/virtio/virtio_balloon.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index e33dc8e..f19061b 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -925,7 +925,6 @@ static int virtballoon_probe(struct virtio_device *vdev)
> VIRTIO_BALLOON_CMD_ID_STOP);
>   vb->cmd_id_stop = cpu_to_virtio32(vb->vdev,
> VIRTIO_BALLOON_CMD_ID_STOP);
> - vb->num_free_page_blocks = 0;
>   spin_lock_init(>free_page_list_lock);
>   INIT_LIST_HEAD(>free_page_list);
>   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {

Reviewed-by: Cornelia Huck 


Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-07 Thread Christian Borntraeger



On 07.01.2019 08:01, Wei Wang wrote:
> virtio-ccw has deadlock issues with reading the config space inside the
> interrupt context, so we tweak the virtballoon_changed implementation
> by moving the config read operations into the related workqueue contexts.
> The config_read_bitmap is used as a flag to the workqueue callbacks
> about the related config fields that need to be read.
> 
> The cmd_id_received is also renamed to cmd_id_received_cache, and
> the value should be obtained via virtio_balloon_cmd_id_received.
> 
> Reported-by: Christian Borntraeger 
> Signed-off-by: Wei Wang 
> Reviewed-by: Cornelia Huck 
> Reviewed-by: Halil Pasic 

Together with 
  virtio_pci: use queue idx instead of array idx to set up the vq
  virtio: don't allocate vqs when names[i] = NULL

Tested-by: Christian Borntraeger 

as already said, would be good to add cc stable (and a Fixes: tag)


> ---
>  drivers/virtio/virtio_balloon.c | 98 
> +++--
>  1 file changed, 65 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 728ecd1..fb12fe2 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -61,6 +61,10 @@ enum virtio_balloon_vq {
>   VIRTIO_BALLOON_VQ_MAX
>  };
>  
> +enum virtio_balloon_config_read {
> + VIRTIO_BALLOON_CONFIG_READ_CMD_ID = 0,
> +};
> +
>  struct virtio_balloon {
>   struct virtio_device *vdev;
>   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> @@ -77,14 +81,20 @@ struct virtio_balloon {
>   /* Prevent updating balloon when it is being canceled. */
>   spinlock_t stop_update_lock;
>   bool stop_update;
> + /* Bitmap to indicate if reading the related config fields are needed */
> + unsigned long config_read_bitmap;
>  
>   /* The list of allocated free pages, waiting to be given back to mm */
>   struct list_head free_page_list;
>   spinlock_t free_page_list_lock;
>   /* The number of free page blocks on the above list */
>   unsigned long num_free_page_blocks;
> - /* The cmd id received from host */
> - u32 cmd_id_received;
> + /*
> +  * The cmd id received from host.
> +  * Read it via virtio_balloon_cmd_id_received to get the latest value
> +  * sent from host.
> +  */
> + u32 cmd_id_received_cache;
>   /* The cmd id that is actively in use */
>   __virtio32 cmd_id_active;
>   /* Buffer to store the stop sign */
> @@ -390,37 +400,31 @@ static unsigned long return_free_pages_to_mm(struct 
> virtio_balloon *vb,
>   return num_returned;
>  }
>  
> +static void virtio_balloon_queue_free_page_work(struct virtio_balloon *vb)
> +{
> + if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> + return;
> +
> + /* No need to queue the work if the bit was already set. */
> + if (test_and_set_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
> +  >config_read_bitmap))
> + return;
> +
> + queue_work(vb->balloon_wq, >report_free_page_work);
> +}
> +
>  static void virtballoon_changed(struct virtio_device *vdev)
>  {
>   struct virtio_balloon *vb = vdev->priv;
>   unsigned long flags;
> - s64 diff = towards_target(vb);
> -
> - if (diff) {
> - spin_lock_irqsave(>stop_update_lock, flags);
> - if (!vb->stop_update)
> - queue_work(system_freezable_wq,
> ->update_balloon_size_work);
> - spin_unlock_irqrestore(>stop_update_lock, flags);
> - }
>  
> - if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> - virtio_cread(vdev, struct virtio_balloon_config,
> -  free_page_report_cmd_id, >cmd_id_received);
> - if (vb->cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) {
> - /* Pass ULONG_MAX to give back all the free pages */
> - return_free_pages_to_mm(vb, ULONG_MAX);
> - } else if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
> -vb->cmd_id_received !=
> -virtio32_to_cpu(vdev, vb->cmd_id_active)) {
> - spin_lock_irqsave(>stop_update_lock, flags);
> - if (!vb->stop_update) {
> - queue_work(vb->balloon_wq,
> ->report_free_page_work);
> - }
> - spin_unlock_irqrestore(>stop_update_lock, flags);
> - }
> + spin_lock_irqsave(>stop_update_lock, flags);
> + if (!vb->stop_update) {
> + queue_work(system_freezable_wq,
> +>update_balloon_size_work);
> + virtio_balloon_queue_free_page_work(vb);
>   }
> + spin_unlock_irqrestore(>stop_update_lock, flags);
>  }
>  
>  static void update_balloon_size(struct 

Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2019 at 03:01:04PM +0800, Wei Wang wrote:
> virtio-ccw has deadlock issues with reading the config space inside the
> interrupt context, so we tweak the virtballoon_changed implementation
> by moving the config read operations into the related workqueue contexts.
> The config_read_bitmap is used as a flag to the workqueue callbacks
> about the related config fields that need to be read.
> 
> The cmd_id_received is also renamed to cmd_id_received_cache, and
> the value should be obtained via virtio_balloon_cmd_id_received.
> 
> Reported-by: Christian Borntraeger 
> Signed-off-by: Wei Wang 
> Reviewed-by: Cornelia Huck 
> Reviewed-by: Halil Pasic 

Please include a Fixes tag to list the problematic commit.
See Documentation/process/submitting-patches.rst for the appropriate format.

> ---
>  drivers/virtio/virtio_balloon.c | 98 
> +++--
>  1 file changed, 65 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 728ecd1..fb12fe2 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -61,6 +61,10 @@ enum virtio_balloon_vq {
>   VIRTIO_BALLOON_VQ_MAX
>  };
>  
> +enum virtio_balloon_config_read {
> + VIRTIO_BALLOON_CONFIG_READ_CMD_ID = 0,
> +};
> +
>  struct virtio_balloon {
>   struct virtio_device *vdev;
>   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> @@ -77,14 +81,20 @@ struct virtio_balloon {
>   /* Prevent updating balloon when it is being canceled. */
>   spinlock_t stop_update_lock;
>   bool stop_update;
> + /* Bitmap to indicate if reading the related config fields are needed */
> + unsigned long config_read_bitmap;
>  
>   /* The list of allocated free pages, waiting to be given back to mm */
>   struct list_head free_page_list;
>   spinlock_t free_page_list_lock;
>   /* The number of free page blocks on the above list */
>   unsigned long num_free_page_blocks;
> - /* The cmd id received from host */
> - u32 cmd_id_received;
> + /*
> +  * The cmd id received from host.
> +  * Read it via virtio_balloon_cmd_id_received to get the latest value
> +  * sent from host.
> +  */
> + u32 cmd_id_received_cache;
>   /* The cmd id that is actively in use */
>   __virtio32 cmd_id_active;
>   /* Buffer to store the stop sign */
> @@ -390,37 +400,31 @@ static unsigned long return_free_pages_to_mm(struct 
> virtio_balloon *vb,
>   return num_returned;
>  }
>  
> +static void virtio_balloon_queue_free_page_work(struct virtio_balloon *vb)
> +{
> + if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> + return;
> +
> + /* No need to queue the work if the bit was already set. */
> + if (test_and_set_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
> +  >config_read_bitmap))
> + return;
> +
> + queue_work(vb->balloon_wq, >report_free_page_work);
> +}
> +
>  static void virtballoon_changed(struct virtio_device *vdev)
>  {
>   struct virtio_balloon *vb = vdev->priv;
>   unsigned long flags;
> - s64 diff = towards_target(vb);
> -
> - if (diff) {
> - spin_lock_irqsave(>stop_update_lock, flags);
> - if (!vb->stop_update)
> - queue_work(system_freezable_wq,
> ->update_balloon_size_work);
> - spin_unlock_irqrestore(>stop_update_lock, flags);
> - }
>  
> - if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> - virtio_cread(vdev, struct virtio_balloon_config,
> -  free_page_report_cmd_id, >cmd_id_received);
> - if (vb->cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) {
> - /* Pass ULONG_MAX to give back all the free pages */
> - return_free_pages_to_mm(vb, ULONG_MAX);
> - } else if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
> -vb->cmd_id_received !=
> -virtio32_to_cpu(vdev, vb->cmd_id_active)) {
> - spin_lock_irqsave(>stop_update_lock, flags);
> - if (!vb->stop_update) {
> - queue_work(vb->balloon_wq,
> ->report_free_page_work);
> - }
> - spin_unlock_irqrestore(>stop_update_lock, flags);
> - }
> + spin_lock_irqsave(>stop_update_lock, flags);
> + if (!vb->stop_update) {
> + queue_work(system_freezable_wq,
> +>update_balloon_size_work);
> + virtio_balloon_queue_free_page_work(vb);
>   }
> + spin_unlock_irqrestore(>stop_update_lock, flags);
>  }
>  
>  static void update_balloon_size(struct virtio_balloon *vb)
> @@ -527,6 +531,17 @@ static int init_vqs(struct virtio_balloon *vb)
>

Re: [PATCH 7/7] drm: Split out drm_probe_helper.h

2019-01-07 Thread Daniel Vetter
On Mon, Jan 07, 2019 at 10:08:41AM +, Liviu Dudau wrote:
> On Mon, Jan 07, 2019 at 10:45:23AM +0100, Daniel Vetter wrote:
> > On Sat, Dec 29, 2018 at 10:56:39PM +, Liviu Dudau wrote:
> > > On Mon, Dec 10, 2018 at 11:11:33AM +0100, Daniel Vetter wrote:
> > > > Having the probe helper stuff (which pretty much everyone needs) in
> > > > the drm_crtc_helper.h file (which atomic drivers should never need) is
> > > > confusing. Split them out.
> > > > 
> > > > To make sure I actually achieved the goal here I went through all
> > > > drivers. And indeed, all atomic drivers are now free of
> > > > drm_crtc_helper.h includes.
> > > > 
> > > > Signed-off-by: Daniel Vetter 
> > > > Cc: linux-arm-ker...@lists.infradead.org
> > > > Cc: virtualization@lists.linux-foundation.org
> > > > Cc: etna...@lists.freedesktop.org
> > > > Cc: linux-samsung-...@vger.kernel.org
> > > > Cc: intel-...@lists.freedesktop.org
> > > > Cc: linux-media...@lists.infradead.org
> > > > Cc: linux-amlo...@lists.infradead.org
> > > > Cc: linux-arm-...@vger.kernel.org
> > > > Cc: freedr...@lists.freedesktop.org
> > > > Cc: nouv...@lists.freedesktop.org
> > > > Cc: spice-de...@lists.freedesktop.org
> > > > Cc: amd-...@lists.freedesktop.org
> > > > Cc: linux-renesas-...@vger.kernel.org
> > > > Cc: linux-rockc...@lists.infradead.org
> > > > Cc: linux-st...@st-md-mailman.stormreply.com
> > > > Cc: linux-te...@vger.kernel.org
> > > > Cc: xen-de...@lists.xen.org
> > > 
> > > Daniel, please fix whatever script you're using to generate the list
> > > of people being Cc-ed. ./scripts/get_maintainer.pl generates my work
> > > email address for HDLCD and the Mali DP maintainers for malidp changes,
> > > but we were not Cc-ed and I've only found this patch in the linux-rockchip
> > > ML because there was not enough traffic there to be hidden under other 
> > > patches.
> > 
> > The number of Cc recipients this will generate is too much to be
> > acceptable for smtp servers. My scripts do generate the full lists, but
> > for patches like this here I need to delete a lot of them. So what I ended
> > up doing is deleting all the people and leaving the mailing lists behind.
> 
> OK, but Mali DP maintainers *is* a mailing list, exactly to cut off the number
> of people you need to Cc in order to reach someone that takes care of Mali
> Display drivers.

Hm right, that went wrong.

> > Plan B would be to split this up into a massive per-driver patch series,
> > which I found overkill in this case. But for anything with functional
> > changes that's what I usually end up doing.
> > 
> > Hope that explains what happened.
> > 
> > btw the tool I'm using is dim add-missing-cc from the maintainer-tools
> > repos.
> 
> I'll have a look to see what it does and how I can add Mali DP mailing list
> to that as a minimum :)

You need to list it as L:, not M:, then it'll be listed at the bottom of
the Cc: pile in the mailing list sections. I won't be able to find mailing
lists in the middle of 50+ maintainers :-)
-Daniel

> 
> Best regards,
> Liviu
> 
> > 
> > Cheers, Daniel
> > 
> > > 
> > > Best regards,
> > > Liviu
> > > 
> > > > ---
> > > >  .../gpu/drm/amd/amdgpu/amdgpu_connectors.c|  2 +-
> > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  2 +-
> > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  2 +-
> > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h  |  1 +
> > > >  .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  2 +-
> > > >  .../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c  |  2 +-
> > > >  .../display/amdgpu_dm/amdgpu_dm_services.c|  2 +-
> > > >  drivers/gpu/drm/arc/arcpgu_crtc.c |  2 +-
> > > >  drivers/gpu/drm/arc/arcpgu_drv.c  |  2 +-
> > > >  drivers/gpu/drm/arc/arcpgu_sim.c  |  2 +-
> > > >  drivers/gpu/drm/arm/hdlcd_crtc.c  |  2 +-
> > > >  drivers/gpu/drm/arm/hdlcd_drv.c   |  2 +-
> > > >  drivers/gpu/drm/arm/malidp_crtc.c |  2 +-
> > > >  drivers/gpu/drm/arm/malidp_drv.c  |  2 +-
> > > >  drivers/gpu/drm/arm/malidp_mw.c   |  2 +-
> > > >  drivers/gpu/drm/armada/armada_510.c   |  2 +-
> > > >  drivers/gpu/drm/armada/armada_crtc.c  |  2 +-
> > > >  drivers/gpu/drm/armada/armada_drv.c   |  2 +-
> > > >  drivers/gpu/drm/armada/armada_fb.c|  2 +-
> > > >  drivers/gpu/drm/ast/ast_drv.c |  1 +
> > > >  drivers/gpu/drm/ast/ast_mode.c|  1 +
> > > >  .../gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c|  2 +-
> > > >  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h  |  2 +-
> > > >  drivers/gpu/drm/bochs/bochs_drv.c |  1 +
> > > >  drivers/gpu/drm/bochs/bochs_kms.c |  1 +
> > > >  drivers/gpu/drm/bridge/adv7511/adv7511.h  |  2 +-
> > > >  drivers/gpu/drm/bridge/analogix-anx78xx.c |  3 +-
> > > >  .../drm/bridge/analogix/analogix_dp_core.c|  2 +-
> > > >  drivers/gpu/drm/bridge/cdns-dsi.c |  2 +-
> > > >  
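The fix Daniel describes — listing the team address as L: rather than M: so tools like get_maintainer.pl and dim add-missing-cc group it with the mailing lists — looks like this in MAINTAINERS (entry shape per the file's own field conventions; the names and addresses below are illustrative, not the real entry):

```text
ARM MALI-DP DRM DRIVER (illustrative entry)
M:	A Maintainer <maintainer@example.com>
L:	Mali DP Maintainers <malidp@example.com>
S:	Supported
F:	drivers/gpu/drm/arm/
```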

Re: [PATCH RFC 3/4] barriers: convert a control to a data dependency

2019-01-07 Thread Peter Zijlstra
On Sun, Jan 06, 2019 at 11:23:07PM -0500, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 11:58:23AM +0800, Jason Wang wrote:
> > On 2019/1/3 4:57 AM, Michael S. Tsirkin wrote:

> > > +#if defined(COMPILER_HAS_OPTIMIZER_HIDE_VAR) && \
> > > + !defined(ARCH_NEEDS_READ_BARRIER_DEPENDS)
> > > +
> > > +#define dependent_ptr_mb(ptr, val) ({
> > > \
> > > + long dependent_ptr_mb_val = (long)(val);\
> > > + long dependent_ptr_mb_ptr = (long)(ptr) - dependent_ptr_mb_val; \
> > > + \
> > > + BUILD_BUG_ON(sizeof(val) > sizeof(long));   \
> > > + OPTIMIZER_HIDE_VAR(dependent_ptr_mb_val);   \
> > > + (typeof(ptr))(dependent_ptr_mb_ptr + dependent_ptr_mb_val); \
> > > +})
> > > +
> > > +#else
> > > +
> > > +#define dependent_ptr_mb(ptr, val) ({ mb(); (ptr); })
> > 
> > 
> > So for the example of patch 4, we'd better fall back to rmb() or need a
> > dependent_ptr_rmb()?
> > 
> > Thanks
> 
> You mean for strongly ordered architectures like Intel?
> Yes, maybe it makes sense to have dependent_ptr_smp_rmb,
> dependent_ptr_dma_rmb and dependent_ptr_virt_rmb.
> 
> mb variant is unused right now so I'll remove it.

How about naming the thing: dependent_ptr() ? That is without any (r)mb
implications at all. The address dependency is strictly weaker than an
rmb in that it will only order the two loads in question and not, like
rmb, any prior to any later load.

Re: [RFC PATCH V3 0/5] Hi:

2019-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2019 at 01:39:15PM -0800, Dan Williams wrote:
> On Mon, Jan 7, 2019 at 6:11 AM Michael S. Tsirkin  wrote:
> >
> > On Sun, Jan 06, 2019 at 11:15:20PM -0800, Dan Williams wrote:
> > > On Sun, Jan 6, 2019 at 8:17 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote:
> > > > >
> > > > > On 2019/1/7 11:28 AM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote:
> > > > > > > On 2019/1/3 4:47 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
> > > > > > > > > This series tries to access virtqueue metadata through kernel 
> > > > > > > > > virtual
> > > > > > > > > address instead of copy_user() friends since they had too much
> > > > > > > > > overheads like checks, spec barriers or even hardware feature
> > > > > > > > > toggling.
> > > > > > > > Will review, thanks!
> > > > > > > > One questions that comes to mind is whether it's all about 
> > > > > > > > bypassing
> > > > > > > > stac/clac.  Could you please include a performance comparison 
> > > > > > > > with
> > > > > > > > nosmap?
> > > > > > > >
> > > > > > > On machine without SMAP (Sandy Bridge):
> > > > > > >
> > > > > > > Before: 4.8Mpps
> > > > > > >
> > > > > > > After: 5.2Mpps
> > > > > > OK so would you say it's really unsafe versus safe accesses?
> > > > > > Or would you say it's just a better written code?
> > > > >
> > > > >
> > > > > It's the effect of removing speculation barrier.
> > > >
> > > >
> > > > You mean __uaccess_begin_nospec introduced by
> > > > commit 304ec1b050310548db33063e567123fae8fd0301
> > > > ?
> > > >
> > > > So fundamentally we do access_ok checks when supplying
> > > > the memory table to the kernel thread, and we should
> > > > do the spec barrier there.
> > > >
> > > > Then we can just create and use a variant of uaccess macros that does
> > > > not include the barrier?
> > > >
> > > > Or, how about moving the barrier into access_ok?
> > > > This way repeated accesses with a single access_ok get a bit faster.
> > > > CC Dan Williams on this idea.
> > >
> > > It would be interesting to see how expensive re-doing the address
> > > limit check is compared to the speculation barrier. I.e. just switch
> > > vhost_get_user() to use get_user() rather than __get_user(). That will
> > > sanitize the pointer in the speculative path without a barrier.
> >
> > Hmm it's way cheaper even though IIRC it's measurable.
> > Jason, would you like to try?
> > Although frankly __get_user being slower than get_user feels very wrong.
> > Not yet sure what to do exactly but would you agree?
> 
> Agree. __get_user() being faster than get_user() defeats the whole
> point of converting code paths to the access_ok() + __get_user()
> pattern.

Did you mean the reverse?

> >
> >
> > > I recall we had a convert access_ok() discussion with this result here:
> > >
> > > https://lkml.org/lkml/2018/1/17/929
> >
> > Sorry let me try to clarify. IIUC speculating access_ok once
> > is harmless. As Linus said the problem is with "_subsequent_
> > accesses that can then be used to perturb the cache".
> >
> > Thus:
> >
> > 1. if (!access_ok)
> > 2.  return
> > 3.  get_user
> > 4. if (!access_ok)
> > 5.  return
> > 6.  get_user
> >
> > Your proposal that Linus nacked was to effectively add a barrier after
> > lines 2 and 5 (also using the array_index_nospec trick for speed),
> > right? Unfortunately that needs a big API change.
> >
> > I am asking about adding barrier_nospec within access_ok.
> > Thus effectively before lines 1 and 4.
> > access_ok will be slower but after all the point of access_ok is
> > to then access the same memory multiple times.
> 
> If the barrier is before lines 1 and 4 then it offers no real
> protection as far I can see. It will start speculating again on the
> user controlled pointer value after the barrier resolves.
> 
> > So we should be making __get_user faster and access_ok slower ...
> 
> I agree, but then the barrier always needs to be after the access_ok()
> check unconditionally called in the return path from access_ok(). At
> that point it's back to the implementation that Linus nak'd, or I'm
> missing some other detail.
> 
> ...but maybe if it is limited to a new access_ok_nospec() then the
> concern is addressed? Then rename current __get_user() to
> __get_user_nospec() and make a new __get_user() that is back to being
> optimal.

Re: [RFC PATCH V3 0/5] Hi:

2019-01-07 Thread Dan Williams
On Mon, Jan 7, 2019 at 6:11 AM Michael S. Tsirkin  wrote:
>
> On Sun, Jan 06, 2019 at 11:15:20PM -0800, Dan Williams wrote:
> > On Sun, Jan 6, 2019 at 8:17 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote:
> > > >
> > > > On 2019/1/7 11:28 AM, Michael S. Tsirkin wrote:
> > > > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote:
> > > > > > On 2019/1/3 4:47 AM, Michael S. Tsirkin wrote:
> > > > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
> > > > > > > > This series tries to access virtqueue metadata through kernel 
> > > > > > > > virtual
> > > > > > > > address instead of copy_user() friends since they had too much
> > > > > > > > overheads like checks, spec barriers or even hardware feature
> > > > > > > > toggling.
> > > > > > > Will review, thanks!
> > > > > > > One questions that comes to mind is whether it's all about 
> > > > > > > bypassing
> > > > > > > stac/clac.  Could you please include a performance comparison with
> > > > > > > nosmap?
> > > > > > >
> > > > > > On machine without SMAP (Sandy Bridge):
> > > > > >
> > > > > > Before: 4.8Mpps
> > > > > >
> > > > > > After: 5.2Mpps
> > > > > OK so would you say it's really unsafe versus safe accesses?
> > > > > Or would you say it's just a better written code?
> > > >
> > > >
> > > > It's the effect of removing speculation barrier.
> > >
> > >
> > > You mean __uaccess_begin_nospec introduced by
> > > commit 304ec1b050310548db33063e567123fae8fd0301
> > > ?
> > >
> > > So fundamentally we do access_ok checks when supplying
> > > the memory table to the kernel thread, and we should
> > > do the spec barrier there.
> > >
> > > Then we can just create and use a variant of uaccess macros that does
> > > not include the barrier?
> > >
> > > Or, how about moving the barrier into access_ok?
> > > This way repeated accesses with a single access_ok get a bit faster.
> > > CC Dan Williams on this idea.
> >
> > It would be interesting to see how expensive re-doing the address
> > limit check is compared to the speculation barrier. I.e. just switch
> > vhost_get_user() to use get_user() rather than __get_user(). That will
> > sanitize the pointer in the speculative path without a barrier.
>
> Hmm it's way cheaper even though IIRC it's measurable.
> Jason, would you like to try?
> Although frankly __get_user being slower than get_user feels very wrong.
> Not yet sure what to do exactly but would you agree?

Agree. __get_user() being slower than get_user() defeats the whole
point of converting code paths to the access_ok() + __get_user()
pattern.

>
>
> > I recall we had a convert access_ok() discussion with this result here:
> >
> > https://lkml.org/lkml/2018/1/17/929
>
> Sorry let me try to clarify. IIUC speculating access_ok once
> is harmless. As Linus said the problem is with "_subsequent_
> accesses that can then be used to perturb the cache".
>
> Thus:
>
> 1. if (!access_ok)
> 2.  return
> 3.  get_user
> 4. if (!access_ok)
> 5.  return
> 6.  get_user
>
> Your proposal that Linus nacked was to effectively add a barrier after
> lines 2 and 5 (also using the array_index_nospec trick for speed),
> right? Unfortunately that needs a big API change.
>
> I am asking about adding barrier_nospec within access_ok.
> Thus effectively before lines 1 and 4.
> access_ok will be slower but after all the point of access_ok is
> to then access the same memory multiple times.

If the barrier is before lines 1 and 4 then it offers no real
protection as far I can see. It will start speculating again on the
user controlled pointer value after the barrier resolves.

> So we should be making __get_user faster and access_ok slower ...

I agree, but then the barrier always needs to be after the access_ok()
check unconditionally called in the return path from access_ok(). At
that point it's back to the implementation that Linus nak'd, or I'm
missing some other detail.

...but maybe if it is limited to a new access_ok_nospec() then the
concern is addressed? Then rename current __get_user() to
__get_user_nospec() and make a new __get_user() that is back to being
optimal.

Re: [PATCH RFC 3/4] barriers: convert a control to a data dependency

2019-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2019 at 11:02:36AM -0800, Paul E. McKenney wrote:
> On Mon, Jan 07, 2019 at 08:36:36AM -0500, Michael S. Tsirkin wrote:
> > On Mon, Jan 07, 2019 at 10:46:10AM +0100, Peter Zijlstra wrote:
> > > On Sun, Jan 06, 2019 at 11:23:07PM -0500, Michael S. Tsirkin wrote:
> > > > On Mon, Jan 07, 2019 at 11:58:23AM +0800, Jason Wang wrote:
> > > > > On 2019/1/3 4:57 AM, Michael S. Tsirkin wrote:
> > > 
> > > > > > +#if defined(COMPILER_HAS_OPTIMIZER_HIDE_VAR) && \
> > > > > > +   !defined(ARCH_NEEDS_READ_BARRIER_DEPENDS)
> > > > > > +
> > > > > > +#define dependent_ptr_mb(ptr, val) ({ \
> > > > > > +   long dependent_ptr_mb_val = (long)(val); \
> > > > > > +   long dependent_ptr_mb_ptr = (long)(ptr) - dependent_ptr_mb_val; \
> > > > > > + \
> > > > > > +   BUILD_BUG_ON(sizeof(val) > sizeof(long)); \
> > > > > > +   OPTIMIZER_HIDE_VAR(dependent_ptr_mb_val); \
> > > > > > +   (typeof(ptr))(dependent_ptr_mb_ptr + dependent_ptr_mb_val); \
> > > > > > +})
> > > > > > +
> > > > > > +#else
> > > > > > +
> > > > > > +#define dependent_ptr_mb(ptr, val) ({ mb(); (ptr); })
> > > > > 
> > > > > 
> > > > > So for the example of patch 4, we'd better fall back to rmb() or need 
> > > > > a
> > > > > dependent_ptr_rmb()?
> > > > > 
> > > > > Thanks
> > > > 
> > > > You mean for strongly ordered architectures like Intel?
> > > > Yes, maybe it makes sense to have dependent_ptr_smp_rmb,
> > > > dependent_ptr_dma_rmb and dependent_ptr_virt_rmb.
> > > > 
> > > > mb variant is unused right now so I'll remove it.
> > > 
> > > How about naming the thing: dependent_ptr() ? That is without any (r)mb
> > > implications at all. The address dependency is strictly weaker than an
> > > rmb in that it will only order the two loads in question and not, like
> > > rmb, any prior to any later load.
> > 
> > So I'm fine with this as it's enough for virtio, but I would like to point 
> > out two things:
> > 
> > 1. E.g. on x86 both SMP and DMA variants can be NOPs but
> > the mandatory one can't, so assuming we do not want
> > it to be stronger than rmb then either we want
> > smp_dependent_ptr(), dma_dependent_ptr(), dependent_ptr()
> > or we just will specify that dependent_ptr() works for
> > both DMA and SMP.
> > 
> > 2. Down the road, someone might want to order a store after a load.
> > Address dependency does that for us too. Assuming we make
> > dependent_ptr a NOP on x86, we will want an mb variant
> > which isn't a NOP on x86. Will we want to rename
> > dependent_ptr to dependent_ptr_rmb at that point?
> 
> But x86 preserves store-after-load orderings anyway, and even Alpha
> respects ordering from loads to dependent stores.  So what am I missing
> here?
> 
>   Thanx, Paul

Oh you are right. Stores are not reordered with older loads on x86.

So point 2 is moot. Sorry about the noise.

I guess at this point the only sticking point is the ECC compiler.
I'm inclined to stick an mb() there, seeing as it doesn't even
have spectre protection enabled. Slow but safe.

-- 
MST

[PATCH v3 3/3] virtio_balloon: remove the unnecessary 0-initialization

2019-01-07 Thread Wei Wang
We've changed to kzalloc the vb struct, so no need to 0-initialize
this field one more time.

Signed-off-by: Wei Wang 
---
 drivers/virtio/virtio_balloon.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index e33dc8e..f19061b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -925,7 +925,6 @@ static int virtballoon_probe(struct virtio_device *vdev)
  VIRTIO_BALLOON_CMD_ID_STOP);
vb->cmd_id_stop = cpu_to_virtio32(vb->vdev,
  VIRTIO_BALLOON_CMD_ID_STOP);
-   vb->num_free_page_blocks = 0;
	spin_lock_init(&vb->free_page_list_lock);
	INIT_LIST_HEAD(&vb->free_page_list);
if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
-- 
2.7.4



[PATCH v3 2/3] virtio-balloon: improve update_balloon_size_func

2019-01-07 Thread Wei Wang
There is no need to update the balloon actual register when there is no
ballooning request. This patch avoids update_balloon_size when diff is 0.

Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
Reviewed-by: Halil Pasic 
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index fb12fe2..e33dc8e 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -457,9 +457,12 @@ static void update_balloon_size_func(struct work_struct *work)
  update_balloon_size_work);
diff = towards_target(vb);
 
+   if (!diff)
+   return;
+
if (diff > 0)
diff -= fill_balloon(vb, diff);
-   else if (diff < 0)
+   else
diff += leak_balloon(vb, -diff);
update_balloon_size(vb);
 
-- 
2.7.4



Re: [PATCH RFC 2/4] include/linux/compiler.h: allow memory operands

2019-01-07 Thread Will Deacon
On Wed, Jan 02, 2019 at 03:57:54PM -0500, Michael S. Tsirkin wrote:
> We don't really care whether the variable is in-register
> or in-memory. Relax the constraint accordingly.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  include/linux/compiler.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 1ad367b4cd8d..6601d39e8c48 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -154,7 +154,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #ifndef OPTIMIZER_HIDE_VAR
>  /* Make the optimizer believe the variable can be manipulated arbitrarily. */
>  #define OPTIMIZER_HIDE_VAR(var) \
> - __asm__ ("" : "=r" (var) : "0" (var))
> + __asm__ ("" : "=rm" (var) : "0" (var))
>  #endif

I think this can break for architectures with write-back addressing modes
such as arm, where the "m" constraint is assumed to be evaluated precisely
once in the asm block.

Will


[PATCH v3 0/3] virtio-balloon: tweak config_changed

2019-01-07 Thread Wei Wang
Since virtio-ccw doesn't work with accessing the config space
inside an interrupt context, this patch series avoids that issue by
moving the config register accesses to the related workqueue contexts.

v2->v3 ChangeLog:
- rename cmd_id_received to cmd_id_received_cache, and have call sites
  read the latest value via virtio_balloon_cmd_id_received. (Still
  kept Cornelia and Halil's reviewed-by as it's a minor change)
- remove zeroing vb->num_free_page_blocks in probe since vb is
  allocated via kzalloc.
v1->v2 ChangeLog:
- add config_read_bitmap to indicate to the workqueue callbacks about
  the necessity of reading the related config fields.

Wei Wang (3):
  virtio-balloon: tweak config_changed implementation
  virtio-balloon: improve update_balloon_size_func
  virtio_balloon: remove the unnecessary 0-initialization

 drivers/virtio/virtio_balloon.c | 104 ++--
 1 file changed, 69 insertions(+), 35 deletions(-)

-- 
2.7.4



Re: [RFC PATCH V3 1/5] vhost: generalize adding used elem

2019-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2019 at 03:00:17PM +0800, Jason Wang wrote:
> 
> On 2019/1/5 8:33 AM, Sean Christopherson wrote:
> > On Fri, Jan 04, 2019 at 04:29:34PM -0500, Michael S. Tsirkin wrote:
> > > On Sat, Dec 29, 2018 at 08:46:52PM +0800, Jason Wang wrote:
> > > > Use one generic vhost_copy_to_user() instead of two dedicated
> > > > accessors. This will simplify the conversion to fine grain
> > > > accessors. About 2% improvement of PPS was seen during virtio-user
> > > > txonly test.
> > > > 
> > > > Signed-off-by: Jason Wang 
> > > I don't hve a problem with this patch but do you have
> > > any idea how come removing what's supposed to be
> > > an optimization speeds things up?
> > With SMAP, the 2x vhost_put_user() will also mean an extra STAC/CLAC pair,
> > which is probably slower than the overhead of CALL+RET to whatever flavor
> > of copy_user_generic() gets used.  CALL+RET is really the only overhead
> > since all variants of copy_user_generic() unroll accesses smaller than
> > 64 bytes, e.g. on a 64-bit system, __copy_to_user() will write all 8
> > bytes in a single MOV.
> > 
> > Removing the special casing also eliminates a few hundred bytes of code
> > as well as the need for hardware to predict count==1 vs. count>1.
> > 
> 
> Yes, I didn't measure, but STAC/CLAC is pretty expensive when we do very
> small copies based on the result of nosmap PPS.
> 
> Thanks

Yes all this really looks like a poster child for uaccess_begin/end
plus unsafe accesses. And if these APIs don't do the job for us
then maybe better ones are needed ...


-- 
MST

Re: [PATCH v3 0/3] virtio-balloon: tweak config_changed

2019-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2019 at 03:01:03PM +0800, Wei Wang wrote:
> Since virtio-ccw doesn't work with accessing the config space
> inside an interrupt context, this patch series avoids that issue by
> moving the config register accesses to the related workqueue contexts.

So is this enough to get ccw going again or do we also need
the patches that deal with NULL VQ names?

> v2->v3 ChangeLog:
> - rename cmd_id_received to cmd_id_received_cache, and have call sites
>   read the latest value via virtio_balloon_cmd_id_received. (Still
>   kept Cornelia and Halil's reviewed-by as it's a minor change)
> - remove zeroing vb->num_free_page_blocks in probe since vb is
>   allocated via kzalloc.
> v1->v2 ChangeLog:
> - add config_read_bitmap to indicate to the workqueue callbacks about
>   the necessity of reading the related config fields.
> 
> Wei Wang (3):
>   virtio-balloon: tweak config_changed implementation
>   virtio-balloon: improve update_balloon_size_func
>   virtio_balloon: remove the unnecessary 0-initialization
> 
>  drivers/virtio/virtio_balloon.c | 104 ++--
>  1 file changed, 69 insertions(+), 35 deletions(-)
> 
> -- 
> 2.7.4


Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2019 at 03:01:04PM +0800, Wei Wang wrote:
> virtio-ccw has deadlock issues with reading the config space inside the
> interrupt context, so we tweak the virtballoon_changed implementation
> by moving the config read operations into the related workqueue contexts.
> The config_read_bitmap is used as a flag to the workqueue callbacks
> about the related config fields that need to be read.
> 
> The cmd_id_received is also renamed to cmd_id_received_cache, and
> the value should be obtained via virtio_balloon_cmd_id_received.
> 
> Reported-by: Christian Borntraeger 
> Signed-off-by: Wei Wang 
> Reviewed-by: Cornelia Huck 
> Reviewed-by: Halil Pasic 
> ---
>  drivers/virtio/virtio_balloon.c | 98 +++--
>  1 file changed, 65 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 728ecd1..fb12fe2 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -61,6 +61,10 @@ enum virtio_balloon_vq {
>   VIRTIO_BALLOON_VQ_MAX
>  };
>  
> +enum virtio_balloon_config_read {
> + VIRTIO_BALLOON_CONFIG_READ_CMD_ID = 0,
> +};
> +
>  struct virtio_balloon {
>   struct virtio_device *vdev;
>   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> @@ -77,14 +81,20 @@ struct virtio_balloon {
>   /* Prevent updating balloon when it is being canceled. */
>   spinlock_t stop_update_lock;
>   bool stop_update;
> + /* Bitmap to indicate if reading the related config fields are needed */
> + unsigned long config_read_bitmap;
>  
>   /* The list of allocated free pages, waiting to be given back to mm */
>   struct list_head free_page_list;
>   spinlock_t free_page_list_lock;
>   /* The number of free page blocks on the above list */
>   unsigned long num_free_page_blocks;
> - /* The cmd id received from host */
> - u32 cmd_id_received;
> + /*
> +  * The cmd id received from host.
> +  * Read it via virtio_balloon_cmd_id_received to get the latest value
> +  * sent from host.
> +  */
> + u32 cmd_id_received_cache;
>   /* The cmd id that is actively in use */
>   __virtio32 cmd_id_active;
>   /* Buffer to store the stop sign */
> @@ -390,37 +400,31 @@ static unsigned long return_free_pages_to_mm(struct virtio_balloon *vb,
>   return num_returned;
>  }
>  
> +static void virtio_balloon_queue_free_page_work(struct virtio_balloon *vb)
> +{
> + if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> + return;
> +
> + /* No need to queue the work if the bit was already set. */
> + if (test_and_set_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
> +  &vb->config_read_bitmap))
> + return;
> +
> + queue_work(vb->balloon_wq, &vb->report_free_page_work);
> +}
> +
>  static void virtballoon_changed(struct virtio_device *vdev)
>  {
>   struct virtio_balloon *vb = vdev->priv;
>   unsigned long flags;
> - s64 diff = towards_target(vb);
> -
> - if (diff) {
> - spin_lock_irqsave(&vb->stop_update_lock, flags);
> - if (!vb->stop_update)
> - queue_work(system_freezable_wq,
> -&vb->update_balloon_size_work);
> - spin_unlock_irqrestore(&vb->stop_update_lock, flags);
> - }
>  
> - if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> - virtio_cread(vdev, struct virtio_balloon_config,
> -  free_page_report_cmd_id, &vb->cmd_id_received);
> - if (vb->cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) {
> - /* Pass ULONG_MAX to give back all the free pages */
> - return_free_pages_to_mm(vb, ULONG_MAX);
> - } else if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
> -vb->cmd_id_received !=
> -virtio32_to_cpu(vdev, vb->cmd_id_active)) {
> - spin_lock_irqsave(&vb->stop_update_lock, flags);
> - if (!vb->stop_update) {
> - queue_work(vb->balloon_wq,
> -&vb->report_free_page_work);
> - }
> - spin_unlock_irqrestore(&vb->stop_update_lock, flags);
> - }
> + spin_lock_irqsave(&vb->stop_update_lock, flags);
> + if (!vb->stop_update) {
> + queue_work(system_freezable_wq,
> +&vb->update_balloon_size_work);
> + virtio_balloon_queue_free_page_work(vb);
>   }
> + spin_unlock_irqrestore(&vb->stop_update_lock, flags);
>  }
>  
>  static void update_balloon_size(struct virtio_balloon *vb)
> @@ -527,6 +531,17 @@ static int init_vqs(struct virtio_balloon *vb)
>   return 0;
>  }
>  
> +static u32 virtio_balloon_cmd_id_received(struct virtio_balloon *vb)
> +{
> + if 

Re: [PATCH v3 0/3] virtio-balloon: tweak config_changed

2019-01-07 Thread Christian Borntraeger
Can you please cc stable? Right now 4.20 does not work under KVM.



On 07.01.2019 08:01, Wei Wang wrote:
> Since virtio-ccw doesn't work with accessing the config space
> inside an interrupt context, this patch series avoids that issue by
> moving the config register accesses to the related workqueue contexts.
> 
> v2->v3 ChangeLog:
> - rename cmd_id_received to cmd_id_received_cache, and have call sites
>   read the latest value via virtio_balloon_cmd_id_received. (Still
>   kept Cornelia and Halil's reviewed-by as it's a minor change)
> - remove zeroing vb->num_free_page_blocks in probe since vb is
>   allocated via kzalloc.
> v1->v2 ChangeLog:
> - add config_read_bitmap to indicate to the workqueue callbacks about
>   the necessity of reading the related config fields.
> 
> Wei Wang (3):
>   virtio-balloon: tweak config_changed implementation
>   virtio-balloon: improve update_balloon_size_func
>   virtio_balloon: remove the unnecessary 0-initialization
> 
>  drivers/virtio/virtio_balloon.c | 104 ++--
>  1 file changed, 69 insertions(+), 35 deletions(-)
> 



Re: [PATCH RFC 3/4] barriers: convert a control to a data dependency

2019-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2019 at 10:46:10AM +0100, Peter Zijlstra wrote:
> On Sun, Jan 06, 2019 at 11:23:07PM -0500, Michael S. Tsirkin wrote:
> > On Mon, Jan 07, 2019 at 11:58:23AM +0800, Jason Wang wrote:
> > > On 2019/1/3 4:57 AM, Michael S. Tsirkin wrote:
> 
> > > > +#if defined(COMPILER_HAS_OPTIMIZER_HIDE_VAR) && \
> > > > +   !defined(ARCH_NEEDS_READ_BARRIER_DEPENDS)
> > > > +
> > > > +#define dependent_ptr_mb(ptr, val) ({ \
> > > > +   long dependent_ptr_mb_val = (long)(val); \
> > > > +   long dependent_ptr_mb_ptr = (long)(ptr) - dependent_ptr_mb_val; \
> > > > + \
> > > > +   BUILD_BUG_ON(sizeof(val) > sizeof(long)); \
> > > > +   OPTIMIZER_HIDE_VAR(dependent_ptr_mb_val); \
> > > > +   (typeof(ptr))(dependent_ptr_mb_ptr + dependent_ptr_mb_val); \
> > > > +})
> > > > +
> > > > +#else
> > > > +
> > > > +#define dependent_ptr_mb(ptr, val) ({ mb(); (ptr); })
> > > 
> > > 
> > > So for the example of patch 4, we'd better fall back to rmb() or need a
> > > dependent_ptr_rmb()?
> > > 
> > > Thanks
> > 
> > You mean for strongly ordered architectures like Intel?
> > Yes, maybe it makes sense to have dependent_ptr_smp_rmb,
> > dependent_ptr_dma_rmb and dependent_ptr_virt_rmb.
> > 
> > mb variant is unused right now so I'll remove it.
> 
> How about naming the thing: dependent_ptr() ? That is without any (r)mb
> implications at all. The address dependency is strictly weaker than an
> rmb in that it will only order the two loads in question and not, like
> rmb, any prior to any later load.

So I'm fine with this as it's enough for virtio, but I would like to point out 
two things:

1. E.g. on x86 both SMP and DMA variants can be NOPs but
the mandatory one can't, so assuming we do not want
it to be stronger than rmb then either we want
smp_dependent_ptr(), dma_dependent_ptr(), dependent_ptr()
or we just will specify that dependent_ptr() works for
both DMA and SMP.

2. Down the road, someone might want to order a store after a load.
Address dependency does that for us too. Assuming we make
dependent_ptr a NOP on x86, we will want an mb variant
which isn't a NOP on x86. Will we want to rename
dependent_ptr to dependent_ptr_rmb at that point?

Thanks,

-- 
MST

Re: [RFC PATCH V3 5/5] vhost: access vq metadata through kernel virtual address

2019-01-07 Thread Jason Wang


On 2019/1/5 5:34 AM, Michael S. Tsirkin wrote:

On Sat, Dec 29, 2018 at 08:46:56PM +0800, Jason Wang wrote:

It was noticed that the copy_user() friends that are used to access
virtqueue metadata tend to be very expensive for dataplane
implementations like vhost, since they involve lots of software checks,
speculation barriers, and hardware feature toggling (e.g. SMAP). The
extra cost is more obvious when transferring small packets, since the
time spent on metadata accesses becomes significant.

This patch tries to eliminate those overheads by accessing the metadata
through kernel virtual addresses via vmap(). To keep the pages
migratable, instead of pinning them through GUP, we use mmu notifiers
to invalidate vmaps and re-establish them during each round of metadata
prefetching if necessary. For devices that don't use metadata
prefetching, the memory accessors gracefully fall back to the normal
copy_user() implementation. The invalidation is synchronized with the
datapath through the vq mutex, and in order to avoid holding the vq
mutex during range checking, the MMU notifier is torn down when trying
to modify vq metadata.

Note that this was only done when device IOTLB is not enabled. We
could use similar method to optimize it in the future.

Tests shows about ~24% improvement on TX PPS when using virtio-user +
vhost_net + xdp1 on TAP:

Before: ~5.0Mpps
After:  ~6.1Mpps

Signed-off-by: Jason Wang 
---
  drivers/vhost/vhost.c | 263 +-
  drivers/vhost/vhost.h |  13 +++
  2 files changed, 274 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 54b43feef8d9..e1ecb8acf8a3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -440,6 +440,9 @@ void vhost_dev_init(struct vhost_dev *dev,
vq->indirect = NULL;
vq->heads = NULL;
vq->dev = dev;
+   memset(&vq->avail_ring, 0, sizeof(vq->avail_ring));
+   memset(&vq->used_ring, 0, sizeof(vq->used_ring));
+   memset(&vq->desc_ring, 0, sizeof(vq->desc_ring));
	mutex_init(&vq->mutex);
vhost_vq_reset(dev, vq);
if (vq->handle_kick)
@@ -510,6 +513,73 @@ static size_t vhost_get_desc_size(struct vhost_virtqueue *vq, int num)
return sizeof(*vq->desc) * num;
  }
  
+static void vhost_uninit_vmap(struct vhost_vmap *map)
+{
+   if (map->addr)
+   vunmap(map->unmap_addr);
+
+   map->addr = NULL;
+   map->unmap_addr = NULL;
+}
+
+static int vhost_invalidate_vmap(struct vhost_virtqueue *vq,
+struct vhost_vmap *map,
+unsigned long ustart,
+size_t size,
+unsigned long start,
+unsigned long end,
+bool blockable)
+{
+   if (end < ustart || start > ustart - 1 + size)
+   return 0;
+
+   if (!blockable)
+   return -EAGAIN;
+
+   mutex_lock(&vq->mutex);
+   vhost_uninit_vmap(map);
+   mutex_unlock(&vq->mutex);
+
+   return 0;
+}
+
+static int vhost_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
+struct mm_struct *mm,
+unsigned long start,
+unsigned long end,
+bool blockable)
+{
+   struct vhost_dev *dev = container_of(mn, struct vhost_dev,
+mmu_notifier);
+   int i;
+
+   for (i = 0; i < dev->nvqs; i++) {
+   struct vhost_virtqueue *vq = dev->vqs[i];
+
+   if (vhost_invalidate_vmap(vq, &vq->avail_ring,
+ (unsigned long)vq->avail,
+ vhost_get_avail_size(vq, vq->num),
+ start, end, blockable))
+   return -EAGAIN;
+   if (vhost_invalidate_vmap(vq, &vq->desc_ring,
+ (unsigned long)vq->desc,
+ vhost_get_desc_size(vq, vq->num),
+ start, end, blockable))
+   return -EAGAIN;
+   if (vhost_invalidate_vmap(vq, &vq->used_ring,
+ (unsigned long)vq->used,
+ vhost_get_used_size(vq, vq->num),
+ start, end, blockable))
+   return -EAGAIN;
+   }
+
+   return 0;
+}
+
+static const struct mmu_notifier_ops vhost_mmu_notifier_ops = {
+   .invalidate_range_start = vhost_mmu_notifier_invalidate_range_start,
+};
+
  /* Caller should have device mutex */
  long vhost_dev_set_owner(struct vhost_dev *dev)
  {
@@ -541,7 +611,14 @@ long 

Re: [PATCH 7/7] drm: Split out drm_probe_helper.h

2019-01-07 Thread Daniel Vetter
On Sat, Dec 29, 2018 at 10:56:39PM +, Liviu Dudau wrote:
> On Mon, Dec 10, 2018 at 11:11:33AM +0100, Daniel Vetter wrote:
> > Having the probe helper stuff (which pretty much everyone needs) in
> > the drm_crtc_helper.h file (which atomic drivers should never need) is
> > confusing. Split them out.
> > 
> > To make sure I actually achieved the goal here I went through all
> > drivers. And indeed, all atomic drivers are now free of
> > drm_crtc_helper.h includes.
> > 
> > Signed-off-by: Daniel Vetter 
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: virtualization@lists.linux-foundation.org
> > Cc: etna...@lists.freedesktop.org
> > Cc: linux-samsung-...@vger.kernel.org
> > Cc: intel-...@lists.freedesktop.org
> > Cc: linux-media...@lists.infradead.org
> > Cc: linux-amlo...@lists.infradead.org
> > Cc: linux-arm-...@vger.kernel.org
> > Cc: freedr...@lists.freedesktop.org
> > Cc: nouv...@lists.freedesktop.org
> > Cc: spice-de...@lists.freedesktop.org
> > Cc: amd-...@lists.freedesktop.org
> > Cc: linux-renesas-...@vger.kernel.org
> > Cc: linux-rockc...@lists.infradead.org
> > Cc: linux-st...@st-md-mailman.stormreply.com
> > Cc: linux-te...@vger.kernel.org
> > Cc: xen-de...@lists.xen.org
> 
> Daniel, please fix whatever script you're using to generate the list
> of people being Cc-ed. ./scripts/get_maintainer.pl generates my work
> email address for HDLCD and the Mali DP maintainers for malidp changes,
> but we were not Cc-ed and I've only found this patch in the linux-rockchip
> ML because there was not enough traffic there to be hidden under other 
> patches.

The number of Cc recipients this will generate is too much to be
acceptable for smtp servers. My scripts do generate the full lists, but
for patches like this here I need to delete a lot of them. So what I ended
up doing is deleting all the people and leaving the mailing lists behind.

Plan B would be to split this up into a massive per-driver patch series,
which I found overkill in this case. But for anything with functional
changes that's what I usually end up doing.

Hope that explains what happened.

btw the tool I'm using is dim add-missing-cc from the maintainer-tools
repos.

Cheers, Daniel

> 
> Best regards,
> Liviu
> 
> > ---
> >  .../gpu/drm/amd/amdgpu/amdgpu_connectors.c|  2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h  |  1 +
> >  .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  2 +-
> >  .../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c  |  2 +-
> >  .../display/amdgpu_dm/amdgpu_dm_services.c|  2 +-
> >  drivers/gpu/drm/arc/arcpgu_crtc.c |  2 +-
> >  drivers/gpu/drm/arc/arcpgu_drv.c  |  2 +-
> >  drivers/gpu/drm/arc/arcpgu_sim.c  |  2 +-
> >  drivers/gpu/drm/arm/hdlcd_crtc.c  |  2 +-
> >  drivers/gpu/drm/arm/hdlcd_drv.c   |  2 +-
> >  drivers/gpu/drm/arm/malidp_crtc.c |  2 +-
> >  drivers/gpu/drm/arm/malidp_drv.c  |  2 +-
> >  drivers/gpu/drm/arm/malidp_mw.c   |  2 +-
> >  drivers/gpu/drm/armada/armada_510.c   |  2 +-
> >  drivers/gpu/drm/armada/armada_crtc.c  |  2 +-
> >  drivers/gpu/drm/armada/armada_drv.c   |  2 +-
> >  drivers/gpu/drm/armada/armada_fb.c|  2 +-
> >  drivers/gpu/drm/ast/ast_drv.c |  1 +
> >  drivers/gpu/drm/ast/ast_mode.c|  1 +
> >  .../gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c|  2 +-
> >  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h  |  2 +-
> >  drivers/gpu/drm/bochs/bochs_drv.c |  1 +
> >  drivers/gpu/drm/bochs/bochs_kms.c |  1 +
> >  drivers/gpu/drm/bridge/adv7511/adv7511.h  |  2 +-
> >  drivers/gpu/drm/bridge/analogix-anx78xx.c |  3 +-
> >  .../drm/bridge/analogix/analogix_dp_core.c|  2 +-
> >  drivers/gpu/drm/bridge/cdns-dsi.c |  2 +-
> >  drivers/gpu/drm/bridge/dumb-vga-dac.c |  2 +-
> >  .../bridge/megachips-stdp-ge-b850v3-fw.c  |  2 +-
> >  drivers/gpu/drm/bridge/nxp-ptn3460.c  |  2 +-
> >  drivers/gpu/drm/bridge/panel.c|  2 +-
> >  drivers/gpu/drm/bridge/parade-ps8622.c|  2 +-
> >  drivers/gpu/drm/bridge/sii902x.c  |  2 +-
> >  drivers/gpu/drm/bridge/synopsys/dw-hdmi.c |  2 +-
> >  drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c |  2 +-
> >  drivers/gpu/drm/bridge/tc358764.c |  2 +-
> >  drivers/gpu/drm/bridge/tc358767.c |  2 +-
> >  drivers/gpu/drm/bridge/ti-sn65dsi86.c |  2 +-
> >  drivers/gpu/drm/bridge/ti-tfp410.c|  2 +-
> >  drivers/gpu/drm/cirrus/cirrus_drv.c   |  1 +
> >  drivers/gpu/drm/cirrus/cirrus_mode.c  |  1 +
> >  drivers/gpu/drm/drm_atomic_helper.c   |  1 -
> >  drivers/gpu/drm/drm_dp_mst_topology.c |  2 +-
> >  drivers/gpu/drm/drm_modeset_helper.c  |  2 +-
> >  drivers/gpu/drm/drm_probe_helper.c|  2 +-