Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-24 Thread Christoph Hellwig
On Wed, Jul 24, 2019 at 06:10:53PM -0400, Michael S. Tsirkin wrote:
> Christoph - would a documented API wrapping dma_mask make sense?
> With the documentation explaining how users must
> desist from using DMA APIs if that returns false ...

We have some bigger changes in this are planned, including turning
dma_mask into a scalar instead of a pointer, please stay tuned.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 5/6] vhost: mark dirty pages during map uninit

2019-07-24 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 09:19:33PM +0800, Jason Wang wrote:
> 
> On 2019/7/23 下午5:17, Michael S. Tsirkin wrote:
> > On Tue, Jul 23, 2019 at 03:57:17AM -0400, Jason Wang wrote:
> > > We don't mark dirty pages if the map was teared down outside MMU
> > > notifier. This will lead untracked dirty pages. Fixing by marking
> > > dirty pages during map uninit.
> > > 
> > > Reported-by: Michael S. Tsirkin
> > > Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual 
> > > address")
> > > Signed-off-by: Jason Wang
> > > ---
> > >   drivers/vhost/vhost.c | 22 --
> > >   1 file changed, 16 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 89c9f08b5146..5b8821d00fe4 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -306,6 +306,18 @@ static void vhost_map_unprefetch(struct vhost_map 
> > > *map)
> > >   kfree(map);
> > >   }
> > > +static void vhost_set_map_dirty(struct vhost_virtqueue *vq,
> > > + struct vhost_map *map, int index)
> > > +{
> > > + struct vhost_uaddr *uaddr = >uaddrs[index];
> > > + int i;
> > > +
> > > + if (uaddr->write) {
> > > + for (i = 0; i < map->npages; i++)
> > > + set_page_dirty(map->pages[i]);
> > > + }
> > > +}
> > > +
> > >   static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
> > >   {
> > >   struct vhost_map *map[VHOST_NUM_ADDRS];
> > > @@ -315,8 +327,10 @@ static void vhost_uninit_vq_maps(struct 
> > > vhost_virtqueue *vq)
> > >   for (i = 0; i < VHOST_NUM_ADDRS; i++) {
> > >   map[i] = rcu_dereference_protected(vq->maps[i],
> > > lockdep_is_held(>mmu_lock));
> > > - if (map[i])
> > > + if (map[i]) {
> > > + vhost_set_map_dirty(vq, map[i], i);
> > >   rcu_assign_pointer(vq->maps[i], NULL);
> > > + }
> > >   }
> > >   spin_unlock(>mmu_lock);
> > > @@ -354,7 +368,6 @@ static void vhost_invalidate_vq_start(struct 
> > > vhost_virtqueue *vq,
> > >   {
> > >   struct vhost_uaddr *uaddr = >uaddrs[index];
> > >   struct vhost_map *map;
> > > - int i;
> > >   if (!vhost_map_range_overlap(uaddr, start, end))
> > >   return;
> > > @@ -365,10 +378,7 @@ static void vhost_invalidate_vq_start(struct 
> > > vhost_virtqueue *vq,
> > >   map = rcu_dereference_protected(vq->maps[index],
> > >   lockdep_is_held(>mmu_lock));
> > >   if (map) {
> > > - if (uaddr->write) {
> > > - for (i = 0; i < map->npages; i++)
> > > - set_page_dirty(map->pages[i]);
> > > - }
> > > + vhost_set_map_dirty(vq, map, index);
> > >   rcu_assign_pointer(vq->maps[index], NULL);
> > >   }
> > >   spin_unlock(>mmu_lock);
> > OK and the reason it's safe is because the invalidate counter
> > got incremented so we know page will not get mapped again.
> > 
> > But we*do*  need to wait for page not to be mapped.
> > And if that means waiting for VQ processing to finish,
> > then I worry that is a very log time.
> > 
> 
> I'm not sure I get you here. If we don't have such map, we will fall back to
> normal uaccess helper. And in the memory accessor, the rcu critical section
> is pretty small.
> 
> Thanks
> 

OK. So the trick is that page_mkclean invokes mmu notifiers.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()

2019-07-24 Thread Bob Liu
On 7/24/19 12:25 PM, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> Hi,
> 
> This is mostly Jerome's work, converting the block/bio and related areas
> to call put_user_page*() instead of put_page(). Because I've changed
> Jerome's patches, in some cases significantly, I'd like to get his
> feedback before we actually leave him listed as the author (he might
> want to disown some or all of these).
> 

Could you add some background to the commit log for people don't have the 
context..
Why this converting? What's the main differences?

Regards, -Bob

> I added a new patch, in order to make this work with Christoph Hellwig's
> recent overhaul to bio_release_pages(): "block: bio_release_pages: use
> flags arg instead of bool".
> 
> I've started the series with a patch that I've posted in another
> series ("mm/gup: add make_dirty arg to put_user_pages_dirty_lock()"[1]),
> because I'm not sure which of these will go in first, and this allows each
> to stand alone.
> 
> Testing: not much beyond build and boot testing has been done yet. And
> I'm not set up to even exercise all of it (especially the IB parts) at
> run time.
> 
> Anyway, changes here are:
> 
> * Store, in the iov_iter, a "came from gup (get_user_pages)" parameter.
>   Then, use the new iov_iter_get_pages_use_gup() to retrieve it when
>   it is time to release the pages. That allows choosing between put_page()
>   and put_user_page*().
> 
> * Pass in one more piece of information to bio_release_pages: a "from_gup"
>   parameter. Similar use as above.
> 
> * Change the block layer, and several file systems, to use
>   put_user_page*().
> 
> [1] 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_r_20190724012606.25844-2D2-2Djhubbard-40nvidia.com=DwIDaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=1ktT0U2YS_I8Zz2o-MS1YcCAzWZ6hFGtyTgvVMGM7gI=FpFhv2rjbKCAYGmO6Hy8WJAottr1Qz_mDKDLObQ40FU=q-_mX3daEr22WbdZMElc_ZbD8L9oGLD7U0xLeyJ661Y=
>  
> And please note the correction email that I posted as a follow-up,
> if you're looking closely at that patch. :) The fixed version is
> included here.
> 
> John Hubbard (3):
>   mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
>   block: bio_release_pages: use flags arg instead of bool
>   fs/ceph: fix a build warning: returning a value from void function
> 
> Jérôme Glisse (9):
>   iov_iter: add helper to test if an iter would use GUP v2
>   block: bio_release_pages: convert put_page() to put_user_page*()
>   block_dev: convert put_page() to put_user_page*()
>   fs/nfs: convert put_page() to put_user_page*()
>   vhost-scsi: convert put_page() to put_user_page*()
>   fs/cifs: convert put_page() to put_user_page*()
>   fs/fuse: convert put_page() to put_user_page*()
>   fs/ceph: convert put_page() to put_user_page*()
>   9p/net: convert put_page() to put_user_page*()
> 
>  block/bio.c|  81 ---
>  drivers/infiniband/core/umem.c |   5 +-
>  drivers/infiniband/hw/hfi1/user_pages.c|   5 +-
>  drivers/infiniband/hw/qib/qib_user_pages.c |   5 +-
>  drivers/infiniband/hw/usnic/usnic_uiom.c   |   5 +-
>  drivers/infiniband/sw/siw/siw_mem.c|   8 +-
>  drivers/vhost/scsi.c   |  13 ++-
>  fs/block_dev.c |  22 +++-
>  fs/ceph/debugfs.c  |   2 +-
>  fs/ceph/file.c |  62 ---
>  fs/cifs/cifsglob.h |   3 +
>  fs/cifs/file.c |  22 +++-
>  fs/cifs/misc.c |  19 +++-
>  fs/direct-io.c |   2 +-
>  fs/fuse/dev.c  |  22 +++-
>  fs/fuse/file.c |  53 +++---
>  fs/nfs/direct.c|  10 +-
>  include/linux/bio.h|  22 +++-
>  include/linux/mm.h |   5 +-
>  include/linux/uio.h|  11 ++
>  mm/gup.c   | 115 +
>  net/9p/trans_common.c  |  14 ++-
>  net/9p/trans_common.h  |   3 +-
>  net/9p/trans_virtio.c  |  18 +++-
>  24 files changed, 357 insertions(+), 170 deletions(-)
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()

2019-07-24 Thread John Hubbard
On 7/23/19 11:17 PM, Christoph Hellwig wrote:
> On Tue, Jul 23, 2019 at 09:25:06PM -0700, john.hubb...@gmail.com wrote:
>> * Store, in the iov_iter, a "came from gup (get_user_pages)" parameter.
>>   Then, use the new iov_iter_get_pages_use_gup() to retrieve it when
>>   it is time to release the pages. That allows choosing between put_page()
>>   and put_user_page*().
>>
>> * Pass in one more piece of information to bio_release_pages: a "from_gup"
>>   parameter. Similar use as above.
>>
>> * Change the block layer, and several file systems, to use
>>   put_user_page*().
> 
> I think we can do this in a simple and better way.  We have 5 ITER_*
> types.  Of those ITER_DISCARD as the name suggests never uses pages, so
> we can skip handling it.  ITER_PIPE is rejected іn the direct I/O path,
> which leaves us with three.
> 
> Out of those ITER_BVEC needs a user page reference, so we want to call

   ^ ITER_IOVEC, I hope. Otherwise I'm hopeless lost. :)

> put_user_page* on it.  ITER_BVEC always already has page reference,
> which means in the block direct I/O path path we alread don't take
> a page reference.  We should extent that handling to all other calls
> of iov_iter_get_pages / iov_iter_get_pages_alloc.  I think we should
> just reject ITER_KVEC for direct I/O as well as we have no users and
> it is rather pointless.  Alternatively if we see a use for it the
> callers should always have a life page reference anyway (or might
> be on kmalloc memory), so we really should not take a reference either.
> 
> In other words:  the only time we should ever have to put a page in
> this patch is when they are user pages.  We'll need to clean up
> various bits of code for that, but that can be done gradually before
> even getting to the actual put_user_pages conversion.
> 

Sounds great. I'm part way into it and it doesn't look too bad. The main
question is where to scatter various checks and assertions, to keep
the kvecs out of direct I/0. Or at least keep the gups away from 
direct I/0.


thanks,
-- 
John Hubbard
NVIDIA
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-24 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 05:38:51PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 22, 2019 at 04:36:09PM +0100, Robin Murphy wrote:
> >> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >> index c8be1c4f5b55..37c143971211 100644
> >> --- a/drivers/virtio/virtio_ring.c
> >> +++ b/drivers/virtio/virtio_ring.c
> >> @@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
> >>   {
> >>size_t max_segment_size = SIZE_MAX;
> >>   -if (vring_use_dma_api(vdev))
> >> +  if (vring_use_dma_api(vdev) && vdev->dev.dma_mask)
> >
> > Hmm, might it make sense to roll that check up into vring_use_dma_api() 
> > itself? After all, if the device has no mask then it's likely that other 
> > DMA API ops wouldn't really work as expected either.
> 
> Makes sense to me.

Christoph - would a documented API wrapping dma_mask make sense?
With the documentation explaining how users must
desist from using DMA APIs if that returns false ...


-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 07/12] vhost-scsi: convert put_page() to put_user_page*()

2019-07-24 Thread John Hubbard
On 7/23/19 9:25 PM, john.hubb...@gmail.com wrote:
> From: Jérôme Glisse 
> 
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page().
> 
> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> Changes from Jérôme's original patch:
> 
> * Changed a WARN_ON to a BUG_ON.
> 

Clearly, the above commit log has it backwards (this is quite my night
for typos).  Please read that as "changed a BUG_ON to a WARN_ON".

I'll correct the commit description in next iteration of this patchset.

...

> + /*
> +  * Here in all cases we should have an IOVEC which use GUP. If that is
> +  * not the case then we will wrongly call put_user_page() and the page
> +  * refcount will go wrong (this is in vhost_scsi_release_cmd())
> +  */
> + WARN_ON(!iov_iter_get_pages_use_gup(iter));
> +
...

thanks,
-- 
John Hubbard
NVIDIA
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

virtio BPF offers incorrect packet length

2019-07-24 Thread Arnau Verdaguer
Hi,

I suspect that the virtio network driver calls some skb BPF programs with
skb->data_end - skb->data != skb->len, but only for forwarded packets.

For instance, the attached sched_cls tc program prints skb->data,
skb->data_end and skb->len for each packet:

  -0 [000] ..s.   491.561727: 0: data: 3110080576
data_end: 3110080704 len: 262
  -0 [000] .Ns.   491.561752: 0: data: 3110080064
data_end: 3110080192 len: 250

As it can be seen, the frame length should be 262 and 250 bytes, but
data_end - data is always 128. For packets smaller than 128 it works fine.

I've tried the latest kernel (5.3-rc1) besides 4.14, 4.19, 5.2, etc., and
the error persists. Other drivers than virtio work as expected and I can
inspect every byte of the packet. Locally generated traffic also works as
expected.

To carry out this experiment I've used a Debian 10 virtual machine with
net.ipv4.ip_forward=1 and net.ipv4.conf.eth0.forwarding=1, forwarding
packets between its two virtio interfaces. The example program is run with
the following command, having BCC installed:

$> sudo python3 pkt_len_text.py eth0

Where "eth0" could be another ingress interface. The output is showed at
/sys/kernel/debug/tracing/trace_pipe.

Can anyone confirm if this error is reproducible, and does it have any
solutions?

BR.

Arnau
#!/usr/bin/python
# Copyright (c) PLUMgrid, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
import sys
from bcc import BPF
from pyroute2 import IPRoute

ipr = IPRoute()

text = """
int pkt_len_test(struct __sk_buff *skb) {
void *data_end = (void *) (long)skb->data_end;
void *data = (void *) (long)skb->data;
bpf_trace_printk("data: %u end: %u len: %d \\n", data, data_end, skb->len);
return 1;
}
"""

try:
b = BPF(text=text, debug=0)
fn = b.load_func("pkt_len_test", BPF.SCHED_CLS)
idx = ipr.link_lookup(ifname=str(sys.argv[1]))[0]

ipr.tc("add", "ingress", idx, ":")
ipr.tc("add-filter", "bpf", idx, ":1", fd=fn.fd,
   name=fn.name, parent=":", action="ok", classid=1)
ipr.tc("add", "sfq", idx, "1:")
ipr.tc("add-filter", "bpf", idx, ":1", fd=fn.fd,
   name=fn.name, parent="1:", action="ok", classid=1)
foo = input("Press any key to remove the filters and stop the program")
ipr.tc("del", "ingress", idx, ":")
ipr.tc("del", "sfq", idx, "1:")
print("Filters deleted")
except Exception as e:
print("Error: %s", e)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] virtio-net: parameterize min ring num_free for virtio receive

2019-07-24 Thread 冉 jiang

On 2019/7/20 0:13, Michael S. Tsirkin wrote:
> On Fri, Jul 19, 2019 at 03:31:29PM +, 冉 jiang wrote:
>> On 2019/7/19 22:29, Jiang wrote:
>>> On 2019/7/19 10:36, Jason Wang wrote:
 On 2019/7/18 下午10:43, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 10:42:47AM -0400, Michael S. Tsirkin wrote:
>> On Thu, Jul 18, 2019 at 10:01:05PM +0800, Jason Wang wrote:
>>> On 2019/7/18 下午9:04, Michael S. Tsirkin wrote:
 On Thu, Jul 18, 2019 at 12:55:50PM +, ? jiang wrote:
> This change makes ring buffer reclaim threshold num_free
> configurable
> for better performance, while it's hard coded as 1/2 * queue now.
> According to our test with qemu + dpdk, packet dropping happens
> when
> the guest is not able to provide free buffer in avail ring timely.
> Smaller value of num_free does decrease the number of packet
> dropping
> during our test as it makes virtio_net reclaim buffer earlier.
>
> At least, we should leave the value changeable to user while the
> default value as 1/2 * queue is kept.
>
> Signed-off-by: jiangkidd
 That would be one reason, but I suspect it's not the
 true one. If you need more buffer due to jitter
 then just increase the queue size. Would be cleaner.


 However are you sure this is the reason for
 packet drops? Do you see them dropped by dpdk
 due to lack of space in the ring? As opposed to
 by guest?


>>> Besides those, this patch depends on the user to choose a suitable
>>> threshold
>>> which is not good. You need either a good value with demonstrated
>>> numbers or
>>> something smarter.
>>>
>>> Thanks
>> I do however think that we have a problem right now: try_fill_recv can
>> take up a long time during which net stack does not run at all.
>> Imagine
>> a 1K queue - we are talking 512 packets. That's exceessive.

 Yes, we will starve a fast host in this case.


>>     napi poll
>> weight solves a similar problem, so it might make sense to cap this at
>> napi_poll_weight.
>>
>> Which will allow tweaking it through a module parameter as a
>> side effect :) Maybe just do NAPI_POLL_WEIGHT.
> Or maybe NAPI_POLL_WEIGHT/2 like we do at half the queue ;). Please
> experiment, measure performance and let the list know
>
>> Need to be careful though: queues can also be small and I don't
>> think we
>> want to exceed queue size / 2, or maybe queue size - napi_poll_weight.
>> Definitely must not exceed the full queue size.

 Looking at intel, it uses 16 and i40e uses 32.  It looks to me
 NAPI_POLL_WEIGHT/2 is better.

 Jiang, want to try that and post a new patch?

 Thanks


>> -- 
>> MST
>>> We did have completed several rounds of test with setting the value to
>>> budget (64 as the default value). It does improve a lot with pps is
>>> below 400pps for a single stream. Let me consolidate the data and will
>>> send it soon. Actually, we are confident that it runs out of free
>>> buffer in avail ring when packet dropping happens with below systemtap:
>>>
>>> Just a snippet:
>>>
>>> probe module("virtio_ring").function("virtqueue_get_buf")
>>> {
>>>      x = (@cast($_vq, "vring_virtqueue")->vring->used->idx)-
>>> (@cast($_vq, "vring_virtqueue")->last_used_idx) ---> we use this one
>>> to verify if the queue is full, which means guest is not able to take
>>> buffer from the queue timely
>>>
>>>      if (x<0 && (x+65535)<4096)
>>>          x = x+65535
>>>
>>>      if((x==1024) && @cast($_vq, "vring_virtqueue")->vq->callback ==
>>> callback_addr)
>>>          netrxcount[x] <<< gettimeofday_s()
>>> }
>>>
>>>
>>> probe module("virtio_ring").function("virtqueue_add_inbuf")
>>> {
>>>      y = (@cast($vq, "vring_virtqueue")->vring->avail->idx)-
>>> (@cast($vq, "vring_virtqueue")->vring->used->idx) ---> we use this one
>>> to verify if we run out of free buffer in avail ring
>>>      if (y<0 && (y+65535)<4096)
>>>          y = y+65535
>>>
>>>      if(@2=="debugon")
>>>      {
>>>          if(y==0 && @cast($vq, "vring_virtqueue")->vq->callback ==
>>> callback_addr)
>>>          {
>>>              netrxfreecount[y] <<< gettimeofday_s()
>>>
>>>              printf("no avail ring left seen, printing most recent 5
>>> num free, vq: %lx, current index: %d\n", $vq, recentfreecount)
>>>              for(i=recentfreecount; i!=((recentfreecount+4) % 5);
>>> i=((i+1) % 5))
>>>              {
>>>                  printf("index: %d, num free: %d\n", i, recentfree[$vq,
>>> i])
>>>              }
>>>
>>>              printf("index: %d, num free: %d\n", i, recentfree[$vq, i])
>>>              //exit()
>>>          }
>>>      }
>>> }
>>>
>>>
>>> probe
>>> module("virtio_net").statement("virtnet_receive@drivers/net/virtio_net.c:732")
>>> {
>>>  

Re: Re: Reminder: 3 open syzbot bugs in vhost subsystem

2019-07-24 Thread syzbot



On 2019/7/24 上午10:38, Eric Biggers wrote:
[This email was generated by a script.  Let me know if you have any  
suggestions
to make it better, or if you want it re-generated with the latest  
status.]


Of the currently open syzbot reports against the upstream kernel, I've  
manually
marked 3 of them as possibly being bugs in the vhost subsystem.  I've  
listed
these reports below, sorted by an algorithm that tries to list first the  
reports

most likely to be still valid, important, and actionable.



Of these 3 bugs, 2 were seen in mainline in the last week.



Of these 3 bugs, 2 were bisected to commits from the following person:



Jason Wang 


If you believe a bug is no longer valid, please close the syzbot report  
by
sending a '#syz fix', '#syz dup', or '#syz invalid' command in reply to  
the

original thread, as explained at https://goo.gl/tpsmEJ#status


If you believe I misattributed a bug to the vhost subsystem, please let  
me know,

and if possible forward the report to the correct people or mailing list.



Here are the bugs:




Title:  KASAN: use-after-free Write in tlb_finish_mmu
Last occurred:  5 days ago
Reported:   4 days ago
Branches:   Mainline
Dashboard link:  
https://syzkaller.appspot.com/bug?id=d57b94f89e48c85ef7d95acc208209ea4bdc10de
Original thread: 
https://lkml.kernel.org/lkml/45e7a1058e024...@google.com/T/#u



This bug has a syzkaller reproducer only.



This bug was bisected to:



commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
Author: Jason Wang 
Date:   Fri May 24 08:12:18 2019 +



  vhost: access vq metadata through kernel virtual address



No one has replied to the original thread for this bug yet.



If you fix this bug, please add the following tag to the commit:
  Reported-by: syzbot+8267e9af795434ffa...@syzkaller.appspotmail.com



If you send any email or patch for this bug, please reply to the original
thread.  For the git send-email command to use, or tips on how to reply  
if the

thread isn't in your mailbox, see the "Reply instructions" at
https://lkml.kernel.org/r/45e7a1058e024...@google.com




Title:  KASAN: use-after-free Read in finish_task_switch (2)
Last occurred:  5 days ago
Reported:   4 days ago
Branches:   Mainline
Dashboard link:  
https://syzkaller.appspot.com/bug?id=9a98fcad6c8bd31f5c3afbdc6c75de9f082c0ffa
Original thread: 
https://lkml.kernel.org/lkml/490679058e024...@google.com/T/#u



This bug has a syzkaller reproducer only.



This bug was bisected to:



commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
Author: Jason Wang 
Date:   Fri May 24 08:12:18 2019 +



  vhost: access vq metadata through kernel virtual address



No one has replied to the original thread for this bug yet.




Hi:



We believe above two bugs are duplicated with the report "WARNING in
__mmdrop". Can I just dup them with



#syz dup "WARNING in __mmdrop"


I see the command but can't find the corresponding bug.
Please resend the email to syzbot+h...@syzkaller.appspotmail.com address
that is the sender of the bug report (also present in the Reported-by tag).



(If yes, just wonder how syzbot differ bugs, technically, several
different bug can hit the same warning).





If you fix this bug, please add the following tag to the commit:
  Reported-by: syzbot+7f067c796eee2acbc...@syzkaller.appspotmail.com



If you send any email or patch for this bug, please reply to the original
thread.  For the git send-email command to use, or tips on how to reply  
if the

thread isn't in your mailbox, see the "Reply instructions" at
https://lkml.kernel.org/r/490679058e024...@google.com




Title:  memory leak in vhost_net_ioctl
Last occurred:  22 days ago
Reported:   48 days ago
Branches:   Mainline
Dashboard link:  
https://syzkaller.appspot.com/bug?id=12ba349d7e26ccfe95317bc376e812ebbae2ee0f
Original thread: 
https://lkml.kernel.org/lkml/188da1058a9c2...@google.com/T/#u



This bug has a C reproducer.


The original thread for this bug has received 4 replies; the last was 39  
days

ago.



If you fix this bug, please add the following tag to the commit:
  Reported-by: syzbot+0789f0c7e45efd7bb...@syzkaller.appspotmail.com




I do remember it can not be reproduced upstream, let me double check and
close this one.



Thanks




If you send any email or patch for this bug, please consider replying to  
the
original thread.  For the git send-email command to use, or tips on how  
to reply

if the thread isn't in your mailbox, see the "Reply instructions" at

[PATCH] drm/qxl: Use dev_get_drvdata where possible

2019-07-24 Thread Chuhong Yuan
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.

Signed-off-by: Chuhong Yuan 
---
 drivers/gpu/drm/qxl/qxl_drv.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
index f33e349c4ec5..af1e2b377945 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.c
+++ b/drivers/gpu/drm/qxl/qxl_drv.c
@@ -206,16 +206,14 @@ static int qxl_pm_resume(struct device *dev)
 
 static int qxl_pm_thaw(struct device *dev)
 {
-   struct pci_dev *pdev = to_pci_dev(dev);
-   struct drm_device *drm_dev = pci_get_drvdata(pdev);
+   struct drm_device *drm_dev = dev_get_drvdata(dev);
 
return qxl_drm_resume(drm_dev, true);
 }
 
 static int qxl_pm_freeze(struct device *dev)
 {
-   struct pci_dev *pdev = to_pci_dev(dev);
-   struct drm_device *drm_dev = pci_get_drvdata(pdev);
+   struct drm_device *drm_dev = dev_get_drvdata(dev);
 
return qxl_drm_freeze(drm_dev);
 }
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] drm/bochs: Use dev_get_drvdata

2019-07-24 Thread Chuhong Yuan
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.

Signed-off-by: Chuhong Yuan 
---
 drivers/gpu/drm/bochs/bochs_drv.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/bochs/bochs_drv.c 
b/drivers/gpu/drm/bochs/bochs_drv.c
index 8f3a5bda9d03..d8a50200408c 100644
--- a/drivers/gpu/drm/bochs/bochs_drv.c
+++ b/drivers/gpu/drm/bochs/bochs_drv.c
@@ -83,16 +83,14 @@ static struct drm_driver bochs_driver = {
 #ifdef CONFIG_PM_SLEEP
 static int bochs_pm_suspend(struct device *dev)
 {
-   struct pci_dev *pdev = to_pci_dev(dev);
-   struct drm_device *drm_dev = pci_get_drvdata(pdev);
+   struct drm_device *drm_dev = dev_get_drvdata(dev);
 
return drm_mode_config_helper_suspend(drm_dev);
 }
 
 static int bochs_pm_resume(struct device *dev)
 {
-   struct pci_dev *pdev = to_pci_dev(dev);
-   struct drm_device *drm_dev = pci_get_drvdata(pdev);
+   struct drm_device *drm_dev = dev_get_drvdata(dev);
 
return drm_mode_config_helper_resume(drm_dev);
 }
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] iommu/virtio: Update to most recent specification

2019-07-24 Thread Jean-Philippe Brucker
Following specification review a few things were changed in v8 of the
virtio-iommu series [1], but have been omitted when merging the base
driver. Add them now:

* Remove the EXEC flag.
* Add feature bit for the MMIO flag.
* Change domain_bits to domain_range.
* Add NOMEM status flag.

[1] 
https://lore.kernel.org/linux-iommu/20190530170929.19366-1-jean-philippe.bruc...@arm.com/

Fixes: edcd69ab9a32 ("iommu: Add virtio-iommu driver")
Reported-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 40 ++-
 include/uapi/linux/virtio_iommu.h | 32 ++---
 2 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 433f4d2ee956..80a740df0737 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -2,7 +2,7 @@
 /*
  * Virtio driver for the paravirtualized IOMMU
  *
- * Copyright (C) 2018 Arm Limited
+ * Copyright (C) 2019 Arm Limited
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -47,7 +47,10 @@ struct viommu_dev {
/* Device configuration */
struct iommu_domain_geometrygeometry;
u64 pgsize_bitmap;
-   u8  domain_bits;
+   u32 first_domain;
+   u32 last_domain;
+   /* Supported MAP flags */
+   u32 map_flags;
u32 probe_size;
 };
 
@@ -62,6 +65,7 @@ struct viommu_domain {
struct viommu_dev   *viommu;
struct mutexmutex; /* protects viommu pointer */
unsigned intid;
+   u32 map_flags;
 
spinlock_t  mappings_lock;
struct rb_root_cached   mappings;
@@ -113,6 +117,8 @@ static int viommu_get_req_errno(void *buf, size_t len)
return -ENOENT;
case VIRTIO_IOMMU_S_FAULT:
return -EFAULT;
+   case VIRTIO_IOMMU_S_NOMEM:
+   return -ENOMEM;
case VIRTIO_IOMMU_S_IOERR:
case VIRTIO_IOMMU_S_DEVERR:
default:
@@ -607,15 +613,15 @@ static int viommu_domain_finalise(struct viommu_dev 
*viommu,
 {
int ret;
struct viommu_domain *vdomain = to_viommu_domain(domain);
-   unsigned int max_domain = viommu->domain_bits > 31 ? ~0 :
- (1U << viommu->domain_bits) - 1;
 
vdomain->viommu = viommu;
+   vdomain->map_flags  = viommu->map_flags;
 
domain->pgsize_bitmap   = viommu->pgsize_bitmap;
domain->geometry= viommu->geometry;
 
-   ret = ida_alloc_max(>domain_ids, max_domain, GFP_KERNEL);
+   ret = ida_alloc_range(>domain_ids, viommu->first_domain,
+ viommu->last_domain, GFP_KERNEL);
if (ret >= 0)
vdomain->id = (unsigned int)ret;
 
@@ -710,7 +716,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
  phys_addr_t paddr, size_t size, int prot)
 {
int ret;
-   int flags;
+   u32 flags;
struct virtio_iommu_req_map map;
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
@@ -718,6 +724,9 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
(prot & IOMMU_WRITE ? VIRTIO_IOMMU_MAP_F_WRITE : 0) |
(prot & IOMMU_MMIO ? VIRTIO_IOMMU_MAP_F_MMIO : 0);
 
+   if (flags & ~vdomain->map_flags)
+   return -EINVAL;
+
ret = viommu_add_mapping(vdomain, iova, paddr, size, flags);
if (ret)
return ret;
@@ -1027,7 +1036,8 @@ static int viommu_probe(struct virtio_device *vdev)
goto err_free_vqs;
}
 
-   viommu->domain_bits = 32;
+   viommu->map_flags = VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE;
+   viommu->last_domain = ~0U;
 
/* Optional features */
virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
@@ -1038,9 +1048,13 @@ static int viommu_probe(struct virtio_device *vdev)
 struct virtio_iommu_config, input_range.end,
 _end);
 
-   virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_BITS,
-struct virtio_iommu_config, domain_bits,
->domain_bits);
+   virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
+struct virtio_iommu_config, domain_range.start,
+>first_domain);
+
+   virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
+struct virtio_iommu_config, domain_range.end,
+>last_domain);
 
virtio_cread_feature(vdev, VIRTIO_IOMMU_F_PROBE,
 struct virtio_iommu_config, 

[PATCH] MAINTAINERS: Update my email address

2019-07-24 Thread Jean-Philippe Brucker
Update MAINTAINERS and .mailmap with my @linaro.org address, since I
don't have access to my @arm.com address anymore.

Signed-off-by: Jean-Philippe Brucker 
---
 .mailmap| 1 +
 MAINTAINERS | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/.mailmap b/.mailmap
index 0fef932de3db..8ce554b9c9f1 100644
--- a/.mailmap
+++ b/.mailmap
@@ -98,6 +98,7 @@ Jason Gunthorpe  

 Javi Merino  
  
 Jean Tourrilhes 
+ 
 Jeff Garzik 
 Jeff Layton  
 Jeff Layton  
diff --git a/MAINTAINERS b/MAINTAINERS
index 783569e3c4b4..bded78c84701 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17123,7 +17123,7 @@ F:  drivers/virtio/virtio_input.c
 F: include/uapi/linux/virtio_input.h
 
 VIRTIO IOMMU DRIVER
-M: Jean-Philippe Brucker 
+M: Jean-Philippe Brucker 
 L: virtualization@lists.linux-foundation.org
 S: Maintained
 F: drivers/iommu/virtio-iommu.c
-- 
2.22.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio-net: parameterize min ring num_free for virtio receive

2019-07-24 Thread 冉 jiang

On 2019/7/19 22:29, Jiang wrote:
>
> On 2019/7/19 10:36, Jason Wang wrote:
>>
>> On 2019/7/18 下午10:43, Michael S. Tsirkin wrote:
>>> On Thu, Jul 18, 2019 at 10:42:47AM -0400, Michael S. Tsirkin wrote:
 On Thu, Jul 18, 2019 at 10:01:05PM +0800, Jason Wang wrote:
> On 2019/7/18 下午9:04, Michael S. Tsirkin wrote:
>> On Thu, Jul 18, 2019 at 12:55:50PM +, ? jiang wrote:
>>> This change makes ring buffer reclaim threshold num_free 
>>> configurable
>>> for better performance, while it's hard coded as 1/2 * queue now.
>>> According to our test with qemu + dpdk, packet dropping happens 
>>> when
>>> the guest is not able to provide free buffer in avail ring timely.
>>> Smaller value of num_free does decrease the number of packet 
>>> dropping
>>> during our test as it makes virtio_net reclaim buffer earlier.
>>>
>>> At least, we should leave the value changeable to user while the
>>> default value as 1/2 * queue is kept.
>>>
>>> Signed-off-by: jiangkidd
>> That would be one reason, but I suspect it's not the
>> true one. If you need more buffer due to jitter
>> then just increase the queue size. Would be cleaner.
>>
>>
>> However are you sure this is the reason for
>> packet drops? Do you see them dropped by dpdk
>> due to lack of space in the ring? As opposed to
>> by guest?
>>
>>
> Besides those, this patch depends on the user to choose a suitable 
> threshold
> which is not good. You need either a good value with demonstrated 
> numbers or
> something smarter.
>
> Thanks
 I do however think that we have a problem right now: try_fill_recv can
 take up a long time during which net stack does not run at all. 
 Imagine
 a 1K queue - we are talking 512 packets. That's exceessive.
>>
>>
>> Yes, we will starve a fast host in this case.
>>
>>
    napi poll
 weight solves a similar problem, so it might make sense to cap this at
 napi_poll_weight.

 Which will allow tweaking it through a module parameter as a
 side effect :) Maybe just do NAPI_POLL_WEIGHT.
>>> Or maybe NAPI_POLL_WEIGHT/2 like we do at half the queue ;). Please
>>> experiment, measure performance and let the list know
>>>
 Need to be careful though: queues can also be small and I don't 
 think we
 want to exceed queue size / 2, or maybe queue size - napi_poll_weight.
 Definitely must not exceed the full queue size.
>>
>>
>> Looking at intel, it uses 16 and i40e uses 32.  It looks to me 
>> NAPI_POLL_WEIGHT/2 is better.
>>
>> Jiang, want to try that and post a new patch?
>>
>> Thanks
>>
>>

 -- 
 MST
>
> We did have completed several rounds of test with setting the value to 
> budget (64 as the default value). It does improve a lot with pps is 
> below 400pps for a single stream. Let me consolidate the data and will 
> send it soon. Actually, we are confident that it runs out of free 
> buffer in avail ring when packet dropping happens with below systemtap:
>
> Just a snippet:
>
> probe module("virtio_ring").function("virtqueue_get_buf")
> {
>     x = (@cast($_vq, "vring_virtqueue")->vring->used->idx)- 
> (@cast($_vq, "vring_virtqueue")->last_used_idx) ---> we use this one 
> to verify if the queue is full, which means guest is not able to take 
> buffer from the queue timely
>
>     if (x<0 && (x+65535)<4096)
>         x = x+65535
>
>     if((x==1024) && @cast($_vq, "vring_virtqueue")->vq->callback == 
> callback_addr)
>         netrxcount[x] <<< gettimeofday_s()
> }
>
>
> probe module("virtio_ring").function("virtqueue_add_inbuf")
> {
>     y = (@cast($vq, "vring_virtqueue")->vring->avail->idx)- 
> (@cast($vq, "vring_virtqueue")->vring->used->idx) ---> we use this one 
> to verify if we run out of free buffer in avail ring
>     if (y<0 && (y+65535)<4096)
>         y = y+65535
>
>     if(@2=="debugon")
>     {
>         if(y==0 && @cast($vq, "vring_virtqueue")->vq->callback == 
> callback_addr)
>         {
>             netrxfreecount[y] <<< gettimeofday_s()
>
>             printf("no avail ring left seen, printing most recent 5 
> num free, vq: %lx, current index: %d\n", $vq, recentfreecount)
>             for(i=recentfreecount; i!=((recentfreecount+4) % 5); 
> i=((i+1) % 5))
>             {
>                 printf("index: %d, num free: %d\n", i, recentfree[$vq, 
> i])
>             }
>
>             printf("index: %d, num free: %d\n", i, recentfree[$vq, i])
>             //exit()
>         }
>     }
> }
>
>
> probe 
> module("virtio_net").statement("virtnet_receive@drivers/net/virtio_net.c:732")
> {
>     recentfreecount++
>     recentfreecount = recentfreecount % 5
>     recentfree[$rq->vq, recentfreecount] = $rq->vq->num_free ---> 
> record the num_free for the last 5 calls to virtnet_receive, so we can 
> see if lowering the bar helps.
> }
>
>
> Here is the result:
>
> no avail ring left seen, printing most recent 5 num free, vq: 
> 

Re: [PATCH] virtio-net: parameterize min ring num_free for virtio receive

2019-07-24 Thread 冉 jiang

On 2019/7/19 10:36, Jason Wang wrote:
>
> On 2019/7/18 下午10:43, Michael S. Tsirkin wrote:
>> On Thu, Jul 18, 2019 at 10:42:47AM -0400, Michael S. Tsirkin wrote:
>>> On Thu, Jul 18, 2019 at 10:01:05PM +0800, Jason Wang wrote:
 On 2019/7/18 下午9:04, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 12:55:50PM +, ? jiang wrote:
>> This change makes ring buffer reclaim threshold num_free 
>> configurable
>> for better performance, while it's hard coded as 1/2 * queue now.
>> According to our test with qemu + dpdk, packet dropping happens when
>> the guest is not able to provide free buffer in avail ring timely.
>> Smaller value of num_free does decrease the number of packet 
>> dropping
>> during our test as it makes virtio_net reclaim buffer earlier.
>>
>> At least, we should leave the value changeable to user while the
>> default value as 1/2 * queue is kept.
>>
>> Signed-off-by: jiangkidd
> That would be one reason, but I suspect it's not the
> true one. If you need more buffer due to jitter
> then just increase the queue size. Would be cleaner.
>
>
> However are you sure this is the reason for
> packet drops? Do you see them dropped by dpdk
> due to lack of space in the ring? As opposed to
> by guest?
>
>
 Besides those, this patch depends on the user to choose a suitable 
 threshold
 which is not good. You need either a good value with demonstrated 
 numbers or
 something smarter.

 Thanks
>>> I do however think that we have a problem right now: try_fill_recv can
>>> take up a long time during which net stack does not run at all. Imagine
>>> a 1K queue - we are talking 512 packets. That's exceessive.
>
>
> Yes, we will starve a fast host in this case.
>
>
>>>    napi poll
>>> weight solves a similar problem, so it might make sense to cap this at
>>> napi_poll_weight.
>>>
>>> Which will allow tweaking it through a module parameter as a
>>> side effect :) Maybe just do NAPI_POLL_WEIGHT.
>> Or maybe NAPI_POLL_WEIGHT/2 like we do at half the queue ;). Please
>> experiment, measure performance and let the list know
>>
>>> Need to be careful though: queues can also be small and I don't 
>>> think we
>>> want to exceed queue size / 2, or maybe queue size - napi_poll_weight.
>>> Definitely must not exceed the full queue size.
>
>
> Looking at intel, it uses 16 and i40e uses 32.  It looks to me 
> NAPI_POLL_WEIGHT/2 is better.
>
> Jiang, want to try that and post a new patch?
>
> Thanks
>
>
>>>
>>> -- 
>>> MST

We did have completed several rounds of test with setting the value to 
budget (64 as the default value). It does improve a lot with pps is 
below 400pps for a single stream. Let me consolidate the data and will 
send it soon. Actually, we are confident that it runs out of free buffer 
in avail ring when packet dropping happens with below systemtap:

Just a snippet:

probe module("virtio_ring").function("virtqueue_get_buf")
{
     x = (@cast($_vq, "vring_virtqueue")->vring->used->idx)- 
(@cast($_vq, "vring_virtqueue")->last_used_idx) ---> we use this one to 
verify if the queue is full, which means guest is not able to take 
buffer from the queue timely

     if (x<0 && (x+65535)<4096)
         x = x+65535

     if((x==1024) && @cast($_vq, "vring_virtqueue")->vq->callback == 
callback_addr)
         netrxcount[x] <<< gettimeofday_s()
}


probe module("virtio_ring").function("virtqueue_add_inbuf")
{
     y = (@cast($vq, "vring_virtqueue")->vring->avail->idx)- (@cast($vq, 
"vring_virtqueue")->vring->used->idx) ---> we use this one to verify if 
we run out of free buffer in avail ring
     if (y<0 && (y+65535)<4096)
         y = y+65535

     if(@2=="debugon")
     {
         if(y==0 && @cast($vq, "vring_virtqueue")->vq->callback == 
callback_addr)
         {
             netrxfreecount[y] <<< gettimeofday_s()

             printf("no avail ring left seen, printing most recent 5 num 
free, vq: %lx, current index: %d\n", $vq, recentfreecount)
             for(i=recentfreecount; i!=((recentfreecount+4) % 5); 
i=((i+1) % 5))
             {
                 printf("index: %d, num free: %d\n", i, recentfree[$vq, i])
             }

             printf("index: %d, num free: %d\n", i, recentfree[$vq, i])
             //exit()
         }
     }
}


probe 
module("virtio_net").statement("virtnet_receive@drivers/net/virtio_net.c:732")
{
     recentfreecount++
     recentfreecount = recentfreecount % 5
     recentfree[$rq->vq, recentfreecount] = $rq->vq->num_free ---> 
record the num_free for the last 5 calls to virtnet_receive, so we can 
see if lowering the bar helps.
}


Here is the result:

no avail ring left seen, printing most recent 5 num free, vq: 
9c13c120, current index: 1
index: 1, num free: 561
index: 2, num free: 305
index: 3, num free: 369
index: 4, num free: 433
index: 0, num free: 497
no avail ring left seen, printing most recent 5 num free, vq: 

Re: [PATCH v4 2/2] balloon: fix up comments

2019-07-24 Thread Ralph Campbell



On 7/18/19 7:01 AM, Michael S. Tsirkin wrote:

Lots of comments bitrotted. Fix them up.

Fixes: 418a3ab1e778 (mm/balloon_compaction: List interfaces)
Reviewed-by: Wei Wang 
Signed-off-by: Michael S. Tsirkin 
---

fixes since v3:
teaks suggested by Wei

  mm/balloon_compaction.c | 71 ++---
  1 file changed, 39 insertions(+), 32 deletions(-)

diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index d25664e1857b..7e95d2cd185a 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -32,10 +32,10 @@ static void balloon_page_enqueue_one(struct 
balloon_dev_info *b_dev_info,
   * @b_dev_info: balloon device descriptor where we will insert a new page to
   * @pages: pages to enqueue - allocated using balloon_page_alloc.
   *
- * Driver must call it to properly enqueue a balloon pages before definitively
- * removing it from the guest system.
+ * Driver must call this function to properly enqueue balloon pages before
+ * definitively removing them from the guest system.
   *
- * Return: number of pages that were enqueued.
+ * Returns: number of pages that were enqueued.


According to Documentation/doc-guide/kernel-doc.rst,
this is going in the wrong direction and "Return:" is correct.
Ditto for other occurrences below.


   */
  size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
 struct list_head *pages)
@@ -63,14 +63,15 @@ EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
   * @n_req_pages: number of requested pages.
   *
   * Driver must call this function to properly de-allocate a previous enlisted
- * balloon pages before definetively releasing it back to the guest system.
+ * balloon pages before definitively releasing it back to the guest system.
   * This function tries to remove @n_req_pages from the ballooned pages and
   * return them to the caller in the @pages list.
   *
- * Note that this function may fail to dequeue some pages temporarily empty due
- * to compaction isolated pages.
+ * Note that this function may fail to dequeue some pages even if the balloon
+ * isn't empty - since the page list can be temporarily empty due to compaction
+ * of isolated pages.
   *
- * Return: number of pages that were added to the @pages list.
+ * Returns: number of pages that were added to the @pages list.
   */
  size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
 struct list_head *pages, size_t n_req_pages)
@@ -112,12 +113,13 @@ EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
  
  /*

   * balloon_page_alloc - allocates a new page for insertion into the balloon
- *   page list.
+ * page list.
   *
- * Driver must call it to properly allocate a new enlisted balloon page.
- * Driver must call balloon_page_enqueue before definitively removing it from
- * the guest system.  This function returns the page address for the recently
- * allocated page or NULL in the case we fail to allocate a new page this turn.
+ * Driver must call this function to properly allocate a new balloon page.
+ * Driver must call balloon_page_enqueue before definitively removing the page
+ * from the guest system.
+ *
+ * Returns: struct page for the allocated page or NULL on allocation failure.
   */
  struct page *balloon_page_alloc(void)
  {
@@ -130,19 +132,15 @@ EXPORT_SYMBOL_GPL(balloon_page_alloc);
  /*
   * balloon_page_enqueue - inserts a new page into the balloon page list.
   *
- * @b_dev_info: balloon device descriptor where we will insert a new page to
+ * @b_dev_info: balloon device descriptor where we will insert a new page
   * @page: new page to enqueue - allocated using balloon_page_alloc.
   *
- * Driver must call it to properly enqueue a new allocated balloon page
- * before definitively removing it from the guest system.
+ * Drivers must call this function to properly enqueue a new allocated balloon
+ * page before definitively removing the page from the guest system.
   *
- * Drivers must not call balloon_page_enqueue on pages that have been
- * pushed to a list with balloon_page_push before removing them with
- * balloon_page_pop. To all pages on a list, use balloon_page_list_enqueue
- * instead.
- *
- * This function returns the page address for the recently enqueued page or
- * NULL in the case we fail to allocate a new page this turn.
+ * Drivers must not call balloon_page_enqueue on pages that have been pushed to
+ * a list with balloon_page_push before removing them with balloon_page_pop. To
+ * enqueue a list of pages, use balloon_page_list_enqueue instead.
   */
  void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
  struct page *page)
@@ -157,14 +155,23 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
  
  /*

   * balloon_page_dequeue - removes a page from balloon's page list and returns
- *   the its address to allow the driver release the page.
+ *

[PATCH] virtio-net: parameterize min ring num_free for virtio receive

2019-07-24 Thread ? jiang
This change makes ring buffer reclaim threshold num_free configurable for 
better performance, while it's hard coded as 1/2 * queue now.
According to our test with qemu + dpdk, packet dropping happens when the guest 
is not able to provide free buffer in avail ring timely.
Smaller value of num_free does decrease the number of packet dropping during 
our test as it makes virtio_net reclaim buffer earlier.

At least, we should leave the value changeable to user while the default value 
as 1/2 * queue is kept.

Signed-off-by: jiangkidd 
---
 drivers/net/virtio_net.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0d4115c9e20b..bc190dec6084 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,9 @@
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
+static int min_numfree;
+module_param(min_numfree, int, 0444);
+
 static bool csum = true, gso = true, napi_tx;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
@@ -1315,6 +1318,9 @@ static int virtnet_receive(struct receive_queue *rq, int 
budget,
void *buf;
int i;
 
+   if (!min_numfree)
+   min_numfree = virtqueue_get_vring_size(rq->vq) / 2;
+
if (!vi->big_packets || vi->mergeable_rx_bufs) {
void *ctx;
 
@@ -1331,7 +1337,7 @@ static int virtnet_receive(struct receive_queue *rq, int 
budget,
}
}
 
-   if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
+   if (rq->vq->num_free > min_numfree) {
if (!try_fill_recv(vi, rq, GFP_ATOMIC))
schedule_delayed_work(>refill, 0);
}
-- 
2.11.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 11/12] Documentation/x86: repointer docs to Documentation/arch/

2019-07-24 Thread Alex Shi
Since we move Documentation/x86 docs to Documentation/arch/x86
dir, redirect the doc pointer to them.

Signed-off-by: Alex Shi 
Cc: Jonathan Corbet 
Cc: Tony Luck 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: Peter Zijlstra 
Cc: Changbin Du 
Cc: linux-...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org
Cc: platform-driver-...@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: net...@vger.kernel.org
Cc: linux-security-mod...@vger.kernel.org
---
 Documentation/admin-guide/hw-vuln/mds.rst|  2 +-
 Documentation/admin-guide/kernel-parameters.rst  |  6 +++---
 Documentation/admin-guide/kernel-parameters.txt  |  8 
 Documentation/admin-guide/ras.rst|  2 +-
 Documentation/arch/x86/x86_64/5level-paging.rst  |  2 +-
 Documentation/arch/x86/x86_64/boot-options.rst   |  4 ++--
 .../arch/x86/x86_64/fake-numa-for-cpusets.rst|  2 +-
 Documentation/devicetree/booting-without-of.txt  |  2 +-
 Documentation/sysctl/kernel.txt  |  4 ++--
 MAINTAINERS  |  4 ++--
 arch/arm/Kconfig |  2 +-
 arch/x86/Kconfig | 12 ++--
 arch/x86/Kconfig.debug   |  2 +-
 arch/x86/boot/header.S   |  2 +-
 arch/x86/entry/entry_64.S|  2 +-
 arch/x86/include/asm/bootparam_utils.h   |  2 +-
 arch/x86/include/asm/page_64_types.h |  2 +-
 arch/x86/include/asm/pgtable_64_types.h  |  2 +-
 arch/x86/kernel/cpu/microcode/amd.c  |  2 +-
 arch/x86/kernel/kexec-bzimage64.c|  2 +-
 arch/x86/kernel/pci-dma.c|  2 +-
 arch/x86/mm/tlb.c|  2 +-
 arch/x86/platform/pvh/enlighten.c|  2 +-
 drivers/vhost/vhost.c|  2 +-
 security/Kconfig |  2 +-
 tools/include/linux/err.h|  2 +-
 tools/objtool/Documentation/stack-validation.txt |  4 ++--
 27 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/mds.rst 
b/Documentation/admin-guide/hw-vuln/mds.rst
index e3a796c0d3a2..303228380fdc 100644
--- a/Documentation/admin-guide/hw-vuln/mds.rst
+++ b/Documentation/admin-guide/hw-vuln/mds.rst
@@ -58,7 +58,7 @@ Because the buffers are potentially shared between 
Hyper-Threads cross
 Hyper-Thread attacks are possible.
 
 Deeper technical information is available in the MDS specific x86
-architecture section: :ref:`Documentation/x86/mds.rst `.
+architecture section: :ref:`Documentation/arch/x86/mds.rst `.
 
 
 Attack scenarios
diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index dc283dcffae8..7c32484811c8 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -167,7 +167,7 @@ parameter is applicable::
X86-32  X86-32, aka i386 architecture is enabled.
X86-64  X86-64 architecture is enabled.
More X86-64 boot options can be found in
-   Documentation/x86/x86_64/boot-options.rst.
+   Documentation/arch/x86/x86_64/boot-options.rst.
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
X86_UV  SGI UV support is enabled.
XEN Xen support is enabled
@@ -181,10 +181,10 @@ In addition, the following text indicates that the 
option::
 Parameters denoted with BOOT are actually interpreted by the boot
 loader, and have no meaning to the kernel directly.
 Do not modify the syntax of boot loader parameters without extreme
-need or coordination with .
+need or coordination with .
 
 There are also arch-specific kernel-parameters not documented here.
-See for example .
+See for example .
 
 Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
 a trailing = on the name of any parameter states that that parameter will
diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 4ceb4691245b..d9eb5895ea9e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -963,7 +963,7 @@
for details.
 
nompx   [X86] Disables Intel Memory Protection Extensions.
-   See Documentation/x86/intel_mpx.rst for more
+   See Documentation/arch/x86/intel_mpx.rst for more
information about the feature.
 
nopku   [X86] Disable Memory Protection Keys CPU feature found
@@ -2380,7 +2380,7 @@
 
mce [X86-32] Machine Check Exception
 
-   mce=option  

Re: [PATCH v2 06/35] crypto: Use kmemdup rather than duplicating its implementation

2019-07-24 Thread Horia Geanta
On 7/3/2019 7:27 PM, Fuqian Huang wrote:
> kmemdup is introduced to duplicate a region of memory in a neat way.
> Rather than kmalloc/kzalloc + memcpy, which the programmer needs to
> write the size twice (sometimes lead to mistakes), kmemdup improves
> readability, leads to smaller code and also reduce the chances of mistakes.
> Suggestion to use kmemdup rather than using kmalloc/kzalloc + memcpy.
> 
> Signed-off-by: Fuqian Huang 
Reviewed-by: Horia Geantă 

Thanks,
Horia
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 06/35] crypto: Use kmemdup rather than duplicating its implementation

2019-07-24 Thread Fuqian Huang
kmemdup is introduced to duplicate a region of memory in a neat way.
Rather than kmalloc/kzalloc + memcpy, which the programmer needs to
write the size twice (sometimes lead to mistakes), kmemdup improves
readability, leads to smaller code and also reduce the chances of mistakes.
Suggestion to use kmemdup rather than using kmalloc/kzalloc + memcpy.

Signed-off-by: Fuqian Huang 
---
Changes in v2:
  - Fix a typo in commit message (memset -> memcpy)

 drivers/crypto/caam/caampkc.c  | 11 +++
 drivers/crypto/virtio/virtio_crypto_algs.c |  4 +---
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/caam/caampkc.c b/drivers/crypto/caam/caampkc.c
index fe24485274e1..a03464b4c019 100644
--- a/drivers/crypto/caam/caampkc.c
+++ b/drivers/crypto/caam/caampkc.c
@@ -816,7 +816,7 @@ static int caam_rsa_set_pub_key(struct crypto_akcipher 
*tfm, const void *key,
return ret;
 
/* Copy key in DMA zone */
-   rsa_key->e = kzalloc(raw_key.e_sz, GFP_DMA | GFP_KERNEL);
+   rsa_key->e = kmemdup(raw_key.e, raw_key.e_sz, GFP_DMA | GFP_KERNEL);
if (!rsa_key->e)
goto err;
 
@@ -838,8 +838,6 @@ static int caam_rsa_set_pub_key(struct crypto_akcipher 
*tfm, const void *key,
rsa_key->e_sz = raw_key.e_sz;
rsa_key->n_sz = raw_key.n_sz;
 
-   memcpy(rsa_key->e, raw_key.e, raw_key.e_sz);
-
return 0;
 err:
caam_rsa_free_key(rsa_key);
@@ -920,11 +918,11 @@ static int caam_rsa_set_priv_key(struct crypto_akcipher 
*tfm, const void *key,
return ret;
 
/* Copy key in DMA zone */
-   rsa_key->d = kzalloc(raw_key.d_sz, GFP_DMA | GFP_KERNEL);
+   rsa_key->d = kmemdup(raw_key.d, raw_key.d_sz, GFP_DMA | GFP_KERNEL);
if (!rsa_key->d)
goto err;
 
-   rsa_key->e = kzalloc(raw_key.e_sz, GFP_DMA | GFP_KERNEL);
+   rsa_key->e = kmemdup(raw_key.e, raw_key.e_sz, GFP_DMA | GFP_KERNEL);
if (!rsa_key->e)
goto err;
 
@@ -947,9 +945,6 @@ static int caam_rsa_set_priv_key(struct crypto_akcipher 
*tfm, const void *key,
rsa_key->e_sz = raw_key.e_sz;
rsa_key->n_sz = raw_key.n_sz;
 
-   memcpy(rsa_key->d, raw_key.d, raw_key.d_sz);
-   memcpy(rsa_key->e, raw_key.e, raw_key.e_sz);
-
caam_rsa_set_priv_key_form(ctx, _key);
 
return 0;
diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c 
b/drivers/crypto/virtio/virtio_crypto_algs.c
index 10f266d462d6..42d19205166b 100644
--- a/drivers/crypto/virtio/virtio_crypto_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_algs.c
@@ -129,13 +129,11 @@ static int virtio_crypto_alg_ablkcipher_init_session(
 * Avoid to do DMA from the stack, switch to using
 * dynamically-allocated for the key
 */
-   uint8_t *cipher_key = kmalloc(keylen, GFP_ATOMIC);
+   uint8_t *cipher_key = kmemdup(key, keylen, GFP_ATOMIC);
 
if (!cipher_key)
return -ENOMEM;
 
-   memcpy(cipher_key, key, keylen);
-
spin_lock(>ctrl_lock);
/* Pad ctrl header */
vcrypto->ctrl.header.opcode =
-- 
2.11.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 05/30] crypto: Use kmemdup rather than duplicating its implementation

2019-07-24 Thread Fuqian Huang
kmemdup is introduced to duplicate a region of memory in a neat way.
Rather than kmalloc/kzalloc + memset, which the programmer needs to
write the size twice (sometimes lead to mistakes), kmemdup improves
readability, leads to smaller code and also reduce the chances of mistakes.
Suggestion to use kmemdup rather than using kmalloc/kzalloc + memset.

Signed-off-by: Fuqian Huang 
---
 drivers/crypto/caam/caampkc.c  | 11 +++
 drivers/crypto/virtio/virtio_crypto_algs.c |  4 +---
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/caam/caampkc.c b/drivers/crypto/caam/caampkc.c
index fe24485274e1..a03464b4c019 100644
--- a/drivers/crypto/caam/caampkc.c
+++ b/drivers/crypto/caam/caampkc.c
@@ -816,7 +816,7 @@ static int caam_rsa_set_pub_key(struct crypto_akcipher 
*tfm, const void *key,
return ret;
 
/* Copy key in DMA zone */
-   rsa_key->e = kzalloc(raw_key.e_sz, GFP_DMA | GFP_KERNEL);
+   rsa_key->e = kmemdup(raw_key.e, raw_key.e_sz, GFP_DMA | GFP_KERNEL);
if (!rsa_key->e)
goto err;
 
@@ -838,8 +838,6 @@ static int caam_rsa_set_pub_key(struct crypto_akcipher 
*tfm, const void *key,
rsa_key->e_sz = raw_key.e_sz;
rsa_key->n_sz = raw_key.n_sz;
 
-   memcpy(rsa_key->e, raw_key.e, raw_key.e_sz);
-
return 0;
 err:
caam_rsa_free_key(rsa_key);
@@ -920,11 +918,11 @@ static int caam_rsa_set_priv_key(struct crypto_akcipher 
*tfm, const void *key,
return ret;
 
/* Copy key in DMA zone */
-   rsa_key->d = kzalloc(raw_key.d_sz, GFP_DMA | GFP_KERNEL);
+   rsa_key->d = kmemdup(raw_key.d, raw_key.d_sz, GFP_DMA | GFP_KERNEL);
if (!rsa_key->d)
goto err;
 
-   rsa_key->e = kzalloc(raw_key.e_sz, GFP_DMA | GFP_KERNEL);
+   rsa_key->e = kmemdup(raw_key.e, raw_key.e_sz, GFP_DMA | GFP_KERNEL);
if (!rsa_key->e)
goto err;
 
@@ -947,9 +945,6 @@ static int caam_rsa_set_priv_key(struct crypto_akcipher 
*tfm, const void *key,
rsa_key->e_sz = raw_key.e_sz;
rsa_key->n_sz = raw_key.n_sz;
 
-   memcpy(rsa_key->d, raw_key.d, raw_key.d_sz);
-   memcpy(rsa_key->e, raw_key.e, raw_key.e_sz);
-
caam_rsa_set_priv_key_form(ctx, _key);
 
return 0;
diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c 
b/drivers/crypto/virtio/virtio_crypto_algs.c
index 10f266d462d6..42d19205166b 100644
--- a/drivers/crypto/virtio/virtio_crypto_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_algs.c
@@ -129,13 +129,11 @@ static int virtio_crypto_alg_ablkcipher_init_session(
 * Avoid to do DMA from the stack, switch to using
 * dynamically-allocated for the key
 */
-   uint8_t *cipher_key = kmalloc(keylen, GFP_ATOMIC);
+   uint8_t *cipher_key = kmemdup(key, keylen, GFP_ATOMIC);
 
if (!cipher_key)
return -ENOMEM;
 
-   memcpy(cipher_key, key, keylen);
-
spin_lock(>ctrl_lock);
/* Pad ctrl header */
vcrypto->ctrl.header.opcode =
-- 
2.11.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] virtio-mmio: add error check for platform_get_irq

2019-07-24 Thread Ihor Matushchak
in vm_find_vqs() irq has a wrong type
so, in case of no IRQ resource defined,
wrong parameter will be passed to request_irq()

Signed-off-by: Ihor Matushchak 
---
 drivers/virtio/virtio_mmio.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index f363fbeb5ab0..60dde8ed163b 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -463,9 +463,14 @@ static int vm_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
   struct irq_affinity *desc)
 {
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
-   unsigned int irq = platform_get_irq(vm_dev->pdev, 0);
+   int irq = platform_get_irq(vm_dev->pdev, 0);
int i, err, queue_idx = 0;
 
+   if (irq < 0) {
+   dev_err(>dev, "no IRQ resource defined\n");
+   return -ENODEV;
+   }
+
err = request_irq(irq, vm_interrupt, IRQF_SHARED,
dev_name(>dev), vm_dev);
if (err)
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: What are all the different virtualization solutions for Linux?

2019-07-24 Thread iggy
Saying `kvm` is a pretty loaded term. There are at least 5 frontends for it 
that I know of (qemu, firecracker, nemu, cloud-hypervisor, crosvm, etc).

Are you looking for strictly open source options?



> On Jul 2, 2019, at 5:56 AM, Turritopsis Dohrnii Teo En Ming 
>  wrote:
> 
> Good evening from Singapore,
> 
> What are all the different virtualization solutions for Linux? I can think of:
> 
> 1. open source Xen project
> 
> 2. linux-kvm
> 
> Are there any others? Is there a comprehensive list of all virtualization 
> solutions for Linux?
> 
> Thank you.
> 
> 
> 
> 
> -BEGIN EMAIL SIGNATURE-
> 
> The Gospel for all Targeted Individuals (TIs):
> 
> [The New York Times] Microwave Weapons Are Prime Suspect in Ills of
> U.S. Embassy Workers
> 
> Link: 
> https://www.nytimes.com/2018/09/01/science/sonic-attack-cuba-microwave.html
> 
> 
> 
> Singaporean Mr. Turritopsis Dohrnii Teo En Ming's Academic
> Qualifications as at 14 Feb 2019
> 
> [1] https://tdtemcerts.wordpress.com/
> 
> [2] https://tdtemcerts.blogspot.sg/
> 
> [3] https://www.scribd.com/user/270125049/Teo-En-Ming
> 
> -END EMAIL SIGNATURE-
> 
> ___
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2] virtio-mmio: add error check for platform_get_irq

2019-07-24 Thread Ihor Matushchak
in vm_find_vqs() irq has a wrong type
so, in case of no IRQ resource defined,
wrong parameter will be passed to request_irq()

Signed-off-by: Ihor Matushchak 
---
Changes in v2:
Don't overwrite error code value.

 drivers/virtio/virtio_mmio.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index f363fbeb5ab0..e09edb5c5e06 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -463,9 +463,14 @@ static int vm_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
   struct irq_affinity *desc)
 {
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
-   unsigned int irq = platform_get_irq(vm_dev->pdev, 0);
+   int irq = platform_get_irq(vm_dev->pdev, 0);
int i, err, queue_idx = 0;
 
+   if (irq < 0) {
+   dev_err(>dev, "Cannot get IRQ resource\n");
+   return irq;
+   }
+
err = request_irq(irq, vm_interrupt, IRQF_SHARED,
dev_name(>dev), vm_dev);
if (err)
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] virtio-mmio: add error check for platform_get_irq

2019-07-24 Thread Ivan T. Ivanov
Quoting Ihor Matushchak (2019-07-02 17:48:18)
> in vm_find_vqs() irq has a wrong type
> so, in case of no IRQ resource defined,
> wrong parameter will be passed to request_irq()
> 
> Signed-off-by: Ihor Matushchak 


Reviewed-by: Ivan T. Ivanov 

Thanks!

> ---
> Changes in v2:
> Don't overwrite error code value.
> 
>  drivers/virtio/virtio_mmio.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index f363fbeb5ab0..e09edb5c5e06 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -463,9 +463,14 @@ static int vm_find_vqs(struct virtio_device *vdev, 
> unsigned nvqs,
>struct irq_affinity *desc)
>  {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> -   unsigned int irq = platform_get_irq(vm_dev->pdev, 0);
> +   int irq = platform_get_irq(vm_dev->pdev, 0);
> int i, err, queue_idx = 0;
>  
> +   if (irq < 0) {
> +   dev_err(>dev, "Cannot get IRQ resource\n");
> +   return irq;
> +   }
> +
> err = request_irq(irq, vm_interrupt, IRQF_SHARED,
> dev_name(>dev), vm_dev);
> if (err)
> -- 
> 2.17.1
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio-mmio: add error check for platform_get_irq

2019-07-24 Thread Ivan T. Ivanov


Hi,

Quoting Ihor Matushchak (2019-07-02 12:59:18)
> in vm_find_vqs() irq has a wrong type
> so, in case of no IRQ resource defined,
> wrong parameter will be passed to request_irq()
> 
> Signed-off-by: Ihor Matushchak 
> ---
>  drivers/virtio/virtio_mmio.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index f363fbeb5ab0..60dde8ed163b 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -463,9 +463,14 @@ static int vm_find_vqs(struct virtio_device *vdev, 
> unsigned nvqs,
>struct irq_affinity *desc)
>  {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> -   unsigned int irq = platform_get_irq(vm_dev->pdev, 0);
> +   int irq = platform_get_irq(vm_dev->pdev, 0);
> int i, err, queue_idx = 0;
>  
> +   if (irq < 0) {
> +   dev_err(>dev, "no IRQ resource defined\n");
> +   return -ENODEV;

Don't overwrite error code value. Just return it as it is.

Regards,
Ivan

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: memory leak in vhost_net_ioctl

2019-07-24 Thread Jeremy Sowden
On 2019-06-13, at 20:04:01 -0700, syzbot wrote:
> syzbot has tested the proposed patch but the reproducer still
> triggered crash: memory leak in batadv_tvlv_handler_register

There's already a fix for this batman leak:

  https://lore.kernel.org/netdev/17d64c058965f...@google.com/
  https://www.open-mesh.org/issues/378

>   484.626788][  T156] bond0 (unregistering): Releasing backup
>   interface bond_slave_1
> Warning: Permanently added '10.128.0.87' (ECDSA) to the list of known
> hosts.
> BUG: memory leak
> unreferenced object 0x88811d25c4c0 (size 64):
>   comm "softirq", pid 0, jiffies 4294943668 (age 434.830s)
>   hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 e0 fc 5b 20 81 88 ff ff  ..[ 
> 00 00 00 00 00 00 00 00 20 91 15 83 ff ff ff ff   ...
>   backtrace:
> [<0045bc9d>] kmemleak_alloc_recursive
> include/linux/kmemleak.h:43 [inline]
> [<0045bc9d>] slab_post_alloc_hook mm/slab.h:439 [inline]
> [<0045bc9d>] slab_alloc mm/slab.c:3326 [inline]
> [<0045bc9d>] kmem_cache_alloc_trace+0x13d/0x280
> mm/slab.c:3553
> [<197d773e>] kmalloc include/linux/slab.h:547 [inline]
> [<197d773e>] kzalloc include/linux/slab.h:742 [inline]
> [<197d773e>] batadv_tvlv_handler_register+0xae/0x140
> net/batman-adv/tvlv.c:529
> [] batadv_tt_init+0x78/0x180
> net/batman-adv/translation-table.c:4411
> [<8c50839d>] batadv_mesh_init+0x196/0x230
> net/batman-adv/main.c:208
> [<1c5a74a3>] batadv_softif_init_late+0x1ca/0x220
> net/batman-adv/soft-interface.c:861
> [<4e676cd1>] register_netdevice+0xbf/0x600
> net/core/dev.c:8635
> [<5601497b>] __rtnl_newlink+0xaca/0xb30
> net/core/rtnetlink.c:3199
> [] rtnl_newlink+0x4e/0x80
> net/core/rtnetlink.c:3245
> [] rtnetlink_rcv_msg+0x178/0x4b0
> net/core/rtnetlink.c:5214
> [<140451f6>] netlink_rcv_skb+0x61/0x170
> net/netlink/af_netlink.c:2482
> [<237e38f7>] rtnetlink_rcv+0x1d/0x30
> net/core/rtnetlink.c:5232
> [<0d47c000>] netlink_unicast_kernel
> net/netlink/af_netlink.c:1307 [inline]
> [<0d47c000>] netlink_unicast+0x1ec/0x2d0
> net/netlink/af_netlink.c:1333
> [<98503d79>] netlink_sendmsg+0x26a/0x480
> net/netlink/af_netlink.c:1922
> [<9263e868>] sock_sendmsg_nosec net/socket.c:646 [inline]
> [<9263e868>] sock_sendmsg+0x54/0x70 net/socket.c:665
> [<7791ad47>] __sys_sendto+0x148/0x1f0 net/socket.c:1958
> [] __do_sys_sendto net/socket.c:1970 [inline]
> [] __se_sys_sendto net/socket.c:1966 [inline]
> [] __x64_sys_sendto+0x2a/0x30 net/socket.c:1966
>
> BUG: memory leak
> unreferenced object 0x8881024a3340 (size 64):
>   comm "softirq", pid 0, jiffies 4294943678 (age 434.730s)
>   hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 e0 2c 66 04 81 88 ff ff  .,f.
> 00 00 00 00 00 00 00 00 20 91 15 83 ff ff ff ff   ...
>   backtrace:
> [<0045bc9d>] kmemleak_alloc_recursive
> include/linux/kmemleak.h:43 [inline]
> [<0045bc9d>] slab_post_alloc_hook mm/slab.h:439 [inline]
> [<0045bc9d>] slab_alloc mm/slab.c:3326 [inline]
> [<0045bc9d>] kmem_cache_alloc_trace+0x13d/0x280
> mm/slab.c:3553
> [<197d773e>] kmalloc include/linux/slab.h:547 [inline]
> [<197d773e>] kzalloc include/linux/slab.h:742 [inline]
> [<197d773e>] batadv_tvlv_handler_register+0xae/0x140
> net/batman-adv/tvlv.c:529
> [] batadv_tt_init+0x78/0x180
> net/batman-adv/translation-table.c:4411
> [<8c50839d>] batadv_mesh_init+0x196/0x230
> net/batman-adv/main.c:208
> [<1c5a74a3>] batadv_softif_init_late+0x1ca/0x220
> net/batman-adv/soft-interface.c:861
> [<4e676cd1>] register_netdevice+0xbf/0x600
> net/core/dev.c:8635
> [<5601497b>] __rtnl_newlink+0xaca/0xb30
> net/core/rtnetlink.c:3199
> [] rtnl_newlink+0x4e/0x80
> net/core/rtnetlink.c:3245
> [] rtnetlink_rcv_msg+0x178/0x4b0
> net/core/rtnetlink.c:5214
> [<140451f6>] netlink_rcv_skb+0x61/0x170
> net/netlink/af_netlink.c:2482
> [<237e38f7>] rtnetlink_rcv+0x1d/0x30
> net/core/rtnetlink.c:5232
> [<0d47c000>] netlink_unicast_kernel
> net/netlink/af_netlink.c:1307 [inline]
> [<0d47c000>] netlink_unicast+0x1ec/0x2d0
> net/netlink/af_netlink.c:1333
> [<98503d79>] netlink_sendmsg+0x26a/0x480
> net/netlink/af_netlink.c:1922
> [<9263e868>] sock_sendmsg_nosec net/socket.c:646 [inline]
> [<9263e868>] sock_sendmsg+0x54/0x70 net/socket.c:665
> [<7791ad47>] __sys_sendto+0x148/0x1f0 net/socket.c:1958
> [] __do_sys_sendto net/socket.c:1970 [inline]
> 

[PATCH v4 3/5] iommu/dma-iommu: Handle deferred devices

2019-07-24 Thread Tom Murphy
Handle devices which defer their attach to the iommu in the dma-iommu api

Signed-off-by: Tom Murphy 
---
 drivers/iommu/dma-iommu.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index e64dbbcde63c..f303bbe20e51 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct iommu_dma_msi_page {
struct list_headlist;
@@ -351,6 +352,21 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
return iova_reserve_iommu_regions(dev, domain);
 }
 
+static int handle_deferred_device(struct device *dev,
+   struct iommu_domain *domain)
+{
+   const struct iommu_ops *ops = domain->ops;
+
+   if (!is_kdump_kernel())
+   return 0;
+
+   if (unlikely(ops->is_attach_deferred &&
+   ops->is_attach_deferred(domain, dev)))
+   return iommu_attach_device(domain, dev);
+
+   return 0;
+}
+
 /**
  * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
  *page flags.
@@ -462,6 +478,9 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
size_t iova_off = 0;
dma_addr_t iova;
 
+   if (unlikely(handle_deferred_device(dev, domain)))
+   return DMA_MAPPING_ERROR;
+
if (cookie->type == IOMMU_DMA_IOVA_COOKIE) {
iova_off = iova_offset(>iovad, phys);
size = iova_align(>iovad, size + iova_off);
@@ -583,6 +602,9 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
 
*dma_handle = DMA_MAPPING_ERROR;
 
+   if (unlikely(handle_deferred_device(dev, domain)))
+   return NULL;
+
min_size = alloc_sizes & -alloc_sizes;
if (min_size < PAGE_SIZE) {
min_size = PAGE_SIZE;
@@ -715,7 +737,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, 
struct page *page,
int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dma_handle;
 
-   dma_handle =__iommu_dma_map(dev, phys, size, prot);
+   dma_handle = __iommu_dma_map(dev, phys, size, prot);
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
dma_handle != DMA_MAPPING_ERROR)
arch_sync_dma_for_device(dev, phys, size, dir);
@@ -825,6 +847,9 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
unsigned long mask = dma_get_seg_boundary(dev);
int i;
 
+   if (unlikely(handle_deferred_device(dev, domain)))
+   return 0;
+
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
 
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: memory leak in vhost_net_ioctl

2019-07-24 Thread syzbot

Hello,

syzbot has tested the proposed patch but the reproducer still triggered  
crash:

memory leak in batadv_tvlv_handler_register

  484.626788][  T156] bond0 (unregistering): Releasing backup interface  
bond_slave_1

Warning: Permanently added '10.128.0.87' (ECDSA) to the list of known hosts.
BUG: memory leak
unreferenced object 0x88811d25c4c0 (size 64):
  comm "softirq", pid 0, jiffies 4294943668 (age 434.830s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 e0 fc 5b 20 81 88 ff ff  ..[ 
00 00 00 00 00 00 00 00 20 91 15 83 ff ff ff ff   ...
  backtrace:
[<0045bc9d>] kmemleak_alloc_recursive  
include/linux/kmemleak.h:43 [inline]

[<0045bc9d>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<0045bc9d>] slab_alloc mm/slab.c:3326 [inline]
[<0045bc9d>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
[<197d773e>] kmalloc include/linux/slab.h:547 [inline]
[<197d773e>] kzalloc include/linux/slab.h:742 [inline]
[<197d773e>] batadv_tvlv_handler_register+0xae/0x140  
net/batman-adv/tvlv.c:529
[] batadv_tt_init+0x78/0x180  
net/batman-adv/translation-table.c:4411
[<8c50839d>] batadv_mesh_init+0x196/0x230  
net/batman-adv/main.c:208
[<1c5a74a3>] batadv_softif_init_late+0x1ca/0x220  
net/batman-adv/soft-interface.c:861

[<4e676cd1>] register_netdevice+0xbf/0x600 net/core/dev.c:8635
[<5601497b>] __rtnl_newlink+0xaca/0xb30  
net/core/rtnetlink.c:3199

[] rtnl_newlink+0x4e/0x80 net/core/rtnetlink.c:3245
[] rtnetlink_rcv_msg+0x178/0x4b0  
net/core/rtnetlink.c:5214
[<140451f6>] netlink_rcv_skb+0x61/0x170  
net/netlink/af_netlink.c:2482

[<237e38f7>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5232
[<0d47c000>] netlink_unicast_kernel  
net/netlink/af_netlink.c:1307 [inline]
[<0d47c000>] netlink_unicast+0x1ec/0x2d0  
net/netlink/af_netlink.c:1333
[<98503d79>] netlink_sendmsg+0x26a/0x480  
net/netlink/af_netlink.c:1922

[<9263e868>] sock_sendmsg_nosec net/socket.c:646 [inline]
[<9263e868>] sock_sendmsg+0x54/0x70 net/socket.c:665
[<7791ad47>] __sys_sendto+0x148/0x1f0 net/socket.c:1958
[] __do_sys_sendto net/socket.c:1970 [inline]
[] __se_sys_sendto net/socket.c:1966 [inline]
[] __x64_sys_sendto+0x2a/0x30 net/socket.c:1966

BUG: memory leak
unreferenced object 0x8881024a3340 (size 64):
  comm "softirq", pid 0, jiffies 4294943678 (age 434.730s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 e0 2c 66 04 81 88 ff ff  .,f.
00 00 00 00 00 00 00 00 20 91 15 83 ff ff ff ff   ...
  backtrace:
[<0045bc9d>] kmemleak_alloc_recursive  
include/linux/kmemleak.h:43 [inline]

[<0045bc9d>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<0045bc9d>] slab_alloc mm/slab.c:3326 [inline]
[<0045bc9d>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
[<197d773e>] kmalloc include/linux/slab.h:547 [inline]
[<197d773e>] kzalloc include/linux/slab.h:742 [inline]
[<197d773e>] batadv_tvlv_handler_register+0xae/0x140  
net/batman-adv/tvlv.c:529
[] batadv_tt_init+0x78/0x180  
net/batman-adv/translation-table.c:4411
[<8c50839d>] batadv_mesh_init+0x196/0x230  
net/batman-adv/main.c:208
[<1c5a74a3>] batadv_softif_init_late+0x1ca/0x220  
net/batman-adv/soft-interface.c:861

[<4e676cd1>] register_netdevice+0xbf/0x600 net/core/dev.c:8635
[<5601497b>] __rtnl_newlink+0xaca/0xb30  
net/core/rtnetlink.c:3199

[] rtnl_newlink+0x4e/0x80 net/core/rtnetlink.c:3245
[] rtnetlink_rcv_msg+0x178/0x4b0  
net/core/rtnetlink.c:5214
[<140451f6>] netlink_rcv_skb+0x61/0x170  
net/netlink/af_netlink.c:2482

[<237e38f7>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5232
[<0d47c000>] netlink_unicast_kernel  
net/netlink/af_netlink.c:1307 [inline]
[<0d47c000>] netlink_unicast+0x1ec/0x2d0  
net/netlink/af_netlink.c:1333
[<98503d79>] netlink_sendmsg+0x26a/0x480  
net/netlink/af_netlink.c:1922

[<9263e868>] sock_sendmsg_nosec net/socket.c:646 [inline]
[<9263e868>] sock_sendmsg+0x54/0x70 net/socket.c:665
[<7791ad47>] __sys_sendto+0x148/0x1f0 net/socket.c:1958
[] __do_sys_sendto net/socket.c:1970 [inline]
[] __se_sys_sendto net/socket.c:1966 [inline]
[] __x64_sys_sendto+0x2a/0x30 net/socket.c:1966

BUG: memory leak
unreferenced object 0x888108a71b80 (size 128):
  comm "syz-executor.3", pid 7367, jiffies 4294943696 (age 434.550s)
  hex dump (first 32 bytes):
f0 f8 bf 02 81 88 ff ff f0 f8 bf 

[PATCH v4 2/5] iommu: Add gfp parameter to iommu_ops::map

2019-07-24 Thread Tom Murphy
Add a gfp_t parameter to the iommu_ops::map function.
Remove the needless locking in the AMD iommu driver.

The iommu_ops::map function (or the iommu_map function which calls it)
was always supposed to be sleepable (according to Joerg's comment in
this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so
should probably have had a "might_sleep()" since it was written. However
currently the dma-iommu api can call iommu_map in an atomic context,
which it shouldn't do. This doesn't cause any problems because any iommu
driver which uses the dma-iommu api uses gfp_atomic in it's
iommu_ops::map function. But doing this wastes the memory allocators
atomic pools.

Signed-off-by: Tom Murphy 
---
 drivers/iommu/amd_iommu.c  |  3 ++-
 drivers/iommu/arm-smmu-v3.c|  2 +-
 drivers/iommu/arm-smmu.c   |  2 +-
 drivers/iommu/dma-iommu.c  |  6 ++---
 drivers/iommu/exynos-iommu.c   |  2 +-
 drivers/iommu/intel-iommu.c|  2 +-
 drivers/iommu/iommu.c  | 43 +-
 drivers/iommu/ipmmu-vmsa.c |  2 +-
 drivers/iommu/msm_iommu.c  |  2 +-
 drivers/iommu/mtk_iommu.c  |  2 +-
 drivers/iommu/mtk_iommu_v1.c   |  2 +-
 drivers/iommu/omap-iommu.c |  2 +-
 drivers/iommu/qcom_iommu.c |  2 +-
 drivers/iommu/rockchip-iommu.c |  2 +-
 drivers/iommu/s390-iommu.c |  2 +-
 drivers/iommu/tegra-gart.c |  2 +-
 drivers/iommu/tegra-smmu.c |  2 +-
 drivers/iommu/virtio-iommu.c   |  2 +-
 include/linux/iommu.h  | 21 -
 19 files changed, 77 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 065639e090fe..fd8da60f7359 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3030,7 +3030,8 @@ static int amd_iommu_attach_device(struct iommu_domain 
*dom,
 }
 
 static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
-phys_addr_t paddr, size_t page_size, int iommu_prot)
+phys_addr_t paddr, size_t page_size, int iommu_prot,
+gfp_t gfp)
 {
struct protection_domain *domain = to_pdomain(dom);
int prot = 0;
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4d5a694f02c2..66dee90877d7 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1964,7 +1964,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
 }
 
 static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
-   phys_addr_t paddr, size_t size, int prot)
+   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 5aeb1dbfaa08..f33ab7ef9049 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1277,7 +1277,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
 }
 
 static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
-   phys_addr_t paddr, size_t size, int prot)
+   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0dee374fc64a..e64dbbcde63c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -471,7 +471,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
if (!iova)
return DMA_MAPPING_ERROR;
 
-   if (iommu_map(domain, iova, phys - iova_off, size, prot)) {
+   if (iommu_map_atomic(domain, iova, phys - iova_off, size, prot)) {
iommu_dma_free_iova(cookie, iova, size);
return DMA_MAPPING_ERROR;
}
@@ -615,7 +615,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
arch_dma_prep_coherent(sg_page(sg), sg->length);
}
 
-   if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, ioprot)
+   if (iommu_map_sg_atomic(domain, iova, sgt.sgl, sgt.orig_nents, ioprot)
< size)
goto out_free_sg;
 
@@ -875,7 +875,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
 * We'll leave any physical concatenation to the IOMMU driver's
 * implementation - it knows better than we do.
 */
-   if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
+   if (iommu_map_sg_atomic(domain, iova, sg, nents, prot) < iova_len)
goto out_free_iova;
 
return __finalise_sg(dev, sg, nents, iova);
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 05c6bc099d62..46414234c179 100644
--- a/drivers/iommu/exynos-iommu.c
+++ 

[PATCH v4 5/5] iommu/amd: Convert AMD iommu driver to the dma-iommu api

2019-07-24 Thread Tom Murphy
Convert the AMD iommu driver to the dma-iommu api. Remove the iova
handling and reserve region code from the AMD iommu driver.

Signed-off-by: Tom Murphy 
---
 drivers/iommu/Kconfig |   1 +
 drivers/iommu/amd_iommu.c | 677 --
 2 files changed, 68 insertions(+), 610 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d47913883d1e..19f966db02a8 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -138,6 +138,7 @@ config AMD_IOMMU
select PCI_PASID
select IOMMU_API
select IOMMU_IOVA
+   select IOMMU_DMA
depends on X86_64 && PCI && ACPI
---help---
  With this option you can enable support for AMD IOMMU hardware in
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index fd8da60f7359..ed881c2d8a6b 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -89,8 +90,6 @@ const struct iommu_ops amd_iommu_ops;
 static ATOMIC_NOTIFIER_HEAD(ppr_notifier);
 int amd_iommu_max_glx_val = -1;
 
-static const struct dma_map_ops amd_iommu_dma_ops;
-
 /*
  * general struct to manage commands send to an IOMMU
  */
@@ -103,21 +102,6 @@ struct kmem_cache *amd_iommu_irq_cache;
 static void update_domain(struct protection_domain *domain);
 static int protection_domain_init(struct protection_domain *domain);
 static void detach_device(struct device *dev);
-static void iova_domain_flush_tlb(struct iova_domain *iovad);
-
-/*
- * Data container for a dma_ops specific protection domain
- */
-struct dma_ops_domain {
-   /* generic protection domain information */
-   struct protection_domain domain;
-
-   /* IOVA RB-Tree */
-   struct iova_domain iovad;
-};
-
-static struct iova_domain reserved_iova_ranges;
-static struct lock_class_key reserved_rbtree_key;
 
 /
  *
@@ -188,12 +172,6 @@ static struct protection_domain *to_pdomain(struct 
iommu_domain *dom)
return container_of(dom, struct protection_domain, domain);
 }
 
-static struct dma_ops_domain* to_dma_ops_domain(struct protection_domain 
*domain)
-{
-   BUG_ON(domain->flags != PD_DMA_OPS_MASK);
-   return container_of(domain, struct dma_ops_domain, domain);
-}
-
 static struct iommu_dev_data *alloc_dev_data(u16 devid)
 {
struct iommu_dev_data *dev_data;
@@ -1267,12 +1245,6 @@ static void domain_flush_pages(struct protection_domain 
*domain,
__domain_flush_pages(domain, address, size, 0);
 }
 
-/* Flush the whole IO/TLB for a given protection domain */
-static void domain_flush_tlb(struct protection_domain *domain)
-{
-   __domain_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 0);
-}
-
 /* Flush the whole IO/TLB for a given protection domain - including PDE */
 static void domain_flush_tlb_pde(struct protection_domain *domain)
 {
@@ -1674,43 +1646,6 @@ static unsigned long iommu_unmap_page(struct 
protection_domain *dom,
return unmapped;
 }
 
-/
- *
- * The next functions belong to the address allocator for the dma_ops
- * interface functions.
- *
- /
-
-
-static unsigned long dma_ops_alloc_iova(struct device *dev,
-   struct dma_ops_domain *dma_dom,
-   unsigned int pages, u64 dma_mask)
-{
-   unsigned long pfn = 0;
-
-   pages = __roundup_pow_of_two(pages);
-
-   if (dma_mask > DMA_BIT_MASK(32))
-   pfn = alloc_iova_fast(_dom->iovad, pages,
- IOVA_PFN(DMA_BIT_MASK(32)), false);
-
-   if (!pfn)
-   pfn = alloc_iova_fast(_dom->iovad, pages,
- IOVA_PFN(dma_mask), true);
-
-   return (pfn << PAGE_SHIFT);
-}
-
-static void dma_ops_free_iova(struct dma_ops_domain *dma_dom,
- unsigned long address,
- unsigned int pages)
-{
-   pages = __roundup_pow_of_two(pages);
-   address >>= PAGE_SHIFT;
-
-   free_iova_fast(_dom->iovad, address, pages);
-}
-
 /
  *
  * The next functions belong to the domain allocation. A domain is
@@ -1787,38 +1722,23 @@ static void free_gcr3_table(struct protection_domain 
*domain)
free_page((unsigned long)domain->gcr3_tbl);
 }
 
-static void dma_ops_domain_flush_tlb(struct dma_ops_domain *dom)
-{
-   domain_flush_tlb(>domain);
-   domain_flush_complete(>domain);
-}
-
-static void iova_domain_flush_tlb(struct iova_domain *iovad)
-{
-   struct dma_ops_domain *dom;
-
-   dom = container_of(iovad, struct dma_ops_domain, iovad);
-
-   dma_ops_domain_flush_tlb(dom);
-}

[PATCH v4 4/5] iommu/dma-iommu: Use the dev->coherent_dma_mask

2019-07-24 Thread Tom Murphy
Use the dev->coherent_dma_mask when allocating in the dma-iommu ops api.

Signed-off-by: Tom Murphy 
---
 drivers/iommu/dma-iommu.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f303bbe20e51..082fb789e3cf 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -471,7 +471,7 @@ static void __iommu_dma_unmap(struct device *dev, 
dma_addr_t dma_addr,
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-   size_t size, int prot)
+   size_t size, int prot, dma_addr_t dma_mask)
 {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
@@ -486,7 +486,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
size = iova_align(>iovad, size + iova_off);
}
 
-   iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
+   iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev);
if (!iova)
return DMA_MAPPING_ERROR;
 
@@ -737,7 +737,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, 
struct page *page,
int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dma_handle;
 
-   dma_handle = __iommu_dma_map(dev, phys, size, prot);
+   dma_handle = __iommu_dma_map(dev, phys, size, prot, dma_get_mask(dev));
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
dma_handle != DMA_MAPPING_ERROR)
arch_sync_dma_for_device(dev, phys, size, dir);
@@ -940,7 +940,8 @@ static dma_addr_t iommu_dma_map_resource(struct device 
*dev, phys_addr_t phys,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
return __iommu_dma_map(dev, phys, size,
-   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO);
+   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
+   dma_get_mask(dev));
 }
 
 static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
@@ -1049,7 +1050,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
if (!cpu_addr)
return NULL;
 
-   *handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot);
+   *handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot,
+   dev->coherent_dma_mask);
if (*handle == DMA_MAPPING_ERROR) {
__iommu_dma_free(dev, size, cpu_addr);
return NULL;
@@ -1178,7 +1180,7 @@ static struct iommu_dma_msi_page 
*iommu_dma_get_msi_page(struct device *dev,
if (!msi_page)
return NULL;
 
-   iova = __iommu_dma_map(dev, msi_addr, size, prot);
+   iova = __iommu_dma_map(dev, msi_addr, size, prot, dma_get_mask(dev));
if (iova == DMA_MAPPING_ERROR)
goto out_free_page;
 
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 1/5] iommu/amd: Remove unnecessary locking from AMD iommu driver

2019-07-24 Thread Tom Murphy
We can remove the mutex lock from amd_iommu_map and amd_iommu_unmap.
iommu_map doesn’t lock while mapping and so no two calls should touch
the same iova range. The AMD driver already handles the page table page
allocations without locks so we can safely remove the locks.

Signed-off-by: Tom Murphy 
---
 drivers/iommu/amd_iommu.c   | 10 +-
 drivers/iommu/amd_iommu_types.h |  1 -
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 73740b969e62..065639e090fe 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2858,7 +2858,6 @@ static void protection_domain_free(struct 
protection_domain *domain)
 static int protection_domain_init(struct protection_domain *domain)
 {
spin_lock_init(>lock);
-   mutex_init(>api_lock);
domain->id = domain_id_alloc();
if (!domain->id)
return -ENOMEM;
@@ -3045,9 +3044,7 @@ static int amd_iommu_map(struct iommu_domain *dom, 
unsigned long iova,
if (iommu_prot & IOMMU_WRITE)
prot |= IOMMU_PROT_IW;
 
-   mutex_lock(>api_lock);
ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
-   mutex_unlock(>api_lock);
 
domain_flush_np_cache(domain, iova, page_size);
 
@@ -3058,16 +3055,11 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, 
unsigned long iova,
   size_t page_size)
 {
struct protection_domain *domain = to_pdomain(dom);
-   size_t unmap_size;
 
if (domain->mode == PAGE_MODE_NONE)
return 0;
 
-   mutex_lock(>api_lock);
-   unmap_size = iommu_unmap_page(domain, iova, page_size);
-   mutex_unlock(>api_lock);
-
-   return unmap_size;
+   return iommu_unmap_page(domain, iova, page_size);
 }
 
 static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 52c35d557fad..5d5f5d009b19 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -461,7 +461,6 @@ struct protection_domain {
struct iommu_domain domain; /* generic domain handle used by
   iommu core code */
spinlock_t lock;/* mostly used to lock the page table*/
-   struct mutex api_lock;  /* protect page tables in the iommu-api path */
u16 id; /* the domain id written to the device table */
int mode;   /* paging mode (0-6 levels) */
u64 *pt_root;   /* page table root pointer */
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH v4 0/5] iommu/amd: Convert the AMD iommu driver to the dma-iommu api

2019-07-24 Thread Tom Murphy
Convert the AMD iommu driver to the dma-iommu api. Remove the iova
handling and reserve region code from the AMD iommu driver.

Change-log:
V4:
-Rebase on top of linux-next
-Split the removing of the unnecessary locking in the amd iommu driver into a 
seperate patch
-refactor the "iommu/dma-iommu: Handle deferred devices" patch and address 
comments
v3:
-rename dma_limit to dma_mask
-exit handle_deferred_device early if (!is_kdump_kernel())
-remove pointless calls to handle_deferred_device
v2:
-Rebase on top of this series:
 http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-iommu-ops.3
-Add a gfp_t parameter to the iommu_ops::map function.
-Made use of the reserve region code inside the dma-iommu api

Tom Murphy (5):
  iommu/amd: Remove unnecessary locking from AMD iommu driver
  iommu: Add gfp parameter to iommu_ops::map
  iommu/dma-iommu: Handle deferred devices
  iommu/dma-iommu: Use the dev->coherent_dma_mask
  iommu/amd: Convert AMD iommu driver to the dma-iommu api

 drivers/iommu/Kconfig   |   1 +
 drivers/iommu/amd_iommu.c   | 690 
 drivers/iommu/amd_iommu_types.h |   1 -
 drivers/iommu/arm-smmu-v3.c |   2 +-
 drivers/iommu/arm-smmu.c|   2 +-
 drivers/iommu/dma-iommu.c   |  45 ++-
 drivers/iommu/exynos-iommu.c|   2 +-
 drivers/iommu/intel-iommu.c |   2 +-
 drivers/iommu/iommu.c   |  43 +-
 drivers/iommu/ipmmu-vmsa.c  |   2 +-
 drivers/iommu/msm_iommu.c   |   2 +-
 drivers/iommu/mtk_iommu.c   |   2 +-
 drivers/iommu/mtk_iommu_v1.c|   2 +-
 drivers/iommu/omap-iommu.c  |   2 +-
 drivers/iommu/qcom_iommu.c  |   2 +-
 drivers/iommu/rockchip-iommu.c  |   2 +-
 drivers/iommu/s390-iommu.c  |   2 +-
 drivers/iommu/tegra-gart.c  |   2 +-
 drivers/iommu/tegra-smmu.c  |   2 +-
 drivers/iommu/virtio-iommu.c|   2 +-
 include/linux/iommu.h   |  21 +-
 21 files changed, 179 insertions(+), 652 deletions(-)

-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


memory leak in vhost_net_ioctl

2019-07-24 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:788a0249 Merge tag 'arc-5.2-rc4' of git://git.kernel.org/p..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15dc9ea6a0
kernel config:  https://syzkaller.appspot.com/x/.config?x=d5c73825cbdc7326
dashboard link: https://syzkaller.appspot.com/bug?extid=0789f0c7e45efd7bb643
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=10b31761a0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=124892c1a0

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0789f0c7e45efd7bb...@syzkaller.appspotmail.com

udit: type=1400 audit(1559768703.229:36): avc:  denied  { map } for   
pid=7116 comm="syz-executor330" path="/root/syz-executor330334897"  
dev="sda1" ino=16461 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023  
tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1

executing program
executing program
BUG: memory leak
unreferenced object 0x88812421fe40 (size 64):
  comm "syz-executor330", pid 7117, jiffies 4294949245 (age 13.030s)
  hex dump (first 32 bytes):
01 00 00 00 20 69 6f 63 00 00 00 00 64 65 76 2f   iocdev/
50 fe 21 24 81 88 ff ff 50 fe 21 24 81 88 ff ff  P.!$P.!$
  backtrace:
[] kmemleak_alloc_recursive  
include/linux/kmemleak.h:55 [inline]

[] slab_post_alloc_hook mm/slab.h:439 [inline]
[] slab_alloc mm/slab.c:3326 [inline]
[] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
[<79ebab38>] kmalloc include/linux/slab.h:547 [inline]
[<79ebab38>] vhost_net_ubuf_alloc drivers/vhost/net.c:241  
[inline]
[<79ebab38>] vhost_net_set_backend drivers/vhost/net.c:1534  
[inline]
[<79ebab38>] vhost_net_ioctl+0xb43/0xc10  
drivers/vhost/net.c:1716

[<9f6204a2>] vfs_ioctl fs/ioctl.c:46 [inline]
[<9f6204a2>] file_ioctl fs/ioctl.c:509 [inline]
[<9f6204a2>] do_vfs_ioctl+0x62a/0x810 fs/ioctl.c:696
[] ksys_ioctl+0x86/0xb0 fs/ioctl.c:713
[] __do_sys_ioctl fs/ioctl.c:720 [inline]
[] __se_sys_ioctl fs/ioctl.c:718 [inline]
[] __x64_sys_ioctl+0x1e/0x30 fs/ioctl.c:718
[<49c1f547>] do_syscall_64+0x76/0x1a0  
arch/x86/entry/common.c:301

[<29cc8ca7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

BUG: memory leak
unreferenced object 0x88812421fa80 (size 64):
  comm "syz-executor330", pid 7130, jiffies 4294949755 (age 7.930s)
  hex dump (first 32 bytes):
01 00 00 00 01 00 00 00 00 00 00 00 2f 76 69 72  /vir
90 fa 21 24 81 88 ff ff 90 fa 21 24 81 88 ff ff  ..!$..!$
  backtrace:
[] kmemleak_alloc_recursive  
include/linux/kmemleak.h:55 [inline]

[] slab_post_alloc_hook mm/slab.h:439 [inline]
[] slab_alloc mm/slab.c:3326 [inline]
[] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
[<79ebab38>] kmalloc include/linux/slab.h:547 [inline]
[<79ebab38>] vhost_net_ubuf_alloc drivers/vhost/net.c:241  
[inline]
[<79ebab38>] vhost_net_set_backend drivers/vhost/net.c:1534  
[inline]
[<79ebab38>] vhost_net_ioctl+0xb43/0xc10  
drivers/vhost/net.c:1716

[<9f6204a2>] vfs_ioctl fs/ioctl.c:46 [inline]
[<9f6204a2>] file_ioctl fs/ioctl.c:509 [inline]
[<9f6204a2>] do_vfs_ioctl+0x62a/0x810 fs/ioctl.c:696
[] ksys_ioctl+0x86/0xb0 fs/ioctl.c:713
[] __do_sys_ioctl fs/ioctl.c:720 [inline]
[] __se_sys_ioctl fs/ioctl.c:718 [inline]
[] __x64_sys_ioctl+0x1e/0x30 fs/ioctl.c:718
[<49c1f547>] do_syscall_64+0x76/0x1a0  
arch/x86/entry/common.c:301

[<29cc8ca7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9



---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 22/22] docs: fix broken documentation links

2019-07-24 Thread Mauro Carvalho Chehab
Em Tue, 4 Jun 2019 06:46:14 -0300
Mauro Carvalho Chehab  escreveu:

> Em Mon, 3 Jun 2019 09:34:15 +0200
> Christophe Leroy  escreveu:
> 

> > [...]
> > 
> > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > > index 8c1c636308c8..e868d2bd48b8 100644
> > > --- a/arch/powerpc/Kconfig
> > > +++ b/arch/powerpc/Kconfig
> > > @@ -898,7 +898,7 @@ config PPC_MEM_KEYS
> > > page-based protections, but without requiring modification of 
> > > the
> > > page tables when an application changes protection domains.
> > >   
> > > -   For details, see Documentation/vm/protection-keys.rst
> > > +   For details, see Documentation/x86/protection-keys.rst  
> > 
> > It looks strange to reference an x86 file, for powerpc arch.
> 
> Indeed. Yet, seeking for the API documented there:
> 
>  $ git grep -l pkey_mprotect
> Documentation/x86/protection-keys.rst
> arch/alpha/kernel/syscalls/syscall.tbl
> arch/arm/tools/syscall.tbl
> arch/arm64/include/asm/unistd32.h
> arch/ia64/kernel/syscalls/syscall.tbl
> arch/m68k/kernel/syscalls/syscall.tbl
> arch/microblaze/kernel/syscalls/syscall.tbl
> arch/mips/kernel/syscalls/syscall_n32.tbl
> arch/mips/kernel/syscalls/syscall_n64.tbl
> arch/mips/kernel/syscalls/syscall_o32.tbl
> arch/parisc/kernel/syscalls/syscall.tbl
> arch/powerpc/kernel/syscalls/syscall.tbl
> arch/s390/kernel/syscalls/syscall.tbl
> arch/sh/kernel/syscalls/syscall.tbl
> arch/sparc/kernel/syscalls/syscall.tbl
> arch/x86/entry/syscalls/syscall_32.tbl
> arch/x86/entry/syscalls/syscall_64.tbl
> arch/xtensa/kernel/syscalls/syscall.tbl
> include/linux/syscalls.h
> include/uapi/asm-generic/unistd.h
> kernel/sys_ni.c
> mm/mprotect.c
> tools/include/uapi/asm-generic/unistd.h
> tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
> tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
> tools/perf/builtin-trace.c
> tools/testing/selftests/x86/protection_keys.c
> 
> Despite being used on several archs, the only documentation for it
> is inside the x86 directory, as it seems that this is not
> arch-specific.
> 
> Perhaps the file should, instead, be moved to another book.

I guess the best is to have this inside the core-api book.

Patch enclosed.

Regards,
Mauro


[PATCH] docs: move protection-keys.rst to the core-api book

This document is used by multiple architectures:

$ echo $(git grep -l  pkey_mprotect arch|cut -d'/' -f 2|sort|uniq)
alpha arm arm64 ia64 m68k microblaze mips parisc powerpc s390 sh sparc 
x86 xtensa

So, let's move it to the core book and adjust the links to it
accordingly.

Signed-off-by: Mauro Carvalho Chehab 

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index ee1bb8983a88..2466a4c51031 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -34,6 +34,7 @@ Core utilities
timekeeping
boot-time-mm
memory-hotplug
+   protection-keys
 
 
 Interfaces for kernel debugging
diff --git a/Documentation/x86/protection-keys.rst 
b/Documentation/core-api/protection-keys.rst
similarity index 100%
rename from Documentation/x86/protection-keys.rst
rename to Documentation/core-api/protection-keys.rst
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index ae36fc5fc649..f2de1b2d3ac7 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -19,7 +19,6 @@ x86-specific Documentation
tlb
mtrr
pat
-   protection-keys
intel_mpx
amd-memory-encryption
pti
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308c8..3b795a0cab62 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -898,7 +898,7 @@ config PPC_MEM_KEYS
  page-based protections, but without requiring modification of the
  page tables when an application changes protection domains.
 
- For details, see Documentation/vm/protection-keys.rst
+ For details, see Documentation/core-api/protection-keys.rst
 
  If unsure, say y.
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2bbbd4d1ba31..d87d53fcd261 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1911,7 +1911,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
  page-based protections, but without requiring modification of the
  page tables when an application changes protection domains.
 
- For details, see Documentation/x86/protection-keys.txt
+ For details, see Documentation/core-api/protection-keys.rst
 
  If unsure, say y.
 
diff --git a/tools/testing/selftests/x86/protection_keys.c 
b/tools/testing/selftests/x86/protection_keys.c
index 5d546dcdbc80..480995bceefa 100644
--- a/tools/testing/selftests/x86/protection_keys.c
+++ b/tools/testing/selftests/x86/protection_keys.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Tests x86 Memory Protection Keys (see Documentation/x86/protection-keys.txt)
+ * Tests x86 Memory Protection Keys (see 
Documentation/core-api/protection-keys.rst)
  *
  * There are examples 

Re: memory leak in vhost_net_ioctl

2019-07-24 Thread syzbot

Hello,

syzbot has tested the proposed patch but the reproducer still triggered  
crash:

memory leak in vhost_net_ioctl

ANGE): hsr_slave_1: link becomes ready
2019/06/13 18:24:57 executed programs: 18
BUG: memory leak
unreferenced object 0x88811cbc6ac0 (size 64):
  comm "syz-executor.0", pid 7196, jiffies 4294943804 (age 14.770s)
  hex dump (first 32 bytes):
01 00 00 00 81 88 ff ff 00 00 00 00 82 88 ff ff  
d0 6a bc 1c 81 88 ff ff d0 6a bc 1c 81 88 ff ff  .j...j..
  backtrace:
[<6c752978>] kmemleak_alloc_recursive  
include/linux/kmemleak.h:43 [inline]

[<6c752978>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<6c752978>] slab_alloc mm/slab.c:3326 [inline]
[<6c752978>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
[] kmalloc include/linux/slab.h:547 [inline]
[] vhost_net_ubuf_alloc drivers/vhost/net.c:241  
[inline]
[] vhost_net_set_backend drivers/vhost/net.c:1535  
[inline]
[] vhost_net_ioctl+0xb43/0xc10  
drivers/vhost/net.c:1717

[<700f02d7>] vfs_ioctl fs/ioctl.c:46 [inline]
[<700f02d7>] file_ioctl fs/ioctl.c:509 [inline]
[<700f02d7>] do_vfs_ioctl+0x62a/0x810 fs/ioctl.c:696
[<9a0ec0a7>] ksys_ioctl+0x86/0xb0 fs/ioctl.c:713
[] __do_sys_ioctl fs/ioctl.c:720 [inline]
[] __se_sys_ioctl fs/ioctl.c:718 [inline]
[] __x64_sys_ioctl+0x1e/0x30 fs/ioctl.c:718
[] do_syscall_64+0x76/0x1a0  
arch/x86/entry/common.c:301

[<8715c149>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

BUG: memory leak
unreferenced object 0x88810b1365c0 (size 64):
  comm "syz-executor.2", pid 7193, jiffies 4294943823 (age 14.580s)
  hex dump (first 32 bytes):
01 00 00 00 81 88 ff ff 00 00 00 00 81 88 ff ff  
d0 65 13 0b 81 88 ff ff d0 65 13 0b 81 88 ff ff  .e...e..
  backtrace:
[<6c752978>] kmemleak_alloc_recursive  
include/linux/kmemleak.h:43 [inline]

[<6c752978>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<6c752978>] slab_alloc mm/slab.c:3326 [inline]
[<6c752978>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
[] kmalloc include/linux/slab.h:547 [inline]
[] vhost_net_ubuf_alloc drivers/vhost/net.c:241  
[inline]
[] vhost_net_set_backend drivers/vhost/net.c:1535  
[inline]
[] vhost_net_ioctl+0xb43/0xc10  
drivers/vhost/net.c:1717

[<700f02d7>] vfs_ioctl fs/ioctl.c:46 [inline]
[<700f02d7>] file_ioctl fs/ioctl.c:509 [inline]
[<700f02d7>] do_vfs_ioctl+0x62a/0x810 fs/ioctl.c:696
[<9a0ec0a7>] ksys_ioctl+0x86/0xb0 fs/ioctl.c:713
[] __do_sys_ioctl fs/ioctl.c:720 [inline]
[] __se_sys_ioctl fs/ioctl.c:718 [inline]
[] __x64_sys_ioctl+0x1e/0x30 fs/ioctl.c:718
[] do_syscall_64+0x76/0x1a0  
arch/x86/entry/common.c:301

[<8715c149>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

BUG: memory leak
unreferenced object 0x88810be23700 (size 64):
  comm "syz-executor.3", pid 7194, jiffies 4294943823 (age 14.580s)
  hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 00 00 00 00 00 c9 ff ff  
10 37 e2 0b 81 88 ff ff 10 37 e2 0b 81 88 ff ff  .7...7..
  backtrace:
[<6c752978>] kmemleak_alloc_recursive  
include/linux/kmemleak.h:43 [inline]

[<6c752978>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<6c752978>] slab_alloc mm/slab.c:3326 [inline]
[<6c752978>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
[] kmalloc include/linux/slab.h:547 [inline]
[] vhost_net_ubuf_alloc drivers/vhost/net.c:241  
[inline]
[] vhost_net_set_backend drivers/vhost/net.c:1535  
[inline]
[] vhost_net_ioctl+0xb43/0xc10  
drivers/vhost/net.c:1717

[<700f02d7>] vfs_ioctl fs/ioctl.c:46 [inline]
[<700f02d7>] file_ioctl fs/ioctl.c:509 [inline]
[<700f02d7>] do_vfs_ioctl+0x62a/0x810 fs/ioctl.c:696
[<9a0ec0a7>] ksys_ioctl+0x86/0xb0 fs/ioctl.c:713
[] __do_sys_ioctl fs/ioctl.c:720 [inline]
[] __se_sys_ioctl fs/ioctl.c:718 [inline]
[] __x64_sys_ioctl+0x1e/0x30 fs/ioctl.c:718
[] do_syscall_64+0x76/0x1a0  
arch/x86/entry/common.c:301

[<8715c149>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

BUG: memory leak
unreferenced object 0x88810b136500 (size 64):
  comm "syz-executor.6", pid 7228, jiffies 4294943827 (age 14.540s)
  hex dump (first 32 bytes):
01 00 00 00 20 69 6f 63 00 00 00 00 64 65 76 2f   iocdev/
10 65 13 0b 81 88 ff ff 10 65 

[PATCH net-next] vsock: correct removal of socket from the list

2019-07-24 Thread Sunil Muthuswamy via Virtualization
The current vsock code for removal of socket from the list is both
subject to race and inefficient. It takes the lock, checks whether
the socket is in the list, drops the lock and if the socket was on the
list, deletes it from the list. This is subject to race because as soon
as the lock is dropped once it is checked for presence, that condition
cannot be relied upon for any decision. It is also inefficient because
if the socket is present in the list, it takes the lock twice.

Signed-off-by: Sunil Muthuswamy 
---
 net/vmw_vsock/af_vsock.c | 38 +++---
 1 file changed, 7 insertions(+), 31 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d892000..6f063ed 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -282,7 +282,8 @@ EXPORT_SYMBOL_GPL(vsock_insert_connected);
 void vsock_remove_bound(struct vsock_sock *vsk)
 {
spin_lock_bh(_table_lock);
-   __vsock_remove_bound(vsk);
+   if (__vsock_in_bound_table(vsk))
+   __vsock_remove_bound(vsk);
spin_unlock_bh(_table_lock);
 }
 EXPORT_SYMBOL_GPL(vsock_remove_bound);
@@ -290,7 +291,8 @@ EXPORT_SYMBOL_GPL(vsock_remove_bound);
 void vsock_remove_connected(struct vsock_sock *vsk)
 {
spin_lock_bh(_table_lock);
-   __vsock_remove_connected(vsk);
+   if (__vsock_in_connected_table(vsk))
+   __vsock_remove_connected(vsk);
spin_unlock_bh(_table_lock);
 }
 EXPORT_SYMBOL_GPL(vsock_remove_connected);
@@ -326,35 +328,10 @@ struct sock *vsock_find_connected_socket(struct 
sockaddr_vm *src,
 }
 EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
 
-static bool vsock_in_bound_table(struct vsock_sock *vsk)
-{
-   bool ret;
-
-   spin_lock_bh(_table_lock);
-   ret = __vsock_in_bound_table(vsk);
-   spin_unlock_bh(_table_lock);
-
-   return ret;
-}
-
-static bool vsock_in_connected_table(struct vsock_sock *vsk)
-{
-   bool ret;
-
-   spin_lock_bh(_table_lock);
-   ret = __vsock_in_connected_table(vsk);
-   spin_unlock_bh(_table_lock);
-
-   return ret;
-}
-
 void vsock_remove_sock(struct vsock_sock *vsk)
 {
-   if (vsock_in_bound_table(vsk))
-   vsock_remove_bound(vsk);
-
-   if (vsock_in_connected_table(vsk))
-   vsock_remove_connected(vsk);
+   vsock_remove_bound(vsk);
+   vsock_remove_connected(vsk);
 }
 EXPORT_SYMBOL_GPL(vsock_remove_sock);
 
@@ -485,8 +462,7 @@ static void vsock_pending_work(struct work_struct *work)
 * incoming packets can't find this socket, and to reduce the reference
 * count.
 */
-   if (vsock_in_connected_table(vsk))
-   vsock_remove_connected(vsk);
+   vsock_remove_connected(vsk);
 
sk->sk_state = TCP_CLOSE;
 
-- 
2.7.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 12/12] fs/ceph: fix a build warning: returning a value from void function

2019-07-24 Thread john . hubbard
From: John Hubbard 

Trivial build warning fix: don't return a value from a function
whose type is "void".

Signed-off-by: John Hubbard 
---
 fs/ceph/debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 2eb88ed22993..fa14c8e8761d 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -294,7 +294,7 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 
 void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 {
-   return 0;
+   return;
 }
 
 void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
-- 
2.22.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 11/12] 9p/net: convert put_page() to put_user_page*()

2019-07-24 Thread john . hubbard
From: Jérôme Glisse 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page().

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Signed-off-by: Jérôme Glisse 
Signed-off-by: John Hubbard 
Cc: linux-fsde...@vger.kernel.org
Cc: linux-bl...@vger.kernel.org
Cc: linux...@kvack.org
Cc: v9fs-develo...@lists.sourceforge.net
Cc: Jan Kara 
Cc: Dan Williams 
Cc: Alexander Viro 
Cc: Johannes Thumshirn 
Cc: Christoph Hellwig 
Cc: Jens Axboe 
Cc: Ming Lei 
Cc: Dave Chinner 
Cc: Jason Gunthorpe 
Cc: Matthew Wilcox 
Cc: Boaz Harrosh 
Cc: Eric Van Hensbergen 
Cc: Latchesar Ionkov 
Cc: Dominique Martinet 
---
 net/9p/trans_common.c | 14 ++
 net/9p/trans_common.h |  3 ++-
 net/9p/trans_virtio.c | 18 +-
 3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/net/9p/trans_common.c b/net/9p/trans_common.c
index 3dff68f05fb9..e5c359c369a6 100644
--- a/net/9p/trans_common.c
+++ b/net/9p/trans_common.c
@@ -19,12 +19,18 @@
 /**
  *  p9_release_pages - Release pages after the transaction.
  */
-void p9_release_pages(struct page **pages, int nr_pages)
+void p9_release_pages(struct page **pages, int nr_pages, bool from_gup)
 {
int i;
 
-   for (i = 0; i < nr_pages; i++)
-   if (pages[i])
-   put_page(pages[i]);
+   if (from_gup) {
+   for (i = 0; i < nr_pages; i++)
+   if (pages[i])
+   put_user_page(pages[i]);
+   } else {
+   for (i = 0; i < nr_pages; i++)
+   if (pages[i])
+   put_page(pages[i]);
+   }
 }
 EXPORT_SYMBOL(p9_release_pages);
diff --git a/net/9p/trans_common.h b/net/9p/trans_common.h
index c43babb3f635..dcf025867314 100644
--- a/net/9p/trans_common.h
+++ b/net/9p/trans_common.h
@@ -12,4 +12,5 @@
  *
  */
 
-void p9_release_pages(struct page **, int);
+void p9_release_pages(struct page **pages, int nr_pages, bool from_gup);
+
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index a3cd90a74012..3714ca5ecdc2 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -306,11 +306,14 @@ static int p9_get_mapped_pages(struct virtio_chan *chan,
   struct iov_iter *data,
   int count,
   size_t *offs,
-  int *need_drop)
+  int *need_drop,
+  bool *from_gup)
 {
int nr_pages;
int err;
 
+   *from_gup = false;
+
if (!iov_iter_count(data))
return 0;
 
@@ -332,6 +335,7 @@ static int p9_get_mapped_pages(struct virtio_chan *chan,
*need_drop = 1;
nr_pages = DIV_ROUND_UP(n + *offs, PAGE_SIZE);
atomic_add(nr_pages, _pinned);
+   *from_gup = iov_iter_get_pages_use_gup(data);
return n;
} else {
/* kernel buffer, no need to pin pages */
@@ -397,13 +401,15 @@ p9_virtio_zc_request(struct p9_client *client, struct 
p9_req_t *req,
size_t offs;
int need_drop = 0;
int kicked = 0;
+   bool in_from_gup, out_from_gup;
 
p9_debug(P9_DEBUG_TRANS, "virtio request\n");
 
if (uodata) {
__le32 sz;
int n = p9_get_mapped_pages(chan, _pages, uodata,
-   outlen, , _drop);
+   outlen, , _drop,
+   _from_gup);
if (n < 0) {
err = n;
goto err_out;
@@ -422,7 +428,8 @@ p9_virtio_zc_request(struct p9_client *client, struct 
p9_req_t *req,
memcpy(>tc.sdata[0], , sizeof(sz));
} else if (uidata) {
int n = p9_get_mapped_pages(chan, _pages, uidata,
-   inlen, , _drop);
+   inlen, , _drop,
+   _from_gup);
if (n < 0) {
err = n;
goto err_out;
@@ -504,11 +511,12 @@ p9_virtio_zc_request(struct p9_client *client, struct 
p9_req_t *req,
 err_out:
if (need_drop) {
if (in_pages) {
-   p9_release_pages(in_pages, in_nr_pages);
+   p9_release_pages(in_pages, in_nr_pages, in_from_gup);
atomic_sub(in_nr_pages, _pinned);
}
if (out_pages) {
-   p9_release_pages(out_pages, out_nr_pages);
+   p9_release_pages(out_pages, out_nr_pages,
+out_from_gup);
atomic_sub(out_nr_pages, _pinned);

[PATCH 10/12] fs/ceph: convert put_page() to put_user_page*()

2019-07-24 Thread john . hubbard
From: Jérôme Glisse 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page().

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Changes from Jérôme's original patch:

* Use the enhanced put_user_pages_dirty_lock().

Signed-off-by: Jérôme Glisse 
Signed-off-by: John Hubbard 
Cc: linux-fsde...@vger.kernel.org
Cc: linux-bl...@vger.kernel.org
Cc: linux...@kvack.org
Cc: ceph-de...@vger.kernel.org
Cc: Jan Kara 
Cc: Dan Williams 
Cc: Alexander Viro 
Cc: Johannes Thumshirn 
Cc: Christoph Hellwig 
Cc: Jens Axboe 
Cc: Ming Lei 
Cc: Dave Chinner 
Cc: Jason Gunthorpe 
Cc: Matthew Wilcox 
Cc: Boaz Harrosh 
Cc: "Yan, Zheng" 
Cc: Sage Weil 
Cc: Ilya Dryomov 
---
 fs/ceph/file.c | 62 ++
 1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 685a03cc4b77..c628a1f96978 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -158,18 +158,26 @@ static ssize_t iter_get_bvecs_alloc(struct iov_iter 
*iter, size_t maxsize,
return bytes;
 }
 
-static void put_bvecs(struct bio_vec *bvecs, int num_bvecs, bool should_dirty)
+static void put_bvecs(struct bio_vec *bv, int num_bvecs, bool should_dirty,
+ bool from_gup)
 {
int i;
 
+
for (i = 0; i < num_bvecs; i++) {
-   if (bvecs[i].bv_page) {
+   if (!bv[i].bv_page)
+   continue;
+
+   if (from_gup) {
+   put_user_pages_dirty_lock([i].bv_page, 1,
+ should_dirty);
+   } else {
if (should_dirty)
-   set_page_dirty_lock(bvecs[i].bv_page);
-   put_page(bvecs[i].bv_page);
+   set_page_dirty_lock(bv[i].bv_page);
+   put_page(bv[i].bv_page);
}
}
-   kvfree(bvecs);
+   kvfree(bv);
 }
 
 /*
@@ -730,6 +738,7 @@ struct ceph_aio_work {
 };
 
 static void ceph_aio_retry_work(struct work_struct *work);
+static void ceph_aio_from_gup_retry_work(struct work_struct *work);
 
 static void ceph_aio_complete(struct inode *inode,
  struct ceph_aio_request *aio_req)
@@ -774,7 +783,7 @@ static void ceph_aio_complete(struct inode *inode,
kfree(aio_req);
 }
 
-static void ceph_aio_complete_req(struct ceph_osd_request *req)
+static void _ceph_aio_complete_req(struct ceph_osd_request *req, bool from_gup)
 {
int rc = req->r_result;
struct inode *inode = req->r_inode;
@@ -793,7 +802,9 @@ static void ceph_aio_complete_req(struct ceph_osd_request 
*req)
 
aio_work = kmalloc(sizeof(*aio_work), GFP_NOFS);
if (aio_work) {
-   INIT_WORK(_work->work, ceph_aio_retry_work);
+   INIT_WORK(_work->work, from_gup ?
+ ceph_aio_from_gup_retry_work :
+ ceph_aio_retry_work);
aio_work->req = req;
queue_work(ceph_inode_to_client(inode)->inode_wq,
   _work->work);
@@ -830,7 +841,7 @@ static void ceph_aio_complete_req(struct ceph_osd_request 
*req)
}
 
put_bvecs(osd_data->bvec_pos.bvecs, osd_data->num_bvecs,
- aio_req->should_dirty);
+ aio_req->should_dirty, from_gup);
ceph_osdc_put_request(req);
 
if (rc < 0)
@@ -840,7 +851,17 @@ static void ceph_aio_complete_req(struct ceph_osd_request 
*req)
return;
 }
 
-static void ceph_aio_retry_work(struct work_struct *work)
+static void ceph_aio_complete_req(struct ceph_osd_request *req)
+{
+   _ceph_aio_complete_req(req, false);
+}
+
+static void ceph_aio_from_gup_complete_req(struct ceph_osd_request *req)
+{
+   _ceph_aio_complete_req(req, true);
+}
+
+static void _ceph_aio_retry_work(struct work_struct *work, bool from_gup)
 {
struct ceph_aio_work *aio_work =
container_of(work, struct ceph_aio_work, work);
@@ -891,7 +912,8 @@ static void ceph_aio_retry_work(struct work_struct *work)
 
ceph_osdc_put_request(orig_req);
 
-   req->r_callback = ceph_aio_complete_req;
+   req->r_callback = from_gup ? ceph_aio_from_gup_complete_req :
+ ceph_aio_complete_req;
req->r_inode = inode;
req->r_priv = aio_req;
 
@@ -899,13 +921,23 @@ static void ceph_aio_retry_work(struct work_struct *work)
 out:
if (ret < 0) {
req->r_result = ret;
-   ceph_aio_complete_req(req);
+   _ceph_aio_complete_req(req, from_gup);
}
 
ceph_put_snap_context(snapc);
kfree(aio_work);
 }
 
+static void ceph_aio_retry_work(struct work_struct *work)
+{
+   

[PATCH 08/12] fs/cifs: convert put_page() to put_user_page*()

2019-07-24 Thread john . hubbard
From: Jérôme Glisse 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page().

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Signed-off-by: Jérôme Glisse 
Signed-off-by: John Hubbard 
Cc: linux-fsde...@vger.kernel.org
Cc: linux-bl...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-c...@vger.kernel.org
Cc: Jan Kara 
Cc: Dan Williams 
Cc: Alexander Viro 
Cc: Johannes Thumshirn 
Cc: Christoph Hellwig 
Cc: Jens Axboe 
Cc: Ming Lei 
Cc: Dave Chinner 
Cc: Jason Gunthorpe 
Cc: Matthew Wilcox 
Cc: Boaz Harrosh 
Cc: Steve French 
---
 fs/cifs/cifsglob.h |  3 +++
 fs/cifs/file.c | 22 +-
 fs/cifs/misc.c | 19 +++
 3 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index fe610e7e3670..e95cb82bfa50 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1283,6 +1283,7 @@ struct cifs_aio_ctx {
 * If yes, iter is a copy of the user passed iov_iter
 */
booldirect_io;
+   boolfrom_gup;
 };
 
 struct cifs_readdata;
@@ -1317,6 +1318,7 @@ struct cifs_readdata {
struct cifs_credits credits;
unsigned intnr_pages;
struct page **pages;
+   boolfrom_gup;
 };
 
 struct cifs_writedata;
@@ -1343,6 +1345,7 @@ struct cifs_writedata {
struct cifs_credits credits;
unsigned intnr_pages;
struct page **pages;
+   boolfrom_gup;
 };
 
 /*
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 97090693d182..84fa7e0a578f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2571,8 +2571,13 @@ cifs_uncached_writedata_release(struct kref *refcount)
struct cifs_writedata, refcount);
 
kref_put(>ctx->refcount, cifs_aio_ctx_release);
-   for (i = 0; i < wdata->nr_pages; i++)
-   put_page(wdata->pages[i]);
+   if (wdata->from_gup) {
+   for (i = 0; i < wdata->nr_pages; i++)
+   put_user_page(wdata->pages[i]);
+   } else {
+   for (i = 0; i < wdata->nr_pages; i++)
+   put_page(wdata->pages[i]);
+   }
cifs_writedata_release(refcount);
 }
 
@@ -2781,7 +2786,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct 
iov_iter *from,
break;
}
 
-
+   wdata->from_gup = iov_iter_get_pages_use_gup(from);
wdata->page_offset = start;
wdata->tailsz =
nr_pages > 1 ?
@@ -2797,6 +2802,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct 
iov_iter *from,
add_credits_and_wake_if(server, credits, 0);
break;
}
+   wdata->from_gup = false;
 
rc = cifs_write_allocate_pages(wdata->pages, nr_pages);
if (rc) {
@@ -3238,8 +3244,12 @@ cifs_uncached_readdata_release(struct kref *refcount)
unsigned int i;
 
kref_put(>ctx->refcount, cifs_aio_ctx_release);
-   for (i = 0; i < rdata->nr_pages; i++) {
-   put_page(rdata->pages[i]);
+   if (rdata->from_gup) {
+   for (i = 0; i < rdata->nr_pages; i++)
+   put_user_page(rdata->pages[i]);
+   } else {
+   for (i = 0; i < rdata->nr_pages; i++)
+   put_page(rdata->pages[i]);
}
cifs_readdata_release(refcount);
 }
@@ -3502,6 +3512,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct 
cifsFileInfo *open_file,
break;
}
 
+   rdata->from_gup = 
iov_iter_get_pages_use_gup(_iov);
npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE;
rdata->page_offset = start;
rdata->tailsz = npages > 1 ?
@@ -3519,6 +3530,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct 
cifsFileInfo *open_file,
rc = -ENOMEM;
break;
}
+   rdata->from_gup = false;
 
rc = cifs_read_allocate_pages(rdata, npages);
if (rc) {
diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
index f383877a6511..5a04c34fea05 100644
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -822,10 +822,18 @@ cifs_aio_ctx_release(struct kref *refcount)
if (ctx->bv) {
unsigned i;
 
-   for (i = 0; i < ctx->npages; i++) {
-   if 

[PATCH 07/12] vhost-scsi: convert put_page() to put_user_page*()

2019-07-24 Thread john . hubbard
From: Jérôme Glisse 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page().

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Changes from Jérôme's original patch:

* Changed a WARN_ON to a BUG_ON.

Signed-off-by: Jérôme Glisse 
Signed-off-by: John Hubbard 
Cc: virtualization@lists.linux-foundation.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-bl...@vger.kernel.org
Cc: linux...@kvack.org
Cc: Jan Kara 
Cc: Dan Williams 
Cc: Alexander Viro 
Cc: Johannes Thumshirn 
Cc: Christoph Hellwig 
Cc: Jens Axboe 
Cc: Ming Lei 
Cc: Dave Chinner 
Cc: Jason Gunthorpe 
Cc: Matthew Wilcox 
Cc: Boaz Harrosh 
Cc: Miklos Szeredi 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Paolo Bonzini 
Cc: Stefan Hajnoczi 
---
 drivers/vhost/scsi.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index a9caf1bc3c3e..282565ab5e3f 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -329,11 +329,11 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 
if (tv_cmd->tvc_sgl_count) {
for (i = 0; i < tv_cmd->tvc_sgl_count; i++)
-   put_page(sg_page(_cmd->tvc_sgl[i]));
+   put_user_page(sg_page(_cmd->tvc_sgl[i]));
}
if (tv_cmd->tvc_prot_sgl_count) {
for (i = 0; i < tv_cmd->tvc_prot_sgl_count; i++)
-   put_page(sg_page(_cmd->tvc_prot_sgl[i]));
+   put_user_page(sg_page(_cmd->tvc_prot_sgl[i]));
}
 
vhost_scsi_put_inflight(tv_cmd->inflight);
@@ -630,6 +630,13 @@ vhost_scsi_map_to_sgl(struct vhost_scsi_cmd *cmd,
size_t offset;
unsigned int npages = 0;
 
+   /*
+* Here in all cases we should have an IOVEC which use GUP. If that is
+* not the case then we will wrongly call put_user_page() and the page
+* refcount will go wrong (this is in vhost_scsi_release_cmd())
+*/
+   WARN_ON(!iov_iter_get_pages_use_gup(iter));
+
bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
VHOST_SCSI_PREALLOC_UPAGES, );
/* No pages were pinned */
@@ -681,7 +688,7 @@ vhost_scsi_iov_to_sgl(struct vhost_scsi_cmd *cmd, bool 
write,
while (p < sg) {
struct page *page = sg_page(p++);
if (page)
-   put_page(page);
+   put_user_page(page);
}
return ret;
}
-- 
2.22.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH 06/12] fs/nfs: convert put_page() to put_user_page*()

2019-07-24 Thread john . hubbard
From: Jérôme Glisse 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Signed-off-by: Jérôme Glisse 
Signed-off-by: John Hubbard 
Cc: linux-fsde...@vger.kernel.org
Cc: linux-bl...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: Jan Kara 
Cc: Dan Williams 
Cc: Alexander Viro 
Cc: Johannes Thumshirn 
Cc: Christoph Hellwig 
Cc: Jens Axboe 
Cc: Ming Lei 
Cc: Dave Chinner 
Cc: Jason Gunthorpe 
Cc: Matthew Wilcox 
Cc: Boaz Harrosh 
Cc: Trond Myklebust 
Cc: Anna Schumaker 
---
 fs/nfs/direct.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 0cb442406168..35f30fe2900f 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -512,7 +512,10 @@ static ssize_t nfs_direct_read_schedule_iovec(struct 
nfs_direct_req *dreq,
pos += req_len;
dreq->bytes_left -= req_len;
}
-   nfs_direct_release_pages(pagevec, npages);
+   if (iov_iter_get_pages_use_gup(iter))
+   put_user_pages(pagevec, npages);
+   else
+   nfs_direct_release_pages(pagevec, npages);
kvfree(pagevec);
if (result < 0)
break;
@@ -935,7 +938,10 @@ static ssize_t nfs_direct_write_schedule_iovec(struct 
nfs_direct_req *dreq,
pos += req_len;
dreq->bytes_left -= req_len;
}
-   nfs_direct_release_pages(pagevec, npages);
+   if (iov_iter_get_pages_use_gup(iter))
+   put_user_pages(pagevec, npages);
+   else
+   nfs_direct_release_pages(pagevec, npages);
kvfree(pagevec);
if (result < 0)
break;
-- 
2.22.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH 04/12] block: bio_release_pages: convert put_page() to put_user_page*()

2019-07-24 Thread john . hubbard
From: Jérôme Glisse 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Changes from Jérôme's original patch:
* reworked to be compatible with recent bio_release_pages() changes,
* refactored slightly to remove some code duplication,
* use an approach that changes fewer bio_check_pages_dirty()
  callers.

Signed-off-by: Jérôme Glisse 
Signed-off-by: John Hubbard 
Cc: Christoph Hellwig 
Cc: Minwoo Im 
Cc: Jens Axboe 
---
 block/bio.c | 60 -
 include/linux/bio.h |  1 +
 2 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 7675e2de509d..74f9eba2583b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -844,7 +844,11 @@ void bio_release_pages(struct bio *bio, enum 
bio_rp_flags_t flags)
bio_for_each_segment_all(bvec, bio, iter_all) {
if ((flags & BIO_RP_MARK_DIRTY) && !PageCompound(bvec->bv_page))
set_page_dirty_lock(bvec->bv_page);
-   put_page(bvec->bv_page);
+
+   if (flags & BIO_RP_FROM_GUP)
+   put_user_page(bvec->bv_page);
+   else
+   put_page(bvec->bv_page);
}
 }
 
@@ -1667,28 +1671,50 @@ static void bio_dirty_fn(struct work_struct *work);
 static DECLARE_WORK(bio_dirty_work, bio_dirty_fn);
 static DEFINE_SPINLOCK(bio_dirty_lock);
 static struct bio *bio_dirty_list;
+static struct bio *bio_gup_dirty_list;
 
-/*
- * This runs in process context
- */
-static void bio_dirty_fn(struct work_struct *work)
+static void __bio_dirty_fn(struct work_struct *work,
+  struct bio **dirty_list,
+  enum bio_rp_flags_t flags)
 {
struct bio *bio, *next;
 
spin_lock_irq(_dirty_lock);
-   next = bio_dirty_list;
-   bio_dirty_list = NULL;
+   next = *dirty_list;
+   *dirty_list = NULL;
spin_unlock_irq(_dirty_lock);
 
while ((bio = next) != NULL) {
next = bio->bi_private;
 
-   bio_release_pages(bio, BIO_RP_MARK_DIRTY);
+   bio_release_pages(bio, BIO_RP_MARK_DIRTY | flags);
bio_put(bio);
}
 }
 
-void bio_check_pages_dirty(struct bio *bio)
+/*
+ * This runs in process context
+ */
+static void bio_dirty_fn(struct work_struct *work)
+{
+   __bio_dirty_fn(work, _dirty_list, BIO_RP_NORMAL);
+   __bio_dirty_fn(work, _gup_dirty_list, BIO_RP_FROM_GUP);
+}
+
+/**
+ * __bio_check_pages_dirty() - queue up pages on a workqueue to dirty them
+ * @bio: the bio struct containing the pages we should dirty
+ * @from_gup: did the pages in the bio came from GUP (get_user_pages*())
+ *
+ * This will go over all pages in the bio, and for each non dirty page, the
+ * bio is added to a list of bio's that need to get their pages dirtied.
+ *
+ * We also need to know if the pages in the bio are coming from GUP or not,
+ * as GUPed pages need to be released via put_user_page(), instead of
+ * put_page(). Please see Documentation/vm/get_user_pages.rst for details
+ * on that.
+ */
+void __bio_check_pages_dirty(struct bio *bio, bool from_gup)
 {
struct bio_vec *bvec;
unsigned long flags;
@@ -1699,17 +1725,27 @@ void bio_check_pages_dirty(struct bio *bio)
goto defer;
}
 
-   bio_release_pages(bio, BIO_RP_NORMAL);
+   bio_release_pages(bio, from_gup ? BIO_RP_FROM_GUP : BIO_RP_NORMAL);
bio_put(bio);
return;
 defer:
spin_lock_irqsave(_dirty_lock, flags);
-   bio->bi_private = bio_dirty_list;
-   bio_dirty_list = bio;
+   if (from_gup) {
+   bio->bi_private = bio_gup_dirty_list;
+   bio_gup_dirty_list = bio;
+   } else {
+   bio->bi_private = bio_dirty_list;
+   bio_dirty_list = bio;
+   }
spin_unlock_irqrestore(_dirty_lock, flags);
schedule_work(_dirty_work);
 }
 
+void bio_check_pages_dirty(struct bio *bio)
+{
+   __bio_check_pages_dirty(bio, false);
+}
+
 void update_io_ticks(struct hd_struct *part, unsigned long now)
 {
unsigned long stamp;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 2715e55679c1..d68a40c2c9d4 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -444,6 +444,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter 
*iter);
 enum bio_rp_flags_t {
BIO_RP_NORMAL   = 0,
BIO_RP_MARK_DIRTY   = 1,
+   BIO_RP_FROM_GUP = 2,
 };
 
 static inline enum bio_rp_flags_t bio_rp_dirty_flag(bool mark_dirty)
-- 
2.22.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org

[PATCH 05/12] block_dev: convert put_page() to put_user_page*()

2019-07-24 Thread john . hubbard
From: Jérôme Glisse 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Changes from Jérôme's original patch:

* reworked to be compatible with recent bio_release_pages() changes.

Signed-off-by: Jérôme Glisse 
Signed-off-by: John Hubbard 
Cc: linux-fsde...@vger.kernel.org
Cc: linux-bl...@vger.kernel.org
Cc: linux...@kvack.org
Cc: Jan Kara 
Cc: Dan Williams 
Cc: Alexander Viro 
Cc: Johannes Thumshirn 
Cc: Christoph Hellwig 
Cc: Jens Axboe 
Cc: Ming Lei 
Cc: Dave Chinner 
Cc: Jason Gunthorpe 
Cc: Matthew Wilcox 
Cc: Boaz Harrosh 
---
 block/bio.c | 13 +
 fs/block_dev.c  | 22 +-
 include/linux/bio.h |  8 
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 74f9eba2583b..3b9f66e64bc1 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1746,6 +1746,19 @@ void bio_check_pages_dirty(struct bio *bio)
__bio_check_pages_dirty(bio, false);
 }
 
+enum bio_rp_flags_t bio_rp_flags(struct iov_iter *iter, bool mark_dirty)
+{
+   enum bio_rp_flags_t flags = BIO_RP_NORMAL;
+
+   if (mark_dirty)
+   flags |= BIO_RP_MARK_DIRTY;
+
+   if (iov_iter_get_pages_use_gup(iter))
+   flags |= BIO_RP_FROM_GUP;
+
+   return flags;
+}
+
 void update_io_ticks(struct hd_struct *part, unsigned long now)
 {
unsigned long stamp;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9fe6616f8788..d53abaf31e54 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -259,7 +259,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct 
iov_iter *iter,
}
__set_current_state(TASK_RUNNING);
 
-   bio_release_pages(, bio_rp_dirty_flag(should_dirty));
+   bio_release_pages(, bio_rp_flags(iter, should_dirty));
if (unlikely(bio.bi_status))
ret = blk_status_to_errno(bio.bi_status);
 
@@ -295,7 +295,7 @@ static int blkdev_iopoll(struct kiocb *kiocb, bool wait)
return blk_poll(q, READ_ONCE(kiocb->ki_cookie), wait);
 }
 
-static void blkdev_bio_end_io(struct bio *bio)
+static void _blkdev_bio_end_io(struct bio *bio, bool from_gup)
 {
struct blkdev_dio *dio = bio->bi_private;
bool should_dirty = dio->should_dirty;
@@ -327,13 +327,23 @@ static void blkdev_bio_end_io(struct bio *bio)
}
 
if (should_dirty) {
-   bio_check_pages_dirty(bio);
+   __bio_check_pages_dirty(bio, from_gup);
} else {
-   bio_release_pages(bio, BIO_RP_NORMAL);
+   bio_release_pages(bio, bio_rp_gup_flag(from_gup));
bio_put(bio);
}
 }
 
+static void blkdev_bio_end_io(struct bio *bio)
+{
+   _blkdev_bio_end_io(bio, false);
+}
+
+static void blkdev_bio_from_gup_end_io(struct bio *bio)
+{
+   _blkdev_bio_end_io(bio, true);
+}
+
 static ssize_t
 __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 {
@@ -380,7 +390,9 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bio->bi_iter.bi_sector = pos >> 9;
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
-   bio->bi_end_io = blkdev_bio_end_io;
+   bio->bi_end_io = iov_iter_get_pages_use_gup(iter) ?
+blkdev_bio_from_gup_end_io :
+blkdev_bio_end_io;
bio->bi_ioprio = iocb->ki_ioprio;
 
ret = bio_iov_iter_get_pages(bio, iter);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d68a40c2c9d4..b9460d1a4679 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -452,6 +452,13 @@ static inline enum bio_rp_flags_t bio_rp_dirty_flag(bool 
mark_dirty)
return mark_dirty ? BIO_RP_MARK_DIRTY : BIO_RP_NORMAL;
 }
 
+static inline enum bio_rp_flags_t bio_rp_gup_flag(bool from_gup)
+{
+   return from_gup ? BIO_RP_FROM_GUP : BIO_RP_NORMAL;
+}
+
+enum bio_rp_flags_t bio_rp_flags(struct iov_iter *iter, bool mark_dirty);
+
 void bio_release_pages(struct bio *bio, enum bio_rp_flags_t flags);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
@@ -463,6 +470,7 @@ extern struct bio *bio_copy_kern(struct request_queue *, 
void *, unsigned int,
 gfp_t, int);
 extern void bio_set_pages_dirty(struct bio *bio);
 extern void bio_check_pages_dirty(struct bio *bio);
+void __bio_check_pages_dirty(struct bio *bio, bool from_gup);
 
 void generic_start_io_acct(struct request_queue *q, int op,
unsigned long sectors, struct hd_struct *part);
-- 
2.22.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org

[PATCH 03/12] block: bio_release_pages: use flags arg instead of bool

2019-07-24 Thread john . hubbard
From: John Hubbard 

In commit d241a95f3514 ("block: optionally mark pages dirty in
bio_release_pages"), new "bool mark_dirty" argument was added to
bio_release_pages.

In upcoming work, another bool argument (to indicate that the pages came
from get_user_pages) is going to be added. That's one bool too many,
because it's not desirable have calls of the form:

foo(true, false, true, etc);

In order to prepare for that, change the argument from a bool, to a
typesafe (enum-based) flags argument.

Cc: Christoph Hellwig 
Cc: Jérôme Glisse 
Cc: Minwoo Im 
Cc: Jens Axboe 
Signed-off-by: John Hubbard 
---
 block/bio.c | 12 ++--
 fs/block_dev.c  |  4 ++--
 fs/direct-io.c  |  2 +-
 include/linux/bio.h | 13 -
 4 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 299a0e7651ec..7675e2de509d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -833,7 +833,7 @@ int bio_add_page(struct bio *bio, struct page *page,
 }
 EXPORT_SYMBOL(bio_add_page);
 
-void bio_release_pages(struct bio *bio, bool mark_dirty)
+void bio_release_pages(struct bio *bio, enum bio_rp_flags_t flags)
 {
struct bvec_iter_all iter_all;
struct bio_vec *bvec;
@@ -842,7 +842,7 @@ void bio_release_pages(struct bio *bio, bool mark_dirty)
return;
 
bio_for_each_segment_all(bvec, bio, iter_all) {
-   if (mark_dirty && !PageCompound(bvec->bv_page))
+   if ((flags & BIO_RP_MARK_DIRTY) && !PageCompound(bvec->bv_page))
set_page_dirty_lock(bvec->bv_page);
put_page(bvec->bv_page);
}
@@ -1421,7 +1421,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
return bio;
 
  out_unmap:
-   bio_release_pages(bio, false);
+   bio_release_pages(bio, BIO_RP_NORMAL);
bio_put(bio);
return ERR_PTR(ret);
 }
@@ -1437,7 +1437,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
  */
 void bio_unmap_user(struct bio *bio)
 {
-   bio_release_pages(bio, bio_data_dir(bio) == READ);
+   bio_release_pages(bio, bio_rp_dirty_flag(bio_data_dir(bio) == READ));
bio_put(bio);
bio_put(bio);
 }
@@ -1683,7 +1683,7 @@ static void bio_dirty_fn(struct work_struct *work)
while ((bio = next) != NULL) {
next = bio->bi_private;
 
-   bio_release_pages(bio, true);
+   bio_release_pages(bio, BIO_RP_MARK_DIRTY);
bio_put(bio);
}
 }
@@ -1699,7 +1699,7 @@ void bio_check_pages_dirty(struct bio *bio)
goto defer;
}
 
-   bio_release_pages(bio, false);
+   bio_release_pages(bio, BIO_RP_NORMAL);
bio_put(bio);
return;
 defer:
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 4707dfff991b..9fe6616f8788 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -259,7 +259,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct 
iov_iter *iter,
}
__set_current_state(TASK_RUNNING);
 
-   bio_release_pages(, should_dirty);
+   bio_release_pages(, bio_rp_dirty_flag(should_dirty));
if (unlikely(bio.bi_status))
ret = blk_status_to_errno(bio.bi_status);
 
@@ -329,7 +329,7 @@ static void blkdev_bio_end_io(struct bio *bio)
if (should_dirty) {
bio_check_pages_dirty(bio);
} else {
-   bio_release_pages(bio, false);
+   bio_release_pages(bio, BIO_RP_NORMAL);
bio_put(bio);
}
 }
diff --git a/fs/direct-io.c b/fs/direct-io.c
index ae196784f487..423ef431ddda 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -551,7 +551,7 @@ static blk_status_t dio_bio_complete(struct dio *dio, 
struct bio *bio)
if (dio->is_async && should_dirty) {
bio_check_pages_dirty(bio); /* transfers ownership */
} else {
-   bio_release_pages(bio, should_dirty);
+   bio_release_pages(bio, bio_rp_dirty_flag(should_dirty));
bio_put(bio);
}
return err;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 3cdb84cdc488..2715e55679c1 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -440,7 +440,18 @@ bool __bio_try_merge_page(struct bio *bio, struct page 
*page,
 void __bio_add_page(struct bio *bio, struct page *page,
unsigned int len, unsigned int off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
-void bio_release_pages(struct bio *bio, bool mark_dirty);
+
+enum bio_rp_flags_t {
+   BIO_RP_NORMAL   = 0,
+   BIO_RP_MARK_DIRTY   = 1,
+};
+
+static inline enum bio_rp_flags_t bio_rp_dirty_flag(bool mark_dirty)
+{
+   return mark_dirty ? BIO_RP_MARK_DIRTY : BIO_RP_NORMAL;
+}
+
+void bio_release_pages(struct bio *bio, enum bio_rp_flags_t flags);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
struct iov_iter *, gfp_t);
-- 

[PATCH 02/12] iov_iter: add helper to test if an iter would use GUP v2

2019-07-24 Thread john . hubbard
From: Jérôme Glisse 

Add a helper to test if call to iov_iter_get_pages*() with a given
iter would result in calls to GUP (get_user_pages*()). We want to
use different tracking of page references if they are coming from
GUP (get_user_pages*()) and thus  we need to know when GUP is used
for a given iter.

Changes since Jérôme's original patch:

* iov_iter_get_pages_use_gup(): do not return true for the ITER_PIPE
case, because iov_iter_get_pages() calls pipe_get_pages(), which in
turn uses get_page(), not get_user_pages().

* Remove some obsolete code, as part of rebasing onto Linux 5.3.

* Fix up the kerneldoc comment to "Return:" rather than "Returns:",
and a few other grammatical tweaks.

Signed-off-by: Jérôme Glisse 
Signed-off-by: John Hubbard 
Cc: linux-fsde...@vger.kernel.org
Cc: linux-bl...@vger.kernel.org
Cc: linux...@kvack.org
Cc: John Hubbard 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: Alexander Viro 
Cc: Johannes Thumshirn 
Cc: Christoph Hellwig 
Cc: Jens Axboe 
Cc: Ming Lei 
Cc: Dave Chinner 
Cc: Jason Gunthorpe 
Cc: Matthew Wilcox 
---
 include/linux/uio.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index ab5f523bc0df..2a179af8e5a7 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -86,6 +86,17 @@ static inline unsigned char iov_iter_rw(const struct 
iov_iter *i)
return i->type & (READ | WRITE);
 }
 
+/**
+ * iov_iter_get_pages_use_gup - report if iov_iter_get_pages(i) uses GUP
+ * @i: iterator
+ * Return: true if a call to iov_iter_get_pages*() with the iter provided in
+ *  the argument would result in the use of get_user_pages*()
+ */
+static inline bool iov_iter_get_pages_use_gup(const struct iov_iter *i)
+{
+   return iov_iter_type(i) == ITER_IOVEC;
+}
+
 /*
  * Total number of bytes covered by an iovec.
  *
-- 
2.22.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH 01/12] mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

2019-07-24 Thread john . hubbard
From: John Hubbard 

Provide more capable variation of put_user_pages_dirty_lock(),
and delete put_user_pages_dirty(). This is based on the
following:

1. Lots of call sites become simpler if a bool is passed
into put_user_page*(), instead of making the call site
choose which put_user_page*() variant to call.

2. Christoph Hellwig's observation that set_page_dirty_lock()
is usually correct, and set_page_dirty() is usually a
bug, or at least questionable, within a put_user_page*()
calling chain.

This leads to the following API choices:

* put_user_pages_dirty_lock(page, npages, make_dirty)

* There is no put_user_pages_dirty(). You have to
  hand code that, in the rare case that it's
  required.

Cc: Matthew Wilcox 
Cc: Jan Kara 
Cc: Christoph Hellwig 
Cc: Ira Weiny 
Cc: Jason Gunthorpe 
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c |   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c|   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c |   5 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c   |   5 +-
 drivers/infiniband/sw/siw/siw_mem.c|   8 +-
 include/linux/mm.h |   5 +-
 mm/gup.c   | 115 +
 7 files changed, 58 insertions(+), 90 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 08da840ed7ee..965cf9dea71a 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -54,10 +54,7 @@ static void __ib_umem_release(struct ib_device *dev, struct 
ib_umem *umem, int d
 
for_each_sg_page(umem->sg_head.sgl, _iter, umem->sg_nents, 0) {
page = sg_page_iter_page(_iter);
-   if (umem->writable && dirty)
-   put_user_pages_dirty_lock(, 1);
-   else
-   put_user_page(page);
+   put_user_pages_dirty_lock(, 1, umem->writable && dirty);
}
 
sg_free_table(>sg_head);
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c 
b/drivers/infiniband/hw/hfi1/user_pages.c
index b89a9b9aef7a..469acb961fbd 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -118,10 +118,7 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned 
long vaddr, size_t np
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 size_t npages, bool dirty)
 {
-   if (dirty)
-   put_user_pages_dirty_lock(p, npages);
-   else
-   put_user_pages(p, npages);
+   put_user_pages_dirty_lock(p, npages, dirty);
 
if (mm) { /* during close after signal, mm can be NULL */
atomic64_sub(npages, >pinned_vm);
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c 
b/drivers/infiniband/hw/qib/qib_user_pages.c
index bfbfbb7e0ff4..6bf764e41891 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -40,10 +40,7 @@
 static void __qib_release_user_pages(struct page **p, size_t num_pages,
 int dirty)
 {
-   if (dirty)
-   put_user_pages_dirty_lock(p, num_pages);
-   else
-   put_user_pages(p, num_pages);
+   put_user_pages_dirty_lock(p, num_pages, dirty);
 }
 
 /**
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c 
b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 0b0237d41613..62e6ffa9ad78 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -75,10 +75,7 @@ static void usnic_uiom_put_pages(struct list_head 
*chunk_list, int dirty)
for_each_sg(chunk->page_list, sg, chunk->nents, i) {
page = sg_page(sg);
pa = sg_phys(sg);
-   if (dirty)
-   put_user_pages_dirty_lock(, 1);
-   else
-   put_user_page(page);
+   put_user_pages_dirty_lock(, 1, dirty);
usnic_dbg("pa: %pa\n", );
}
kfree(chunk);
diff --git a/drivers/infiniband/sw/siw/siw_mem.c 
b/drivers/infiniband/sw/siw/siw_mem.c
index 67171c82b0c4..358d440efa11 100644
--- a/drivers/infiniband/sw/siw/siw_mem.c
+++ b/drivers/infiniband/sw/siw/siw_mem.c
@@ -65,13 +65,7 @@ static void siw_free_plist(struct siw_page_chunk *chunk, int 
num_pages,
 {
struct page **p = chunk->plist;
 
-   while (num_pages--) {
-   if (!PageDirty(*p) && dirty)
-   put_user_pages_dirty_lock(p, 1);
-   else
-   put_user_page(*p);
-   p++;
-   }
+   put_user_pages_dirty_lock(chunk->plist, num_pages, dirty);
 }
 
 void siw_umem_release(struct siw_umem *umem, bool dirty)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0334ca97c584..9759b6a24420 100644
--- a/include/linux/mm.h
+++ 

[PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()

2019-07-24 Thread john . hubbard
From: John Hubbard 

Hi,

This is mostly Jerome's work, converting the block/bio and related areas
to call put_user_page*() instead of put_page(). Because I've changed
Jerome's patches, in some cases significantly, I'd like to get his
feedback before we actually leave him listed as the author (he might
want to disown some or all of these).

I added a new patch, in order to make this work with Christoph Hellwig's
recent overhaul to bio_release_pages(): "block: bio_release_pages: use
flags arg instead of bool".

I've started the series with a patch that I've posted in another
series ("mm/gup: add make_dirty arg to put_user_pages_dirty_lock()"[1]),
because I'm not sure which of these will go in first, and this allows each
to stand alone.

Testing: not much beyond build and boot testing has been done yet. And
I'm not set up to even exercise all of it (especially the IB parts) at
run time.

Anyway, changes here are:

* Store, in the iov_iter, a "came from gup (get_user_pages)" parameter.
  Then, use the new iov_iter_get_pages_use_gup() to retrieve it when
  it is time to release the pages. That allows choosing between put_page()
  and put_user_page*().

* Pass in one more piece of information to bio_release_pages: a "from_gup"
  parameter. Similar use as above.

* Change the block layer, and several file systems, to use
  put_user_page*().

[1] https://lore.kernel.org/r/20190724012606.25844-2-jhubb...@nvidia.com
And please note the correction email that I posted as a follow-up,
if you're looking closely at that patch. :) The fixed version is
included here.

John Hubbard (3):
  mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
  block: bio_release_pages: use flags arg instead of bool
  fs/ceph: fix a build warning: returning a value from void function

Jérôme Glisse (9):
  iov_iter: add helper to test if an iter would use GUP v2
  block: bio_release_pages: convert put_page() to put_user_page*()
  block_dev: convert put_page() to put_user_page*()
  fs/nfs: convert put_page() to put_user_page*()
  vhost-scsi: convert put_page() to put_user_page*()
  fs/cifs: convert put_page() to put_user_page*()
  fs/fuse: convert put_page() to put_user_page*()
  fs/ceph: convert put_page() to put_user_page*()
  9p/net: convert put_page() to put_user_page*()

 block/bio.c|  81 ---
 drivers/infiniband/core/umem.c |   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c|   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c |   5 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c   |   5 +-
 drivers/infiniband/sw/siw/siw_mem.c|   8 +-
 drivers/vhost/scsi.c   |  13 ++-
 fs/block_dev.c |  22 +++-
 fs/ceph/debugfs.c  |   2 +-
 fs/ceph/file.c |  62 ---
 fs/cifs/cifsglob.h |   3 +
 fs/cifs/file.c |  22 +++-
 fs/cifs/misc.c |  19 +++-
 fs/direct-io.c |   2 +-
 fs/fuse/dev.c  |  22 +++-
 fs/fuse/file.c |  53 +++---
 fs/nfs/direct.c|  10 +-
 include/linux/bio.h|  22 +++-
 include/linux/mm.h |   5 +-
 include/linux/uio.h|  11 ++
 mm/gup.c   | 115 +
 net/9p/trans_common.c  |  14 ++-
 net/9p/trans_common.h  |   3 +-
 net/9p/trans_virtio.c  |  18 +++-
 24 files changed, 357 insertions(+), 170 deletions(-)

-- 
2.22.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] MAINTAINERS: Update my email address

2019-07-24 Thread Will Deacon
On Mon, Jul 22, 2019 at 02:44:40PM +0100, Jean-Philippe Brucker wrote:
> Update MAINTAINERS and .mailmap with my @linaro.org address, since I
> don't have access to my @arm.com address anymore.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  .mailmap| 1 +
>  MAINTAINERS | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/.mailmap b/.mailmap
> index 0fef932de3db..8ce554b9c9f1 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -98,6 +98,7 @@ Jason Gunthorpe  
> 
>  Javi Merino  
>   
>  Jean Tourrilhes 
> + 
>  Jeff Garzik 
>  Jeff Layton  
>  Jeff Layton  
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 783569e3c4b4..bded78c84701 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -17123,7 +17123,7 @@ F:drivers/virtio/virtio_input.c
>  F:   include/uapi/linux/virtio_input.h
>  
>  VIRTIO IOMMU DRIVER
> -M:   Jean-Philippe Brucker 
> +M:   Jean-Philippe Brucker 
>  L:   virtualization@lists.linux-foundation.org
>  S:   Maintained
>  F:   drivers/iommu/virtio-iommu.c

Thanks (and your new address is easier to remember ;). I can take this one
via arm64, since I already have a bunch of MAINTAINERS updates queued for
-rc2.

Will
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: Reminder: 2 open syzbot bugs in vhost subsystem

2019-07-24 Thread Hillf Danton


On Tue, 2 Jul 2019 13:30:07 +0800 Jason Wang wrote:
> On 2019/7/2 Eric Biggers wrote:
> > [This email was generated by a script.  Let me know if you have any 
> > suggestions
> > to make it better, or if you want it re-generated with the latest status.]
> >
> > Of the currently open syzbot reports against the upstream kernel, I've 
> > manually
> > marked 2 of them as possibly being bugs in the vhost subsystem.  I've listed
> > these reports below, sorted by an algorithm that tries to list first the 
> > reports
> > most likely to be still valid, important, and actionable.
> >
> > Of these 2 bugs, 1 was seen in mainline in the last week.
> >
> > If you believe a bug is no longer valid, please close the syzbot report by
> > sending a '#syz fix', '#syz dup', or '#syz invalid' command in reply to the
> > original thread, as explained at https://goo.gl/tpsmEJ#status
> >
> > If you believe I misattributed a bug to the vhost subsystem, please let me 
> > know,
> > and if possible forward the report to the correct people or mailing list.
> >
> > Here are the bugs:
> >
> > 
> > Title:  memory leak in vhost_net_ioctl
> > Last occurred:  0 days ago
> > Reported:   26 days ago
> > Branches:   Mainline
> > Dashboard link: 
> > https://syzkaller.appspot.com/bug?id=3D12ba349d7e26ccfe95317bc376e812ebbae2ee0f
> > Original thread:
> > https://lkml.kernel.org/lkml/188da1058a9c2...@google.com/T/#u
> >
> > This bug has a C reproducer.
> >
> > The original thread for this bug has received 4 replies; the last was 17 
> > days
> > ago.
> >
> > If you fix this bug, please add the following tag to the commit:
> >  Reported-by: syzbot+0789f0c7e45efd7bb...@syzkaller.appspotmail.com
> >
> > If you send any email or patch for this bug, please consider replying to the
> > original thread.  For the git send-email command to use, or tips on how to 
> > reply
> > if the thread isn't in your mailbox, see the "Reply instructions" at
> > https://lkml.kernel.org/r/188da1058a9c2...@google.com
> > 
> Cc Hillf who should had a fix for this.
> 
It could not be a fix in any form without the great idea you shared, Jason:)
while reviewing the first version.

> Hillf, would you please post a formal patch for this? (for -net)
> 
And feel free to do this thing appropriate or that thing for fixing the
reported memory leak before I can earn a Tested-by.

--
Hillf

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: memory leak in vhost_net_ioctl

2019-07-24 Thread Hillf Danton


Hello Syzbot

On Fri, 14 Jun 2019 11:04:03 +0800 syzbot wrote:
>
>Hello,
>
>syzbot has tested the proposed patch but the reproducer still triggered crash:
>memory leak in batadv_tvlv_handler_register
>
>   484.626788][  T156] bond0 (unregistering): Releasing backup interface 
> bond_slave_1
>Warning: Permanently added '10.128.0.87' (ECDSA) to the list of known hosts.
>BUG: memory leak
>unreferenced object 0x88811d25c4c0 (size 64):
>   comm "softirq", pid 0, jiffies 4294943668 (age 434.830s)
>   hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 e0 fc 5b 20 81 88 ff ff  ..[ 
> 00 00 00 00 00 00 00 00 20 91 15 83 ff ff ff ff   ...
>   backtrace:
> [<0045bc9d>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 
> [inline]
> [<0045bc9d>] slab_post_alloc_hook mm/slab.h:439 [inline]
> [<0045bc9d>] slab_alloc mm/slab.c:3326 [inline]
> [<0045bc9d>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
> [<197d773e>] kmalloc include/linux/slab.h:547 [inline]
> [<197d773e>] kzalloc include/linux/slab.h:742 [inline]
> [<197d773e>] batadv_tvlv_handler_register+0xae/0x140 
> net/batman-adv/tvlv.c:529
> [] batadv_tt_init+0x78/0x180 
> net/batman-adv/translation-table.c:4411
> [<8c50839d>] batadv_mesh_init+0x196/0x230 
> net/batman-adv/main.c:208
> [<1c5a74a3>] batadv_softif_init_late+0x1ca/0x220 
> net/batman-adv/soft-interface.c:861
> [<4e676cd1>] register_netdevice+0xbf/0x600 net/core/dev.c:8635
> [<5601497b>] __rtnl_newlink+0xaca/0xb30 net/core/rtnetlink.c:3199
> [] rtnl_newlink+0x4e/0x80 net/core/rtnetlink.c:3245
> [] rtnetlink_rcv_msg+0x178/0x4b0 
> net/core/rtnetlink.c:5214
> [<140451f6>] netlink_rcv_skb+0x61/0x170 
> net/netlink/af_netlink.c:2482
> [<237e38f7>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5232
> [<0d47c000>] netlink_unicast_kernel net/netlink/af_netlink.c:1307 
> [inline]
> [<0d47c000>] netlink_unicast+0x1ec/0x2d0 
> net/netlink/af_netlink.c:1333
> [<98503d79>] netlink_sendmsg+0x26a/0x480 
> net/netlink/af_netlink.c:1922
> [<9263e868>] sock_sendmsg_nosec net/socket.c:646 [inline]
> [<9263e868>] sock_sendmsg+0x54/0x70 net/socket.c:665
> [<7791ad47>] __sys_sendto+0x148/0x1f0 net/socket.c:1958
> [] __do_sys_sendto net/socket.c:1970 [inline]
> [] __se_sys_sendto net/socket.c:1966 [inline]
> [] __x64_sys_sendto+0x2a/0x30 net/socket.c:1966
>
>BUG: memory leak
>unreferenced object 0x8881024a3340 (size 64):
>   comm "softirq", pid 0, jiffies 4294943678 (age 434.730s)
>   hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 e0 2c 66 04 81 88 ff ff  .,f.
> 00 00 00 00 00 00 00 00 20 91 15 83 ff ff ff ff   ...
>   backtrace:
> [<0045bc9d>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 
> [inline]
> [<0045bc9d>] slab_post_alloc_hook mm/slab.h:439 [inline]
> [<0045bc9d>] slab_alloc mm/slab.c:3326 [inline]
> [<0045bc9d>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
> [<197d773e>] kmalloc include/linux/slab.h:547 [inline]
> [<197d773e>] kzalloc include/linux/slab.h:742 [inline]
> [<197d773e>] batadv_tvlv_handler_register+0xae/0x140 
> net/batman-adv/tvlv.c:529
> [] batadv_tt_init+0x78/0x180 
> net/batman-adv/translation-table.c:4411
> [<8c50839d>] batadv_mesh_init+0x196/0x230 
> net/batman-adv/main.c:208
> [<1c5a74a3>] batadv_softif_init_late+0x1ca/0x220 
> net/batman-adv/soft-interface.c:861
> [<4e676cd1>] register_netdevice+0xbf/0x600 net/core/dev.c:8635
> [<5601497b>] __rtnl_newlink+0xaca/0xb30 net/core/rtnetlink.c:3199
> [] rtnl_newlink+0x4e/0x80 net/core/rtnetlink.c:3245
> [] rtnetlink_rcv_msg+0x178/0x4b0 
> net/core/rtnetlink.c:5214
> [<140451f6>] netlink_rcv_skb+0x61/0x170 
> net/netlink/af_netlink.c:2482
> [<237e38f7>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5232
> [<0d47c000>] netlink_unicast_kernel net/netlink/af_netlink.c:1307 
> [inline]
> [<0d47c000>] netlink_unicast+0x1ec/0x2d0 
> net/netlink/af_netlink.c:1333
> [<98503d79>] netlink_sendmsg+0x26a/0x480 
> net/netlink/af_netlink.c:1922
> [<9263e868>] sock_sendmsg_nosec net/socket.c:646 [inline]
> [<9263e868>] sock_sendmsg+0x54/0x70 net/socket.c:665
> [<7791ad47>] __sys_sendto+0x148/0x1f0 net/socket.c:1958
> [] __do_sys_sendto net/socket.c:1970 [inline]
> [] __se_sys_sendto net/socket.c:1966 [inline]
> [] __x64_sys_sendto+0x2a/0x30 net/socket.c:1966
>

[PATCH trivial] mm/balloon_compaction: Grammar s/the its/its/

2019-07-24 Thread Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven 
---
 mm/balloon_compaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index ba739b76e6c52e55..17ac81d8d26bcb50 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -60,7 +60,7 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
 
 /*
  * balloon_page_dequeue - removes a page from balloon's page list and returns
- *   the its address to allow the driver release the page.
+ *   its address to allow the driver to release the page.
  * @b_dev_info: balloon device decriptor where we will grab a page from.
  *
  * Driver must call it to properly de-allocate a previous enlisted balloon page
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: memory leak in vhost_net_ioctl

2019-07-24 Thread Hillf Danton


Hello Syzbot

On Fri, 14 Jun 2019 11:04:03 +0800 syzbot wrote:
>
>Hello,
>
>syzbot has tested the proposed patch but the reproducer still triggered crash:
>memory leak in batadv_tvlv_handler_register
>
It is not ubuf leak which is addressed in this thread. Good news.
I will see this new leak soon.

>   484.626788][  T156] bond0 (unregistering): Releasing backup interface 
> bond_slave_1
>Warning: Permanently added '10.128.0.87' (ECDSA) to the list of known hosts.
>BUG: memory leak
>unreferenced object 0x88811d25c4c0 (size 64):
>   comm "softirq", pid 0, jiffies 4294943668 (age 434.830s)
>   hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 e0 fc 5b 20 81 88 ff ff  ..[ 
> 00 00 00 00 00 00 00 00 20 91 15 83 ff ff ff ff   ...
>   backtrace:
> [<0045bc9d>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 
> [inline]
> [<0045bc9d>] slab_post_alloc_hook mm/slab.h:439 [inline]
> [<0045bc9d>] slab_alloc mm/slab.c:3326 [inline]
> [<0045bc9d>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
> [<197d773e>] kmalloc include/linux/slab.h:547 [inline]
> [<197d773e>] kzalloc include/linux/slab.h:742 [inline]
> [<197d773e>] batadv_tvlv_handler_register+0xae/0x140 
> net/batman-adv/tvlv.c:529
> [] batadv_tt_init+0x78/0x180 
> net/batman-adv/translation-table.c:4411
> [<8c50839d>] batadv_mesh_init+0x196/0x230 
> net/batman-adv/main.c:208
> [<1c5a74a3>] batadv_softif_init_late+0x1ca/0x220 
> net/batman-adv/soft-interface.c:861
> [<4e676cd1>] register_netdevice+0xbf/0x600 net/core/dev.c:8635
> [<5601497b>] __rtnl_newlink+0xaca/0xb30 net/core/rtnetlink.c:3199
> [] rtnl_newlink+0x4e/0x80 net/core/rtnetlink.c:3245
> [] rtnetlink_rcv_msg+0x178/0x4b0 
> net/core/rtnetlink.c:5214
> [<140451f6>] netlink_rcv_skb+0x61/0x170 
> net/netlink/af_netlink.c:2482
> [<237e38f7>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5232
> [<0d47c000>] netlink_unicast_kernel net/netlink/af_netlink.c:1307 
> [inline]
> [<0d47c000>] netlink_unicast+0x1ec/0x2d0 
> net/netlink/af_netlink.c:1333
> [<98503d79>] netlink_sendmsg+0x26a/0x480 
> net/netlink/af_netlink.c:1922
> [<9263e868>] sock_sendmsg_nosec net/socket.c:646 [inline]
> [<9263e868>] sock_sendmsg+0x54/0x70 net/socket.c:665
> [<7791ad47>] __sys_sendto+0x148/0x1f0 net/socket.c:1958
> [] __do_sys_sendto net/socket.c:1970 [inline]
> [] __se_sys_sendto net/socket.c:1966 [inline]
> [] __x64_sys_sendto+0x2a/0x30 net/socket.c:1966
>
>BUG: memory leak
>unreferenced object 0x8881024a3340 (size 64):
>   comm "softirq", pid 0, jiffies 4294943678 (age 434.730s)
>   hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 e0 2c 66 04 81 88 ff ff  .,f.
> 00 00 00 00 00 00 00 00 20 91 15 83 ff ff ff ff   ...
>   backtrace:
> [<0045bc9d>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 
> [inline]
> [<0045bc9d>] slab_post_alloc_hook mm/slab.h:439 [inline]
> [<0045bc9d>] slab_alloc mm/slab.c:3326 [inline]
> [<0045bc9d>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
> [<197d773e>] kmalloc include/linux/slab.h:547 [inline]
> [<197d773e>] kzalloc include/linux/slab.h:742 [inline]
> [<197d773e>] batadv_tvlv_handler_register+0xae/0x140 
> net/batman-adv/tvlv.c:529
> [] batadv_tt_init+0x78/0x180 
> net/batman-adv/translation-table.c:4411
> [<8c50839d>] batadv_mesh_init+0x196/0x230 
> net/batman-adv/main.c:208
> [<1c5a74a3>] batadv_softif_init_late+0x1ca/0x220 
> net/batman-adv/soft-interface.c:861
> [<4e676cd1>] register_netdevice+0xbf/0x600 net/core/dev.c:8635
> [<5601497b>] __rtnl_newlink+0xaca/0xb30 net/core/rtnetlink.c:3199
> [] rtnl_newlink+0x4e/0x80 net/core/rtnetlink.c:3245
> [] rtnetlink_rcv_msg+0x178/0x4b0 
> net/core/rtnetlink.c:5214
> [<140451f6>] netlink_rcv_skb+0x61/0x170 
> net/netlink/af_netlink.c:2482
> [<237e38f7>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5232
> [<0d47c000>] netlink_unicast_kernel net/netlink/af_netlink.c:1307 
> [inline]
> [<0d47c000>] netlink_unicast+0x1ec/0x2d0 
> net/netlink/af_netlink.c:1333
> [<98503d79>] netlink_sendmsg+0x26a/0x480 
> net/netlink/af_netlink.c:1922
> [<9263e868>] sock_sendmsg_nosec net/socket.c:646 [inline]
> [<9263e868>] sock_sendmsg+0x54/0x70 net/socket.c:665
> [<7791ad47>] __sys_sendto+0x148/0x1f0 net/socket.c:1958
> [] __do_sys_sendto net/socket.c:1970 [inline]
> [] __se_sys_sendto 

Re: memory leak in vhost_net_ioctl

2019-07-24 Thread Hillf Danton


Hello Syzbot

On Fri, 14 Jun 2019 02:26:02 +0800 syzbot wrote:
>
> Hello,
>
> syzbot has tested the proposed patch but the reproducer still triggered crash:
> memory leak in vhost_net_ioctl
>
Oh sorry for my poor patch.

> ANGE): hsr_slave_1: link becomes ready
> 2019/06/13 18:24:57 executed programs: 18
> BUG: memory leak
> unreferenced object 0x88811cbc6ac0 (size 64):
>comm "syz-executor.0", pid 7196, jiffies 4294943804 (age 14.770s)
>hex dump (first 32 bytes):
>  01 00 00 00 81 88 ff ff 00 00 00 00 82 88 ff ff  
>  d0 6a bc 1c 81 88 ff ff d0 6a bc 1c 81 88 ff ff  .j...j..
>backtrace:
>  [<6c752978>] kmemleak_alloc_recursive 
> include/linux/kmemleak.h:43 [inline]
>  [<6c752978>] slab_post_alloc_hook mm/slab.h:439 [inline]
>  [<6c752978>] slab_alloc mm/slab.c:3326 [inline]
>  [<6c752978>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
>  [] kmalloc include/linux/slab.h:547 [inline]
>  [] vhost_net_ubuf_alloc drivers/vhost/net.c:241 
> [inline]
>  [] vhost_net_set_backend drivers/vhost/net.c:1535 
> [inline]
>  [] vhost_net_ioctl+0xb43/0xc10 drivers/vhost/net.c:1717
>  [<700f02d7>] vfs_ioctl fs/ioctl.c:46 [inline]
>  [<700f02d7>] file_ioctl fs/ioctl.c:509 [inline]
>  [<700f02d7>] do_vfs_ioctl+0x62a/0x810 fs/ioctl.c:696
>  [<9a0ec0a7>] ksys_ioctl+0x86/0xb0 fs/ioctl.c:713
>  [] __do_sys_ioctl fs/ioctl.c:720 [inline]
>  [] __se_sys_ioctl fs/ioctl.c:718 [inline]
>  [] __x64_sys_ioctl+0x1e/0x30 fs/ioctl.c:718
>  [] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
>  [<8715c149>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> BUG: memory leak
> unreferenced object 0x88810b1365c0 (size 64):
>comm "syz-executor.2", pid 7193, jiffies 4294943823 (age 14.580s)
>hex dump (first 32 bytes):
>  01 00 00 00 81 88 ff ff 00 00 00 00 81 88 ff ff  
>  d0 65 13 0b 81 88 ff ff d0 65 13 0b 81 88 ff ff  .e...e..
>backtrace:
>  [<6c752978>] kmemleak_alloc_recursive 
> include/linux/kmemleak.h:43 [inline]
>  [<6c752978>] slab_post_alloc_hook mm/slab.h:439 [inline]
>  [<6c752978>] slab_alloc mm/slab.c:3326 [inline]
>
>  [] kmalloc include/linux/slab.h:547 [inline]
>  [] vhost_net_ubuf_alloc drivers/vhost/net.c:241 
> [inline]
>  [] vhost_net_set_backend drivers/vhost/net.c:1535 
> [inline]
>  [] vhost_net_ioctl+0xb43/0xc10 drivers/vhost/net.c:1717
>  [<700f02d7>] vfs_ioctl fs/ioctl.c:46 [inline]
>  [<700f02d7>] file_ioctl fs/ioctl.c:509 [inline]
>  [<700f02d7>] do_vfs_ioctl+0x62a/0x810 fs/ioctl.c:696
>  [<9a0ec0a7>] ksys_ioctl+0x86/0xb0 fs/ioctl.c:713
>  [] __do_sys_ioctl fs/ioctl.c:720 [inline]
>  [] __se_sys_ioctl fs/ioctl.c:718 [inline]
>  [] __x64_sys_ioctl+0x1e/0x30 fs/ioctl.c:718
>  [] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
>  [<8715c149>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> BUG: memory leak
> unreferenced object 0x88810be23700 (size 64):
>comm "syz-executor.3", pid 7194, jiffies 4294943823 (age 14.580s)
>hex dump (first 32 bytes):
>  01 00 00 00 00 00 00 00 00 00 00 00 00 c9 ff ff  
>  10 37 e2 0b 81 88 ff ff 10 37 e2 0b 81 88 ff ff  .7...7..
>backtrace:
>  [<6c752978>] kmemleak_alloc_recursive 
> include/linux/kmemleak.h:43 [inline]
>  [<6c752978>] slab_post_alloc_hook mm/slab.h:439 [inline]
>  [<6c752978>] slab_alloc mm/slab.c:3326 [inline]
>  [<6c752978>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
>  [] kmalloc include/linux/slab.h:547 [inline]
>  [] vhost_net_ubuf_alloc drivers/vhost/net.c:241 
> [inline]
>  [] vhost_net_set_backend drivers/vhost/net.c:1535 
> [inline]
>  [] vhost_net_ioctl+0xb43/0xc10 drivers/vhost/net.c:1717
>  [<700f02d7>] vfs_ioctl fs/ioctl.c:46 [inline]
>  [<700f02d7>] file_ioctl fs/ioctl.c:509 [inline]
>  [<700f02d7>] do_vfs_ioctl+0x62a/0x810 fs/ioctl.c:696
>  [<9a0ec0a7>] ksys_ioctl+0x86/0xb0 fs/ioctl.c:713
>  [] __do_sys_ioctl fs/ioctl.c:720 [inline]
>  [] __se_sys_ioctl fs/ioctl.c:718 [inline]
>  [] __x64_sys_ioctl+0x1e/0x30 fs/ioctl.c:718
>  [] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
>  [<8715c149>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> BUG: memory leak
> unreferenced 

Re: memory leak in vhost_net_ioctl

2019-07-24 Thread Hillf Danton

Hello Dmitry

On Thu, 13 Jun 2019 20:12:06 +0800 Dmitry Vyukov wrote:
> On Thu, Jun 13, 2019 at 2:07 PM Hillf Danton  wrote:
> >
> > Hello Jason
> >
> > On Thu, 13 Jun 2019 17:10:39 +0800 Jason Wang wrote:
> > >
> > > This is basically a kfree(ubuf) after the second vhost_net_flush() in
> > > vhost_net_release().
> > >
> > Fairly good catch.
> >
> > > Could you please post a formal patch?
> > >
> > I'd like very much to do that; but I wont, I am afraid, until I collect a
> > Tested-by because of reproducer without a cutting edge.
>
> You can easily collect Tested-by from syzbot for any bug with a reproducer;)
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches
>
Thank you for the light you are casting.

Here it goes.
--->8
From: Hillf Danton 
Subject: [PATCH] vhost: fix memory leak in vhost_net_release

syzbot found the following crash on:

HEAD commit:788a0249 Merge tag 'arc-5.2-rc4' of git://git.kernel.org/p..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?xdc9ea6a0
kernel config:  https://syzkaller.appspot.com/x/.config?x�c73825cbdc7326
dashboard link: https://syzkaller.appspot.com/bug?extid89f0c7e45efd7bb643
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?xb31761a0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x4892c1a0


udit: type00 audit(1559768703.229:36): avc:  denied  { map } for
pidq16 comm="syz-executor330" path="/root/syz-executor330334897"
dev="sda1" ino461 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
executing program
executing program

BUG: memory leak
unreferenced object 0x88812421fe40 (size 64):
   comm "syz-executor330", pid 7117, jiffies 4294949245 (age 13.030s)
   hex dump (first 32 bytes):
 01 00 00 00 20 69 6f 63 00 00 00 00 64 65 76 2f   iocdev/
 50 fe 21 24 81 88 ff ff 50 fe 21 24 81 88 ff ff  P.!$P.!$
   backtrace:
 [] kmemleak_alloc_recursive include/linux/kmemleak.h:55 
[inline]
 [] slab_post_alloc_hook mm/slab.h:439 [inline]
 [] slab_alloc mm/slab.c:3326 [inline]
 [] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
 [<79ebab38>] kmalloc include/linux/slab.h:547 [inline]
 [<79ebab38>] vhost_net_ubuf_alloc drivers/vhost/net.c:241 [inline]
 [<79ebab38>] vhost_net_set_backend drivers/vhost/net.c:1534 
[inline]
 [<79ebab38>] vhost_net_ioctl+0xb43/0xc10 drivers/vhost/net.c:1716
 [<9f6204a2>] vfs_ioctl fs/ioctl.c:46 [inline]
 [<9f6204a2>] file_ioctl fs/ioctl.c:509 [inline]
 [<9f6204a2>] do_vfs_ioctl+0x62a/0x810 fs/ioctl.c:696
 [] ksys_ioctl+0x86/0xb0 fs/ioctl.c:713
 [] __do_sys_ioctl fs/ioctl.c:720 [inline]
 [] __se_sys_ioctl fs/ioctl.c:718 [inline]
 [] __x64_sys_ioctl+0x1e/0x30 fs/ioctl.c:718
 [<49c1f547>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
 [<29cc8ca7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

BUG: memory leak
unreferenced object 0x88812421fa80 (size 64):
   comm "syz-executor330", pid 7130, jiffies 4294949755 (age 7.930s)
   hex dump (first 32 bytes):
 01 00 00 00 01 00 00 00 00 00 00 00 2f 76 69 72  /vir
 90 fa 21 24 81 88 ff ff 90 fa 21 24 81 88 ff ff  ..!$..!$
   backtrace:
 [] kmemleak_alloc_recursive  include/linux/kmemleak.h:55 
[inline]
 [] slab_post_alloc_hook mm/slab.h:439 [inline]
 [] slab_alloc mm/slab.c:3326 [inline]
 [] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
 [<79ebab38>] kmalloc include/linux/slab.h:547 [inline]
 [<79ebab38>] vhost_net_ubuf_alloc drivers/vhost/net.c:241  [inline]
 [<79ebab38>] vhost_net_set_backend drivers/vhost/net.c:1534  
[inline]
 [<79ebab38>] vhost_net_ioctl+0xb43/0xc10  drivers/vhost/net.c:1716
 [<9f6204a2>] vfs_ioctl fs/ioctl.c:46 [inline]
 [<9f6204a2>] file_ioctl fs/ioctl.c:509 [inline]
 [<9f6204a2>] do_vfs_ioctl+0x62a/0x810 fs/ioctl.c:696
 [] ksys_ioctl+0x86/0xb0 fs/ioctl.c:713
 [] __do_sys_ioctl fs/ioctl.c:720 [inline]
 [] __se_sys_ioctl fs/ioctl.c:718 [inline]
 [] __x64_sys_ioctl+0x1e/0x30 fs/ioctl.c:718
 [<49c1f547>] do_syscall_64+0x76/0x1a0  arch/x86/entry/common.c:301
 [<29cc8ca7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

End of syzbot report.

The function vhost_net_ubuf_alloc() appears in the two cases of dump info, for
pid 7130 and 7117, suggesting that it is ubuf leak.

Since commit c38e39c378f4 ("vhost-net: fix use-after-free in 

Re: [PATCH] vhost: Don't use defined in VHOST_ARCH_CAN_ACCEL_UACCESS definition

2019-07-24 Thread Nathan Chancellor
On Thu, Jun 06, 2019 at 02:28:55PM -0400, Michael S. Tsirkin wrote:
> I'd prefer just changing the definition.
> ifdefs have a disadvantage that it's easy to get
> wrong code if you forget to include a header.
> 
> I queued the below - pls confirm it works for you.

Fine by me, I figured that might be preferred (since clang will warn if
VHOST_ARCH_CAN_ACCEL_UACCESS is not defined so you'd know if the header
was forgotten). Thank you for the fix :)

Reviewed-by: Nathan Chancellor 
Tested-by: Nathan Chancellor 

> 
> 
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index c5d950cf7627..819296332913 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -95,8 +95,11 @@ struct vhost_uaddr {
>   bool write;
>  };
>  
> -#define VHOST_ARCH_CAN_ACCEL_UACCESS defined(CONFIG_MMU_NOTIFIER) && \
> - ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 0
> +#if defined(CONFIG_MMU_NOTIFIER) && ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 0
> +#define VHOST_ARCH_CAN_ACCEL_UACCESS 1
> +#else
> +#define VHOST_ARCH_CAN_ACCEL_UACCESS 0
> +#endif
>  
>  /* The virtqueue structure describes a queue attached to a device. */
>  struct vhost_virtqueue {
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: memory leak in vhost_net_ioctl

2019-07-24 Thread Hillf Danton


Hello Jason

On Thu, 13 Jun 2019 17:10:39 +0800 Jason Wang wrote:
> 
> This is basically a kfree(ubuf) after the second vhost_net_flush() in
> vhost_net_release().
> 
Fairly good catch.

> Could you please post a formal patch?
> 
I'd like very much to do that; but I wont, I am afraid, until I collect a
Tested-by because of reproducer without a cutting edge.

Thanks
Hillf

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] vhost: Don't use defined in VHOST_ARCH_CAN_ACCEL_UACCESS definition

2019-07-24 Thread Nathan Chancellor
Clang warns:

  drivers/vhost/vhost.c:2085:5: warning: macro expansion producing
  'defined' has undefined behavior [-Wexpansion-to-defined]
  #if VHOST_ARCH_CAN_ACCEL_UACCESS
  ^
  drivers/vhost/vhost.h:98:38: note: expanded from macro
  'VHOST_ARCH_CAN_ACCEL_UACCESS'
  #define VHOST_ARCH_CAN_ACCEL_UACCESS defined(CONFIG_MMU_NOTIFIER) && \
   ^

Rework VHOST_ARCH_CAN_ACCEL_UACCESS to be defined under those conditions
so that the meaning of the code doesn't change and clang no longer
warns.

Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Link: https://github.com/ClangBuiltLinux/linux/issues/508
Signed-off-by: Nathan Chancellor 
---
 drivers/vhost/vhost.c | 44 +--
 drivers/vhost/vhost.h |  7 ---
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index dc9301d31f12..cc56d08b4275 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -299,7 +299,7 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
__vhost_vq_meta_reset(d->vqs[i]);
 }
 
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
 static void vhost_map_unprefetch(struct vhost_map *map)
 {
kfree(map->pages);
@@ -483,7 +483,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->iotlb = NULL;
vq->invalidate_count = 0;
__vhost_vq_meta_reset(vq);
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
vhost_reset_vq_maps(vq);
 #endif
 }
@@ -635,7 +635,7 @@ void vhost_dev_init(struct vhost_dev *dev,
INIT_LIST_HEAD(>read_list);
INIT_LIST_HEAD(>pending_list);
spin_lock_init(>iotlb_lock);
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
vhost_init_maps(dev);
 #endif
 
@@ -726,7 +726,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
if (err)
goto err_cgroup;
 
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
err = mmu_notifier_register(>mmu_notifier, dev->mm);
if (err)
goto err_mmu_notifier;
@@ -734,7 +734,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 
return 0;
 
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
 err_mmu_notifier:
vhost_dev_free_iovecs(dev);
 #endif
@@ -828,7 +828,7 @@ static void vhost_clear_msg(struct vhost_dev *dev)
spin_unlock(>iotlb_lock);
 }
 
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
 static void vhost_setup_uaddr(struct vhost_virtqueue *vq,
  int index, unsigned long uaddr,
  size_t size, bool write)
@@ -959,12 +959,12 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
dev->worker = NULL;
}
if (dev->mm) {
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
mmu_notifier_unregister(>mmu_notifier, dev->mm);
 #endif
mmput(dev->mm);
}
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
for (i = 0; i < dev->nvqs; i++)
vhost_uninit_vq_maps(dev->vqs[i]);
 #endif
@@ -1196,7 +1196,7 @@ static inline void __user *__vhost_get_user(struct 
vhost_virtqueue *vq,
 
 static inline int vhost_put_avail_event(struct vhost_virtqueue *vq)
 {
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
struct vhost_map *map;
struct vring_used *used;
 
@@ -1224,7 +1224,7 @@ static inline int vhost_put_used(struct vhost_virtqueue 
*vq,
 struct vring_used_elem *head, int idx,
 int count)
 {
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
struct vhost_map *map;
struct vring_used *used;
size_t size;
@@ -1252,7 +1252,7 @@ static inline int vhost_put_used(struct vhost_virtqueue 
*vq,
 static inline int vhost_put_used_flags(struct vhost_virtqueue *vq)
 
 {
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
struct vhost_map *map;
struct vring_used *used;
 
@@ -1278,7 +1278,7 @@ static inline int vhost_put_used_flags(struct 
vhost_virtqueue *vq)
 static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
 
 {
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
struct vhost_map *map;
struct vring_used *used;
 
@@ -1342,7 +1342,7 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
 static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq,
  __virtio16 *idx)
 {
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
+#ifdef VHOST_ARCH_CAN_ACCEL_UACCESS
struct vhost_map *map;
struct vring_avail *avail;
 
@@ -1367,7 +1367,7 @@ static inline int vhost_get_avail_idx(struct 
vhost_virtqueue *vq,
 static inline int vhost_get_avail_head(struct 

Re: memory leak in vhost_net_ioctl

2019-07-24 Thread Hillf Danton



On Wed, 05 Jun 2019 16:42:05 -0700 (PDT) syzbot wrote:

Hello,

syzbot found the following crash on:

HEAD commit:788a0249 Merge tag 'arc-5.2-rc4' of git://git.kernel.org/p..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15dc9ea6a0
kernel config:  https://syzkaller.appspot.com/x/.config?x=d5c73825cbdc7326
dashboard link: https://syzkaller.appspot.com/bug?extid=0789f0c7e45efd7bb643
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=10b31761a0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=124892c1a0

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0789f0c7e45efd7bb...@syzkaller.appspotmail.com

udit: type=1400 audit(1559768703.229:36): avc:  denied  { map } for   
pid=7116 comm="syz-executor330" path="/root/syz-executor330334897"  
dev="sda1" ino=16461 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023  
tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1

executing program
executing program
BUG: memory leak
unreferenced object 0x88812421fe40 (size 64):
   comm "syz-executor330", pid 7117, jiffies 4294949245 (age 13.030s)
   hex dump (first 32 bytes):
 01 00 00 00 20 69 6f 63 00 00 00 00 64 65 76 2f   iocdev/
 50 fe 21 24 81 88 ff ff 50 fe 21 24 81 88 ff ff  P.!$P.!$
   backtrace:
 [] kmemleak_alloc_recursive include/linux/kmemleak.h:55 
[inline]
 [] slab_post_alloc_hook mm/slab.h:439 [inline]
 [] slab_alloc mm/slab.c:3326 [inline]
 [] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
 [<79ebab38>] kmalloc include/linux/slab.h:547 [inline]
 [<79ebab38>] vhost_net_ubuf_alloc drivers/vhost/net.c:241 [inline]
 [<79ebab38>] vhost_net_set_backend drivers/vhost/net.c:1534 
[inline]
 [<79ebab38>] vhost_net_ioctl+0xb43/0xc10 drivers/vhost/net.c:1716
 [<9f6204a2>] vfs_ioctl fs/ioctl.c:46 [inline]
 [<9f6204a2>] file_ioctl fs/ioctl.c:509 [inline]
 [<9f6204a2>] do_vfs_ioctl+0x62a/0x810 fs/ioctl.c:696
 [] ksys_ioctl+0x86/0xb0 fs/ioctl.c:713
 [] __do_sys_ioctl fs/ioctl.c:720 [inline]
 [] __se_sys_ioctl fs/ioctl.c:718 [inline]
 [] __x64_sys_ioctl+0x1e/0x30 fs/ioctl.c:718
 [<49c1f547>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
 [<29cc8ca7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

BUG: memory leak
unreferenced object 0x88812421fa80 (size 64):
   comm "syz-executor330", pid 7130, jiffies 4294949755 (age 7.930s)
   hex dump (first 32 bytes):
 01 00 00 00 01 00 00 00 00 00 00 00 2f 76 69 72  /vir
 90 fa 21 24 81 88 ff ff 90 fa 21 24 81 88 ff ff  ..!$..!$
   backtrace:
 [] kmemleak_alloc_recursive include/linux/kmemleak.h:55 
[inline]
 [] slab_post_alloc_hook mm/slab.h:439 [inline]
 [] slab_alloc mm/slab.c:3326 [inline]
 [] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
 [<79ebab38>] kmalloc include/linux/slab.h:547 [inline]
 [<79ebab38>] vhost_net_ubuf_alloc drivers/vhost/net.c:241 [inline]
 [<79ebab38>] vhost_net_set_backend drivers/vhost/net.c:1534 
[inline]
 [<79ebab38>] vhost_net_ioctl+0xb43/0xc10 drivers/vhost/net.c:1716
 [<9f6204a2>] vfs_ioctl fs/ioctl.c:46 [inline]
 [<9f6204a2>] file_ioctl fs/ioctl.c:509 [inline]
 [<9f6204a2>] do_vfs_ioctl+0x62a/0x810 fs/ioctl.c:696
 [] ksys_ioctl+0x86/0xb0 fs/ioctl.c:713
 [] __do_sys_ioctl fs/ioctl.c:720 [inline]
 [] __se_sys_ioctl fs/ioctl.c:718 [inline]
 [] __x64_sys_ioctl+0x1e/0x30 fs/ioctl.c:718
 [<49c1f547>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
 [<29cc8ca7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9



---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


Ignore my noise if you have no interest seeing the syzbot report.

After commit c38e39c378f46f ("vhost-net: fix use-after-free in
vhost_net_flush") flush would no longer free ubuf, just wait until ubuf users
disappear instead.

The following diff, in hope that may perhaps help you handle the memory leak,
makes flush able to free ubuf in the path of file release.

Thanks
Hillf
---
drivers/vhost/net.c | 8 +++-
1 file changed, 7 insertions(+), 1 

Re: [PATCH 22/22] docs: fix broken documentation links

2019-07-24 Thread Christophe Leroy



Le 30/05/2019 à 01:23, Mauro Carvalho Chehab a écrit :

Mostly due to x86 and acpi conversion, several documentation
links are still pointing to the old file. Fix them.

Signed-off-by: Mauro Carvalho Chehab 
---
  Documentation/acpi/dsd/leds.txt  |  2 +-
  Documentation/admin-guide/kernel-parameters.rst  |  6 +++---
  Documentation/admin-guide/kernel-parameters.txt  | 16 
  Documentation/admin-guide/ras.rst|  2 +-
  .../devicetree/bindings/net/fsl-enetc.txt|  7 +++
  .../bindings/pci/amlogic,meson-pcie.txt  |  2 +-
  .../bindings/regulator/qcom,rpmh-regulator.txt   |  2 +-
  Documentation/devicetree/booting-without-of.txt  |  2 +-
  Documentation/driver-api/gpio/board.rst  |  2 +-
  Documentation/driver-api/gpio/consumer.rst   |  2 +-
  .../firmware-guide/acpi/enumeration.rst  |  2 +-
  .../firmware-guide/acpi/method-tracing.rst   |  2 +-
  Documentation/i2c/instantiating-devices  |  2 +-
  Documentation/sysctl/kernel.txt  |  4 ++--
  .../translations/it_IT/process/howto.rst |  2 +-
  .../it_IT/process/stable-kernel-rules.rst|  4 ++--
  .../translations/zh_CN/process/4.Coding.rst  |  2 +-
  Documentation/x86/x86_64/5level-paging.rst   |  2 +-
  Documentation/x86/x86_64/boot-options.rst|  4 ++--
  .../x86/x86_64/fake-numa-for-cpusets.rst |  2 +-
  MAINTAINERS  |  6 +++---
  arch/arm/Kconfig |  2 +-
  arch/arm64/kernel/kexec_image.c  |  2 +-
  arch/powerpc/Kconfig |  2 +-
  arch/x86/Kconfig | 16 
  arch/x86/Kconfig.debug   |  2 +-
  arch/x86/boot/header.S   |  2 +-
  arch/x86/entry/entry_64.S|  2 +-
  arch/x86/include/asm/bootparam_utils.h   |  2 +-
  arch/x86/include/asm/page_64_types.h |  2 +-
  arch/x86/include/asm/pgtable_64_types.h  |  2 +-
  arch/x86/kernel/cpu/microcode/amd.c  |  2 +-
  arch/x86/kernel/kexec-bzimage64.c|  2 +-
  arch/x86/kernel/pci-dma.c|  2 +-
  arch/x86/mm/tlb.c|  2 +-
  arch/x86/platform/pvh/enlighten.c|  2 +-
  drivers/acpi/Kconfig | 10 +-
  drivers/net/ethernet/faraday/ftgmac100.c |  2 +-
  .../fieldbus/Documentation/fieldbus_dev.txt  |  4 ++--
  drivers/vhost/vhost.c|  2 +-
  include/acpi/acpi_drivers.h  |  2 +-
  include/linux/fs_context.h   |  2 +-
  include/linux/lsm_hooks.h|  2 +-
  mm/Kconfig   |  2 +-
  security/Kconfig |  2 +-
  tools/include/linux/err.h|  2 +-
  tools/objtool/Documentation/stack-validation.txt |  4 ++--
  tools/testing/selftests/x86/protection_keys.c|  2 +-
  48 files changed, 77 insertions(+), 78 deletions(-)


[...]


diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308c8..e868d2bd48b8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -898,7 +898,7 @@ config PPC_MEM_KEYS
  page-based protections, but without requiring modification of the
  page tables when an application changes protection domains.
  
-	  For details, see Documentation/vm/protection-keys.rst

+ For details, see Documentation/x86/protection-keys.rst


It looks strange to reference an x86 file, for powerpc arch.

Christophe

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 22/22] docs: fix broken documentation links

2019-07-24 Thread Bhupesh Sharma
On 05/30/2019 04:53 AM, Mauro Carvalho Chehab wrote:
> Mostly due to x86 and acpi conversion, several documentation
> links are still pointing to the old file. Fix them.
>
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>   Documentation/acpi/dsd/leds.txt  |  2 +-
>   Documentation/admin-guide/kernel-parameters.rst  |  6 +++---
>   Documentation/admin-guide/kernel-parameters.txt  | 16 
>   Documentation/admin-guide/ras.rst|  2 +-
>   .../devicetree/bindings/net/fsl-enetc.txt|  7 +++
>   .../bindings/pci/amlogic,meson-pcie.txt  |  2 +-
>   .../bindings/regulator/qcom,rpmh-regulator.txt   |  2 +-
>   Documentation/devicetree/booting-without-of.txt  |  2 +-
>   Documentation/driver-api/gpio/board.rst  |  2 +-
>   Documentation/driver-api/gpio/consumer.rst   |  2 +-
>   .../firmware-guide/acpi/enumeration.rst  |  2 +-
>   .../firmware-guide/acpi/method-tracing.rst   |  2 +-
>   Documentation/i2c/instantiating-devices  |  2 +-
>   Documentation/sysctl/kernel.txt  |  4 ++--
>   .../translations/it_IT/process/howto.rst |  2 +-
>   .../it_IT/process/stable-kernel-rules.rst|  4 ++--
>   .../translations/zh_CN/process/4.Coding.rst  |  2 +-
>   Documentation/x86/x86_64/5level-paging.rst   |  2 +-
>   Documentation/x86/x86_64/boot-options.rst|  4 ++--
>   .../x86/x86_64/fake-numa-for-cpusets.rst |  2 +-
>   MAINTAINERS  |  6 +++---
>   arch/arm/Kconfig |  2 +-
>   arch/arm64/kernel/kexec_image.c  |  2 +-
>   arch/powerpc/Kconfig |  2 +-
>   arch/x86/Kconfig | 16 
>   arch/x86/Kconfig.debug   |  2 +-
>   arch/x86/boot/header.S   |  2 +-
>   arch/x86/entry/entry_64.S|  2 +-
>   arch/x86/include/asm/bootparam_utils.h   |  2 +-
>   arch/x86/include/asm/page_64_types.h |  2 +-
>   arch/x86/include/asm/pgtable_64_types.h  |  2 +-
>   arch/x86/kernel/cpu/microcode/amd.c  |  2 +-
>   arch/x86/kernel/kexec-bzimage64.c|  2 +-
>   arch/x86/kernel/pci-dma.c|  2 +-
>   arch/x86/mm/tlb.c|  2 +-
>   arch/x86/platform/pvh/enlighten.c|  2 +-
>   drivers/acpi/Kconfig | 10 +-
>   drivers/net/ethernet/faraday/ftgmac100.c |  2 +-
>   .../fieldbus/Documentation/fieldbus_dev.txt  |  4 ++--
>   drivers/vhost/vhost.c|  2 +-
>   include/acpi/acpi_drivers.h  |  2 +-
>   include/linux/fs_context.h   |  2 +-
>   include/linux/lsm_hooks.h|  2 +-
>   mm/Kconfig   |  2 +-
>   security/Kconfig |  2 +-
>   tools/include/linux/err.h|  2 +-
>   tools/objtool/Documentation/stack-validation.txt |  4 ++--
>   tools/testing/selftests/x86/protection_keys.c|  2 +-
>   48 files changed, 77 insertions(+), 78 deletions(-)
>
> diff --git a/Documentation/acpi/dsd/leds.txt b/Documentation/acpi/dsd/leds.txt
> index 81a63af42ed2..cc58b1a574c5 100644
> --- a/Documentation/acpi/dsd/leds.txt
> +++ b/Documentation/acpi/dsd/leds.txt
> @@ -96,4 +96,4 @@ where
>   
> http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf>,
>   referenced 2019-02-21.
>
> -[7] Documentation/acpi/dsd/data-node-reference.txt
> +[7] Documentation/firmware-guide/acpi/dsd/data-node-references.rst
> diff --git a/Documentation/admin-guide/kernel-parameters.rst 
> b/Documentation/admin-guide/kernel-parameters.rst
> index 0124980dca2d..8d3273e32eb1 100644
> --- a/Documentation/admin-guide/kernel-parameters.rst
> +++ b/Documentation/admin-guide/kernel-parameters.rst
> @@ -167,7 +167,7 @@ parameter is applicable::
>   X86-32  X86-32, aka i386 architecture is enabled.
>   X86-64  X86-64 architecture is enabled.
>   More X86-64 boot options can be found in
> - Documentation/x86/x86_64/boot-options.txt .
> + Documentation/x86/x86_64/boot-options.rst.
>   X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
>   X86_UV  SGI UV support is enabled.
>   XEN Xen support is enabled
> @@ -181,10 +181,10 @@ In addition, the following text indicates that the 
> option::
>   Parameters denoted with BOOT are actually interpreted by the boot
>   loader, and have no meaning to the kernel directly.
>   Do not modify the syntax of boot loader parameters without extreme
> -need or coordination with .
> +need or coordination with .
>
>   There are also arch-specific kernel-parameters not documented here.
> -See for example .
> +See for example .
>
> 

PROBLEM: VirtIO DRM driver crashes when setting specific 16.16 fixed-point property values

2019-07-24 Thread Tyler Slabinski
VirtIO DRM driver crashes when setting specific 16.16 fixed-point
property values

When running a virtual machine with a VirtIO GPU, it's possible to
crash the entire VM by setting the value of a 16.16 fixed-point
property to any value below 65536 (1.0 in 16.16 format or 0x0001).
As a specific example, setting the SRC_W property on a plane DRM
object to a value of 3 will cause the VM to hard-shutdown.

Keywords; DRM, GPU, Virtualization, KMS, Kernel, VirtIO, Virtualization

Kernel information:
Linux version 4.19.44 (nixbld@localhost) (gcc version 7.4.0 (GCC))
#1-NixOS SMP Thu May 16 17:41:32 UTC 2019

Log output: No related errors in the logs.

To reproduce: Create a VM with a VirtIO GPU and set the property as
described above.

I have a personal project that lets you execute specific DRM commands
one at a time: 
https://github.com/Smithay/drm-rs/blob/develop/examples/kms_interactive.rs

Here's a snippet of what happens:

```
$ sudo cargo run --example kms_interactive
...
KMS>> GetResources# List out DRM resource
Connectors: [connector::Handle(31)]
Encoders: [encoder::Handle(32)]
CRTCS: [crtc::Handle(30)]
Framebuffers: [...]
Planes: [plane::Handle(28), plane::Handle(29)]
KMS>> GetProperties 28# Get properties of plane with handle 28
Property: property::Handle(7)Value: 1
Property: property::Handle(16)Value: 69
...
Property: property::Handle(10)Value: 67108864
Property: property::Handle(11)Value: 50331648
KMS>> GetInfo 10# Get info of property 10
Name: "SRC_W"
Mutable: true
Atomic: false
Value: UnsignedRange(0, 4294967295)
KMS>> SetProperty 28 10 65536# Set the value of property 10 on
plane 28 to value 65536 (succeeds)
KMS>> SetProperty 28 10 6# Set the value of property 10 on
plane 28 to value 6
```

At this point the VM has shut down.

Environment:
Linux nixos 4.19.44 #1-NixOS SMP Thu May 16 17:41:32 UTC 2019 x86_64 GNU/Linux

GNU C   7.4.0
Binutils 2.31.1
Util-linux   2.33.2
Mount   2.33.2
Module-init-tools   26
E2fsprogs   1.45.0
Linux C Library 2.27
Dynamic linker (ldd) 2.27
Procps   3.3.15
Net-tools   1.60
Kbd 2.0.4
Console-tools   2.0.4
Sh-utils 8.31
Udev 239
Modules Loaded   8021q aesni_intel aes_x86_64 af_packet agpgart
ahci atkbd autofs4 btrfs button cfg80211 crc32c_generic crc32c_intel
crc32_pclmul crc_ccitt crct10dif_pclmul cryptd crypto_simd deflate
dm_mod drm drm_kms_helper efi_pstore efivarfs efivars ehci_hcd
ehci_pci evdev failover fat fb_sys_fops ghash_clmulni_intel
glue_helper hid hid_generic i2c_core i2c_i801 i8042 input_leds
intel_agp intel_gtt ip6table_filter ip6table_raw ip6_tables
ip6t_rpfilter iptable_filter iptable_nat iptable_raw ip_tables
ipt_rpfilter ipv6 irqbypass iTCO_wdt joydev kvm led_class libahci
libata libcrc32c libps2 loop lpc_ich mac_hid mousedev net_failover
nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 nf_log_common nf_log_ipv4
nf_log_ipv6 nf_nat nf_nat_ipv4 nls_cp437 nls_iso8859_1 pcbc psmouse
pstore qemu_fw_cfg raid6_pq rfkill rng_core rtc_cmos scsi_mod serio
serio_raw snd snd_hda_codec snd_hda_codec_generic snd_hda_core
snd_hda_intel snd_hwdep snd_pcm snd_timer soundcore syscopyarea
sysfillrect sysimgblt ttm uhci_hcd usb_common usbcore usbhid vfat
virtio virtio_balloon virtio_blk virtio_console virtio_gpu virtio_net
virtio_pci virtio_ring virtio_rng xor x_tables xt_conntrack xt_LOG
xt_pkttype xt_tcpudp xxhash zstd_compress zstd_decompress

XML Configuration for VM:
```

  DRM-test
  07c20472-206a-4367-8a9c-11b39b836896
  4194304
  4194304
  2
  
/machine
  
  
hvm
/run/libvirt/nix-ovmf/OVMF_CODE.fd
/var/lib/libvirt/qemu/nvram/DRM-test_VARS.fd
  
  



  
  
Skylake-Client-IBRS
Intel











  
  



  
  destroy
  restart
  destroy
  


  
  
/run/libvirt/nix-emulators/qemu-system-x86_64

  
  
  
  
  
  
  


  
  


  
  
  


  
  
  


  
  
  


  
  


  


  
  
  
  


  
  
  
  


  
  
  


  
  
  
  


  
  
  
  


  
  
  
  


  
  
  
  
  
  


  
  

  
  


  
  
  


  
  


  


  


  
  


  
  


  
  
  


  
  


  
  


  
  

  
  
+0:+0
+0:+0
  

```
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org

Re: [PATCH 22/22] docs: fix broken documentation links

2019-07-24 Thread Federico Vaga
On Thursday, May 30, 2019 1:23:53 AM CEST Mauro Carvalho Chehab wrote:
> Mostly due to x86 and acpi conversion, several documentation
> links are still pointing to the old file. Fix them.

For the Italian documentation I just send I patch to fix them in a dedicated 
patch

> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/acpi/dsd/leds.txt  |  2 +-
>  Documentation/admin-guide/kernel-parameters.rst  |  6 +++---
>  Documentation/admin-guide/kernel-parameters.txt  | 16 
>  Documentation/admin-guide/ras.rst|  2 +-
>  .../devicetree/bindings/net/fsl-enetc.txt|  7 +++
>  .../bindings/pci/amlogic,meson-pcie.txt  |  2 +-
>  .../bindings/regulator/qcom,rpmh-regulator.txt   |  2 +-
>  Documentation/devicetree/booting-without-of.txt  |  2 +-
>  Documentation/driver-api/gpio/board.rst  |  2 +-
>  Documentation/driver-api/gpio/consumer.rst   |  2 +-
>  .../firmware-guide/acpi/enumeration.rst  |  2 +-
>  .../firmware-guide/acpi/method-tracing.rst   |  2 +-
>  Documentation/i2c/instantiating-devices  |  2 +-
>  Documentation/sysctl/kernel.txt  |  4 ++--
>  .../translations/it_IT/process/howto.rst |  2 +-
>  .../it_IT/process/stable-kernel-rules.rst|  4 ++--
>  .../translations/zh_CN/process/4.Coding.rst  |  2 +-
>  Documentation/x86/x86_64/5level-paging.rst   |  2 +-
>  Documentation/x86/x86_64/boot-options.rst|  4 ++--
>  .../x86/x86_64/fake-numa-for-cpusets.rst |  2 +-
>  MAINTAINERS  |  6 +++---
>  arch/arm/Kconfig |  2 +-
>  arch/arm64/kernel/kexec_image.c  |  2 +-
>  arch/powerpc/Kconfig |  2 +-
>  arch/x86/Kconfig | 16 
>  arch/x86/Kconfig.debug   |  2 +-
>  arch/x86/boot/header.S   |  2 +-
>  arch/x86/entry/entry_64.S|  2 +-
>  arch/x86/include/asm/bootparam_utils.h   |  2 +-
>  arch/x86/include/asm/page_64_types.h |  2 +-
>  arch/x86/include/asm/pgtable_64_types.h  |  2 +-
>  arch/x86/kernel/cpu/microcode/amd.c  |  2 +-
>  arch/x86/kernel/kexec-bzimage64.c|  2 +-
>  arch/x86/kernel/pci-dma.c|  2 +-
>  arch/x86/mm/tlb.c|  2 +-
>  arch/x86/platform/pvh/enlighten.c|  2 +-
>  drivers/acpi/Kconfig | 10 +-
>  drivers/net/ethernet/faraday/ftgmac100.c |  2 +-
>  .../fieldbus/Documentation/fieldbus_dev.txt  |  4 ++--
>  drivers/vhost/vhost.c|  2 +-
>  include/acpi/acpi_drivers.h  |  2 +-
>  include/linux/fs_context.h   |  2 +-
>  include/linux/lsm_hooks.h|  2 +-
>  mm/Kconfig   |  2 +-
>  security/Kconfig |  2 +-
>  tools/include/linux/err.h|  2 +-
>  tools/objtool/Documentation/stack-validation.txt |  4 ++--
>  tools/testing/selftests/x86/protection_keys.c|  2 +-
>  48 files changed, 77 insertions(+), 78 deletions(-)
> 
> diff --git a/Documentation/acpi/dsd/leds.txt
> b/Documentation/acpi/dsd/leds.txt index 81a63af42ed2..cc58b1a574c5 100644
> --- a/Documentation/acpi/dsd/leds.txt
> +++ b/Documentation/acpi/dsd/leds.txt
> @@ -96,4 +96,4 @@ where
> 
> http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-da
> ta-extension-UUID-v1.1.pdf>, referenced 2019-02-21.
> 
> -[7] Documentation/acpi/dsd/data-node-reference.txt
> +[7] Documentation/firmware-guide/acpi/dsd/data-node-references.rst
> diff --git a/Documentation/admin-guide/kernel-parameters.rst
> b/Documentation/admin-guide/kernel-parameters.rst index
> 0124980dca2d..8d3273e32eb1 100644
> --- a/Documentation/admin-guide/kernel-parameters.rst
> +++ b/Documentation/admin-guide/kernel-parameters.rst
> @@ -167,7 +167,7 @@ parameter is applicable::
>   X86-32  X86-32, aka i386 architecture is enabled.
>   X86-64  X86-64 architecture is enabled.
>   More X86-64 boot options can be found in
> - Documentation/x86/x86_64/boot-options.txt 
.
> + Documentation/x86/x86_64/boot-options.rst.
>   X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
>   X86_UV  SGI UV support is enabled.
>   XEN Xen support is enabled
> @@ -181,10 +181,10 @@ In addition, the following text indicates that the
> option:: Parameters denoted with BOOT are actually interpreted by the boot
> loader, and have no meaning to the kernel directly.
>  Do not modify the syntax of boot loader parameters without extreme
> -need or coordination with .
> +need or coordination with .
> 
>  There are also arch-specific kernel-parameters not documented here.
> 

Re: custom virt-io support (in user-mode-linux)

2019-07-24 Thread Anton Ivanov




On 22/05/2019 14:02, Johannes Berg wrote:

Hi,

While my main interest is mostly in UML right now [1] I've CC'ed the
qemu and virtualization lists because something similar might actually
apply to other types of virtualization.

I'm thinking about adding virt-io support to UML, but the tricky part is
that while I want to use the virt-io basics (because it's a nice
interface from the 'inside'), I don't actually want the stock drivers
that are part of the kernel now (like virtio-net etc.) but rather
something that integrates with wifi (probably building on hwsim).

The 'inside' interfaces aren't really a problem - just have a specific
device ID for this, and then write a normal virtio kernel driver for it.

The 'outside' interfaces are where my thinking breaks down right now.

Looking at lkl, the outside is just all implemented in lkl as code that
gets linked to the library, so in UML terms it'd just be extra 'outside'
code like the timer handling or other netdev stuff we have today.
Looking at qemu, it's of course also implemented there, and then
interfaces with the real network, console abstraction, etc.

However, like I said above, I really need something very custom and not
likely to make it upstream to any project (because what point is that if
you cannot connect to the rest of the environment I'm building), so I'm
thinking that perhaps it should be possible to write an abstract
'outside' that lets you interact with it really from out-of-process?
Perhaps through some kind of shared memory segment? I think that gets
tricky with virt-io doing DMA (I think it does?) though, so that part
would have to be implemented directly and not out-of-process?

But really that's why I'm asking - is there a better way than to just
link the device-side virt-io code into the same binary (be it lkl lib,
uml binary, qemu binary)?

Thanks,
johannes

[1] Actually, I've considered using qemu, but it doesn't have
virtualized time and doesn't seem to support TSC virtualization. I guess
I could remove TSC from the guest CPU and add a virtualized HPET, but
I've yet to convince myself this works - on UML I made virtual time as a
prototype already:
https://patchwork.ozlabs.org/patch/1095814/
(though my real goal isn't to just skip time forward when the host goes
idle, it's to sync with other simulated components)


___
linux-um mailing list
linux...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um



I have looked at using virtio semantics in UML in the past around the 
point when I wanted to make the recvmmsg/sendmmsg vector drivers common 
in UML and QEMU. It is certainly possible,


I went for the native approach at the end though.

--
Anton R. Ivanov
https://www.kot-begemot.co.uk/
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: custom virt-io support (in user-mode-linux)

2019-07-24 Thread Anton Ivanov




On 22/05/2019 14:46, Johannes Berg wrote:

Hi Anton,


I'm thinking about adding virt-io support to UML, but the tricky part is
that while I want to use the virt-io basics (because it's a nice
interface from the 'inside'), I don't actually want the stock drivers
that are part of the kernel now (like virtio-net etc.) but rather
something that integrates with wifi (probably building on hwsim).



I have looked at using virtio semantics in UML in the past around the
point when I wanted to make the recvmmsg/sendmmsg vector drivers common
in UML and QEMU. It is certainly possible,

I went for the native approach at the end though.


Hmm. I'm not sure what you mean by either :-)

Is there any commonality between the vector drivers? 


I was looking purely from a network driver perspective.

I had two options - either do a direct read/write as it does today or 
implement the ring/king semantics and read/write from that.


I decided to not bother with the latter and read/write directly from/to 
skbs.



I can't see how
that'd work without a bus abstraction (like virtio) in qemu? I mean, the
kernel driver just calls uml_vector_sendmmsg(), which I'd say belongs
more to the 'outside world', but that can't really be done in qemu?

Ok, I guess then I see what you mean by 'native' though.

Similarly, of course, I can implement arbitrary virt-io devices - just
the kernel side doesn't call a function like uml_vector_sendmmsg()
directly, but instead the virt-io model, and the model calls the
function, which essentially is the same just with a (convenient)
abstraction layer.

But this leaves the fundamental fact the model code ("vector_user.c" or
a similar "virtio_user.c") is still part of the build.

I guess what I'm thinking is have something like "virtio_user_rpc.c"
that uses some appropriate RPC to interact with the real model. IOW,
rather than having all the model-specific logic actually be here (like
vector_user.c actually knows how to send network packets over a real
socket fd), try to call out to some RPC that contains the real model.

Now that I thought about it further, I guess my question boils down to
"did anyone ever think about doing RPC for Virt-IO instead of putting
the entire device model into the hypervisor/emulator/...".


Virtio in general no. UML specifically - yes. I have thought of mapping 
out all key device calls to RPCs for a few applications. The issue is 
that it is fairly difficult to make all of this function cleanly without 
blocking in strange places.


You may probably want to look at the UML UBD driver. That is an example 
of moving out all processing to an external thread and talking to it via 
a request/response API. While it still expects shared memory and needs 
access to UML address space the model should be more amenable to 
replacing various calls with RPCs as you have now left the rest of the 
kernel to run while you are processing the RPC. It also provides you 
with RPC completion interrupts, etc as a side effect.


So you basically have UML -> Thread -> RPCs -> Model?



johannes


___
linux-um mailing list
linux...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um



--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 10/10] docs: fix broken documentation links

2019-07-24 Thread Federico Vaga
On Monday, May 20, 2019 4:47:39 PM CEST Mauro Carvalho Chehab wrote:
> Mostly due to x86 and acpi conversion, several documentation
> links are still pointing to the old file. Fix them.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/acpi/dsd/leds.txt  |  2 +-
>  Documentation/admin-guide/kernel-parameters.rst  |  6 +++---
>  Documentation/admin-guide/kernel-parameters.txt  | 16 
>  Documentation/admin-guide/ras.rst|  2 +-
>  .../devicetree/bindings/net/fsl-enetc.txt|  7 +++
>  .../bindings/pci/amlogic,meson-pcie.txt  |  2 +-
>  .../bindings/regulator/qcom,rpmh-regulator.txt   |  2 +-
>  Documentation/devicetree/booting-without-of.txt  |  2 +-
>  Documentation/driver-api/gpio/board.rst  |  2 +-
>  Documentation/driver-api/gpio/consumer.rst   |  2 +-
>  .../firmware-guide/acpi/enumeration.rst  |  2 +-
>  .../firmware-guide/acpi/method-tracing.rst   |  2 +-
>  Documentation/i2c/instantiating-devices  |  2 +-
>  Documentation/sysctl/kernel.txt  |  4 ++--
>  .../translations/it_IT/process/4.Coding.rst  |  2 +-
>  .../translations/it_IT/process/howto.rst |  2 +-
>  .../it_IT/process/stable-kernel-rules.rst|  4 ++--
>  .../translations/zh_CN/process/4.Coding.rst  |  2 +-
>  Documentation/x86/x86_64/5level-paging.rst   |  2 +-
>  Documentation/x86/x86_64/boot-options.rst|  4 ++--
>  .../x86/x86_64/fake-numa-for-cpusets.rst |  2 +-
>  MAINTAINERS  |  6 +++---
>  arch/arm/Kconfig |  2 +-
>  arch/arm64/kernel/kexec_image.c  |  2 +-
>  arch/powerpc/Kconfig |  2 +-
>  arch/x86/Kconfig | 16 
>  arch/x86/Kconfig.debug   |  2 +-
>  arch/x86/boot/header.S   |  2 +-
>  arch/x86/entry/entry_64.S|  2 +-
>  arch/x86/include/asm/bootparam_utils.h   |  2 +-
>  arch/x86/include/asm/page_64_types.h |  2 +-
>  arch/x86/include/asm/pgtable_64_types.h  |  2 +-
>  arch/x86/kernel/cpu/microcode/amd.c  |  2 +-
>  arch/x86/kernel/kexec-bzimage64.c|  2 +-
>  arch/x86/kernel/pci-dma.c|  2 +-
>  arch/x86/mm/tlb.c|  2 +-
>  arch/x86/platform/pvh/enlighten.c|  2 +-
>  drivers/acpi/Kconfig | 10 +-
>  drivers/net/ethernet/faraday/ftgmac100.c |  2 +-
>  .../fieldbus/Documentation/fieldbus_dev.txt  |  4 ++--
>  drivers/vhost/vhost.c|  2 +-
>  include/acpi/acpi_drivers.h  |  2 +-
>  include/linux/fs_context.h   |  2 +-
>  include/linux/lsm_hooks.h|  2 +-
>  mm/Kconfig   |  2 +-
>  security/Kconfig |  2 +-
>  tools/include/linux/err.h|  2 +-
>  tools/objtool/Documentation/stack-validation.txt |  4 ++--
>  tools/testing/selftests/x86/protection_keys.c|  2 +-
>  49 files changed, 78 insertions(+), 79 deletions(-)
> 
> diff --git a/Documentation/acpi/dsd/leds.txt
> b/Documentation/acpi/dsd/leds.txt index 81a63af42ed2..cc58b1a574c5 100644
> --- a/Documentation/acpi/dsd/leds.txt
> +++ b/Documentation/acpi/dsd/leds.txt
> @@ -96,4 +96,4 @@ where
> 
> http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-da
> ta-extension-UUID-v1.1.pdf>, referenced 2019-02-21.
> 
> -[7] Documentation/acpi/dsd/data-node-reference.txt
> +[7] Documentation/firmware-guide/acpi/dsd/data-node-references.rst
> diff --git a/Documentation/admin-guide/kernel-parameters.rst
> b/Documentation/admin-guide/kernel-parameters.rst index
> 0124980dca2d..8d3273e32eb1 100644
> --- a/Documentation/admin-guide/kernel-parameters.rst
> +++ b/Documentation/admin-guide/kernel-parameters.rst
> @@ -167,7 +167,7 @@ parameter is applicable::
>   X86-32  X86-32, aka i386 architecture is enabled.
>   X86-64  X86-64 architecture is enabled.
>   More X86-64 boot options can be found in
> - Documentation/x86/x86_64/boot-options.txt 
.
> + Documentation/x86/x86_64/boot-options.rst.
>   X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
>   X86_UV  SGI UV support is enabled.
>   XEN Xen support is enabled
> @@ -181,10 +181,10 @@ In addition, the following text indicates that the
> option:: Parameters denoted with BOOT are actually interpreted by the boot
> loader, and have no meaning to the kernel directly.
>  Do not modify the syntax of boot loader parameters without extreme
> -need or coordination with .
> +need or coordination with .
> 
>  There are also arch-specific kernel-parameters not documented here.
> -See for example .
> +See 

Re: [PATCH 1/1] virtio/s390: fix race on airq_areas[]

2019-07-24 Thread Halil Pasic
On Wed, 24 Jul 2019 10:39:13 +0200
Christian Borntraeger  wrote:

> 
> 
> On 24.07.19 10:34, Cornelia Huck wrote:
> > On Wed, 24 Jul 2019 08:44:19 +0200
> > Christian Borntraeger  wrote:
> > 
> >> On 24.07.19 00:58, Halil Pasic wrote:
> >>> The access to airq_areas was racy ever since the adapter interrupts got
> >>> introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390:
> >>> make airq summary indicators DMA") this became an issue in practice as
> >>> well. Namely before that commit the airq_info that got overwritten was
> >>> still functional. After that commit however the two infos share a
> >>> summary_indicator, which aggravates the situation. Which means
> >>> auto-online mechanism occasionally hangs the boot with virtio_blk.
> >>>
> >>> Signed-off-by: Halil Pasic 
> >>> Reported-by: Marc Hartmayer 
> >>> Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.")
> >>> ---
> >>> * We need definitely this fixed for 5.3. For older stable kernels it is
> >>> to be discussed. @Connie what do you think: do we need a cc stable?  
> >>
> >> Unless you can prove that the problem could never happen on old version
> >> we absolutely do need cc stable. 
> > 
> > Yes, this needs to be cc:stable.
> > 
> >>
> >>>
> >>> * I have a variant that does not need the extra mutex but uses cmpxchg().
> >>> Decided to post this one because that one is more complex. But if there
> >>> is interest we can have a look at it as well.  
> >>
> >> This is slow path (startup) and never called in hot path. Correct? Mutex 
> >> should be
> >> fine.
> > 
> > Yes, this is ultimately called through the ->probe functions of virtio
> > drivers.
> > 
> >>> ---
> >>>  drivers/s390/virtio/virtio_ccw.c | 4 
> >>>  1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/drivers/s390/virtio/virtio_ccw.c 
> >>> b/drivers/s390/virtio/virtio_ccw.c
> >>> index 1a55e5942d36..d97742662755 100644
> >>> --- a/drivers/s390/virtio/virtio_ccw.c
> >>> +++ b/drivers/s390/virtio/virtio_ccw.c
> >>> @@ -145,6 +145,8 @@ struct airq_info {
> >>>   struct airq_iv *aiv;
> >>>  };
> >>>  static struct airq_info *airq_areas[MAX_AIRQ_AREAS];
> >>> +DEFINE_MUTEX(airq_areas_lock);
> >>> +
> >>>  static u8 *summary_indicators;
> >>>  
> >>>  static inline u8 *get_summary_indicator(struct airq_info *info)
> >>> @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct 
> >>> virtqueue *vqs[], int nvqs,
> >>>   unsigned long bit, flags;
> >>>  
> >>>   for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) {
> >>> + mutex_lock(_areas_lock);
> >>>   if (!airq_areas[i])
> >>>   airq_areas[i] = new_airq_info(i);
> >>>   info = airq_areas[i];
> >>> + mutex_unlock(_areas_lock);
> >>>   if (!info)
> >>>   return 0;
> >>>   write_lock_irqsave(>lock, flags);
> >>>   
> >>
> > 
> > Reviewed-by: Cornelia Huck 
> > 
> > Should I pick this and send a pull request, or is it quicker to just
> > take this directly?
> 
> I think we can you did via a fast path. Halil, can you push to the s390 tree?

Sure!

Regards,
Halil

> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/1] virtio/s390: fix race on airq_areas[]

2019-07-24 Thread Halil Pasic
On Wed, 24 Jul 2019 08:44:19 +0200
Christian Borntraeger  wrote:

> 
> 
> On 24.07.19 00:58, Halil Pasic wrote:
> > The access to airq_areas was racy ever since the adapter interrupts got
> > introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390:
> > make airq summary indicators DMA") this became an issue in practice as
> > well. Namely before that commit the airq_info that got overwritten was
> > still functional. After that commit however the two infos share a
> > summary_indicator, which aggravates the situation. Which means
> > auto-online mechanism occasionally hangs the boot with virtio_blk.
> > 
> > Signed-off-by: Halil Pasic 
> > Reported-by: Marc Hartmayer 
> > Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.")
> > ---
> > * We need definitely this fixed for 5.3. For older stable kernels it is
> > to be discussed. @Connie what do you think: do we need a cc stable?
> 
> Unless you can prove that the problem could never happen on old version
> we absolutely do need cc stable.

No I would not like to make an attempt at proving that. I prefer code
race free anyway. CC-ing stable.
 
> 
> > 
> > * I have a variant that does not need the extra mutex but uses cmpxchg().
> > Decided to post this one because that one is more complex. But if there
> > is interest we can have a look at it as well.
> 
> This is slow path (startup) and never called in hot path. Correct? Mutex 
> should be
> fine.

Right, this is only relevant during device initialization, which is an
infrequent operation.

Thanks,
Halil

> > ---
> >  drivers/s390/virtio/virtio_ccw.c | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/s390/virtio/virtio_ccw.c 
> > b/drivers/s390/virtio/virtio_ccw.c
> > index 1a55e5942d36..d97742662755 100644
> > --- a/drivers/s390/virtio/virtio_ccw.c
> > +++ b/drivers/s390/virtio/virtio_ccw.c
> > @@ -145,6 +145,8 @@ struct airq_info {
> > struct airq_iv *aiv;
> >  };
> >  static struct airq_info *airq_areas[MAX_AIRQ_AREAS];
> > +DEFINE_MUTEX(airq_areas_lock);
> > +
> >  static u8 *summary_indicators;
> >  
> >  static inline u8 *get_summary_indicator(struct airq_info *info)
> > @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct 
> > virtqueue *vqs[], int nvqs,
> > unsigned long bit, flags;
> >  
> > for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) {
> > +   mutex_lock(_areas_lock);
> > if (!airq_areas[i])
> > airq_areas[i] = new_airq_info(i);
> > info = airq_areas[i];
> > +   mutex_unlock(_areas_lock);
> > if (!info)
> > return 0;
> > write_lock_irqsave(>lock, flags);
> > 
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/1] virtio/s390: fix race on airq_areas[]

2019-07-24 Thread Christian Borntraeger



On 24.07.19 10:34, Cornelia Huck wrote:
> On Wed, 24 Jul 2019 08:44:19 +0200
> Christian Borntraeger  wrote:
> 
>> On 24.07.19 00:58, Halil Pasic wrote:
>>> The access to airq_areas was racy ever since the adapter interrupts got
>>> introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390:
>>> make airq summary indicators DMA") this became an issue in practice as
>>> well. Namely before that commit the airq_info that got overwritten was
>>> still functional. After that commit however the two infos share a
>>> summary_indicator, which aggravates the situation. Which means
>>> auto-online mechanism occasionally hangs the boot with virtio_blk.
>>>
>>> Signed-off-by: Halil Pasic 
>>> Reported-by: Marc Hartmayer 
>>> Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.")
>>> ---
>>> * We need definitely this fixed for 5.3. For older stable kernels it is
>>> to be discussed. @Connie what do you think: do we need a cc stable?  
>>
>> Unless you can prove that the problem could never happen on old version
>> we absolutely do need cc stable. 
> 
> Yes, this needs to be cc:stable.
> 
>>
>>>
>>> * I have a variant that does not need the extra mutex but uses cmpxchg().
>>> Decided to post this one because that one is more complex. But if there
>>> is interest we can have a look at it as well.  
>>
>> This is slow path (startup) and never called in hot path. Correct? Mutex 
>> should be
>> fine.
> 
> Yes, this is ultimately called through the ->probe functions of virtio
> drivers.
> 
>>> ---
>>>  drivers/s390/virtio/virtio_ccw.c | 4 
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/s390/virtio/virtio_ccw.c 
>>> b/drivers/s390/virtio/virtio_ccw.c
>>> index 1a55e5942d36..d97742662755 100644
>>> --- a/drivers/s390/virtio/virtio_ccw.c
>>> +++ b/drivers/s390/virtio/virtio_ccw.c
>>> @@ -145,6 +145,8 @@ struct airq_info {
>>> struct airq_iv *aiv;
>>>  };
>>>  static struct airq_info *airq_areas[MAX_AIRQ_AREAS];
>>> +DEFINE_MUTEX(airq_areas_lock);
>>> +
>>>  static u8 *summary_indicators;
>>>  
>>>  static inline u8 *get_summary_indicator(struct airq_info *info)
>>> @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct 
>>> virtqueue *vqs[], int nvqs,
>>> unsigned long bit, flags;
>>>  
>>> for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) {
>>> +   mutex_lock(_areas_lock);
>>> if (!airq_areas[i])
>>> airq_areas[i] = new_airq_info(i);
>>> info = airq_areas[i];
>>> +   mutex_unlock(_areas_lock);
>>> if (!info)
>>> return 0;
>>> write_lock_irqsave(>lock, flags);
>>>   
>>
> 
> Reviewed-by: Cornelia Huck 
> 
> Should I pick this and send a pull request, or is it quicker to just
> take this directly?

I think we can you did via a fast path. Halil, can you push to the s390 tree?

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/1] virtio/s390: fix race on airq_areas[]

2019-07-24 Thread Cornelia Huck
On Wed, 24 Jul 2019 08:44:19 +0200
Christian Borntraeger  wrote:

> On 24.07.19 00:58, Halil Pasic wrote:
> > The access to airq_areas was racy ever since the adapter interrupts got
> > introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390:
> > make airq summary indicators DMA") this became an issue in practice as
> > well. Namely before that commit the airq_info that got overwritten was
> > still functional. After that commit however the two infos share a
> > summary_indicator, which aggravates the situation. Which means
> > auto-online mechanism occasionally hangs the boot with virtio_blk.
> > 
> > Signed-off-by: Halil Pasic 
> > Reported-by: Marc Hartmayer 
> > Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.")
> > ---
> > * We need definitely this fixed for 5.3. For older stable kernels it is
> > to be discussed. @Connie what do you think: do we need a cc stable?  
> 
> Unless you can prove that the problem could never happen on old version
> we absolutely do need cc stable. 

Yes, this needs to be cc:stable.

> 
> > 
> > * I have a variant that does not need the extra mutex but uses cmpxchg().
> > Decided to post this one because that one is more complex. But if there
> > is interest we can have a look at it as well.  
> 
> This is slow path (startup) and never called in hot path. Correct? Mutex 
> should be
> fine.

Yes, this is ultimately called through the ->probe functions of virtio
drivers.

> > ---
> >  drivers/s390/virtio/virtio_ccw.c | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/s390/virtio/virtio_ccw.c 
> > b/drivers/s390/virtio/virtio_ccw.c
> > index 1a55e5942d36..d97742662755 100644
> > --- a/drivers/s390/virtio/virtio_ccw.c
> > +++ b/drivers/s390/virtio/virtio_ccw.c
> > @@ -145,6 +145,8 @@ struct airq_info {
> > struct airq_iv *aiv;
> >  };
> >  static struct airq_info *airq_areas[MAX_AIRQ_AREAS];
> > +DEFINE_MUTEX(airq_areas_lock);
> > +
> >  static u8 *summary_indicators;
> >  
> >  static inline u8 *get_summary_indicator(struct airq_info *info)
> > @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct 
> > virtqueue *vqs[], int nvqs,
> > unsigned long bit, flags;
> >  
> > for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) {
> > +   mutex_lock(_areas_lock);
> > if (!airq_areas[i])
> > airq_areas[i] = new_airq_info(i);
> > info = airq_areas[i];
> > +   mutex_unlock(_areas_lock);
> > if (!info)
> > return 0;
> > write_lock_irqsave(>lock, flags);
> >   
> 

Reviewed-by: Cornelia Huck 

Should I pick this and send a pull request, or is it quicker to just
take this directly?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 07/12] vhost-scsi: convert put_page() to put_user_page*()

2019-07-24 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 09:25:13PM -0700, john.hubb...@gmail.com wrote:
> From: Jérôme Glisse 
> 
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page().
> 
> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> Changes from Jérôme's original patch:
> 
> * Changed a WARN_ON to a BUG_ON.
> 
> Signed-off-by: Jérôme Glisse 
> Signed-off-by: John Hubbard 
> Cc: virtualization@lists.linux-foundation.org
> Cc: linux-fsde...@vger.kernel.org
> Cc: linux-bl...@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: Jan Kara 
> Cc: Dan Williams 
> Cc: Alexander Viro 
> Cc: Johannes Thumshirn 
> Cc: Christoph Hellwig 
> Cc: Jens Axboe 
> Cc: Ming Lei 
> Cc: Dave Chinner 
> Cc: Jason Gunthorpe 
> Cc: Matthew Wilcox 
> Cc: Boaz Harrosh 
> Cc: Miklos Szeredi 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Cc: Paolo Bonzini 
> Cc: Stefan Hajnoczi 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/vhost/scsi.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index a9caf1bc3c3e..282565ab5e3f 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -329,11 +329,11 @@ static void vhost_scsi_release_cmd(struct se_cmd 
> *se_cmd)
>  
>   if (tv_cmd->tvc_sgl_count) {
>   for (i = 0; i < tv_cmd->tvc_sgl_count; i++)
> - put_page(sg_page(_cmd->tvc_sgl[i]));
> + put_user_page(sg_page(_cmd->tvc_sgl[i]));
>   }
>   if (tv_cmd->tvc_prot_sgl_count) {
>   for (i = 0; i < tv_cmd->tvc_prot_sgl_count; i++)
> - put_page(sg_page(_cmd->tvc_prot_sgl[i]));
> + put_user_page(sg_page(_cmd->tvc_prot_sgl[i]));
>   }
>  
>   vhost_scsi_put_inflight(tv_cmd->inflight);
> @@ -630,6 +630,13 @@ vhost_scsi_map_to_sgl(struct vhost_scsi_cmd *cmd,
>   size_t offset;
>   unsigned int npages = 0;
>  
> + /*
> +  * Here in all cases we should have an IOVEC which use GUP. If that is
> +  * not the case then we will wrongly call put_user_page() and the page
> +  * refcount will go wrong (this is in vhost_scsi_release_cmd())
> +  */
> + WARN_ON(!iov_iter_get_pages_use_gup(iter));
> +
>   bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
>   VHOST_SCSI_PREALLOC_UPAGES, );
>   /* No pages were pinned */
> @@ -681,7 +688,7 @@ vhost_scsi_iov_to_sgl(struct vhost_scsi_cmd *cmd, bool 
> write,
>   while (p < sg) {
>   struct page *page = sg_page(p++);
>   if (page)
> - put_page(page);
> + put_user_page(page);
>   }
>   return ret;
>   }
> -- 
> 2.22.0
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/1] virtio/s390: fix race on airq_areas[]

2019-07-24 Thread Christian Borntraeger



On 24.07.19 00:58, Halil Pasic wrote:
> The access to airq_areas was racy ever since the adapter interrupts got
> introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390:
> make airq summary indicators DMA") this became an issue in practice as
> well. Namely before that commit the airq_info that got overwritten was
> still functional. After that commit however the two infos share a
> summary_indicator, which aggravates the situation. Which means
> auto-online mechanism occasionally hangs the boot with virtio_blk.
> 
> Signed-off-by: Halil Pasic 
> Reported-by: Marc Hartmayer 
> Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.")
> ---
> * We need definitely this fixed for 5.3. For older stable kernels it is
> to be discussed. @Connie what do you think: do we need a cc stable?

Unless you can prove that the problem could never happen on old version
we absolutely do need cc stable. 

> 
> * I have a variant that does not need the extra mutex but uses cmpxchg().
> Decided to post this one because that one is more complex. But if there
> is interest we can have a look at it as well.

This is slow path (startup) and never called in hot path. Correct? Mutex should 
be
fine.
> ---
>  drivers/s390/virtio/virtio_ccw.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/s390/virtio/virtio_ccw.c 
> b/drivers/s390/virtio/virtio_ccw.c
> index 1a55e5942d36..d97742662755 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -145,6 +145,8 @@ struct airq_info {
>   struct airq_iv *aiv;
>  };
>  static struct airq_info *airq_areas[MAX_AIRQ_AREAS];
> +DEFINE_MUTEX(airq_areas_lock);
> +
>  static u8 *summary_indicators;
>  
>  static inline u8 *get_summary_indicator(struct airq_info *info)
> @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct virtqueue 
> *vqs[], int nvqs,
>   unsigned long bit, flags;
>  
>   for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) {
> + mutex_lock(_areas_lock);
>   if (!airq_areas[i])
>   airq_areas[i] = new_airq_info(i);
>   info = airq_areas[i];
> + mutex_unlock(_areas_lock);
>   if (!info)
>   return 0;
>   write_lock_irqsave(>lock, flags);
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()

2019-07-24 Thread Christoph Hellwig
On Tue, Jul 23, 2019 at 09:25:06PM -0700, john.hubb...@gmail.com wrote:
> * Store, in the iov_iter, a "came from gup (get_user_pages)" parameter.
>   Then, use the new iov_iter_get_pages_use_gup() to retrieve it when
>   it is time to release the pages. That allows choosing between put_page()
>   and put_user_page*().
> 
> * Pass in one more piece of information to bio_release_pages: a "from_gup"
>   parameter. Similar use as above.
> 
> * Change the block layer, and several file systems, to use
>   put_user_page*().

I think we can do this in a simple and better way.  We have 5 ITER_*
types.  Of those ITER_DISCARD as the name suggests never uses pages, so
we can skip handling it.  ITER_PIPE is rejected іn the direct I/O path,
which leaves us with three.

Out of those ITER_BVEC needs a user page reference, so we want to call
put_user_page* on it.  ITER_BVEC always already has page reference,
which means in the block direct I/O path path we alread don't take
a page reference.  We should extent that handling to all other calls
of iov_iter_get_pages / iov_iter_get_pages_alloc.  I think we should
just reject ITER_KVEC for direct I/O as well as we have no users and
it is rather pointless.  Alternatively if we see a use for it the
callers should always have a life page reference anyway (or might
be on kmalloc memory), so we really should not take a reference either.

In other words:  the only time we should ever have to put a page in
this patch is when they are user pages.  We'll need to clean up
various bits of code for that, but that can be done gradually before
even getting to the actual put_user_pages conversion.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization