Re: Unmapping KVM Guest Memory from Host Kernel

2024-03-08 Thread Matthew Wilcox
On Fri, Mar 08, 2024 at 03:50:05PM +, Gowans, James wrote: > Currently when using anonymous memory for KVM guest RAM, the memory all > remains mapped into the kernel direct map. We are looking at options to > get KVM guest memory out of the kernel’s direct map as a principled > approach to

Re: [PATCH] block: Remove special-casing of compound pages

2024-02-29 Thread Matthew Wilcox
On Thu, Feb 29, 2024 at 11:25:13AM -0700, Greg Edwards wrote: > > [1/1] block: Remove special-casing of compound pages > > commit: 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55 > > This commit results in a change of behavior for QEMU VMs backed by hugepages > that open their VM disk image file

Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-08-21 Thread Matthew Wilcox
On Thu, Aug 18, 2022 at 08:00:41PM -0700, Hugh Dickins wrote: > tmpfs and hugetlbfs and page cache are designed around sharing memory: > TDX is designed around absolutely not sharing memory; and the further > uses which Sean foresees appear not to need it as page cache either. > > Except perhaps

Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-13 Thread Matthew Wilcox
On Mon, Jan 14, 2019 at 10:29:02AM +1100, Dave Chinner wrote: > Until you have images (and hence host page cache) shared between > multiple guests. People will want to do this, because it means they > only need a single set of pages in host memory for executable > binaries rather than a set of

Re: [Qemu-devel] d_off field in struct dirent and 32-on-64 emulation

2018-12-28 Thread Matthew Wilcox
On Sat, Dec 29, 2018 at 12:12:27AM +, Peter Maydell wrote: > On Fri, 28 Dec 2018 at 23:16, Andreas Dilger wrot > > On Dec 28, 2018, at 4:18 AM, Peter Maydell wrote: > > > The problem is that there is no 32-bit API in some cases > > > (unless I have misunderstood the kernel code) -- not all >

Re: [Qemu-devel] [PATCH v21 1/5] xbitmap: Introduce xbitmap

2018-02-16 Thread Matthew Wilcox
On Fri, Feb 16, 2018 at 11:45:51PM +0200, Andy Shevchenko wrote:
> Now, the question about test case. Why do you heavily use BUG_ON?
> Isn't resulting statistics enough?
No. If any of those tests fail, we want to stop dead. They'll lead to horrendous bugs throughout the kernel if they're wrong.

Re: [Qemu-devel] [PATCH v21 1/5] xbitmap: Introduce xbitmap

2018-02-16 Thread Matthew Wilcox
On Fri, Feb 16, 2018 at 07:44:50PM +0200, Andy Shevchenko wrote: > On Tue, Jan 9, 2018 at 1:10 PM, Wei Wang <wei.w.w...@intel.com> wrote: > > From: Matthew Wilcox <mawil...@microsoft.com> > > > > The eXtensible Bitmap is a sparse bitmap representation which is >

Re: [Qemu-devel] [PATCH v20 3/7 RESEND] xbitmap: add more operations

2018-01-02 Thread Matthew Wilcox
On Fri, Dec 22, 2017 at 04:49:11PM +0800, Wei Wang wrote: > Thanks for the improvement. I also found a small bug in xb_zero. With the > following changes, it has passed the current test cases and tested with the > virtio-balloon usage without any issue. Thanks; I applied the change. Can you

Re: [Qemu-devel] [PATCH v20 4/7] virtio-balloon: VIRTIO_BALLOON_F_SG

2018-01-02 Thread Matthew Wilcox
On Sun, Dec 24, 2017 at 03:42:02PM +0800, Wei Wang wrote: > On 12/24/2017 12:45 PM, Tetsuo Handa wrote: > > Matthew Wilcox wrote: > > > If you can't preload with anything better than that, I think that > > > xb_set_bit() should attempt an allocation wit

Re: [Qemu-devel] [PATCH v20 4/7] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-12-23 Thread Matthew Wilcox
On Tue, Dec 19, 2017 at 08:17:56PM +0800, Wei Wang wrote: > +/* > + * Send balloon pages in sgs to host. The balloon pages are recorded in the > + * page xbitmap. Each bit in the bitmap corresponds to a page of PAGE_SIZE. > + * The page xbitmap is searched for continuous "1" bits, which correspond
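The quoted patch comment describes searching the page bitmap for runs of consecutive "1" bits, each run presumably becoming one scatter-gather entry sent to the host. A minimal userspace sketch of that scan follows, assuming a flat `unsigned long` array rather than an xbitmap; `struct bit_run`, `test_bit_flat()` and `collect_runs()` are hypothetical names, not part of the patch.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* One run of consecutive set bits covering [start, start + len). */
struct bit_run {
	unsigned long start;
	unsigned long len;
};

static bool test_bit_flat(const unsigned long *map, unsigned long bit)
{
	return (map[bit / BITS_PER_ULONG] >> (bit % BITS_PER_ULONG)) & 1UL;
}

/*
 * Scan bits [0, nbits) of map and record up to max_runs runs of set bits
 * in out. Returns the number of runs recorded.
 */
static size_t collect_runs(const unsigned long *map, unsigned long nbits,
			   struct bit_run *out, size_t max_runs)
{
	size_t n = 0;
	unsigned long bit = 0;

	while (bit < nbits && n < max_runs) {
		/* Skip clear bits. */
		while (bit < nbits && !test_bit_flat(map, bit))
			bit++;
		if (bit == nbits)
			break;
		out[n].start = bit;
		/* Extend over the run of set bits. */
		while (bit < nbits && test_bit_flat(map, bit))
			bit++;
		out[n].len = bit - out[n].start;
		n++;
	}
	return n;
}
```

Each `(start, len)` pair describes one physically contiguous block of pages, which is what lets a run of many pages travel as a single entry instead of one PFN per page.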

Re: [Qemu-devel] [PATCH v20 3/7 RESEND] xbitmap: add more operations

2017-12-23 Thread Matthew Wilcox
On Sat, Dec 23, 2017 at 11:33:45PM +0900, Tetsuo Handa wrote: > Matthew Wilcox wrote: > > On Sat, Dec 23, 2017 at 11:59:54AM +0900, Tetsuo Handa wrote: > > > Matthew Wilcox wrote: > > > > + bit %= IDA_BITMAP_BITS; > > > > + radix_tre

Re: [Qemu-devel] [PATCH v20 3/7 RESEND] xbitmap: add more operations

2017-12-22 Thread Matthew Wilcox
On Sat, Dec 23, 2017 at 11:59:54AM +0900, Tetsuo Handa wrote: > Matthew Wilcox wrote: > > + bit %= IDA_BITMAP_BITS; > > + radix_tree_iter_init(, index); > > + slot = idr_get_free_cmn(root, , GFP_NOWAIT | __GFP_NOWARN, index); > > + if (IS_ERR(slot)) { > >

Re: [Qemu-devel] [PATCH v20 3/7 RESEND] xbitmap: add more operations

2017-12-21 Thread Matthew Wilcox
/xbitmap.h b/include/linux/xbitmap.h new file mode 100644 index ..c008309a9494 --- /dev/null +++ b/include/linux/xbitmap.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * eXtensible Bitmaps + * Copyright (c) 2017 Microsoft Corporation + * Author: Matthew Wilcox <ma

Re: [Qemu-devel] [PATCH v20 3/7 RESEND] xbitmap: add more operations

2017-12-21 Thread Matthew Wilcox
First of all, the test-suite doesn't build, so I don't know whether you ran it or not. Then I added the xb_find_set() call below, and it fails the assert, so you should probably fix that. diff --git a/lib/xbitmap.c b/lib/xbitmap.c index f03a0f9f9e29..b29af08a7597 100644 --- a/lib/xbitmap.c +++

Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement

2017-12-21 Thread Matthew Wilcox
On Thu, Dec 21, 2017 at 10:49:44AM +0800, Wei Wang wrote: > On 12/21/2017 01:10 AM, Matthew Wilcox wrote: > One more question is about the return value, why would it be ambiguous? I > think it is the same as find_next_bit() which returns the found bit or size > if not found. Because f
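The reply is cut off, but one way a find_next_bit()-style return value becomes ambiguous is easy to show: "not found" is reported as `size`, which only works while `size` is one past the last valid index. A bitmap that can hold bit `ULONG_MAX` has no spare `unsigned long` left to mean "not found". A minimal sketch of the alternative, returning a `bool` and writing the index through a pointer; `find_set_in_range()` is a hypothetical name, and a sorted array of set-bit indices stands in for a sparse bitmap.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * 'bits' is a sorted array of the indices of set bits (a stand-in for a
 * sparse bitmap). Find the first set bit in the inclusive range
 * [start, max]. On success, store it in *found and return true.
 *
 * A sentinel return value would not work here: with max == ULONG_MAX,
 * every unsigned long value is a legitimate answer.
 */
static bool find_set_in_range(const unsigned long *bits, size_t nr_set,
			      unsigned long start, unsigned long max,
			      unsigned long *found)
{
	for (size_t i = 0; i < nr_set; i++) {
		if (bits[i] < start)
			continue;
		if (bits[i] > max)
			return false;	/* sorted: nothing later can match */
		*found = bits[i];
		return true;
	}
	return false;
}
```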

Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement

2017-12-20 Thread Matthew Wilcox
On Wed, Dec 20, 2017 at 04:13:16PM +, Wang, Wei W wrote:
> On Wednesday, December 20, 2017 8:26 PM, Matthew Wilcox wrote:
> > unsigned long bit;
> > xb_preload(GFP_KERNEL);
> > xb_set_bit(xb, 700);
> > xb_preload_end();
> > b

Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement

2017-12-20 Thread Matthew Wilcox
On Wed, Dec 20, 2017 at 06:34:36PM +0800, Wei Wang wrote: > On 12/19/2017 10:05 PM, Tetsuo Handa wrote: > > I think xb_find_set() has a bug in !node path. > > I think we can probably remove the "!node" path for now. It would be good to > get the fundamental part in first, and leave optimization

Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement

2017-12-19 Thread Matthew Wilcox
On Tue, Dec 19, 2017 at 11:05:11PM +0900, Tetsuo Handa wrote: > Removing exceptional path made this patch easier to read. > But what I meant is > > Can you eliminate exception path and fold all xbitmap patches into one, and > post only one xbitmap patch without virtio-balloon changes? > > .

Re: [Qemu-devel] [PATCH v19 3/7] xbitmap: add more operations

2017-12-17 Thread Matthew Wilcox
On Mon, Dec 18, 2017 at 10:33:00AM +0800, Wei Wang wrote: > > My only qualm is that I've been considering optimising the memory > > consumption when an entire 1024-bit chunk is full; instead of keeping a > > pointer to a 128-byte entry full of ones, store a special value in the > > radix tree
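The optimisation mentioned here (storing a special value instead of a pointer to a 128-byte chunk that is entirely ones) can be sketched in userspace. This is a minimal sketch, assuming a small fixed table of chunk pointers in place of the radix tree; `CHUNK_FULL`, `struct sparse_bits` and the `sb_*` helpers are hypothetical names, not the xbitmap API.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BITS	1024UL			/* bits per chunk (128 bytes) */
#define ULONG_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define CHUNK_WORDS	(CHUNK_BITS / ULONG_BITS)
#define NR_CHUNKS	16UL			/* fixed table size for the demo */

struct chunk {
	unsigned long words[CHUNK_WORDS];
};

/* Sentinel meaning "every bit in this chunk is set"; only its address is used. */
static struct chunk chunk_full_marker;
#define CHUNK_FULL (&chunk_full_marker)

struct sparse_bits {
	struct chunk *slot[NR_CHUNKS];	/* NULL means all bits clear */
};

static bool chunk_all_ones(const struct chunk *c)
{
	for (size_t i = 0; i < CHUNK_WORDS; i++)
		if (c->words[i] != ~0UL)
			return false;
	return true;
}

/* Returns false if bit is out of range or allocation fails. */
static bool sb_set_bit(struct sparse_bits *sb, unsigned long bit)
{
	unsigned long idx = bit / CHUNK_BITS, off = bit % CHUNK_BITS;
	struct chunk *c;

	if (idx >= NR_CHUNKS)
		return false;
	c = sb->slot[idx];
	if (c == CHUNK_FULL)
		return true;			/* already set */
	if (!c) {
		c = calloc(1, sizeof(*c));
		if (!c)
			return false;
		sb->slot[idx] = c;
	}
	c->words[off / ULONG_BITS] |= 1UL << (off % ULONG_BITS);
	if (chunk_all_ones(c)) {
		free(c);			/* drop the 128 bytes of ones */
		sb->slot[idx] = CHUNK_FULL;
	}
	return true;
}

static bool sb_test_bit(const struct sparse_bits *sb, unsigned long bit)
{
	unsigned long idx = bit / CHUNK_BITS, off = bit % CHUNK_BITS;
	const struct chunk *c;

	if (idx >= NR_CHUNKS)
		return false;
	c = sb->slot[idx];
	if (!c)
		return false;
	if (c == CHUNK_FULL)
		return true;
	return (c->words[off / ULONG_BITS] >> (off % ULONG_BITS)) & 1UL;
}

/* Returns false if bit is out of range or allocation fails. */
static bool sb_clear_bit(struct sparse_bits *sb, unsigned long bit)
{
	unsigned long idx = bit / CHUNK_BITS, off = bit % CHUNK_BITS;
	struct chunk *c;

	if (idx >= NR_CHUNKS)
		return false;
	c = sb->slot[idx];
	if (!c)
		return true;			/* already clear */
	if (c == CHUNK_FULL) {
		/* Re-materialise the chunk: clearing a bit may now need memory. */
		c = malloc(sizeof(*c));
		if (!c)
			return false;
		memset(c->words, 0xff, sizeof(c->words));
		sb->slot[idx] = c;
	}
	c->words[off / ULONG_BITS] &= ~(1UL << (off % ULONG_BITS));
	return true;
}
```

The trade-off is visible in `sb_clear_bit()`: once a chunk has been collapsed to the sentinel, clearing a single bit requires a fresh allocation, so a clear operation that previously could not fail now can.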

Re: [Qemu-devel] [PATCH v19 3/7] xbitmap: add more operations

2017-12-17 Thread Matthew Wilcox
On Sun, Dec 17, 2017 at 01:47:21PM +, Wang, Wei W wrote: > On Saturday, December 16, 2017 3:22 AM, Matthew Wilcox wrote: > > On Fri, Dec 15, 2017 at 10:49:15AM -0800, Matthew Wilcox wrote: > > > Here's the API I'm looking at right now. The user need take no lock; > >

Re: [Qemu-devel] [PATCH v19 3/7] xbitmap: add more operations

2017-12-15 Thread Matthew Wilcox
(1) record any integer value in [0, ULONG_MAX] range > > (2) fetch all recorded values, with consecutive values combined in > min,max (or start,count) form for efficiently > > and I wonder whether we need to invent complete API set which > Matthew Wilcox and Wei Wang are p

Re: [Qemu-devel] [PATCH v19 3/7] xbitmap: add more operations

2017-12-15 Thread Matthew Wilcox
On Fri, Dec 15, 2017 at 10:49:15AM -0800, Matthew Wilcox wrote: > Here's the API I'm looking at right now. The user need take no lock; > the locking (spinlock) is handled internally to the implementation. I looked at the API some more and found some flaws: - how does xbit_alloc communicat

Re: [Qemu-devel] [PATCH v19 3/7] xbitmap: add more operations

2017-12-15 Thread Matthew Wilcox
On Sat, Dec 16, 2017 at 01:21:52AM +0900, Tetsuo Handa wrote: > My understanding is that virtio-balloon wants to handle sparsely spreaded > unsigned long values (which is PATCH 4/7) and wants to find all chunks of > consecutive "1" bits efficiently. Therefore, I guess that holding the values > in

Re: [Qemu-devel] [PATCH v19 3/7] xbitmap: add more operations

2017-12-15 Thread Matthew Wilcox
On Tue, Dec 12, 2017 at 07:55:55PM +0800, Wei Wang wrote:
> +int xb_preload_and_set_bit(struct xb *xb, unsigned long bit, gfp_t gfp);
I'm struggling to understand when one would use this. The xb_ API requires you to handle your own locking. But specifying GFP flags here implies you can sleep.

Re: [Qemu-devel] [PATCH v19 1/7] xbitmap: Introduce xbitmap

2017-12-15 Thread Matthew Wilcox
On Fri, Dec 15, 2017 at 07:05:07PM +0800, kbuild test robot wrote:
>   21	struct radix_tree_node *node;
>   22	void **slot;
^^^ missing __rcu annotation here.
Wei, could you fold that change into your next round? Thanks!

Re: [Qemu-devel] [PATCH v19 3/7] xbitmap: add more operations

2017-12-14 Thread Matthew Wilcox
On Fri, Dec 15, 2017 at 01:29:45AM +0900, Tetsuo Handa wrote: > > > Also, one more thing you need to check. Have you checked how long does > > > xb_find_next_set_bit(xb, 0, ULONG_MAX) on an empty xbitmap takes? > > > If it causes soft lockup warning, should we add cond_resched() ? > > > If yes,

Re: [Qemu-devel] [PATCH v19 3/7] xbitmap: add more operations

2017-12-14 Thread Matthew Wilcox
On Wed, Dec 13, 2017 at 08:26:06PM +0800, Wei Wang wrote: > On 12/12/2017 09:20 PM, Tetsuo Handa wrote: > > Can you eliminate exception path and fold all xbitmap patches into one, and > > post only one xbitmap patch without virtio-baloon changes? If exception path > > is valuable, you can add

Re: [Qemu-devel] [PATCH v18 05/10] xbitmap: add more operations

2017-12-01 Thread Matthew Wilcox
On Fri, Dec 01, 2017 at 03:09:08PM +, Wang, Wei W wrote:
> On Friday, December 1, 2017 9:02 PM, Tetsuo Handa wrote:
> > If start == end is legal,
> >
> >    for (; start < end; start = (start | (IDA_BITMAP_BITS - 1)) + 1) {
> >
> > makes this loop do nothing because 10 < 10 is false.
> >

Re: [Qemu-devel] [PATCH v18 05/10] xbitmap: add more operations

2017-12-01 Thread Matthew Wilcox
On Fri, Dec 01, 2017 at 10:02:01PM +0900, Tetsuo Handa wrote:
> If start == end is legal,
>
>    for (; start < end; start = (start | (IDA_BITMAP_BITS - 1)) + 1) {
>
> makes this loop do nothing because 10 < 10 is false.
... and this is why we add tests to the test-suite!
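The quoted loop treats `end` as exclusive, so a range with `start == end` visits no chunk at all, even though the `xb_zero()` kernel-doc in this thread describes `@end` as inclusive. A minimal sketch of an inclusive per-chunk walk follows, assuming 1024-bit chunks as discussed in the thread; `walk_chunks()` and the `sum_bits()` callback are hypothetical names. It also avoids the wrap-around that `(start | (IDA_BITMAP_BITS - 1)) + 1` hits when the range ends at `ULONG_MAX`.

```c
#include <limits.h>

#define IDA_BITMAP_BITS 1024UL	/* bits per chunk, as in the thread */

/*
 * Visit every chunk overlapping the inclusive range [start, end], passing
 * fn the sub-range [lo, hi] that lies inside that chunk. Returns the
 * number of chunks visited (0 if start > end).
 */
static unsigned long walk_chunks(unsigned long start, unsigned long end,
				 void (*fn)(unsigned long lo, unsigned long hi,
					    void *arg),
				 void *arg)
{
	unsigned long visited = 0;

	if (start > end)
		return 0;
	for (;;) {
		/* Last bit of the chunk that contains 'start'. */
		unsigned long chunk_last = start | (IDA_BITMAP_BITS - 1);
		unsigned long hi = chunk_last < end ? chunk_last : end;

		fn(start, hi, arg);
		visited++;
		/* Stop on the last chunk; hi + 1 below cannot then overflow. */
		if (hi == end)
			break;
		start = hi + 1;
	}
	return visited;
}

/* Example callback: accumulate the number of bits covered. */
static void sum_bits(unsigned long lo, unsigned long hi, void *arg)
{
	*(unsigned long *)arg += hi - lo + 1;
}
```

With this shape, `walk_chunks(10, 10, ...)` visits exactly one chunk covering one bit, which is the case the quoted test exposed.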

Re: [Qemu-devel] [PATCH v18 05/10] xbitmap: add more operations

2017-11-30 Thread Matthew Wilcox
On Thu, Nov 30, 2017 at 10:35:03PM +0900, Tetsuo Handa wrote: > According to xb_set_bit(), it seems to me that we are trying to avoid memory > allocation > for "struct ida_bitmap" when all set bits within a 1024-bits bitmap reside in > the first > 61 bits. > > But does such saving help? Is

Re: [Qemu-devel] [PATCH v18 01/10] idr: add #include

2017-11-29 Thread Matthew Wilcox
On Wed, Nov 29, 2017 at 09:55:17PM +0800, Wei Wang wrote: > The was removed from radix-tree.h by the following commit: > f5bba9d11a256ad2a1c2f8e7fc6aabe6416b7890. > > Since that commit, tools/testing/radix-tree/ couldn't pass compilation > due to: tools/testing/radix-tree/idr.c:17: undefined

Re: [Qemu-devel] [PATCH v17 2/6] radix tree test suite: add tests for xbitmap

2017-11-06 Thread Matthew Wilcox
On Fri, Nov 03, 2017 at 04:13:02PM +0800, Wei Wang wrote: > From: Matthew Wilcox <mawil...@microsoft.com> > > Add the following tests for xbitmap: > 1) single bit test: single bit set/clear/find; > 2) bit range test: set/clear a range of bits and find a 0 or 1 bit in > th

Re: [Qemu-devel] [PATCH v15 2/5] lib/xbitmap: add xb_find_next_bit() and xb_zero()

2017-09-11 Thread Matthew Wilcox
On Mon, Aug 28, 2017 at 06:08:30PM +0800, Wei Wang wrote: > +/** > + * xb_zero - zero a range of bits in the xbitmap > + * @xb: the xbitmap that the bits reside in > + * @start: the start of the range, inclusive > + * @end: the end of the range, inclusive > + */ > +void xb_zero(struct xb *xb,

Re: [Qemu-devel] [PATCH v15 1/5] lib/xbitmap: Introduce xbitmap

2017-09-11 Thread Matthew Wilcox
On Mon, Aug 28, 2017 at 06:08:29PM +0800, Wei Wang wrote: > From: Matthew Wilcox <mawil...@microsoft.com> > > The eXtensible Bitmap is a sparse bitmap representation which is > efficient for set bits which tend to cluster. It supports up to > 'unsigned long' worth of bits,

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v11 3/6] virtio-balloon: VIRTIO_BALLOON_F_PAGE_CHUNKS

2017-06-28 Thread Matthew Wilcox
On Thu, Jun 15, 2017 at 04:10:17PM +0800, Wei Wang wrote: > > So you still have a home-grown bitmap. I'd like to know why > > isn't xbitmap suggested for this purpose by Matthew Wilcox > > appropriate. Please add a comment explaining the requirements > > from the data struc

Re: [Qemu-devel] [PATCH v9 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2017-04-14 Thread Matthew Wilcox
On Fri, Apr 14, 2017 at 04:50:48AM +0300, Michael S. Tsirkin wrote: > On Thu, Apr 13, 2017 at 01:44:11PM -0700, Matthew Wilcox wrote: > > On Thu, Apr 13, 2017 at 05:35:03PM +0800, Wei Wang wrote: > > > 2) transfer the guest unused pages to the host so that they > > >

Re: [Qemu-devel] [PATCH v9 3/5] mm: function to offer a page block on the free list

2017-04-13 Thread Matthew Wilcox
On Fri, Apr 14, 2017 at 10:30:27AM +0800, Wei Wang wrote:
> OK. What do you think if we add this:
>
> #if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
That's spelled "IS_ENABLED(CONFIG_VIRTIO_BALLOON)", FYI.
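`IS_ENABLED()` lives in the kernel's `include/linux/kconfig.h`. Unlike `defined(X) || defined(X_MODULE)`, it expands to 0 or 1, so it works in ordinary C expressions such as `if (IS_ENABLED(...))` as well as in `#if`. The sketch below reproduces the underlying preprocessor trick in userspace under renamed, hypothetical `DEMO_*` macros; the kernel's own definitions differ in detail. The `CONFIG_DEMO_*` symbols stand in for what Kconfig emits: `CONFIG_X` defined to 1 for `=y`, `CONFIG_X_MODULE` defined to 1 for `=m`.

```c
/* Stand-ins for what Kconfig writes to autoconf.h. */
#define CONFIG_DEMO_BUILTIN 1		/* CONFIG_DEMO_BUILTIN=y */
#define CONFIG_DEMO_MODULAR_MODULE 1	/* CONFIG_DEMO_MODULAR=m */
/* CONFIG_DEMO_ABSENT is not set */

/*
 * If x expands to 1, DEMO_PLACEHOLDER_##x becomes "0," which shifts a 1
 * into the second argument slot. Otherwise the pasted junk token stays in
 * the first slot and the second argument is 0. The trailing ~ keeps the
 * variadic part non-empty in both cases.
 */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_SECOND_ARG(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(val) DEMO_IS_DEFINED__(DEMO_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED__(arg_or_junk) DEMO_SECOND_ARG(arg_or_junk 1, 0, ~)

#define DEMO_IS_BUILTIN(option) DEMO_IS_DEFINED(option)
#define DEMO_IS_MODULE(option) DEMO_IS_DEFINED(option##_MODULE)
#define DEMO_IS_ENABLED(option) \
	(DEMO_IS_BUILTIN(option) || DEMO_IS_MODULE(option))

/* Usable in #if ... */
#if DEMO_IS_ENABLED(CONFIG_DEMO_MODULAR)
static const int demo_modular_compiled_in = 1;
#else
static const int demo_modular_compiled_in = 0;
#endif
```

Because the result is a plain integer constant, the same macro can also guard ordinary code paths, letting the compiler type-check both branches before discarding the dead one.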

Re: [Qemu-devel] [PATCH v9 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2017-04-13 Thread Matthew Wilcox
On Thu, Apr 13, 2017 at 05:35:03PM +0800, Wei Wang wrote: > 2) transfer the guest unused pages to the host so that they > can be skipped to migrate in live migration. I don't understand this second bit. You leave the pages on the free list, and tell the host they're free. What's preventing

Re: [Qemu-devel] [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

2017-04-13 Thread Matthew Wilcox
On Thu, Apr 13, 2017 at 07:34:19PM +0300, Michael S. Tsirkin wrote: > So we don't need the bitmap to talk to host, it is just > a data structure we chose to maintain lists of pages, right? > OK as far as it goes but you need much better isolation for it. > Build a data structure with APIs such as

Re: [Qemu-devel] [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

2017-03-11 Thread Matthew Wilcox
On Sat, Mar 11, 2017 at 07:59:31PM +0800, Wei Wang wrote: > I'm thinking what if the guest needs to transfer these much physically > continuous > memory to host: 1GB+2MB+64KB+32KB+16KB+4KB. > Is it going to use Six 64-bit chunks? Would it be simpler if we just > use the 128-bit chunk format (we

Re: [Qemu-devel] [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

2017-03-10 Thread Matthew Wilcox
On Fri, Mar 10, 2017 at 09:35:21PM +0200, Michael S. Tsirkin wrote:
> > bit 0 clear => bits 1-11 encode a page count, bits 12-63 encode a PFN, page size 4k.
> > bit 0 set, bit 1 clear => bits 2-12 encode a page count, bits 13-63 encode a PFN, page size 8k
> > bits 0+1 set, bit 2 clear =>
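The quoted encoding is self-describing: the number of trailing 1 bits selects the page size, the next bit is 0, the following 11 bits hold a page count, and the remaining high bits hold the PFN. Since the PFN field starts exactly at bit log2(page size), it equals the page-aligned physical address with its low bits masked off. A minimal sketch, assuming the pattern continues for larger sizes (order k: k trailing ones, page size 4k << k, count in bits k+1 to k+11) and that the count is stored as-is; the message is truncated, so both points are extrapolations, and `chunk_encode()`/`chunk_decode()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_COUNT_BITS 11
#define CHUNK_MAX_ORDER 51	/* keeps the count and PFN fields inside 64 bits */

/*
 * Encode 'count' pages of size (4096 << order) starting at physical
 * address 'addr'. Returns false if the arguments do not fit the format.
 */
static bool chunk_encode(uint64_t addr, unsigned int order, unsigned int count,
			 uint64_t *out)
{
	unsigned int shift = order + 12;	/* log2(page size) */

	if (order > CHUNK_MAX_ORDER || count == 0 ||
	    count >= (1u << CHUNK_COUNT_BITS))
		return false;
	if (addr & ((UINT64_C(1) << shift) - 1))
		return false;			/* not aligned to the page size */
	*out = addr |					/* PFN field */
	       ((uint64_t)count << (order + 1)) |	/* page count */
	       ((UINT64_C(1) << order) - 1);		/* 'order' trailing ones */
	return true;
}

/* Decode a value produced by chunk_encode(). */
static bool chunk_decode(uint64_t v, uint64_t *addr, unsigned int *order,
			 unsigned int *count)
{
	unsigned int k = 0;

	while (k <= CHUNK_MAX_ORDER && ((v >> k) & 1))	/* count trailing ones */
		k++;
	if (k > CHUNK_MAX_ORDER)
		return false;
	*order = k;
	*count = (unsigned int)((v >> (k + 1)) & ((1u << CHUNK_COUNT_BITS) - 1));
	*addr = v & ~((UINT64_C(1) << (k + 12)) - 1);
	return true;
}
```

Because each entry carries its own page size, a 1 GiB page fits in one 64-bit value (order 18, count 1) instead of being split into 4k pieces, and the format does not assume that host and guest use the same base page size.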

Re: [Qemu-devel] [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

2017-03-10 Thread Matthew Wilcox
On Fri, Mar 10, 2017 at 09:10:53PM +0200, Michael S. Tsirkin wrote: > > I completely agree with you that we should be able to pass a hugepage > > as a single chunk. Also we shouldn't assume that host and guest have > > the same page size. I think we can come up with a scheme that actually > >

Re: [Qemu-devel] [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

2017-03-10 Thread Matthew Wilcox
On Fri, Mar 10, 2017 at 05:58:28PM +0200, Michael S. Tsirkin wrote: > One of the issues of current balloon is the 4k page size > assumption. For example if you free a huge page you > have to split it up and pass 4k chunks to host. > Quite often host can't free these 4k chunks at all (e.g. > when

Re: [Qemu-devel] [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

2017-03-09 Thread Matthew Wilcox
On Fri, Mar 03, 2017 at 01:40:28PM +0800, Wei Wang wrote: > From: Liang Li > 1) allocating pages (6.5%) > 2) sending PFNs to host (68.3%) > 3) address translation (6.1%) > 4) madvise (19%) > > This patch optimizes step 2) by transfering pages to the host in > chunks. A