[kvm-devel] fastcall removal

2008-02-15 Thread Andrea Arcangeli
This allows compiling the external module against linux.git (fastcall has finally become the default and only choice). Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/kernel/external-module-compat.h b/kernel/external-module-compat.h index 5611c12..52b745c 100644 --- a/kernel

Re: [kvm-devel] [patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-16 Thread Andrea Arcangeli
On Fri, Feb 15, 2008 at 07:37:36PM -0800, Andrew Morton wrote: The | is obviously deliberate. But no explanation is provided telling us why we still call the callback if ptep_clear_flush_young() said the page was recently referenced. People who read your code will want to understand this.

[kvm-devel] [PATCH] KVM swapping with MMU Notifiers V7

2008-02-16 Thread Andrea Arcangeli
. The race can materialize if the linux pte is zapped after get_user_pages returns but before the page is mapped by the spte and tracked by rmap. The invalidate_ calls can also likely be optimized further but it's not a fast path so it's not urgent. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED
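The window described above (the linux pte is zapped after get_user_pages returns but before the spte is installed) is the kind of race a sequence counter can detect. The following is only a minimal single-threaded sketch of that idea, not the code from the patch: every name in it (sketch_mmu, sketch_fault_begin, sketch_invalidate, sketch_fault_must_retry) is hypothetical, and real kernel code would also need locking and memory barriers.

```c
#include <stdbool.h>

/* Hypothetical per-VM state: counts invalidations seen so far. */
struct sketch_mmu {
    unsigned long invalidate_seq;
};

/* Fault path, step 1: snapshot the counter before looking up the page. */
unsigned long sketch_fault_begin(const struct sketch_mmu *m)
{
    return m->invalidate_seq;
}

/* Invalidate path: any zap of a linux pte bumps the counter. */
void sketch_invalidate(struct sketch_mmu *m)
{
    m->invalidate_seq++;
}

/*
 * Fault path, step 2: just before installing the spte, check whether an
 * invalidate ran since the snapshot. If so, the looked-up page may be
 * stale and the fault has to be retried.
 */
bool sketch_fault_must_retry(const struct sketch_mmu *m, unsigned long snapshot)
{
    return m->invalidate_seq != snapshot;
}
```

The fault path takes the snapshot first, does the page lookup, and installs the mapping only if sketch_fault_must_retry() returns false.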

Re: [kvm-devel] [PATCH] KVM swapping with MMU Notifiers V7

2008-02-18 Thread Andrea Arcangeli
On Sat, Feb 16, 2008 at 03:08:17AM -0800, Andrew Morton wrote: On Sat, 16 Feb 2008 11:48:27 +0100 Andrea Arcangeli [EMAIL PROTECTED] wrote: +void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, + struct mm_struct *mm

Re: [kvm-devel] [PATCH] KVM swapping with MMU Notifiers V7

2008-02-18 Thread Andrea Arcangeli
On Sat, Feb 16, 2008 at 05:51:38AM -0600, Robin Holt wrote: I am doing this in xpmem with a stack-based structure in the function calling get_user_pages. That structure describes the start and end address of the range we are doing the get_user_pages on. If an invalidate_range_begin comes in

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 07:54:14PM +1100, Nick Piggin wrote: As far as sleeping inside callbacks goes... I think there are big problems with the patch (the sleeping patch and the external rmap patch). I don't think it is workable in its current state. Either we have to make some big changes to

Re: [kvm-devel] [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 09:43:57AM +0100, Nick Piggin wrote: are rather similar. However I have tried to make a point of minimising the impact to the core mm/. I don't see why we need to invalidate or flush I also tried hard to minimise the impact on the core mm/, I also argued with Christoph

Re: [kvm-devel] [patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 07:46:10PM +1100, Nick Piggin wrote: On Sunday 17 February 2008 06:22, Christoph Lameter wrote: On Fri, 15 Feb 2008, Andrew Morton wrote: flush_cache_page(vma, address, pte_pfn(*pte)); entry = ptep_clear_flush(vma, address, pte);

Re: [kvm-devel] [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Tue, Feb 19, 2008 at 11:59:23PM +0100, Nick Piggin wrote: That's why I don't understand the need for the pairs: it should be done like this. Yes, except it can't be done like this for xpmem. OK, I didn't see the invalidate_pages call... See the last patch I posted to Andrew, you've

Re: [kvm-devel] [patch] my mmu notifiers

2008-02-19 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 12:11:57AM +0100, Nick Piggin wrote: Sorry, I realise I still didn't get this through my head yet (and also have not seen your patch recently). So I don't know exactly what you are doing... The last version was posted here:

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-19 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 10:08:49AM +1100, Nick Piggin wrote: You can't sleep inside rcu_read_lock()! I must say that for a patch that is up to v8 or whatever and is posted twice a week to such a big cc list, it is kind of slack to not even test it and expect other people to review it. Well,

[kvm-devel] [PATCH] mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
. I doubt xpmem fits inside a CONFIG_MMU_NOTIFIER anymore, or we'll all run a bit slower because of it. It's really a call of how much we want to optimize the MMU notifier, by keeping things like RCU for the registration. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/include/asm

[kvm-devel] mmdrop external module oops

2008-02-20 Thread Andrea Arcangeli
A 2.6.25-rc based kernel spawned an oops in mmdrop when kvm quit so that reminded me of this: Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/kernel/external-module-compat.h b/kernel/external-module-compat.h index 20ef841..fd3cb1d 100644 --- a/kernel/external-module-compat.h +++ b

[kvm-devel] [PATCH] KVM swapping (+ seqlock fix) with mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
, without requiring a page pin). Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 41962e7..e1287ab 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -21,6 +21,7 @@ config KVM tristate Kernel-based Virtual Machine

Re: [kvm-devel] [PATCH] mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 05:33:13AM -0600, Robin Holt wrote: But won't that other subsystem cause us to have two separate callouts that do equivalent things and therefore force a removal of this and go back to what Christoph has currently proposed? The point is that a new kind of notifier that

Re: [kvm-devel] [PATCH] mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 06:24:24AM -0600, Robin Holt wrote: We do not need to do any allocation in the messaging layer, all structures used for messaging are allocated at module load time. The allocation discussions we had early on were about trying to rearrange your notifiers to allow a

Re: [kvm-devel] [PATCH] mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 08:41:55AM -0600, Robin Holt wrote: On Wed, Feb 20, 2008 at 11:39:42AM +0100, Andrea Arcangeli wrote: XPMEM simply can't use RCU for the registration locking if it wants to schedule inside the mmu notifier calls. So I guess it's better to add Whoa

Re: [kvm-devel] [PATCH] mmu notifiers #v6

2008-02-21 Thread Andrea Arcangeli
On Thu, Feb 21, 2008 at 05:54:30AM +0100, Nick Piggin wrote: will send you incremental changes that can be discussed more easily that way (nothing major, mainly style and minor things). I don't need to say you're very welcome ;). I agree: your coherent, non-sleeping mmu notifiers are pretty

[kvm-devel] [PATCH] mmu notifiers #v7

2008-02-27 Thread Andrea Arcangeli
on the below to be optimal for GRU/KVM and trivially extendible once a CONFIG_XPMEM will be added. So this first part can go in now I think. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] Signed-off-by: Christoph Lameter [EMAIL PROTECTED] diff --git a/include/linux/mm_types.h b/include/linux

[kvm-devel] [PATCH] KVM swapping with mmu notifiers #v7

2008-02-27 Thread Andrea Arcangeli
Same as before but on one hand ported to #v7 API and on the other hand ported to latest kvm.git. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 41962e7..e1287ab 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig

Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-27 Thread Andrea Arcangeli
Hi Izik kvm-devel, Just wanted to remind that if we'll converge on #v7, the ksm code in replace_page will have to call ptep_clear_flush_notify too (just like do_wp_page).

Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-27 Thread Andrea Arcangeli
On Wed, Feb 27, 2008 at 03:06:10PM -0800, Christoph Lameter wrote: Ok so it somehow works slowly with GRU and you are happy with it. What As far as GRU is concerned, performance is the same as with your patch (Jack can confirm). about the RDMA folks etc etc? If RDMA/IB folks needed to block

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-27 Thread Andrea Arcangeli
On Wed, Feb 27, 2008 at 02:23:29PM -0800, Christoph Lameter wrote: How would that work? You rely on the pte locking. Thus calls are all in an I don't rely on the pte locking in #v7, exactly to satisfy GRU (so far purely theoretical) performance complaints. atomic context. I think we need a

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-27 Thread Andrea Arcangeli
On Wed, Feb 27, 2008 at 02:35:59PM -0800, Christoph Lameter wrote: Could you be specific? This refers to page migration? Hmmm... Guess we If the reader schedules, the synchronize_rcu will return in the other cpu and the objects in the list will be freed and overwritten, and when the task is

Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-27 Thread Andrea Arcangeli
On Wed, Feb 27, 2008 at 04:08:07PM -0800, Christoph Lameter wrote: On Thu, 28 Feb 2008, Andrea Arcangeli wrote: If RDMA/IB folks needed to block in invalidate_range, I guess they need to do so on top of tmpfs too, and that never worked with your patch anyway. How about blocking

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-27 Thread Andrea Arcangeli
On Wed, Feb 27, 2008 at 02:39:46PM -0800, Christoph Lameter wrote: On Wed, 20 Feb 2008, Andrea Arcangeli wrote: Well, xpmem requirements are complex. As a side effect of the simplicity of my approach, my patch is 100% safe since #v1. Now it also works for GRU and it cluster invalidates

Re: [kvm-devel] [patch 5/6] mmu_notifier: Support for drivers with revers maps (f.e. for XPmem)

2008-02-27 Thread Andrea Arcangeli
On Wed, Feb 27, 2008 at 02:43:41PM -0800, Christoph Lameter wrote: Nope. unmap_mapping_range is already handled by the range callbacks. But they're called with atomic=1 on anything but anonymous memory. I understood Andrew asked to remove the atomic param and to allow sleeping for all kind of

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-27 Thread Andrea Arcangeli
On Wed, Feb 27, 2008 at 04:14:08PM -0800, Christoph Lameter wrote: Erm. This would also be needed by RDMA etc. The only RDMA I know is Quadrics, and Quadrics apparently doesn't need to schedule inside the invalidate methods AFAIK, so I doubt the above is true. It'd be interesting to know if IB is

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-27 Thread Andrea Arcangeli
On Wed, Feb 27, 2008 at 05:03:21PM -0800, Christoph Lameter wrote: RDMA works across a network and I would assume that it needs confirmation that a connection has been torn down before pages can be unmapped. Depends on the latency of the network, for example with page pinning it can even try

Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Andrea Arcangeli
On Thu, Feb 28, 2008 at 11:48:10AM -0800, Christoph Lameter wrote: make it work after the VM locking will be altered (for example the ^^^ CONFIG_XPMEM should also switch the mmu_register/unregister locking

Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Andrea Arcangeli
On Thu, Feb 28, 2008 at 05:17:33PM -0600, Jack Steiner wrote: I disagree. The location of the callout IS a performance issue. In simple comparisons of the 2 patches (Christoph's vs. Andrea's), Andrea's has a 7X increase in the number of TLB purges being issued to the GRU. TLB flushing Are you

Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-28 Thread Andrea Arcangeli
On Thu, Feb 28, 2008 at 03:05:30PM -0800, Christoph Lameter wrote: Still think that the lock here is not of too much use and can be easily replaced by mmap_sem. I can use the mmap_sem. +#define mmu_notifier(function, mm, args...) \ + do {
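The macro quoted above is cut off after its opening "do {", so its real body is not visible here. As a rough illustration of what a dispatch macro of this shape typically does, the sketch below walks a list of registered notifiers and calls one named method on each notifier that implements it. All names (sketch_notifier, sketch_notify, age_page, and so on) are made up for the example, a plain linked list stands in for whatever list and locking the patch uses, and C99 __VA_ARGS__ replaces the GNU "args..." form.

```c
#include <stddef.h>

struct sketch_notifier;

/* Hypothetical method table; a NULL entry means "not implemented". */
struct sketch_notifier_ops {
    void (*invalidate_page)(struct sketch_notifier *mn, unsigned long address);
    void (*age_page)(struct sketch_notifier *mn, unsigned long address);
};

struct sketch_notifier {
    const struct sketch_notifier_ops *ops;
    struct sketch_notifier *next;   /* simple singly linked list */
    int calls;                      /* how many methods ran on this notifier */
};

/* Call ops->method(mn, args...) on every notifier that provides it. */
#define sketch_notify(head, method, ...)                          \
    do {                                                          \
        struct sketch_notifier *mn_;                              \
        for (mn_ = (head); mn_ != NULL; mn_ = mn_->next)          \
            if (mn_->ops->method)                                 \
                mn_->ops->method(mn_, __VA_ARGS__);               \
    } while (0)

void sketch_count_call(struct sketch_notifier *mn, unsigned long address)
{
    (void)address;
    mn->calls++;
}

/* Implements invalidate_page only; age_page stays NULL. */
const struct sketch_notifier_ops sketch_counting_ops = {
    .invalidate_page = sketch_count_call,
};

/* Registers two notifiers and returns the total number of calls made. */
int sketch_demo(void)
{
    struct sketch_notifier second = { &sketch_counting_ops, NULL, 0 };
    struct sketch_notifier first = { &sketch_counting_ops, &second, 0 };

    sketch_notify(&first, invalidate_page, 0x1000UL);
    sketch_notify(&first, age_page, 0x1000UL);  /* no handler: skipped */
    return first.calls + second.calls;
}
```

The do { ... } while (0) wrapper lets the macro be used like a single statement, including inside an unbraced if/else.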

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-28 Thread Andrea Arcangeli
On Thu, Feb 28, 2008 at 10:43:54AM -0800, Christoph Lameter wrote: What about invalidate_page()? That would just spin waiting for an ack (just like the smp-tlb-flushing invalidates in numa already do). Thinking more about this, we could also parallelize it with an invalidate_page_before/end. If

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-29 Thread Andrea Arcangeli
On Thu, Feb 28, 2008 at 04:59:59PM -0800, Christoph Lameter wrote: And thus the device driver may stop receiving data on a UP system? It will never get the ack. Not sure to follow, sorry. My idea was: post the invalidate in the mmio region of the device smp_call_function() while

Re: [kvm-devel] [PATCH] mmu notifiers #v7

2008-02-29 Thread Andrea Arcangeli
On Thu, Feb 28, 2008 at 05:03:01PM -0800, Christoph Lameter wrote: I thought you wanted to get rid of the sync via pte lock? Sure. _notify is happening inside the pt lock by coincidence, to reduce the changes to mm/* as long as the mmu notifiers aren't sleep capable. What changes to do_wp_page

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-29 Thread Andrea Arcangeli
On Fri, Feb 29, 2008 at 11:55:17AM -0800, Christoph Lameter wrote: post the invalidate in the mmio region of the device smp_call_function() while (mmio device wait-bitflag is on); So the device driver on UP can only operate through interrupts? If you are hogging the only cpu

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-29 Thread Andrea Arcangeli
On Fri, Feb 29, 2008 at 01:34:34PM -0800, Christoph Lameter wrote: On Fri, 29 Feb 2008, Andrea Arcangeli wrote: On Fri, Feb 29, 2008 at 01:03:16PM -0800, Christoph Lameter wrote: That means we need both the anon_vma locks and the i_mmap_lock to become semaphores. I think semaphores

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-29 Thread Andrea Arcangeli
On Fri, Feb 29, 2008 at 01:03:16PM -0800, Christoph Lameter wrote: That means we need both the anon_vma locks and the i_mmap_lock to become semaphores. I think semaphores are better than mutexes. Rik and Lee saw some performance improvements because list can be traversed in parallel when

Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-29 Thread Andrea Arcangeli
On Fri, Feb 29, 2008 at 02:12:57PM -0800, Christoph Lameter wrote: On Fri, 29 Feb 2008, Andrea Arcangeli wrote: AFAICT The rw semaphore fastpath is similar in performance to a rw spinlock. read side is taken in the slow path. Slowpath meaning VM slowpath or lock slow path? Its

Re: [kvm-devel] [PATCH] mmu notifiers #v8 + xpmem

2008-03-02 Thread Andrea Arcangeli
to linux-mm in a separate thread). Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -274,7 +274,7 @@ static int page_referenced_one(struct pa unsigned long address; pte_t *pte; spinlock_t *ptl; - int

[kvm-devel] [PATCH] mmu notifiers #v8

2008-03-02 Thread Andrea Arcangeli
in .26. The brainer part of the VM work to do to make it sleep capable is pretty much orthogonal with this patch. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] Signed-off-by: Christoph Lameter [EMAIL PROTECTED] diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include

Re: [kvm-devel] [PATCH] mmu notifiers #v8

2008-03-03 Thread Andrea Arcangeli
On Mon, Mar 03, 2008 at 04:29:34AM +0100, Nick Piggin wrote: to something I prefer. Others may not, but I'll post them for debate anyway. Sure, thanks! I didn't drop invalidate_page, because invalidate_range_begin/end would be slower for usages like KVM/GRU (we don't need a begin/end

Re: [kvm-devel] [PATCH] mmu notifiers #v8

2008-03-03 Thread Andrea Arcangeli
On Mon, Mar 03, 2008 at 02:10:17PM +0100, Nick Piggin wrote: Is this just a GRU problem? Can't we just require them to take a ref on the page (IIRC Jack said GRU could be changed to more like a TLB model). Yes, it's just a GRU problem, it tries to optimize performance by calling follow_page

Re: [kvm-devel] [PATCH] mmu notifiers #v8

2008-03-03 Thread Andrea Arcangeli
On Mon, Mar 03, 2008 at 11:01:22AM -0800, Christoph Lameter wrote: API still has rcu issues and the example given for making things sleepable is only working for the aging callback. The most important callback is for try_to_unmap and page_mkclean. This means the API is still not generic

[kvm-devel] [PATCH] mmu notifiers #v9

2008-03-03 Thread Andrea Arcangeli
and at the same time deferring _end after the whole tlb_gather page freeing is reducing the number of invalidates. .26 will allow all the methods to sleep by following the roadmap described in the #v8 patch. KVM so far is swapping fine on top of this. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED

[kvm-devel] [PATCH] KVM swapping with mmu notifiers #v9

2008-03-03 Thread Andrea Arcangeli
Notably the registration now requires the mmap_sem in write mode. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 41962e7..e1287ab 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -21,6 +21,7 @@ config KVM

Re: [kvm-devel] [PATCH] KVM swapping with mmu notifiers #v9

2008-03-04 Thread Andrea Arcangeli
Hello Izik, On Tue, Mar 04, 2008 at 02:44:07AM +0200, Izik Eidus wrote: i wrote to you about this before (i didnt get answer for this so i write Ouch I must have lost your previous comment with a too-fast pgdown in the full quoting of the patch sorry. again) with large pages support i think

Re: [kvm-devel] [RFC] Notifier for Externally Mapped Memory (EMM)

2008-03-04 Thread Andrea Arcangeli
On Mon, Mar 03, 2008 at 11:31:15PM -0800, Christoph Lameter wrote: @@ -446,6 +450,8 @@ static int page_mkclean_one(struct page if (address == -EFAULT) goto out; + /* rmap lock held */ + emm_notify(mm, emm_invalidate_start, address, address + PAGE_SIZE);

Re: [kvm-devel] [RFC] Notifier for Externally Mapped Memory (EMM)

2008-03-04 Thread Andrea Arcangeli
On Tue, Mar 04, 2008 at 11:00:31AM -0800, Christoph Lameter wrote: But as you pointed out before that path is a slow path anyways. Its rarely It's a slow path but I don't see why you think two hooks are better than one, when only one is necessary. I once ripped invalidate_page while working on

[kvm-devel] [PATCH] 2/4 move all invalidate_page outside of PT lock (#v9 was 1/4)

2008-03-07 Thread Andrea Arcangeli
) is to decrease the non obviously safe mangling over mm/* during .25. The below patch is simple, but not as obviously safe as s/ptep_clear_flush/ptep_clear_flush_notify/. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h

Re: [kvm-devel] [PATCH] 3/4 combine RCU with seqlock to allow mmu notifier methods to sleep (#v9 was 1/4)

2008-03-07 Thread Andrea Arcangeli
to keep mmu_notifier_unregister. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -10,6 +10,7 @@ #include <linux/rbtree.h> #include <linux/rwsem.h> #include <linux

[kvm-devel] [PATCH] 4/4 i_mmap_lock spinlock2rwsem (#v9 was 1/4)

2008-03-07 Thread Andrea Arcangeli
This is a rediff of Christoph's plain i_mmap_lock2rwsem patch on top of #v9 1/4 + 2/4 + 3/4 (hence this is called 4/4). This is mostly to show that after 3/4, any patch that plugs on the EMM patchset will plug nicely on top of my MMU notifier patchset too. The patch triggers bug checks here in

Re: [kvm-devel] [PATCH] 3/4 combine RCU with seqlock to allow mmu notifier methods to sleep (#v9 was 1/4)

2008-03-07 Thread Andrea Arcangeli
On Fri, Mar 07, 2008 at 05:52:42PM +0100, Peter Zijlstra wrote: hlist_del_rcu(mn->hlist) + rcu_read_unlock(); kfree(mn); young |= mn->ops->clear_flush_young(mn, mm, address); *BANG* My objective was to allow mmu_notifier_register/unregister to be

Re: [kvm-devel] [PATCH] 3/4 combine RCU with seqlock to allow mmu notifier methods to sleep (#v9 was 1/4)

2008-03-07 Thread Andrea Arcangeli
On Fri, Mar 07, 2008 at 07:01:35PM +0100, Peter Zijlstra wrote: The reason Christoph can do without RCU is because he doesn't allow unregister, and as soon as you drop that you'll end up with something Not sure to follow, what do you mean he doesn't allow? We'll also have to rip unregister

Re: [kvm-devel] Notifier for Externally Mapped Memory (EMM) V1

2008-03-07 Thread Andrea Arcangeli
On Wed, Mar 05, 2008 at 04:22:11PM -0800, Christoph Lameter wrote: + if (e->callback) { + x = e->callback(e, mm, op, start, end); + if (x) + return x; [..] + + if (emm_notify(mm, emm_referenced, address,

Re: [kvm-devel] [PATCH] 3/4 combine RCU with seqlock to allow mmu notifier methods to sleep (#v9 was 1/4)

2008-03-07 Thread Andrea Arcangeli
On Fri, Mar 07, 2008 at 07:45:52PM +0100, Andrea Arcangeli wrote: On Fri, Mar 07, 2008 at 07:01:35PM +0100, Peter Zijlstra wrote: The reason Christoph can do without RCU is because he doesn't allow unregister, and as soon as you drop that you'll end up with something Not sure to follow

Re: [kvm-devel] KVM: MMU: add KVM_ZAP_GFN ioctl

2008-03-21 Thread Andrea Arcangeli
On Thu, Mar 20, 2008 at 02:09:15PM +0200, Avi Kivity wrote: Marcelo Tosatti wrote: Add an ioctl to zap all mappings to a given gfn. This allows userspace to remove the QEMU process mappings and the page without causing inconsistency. I'm thinking of committing rmap_nuke() to kvm.git,

Re: [kvm-devel] KVM: MMU: add KVM_ZAP_GFN ioctl

2008-03-21 Thread Andrea Arcangeli
On Fri, Mar 21, 2008 at 10:37:00AM -0300, Marcelo Tosatti wrote: This is not the final put_page(). Remote TLB's are flushed here, after rmap_remove: + if (nuked) + kvm_flush_remote_tlbs(kvm); This ioctl is called before zap_page_range() is executed through

Re: [kvm-devel] KVM: MMU: add KVM_ZAP_GFN ioctl

2008-03-24 Thread Andrea Arcangeli
On Fri, Mar 21, 2008 at 06:23:41PM -0300, Marcelo Tosatti wrote: If there are any active shadow mappings to a page there is a guarantee that there is a valid linux pte mapping pointing at it. So page_count == ^^ was 1 + nr_sptes. Yes. So the theoretical race you're talking

Re: [kvm-devel] KVM: MMU: add KVM_ZAP_GFN ioctl

2008-03-26 Thread Andrea Arcangeli
On Mon, Mar 24, 2008 at 07:54:27AM +0100, Andrea Arcangeli wrote: I'd more accurately describe the race as this: CPU0 CPU1 spte = rmap_next(kvm, rmapp, NULL); while (spte) { BUG_ON(!spte

Re: [kvm-devel] [PATCH] KVM: MMU: Fix rmap_remove() race

2008-03-26 Thread Andrea Arcangeli
On Wed, Mar 26, 2008 at 02:51:28PM -0300, Marcelo Tosatti wrote: Nope. If a physical CPU has page translations cached it _must_ be running in the context of a qemu thread (does not matter if its in userspace or executing guest code). The bit corresponding to such CPU's will be set in

Re: [kvm-devel] [PATCH] KVM: MMU: Fix rmap_remove() race

2008-03-26 Thread Andrea Arcangeli
On Wed, Mar 26, 2008 at 05:02:53PM +0200, Avi Kivity wrote: Andrea notes that freeing the page before flushing the tlb is a race, as the guest can sneak in one last write before the tlb is flushed, writing to a page that may belong to someone else. Fix by reversing the order of freeing and
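The ordering argument above can be checked with a toy state model. This is only an illustrative sketch with hypothetical names (page_state, has_unsafe_window), not KVM code: it tracks whether a stale TLB entry may still exist and whether the page is still owned by the guest, and reports whether any intermediate state lets a late guest write land on a page that has already been released.

```c
#include <stdbool.h>

enum teardown_order { FREE_THEN_FLUSH, FLUSH_THEN_FREE };

struct page_state {
    bool tlb_cached;   /* a stale translation may still sit in a TLB */
    bool guest_owned;  /* the page has not been released yet */
};

/* A guest write is harmful only through a stale TLB entry to a freed page. */
static bool write_would_corrupt(const struct page_state *s)
{
    return s->tlb_cached && !s->guest_owned;
}

/*
 * Start right after the spte was made non-present (the TLB may still cache
 * the old translation) and apply the two remaining steps in the given
 * order, checking the state after each step.
 */
bool has_unsafe_window(enum teardown_order order)
{
    struct page_state s = { true, true };
    bool unsafe = false;

    if (order == FREE_THEN_FLUSH) {
        s.guest_owned = false;              /* put_page() first */
        unsafe |= write_would_corrupt(&s);
        s.tlb_cached = false;               /* TLB flush second */
        unsafe |= write_would_corrupt(&s);
    } else {
        s.tlb_cached = false;               /* TLB flush first */
        unsafe |= write_would_corrupt(&s);
        s.guest_owned = false;              /* put_page() second */
        unsafe |= write_would_corrupt(&s);
    }
    return unsafe;
}
```

Only the free-then-flush order passes through a state where the translation is still cached but the page is gone, which is the window the message describes.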

Re: [kvm-devel] [PATCH] KVM: MMU: Fix rmap_remove() race

2008-03-26 Thread Andrea Arcangeli
On Wed, Mar 26, 2008 at 08:22:31PM +0100, Andrea Arcangeli wrote: what happens if invalidate_page runs after rmap_remove is returned (the spte isn't visible anymore by the rmap code and in turn by invalidate_page) but before the set_shadow_pte(nonpresent) runs. Thinking some more the mmu_lock

Re: [kvm-devel] [PATCH] KVM: MMU: Fix rmap_remove() race

2008-03-27 Thread Andrea Arcangeli
On Thu, Mar 27, 2008 at 10:11:42AM +0200, Avi Kivity wrote: Erm I don't think this means what you think it means. This is the kernel/user communication area, used to pass exit data to userspace. It's not the memslot vma. Yep... only the kvm_vm_vm_ops can run gfn_to_page, and I assume that

Re: [kvm-devel] [PATCH] KVM: MMU: Fix rmap_remove() race

2008-03-27 Thread Andrea Arcangeli
On Thu, Mar 27, 2008 at 03:56:56PM +0200, Avi Kivity wrote: That's not good. We need to support the older userspace, for a while yet. Why is there a problem? IIRC it's just anonymous memory. Problem is that for it to be unmapped __do_fault must call page_add_new_anon_rmap on it. Even anon

Re: [kvm-devel] [PATCH] KVM: MMU: Fix rmap_remove() race

2008-03-28 Thread Andrea Arcangeli
safe spte=nonpresent; tlbflush; put_page ordering, then it'll always be safe, but it'll be slower as there will be more tlb flushes than needed. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index caa9f94..5343216 100644 --- a/arch/x86/kvm/x86

[kvm-devel] regression breaks lowmem reserved RAM

2008-03-28 Thread Andrea Arcangeli
This is crashing at boot my lowmem reserved RAM patch. This is causing GFP_DMA allocations at boot for no good reason. It crashes in my case because there's no ram below 16M available to linux. Are you sure this is needed at all, for sure if there's any bug this isn't the right fix. Please

Re: [kvm-devel] [PATCH] KVM: MMU: Fix rmap_remove() race

2008-03-28 Thread Andrea Arcangeli
On Fri, Mar 28, 2008 at 03:01:13PM +0100, Andrea Arcangeli wrote: @@ -271,8 +292,12 @@ int __kvm_set_memory_region(struct kvm *kvm, r = -EINVAL; /* General sanity checks */ + if (mem->userspace_addr & (PAGE_SIZE - 1)) + goto out; if (mem->memory_size
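The sanity check added in the hunk above rejects a userspace address that is not page aligned by masking it with PAGE_SIZE - 1. A minimal standalone sketch of that test follows; SKETCH_PAGE_SIZE and is_page_aligned are names invented for the example, and the 4 KiB page size is an assumption (the kernel supplies the real PAGE_SIZE).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; must be a power of two for the mask to work. */
#define SKETCH_PAGE_SIZE 4096UL

/* True when addr sits exactly on a page boundary (all low bits are zero). */
bool is_page_aligned(uint64_t addr)
{
    return (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}
```

With this helper, the quoted check corresponds to bailing out when is_page_aligned(addr) is false.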

Re: [kvm-devel] [PATCH] KVM: MMU: Fix rmap_remove() race

2008-03-31 Thread Andrea Arcangeli
On Mon, Mar 31, 2008 at 09:35:00AM +0300, Avi Kivity wrote: This can be done by taking mmu_lock in _begin and releasing it in _end, unless there's a lock dependency issue. The main problem is if we want to be able to co-exist with XPMEM methods registered in the same notifier chain for the same MM

[kvm-devel] [0/3] -reserved-ram for PCI passthrough without VT-d and without paravirt

2008-03-31 Thread Andrea Arcangeli
Hello, These three patches (one against host kernel, one against kvm.git, one against kvm-userland.git) force KVM to map all RAM mapped in the virtualized e820 map provided to the guest with gfn = hfn. In turn it's now possible to give direct hardware access to the guest, all DMA will work fine

[kvm-devel] [1/3] -reserved-ram for PCI passthrough without VT-d and without paravirt

2008-03-31 Thread Andrea Arcangeli
array be allocated with holes corresponding to the holes generated in the e820 map, simply bad_page will be returned gracefully without risk like if this patch wasn't applied to kvm.git. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm

[kvm-devel] [2/3] -reserved-ram for PCI passthrough without VT-d and without paravirt

2008-03-31 Thread Andrea Arcangeli
don't use pci-passthrough, but then pci passthrough will randomly memory corrupt the host. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/bios/rombios.c b/bios/rombios.c index 318de57..f93a6c6 100644 --- a/bios/rombios.c +++ b/bios/rombios.c @@ -4251,6 +4251,7 @@ int15_function32

[kvm-devel] [3/3] -reserved-ram for PCI passthrough without iommu and without paravirt

2008-03-31 Thread Andrea Arcangeli
-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1107,8 +1107,36 @@ config CRASH_DUMP (CONFIG_RELOCATABLE=y). For more details see Documentation/kdump/kdump.txt +config RESERVE_PHYSICAL_START

Re: [kvm-devel] regression breaks lowmem reserved RAM

2008-04-01 Thread Andrea Arcangeli
tell, this will stop the regression with isa dma operations at boot for 99% of blkdev/memory combinations out there and I guess this fixes the setups with 4G of ram and 32bit pci cards as well (this also retains symmetry with the 32bit code). Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff

Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Andrea Arcangeli
On Tue, Apr 01, 2008 at 10:20:49AM -0500, Anthony Liguori wrote: Which is apparently entirely unnecessary as we already have /sys/bus/pci/.../region. It's just a matter of checking if a vma is VM_IO and then dealing with the subsequent reference counting issues as Avi points out. Do you

Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Andrea Arcangeli
On Tue, Apr 01, 2008 at 10:22:51PM +0300, Avi Kivity wrote: It's just something we discussed, not code. Yes, the pfn_valid check should skip all refcounting for mmio regions without a struct page. But gfn_to_page can't work without a struct page, so some change will be needed there. With the

Re: [kvm-devel] [patch 1/9] EMM Notifier: The notifier calls

2008-04-02 Thread Andrea Arcangeli
On Tue, Apr 01, 2008 at 01:55:32PM -0700, Christoph Lameter wrote: +/* Perform a callback */ +int __emm_notify(struct mm_struct *mm, enum emm_operation op, + unsigned long start, unsigned long end) +{ + struct emm_notifier *e = rcu_dereference(mm)->emm_notifier; + int x;

Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 07:32:35AM +0300, Avi Kivity wrote: It ought to work. gfn_to_hfn() (old gfn_to_page) will still need to take a refcount if possible. This reminds me, that with mmu notifiers we could implement gfn_to_hfn only with follow_page and skip the refcounting on the struct page.

Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 12:50:50PM +0300, Avi Kivity wrote: Isn't it faster though? We don't need to pull in the cacheline containing the struct page anymore. Exactly, not only that, get_user_pages is likely a bit slower than we need for just kvm pte lookup. GRU uses follow_page directly

Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 02:16:41PM +0300, Avi Kivity wrote: Ugh, there's still mark_page_accessed() and SetPageDirty(). btw, like PG_dirty is only set if the spte is writeable, mark_page_accessed should only run if the accessed bit is set in the spte. It doesn't matter now as nobody could

Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 01:50:19PM +0200, Andrea Arcangeli wrote: if (pfn_valid(pfn)) { page = pfn_to_page(pfn); if (!PageReserved(page)) { BUG_ON(page_count(page) != 1); if (is_writeable_pte(*spte

Re: [kvm-devel] [patch 5/9] Convert anon_vma lock to rw_sem and refcount

2008-04-02 Thread Andrea Arcangeli
On Tue, Apr 01, 2008 at 01:55:36PM -0700, Christoph Lameter wrote: This results in f.e. the Aim9 brk performance test to go down by 10-15%. I guess it's more likely because of overscheduling for small critical sections, did you count the total number of context switches? I guess there will

Re: [kvm-devel] EMM: Fixup return value handling of emm_notify()

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 12:03:50PM -0700, Christoph Lameter wrote: + /* + * Callback may return a positive value to indicate a count + * or a negative error code. We keep the first error code + * but
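The quoted comment says a callback may return a positive count or a negative error code and that the first error code is kept; the rest of the sentence is cut off. The sketch below shows one plausible reading of such a convention, under two labelled assumptions that the excerpt does not confirm: the remaining callbacks are still invoked after an error, and positive counts are summed. All names (sketch_callback, sketch_notify_all, and the sample callbacks) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical callback type: >= 0 is a count, < 0 is an error code. */
typedef int (*sketch_callback)(void);

#define SKETCH_ERR_A (-1)
#define SKETCH_ERR_B (-2)

/* Counts every callback invocation, to show that none is skipped. */
int sketch_calls_made = 0;

int sketch_returns_one(void) { sketch_calls_made++; return 1; }
int sketch_returns_two(void) { sketch_calls_made++; return 2; }
int sketch_fails_a(void)     { sketch_calls_made++; return SKETCH_ERR_A; }
int sketch_fails_b(void)     { sketch_calls_made++; return SKETCH_ERR_B; }

/*
 * Run all n callbacks. Assumption: later callbacks still run after an
 * error. Returns the first error code seen, or, if there was none, the
 * sum of the positive counts (summing is also an assumption).
 */
int sketch_notify_all(const sketch_callback *cbs, size_t n)
{
    int first_error = 0;
    int total = 0;

    for (size_t i = 0; i < n; i++) {
        int x = cbs[i]();

        if (x < 0) {
            if (first_error == 0)
                first_error = x;    /* later errors do not overwrite it */
        } else {
            total += x;
        }
    }
    return first_error ? first_error : total;
}
```

Returning the first error rather than the last keeps the result tied to the earliest failure in the chain, independent of what later subscribers report.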

Re: [kvm-devel] [patch 1/9] EMM Notifier: The notifier calls

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 10:59:50AM -0700, Christoph Lameter wrote: Did I see #v10? Could you start a new subject when you post please? Do not respond to some old message otherwise the threading will be wrong. I wasn't clear enough, #v10 was in the works... I was thinking about the last two

Re: [kvm-devel] [patch 5/9] Convert anon_vma lock to rw_sem and refcount

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 11:15:26AM -0700, Christoph Lameter wrote: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: On Tue, Apr 01, 2008 at 01:55:36PM -0700, Christoph Lameter wrote: This results in f.e. the Aim9 brk performance test to go down by 10-15%. I guess it's more likely

[kvm-devel] [PATCH 2 of 8] Moves all mmu notifier methods outside the PT lock (first and not last

2008-04-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1207159010 -7200 # Node ID fe00cb9deeb31467396370c835cb808f4b85209a # Parent a406c0cc686d0ca94a4d890d661cdfa48cfba09f Moves all mmu notifier methods outside the PT lock (first and not last step to make them sleep capable

[kvm-devel] [PATCH 4 of 8] The conversion to a rwsem allows callbacks during rmap traversal

2008-04-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1207159011 -7200 # Node ID 3c3787c496cab1fc590ba3f97e7904bdfaab5375 # Parent d880c227ddf345f5d577839d36d150c37b653bfd The conversion to a rwsem allows callbacks during rmap traversal for files in a non atomic context. A rw

[kvm-devel] [PATCH 0 of 8] mmu notifiers #v10

2008-04-02 Thread Andrea Arcangeli
Hello, this is the mmu notifier #v10. Patches 1 and 2 are the only difference between this and EMM V2. The rest is the same as with Christoph's patches. I think maximum priority should be given in merging patch 1 and 2 into -mm and ASAP in mainline. Patches from 3 to 8 can go in -mm for testing

[kvm-devel] [PATCH 5 of 8] We no longer abort unmapping in unmap vmas because we can reschedule while

2008-04-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1207159055 -7200 # Node ID 316e5b1e4bf388ef0198c91b3067ed1e4171d7f6 # Parent 3c3787c496cab1fc590ba3f97e7904bdfaab5375 We no longer abort unmapping in unmap vmas because we can reschedule while unmapping since we are holding

[kvm-devel] [PATCH 8 of 8] This patch adds a lock ordering rule to avoid a potential deadlock when

2008-04-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1207159059 -7200 # Node ID f3f119118b0abd9c4624263ef388dc7230d937fe # Parent 31fc23193bd039cc595fba1ca149a9715f7d0fb2 This patch adds a lock ordering rule to avoid a potential deadlock when multiple mmap_sems need to be locked

[kvm-devel] [PATCH 3 of 8] Move the tlb flushing into free_pgtables. The conversion of the locks

2008-04-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1207159010 -7200 # Node ID d880c227ddf345f5d577839d36d150c37b653bfd # Parent fe00cb9deeb31467396370c835cb808f4b85209a Move the tlb flushing into free_pgtables. The conversion of the locks taken for reverse map scanning would

Re: [kvm-devel] [patch 1/9] EMM Notifier: The notifier calls

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 02:54:52PM -0700, Christoph Lameter wrote: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: Hmmm... Okay that is one solution that would just require a BUG_ON in the registration methods. Perhaps you didn't notice that this solution can't work if you call

Re: [kvm-devel] [patch 5/9] Convert anon_vma lock to rw_sem and refcount

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 02:56:25PM -0700, Christoph Lameter wrote: I am a bit surprised that brk performance is that important. There may be I think it's not brk but fork that is being slowed down, did you oprofile? AIM forks a lot... The write side fast path generating the overscheduling I

Re: [kvm-devel] EMM: Require single threadedness for registration.

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 03:06:19PM -0700, Christoph Lameter wrote: On Thu, 3 Apr 2008, Andrea Arcangeli wrote: That would work for #v10 if I remove the invalidate_range_start from try_to_unmap_cluster, it can't work for EMM because you've emm_invalidate_start firing anywhere outside

Re: [kvm-devel] [PATCH 1 of 8] Core of mmu notifiers

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 03:34:01PM -0700, Christoph Lameter wrote: Still two methods ... Yes, the invalidate_page is called with the core VM holding a reference on the page _after_ the tlb flush. The invalidate_end is called after the page has been freed already and after the tlb flush. They've

Re: [kvm-devel] EMM: Fixup return value handling of emm_notify()

2008-04-03 Thread Andrea Arcangeli
On Thu, Apr 03, 2008 at 12:40:46PM +0200, Peter Zijlstra wrote: It seems to me that common code can be shared using functions? No need FWIW I prefer separate methods. kvm patch using mmu notifiers shares 99% of the code too between the two different methods implemented indeed. Code sharing is

Re: [kvm-devel] EMM: disable other notifiers before register and unregister

2008-04-03 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 06:24:15PM -0700, Christoph Lameter wrote: Ok lets forget about the single threaded thing to solve the registration races. As Andrea pointed out this still has issues with other subscribed subsystems (and also try_to_unmap). We could do something like what

Re: [kvm-devel] EMM: disable other notifiers before register and unregister

2008-04-04 Thread Andrea Arcangeli
On Thu, Apr 03, 2008 at 12:20:41PM -0700, Christoph Lameter wrote: On Thu, 3 Apr 2008, Andrea Arcangeli wrote: My attempt to fix this once and for all is to walk all vmas of the mm inside mmu_notifier_register and take all anon_vma locks and i_mmap_locks in virtual address order in a row
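
Taking a whole set of locks "in virtual address order in a row" is an instance of a general deadlock-avoidance rule: if every thread acquires its locks in the same global order, no cycle of waiters can form. The sketch below shows the idea in userspace with pthread mutexes instead of the kernel's anon_vma locks and i_mmap_locks. It assumes the order is given by the address of each lock object; `lock_all_in_order` and `unlock_all` are hypothetical helpers, not functions from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: order lock pointers by their numeric address. */
static int cmp_lock_addr(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)*(pthread_mutex_t *const *)a;
    uintptr_t y = (uintptr_t)*(pthread_mutex_t *const *)b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: sort the array of lock pointers by address and
 * acquire each distinct lock exactly once. Duplicates become adjacent
 * after sorting and are skipped. Returns the number of locks taken.
 */
static size_t lock_all_in_order(pthread_mutex_t **locks, size_t n)
{
    size_t taken = 0;

    assert(locks != NULL || n == 0);
    qsort(locks, n, sizeof(*locks), cmp_lock_addr);

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && locks[i] == locks[i - 1])
            continue;             /* same lock listed more than once */
        pthread_mutex_lock(locks[i]);
        taken++;
    }
    return taken;
}

/*
 * Release in reverse order. Expects the array as left (sorted) by
 * lock_all_in_order, so duplicates are again adjacent.
 */
static void unlock_all(pthread_mutex_t **locks, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        if (i > 0 && locks[i] == locks[i - 1])
            continue;
        pthread_mutex_unlock(locks[i]);
    }
}
```

Sorting by address is only one possible global order. It is used here because it needs no extra bookkeeping and handles the case where several entries refer to the same lock.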

[kvm-devel] [PATCH] mmu notifier #v11

2008-04-04 Thread Andrea Arcangeli
of this one. Andrew can you apply this to -mm? Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] Signed-off-by: Nick Piggin [EMAIL PROTECTED] Signed-off-by: Christoph Lameter [EMAIL PROTECTED] diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h

Re: [kvm-devel] [PATCH] mmu notifier #v11

2008-04-04 Thread Andrea Arcangeli
On Fri, Apr 04, 2008 at 03:06:18PM -0700, Christoph Lameter wrote: Adds some comments. Still objectionable is the multiple ways of invalidating pages in #v11. Callout now has similar locking to emm. range_begin exists because range_end is called after the page has already been freed.
