Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-09 Thread Andrea Arcangeli
On Fri, May 09, 2008 at 08:37:29PM +0200, Peter Zijlstra wrote: Another possibility, would something like this work? /* * null out the begin function, no new begin calls can be made */ rcu_assign_pointer(my_notifier.invalidate_start_begin, NULL); /* * lock/unlock all rmap

[kvm-devel] [PATCH 001/001] mmu-notifier-core v17

2008-05-09 Thread Andrea Arcangeli
From: Andrea Arcangeli [EMAIL PROTECTED] With KVM/GFP/XPMEM there isn't just the primary CPU MMU pointing to pages. There are secondary MMUs (with secondary sptes and secondary tlbs) too. sptes in the kvm case are shadow pagetables, but when I say spte in mmu-notifier context, I mean secondary

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-08 Thread Andrea Arcangeli
On Thu, May 08, 2008 at 09:11:33AM -0700, Linus Torvalds wrote: Btw, this is an issue only on 32-bit x86, because on 64-bit one we already have the padding due to the alignment of the 64-bit pointers in the list_head (so there's already empty space there). On 32-bit, the alignment of
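
The padding remark quoted above can be illustrated with a small sketch. It is not taken from the thread: the struct names are hypothetical stand-ins, and the 4-byte lock is an assumption about a non-debug spinlock. With 8-byte pointers the compiler leaves 4 bytes of padding between the lock and the pointer pair; with 4-byte pointers it leaves none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a doubly linked list head (two pointers). */
struct toy_list_head {
    struct toy_list_head *next, *prev;
};

/* Hypothetical stand-in: a 4-byte lock followed by a list head. */
struct toy_anon_vma {
    uint32_t lock;              /* assumed 4-byte spinlock */
    struct toy_list_head head;  /* aligned to pointer size */
};

static_assert(sizeof(uint32_t) == 4, "the sketch assumes a 4-byte lock");

/* Number of padding bytes the compiler inserts after the lock. */
static size_t toy_padding_after_lock(void)
{
    return offsetof(struct toy_anon_vma, head) - sizeof(uint32_t);
}
```

On a typical 64-bit build the function reports 4 spare bytes after the lock, which is the "already empty space" the quote refers to; on a typical 32-bit build it reports 0, so any extra field there grows the structure.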

[kvm-devel] [PATCH 04 of 11] free-pgtables

2008-05-07 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1210115130 -7200 # Node ID 34f6a4bf67ce66714ba2d5c13a5fed241d34fb09 # Parent d60d200565abde6a8ed45271e53cde9c5c75b426 free-pgtables Move the tlb flushing into free_pgtables. The conversion of the locks taken for reverse map

[kvm-devel] [PATCH 03 of 11] invalidate_page outside PT lock

2008-05-07 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1210115129 -7200 # Node ID d60d200565abde6a8ed45271e53cde9c5c75b426 # Parent c5badbefeee07518d9d1acca13e94c981420317c invalidate_page outside PT lock Moves all mmu notifier methods outside the PT lock (first and not last step

[kvm-devel] [PATCH 10 of 11] export zap_page_range for XPMEM

2008-05-07 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1210115797 -7200 # Node ID 5b2eb7d28a4517daf91b08b4dcfbb58fd2b42d0b # Parent 94eaa1515369e8ef183e2457f6f25a7f36473d70 export zap_page_range for XPMEM XPMEM would have used sys_madvise() except that madvise_dontneed() returns

[kvm-devel] [PATCH 11 of 11] mmap sems

2008-05-07 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1210115798 -7200 # Node ID eb924315351f6b056428e35c983ad28040420fea # Parent 5b2eb7d28a4517daf91b08b4dcfbb58fd2b42d0b mmap sems This patch adds a lock ordering rule to avoid a potential deadlock when multiple mmap_sems need

[kvm-devel] [PATCH 06 of 11] rwsem contended

2008-05-07 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1210115132 -7200 # Node ID 0621238970155f8ff2d60ca4996dcdd470f9c6ce # Parent 20bc6a66a86ef6bd60919cc77ff51d4af741b057 rwsem contended Add a function to rw_semaphores to check if there are any processes waiting

[kvm-devel] [PATCH 09 of 11] mm_lock-rwsem

2008-05-07 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1210115508 -7200 # Node ID 94eaa1515369e8ef183e2457f6f25a7f36473d70 # Parent 6b384bb988786aa78ef07440180e4b2948c4c6a2 mm_lock-rwsem Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock conversion. Signed-off

[kvm-devel] [PATCH 05 of 11] unmap vmas tlb flushing

2008-05-07 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1210115131 -7200 # Node ID 20bc6a66a86ef6bd60919cc77ff51d4af741b057 # Parent 34f6a4bf67ce66714ba2d5c13a5fed241d34fb09 unmap vmas tlb flushing Move the tlb flushing inside of unmap vmas. This saves us from passing a pointer

[kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1210115136 -7200 # Node ID 6b384bb988786aa78ef07440180e4b2948c4c6a2 # Parent 58f716ad4d067afb6bdd1b5f7042e19d854aae0d anon-vma-rwsem Convert the anon_vma spinlock to a rw semaphore. This allows concurrent traversal of reverse

[kvm-devel] [PATCH 07 of 11] i_mmap_rwsem

2008-05-07 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1210115135 -7200 # Node ID 58f716ad4d067afb6bdd1b5f7042e19d854aae0d # Parent 0621238970155f8ff2d60ca4996dcdd470f9c6ce i_mmap_rwsem The conversion to a rwsem allows notifier callbacks during rmap traversal for files. A rw style

[kvm-devel] [PATCH 01 of 11] mmu-notifier-core

2008-05-07 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1210096013 -7200 # Node ID e20917dcc8284b6a07cfcced13dda4cbca850a9c # Parent 5026689a3bc323a26d33ad882c34c4c9c9a3ecd8 mmu-notifier-core With KVM/GFP/XPMEM there isn't just the primary CPU MMU pointing to pages

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-05-07 Thread Andrea Arcangeli
On Tue, Apr 29, 2008 at 06:03:40PM +0200, Andrea Arcangeli wrote: Christoph if you've interest in evolving anon-vma-sem and i_mmap_sem yourself in this direction, you're very welcome to go ahead while I In case you didn't notice this already, for a further explanation of why semaphores runs

[kvm-devel] [PATCH 00 of 11] mmu notifier #v16

2008-05-07 Thread Andrea Arcangeli
Hello, this is the last update of the mmu notifier patch. Jack asked for a __mmu_notifier_register to call under mmap_sem in write mode. Here is an update with that change plus allowing ->release not to be implemented (two liner change to mmu_notifier.c). The entire diff between v15 and v16

Re: [kvm-devel] [PATCH 02 of 11] get_task_mm

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 10:59:48AM -0500, Robin Holt wrote: You can drop this patch. This turned out to be a race in xpmem. It appeared as if it were a race in get_task_mm, but it really is not. The current->mm field is cleared under the task_lock and the task_lock is grabbed by

Re: [kvm-devel] [PATCH 03 of 11] invalidate_page outside PT lock

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 01:39:43PM -0400, Rik van Riel wrote: Would it be an idea to merge them into one, so the first patch introduces the right conventions directly? The only reason this isn't merged into one is that this requires non-obvious (not difficult though) changes to the core VM code. I

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 01:56:23PM -0700, Linus Torvalds wrote: This also looks very debatable indeed. The only performance numbers quoted are: This results in f.e. the Aim9 brk performance test to go down by 10-15%. which just seems like a total disaster. The whole series looks

Re: [kvm-devel] [PATCH 01 of 11] mmu-notifier-core

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 01:30:39PM -0700, Linus Torvalds wrote: On Wed, 7 May 2008, Andrew Morton wrote: The patch looks OK to me. As far as I can tell, authorship has been destroyed by at least two of the patches (ie Christoph seems to be the author, but Andrea seems to have

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 02:36:57PM -0700, Linus Torvalds wrote: had to do any blocking I/O during vmtruncate before, now we have to. I really suspect we don't really have to, and that it would be better to just fix the code that does that. I'll let you discuss with Christoph and Robin

Re: [kvm-devel] [PATCH 01 of 11] mmu-notifier-core

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 03:11:10PM -0700, Linus Torvalds wrote: On Wed, 7 May 2008, Andrea Arcangeli wrote: As far as I can tell, authorship has been destroyed by at least two of the patches (ie Christoph seems to be the author, but Andrea seems to have dropped that fact

Re: [kvm-devel] [PATCH 01 of 11] mmu-notifier-core

2008-05-07 Thread Andrea Arcangeli
On Thu, May 08, 2008 at 12:27:58AM +0200, Andrea Arcangeli wrote: I rechecked and I guarantee that the patches where Christoph isn't listed are developed by myself and he didn't write a single line on them. In any case I expect Christoph to review (he's CCed) and to point me to any attribution

Re: [kvm-devel] [ofa-general] Re: [PATCH 01 of 11] mmu-notifier-core

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 03:31:08PM -0700, Roland Dreier wrote: I think the point you're missing is that any patches written by Christoph need a line like From: Christoph Lameter [EMAIL PROTECTED] at the top of the body so that Christoph becomes the author when it is committed into git.

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 03:31:03PM -0700, Andrew Morton wrote: Nope. We only need to take the global lock before taking *two or more* of the per-vma locks. I really wish I'd thought of that. I don't see how you can avoid taking the system-wide-global lock before every single

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 03:44:24PM -0700, Linus Torvalds wrote: On Thu, 8 May 2008, Andrea Arcangeli wrote: Unfortunately the lock you're talking about would be: static spinlock_t global_lock = ... There's no way to make it more granular. Right. So what? It's still about

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
To remove mm_lock without adding a horrible system-wide lock before every i_mmap_lock etc., we have to remove invalidate_range_begin/end. Then we can return to an older approach of doing only invalidate_page and serializing it with the PT lock against get_user_pages. That works fine for KVM but GRU

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
Hi Andrew, On Wed, May 07, 2008 at 03:59:14PM -0700, Andrew Morton wrote: CPU0: CPU1: spin_lock(global_lock) spin_lock(a->lock); spin_lock(b->lock); == mmu_notifier_register() spin_lock(b->lock);

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
On Thu, May 08, 2008 at 09:28:38AM +1000, Benjamin Herrenschmidt wrote: On Thu, 2008-05-08 at 00:44 +0200, Andrea Arcangeli wrote: Please note, we can't allow a thread to be in the middle of zap_page_range while mmu_notifier_register runs. You said yourself that mmu_notifier_register

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 06:02:49PM -0700, Linus Torvalds wrote: You replace mm_lock() with the sequence that Andrew gave you (and I described): spin_lock(global_lock) .. get all locks UNORDERED .. spin_unlock(global_lock) and you're now done. You have your mm_lock()
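
The sequence quoted above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel code under discussion: `toy_spinlock`, `lock_all_unordered` and `unlock_all` are hypothetical names, the locks are C11 `atomic_flag` spinlocks, and the array is assumed to contain no lock twice. The idea, as the quote reads, is that any path needing several locks at once takes the global lock first, so two such paths can never interleave their acquisitions, while paths that take a single lock skip the global lock entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical minimal spinlock built on a C11 atomic flag. */
typedef struct {
    atomic_flag held;
} toy_spinlock;

static void toy_lock(toy_spinlock *l)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set(&l->held))
        ;
}

static void toy_unlock(toy_spinlock *l)
{
    atomic_flag_clear(&l->held);
}

/* Returns 1 if the lock was acquired, 0 if it was already held. */
static int toy_trylock(toy_spinlock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

/* Serializes every path that wants more than one lock at a time. */
static toy_spinlock global_lock = { ATOMIC_FLAG_INIT };

/* Take all n locks in whatever order the caller supplied them. */
static void lock_all_unordered(toy_spinlock **locks, size_t n)
{
    assert(locks != NULL);
    toy_lock(&global_lock);
    for (size_t i = 0; i < n; i++)
        toy_lock(locks[i]);
    toy_unlock(&global_lock);   /* the n locks stay held */
}

static void unlock_all(toy_spinlock **locks, size_t n)
{
    assert(locks != NULL);
    for (size_t i = 0; i < n; i++)
        toy_unlock(locks[i]);
}
```

The sketch deliberately leaves out the part the thread goes on to argue about: how to avoid taking the same lock twice when several entries share one lock.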

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
Sorry for not having completely answered to this. I initially thought stop_machine could work when you mentioned it, but I don't think it can even removing xpmem block-inside-mmu-notifier-method requirements. For stop_machine to solve this (besides being slower and potentially not more safe as

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 06:39:48PM -0700, Linus Torvalds wrote: On Wed, 7 May 2008, Christoph Lameter wrote: (That said, we're not running out of vm flags yet, and if we were, we could just add another word. We're already wasting that space right now on 64-bit by calling it

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 06:57:05PM -0700, Linus Torvalds wrote: Take five minutes. Take a deep breath. And *think* about actually reading what I wrote. The bitflag *can* prevent taking the same lock twice. It just needs to be in the right place. It's not that I didn't read it, but to do

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 06:12:32PM -0700, Christoph Lameter wrote: Andrea's mm_lock could have wider impact. It is the first effective way that I have seen of temporarily holding off reclaim from an address space. It sure is a brute force approach. The only improvement I can imagine on

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 08:10:33PM -0700, Christoph Lameter wrote: On Thu, 8 May 2008, Andrea Arcangeli wrote: to the sort function to break the loop. After that we remove the 512 vma cap and mm_lock is free to run as long as it wants like /dev/urandom, nobody can care less how long

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
On Wed, May 07, 2008 at 09:14:45PM -0700, Linus Torvalds wrote: IOW, you didn't even look at it, did you? Actually I looked both at the struct and at the slab alignment just in case it was changed recently. Now after reading your mail I also compiled it just in case. 2.6.26-rc1 # name

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-07 Thread Andrea Arcangeli
On Thu, May 08, 2008 at 08:30:20AM +0300, Pekka Enberg wrote: On Thu, May 8, 2008 at 8:27 AM, Pekka Enberg [EMAIL PROTECTED] wrote: You might want to read carefully what Linus wrote: The one that already has a 4 byte padding thing on x86-64 just after the spinlock? And that on 32-bit

Re: [kvm-devel] [PATCH 01 of 11] mmu-notifier-core

2008-05-06 Thread Andrea Arcangeli
On Mon, May 05, 2008 at 02:46:25PM -0500, Jack Steiner wrote: If a task fails to unmap a GRU segment, they still exist at the start of Yes, this will also happen in case the well behaved task receives SIGKILL, so you can test it that way too. exit. On the ->release callout, I set a flag in the

[kvm-devel] mmu notifier v15 -> v16 diff

2008-05-06 Thread Andrea Arcangeli
Hello everyone, This is to allow GRU code to call __mmu_notifier_register inside the mmap_sem (write mode is required as documented in the patch). It also removes the requirement to implement ->release as it's not guaranteed all users will really need it. I didn't integrate the search function

Re: [kvm-devel] [PATCH 01 of 11] mmu-notifier-core

2008-05-05 Thread Andrea Arcangeli
On Mon, May 05, 2008 at 11:21:13AM -0500, Jack Steiner wrote: The GRU does the registration/deregistration of mmu notifiers from mmap/munmap. At this point, the mmap_sem is already held writeable. I hit a deadlock in mm_lock. It'd been better to know about this detail earlier, but frankly

Re: [kvm-devel] [PATCH 01 of 11] mmu-notifier-core

2008-05-05 Thread Andrea Arcangeli
On Mon, May 05, 2008 at 12:25:06PM -0500, Jack Steiner wrote: Agree. My apologies... I should have caught it. No problem. __mmu_notifier_register/__mmu_notifier_unregister seems like a better way to go, although either is ok. If you also like __mmu_notifier_register more I'll go with it. The

Re: [kvm-devel] [PATCH 01 of 11] mmu-notifier-core

2008-05-04 Thread Andrea Arcangeli
On Sun, May 04, 2008 at 02:13:45PM -0500, Robin Holt wrote: diff --git a/mm/Kconfig b/mm/Kconfig --- a/mm/Kconfig +++ b/mm/Kconfig @@ -205,3 +205,6 @@ config VIRT_TO_BUS config VIRT_TO_BUS def_bool y depends on !ARCH_NO_VIRT_TO_BUS + +config MMU_NOTIFIER + bool

[kvm-devel] kvm mmu notifier update

2008-05-03 Thread Andrea Arcangeli
of runtime failure). Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 8d45fab..ce3251c 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -21,6 +21,7 @@ config KVM tristate Kernel-based Virtual Machine (KVM

[kvm-devel] [PATCH 00 of 11] mmu notifier #v15

2008-05-02 Thread Andrea Arcangeli
Hello everyone, 1/11 is the latest version of the mmu-notifier-core patch. As usual all later 2-11/11 patches follow but those aren't meant for 2.6.26. Thanks! Andrea

[kvm-devel] [PATCH 02 of 11] get_task_mm

2008-05-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1209740185 -7200 # Node ID c85c85c4be165eb6de16136bb97cf1fa7fd5c88f # Parent 1489529e7b53d3f2dab8431372aa4850ec821caa get_task_mm get_task_mm should not succeed if mmput() is running and has reduced the mm_users count to zero
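
The rule stated in the description above (do not hand out a new reference once the user count has reached zero) is commonly implemented as an "increment unless zero" operation. The snippet is truncated, so the following is only a generic user-space sketch of that pattern with C11 atomics; `toy_mm`, `toy_get_unless_zero` and `toy_put` are hypothetical names and this is not the patch itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object with a user count. */
struct toy_mm {
    atomic_int users;
};

/* Take a reference only if the count is still above zero. */
static bool toy_get_unless_zero(struct toy_mm *mm)
{
    int old = atomic_load(&mm->users);

    while (old != 0) {
        /* On failure, old is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&mm->users, &old, old + 1))
            return true;
    }
    return false;   /* teardown already started, do not resurrect */
}

/* Drop a reference; returns true if this was the last one. */
static bool toy_put(struct toy_mm *mm)
{
    int prev = atomic_fetch_sub(&mm->users, 1);

    assert(prev > 0);
    return prev == 1;
}
```

A plain unconditional increment would let a caller bump the count from zero back to one while the final put is still tearing the object down, which is the situation the description wants to rule out.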

[kvm-devel] [PATCH 01 of 11] mmu-notifier-core

2008-05-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1209740175 -7200 # Node ID 1489529e7b53d3f2dab8431372aa4850ec821caa # Parent 5026689a3bc323a26d33ad882c34c4c9c9a3ecd8 mmu-notifier-core With KVM/GFP/XPMEM there isn't just the primary CPU MMU pointing to pages

[kvm-devel] [PATCH 05 of 11] unmap vmas tlb flushing

2008-05-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1209740186 -7200 # Node ID a8ac53b928dfcea0ccb326fb7d71f908f0df85f4 # Parent 14e9f5a12bb1657fa6756e18d5dac71d4ad1a55e unmap vmas tlb flushing Move the tlb flushing inside of unmap vmas. This saves us from passing a pointer

[kvm-devel] [PATCH 04 of 11] free-pgtables

2008-05-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1209740185 -7200 # Node ID 14e9f5a12bb1657fa6756e18d5dac71d4ad1a55e # Parent ea8fc9187b6d3ef2742061b4f62598afe55281cf free-pgtables Move the tlb flushing into free_pgtables. The conversion of the locks taken for reverse map

[kvm-devel] [PATCH 06 of 11] rwsem contended

2008-05-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1209740186 -7200 # Node ID 74b873f3ea07012e2fc864f203edf1179865feb1 # Parent a8ac53b928dfcea0ccb326fb7d71f908f0df85f4 rwsem contended Add a function to rw_semaphores to check if there are any processes waiting

[kvm-devel] [PATCH 07 of 11] i_mmap_rwsem

2008-05-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1209740186 -7200 # Node ID de28c85baef11b90c993047ca851a2f52c85a5be # Parent 74b873f3ea07012e2fc864f203edf1179865feb1 i_mmap_rwsem The conversion to a rwsem allows notifier callbacks during rmap traversal for files. A rw style

[kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

2008-05-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1209740186 -7200 # Node ID 0be678c52e540d5f5d5fd9af549b57b9bb018d32 # Parent de28c85baef11b90c993047ca851a2f52c85a5be anon-vma-rwsem Convert the anon_vma spinlock to a rw semaphore. This allows concurrent traversal of reverse

[kvm-devel] [PATCH 09 of 11] mm_lock-rwsem

2008-05-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1209740226 -7200 # Node ID 721c3787cd42043734331e54a42eb20c51766f71 # Parent 0be678c52e540d5f5d5fd9af549b57b9bb018d32 mm_lock-rwsem Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock conversion. Signed-off

[kvm-devel] [PATCH 10 of 11] export zap_page_range for XPMEM

2008-05-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1209740229 -7200 # Node ID 4f462fb3dff614cd7d971219c3feaef0b43359c1 # Parent 721c3787cd42043734331e54a42eb20c51766f71 export zap_page_range for XPMEM XPMEM would have used sys_madvise() except that madvise_dontneed() returns

[kvm-devel] [PATCH 11 of 11] mmap sems

2008-05-02 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1209740229 -7200 # Node ID b4bf6df98bc00bfbef9423b0dd31cfdba63a5eeb # Parent 4f462fb3dff614cd7d971219c3feaef0b43359c1 mmap sems This patch adds a lock ordering rule to avoid a potential deadlock when multiple mmap_sems need

Re: [kvm-devel] fx_init schedule in atomic

2008-05-02 Thread Andrea Arcangeli
On Fri, May 02, 2008 at 12:28:32PM +0300, Avi Kivity wrote: Applied, thanks. Dynamic allocation for the fpu state was introduced in 2.6.26-rc, right? It seems very recent, hit mainline on 30 Apr. Also we may want to think if there's something cheaper than fx_save to trigger a math exception

[kvm-devel] mmu notifier-core v14->v15 diff for review

2008-05-01 Thread Andrea Arcangeli
Hello everyone, this is the v14 to v15 difference to the mmu-notifier-core patch. This is just for review of the difference, I'll post full v15 soon, please review the diff in the meantime. Lots of those cleanups are thanks to Andrew review on mmu-notifier-core in v14. He also spotted the

Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)

2008-04-30 Thread Andrea Arcangeli
On Wed, Apr 30, 2008 at 11:59:47AM +0300, Avi Kivity wrote: The code is not trying to find a vma for the address, but a vma for the address which also has VM_PFNMAP set. The cases for vma not found, or vma found, but not VM_PFNMAP, are folded together. Muli's saying the comparison is

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-29 Thread Andrea Arcangeli
Hi Hugh!! On Tue, Apr 29, 2008 at 11:49:11AM +0100, Hugh Dickins wrote: [I'm scarcely following the mmu notifiers to-and-fro, which seems to be in good hands, amongst faster thinkers than me: who actually need and can test this stuff. Don't let me slow you down; but I can quickly clarify on

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-29 Thread Andrea Arcangeli
On Mon, Apr 28, 2008 at 06:28:06PM -0700, Christoph Lameter wrote: On Tue, 29 Apr 2008, Andrea Arcangeli wrote: Frankly I've absolutely no idea why rcu is needed in all rmap code when walking the page->mapping. Definitely the PG_locked is taken so there's no way page->mapping could possibly

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-29 Thread Andrea Arcangeli
On Tue, Apr 29, 2008 at 10:50:30AM -0500, Robin Holt wrote: You have said this continually about a CONFIG option. I am unsure how that could be achieved. Could you provide a patch? I'm busy with the reserved ram patch against 2.6.25 and latest kvm.git that is moving from pages to pfn for pci

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-28 Thread Andrea Arcangeli
On Mon, Apr 28, 2008 at 01:34:11PM -0700, Christoph Lameter wrote: On Sun, 27 Apr 2008, Andrea Arcangeli wrote: Talking about post 2.6.26: the refcount with rcu in the anon-vma conversion seems unnecessary and may explain part of the AIM slowdown too. The rest looks ok and probably we

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-27 Thread Andrea Arcangeli
On Sat, Apr 26, 2008 at 08:17:34AM -0500, Robin Holt wrote: the first four sets. The fifth is the oversubscription test which trips my xpmem bug. This is as good as the v12 runs from before. Now that mmu-notifier-core #v14 seems finished and hopefully will appear in 2.6.26 ;), I started

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-26 Thread Andrea Arcangeli
On Sat, Apr 26, 2008 at 08:17:34AM -0500, Robin Holt wrote: Since this include and the one for mm_types.h both are build breakages for ia64, I think you need to apply your ia64_cpumask and the following (possibly as a single patch) first or in your patch 1. Without that, ia64 doing a

[kvm-devel] fix external module compile

2008-04-26 Thread Andrea Arcangeli
-compat in the same place with the other includes where `pwd` works instead of $(src) that doesn't work anymore for whatever reason. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/kernel/Kbuild b/kernel/Kbuild index cabfc75..d9245eb 100644 --- a/kernel/Kbuild +++ b/kernel/Kbuild @@ -1,4

Re: [kvm-devel] mmu notifier #v14

2008-04-26 Thread Andrea Arcangeli
On Sat, Apr 26, 2008 at 01:59:23PM -0500, Anthony Liguori wrote: +static void kvm_unmap_spte(struct kvm *kvm, u64 *spte) +{ +struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT); +get_page(page); You should not assume a struct page exists for any given

Re: [kvm-devel] mmu notifier #v14

2008-04-26 Thread Andrea Arcangeli
that passes my swap test (the only missing thing is the out_lock cleanup). Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 8d45fab..ce3251c 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -21,6 +21,7 @@ config KVM

Re: [kvm-devel] [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen

2008-04-25 Thread Andrea Arcangeli
I somehow missed this email in my inbox, found it now because it was strangely still unread... Sorry for the late reply! On Tue, Apr 22, 2008 at 03:06:24PM +1000, Rusty Russell wrote: On Wednesday 09 April 2008 01:44:04 Andrea Arcangeli wrote: --- a/include/linux/mm.h +++ b/include

Re: [kvm-devel] [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen

2008-04-25 Thread Andrea Arcangeli
On Fri, Apr 25, 2008 at 06:56:39PM +0200, Andrea Arcangeli wrote: + data->i_mmap_locks = vmalloc(nr_i_mmap_locks * + sizeof(spinlock_t)); This is why non-typesafe allocators suck. You want 'sizeof(spinlock_t *)' here. + data
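
The review comment quoted above concerns sizing an array of lock pointers by the size of the lock instead of the size of a pointer. A common way to avoid that class of mistake is to size the allocation from the destination pointer itself. The sketch below is a user-space illustration only; `toy_lock_t`, `alloc_lock_ptrs` and `lock_ptr_array_bytes` are hypothetical names, and the 4-byte lock is an assumption chosen so that the two sizes differ on a 64-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical 4-byte lock; smaller than a pointer on 64-bit builds. */
typedef struct {
    unsigned int slock;
} toy_lock_t;

/* Bytes needed for an array of n pointers to locks. */
static size_t lock_ptr_array_bytes(size_t n)
{
    return n * sizeof(toy_lock_t *);
}

/*
 * Allocate an array of n lock pointers. sizeof(*locks) follows the
 * declared element type, so it stays right if that type changes.
 * The caller releases the array with free().
 */
static toy_lock_t **alloc_lock_ptrs(size_t n)
{
    toy_lock_t **locks;

    assert(n > 0);
    locks = calloc(n, sizeof(*locks));
    return locks;   /* NULL on allocation failure */
}
```

With `n * sizeof(toy_lock_t)` the array would be too small wherever a pointer is wider than the lock, and writes to the later slots would run past the end of the allocation.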

Re: [kvm-devel] [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen

2008-04-25 Thread Andrea Arcangeli
On Fri, Apr 25, 2008 at 02:25:32PM -0500, Robin Holt wrote: I think you still need mm_lock (unless I miss something). What happens when one callout is scanning mmu_notifier_invalidate_range_start() and you unlink. That list next pointer with LIST_POISON1 which is a really bad address for the

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-24 Thread Andrea Arcangeli
On Thu, Apr 24, 2008 at 12:19:28AM +0200, Andrea Arcangeli wrote: /dev/kvm closure. Given this can be considered a bugfix to mmu_notifier_unregister I'll apply it to 1/N and I'll release a new I'm not sure anymore this can be considered a bugfix given how large change this resulted

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-24 Thread Andrea Arcangeli
the mmap_sem in addition or in replacement of the unregister_lock. The srcu_read_lock can also likely be moved just before releasing the unregister_lock but that's just a minor optimization to make the code more strict. On Thu, Apr 24, 2008 at 08:49:40AM +0200, Andrea Arcangeli wrote: ... diff --git a/mm

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-24 Thread Andrea Arcangeli
On Thu, Apr 24, 2008 at 05:39:43PM +0200, Andrea Arcangeli wrote: There's at least one small issue I noticed so far, that while _release doesn't need to care about _register, _unregister definitely needs to care about _register. I've to take the mmap_sem in addition or in In the end the best

Re: [kvm-devel] [PATCH 00 of 12] mmu notifier #v13

2008-04-23 Thread Andrea Arcangeli
On Tue, Apr 22, 2008 at 01:30:53PM -0700, Christoph Lameter wrote: One solution would be to separate the invalidate_page() callout into a patch at the very end that can be omitted. AFAICT there is no compelling reason to have this callback and it complicates the API for the device driver

Re: [kvm-devel] [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last

2008-04-23 Thread Andrea Arcangeli
On Tue, Apr 22, 2008 at 04:14:26PM -0700, Christoph Lameter wrote: We want a full solution and this kind of patching makes the patches difficult to review because later patches revert earlier ones. I know you rather want to see KVM development stalled for more months than to get a partial

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-23 Thread Andrea Arcangeli
On Tue, Apr 22, 2008 at 06:07:27PM -0500, Robin Holt wrote: The only other change I did has been to move mmu_notifier_unregister at the end of the patchset after getting more questions about its reliability and I documented a bit the rmmod requirements for ->release. We'll think later if it

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-23 Thread Andrea Arcangeli
later in a backwards compatible way (plus we're perfectly fine with the API having not backwards compatible changes as long as 2.6.26 can work for us). - Implement unregister but it's not reliable, only ->release is reliable. Signed-off-by: Andrea Arcangeli [EMAIL

Re: [kvm-devel] [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last

2008-04-23 Thread Andrea Arcangeli
On Wed, Apr 23, 2008 at 10:45:36AM -0500, Robin Holt wrote: XPMEM has passed all regression tests using your version 12 notifiers. That's great news, thanks! I'd greatly appreciate if you could test #v13 too as I posted it. It already passed GRU and KVM regressions tests and it should work fine

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-23 Thread Andrea Arcangeli
On Tue, Apr 22, 2008 at 04:20:35PM -0700, Christoph Lameter wrote: I guess I have to prepare another patchset then? If you want to embarrass yourself three times in a row go ahead ;). I thought two failed takeovers were enough.

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-23 Thread Andrea Arcangeli
On Tue, Apr 22, 2008 at 07:28:49PM -0500, Jack Steiner wrote: The GRU driver unregisters the notifier when all GRU mappings are unmapped. I could make it work either way - either with or without an unregister function. However, unregister is the most logical action to take when all mappings

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-23 Thread Andrea Arcangeli
On Wed, Apr 23, 2008 at 06:26:29PM +0200, Andrea Arcangeli wrote: On Tue, Apr 22, 2008 at 04:20:35PM -0700, Christoph Lameter wrote: I guess I have to prepare another patchset then? Apologies for my previous not too polite comment in answer to the above, but I thought this double patchset

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-23 Thread Andrea Arcangeli
On Wed, Apr 23, 2008 at 12:09:09PM -0500, Jack Steiner wrote: You may have spotted this already. If so, just ignore this. It looks like there is a bug in copy_page_range() around line 667. It's possible to do a mmu_notifier_invalidate_range_start(), then return -ENOMEM w/o doing a

Re: [kvm-devel] [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last

2008-04-23 Thread Andrea Arcangeli
On Wed, Apr 23, 2008 at 11:02:18AM -0700, Christoph Lameter wrote: We have had this workaround effort done years ago and have been suffering the ill effects of pinning for years. Had to deal with Yes. In addition to the pinning, there's lot of additional tlb flushing work to do in kvm without

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-23 Thread Andrea Arcangeli
On Wed, Apr 23, 2008 at 11:09:35AM -0700, Christoph Lameter wrote: Why is there still the hlist stuff being used for the mmu notifier list? And why is this still unsafe? What's the problem with hlist, it saves 8 bytes for each mm_struct, you should be using it too instead of list. There are

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-23 Thread Andrea Arcangeli
On Wed, Apr 23, 2008 at 11:19:26AM -0700, Christoph Lameter wrote: If unregister fails then the driver should not detach from the address space immediately but wait until ->release is called. That may be a possible solution. It will be rare that the unregister fails. This is the current idea,

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-23 Thread Andrea Arcangeli
On Wed, Apr 23, 2008 at 11:21:49AM -0700, Christoph Lameter wrote: No I really want you to do this. I have no interest in a takeover in the Ok if you want me to do this, I definitely prefer the core to go in now. It's so much easier to concentrate on two problems at different times than to

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-23 Thread Andrea Arcangeli
On Wed, Apr 23, 2008 at 11:27:21AM -0700, Christoph Lameter wrote: There is a potential issue in move_ptes where you call invalidate_range_end after dropping i_mmap_sem whereas my patches did the opposite. Mmap_sem saves you there? Yes, there's really no risk of races in this area after

Re: [kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-23 Thread Andrea Arcangeli
On Wed, Apr 23, 2008 at 06:37:13PM +0200, Andrea Arcangeli wrote: I'm afraid if you don't want to worst-case unregister with ->release you need to have a better idea than my mm_lock and personally I can't see any other way than mm_lock to ensure not to miss range_begin... But wait

Re: [kvm-devel] [PATCH 0 of 9] mmu notifier #v12

2008-04-22 Thread Andrea Arcangeli
On Tue, Apr 22, 2008 at 09:20:26AM +0200, Andrea Arcangeli wrote: invalidate_range_start { spin_lock(kvm->mmu_lock); kvm->invalidate_range_count++; rmap-invalidate of sptes in range write_seqlock; write_sequnlock; spin_unlock(kvm->mmu_lock

Re: [kvm-devel] [PATCH 0 of 9] mmu notifier #v12

2008-04-22 Thread Andrea Arcangeli
On Tue, Apr 22, 2008 at 08:01:20AM -0500, Robin Holt wrote: On Tue, Apr 22, 2008 at 02:00:56PM +0200, Andrea Arcangeli wrote: On Tue, Apr 22, 2008 at 09:20:26AM +0200, Andrea Arcangeli wrote: invalidate_range_start { spin_lock(kvm->mmu_lock); kvm->invalidate_range_count

Re: [kvm-devel] [PATCH 0 of 9] mmu notifier #v12

2008-04-22 Thread Andrea Arcangeli
On Tue, Apr 22, 2008 at 08:36:04AM -0500, Robin Holt wrote: I am a little confused about the value of the seq_lock versus a simple atomic, but I assumed there is a reason and left it at that. There's no value for anything but get_user_pages (get_user_pages takes its own lock internally though).

[kvm-devel] [PATCH 01 of 12] Core of mmu notifiers

2008-04-22 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1208870142 -7200 # Node ID ea87c15371b1bd49380c40c3f15f1c7ca4438af5 # Parent fb3bc9942fb78629d096bd07564f435d51d86e5f Core of mmu notifiers. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] Signed-off-by: Nick Piggin [EMAIL

[kvm-devel] [PATCH 02 of 12] Fix ia64 compilation failure because of common code include bug

2008-04-22 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1208872186 -7200 # Node ID 3c804dca25b15017b22008647783d6f5f3801fa9 # Parent ea87c15371b1bd49380c40c3f15f1c7ca4438af5 Fix ia64 compilation failure because of common code include bug. Signed-off-by: Andrea Arcangeli [EMAIL

[kvm-devel] [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last

2008-04-22 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1208872186 -7200 # Node ID ac9bb1fb3de2aa5d27210a28edf24f6577094076 # Parent a6672bdeead0d41b2ebd6846f731d43a611645b7 Moves all mmu notifier methods outside the PT lock (first and not last step to make them sleep capable

[kvm-devel] [PATCH 10 of 12] Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock

2008-04-22 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1208872187 -7200 # Node ID f8210c45f1c6f8b38d15e5dfebbc5f7c1f890c93 # Parent bdb3d928a0ba91cdce2b61bd40a2f80bddbe4ff2 Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock conversion. Signed-off-by: Andrea

[kvm-devel] [PATCH 05 of 12] Move the tlb flushing into free_pgtables. The conversion of the locks

2008-04-22 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1208872186 -7200 # Node ID ee8c0644d5f67c1ef59142cce91b0bb6f34a53e0 # Parent ac9bb1fb3de2aa5d27210a28edf24f6577094076 Move the tlb flushing into free_pgtables. The conversion of the locks taken for reverse map scanning would

[kvm-devel] [PATCH 12 of 12] This patch adds a lock ordering rule to avoid a potential deadlock when

2008-04-22 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1208872187 -7200 # Node ID e847039ee2e815088661933b7195584847dc7540 # Parent 128d705f38c8a774ac11559db445787ce6e91c77 This patch adds a lock ordering rule to avoid a potential deadlock when multiple mmap_sems need to be locked

[kvm-devel] [PATCH 11 of 12] XPMEM would have used sys_madvise() except that madvise_dontneed()

2008-04-22 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1208872187 -7200 # Node ID 128d705f38c8a774ac11559db445787ce6e91c77 # Parent f8210c45f1c6f8b38d15e5dfebbc5f7c1f890c93 XPMEM would have used sys_madvise() except that madvise_dontneed() returns an -EINVAL if VM_PFNMAP is set

[kvm-devel] [PATCH 06 of 12] Move the tlb flushing inside of unmap vmas. This saves us from passing

2008-04-22 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1208872186 -7200 # Node ID fbce3fecb033eb3fba1d9c2398ac74401ce0ecb5 # Parent ee8c0644d5f67c1ef59142cce91b0bb6f34a53e0 Move the tlb flushing inside of unmap vmas. This saves us from passing a pointer to the TLB structure around

[kvm-devel] [PATCH 00 of 12] mmu notifier #v13

2008-04-22 Thread Andrea Arcangeli
Hello, This is the latest and greatest version of the mmu notifier patch #v13. Changes are mainly in the mm_lock that uses sort() suggested by Christoph. This reduces the complexity from O(N**2) to O(N*log(N)). I folded the mm_lock functionality together with the mmu-notifier-core 1/12 patch to
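
The O(N**2) to O(N*log(N)) change can be pictured with a user-space sketch. It is an illustration under stated assumptions, not the patch's code: it assumes the quadratic cost came from checking each lock against the others so that a lock shared by several vmas is taken only once, and it uses the C library's qsort() in place of the kernel's sort(). The names cmp_ptr and sort_and_dedup_locks are hypothetical. Sorting the lock addresses puts duplicates next to each other, so one linear pass removes them, and the result is also a fixed address order in which to take the locks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Compare two array slots holding lock pointers by address. */
static int cmp_ptr(const void *a, const void *b)
{
	uintptr_t x = (uintptr_t)*(void * const *)a;
	uintptr_t y = (uintptr_t)*(void * const *)b;

	return (x > y) - (x < y);
}

/* Hypothetical helper: sort lock pointers by address, then drop
 * duplicates in place. Returns the number of distinct locks, which
 * are left in ascending address order at the front of the array. */
size_t sort_and_dedup_locks(void **locks, size_t n)
{
	size_t i, out = 0;

	if (n == 0)
		return 0;

	qsort(locks, n, sizeof(*locks), cmp_ptr);

	for (i = 0; i < n; i++) {
		/* After sorting, equal pointers are adjacent. */
		if (out == 0 || locks[out - 1] != locks[i])
			locks[out++] = locks[i];
	}
	return out;
}
```

A caller would then acquire locks[0] through locks[count - 1] in order and release them in reverse; the locking calls themselves are left out of this sketch.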

[kvm-devel] [PATCH 03 of 12] get_task_mm should not succeed if mmput() is running and has reduced

2008-04-22 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1208872186 -7200 # Node ID a6672bdeead0d41b2ebd6846f731d43a611645b7 # Parent 3c804dca25b15017b22008647783d6f5f3801fa9 get_task_mm should not succeed if mmput() is running and has reduced the mm_users count to zero. This can

[kvm-devel] [PATCH 09 of 12] Convert the anon_vma spinlock to a rw semaphore. This allows concurrent

2008-04-22 Thread Andrea Arcangeli
# HG changeset patch # User Andrea Arcangeli [EMAIL PROTECTED] # Date 1208872187 -7200 # Node ID bdb3d928a0ba91cdce2b61bd40a2f80bddbe4ff2 # Parent 6e04df1f4284689b1c46e57a67559abe49ecf292 Convert the anon_vma spinlock to a rw semaphore. This allows concurrent traversal of reverse maps
