Re: [RFC 0/2] VFIO SRIOV support
On 24/12/2015 15:51, Alex Williamson wrote:
> No. A privileged entity needs to grant a user ownership of a group and
> sufficient locked memory limits to make it useful, but then use of the
> group does not require root permission.

So we're thinking about how we can force the VFs in these cases to be in
the same IOMMU group as the PF, and make sure it is vfio-pci that probes
them. We thought about the following:

- Add a flag to pci_dev->dev_flags on the PF that says the PF's VFs must
  be in the same IOMMU group as it.
- Modify iommu_group_get_for_pci_dev() so that it returns the PF's group
  for VFs whose PF has that flag set.
- In vfio_group_nb_add_dev(), set driver_override to "vfio-pci" for PCI
  devices that are added to a live group. That would prevent the host
  from probing these devices with the default driver.

What do you think?

Regards,
Haggai and Ilya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: set_pte_at_notify regression
On 12/01/2014 19:50, Andrea Arcangeli wrote:
> On Sun, Jan 12, 2014 at 01:56:12PM +0200, Haggai Eran wrote:
>> Hi,
>>
>> On 10/01/2014 18:57, Andrea Arcangeli wrote:
>>> Hi!
>>>
>>> On Fri, Jan 10, 2014 at 06:38:06PM +0200, Izik Eidus wrote:
>>>> It looks like commit 6bdb913f0a70a4dfb7f066fb15e2d6f960701d00 breaks
>>>> the semantics of set_pte_at_notify. Calling first
>>>> mmu_notifier_invalidate_range_start, then set_pte_at_notify, and
>>>> then mmu_notifier_invalidate_range_end not only increases the number
>>>> of locks kvm has to take and release by a factor of 3, but in
>>>> addition mmu_notifier_invalidate_range_start zaps the pte entry from
>>>> kvm, so when set_pte_at_notify gets called it doesn't have any spte
>>>> to set and is actually called for nothing. The result is an increase
>>>> in vmexits for kvm from both do_wp_page and replace_page, and broken
>>>> semantics of set_pte_at_notify.
>>>
>>> Agreed. I would suggest changing set_pte_at_notify to return whether
>>> change_pte was missing in some mmu notifier attached to this mm, so
>>> we can do something like:
>>>
>>>	ptep = page_check_address(page, mm, addr, ptl, 0);
>>>	[..]
>>>	notify_missing = false;
>>>	if (...) {
>>>		entry = ptep_clear_flush(...);
>>>		[..]
>>>		notify_missing = set_pte_at_notify(mm, addr, ptep, entry);
>>>	}
>>>	pte_unmap_unlock(ptep, ptl);
>>>	if (notify_missing)
>>>		mmu_notifier_invalidate_page_if_missing_change_pte(mm, addr);
>>>
>>> and drop the range calls. This will provide sleepability and at the
>>> same time it won't screw the ability of change_pte to update sptes
>>> (by leaving those established by the time change_pte runs).
>>
>> I think it would be better for notifiers that do not support
>> change_pte to keep getting both range_start and range_end notifiers.
>> Otherwise, the invalidate_page notifier might end up marking the old
>> page as dirty after it was already replaced in the primary page table.
>
> Ok but why would that be a problem?
> If the secondary pagetable mapping is found dirty, the old page shall
> be marked dirty, as it means it was modified through the secondary mmu
> and its on-disk version may need to be updated before discarding the
> in-ram copy. What would the difference be between marking the page
> dirty in range_start, while the primary page table is still
> established, or after?
>
> ...
>
> But in places like ksm merging and do_wp_page we hold a page reference
> before we start the primary pagetable update, until after the mmu
> notifier invalidate.

Right. I missed that page locking.

Another possible issue is with reads from the secondary page table.
Given a read-only page, suppose one host thread writes to that page and
performs COW, but before it calls the
mmu_notifier_invalidate_page_if_missing_change_pte function another host
thread writes to the same page (this time without a page fault). Then we
have a valid entry in the secondary page table pointing to a stale page,
and someone may read stale data from there. Do you agree?

Thanks,
Haggai
Re: set_pte_at_notify regression
Hi,

On 10/01/2014 18:57, Andrea Arcangeli wrote:
> Hi!
>
> On Fri, Jan 10, 2014 at 06:38:06PM +0200, Izik Eidus wrote:
>> It looks like commit 6bdb913f0a70a4dfb7f066fb15e2d6f960701d00 breaks
>> the semantics of set_pte_at_notify. Calling first
>> mmu_notifier_invalidate_range_start, then set_pte_at_notify, and then
>> mmu_notifier_invalidate_range_end not only increases the number of
>> locks kvm has to take and release by a factor of 3, but in addition
>> mmu_notifier_invalidate_range_start zaps the pte entry from kvm, so
>> when set_pte_at_notify gets called it doesn't have any spte to set and
>> is actually called for nothing. The result is an increase in vmexits
>> for kvm from both do_wp_page and replace_page, and broken semantics of
>> set_pte_at_notify.
>
> Agreed. I would suggest changing set_pte_at_notify to return whether
> change_pte was missing in some mmu notifier attached to this mm, so we
> can do something like:
>
>	ptep = page_check_address(page, mm, addr, ptl, 0);
>	[..]
>	notify_missing = false;
>	if (...) {
>		entry = ptep_clear_flush(...);
>		[..]
>		notify_missing = set_pte_at_notify(mm, addr, ptep, entry);
>	}
>	pte_unmap_unlock(ptep, ptl);
>	if (notify_missing)
>		mmu_notifier_invalidate_page_if_missing_change_pte(mm, addr);
>
> and drop the range calls. This will provide sleepability and at the
> same time it won't screw the ability of change_pte to update sptes (by
> leaving those established by the time change_pte runs).

I think it would be better for notifiers that do not support change_pte
to keep getting both range_start and range_end notifiers. Otherwise, the
invalidate_page notifier might end up marking the old page as dirty
after it was already replaced in the primary page table.

Perhaps we can have a flag in the mmu_notifier, similar to the
notify_missing return value here, that determines in these cases whether
to call the invalidate_range_start/end pair, or just set_pte_at_notify.
Thanks,
Haggai
[PATCH] mm: call invalidate_range_end in do_wp_page even for zero pages
The previous patch, "mm: wrap calls to set_pte_at_notify with
invalidate_range_start and invalidate_range_end", only called the
invalidate_range_end mmu notifier function in do_wp_page when the
new_page variable wasn't NULL. This was done in order to only call
invalidate_range_end after invalidate_range_start was called.
Unfortunately, there are situations where new_page is NULL and
invalidate_range_start is called. This caused invalidate_range_start to
be called without a matching invalidate_range_end, causing kvm to loop
indefinitely on the first page fault.

This patch adds a flag variable to do_wp_page that marks whether the
invalidate_range_start notifier was called. invalidate_range_end is then
called if the flag is true.

Reported-by: Jiri Slaby <jsl...@suse.cz>
Cc: Avi Kivity <a...@redhat.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Signed-off-by: Haggai Eran <hagg...@mellanox.com>
---
I tested this patch against yesterday's linux-next (next-20120918), and
it seems to solve the problem with kvm. I used the same command line you
reported:

qemu-kvm -k en-us -usbdevice tablet -balloon virtio -hda IMAGE -smp 2 \
	-m 1000M -net user -net nic,model=e1000 -usb -serial pty

I was hoping you could also test it yourself, and see that it also works
for you, if you don't mind.
 mm/memory.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 1a92d87..76ec199 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2529,6 +2529,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct page *dirty_page = NULL;
 	unsigned long mmun_start;	/* For mmu_notifiers */
 	unsigned long mmun_end;		/* For mmu_notifiers */
+	bool mmun_called = false;	/* For mmu_notifiers */
 
 	old_page = vm_normal_page(vma, address, orig_pte);
 	if (!old_page) {
@@ -2706,8 +2707,9 @@ gotten:
 	if (mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))
 		goto oom_free_new;
 
-	mmun_start = address & PAGE_MASK;
-	mmun_end   = (address & PAGE_MASK) + PAGE_SIZE;
+	mmun_start  = address & PAGE_MASK;
+	mmun_end    = (address & PAGE_MASK) + PAGE_SIZE;
+	mmun_called = true;
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 
 	/*
@@ -2776,8 +2778,7 @@ gotten:
 		page_cache_release(new_page);
 unlock:
 	pte_unmap_unlock(page_table, ptl);
-	if (new_page)
-		/* Only call the end notifier if the begin was called. */
+	if (mmun_called)
 		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 	if (old_page) {
 		/*
-- 
1.7.11.2