Re: [RFC 0/2] VFIO SRIOV support

2016-01-05 Thread Haggai Eran
On 24/12/2015 15:51, Alex Williamson wrote:
> No.  A privileged entity needs to grant a user ownership of a group and
> sufficient locked memory limits to make it useful, but then use of the
> group does not require root permission.

So we're thinking about how we can force the VFs in these cases to be in the same
IOMMU group as the PF, and how to make sure it is vfio-pci that probes them. We
thought about the following:

We could add a flag to pci_dev->dev_flags on the PF that says the PF's
VFs must be in the same IOMMU group as it. Then modify
iommu_group_get_for_pci_dev() so that it returns the PF's group for VFs
whose PF has that flag set.

In the vfio_group_nb_add_dev() function, set driver_override to "vfio-pci" for
PCI devices that are added to a live group. That would prevent the host from
probing these devices with the default driver.

What do you think?

Regards,
Haggai and Ilya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: set_pte_at_notify regression

2014-01-15 Thread Haggai Eran
On 12/01/2014 19:50, Andrea Arcangeli wrote:
> On Sun, Jan 12, 2014 at 01:56:12PM +0200, Haggai Eran wrote:
>> Hi,
>>
>> On 10/01/2014 18:57, Andrea Arcangeli wrote:
>>> Hi!
>>>
>>> On Fri, Jan 10, 2014 at 06:38:06PM +0200, Izik Eidus wrote:
>>>> It looks like commit 6bdb913f0a70a4dfb7f066fb15e2d6f960701d00 breaks the
>>>> semantics of set_pte_at_notify.
>>>> Calling mmu_notifier_invalidate_range_start first, then
>>>> set_pte_at_notify, and then mmu_notifier_invalidate_range_end
>>>> not only increases the number of locks kvm has to take and release by a
>>>> factor of 3; in addition, mmu_notifier_invalidate_range_start zaps
>>>> the pte entry from kvm, so when set_pte_at_notify gets called it doesn't
>>>> have any spte to set and is actually called for nothing. The result is
>>>> more vmexits for kvm from both do_wp_page and replace_page, and
>>>> broken semantics of set_pte_at_notify.
>>>
>>> Agreed.
>>>
>>> I would suggest changing set_pte_at_notify to return whether change_pte
>>> was missing in some mmu notifier attached to this mm, so we can do
>>> something like:
>>>
>>> ptep = page_check_address(page, mm, addr, ptl, 0);
>>> [..]
>>> notify_missing = false;
>>> if (... ) {
>>> 	entry = ptep_clear_flush(...);
>>> 	[..]
>>> 	notify_missing = set_pte_at_notify(mm, addr, ptep, entry);
>>> }
>>> pte_unmap_unlock(ptep, ptl);
>>> if (notify_missing)
>>> 	mmu_notifier_invalidate_page_if_missing_change_pte(mm, addr);
>>>
>>> and drop the range calls. This will provide sleepability and at the
>>> same time it won't screw the ability of change_pte to update sptes (by
>>> leaving those established by the time change_pte runs).
>>
>> I think it would be better for notifiers that do not support change_pte
>> to keep getting both range_start and range_end notifiers. Otherwise, the
>> invalidate_page notifier might end up marking the old page as dirty
>> after it was already replaced in the primary page table.
>
> Ok, but why would that be a problem? If the secondary pagetable mapping
> is found dirty, the old page shall be marked dirty, as it means it was
> modified through the secondary mmu and its on-disk version may need to
> be updated before discarding the in-ram copy. What difference would it
> make to mark the page dirty in range_start, while the primary page
> table is still established, or after?
>
> ...
>
> But in places like ksm merging and do_wp_page we hold a page reference
> before we start the primary pagetable update, until after the mmu
> notifier invalidate.

Right. I missed that page locking.

Another possible issue is with reads from the secondary page table.
Given a read-only page, suppose one host thread writes to that page, and
performs COW, but before it calls the
mmu_notifier_invalidate_page_if_missing_change_pte function another host
thread writes to the same page (this time without a page fault). Then we
have a valid entry in the secondary page table to a stale page, and
someone may read stale data from there.

Do you agree?

Thanks,
Haggai


Re: set_pte_at_notify regression

2014-01-12 Thread Haggai Eran
Hi,

On 10/01/2014 18:57, Andrea Arcangeli wrote:
> Hi!
>
> On Fri, Jan 10, 2014 at 06:38:06PM +0200, Izik Eidus wrote:
>> It looks like commit 6bdb913f0a70a4dfb7f066fb15e2d6f960701d00 breaks the
>> semantics of set_pte_at_notify.
>> Calling mmu_notifier_invalidate_range_start first, then
>> set_pte_at_notify, and then mmu_notifier_invalidate_range_end
>> not only increases the number of locks kvm has to take and release by a
>> factor of 3; in addition, mmu_notifier_invalidate_range_start zaps
>> the pte entry from kvm, so when set_pte_at_notify gets called it doesn't
>> have any spte to set and is actually called for nothing. The result is
>> more vmexits for kvm from both do_wp_page and replace_page, and
>> broken semantics of set_pte_at_notify.
>
> Agreed.
>
> I would suggest changing set_pte_at_notify to return whether change_pte
> was missing in some mmu notifier attached to this mm, so we can do
> something like:
>
> ptep = page_check_address(page, mm, addr, ptl, 0);
> [..]
> notify_missing = false;
> if (... ) {
> 	entry = ptep_clear_flush(...);
> 	[..]
> 	notify_missing = set_pte_at_notify(mm, addr, ptep, entry);
> }
> pte_unmap_unlock(ptep, ptl);
> if (notify_missing)
> 	mmu_notifier_invalidate_page_if_missing_change_pte(mm, addr);
>
> and drop the range calls. This will provide sleepability and at the
> same time it won't screw the ability of change_pte to update sptes (by
> leaving those established by the time change_pte runs).

I think it would be better for notifiers that do not support change_pte
to keep getting both range_start and range_end notifiers. Otherwise, the
invalidate_page notifier might end up marking the old page as dirty
after it was already replaced in the primary page table.

Perhaps we can have a flag in the mmu_notifier, similar to the
notify_missing returned value here, that determines in these cases
whether to call the invalidate_range_start/end pair, or just the
set_pte_at_notify.

Thanks,
Haggai


[PATCH] mm: call invalidate_range_end in do_wp_page even for zero pages

2012-09-19 Thread Haggai Eran
The previous patch, "mm: wrap calls to set_pte_at_notify with
invalidate_range_start and invalidate_range_end", only called the
invalidate_range_end mmu notifier function in do_wp_page when the new_page
variable wasn't NULL. This was done in order to only call invalidate_range_end
after invalidate_range_start had been called. Unfortunately, there are situations
where new_page is NULL and invalidate_range_start is still called. This caused
invalidate_range_start to be called without a matching invalidate_range_end,
causing kvm to loop indefinitely on the first page fault.

This patch adds a flag variable to do_wp_page that marks whether the
invalidate_range_start notifier was called. invalidate_range_end is then
called if the flag is true.

Reported-by: Jiri Slaby jsl...@suse.cz
Cc: Avi Kivity a...@redhat.com
Cc: Andrew Morton a...@linux-foundation.org
Signed-off-by: Haggai Eran hagg...@mellanox.com
---
I tested this patch against yesterday's linux-next (next-20120918), and it
seems to solve the problem with kvm. I used the same command line you reported:

  qemu-kvm -k en-us -usbdevice tablet -balloon virtio -hda IMAGE -smp 2 \
  -m 1000M -net user -net nic,model=e1000 -usb -serial pty

I was hoping you could also test it yourself, and see that it also works for
you, if you don't mind.

 mm/memory.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 1a92d87..76ec199 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2529,6 +2529,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page *dirty_page = NULL;
unsigned long mmun_start;   /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */
+   bool mmun_called = false;   /* For mmu_notifiers */
 
old_page = vm_normal_page(vma, address, orig_pte);
if (!old_page) {
@@ -2706,8 +2707,9 @@ gotten:
if (mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))
goto oom_free_new;
 
-   mmun_start = address & PAGE_MASK;
-   mmun_end   = (address & PAGE_MASK) + PAGE_SIZE;
+   mmun_start  = address & PAGE_MASK;
+   mmun_end    = (address & PAGE_MASK) + PAGE_SIZE;
+   mmun_called = true;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 
/*
@@ -2776,8 +2778,7 @@ gotten:
page_cache_release(new_page);
 unlock:
pte_unmap_unlock(page_table, ptl);
-   if (new_page)
-   /* Only call the end notifier if the begin was called. */
+   if (mmun_called)
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
if (old_page) {
/*
-- 
1.7.11.2
