Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
On Tue, Apr 29, 2008 at 02:09:20PM -0500, Anthony Liguori wrote: This patch allows VMA's that contain no backing page to be used for guest memory. This is a drop-in replacement for Ben-Ami's first page in his direct mmio series. Here, we continue to allow mmio pages to be represented in the rmap. Since v1, I've taken into account Andrea's suggestions at using VM_PFNMAP instead of VM_IO and changed the BUG_ON to a return of bad_page. Signed-off-by: Anthony Liguori [EMAIL PROTECTED] diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 1d7991a..64e5efe 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -532,6 +532,7 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) struct page *page[1]; unsigned long addr; int npages; + pfn_t pfn; might_sleep(); @@ -544,19 +545,35 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) npages = get_user_pages(current, current-mm, addr, 1, 1, 1, page, NULL); - if (npages != 1) { - get_page(bad_page); - return page_to_pfn(bad_page); - } + if (unlikely(npages != 1)) { + struct vm_area_struct *vma; - return page_to_pfn(page[0]); + vma = find_vma(current-mm, addr); + if (vma == NULL || addr = vma-vm_start || + !(vma-vm_flags VM_PFNMAP)) { Isn't the check for addr backwards here? For the VMA we would like to to find, vma-vm_start = addr vma-vm_end. Cheers, Muli - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
Muli Ben-Yehuda wrote: On Tue, Apr 29, 2008 at 02:09:20PM -0500, Anthony Liguori wrote: This patch allows VMA's that contain no backing page to be used for guest memory. This is a drop-in replacement for Ben-Ami's first page in his direct mmio series. Here, we continue to allow mmio pages to be represented in the rmap. Since v1, I've taken into account Andrea's suggestions at using VM_PFNMAP instead of VM_IO and changed the BUG_ON to a return of bad_page. Signed-off-by: Anthony Liguori [EMAIL PROTECTED] diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 1d7991a..64e5efe 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -532,6 +532,7 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) struct page *page[1]; unsigned long addr; int npages; +pfn_t pfn; might_sleep(); @@ -544,19 +545,35 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) npages = get_user_pages(current, current-mm, addr, 1, 1, 1, page, NULL); -if (npages != 1) { -get_page(bad_page); -return page_to_pfn(bad_page); -} +if (unlikely(npages != 1)) { +struct vm_area_struct *vma; -return page_to_pfn(page[0]); +vma = find_vma(current-mm, addr); +if (vma == NULL || addr = vma-vm_start || +!(vma-vm_flags VM_PFNMAP)) { Isn't the check for addr backwards here? For the VMA we would like to to find, vma-vm_start = addr vma-vm_end. Yes it is. Thanks for spotting that. Regards, Anthony Liguori Cheers, Muli - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
On Wed, Apr 30, 2008 at 11:59:47AM +0300, Avi Kivity wrote: The code is not trying to find a vma for the address, but a vma for the address which also has VM_PFNMAP set. The cases for vma not found, or vma found, but not VM_PFNMAP, are folded together. Muli's saying the comparison is reversed, change = to . - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
Andrea Arcangeli wrote: On Wed, Apr 30, 2008 at 11:59:47AM +0300, Avi Kivity wrote: The code is not trying to find a vma for the address, but a vma for the address which also has VM_PFNMAP set. The cases for vma not found, or vma found, but not VM_PFNMAP, are folded together. Muli's saying the comparison is reversed, change = to . Err, yes. -- Any sufficiently difficult bug is indistinguishable from a feature. - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
Muli Ben-Yehuda wrote: @@ -544,19 +545,35 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) npages = get_user_pages(current, current-mm, addr, 1, 1, 1, page, NULL); -if (npages != 1) { -get_page(bad_page); -return page_to_pfn(bad_page); -} +if (unlikely(npages != 1)) { +struct vm_area_struct *vma; -return page_to_pfn(page[0]); +vma = find_vma(current-mm, addr); +if (vma == NULL || addr = vma-vm_start || +!(vma-vm_flags VM_PFNMAP)) { Isn't the check for addr backwards here? For the VMA we would like to to find, vma-vm_start = addr vma-vm_end. The code is not trying to find a vma for the address, but a vma for the address which also has VM_PFNMAP set. The cases for vma not found, or vma found, but not VM_PFNMAP, are folded together. -- Any sufficiently difficult bug is indistinguishable from a feature. - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
Avi Kivity wrote: Hollis/Xiantao/Carsten, can you confirm that this approach works for you? Carsten, I believe you don't have mmio, but at least this shouldn't interfere. Should work fine on s390 afaics. - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
On Tuesday 29 April 2008 18:12:51 Anthony Liguori wrote: IIUC PPC correctly, all IO pages have corresponding struct pages. This means that get_user_pages() would succeed and you can reference count them? In this case, we would never take the VM_PFNMAP path. Is that correct? I think Andrea's answered this, but for the record: I believe ioremap() will get you struct pages on PPC, but they don't automatically exist. -- Hollis Blanchard IBM Linux Technology Center - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
Andrea Arcangeli wrote: On Tue, Apr 29, 2008 at 06:12:51PM -0500, Anthony Liguori wrote: IIUC PPC correctly, all IO pages have corresponding struct pages. This means that get_user_pages() would succeed and you can reference count them? In this case, we would never take the VM_PFNMAP path. get_user_pages only works on vmas where only pfn with struct page can be mapped, but if a struct page exists it doesn't mean get_user_pages will succeed. All mmio regions should be marked VM_IO as reading on them affects hardware somehow and that prevents get_user_pages to work on them regardless if a struct page exists. Ah, thanks for the clarification. Regards, Anthony Liguori - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
[kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
This patch allows VMA's that contain no backing page to be used for guest memory. This is a drop-in replacement for Ben-Ami's first page in his direct mmio series. Here, we continue to allow mmio pages to be represented in the rmap. Since v1, I've taken into account Andrea's suggestions at using VM_PFNMAP instead of VM_IO and changed the BUG_ON to a return of bad_page. Signed-off-by: Anthony Liguori [EMAIL PROTECTED] diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 1d7991a..64e5efe 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -532,6 +532,7 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) struct page *page[1]; unsigned long addr; int npages; + pfn_t pfn; might_sleep(); @@ -544,19 +545,35 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) npages = get_user_pages(current, current-mm, addr, 1, 1, 1, page, NULL); - if (npages != 1) { - get_page(bad_page); - return page_to_pfn(bad_page); - } + if (unlikely(npages != 1)) { + struct vm_area_struct *vma; - return page_to_pfn(page[0]); + vma = find_vma(current-mm, addr); + if (vma == NULL || addr = vma-vm_start || + !(vma-vm_flags VM_PFNMAP)) { + get_page(bad_page); + return page_to_pfn(bad_page); + } + + pfn = ((addr - vma-vm_start) PAGE_SHIFT) + vma-vm_pgoff; + BUG_ON(pfn_valid(pfn)); + } else + pfn = page_to_pfn(page[0]); + + return pfn; } EXPORT_SYMBOL_GPL(gfn_to_pfn); struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn) { - return pfn_to_page(gfn_to_pfn(kvm, gfn)); + pfn_t pfn; + + pfn = gfn_to_pfn(kvm, gfn); + if (pfn_valid(pfn)) + return pfn_to_page(pfn); + + return NULL; } EXPORT_SYMBOL_GPL(gfn_to_page); @@ -569,7 +586,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean); void kvm_release_pfn_clean(pfn_t pfn) { - put_page(pfn_to_page(pfn)); + if (pfn_valid(pfn)) + put_page(pfn_to_page(pfn)); } EXPORT_SYMBOL_GPL(kvm_release_pfn_clean); @@ -594,21 +612,25 @@ EXPORT_SYMBOL_GPL(kvm_set_page_dirty); void kvm_set_pfn_dirty(pfn_t pfn) { - struct page *page = pfn_to_page(pfn); - if (!PageReserved(page)) - SetPageDirty(page); + if (pfn_valid(pfn)) { + struct page *page = pfn_to_page(pfn); + if (!PageReserved(page)) + SetPageDirty(page); + } } EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty); void kvm_set_pfn_accessed(pfn_t pfn) { - mark_page_accessed(pfn_to_page(pfn)); + if (pfn_valid(pfn)) + mark_page_accessed(pfn_to_page(pfn)); } EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed); void kvm_get_pfn(pfn_t pfn) { - get_page(pfn_to_page(pfn)); + if (pfn_valid(pfn)) + get_page(pfn_to_page(pfn)); } EXPORT_SYMBOL_GPL(kvm_get_pfn); - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
Anthony Liguori wrote: This patch allows VMA's that contain no backing page to be used for guest memory. This is a drop-in replacement for Ben-Ami's first page in his direct mmio series. Here, we continue to allow mmio pages to be represented in the rmap. I like this very much, as it only affects accessors and not the mmu core itself. Hollis/Xiantao/Carsten, can you confirm that this approach works for you? Carsten, I believe you don't have mmio, but at least this shouldn't interfere. struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn) { - return pfn_to_page(gfn_to_pfn(kvm, gfn)); + pfn_t pfn; + + pfn = gfn_to_pfn(kvm, gfn); + if (pfn_valid(pfn)) + return pfn_to_page(pfn); + + return NULL; } You're returning NULL here, not bad_page. -- Any sufficiently difficult bug is indistinguishable from a feature. - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
Avi Kivity wrote: Anthony Liguori wrote: This patch allows VMA's that contain no backing page to be used for guest memory. This is a drop-in replacement for Ben-Ami's first page in his direct mmio series. Here, we continue to allow mmio pages to be represented in the rmap. I like this very much, as it only affects accessors and not the mmu core itself. Hollis/Xiantao/Carsten, can you confirm that this approach works for you? Carsten, I believe you don't have mmio, but at least this shouldn't interfere. struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn) { -return pfn_to_page(gfn_to_pfn(kvm, gfn)); +pfn_t pfn; + +pfn = gfn_to_pfn(kvm, gfn); +if (pfn_valid(pfn)) +return pfn_to_page(pfn); + +return NULL; } You're returning NULL here, not bad_page. My thinking was that bad_page indicates that the gfn is invalid. This is a different type of error though. The problem is that the guest is we are trying to kmap() a page that has no struct page associated with it. I'm not sure what the right thing to do here is. Perhaps we should be replacing consumers of gfn_to_page() with copy_to_user() instead? Regards, Anthony Liguori - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
Anthony Liguori wrote: struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn) { -return pfn_to_page(gfn_to_pfn(kvm, gfn)); +pfn_t pfn; + +pfn = gfn_to_pfn(kvm, gfn); +if (pfn_valid(pfn)) +return pfn_to_page(pfn); + +return NULL; } You're returning NULL here, not bad_page. My thinking was that bad_page indicates that the gfn is invalid. This is a different type of error though. The problem is that the guest is we are trying to kmap() a page that has no struct page associated with it. I'm not sure what the right thing to do here is. It depends on what's going on? Does a page table point to mmio? Or the glommerclock? Not sure there is a single answer. Perhaps we should be replacing consumers of gfn_to_page() with copy_to_user() instead? Indeed we should. The problem is access in atomic contexts. It's easy to detect failure, but not always easy to handle it. -- Any sufficiently difficult bug is indistinguishable from a feature. - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
Avi Kivity wrote: It depends on what's going on? Does a page table point to mmio? Or the glommerclock? Not sure there is a single answer. Perhaps we should be replacing consumers of gfn_to_page() with copy_to_user() instead? Indeed we should. The problem is access in atomic contexts. It's easy to detect failure, but not always easy to handle it. So I think we should replace it with a rate limited printk and returning bad_page. That way the guest can't exploit it and we'll still hopefully get printk()s to track down instances of things going bad. Regards, Anthony Liguori - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
Anthony Liguori wrote: Avi Kivity wrote: It depends on what's going on? Does a page table point to mmio? Or the glommerclock? Not sure there is a single answer. Perhaps we should be replacing consumers of gfn_to_page() with copy_to_user() instead? Indeed we should. The problem is access in atomic contexts. It's easy to detect failure, but not always easy to handle it. So I think we should replace it with a rate limited printk and returning bad_page. That way the guest can't exploit it and we'll still hopefully get printk()s to track down instances of things going bad. Agreed. Add a stacktrace so we can see what causes the badness. -- Any sufficiently difficult bug is indistinguishable from a feature. - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH] Handle vma regions with no backing page (v2)
Hollis Blanchard wrote: On Tuesday 29 April 2008 17:17:49 Avi Kivity wrote: I like this very much, as it only affects accessors and not the mmu core itself. Hollis/Xiantao/Carsten, can you confirm that this approach works for you? Carsten, I believe you don't have mmio, but at least this shouldn't interfere. OK, so the idea is to mmap /sys/bus/pci/.../region within the guest RAM area, and just include it within the normal guest RAM memslot? In the case of x86, since the PCI IO region is pretty far away from normal RAM, we'll probably use a different memory slot, but yes, that's the general idea. IIUC PPC correctly, all IO pages have corresponding struct pages. This means that get_user_pages() would succeed and you can reference count them? In this case, we would never take the VM_PFNMAP path. Is that correct? How will the IOMMU be programmed? Wouldn't you still need to register a special type of memslot for that? That's independent of this patchset. For non-aware guests, we'll have to pin all of physical memory up front and then create an IOMMU table from the pinned physical memory. For aware guests with a PV DMA window API, we'll be able to build that mapping on the fly (enforcing mlock allocation limits). Regards, Anthony Liguori - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel