Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-03 Thread Avi Kivity
Andrea Arcangeli wrote:
 On Wed, Apr 02, 2008 at 02:16:41PM +0300, Avi Kivity wrote:
   
 Ugh, there's still mark_page_accessed() and SetPageDirty().
 

 btw, like PG_dirty is only set if the spte is writeable,
 mark_page_accessed should only run if the accessed bit is set in the
 spte. It doesn't matter now as nobody could possibly clear it, 

No one will clear it now, but it can start out cleared.  This happens with 
speculative mmu_set_spte(): when the guest writes into its page tables, 
we update the spte speculatively, but the guest may not actually access 
that location (for example, due to page fault clustering).

So the change makes sense even now.

 It still skips an atomic op. Your plan still sounds just fine despite
 the above, in fact it sounds too perfect: the awk hack to re-add the
 refcounting when building the external module if CONFIG_MMU_NOTIFIER
 isn't defined is going to be messy, a plain CONFIG_MMU_NOTIFIER in
 kvm.git would be simpler and more robust IMHO even if less perfect :).
   

Worst case, we stick a get_user_pages() inside the memslot setup 
function.  That makes things not swappable on pre-mmu-notifier kernels, but 
completely safe.
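
For illustration, a minimal sketch of that worst case, assuming a hypothetical
kvm_pin_memslot() called from the memslot setup path (the struct and function
names are invented, and get_user_pages() is called with the eight-argument
signature of kernels from that era, which has changed since):

#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/vmalloc.h>

/* Illustrative only: pin every page of a slot up front so nothing can be
 * swapped out on kernels without mmu notifiers. */
struct pinned_slot {
        unsigned long userspace_addr;   /* hva of the slot base */
        unsigned long npages;
        struct page **pages;            /* released with put_page() on teardown */
};

static int kvm_pin_memslot(struct pinned_slot *slot)
{
        int i, got;

        slot->pages = vmalloc(slot->npages * sizeof(struct page *));
        if (!slot->pages)
                return -ENOMEM;

        down_read(&current->mm->mmap_sem);
        got = get_user_pages(current, current->mm, slot->userspace_addr,
                             slot->npages, 1 /* write */, 0 /* force */,
                             slot->pages, NULL);
        up_read(&current->mm->mmap_sem);

        if (got != slot->npages) {
                for (i = 0; i < got; i++)
                        put_page(slot->pages[i]);
                vfree(slot->pages);
                slot->pages = NULL;
                return -EFAULT;
        }
        return 0;
}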

I'd rather avoid special-casing the core code whenever possible.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 07:32:35AM +0300, Avi Kivity wrote:
 It ought to work.  gfn_to_hfn() (old gfn_to_page) will still need to take a 
 refcount if possible.

This reminds me that with mmu notifiers we could implement gfn_to_hfn
with just follow_page and skip the refcounting on the struct page.
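
Roughly, such a lookup could look like the sketch below (illustration only:
bad_pfn is a hypothetical error value, gfn_to_hva() is the existing KVM helper,
and follow_page() is used with its (vma, address, flags) signature; mmu
notifiers are assumed to provide the lifetime guarantee the refcount gives
today):

/* Resolve a gfn to a host pfn without touching the page refcount.
 * Caller is assumed to hold mmap_sem for read. */
static unsigned long gfn_to_pfn_noref(struct kvm *kvm, gfn_t gfn)
{
        unsigned long hva = gfn_to_hva(kvm, gfn);
        struct vm_area_struct *vma;
        struct page *page;

        vma = find_vma(current->mm, hva);
        if (!vma || hva < vma->vm_start)
                return bad_pfn;

        /* no FOLL_GET: deliberately skip get_page() */
        page = follow_page(vma, hva, FOLL_TOUCH);
        if (!page)
                return bad_pfn; /* not present, or no struct page (mmio) */

        return page_to_pfn(page);
}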

I'm not suggesting that though: the refcount makes the code more
robust IMHO, and notably it allows running on kernels without mmu
notifiers.

So the trick is to do something like

	if (pfn_valid(pfn))
		put_page(pfn_to_page(pfn));

That's the trick I was suggesting to take care of mmio space.

If the reserved ram is used instead of VT-d to allow DMA, that will
have to change to:

	if (pfn_valid(pfn)) {
		page = pfn_to_page(pfn);
		if (page_count(page))
			put_page(page);
	}

that will take care of both the reserved ram and the pfn.

In general, for the reserved-ram case I implemented it with only the
page_count != 0 check, because the 'struct page' existed.

I'm unsure if it's good to add struct pages for non-ram; I find it
slightly confusing and not the right thing, and it takes memory for stuff
that can't happen (refcounting only makes sense if the page finally
goes into the freelist when the count reaches 0, and PG_dirty/referenced
bits and the like don't make sense either for non-ram or
reserved-ram). So I'm not sure the vmemmap trick is the best.

 This will increase the potential for type errors, so maybe we need to make 
 gfn_t and hfn_t distinct structures, and use accessors to get the actual 
 values.

That's also a possibility. If we go for this then hfn_t is probably a
better name as it's less likely to collide with core kernel VM
code. Otherwise perhaps pfn can be used instead of hfn, it's up to
you ;).



Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-02 Thread Avi Kivity
Andrea Arcangeli wrote:
 On Wed, Apr 02, 2008 at 07:32:35AM +0300, Avi Kivity wrote:
   
 It ought to work.  gfn_to_hfn() (old gfn_to_page) will still need to take a 
 refcount if possible.
 

 This reminds me that with mmu notifiers we could implement gfn_to_hfn
 with just follow_page and skip the refcounting on the struct page.

 I'm not suggesting that though: the refcount makes the code more
 robust IMHO, and notably it allows running on kernels without mmu
 notifiers.

   

Isn't it faster though?  We don't need to pull in the cacheline 
containing the struct page anymore.

We could hack something to make pre-mmu-notifier kernels work.


 I'm unsure if it's good to add struct pages for non-ram, I find it
 slightly confusing and not the right thing, it takes memory for stuff
 that can't happen (refcounting only makes sense if the page finally
 goes in the freelist when count reaches 0, and PG_dirty/referenced
 bits and the like don't make sense either for non-ram or
 reserved-ram). So I'm not sure the vmemmap trick is the best.

   

I thought we'd meet with resistance to that idea :) though I'd like to 
point out that struct pages do exist for mmio on some machines (those 
with >= 4GB).

 This will increase the potential for type errors, so maybe we need to make 
 gfn_t and hfn_t distinct structures, and use accessors to get the actual 
 values.
 

 That's also a possibility. If we go for this then hfn_t is probably a
 better name as it's less likely to collide with core kernel VM
 code. Otherwise perhaps pfn can be used instead of hfn, it's up to
 you ;).
   

I guess we can move to pfn, they're unambiguous enough.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 12:50:50PM +0300, Avi Kivity wrote:
 Isn't it faster though?  We don't need to pull in the cacheline containing 
 the struct page anymore.

Exactly, and not only that: get_user_pages is likely a bit slower than we
need for just the kvm pte lookup. GRU uses follow_page directly because it
runs in the tlb miss handler; for us, instead, the tlb miss handler
doesn't invoke a page fault unless the spte is non-present.

I expect for us that optimization will be mostly lost in the noise,
but it is likely measurable in heavy VM workloads, and in general it is
surely worth it in the long run (if nothing else as a micro-optimization
for whatever spte fault benchmark).

 We could hack something to make pre mmu notifier kernels work.

Actually I already removed the refcounting for the reserved ram, so
it's going to be very easy to do with a few CONFIG_MMU_NOTIFIER #ifdefs.

But because it's low priority I would defer it for later; the first patch
can do the refcounting unconditionally to better test it. (I'm
assuming we're going to deal with kernels without mmu notifiers [or
without EMM ;)] for a long while.)

 I thought we'd meet with resistance to that idea :) though I'd like to 
 point out that struct pages do exist for mmio on some machines (those with 
 >= 4GB).

Sure, those should have page_count 0 like the reserved ram. I'm
uncertain about the PageReserved semantics; I thought Nick nuked it
long ago but perhaps it was something else. Now copy_page_range bails
out on PFNMAP vmas; in the old days it would treat PG_reserved pages
mapped with remap_page_range like VM_SHARED, by not wrprotecting and
not refcounting them even if this was a private mapping.

In general, using page_t for anything but ram is something that should
be avoided, and that also makes the PG_reserved semantics mostly
irrelevant. The users of remap_pfn_range (like /dev/mem) should never
use pages, regardless of whether it's ram or non-ram or whether the
page_t exists or not. The fact that page_t sometimes exists for non-ram
is strictly a performance optimization to make the pfn_to_page resolution
quicker at runtime (the speedup is significant and the memory loss is
negligible).

 I guess we can move to pfn, they're unambiguous enough.

Ok! ;)



Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-02 Thread Avi Kivity
Andrea Arcangeli wrote:
 On Wed, Apr 02, 2008 at 12:50:50PM +0300, Avi Kivity wrote:
   
 Isn't it faster though?  We don't need to pull in the cacheline containing 
 the struct page anymore.
 

 Exactly, and not only that: get_user_pages is likely a bit slower than we
 need for just the kvm pte lookup. GRU uses follow_page directly because it
 runs in the tlb miss handler; for us, instead, the tlb miss handler
 doesn't invoke a page fault unless the spte is non-present.

   

How about this plan?

0. Merge mmu notifiers

1. gfn_to_page() -> gfn_to_pfn()

Still keeping the refcount.  Change bad_page to kvm_bad_hfn.

2. Drop the refcounting from gfn_to_pfn() and from kvm_release_page_*()

Still using get_user_pages() (and dropping the refcount immediately)

Simultaneously, change hack_module.awk to add the refcount back.

3. Export follow_page() or something based on fast_gup(), and use it

btw, if we change the method we use to read the Linux pte, I'd like to 
get the writable bit out of it.  This way, when we create an spte for a 
gpte that is writable and dirty, we can set the spte writable iff the 
Linux pte is writable.  This avoids breaking COW unnecessarily.
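
A rough illustration of the idea (not the actual mmu.c code; host_pte_writable
is a hypothetical flag obtained from whatever mechanism ends up reading the
Linux pte, and the PT_* masks are the ones KVM's mmu code already uses):

/* Only grant write access in the spte when the Linux pte is already
 * writable, so shadow faults don't break COW on behalf of a guest that
 * merely *could* write. */
static u64 make_spte(unsigned long pfn, int gpte_writable_dirty,
                     int host_pte_writable)
{
        u64 spte = ((u64)pfn << PAGE_SHIFT) | PT_PRESENT_MASK;

        if (gpte_writable_dirty && host_pte_writable)
                spte |= PT_WRITABLE_MASK;
        /* otherwise leave the spte read-only; a real guest write faults
         * later and breaks COW only when actually needed */
        return spte;
}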

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 02:16:41PM +0300, Avi Kivity wrote:
 Ugh, there's still mark_page_accessed() and SetPageDirty().

btw, like PG_dirty is only set if the spte is writeable,
mark_page_accessed should only run if the accessed bit is set in the
spte. It doesn't matter now as nobody could possibly clear it, but
with mmu notifiers the ->clear_young callback will clear that accessed bit for
the first time, and if the guest didn't use the spte in the meantime,
we don't need to mark the page accessed and we can let the VM swap out
the page at the next round of the lru.

But that's ok, we'll simply run mark_page_accessed and SetPageDirty if
pfn_valid is true. We're under mmu_lock so the mmu notifiers will
prevent the page from going away from under us.

The detail I'm uncertain about is why my reserved-ram patch had
pfn_valid = 1 but page_count = 0 and PG_reserved = 0 for all reserved
ram. Perhaps I triggered a bug in some memmap initialization, as the
common code should keep all pages that aren't supposed to go in the
freelist when freed as PG_reserved and page_count 1. Anyway, in
practice it worked fine and there's zero risk, as I'm now using
reserved-ram with the page_count == 0 check ;).

The rmap_remove should look like this (this won't work with the
reserved ram, but that needs fixing somehow, as reserved-ram must be
treated like an mmio region but with pfn_valid true and in turn with a
page_t, and then the reserved-ram patch will be able to cope with the
reserved ram not having an allocated memmap, rather than depending on
the luck of how page_alloc.c works).

if (pfn_valid(pfn)) {
        page = pfn_to_page(pfn);
        if (!PageReserved(page)) {
                BUG_ON(page_count(page) != 1);
                if (is_writeable_pte(*spte))
                        SetPageDirty(page);
                if (*spte & PT_ACCESSED_MASK)
                        mark_page_accessed(page);
        }
}

It still skips an atomic op. Your plan still sounds just fine despite
the above, in fact it sounds too perfect: the awk hack to re-add the
refcounting when building the external module if CONFIG_MMU_NOTIFIER
isn't defined is going to be messy, a plain CONFIG_MMU_NOTIFIER in
kvm.git would be simpler and more robust IMHO even if less perfect :).



Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-02 Thread Andrea Arcangeli
On Wed, Apr 02, 2008 at 01:50:19PM +0200, Andrea Arcangeli wrote:
   if (pfn_valid(pfn)) {
           page = pfn_to_page(pfn);
           if (!PageReserved(page)) {
                   BUG_ON(page_count(page) != 1);
                   if (is_writeable_pte(*spte))
                           SetPageDirty(page);
                   if (*spte & PT_ACCESSED_MASK)
                           mark_page_accessed(page);
           }
   }

Warning: the indenting of the above came out wrong in text mode... read it
like a C compiler would. Sorry ;)



Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-02 Thread Anthony Liguori
Avi Kivity wrote:
 Andrea Arcangeli wrote:
 On Wed, Apr 02, 2008 at 12:50:50PM +0300, Avi Kivity wrote:
  
 Isn't it faster though?  We don't need to pull in the cacheline 
 containing the struct page anymore.
 

 Exactly, and not only that: get_user_pages is likely a bit slower than we
 need for just the kvm pte lookup. GRU uses follow_page directly because it
 runs in the tlb miss handler; for us, instead, the tlb miss handler
 doesn't invoke a page fault unless the spte is non-present.

   

 How about this plan?

 0. Merge mmu notifiers

Are mmu-notifiers likely to get pushed when the 2.6.26 window opens up?

 1. gfn_to_page() -> gfn_to_pfn()

 Still keeping the refcount.  Change bad_page to kvm_bad_hfn.

kvm_bad_pfn.

 2. Drop the refcounting from gfn_to_pfn() and from kvm_release_page_*()

 Still using get_user_pages() (and dropping the refcount immediately)

 Simultaneously, change hack_module.awk to add the refcount back.

This is the step with the dependency on mmu notifiers, so step 1 can 
conceivably be merged in the absence of mmu notifiers.

Regards,

Anthony Liguori

 3. Export follow_page() or something based on fast_gup(), and use it

 btw, if we change the method we use to read the Linux pte, I'd like to 
 get the writable bit out of it.  This way, when we create an spte for 
 a gpte that is writable and dirty, we can set the spte writable iff 
 the Linux pte is writable.  This avoids breaking COW unnecessarily.





Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Avi Kivity
[EMAIL PROTECTED] wrote:
 From: Ben-Ami Yassour [EMAIL PROTECTED]

 Enable a guest to access a device's memory mapped I/O regions directly.
 Userspace sends the mmio regions that the guest can access. On the first
 page fault for an access to an mmio address the host translates the gva to 
 hpa,
 and updates the sptes.

   

Can you explain why you're not using the regular memory slot mechanism? 
i.e. have userspace mmap(/dev/mem) and create a memslot containing that 
at the appropriate guest physical address?

There are some issues with refcounting, but Andrea has some tricks to 
deal with that.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Anthony Liguori
Avi Kivity wrote:
 [EMAIL PROTECTED] wrote:
   
 From: Ben-Ami Yassour [EMAIL PROTECTED]

 Enable a guest to access a device's memory mapped I/O regions directly.
 Userspace sends the mmio regions that the guest can access. On the first
 page fault for an access to an mmio address the host translates the gva to 
 hpa,
 and updates the sptes.

   
 

 Can you explain why you're not using the regular memory slot mechanism? 
 i.e. have userspace mmap(/dev/mem) and create a memslot containing that 
 at the appropriate guest physical address?
   

/dev/mem is often restricted in what memory can be mapped.  However, we 
can't add something like this to KVM that allows arbitrary HPAs to be 
mapped into a guest from userspace.  This is just as bad as /dev/mem and 
is going to upset a lot of people.

Regardless of whether we can use /dev/mem, I think we should introduce a 
new char device anyway.  We only need to mmap() MMIO regions which are 
mapped by the PCI bus; presumably, the kernel should know about these 
mappings.  The driver should only allow mappings that are valid for a 
particular PCI device, such that it cannot be abused to map arbitrary 
regions of memory into a guest.  Bonus points if it can validate that 
there isn't a valid Linux driver loaded for the given PCI device.
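
A very rough sketch of what that validation could look like in the new
device's mmap() handler, treating the mmap offset as a host physical address;
pcidev_of() is a hypothetical helper returning the pci_dev the file descriptor
was bound to:

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pci.h>

static int passthrough_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct pci_dev *pdev = pcidev_of(file);
        unsigned long req_start = vma->vm_pgoff << PAGE_SHIFT;
        unsigned long req_len = vma->vm_end - vma->vm_start;
        int bar;

        for (bar = 0; bar < PCI_ROM_RESOURCE; bar++) {
                resource_size_t start = pci_resource_start(pdev, bar);
                resource_size_t len = pci_resource_len(pdev, bar);

                if (!len || !(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
                        continue;
                if (req_start < start || req_start + req_len > start + len)
                        continue;
                /* entirely inside a memory BAR of this device: allow it */
                vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
                return io_remap_pfn_range(vma, vma->vm_start,
                                          req_start >> PAGE_SHIFT,
                                          req_len, vma->vm_page_prot);
        }
        return -EINVAL; /* not within any memory BAR of this device */
}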

Regards,

Anthony Liguori

 There are some issues with refcounting, but Andrea has some tricks to 
 deal with that.

   




Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Avi Kivity
Anthony Liguori wrote:

 Regardless of whether we can use /dev/mem, I think we should 
 introduce a new char device anyway.  We only need to mmap() MMIO 
 regions which are mapped by the PCI bus, presumably, the kernel 
 should know about these mappings.  The driver should only allow 
 mappings that are valid for a particular PCI device such that it 
 cannot be abused to map arbitrary regions of memory into a guest.  
 Bonus points if it can validate that there isn't a valid Linux driver 
 loaded for the given PCI device.

 Which is apparently entirely unnecessary as we already have 
 /sys/bus/pci/.../region.  It's just a matter of checking if a vma is 
 VM_IO and then dealing with the subsequent reference counting issues 
 as Avi points out.


 From idea to working implementation in 38 minutes.  Congrats.


-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Daniel P. Berrange
On Tue, Apr 01, 2008 at 08:03:14PM +0300, Avi Kivity wrote:
 Anthony Liguori wrote:
  Avi Kivity wrote:
  [EMAIL PROTECTED] wrote:
   
  From: Ben-Ami Yassour [EMAIL PROTECTED]
 
  Enable a guest to access a device's memory mapped I/O regions directly.
  Userspace sends the mmio regions that the guest can access. On the 
  first
  page fault for an access to an mmio address the host translates the 
  gva to hpa,
  and updates the sptes.
 

 
  Can you explain why you're not using the regular memory slot 
  mechanism? i.e. have userspace mmap(/dev/mem) and create a memslot 
  containing that at the appropriate guest physical address?

 
  /dev/mem is often restricted in what memory can be mapped.  
 
 Please elaborate.

The /dev/mem and /dev/kmem devices have a special SELinux context, memory_device_t,
and very few application domains are allowed to access them. The KVM/QEMU
policy will not allow this, for example. Basically only the X server, HAL and
dmidecode have access in the current policy. It would be undesirable to have to
give all KVM guests full access to /dev/mem, so a more fine-grained access method
would have benefits here. 

Dan.
-- 
|: Red Hat, Engineering, Boston   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Anthony Liguori
Andrea Arcangeli wrote:
 On Tue, Apr 01, 2008 at 10:20:49AM -0500, Anthony Liguori wrote:
   
 Which is apparently entirely unnecessary as we already have 
 /sys/bus/pci/.../region.  It's just a matter of checking if a vma is VM_IO 
 and then dealing with the subsequent reference counting issues as Avi 
 points out.
 

 Do you need to map it in userland too, isn't it enough to map it in
 the sptes?

 For the ram I had to map it in userland too with /dev/mem, and then I
 used the pte_pfn to fill the spte, so the emulated qemu drivers can
 input/output. But for the mmio space I doubt the userland side is
 needed.

 If you add a direct memslot (new bitflag type) I will use it too
 instead of catching get_user_pages failures and walking ptes on the
 RAM pieces overwritten by /dev/mem.
   

There's a certain amount of elegance in mapping to userspace and not 
introducing a direct memslot.  It helps simplify the security model 
since an application isn't able to do anything more than it could 
without KVM.

The difficulty with a direct memslot is how you introduce policies to 
limit which guests can access which memslots directly.  You would have to 
teach KVM to interact with the PCI subsystem to determine what memory 
was within a particular PCI IO region.  Not impossible, just ugly.

Regards,

Anthony Liguori



Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Daniel P. Berrange
On Tue, Apr 01, 2008 at 08:10:31PM +0200, Andrea Arcangeli wrote:
 On Tue, Apr 01, 2008 at 06:18:07PM +0100, Daniel P. Berrange wrote:
  and very few application domains are allowed to access them. The KVM/QEMU
  policy will not allow this, for example. Basically only the X server, HAL and
  dmidecode have access in the current policy. It would be undesirable to have to
  give all KVM guests full access to /dev/mem, so a more fine-grained access method
  would have benefits here. 
 
 But pci-passthrough can give root on the host even to the ring0
 guest, just like /dev/mem without VT-d, so there's not much difference
 with using /dev/mem as far as security is concerned. Only on CPUs
 including VT-d is it possible to retain a mostly equivalent security
 level despite pci-passthrough.

Clearly it is a losing battle without VT-d. That doesn't mean we should 
design it to lose in general. So we should design it so that when we do have
VT-d it will have the maximum security possible. VT-d will only become more
widespread over time.

Dan.
-- 
|: Red Hat, Engineering, Boston   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Anthony Liguori
Andrea Arcangeli wrote:
 On Tue, Apr 01, 2008 at 06:18:07PM +0100, Daniel P. Berrange wrote:
   
 and very few application domains are allowed to access them. The KVM/QEMU
 policy will not allow this, for example. Basically only the X server, HAL and
 dmidecode have access in the current policy. It would be undesirable to have to
 give all KVM guests full access to /dev/mem, so a more fine-grained access method
 would have benefits here. 
 

 But pci-passthrough can give root on the host even to the ring0
 guest, just like /dev/mem without VT-d, so there's not much difference
 with using /dev/mem as far as security is concerned. Only on CPUs
 including VT-d is it possible to retain a mostly equivalent security
 level despite pci-passthrough.
   

Eventually, most machines will have an IOMMU so that's the assumption to 
design to.  It is true that PCI pass-through w/o VT-d is always going to 
be equivalent to /dev/mem access, but it's really a special case.  Unless 
you have an IOMMU, you would not do PCI pass-through if you cared at all 
about security/reliability.

Regards,

Anthony Liguori




Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Andrea Arcangeli
On Tue, Apr 01, 2008 at 10:20:49AM -0500, Anthony Liguori wrote:
 Which is apparently entirely unnecessary as we already have 
 /sys/bus/pci/.../region.  It's just a matter of checking if a vma is VM_IO 
 and then dealing with the subsequent reference counting issues as Avi 
 points out.

Do you need to map it in userland too, isn't it enough to map it in
the sptes?

For the ram I had to map it in userland too with /dev/mem, and then I
used the pte_pfn to fill the spte, so the emulated qemu drivers can
input/output. But for the mmio space I doubt the userland side is
needed.

If you add a direct memslot (new bitflag type) I will use it too
instead of catching get_user_pages failures and walking ptes on the
RAM pieces overwritten by /dev/mem.



Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Avi Kivity
Anthony Liguori wrote:
 I looked at Andrea's patches and I didn't see any special handling for 
 non-RAM pages.  Something Muli mentioned that kept them from doing 
 /sys/devices/pci/.../region to begin with was the fact that IO pages do 
 not have a struct page backing them so get_user_pages() will fail in 
 gfn_to_page().
   

It's just something we discussed, not code.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Ben-Ami Yassour1


Avi Kivity [EMAIL PROTECTED] wrote on 01/04/2008 16:30:00:

 [EMAIL PROTECTED] wrote:
  From: Ben-Ami Yassour [EMAIL PROTECTED]
 
  Enable a guest to access a device's memory mapped I/O regions directly.
  Userspace sends the mmio regions that the guest can access. On the
first
  page fault for an access to an mmio address the host translates
 the gva to hpa,
  and updates the sptes.
 
 

 Can you explain why you're not using the regular memory slot mechanism?
 i.e. have userspace mmap(/dev/mem) and create a memslot containing that
 at the appropriate guest physical address?

 There are some issues with refcounting, but Andrea has some tricks to
 deal with that.

 --
 Any sufficiently difficult bug is indistinguishable from a feature.


Our initial approach was to mmap /sys/bus/pci/devices/.../resource#
and create a memory slot for it. However eventually we realized that for
mmio we don't need hva mapped to the mmio region, we can map the gpa
directly to hpa.

As far as I understand, the memory slots mechanism is used to map
gpa to hva. Then gfn_to_page uses get_user_pages to map hva to hpa.
However, get_user_pages does not work for mmio, and in addition we
know the hpa to begin with, so there is no real need to map an hva
for the mmio region.

In addition there is an assumption in the code that there is a page
struct for the frame which is not the case for mmio. So it was
easier to simply add a list of mmio gpa->hpa mappings.
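
For illustration, that list could be as simple as the sketch below (the names
are invented here, not the ones used in the patch):

#include <linux/list.h>
#include <linux/kvm_types.h>

/* One direct-mapped mmio region: guest physical -> host physical. */
struct kvm_mmio_map {
        struct list_head link;
        gpa_t gpa;
        hpa_t hpa;
        unsigned long size;     /* bytes */
};

/* Looked up on a shadow page fault; returns 0 if gpa is not direct mmio. */
static hpa_t mmio_gpa_to_hpa(struct list_head *maps, gpa_t gpa)
{
        struct kvm_mmio_map *m;

        list_for_each_entry(m, maps, link)
                if (gpa >= m->gpa && gpa - m->gpa < m->size)
                        return m->hpa + (gpa - m->gpa);
        return 0;
}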

I guess we can use the memory slots array to hold the gpa to hpa
mapping but it is not necessarily natural, and would probably
require more hooks in the code to handle an mmio memory slot. BTW,
note that for a guest that has multiple passthrough devices each
with a set of mmio regions, it might become a long list, so there
might be value to keep it separate from that respect as well.

With regards to the potential security issue Anthony pointed out
(ioctls taking hpa's) we can change the interface so that they will
take (device, BAR) instead and the kernel will translate to hpa

What do you think? Given that the shadow page table code has to be
modified anyway (due to the struct page issue), is it worthwhile to
experiment with mmap(...region) or is the current approach sufficient?

Thanks,
Ben




Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Avi Kivity
Ben-Ami Yassour1 wrote:

 Can you explain why you're not using the regular memory slot mechanism?
 i.e. have userspace mmap(/dev/mem) and create a memslot containing that
 at the appropriate guest physical address?

 
 Our initial approach was to mmap /sys/bus/pci/devices/.../resource#
 and create a memory slot for it. However eventually we realized that for
 mmio we don't need hva mapped to the mmio region, we can map the gpa
 directly to hpa.

 As far as I understand, the memory slots mechanism is used to map
 gpa to hva. Then gfn_to_page uses get_user_pages to map hva to hpa.
 However, get_user_pages does not work for mmio, and in addition we
 know the hpa to begin with, so there is no real need to map an hva
 for the mmio region.

 In addition there is an assumption in the code that there is a page
 struct for the frame which is not the case for mmio. So it was
 easier to simply add a list of mmio gpa->hpa mappings.

 I guess we can use the memory slots array to hold the gpa to hpa
 mapping but it is not necessarily natural, and would probably
 require more hooks in the code to handle an mmio memory slot. BTW,
 note that for a guest that has multiple passthrough devices each
 with a set of mmio regions, it might become a long list, so there
 might be value to keep it separate from that respect as well.

 With regards to the potential security issue Anthony pointed out
 (ioctls taking hpa's) we can change the interface so that they will
 take (device, BAR) instead and the kernel will translate to hpa

   

Not enough.  How do you know if the calling process has permission to 
access that pci device?  (I retract my previous "pci passthrough is as 
rootish as you get" remark.)

 What do you think? Given that the shadow page table code has to be
 modified anyway (due to the struct page issue), is it worthwhile to
 experiment with mmap(...region) or is the current approach sufficient?
   

As Anthony points out, the advantage to mmap() is that whatever security 
is needed can be applied at that level.  Passing host physical addresses 
from userspace requires a whole new security model.

The issue with gfn_to_page() is real.  We can either try to work around 
it somehow, or we can try to back mmio regions with struct pages.  Now 
that it looks like mem_map is becoming virtually mapped, the cost is 
minimal, but I expect this approach will meet some resistance.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Anthony Liguori
Avi Kivity wrote:
 Ben-Ami Yassour1 wrote:
   
   
 

 Not enough.  How do you know if this calling process has permissions to 
 access that pci device (I retract my previous pci passthrough is as 
 rootish as you get remark).

   
 What do you think? Given that the shadow page table code has to be
 modified anyway (due to the struct page issue), is it worthwhile to
 experiment with mmap(...region) or is the current approach sufficient?
   
 

 As Anthony points out, the advantage to mmap() is that whatever security 
 is needed can be applied at that level.  Passing host physical addresses 
 from userspace requires a whole new security model.

 The issue with gfn_to_page() is real.  We can either try to work around 
 it somehow, or we can try to back mmio regions with struct pages.  Now 
 that it looks like mem_map is becoming virtually mapped, the cost is 
 minimal, but I expect this approach will meet some resistance.
   

What about switching the KVM MMU code to use hfn_t instead of struct 
page? The initial conversion is pretty straightforward, as the places 
where you actually need a struct page you can always get it from 
pfn_to_page() (like in kvm_release_page_dirty).

We can then teach the MMU to deal with hfn_t's that don't have a 
corresponding page.  IIUC, the implementation of something like 
kvm_release_page_dirty is a nop for pfn's that don't have a 
corresponding page.  We just have to be able to detect a pfn_to_page() 
failure and then assume we're dealing with IO memory.
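
A minimal sketch of what that release path could become (hypothetical function
name; the real conversion may split the dirty/accessed handling differently):

#include <linux/mm.h>

/* Becomes a nop for pfns with no struct page, i.e. IO memory. */
static void kvm_release_pfn(unsigned long pfn, int dirty, int accessed)
{
        struct page *page;

        if (!pfn_valid(pfn))
                return;

        page = pfn_to_page(pfn);
        if (dirty)
                SetPageDirty(page);
        if (accessed)
                mark_page_accessed(page);
        put_page(page);
}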

Regards,

Anthony Liguori





Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Anthony Liguori
Andrea Arcangeli wrote:
 On Tue, Apr 01, 2008 at 01:21:37PM -0500, Anthony Liguori wrote:
   
 return a page, not a HPA.  I haven't looked too deeply yet, but my 
 suspicion is that to properly support mapping in VM_IO pages will require 
 some general refactoring since we always assume that a struct page exists 
 for any HPA.
 

 Yes, that was potentially a problem for reserved _ram_ pages too, as it
 isn't guaranteed that memmap_t (old days nomenclature) will exist for
 physical addresses not defined as ram in the e820 map (to make it work
 without VT-d I have to reserve the ram in the host at the e820 map
 parsing time). If the memmap will not exist for the reserved ram
 physical range, the pfn_valid() will fail at runtime in kvm and the
 bad_page will generate a graceful emulation failure, so it's very
 safe. But once we handle direct memslots for mmio regions, the
 reserved ram will better stop depending on the memmap too.
   

Direct memslots are quite a sledgehammer for dealing with IO pages, though.

You could get away with supporting reserved RAM another way though.  If 
we refactored the MMU to use hfn_t instead of struct page, you would 
then need a mechanism to mmap() reserved ram into userspace (similar to 
ioremap I guess).  In fact, you may be able to just lie and claim 
reserved ram is IO ram.

Then we could treat the reserved ram in the same way as IO ram and not 
have to introduce a new interface in KVM for mapping arbitrary regions 
of memory.

I would be interested in this sort of mechanism not for reserving low 
RAM, but for reserving high RAM.  Basically, I'd like to boot with mem= 
something and then still be able to map the higher memory in QEMU 
userspace and then create KVM guests with it.

Regards,

Anthony Liguori



Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Andrea Arcangeli
On Tue, Apr 01, 2008 at 10:22:51PM +0300, Avi Kivity wrote:
 It's just something we discussed, not code.

Yes, the pfn_valid check should skip all refcounting for mmio regions
without a struct page. But gfn_to_page can't work without a struct
page, so some change will be needed there. With the reserved ram patch
I didn't need to touch the gfn_to_page interface because the memmap
happens to exist even though the ram is reserved in the e820 map, so I
used !page_count() instead of the previously discussed !pfn_valid to
skip all refcounting (reserved ram must skip the refcounting too
because those ram pages can't be freed as they're not-ram as far as
linux is concerned).



Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Avi Kivity
Anthony Liguori wrote:

 You could get away with supporting reserved RAM another way though.  
 If we refactored the MMU to use hfn_t instead of struct page, you 
 would then need a mechanism to mmap() reserved ram into userspace 
 (similar to ioremap I guess).  In fact, you may be able to just lie 
 and claim reserved ram is IO ram.

We'd need it to return a (hfn_t, page_type_t) pair so the caller would 
know whether to use page refcounting or not.  The caller might be able 
to ask Linux whether the page has a struct page or not, but it gets hairy.
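
Something along these lines, purely as a sketch (kvm_frame, its type enum and
gfn_to_hfn_typed() are invented names, and gfn_to_pfn() stands in for whatever
lookup the conversion settles on):

#include <linux/mm.h>

enum kvm_frame_type {
        KVM_FRAME_RAM,          /* has a struct page, normal refcounting */
        KVM_FRAME_NO_PAGE,      /* mmio or reserved ram: no refcounting */
};

struct kvm_frame {
        unsigned long hfn;
        enum kvm_frame_type type;
};

static struct kvm_frame gfn_to_hfn_typed(struct kvm *kvm, gfn_t gfn)
{
        struct kvm_frame f;

        f.hfn = gfn_to_pfn(kvm, gfn);
        if (!pfn_valid(f.hfn))
                f.type = KVM_FRAME_NO_PAGE;     /* mmio: no memmap at all */
        else if (!page_count(pfn_to_page(f.hfn)))
                f.type = KVM_FRAME_NO_PAGE;     /* reserved ram: never freed */
        else
                f.type = KVM_FRAME_RAM;
        return f;
}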

That's why I'm interested in the possibility of adding struct pages at 
runtime. vmemmap is needed for many other reasons (NUMA, memory hotplug, 
sparse SGI memory) so it's reasonable that it will be enabled on most 
distros.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH 1/1] direct mmio for passthrough - kernel part

2008-04-01 Thread Avi Kivity
Anthony Liguori wrote:
 What about switching the KVM MMU code to use hfn_t instead of struct 
 page? The initial conversion is pretty straightforward, as the places 
 where you actually need a struct page you can always get it from 
 pfn_to_page() (like in kvm_release_page_dirty).

 We can then teach the MMU to deal with hfn_t's that don't have a 
 corresponding page.  IIUC, the implementation of something like 
 kvm_release_page_dirty is a nop for pfn's that don't have a 
 corresponding page.  We just have to be able to detect a pfn_to_page() 
 failure and then assume we're dealing with IO memory.

   

It ought to work.  gfn_to_hfn() (old gfn_to_page) will still need to 
take a refcount if possible.

This will increase the potential for type errors, so maybe we need to 
make gfn_t and hfn_t distinct structures, and use accessors to get the 
actual values.
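
A minimal sketch of the distinct-structure idea (these would replace the
existing plain-integer typedefs; the accessor names are invented):

#include <linux/types.h>

/* Wrapping the raw values makes a gfn/hfn mix-up a compile error. */
typedef struct { u64 val; } gfn_t;
typedef struct { u64 val; } hfn_t;

static inline gfn_t make_gfn(u64 v)    { gfn_t g = { v }; return g; }
static inline hfn_t make_hfn(u64 v)    { hfn_t h = { v }; return h; }
static inline u64 gfn_val(gfn_t gfn)   { return gfn.val; }
static inline u64 hfn_val(hfn_t hfn)   { return hfn.val; }

/* e.g. hfn_t hfn = gfn_to_hfn(kvm, gfn); passing a gfn_t where an hfn_t
 * is expected now fails to compile instead of silently working. */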

-- 
Any sufficiently difficult bug is indistinguishable from a feature.

