Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-12 Thread Isaku Yamahata
On Wed, Dec 05, 2007 at 06:15:49PM +, Derek Murray wrote: Keir Fraser wrote: Yes, this would work okay I suspect. Good enough as a stop-gap measure? Are there any other responsibilities that you acquire if you make use of VM_FOREIGN (in particular, how would this affect get_user_pages)?

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-06 Thread Gerd Hoffmann
D.G. Murray wrote: Hi Mark, Maybe a change to the gntdev userspace API to allow batching of mapping requests? Something along the lines of the following? void *xc_gnttab_map_grant_refs(int xcg_handle, uint32_t count,

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-06 Thread Derek Murray
Gerd Hoffmann wrote: Yes, except that it should actually work ;) It doesn't for me (Fedora 8 again). Grab xenner 0.9 (just uploaded), edit blkbackd.c and flip the BATCH_MAPS from 0 to 1, compile, run, see it not work. Which version of the Xen tools are you using? There was a bug in the

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-06 Thread Jeremy Fitzhardinge
Derek Murray wrote: Keir Fraser wrote: You'd need to track pte-grant_handle mappings somewhere, but it could certainly be done this way, yes. At the moment, blktap and gntdev provide struct pages to get_user_pages by smuggling them in the vm_private_data field of the relevant

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Hi Gerd, Gerd Hoffmann wrote: Want reproduce? Here we go: * grab xenner 0.8 from http://dl.bytesex.org/releases/xenner/ * grab a xenified dom0 kernel without blktap driver (either not compiled or module not loaded). * start xend * start blkbackd from xenner package (you probably

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Gerd, Can you try the attached patch against linux-2.6.18-xen.hg? I think the problem was that the gntdev VMA is not marked as being VM_PFNMAP, therefore it tries to get a struct page_struct for each granted page when it is unmapped (and maybe sometimes succeeds (incorrectly), which could be

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Keir Fraser wrote: Is this patch to go into linux-2.6.18-xen.hg then? Yes, even if it doesn't fix the exact bug we're seeing here, I think it should go in. I've attached a version with my signed-off-by and a better commit comment. Cheers, Derek. # HG changeset patch # User [EMAIL

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Keir Fraser wrote: Yes, this would work okay I suspect. Good enough as a stop-gap measure? Are there any other responsibilities that you acquire if you make use of VM_FOREIGN (in particular, how would this affect get_user_pages)? VM_FOREIGN is already set for the gntdev VMA (mostly because

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Jeremy Fitzhardinge wrote: Could we use one of the software-defined bits in the PTE to indicate that this is a foreign/granted PTE, and have set_pte_at behave differently if you pass it a pte with this bit set? Actually, as Gerd pointed out in his answer to his own question, the use of

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Keir Fraser wrote: Actually I'm not so sure now. Presumably you add VM_PFNMAP to make vm_normal_page() return NULL? But actually I would expect pte_pfn() to return max_mapnr because the mapped page is not a local page. And that should cause vm_normal_page() to return NULL always, regardless of
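A rough user-space sketch of the two conditions discussed in the message above: a VMA flag that forces a NULL result, and a frame number outside the local page array that does the same. This is a toy model, not the kernel's vm_normal_page(); the toy_* names, the struct, and the flag value are assumptions made up for illustration, and the real function has further cases that are omitted here.

```c
#include <stddef.h>

/* Toy model only: these names and values are hypothetical, not the kernel's. */
#define TOY_VM_PFNMAP 0x1u

struct toy_page {
    int refcount;
};

/*
 * Returns the page descriptor for pfn, or NULL when the mapping must not be
 * reference-counted: either the VMA is flagged as a raw PFN mapping, or the
 * frame number lies outside the local page array (as a foreign page would).
 */
static struct toy_page *toy_normal_page(struct toy_page *mem_map,
                                        unsigned long max_mapnr,
                                        unsigned int vm_flags,
                                        unsigned long pfn)
{
    if (vm_flags & TOY_VM_PFNMAP)
        return NULL;            /* flag check: never hand out a page */
    if (pfn >= max_mapnr)
        return NULL;            /* range check: not a local page */
    return &mem_map[pfn];
}
```

In this model either check alone is enough to return NULL for a foreign frame, which is the point being debated in the thread: whether the flag is needed when the range check already applies.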

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Keir Fraser wrote: Need to bite the bullet and fix this properly by setting a software flag in ptes that are not subject to reference counting. Could we get away with testing the VM_FOREIGN flag in vm_normal_page()? Although I get the impression that this wouldn't be easily justified if

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Jeremy Fitzhardinge
Derek Murray wrote: Ultimately, fork calls dup_mm, which calls dup_mmap, which calls copy_{page,pud,pmd,pte}_range, which calls copy_one_pte, which calls set_pte_at, which hypercalls HYPERVISOR_update_va_mapping. The hypercall will not succeed and will return an error code indicating the

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Jeremy Fitzhardinge
Derek Murray wrote: Jeremy Fitzhardinge wrote: Could we use one of the software-defined bits in the PTE to indicate that this is a foreign/granted PTE, and have set_pte_at behave differently if you pass it a pte with this bit set? Actually, as Gerd pointed out in his answer to his own
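The suggestion quoted above (mark foreign/granted PTEs with a software-defined bit and have the PTE-setting path treat them differently) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the toy_* names, the choice of bit 9, and the policy of leaving the destination entry empty are all hypothetical, and none of it is the kernel's or Xen's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names or bit choices come from the kernel. */
typedef uint64_t toy_pte_t;

#define TOY_PTE_PRESENT (1ULL << 0)
#define TOY_PTE_FOREIGN (1ULL << 9)  /* hypothetical software-defined bit */

/*
 * Stand-in for a set_pte_at()-style helper. A PTE carrying the foreign bit
 * is not installed; the destination is left empty instead.
 * Returns 0 if the PTE was installed, -1 if it was skipped.
 */
static int toy_set_pte(toy_pte_t *dst, toy_pte_t pte)
{
    if (pte & TOY_PTE_FOREIGN) {
        *dst = 0;
        return -1;
    }
    *dst = pte;
    return 0;
}

/*
 * Stand-in for the PTE-copying step of fork. Non-present entries are cleared
 * in the destination; present entries go through toy_set_pte().
 * Returns the number of PTEs actually copied.
 */
static size_t toy_copy_range(toy_pte_t *dst, const toy_pte_t *src, size_t n)
{
    size_t copied = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(src[i] & TOY_PTE_PRESENT)) {
            dst[i] = 0;
            continue;
        }
        if (toy_set_pte(&dst[i], src[i]) == 0)
            copied++;
    }
    return copied;
}
```

With this policy a forked child simply ends up without the granted mappings, rather than the copy attempting to install a PTE that the hypervisor would reject.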

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Keir Fraser
On 5/12/07 17:17, Derek Murray [EMAIL PROTECTED] wrote: Actually I'm not so sure now. Presumably you add VM_PFNMAP to make vm_normal_page() return NULL? But actually I would expect pte_pfn() to return max_mapnr because the mapped page is not a local page. And that should cause

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Keir Fraser
On 5/12/07 20:15, Jeremy Fitzhardinge [EMAIL PROTECTED] wrote: In 2.6.18-xen the only two implementations of zap_pte are blktap_clear_pte and gntdev_clear_pte. Given a ptep with the grant-mapping bit set, could we determine which of these need calling and do the appropriate thing? Do we

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Gerd Hoffmann
Alternatively, could we use the _PAGE_GNTTAB PTE flag that is used for debugging? Indeed, if we did this, could we obviate the need for the PTE-zapping hook, by instead catching the case where this flag is set, and unmapping the grant implicitly? Well, in the general case you don't have

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Keir Fraser
On 5/12/07 14:30, Derek Murray [EMAIL PROTECTED] wrote: Keir Fraser wrote: Is this patch to go into linux-2.6.18-xen.hg then? Yes, even if it doesn't fix the exact bug we're seeing here, I think it should go in. I've attached a version with my signed-off-by and a better commit comment.

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Stephen C. Tweedie wrote: So... the interface (a) cannot be used on the Linux VM without at least one invasive VM modification, due to the requirement of ptes being explicitly unmapped via hypercall; Also there is the use of VM_FOREIGN

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Gerd Hoffmann
Hi, gntdev doesn't even try to handle forking. I wouldn't be surprised if that is a great way to kill Domain-0. The Xen hypervisor will most likely not be amused to find a pte referring to a granted (but foreign) page which wasn't established using the grant table interface. Pinning the

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-04 Thread Derek Murray
Gerd Hoffmann wrote: On this point I completely agree with you! If anyone has any less radical suggestions, then I'd be delighted to refactor the gntdev code to use them. However, I'm not currently aware of any alternative that maintains robustness to process crashes. Oh, for me it isn't

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-04 Thread Gerd Hoffmann
Derek Murray wrote: Gerd Hoffmann wrote: Oh, for me it isn't robust at all, it crashes on the first munmap syscall. It is the Fedora 8 kernel. See attachment. Didn't try xensource 2.6.18 yet. My gut feeling is that something changed in mm between 2.6.18 and 2.6.21, but that seems like a

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-04 Thread Gerd Hoffmann
Stephen C. Tweedie wrote: Hi, On Tue, 2007-12-04 at 13:01 +0100, Gerd Hoffmann wrote: Who uses the gntdev device right now? Good question! I'm aware of it being used in a few research projects, and it seems to work for them (though I think it is mostly used with the linux-2.6.18-xen

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Gerd Hoffmann
Derek Murray wrote: I take the blame for that one. I added the hook because, if a process were to die whilst holding one or more grants, there were no hooks that would make it possible to carry out the grant-unmap. All existing hooks on either the device or the VMA were called *after* the PTEs

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Derek Murray
Gerd Hoffmann wrote: Derek Murray wrote: I take the blame for that one. I added the hook because, if a process were to die whilst holding one or more grants, there were no hooks that would make it possible to carry out the grant-unmap. All existing hooks on either the device or the VMA were

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Derek Murray
I take the blame for that one. I added the hook because, if a process were to die whilst holding one or more grants, there were no hooks that would make it possible to carry out the grant-unmap. All existing hooks on either the device or the VMA were called *after* the PTEs were cleared. It

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Mark Williamson
It gets better, though. The same hook is used in the version of blktap in linux-2.6.18-xen (not, as far as I can see, in the sparse tree for xen-3.1-testing): Oh, I'm thinking more in the direction of killing blktap altogether in favor of a pure userspace implementation on top of

RE: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread D.G. Murray
Hi Mark, Maybe a change to the gntdev userspace API to allow batching of mapping requests? Something along the lines of the following? /** * Memory maps one or more grant references from one or more domains to a * contiguous local address range. Mappings should be unmapped with *

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Mark Williamson
Hi Mark, Maybe a change to the gntdev userspace API to allow batching of mapping requests? Something along the lines of the following? Just like that :-D When you said multiple syscalls per mapping I assumed you meant that we'd lose the batching you get by doing a multicall. If it's

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-27 Thread Jan Beulich
It breaks with: Intel machine check architecture supported. (XEN) traps.c:1734:d0 Domain attempted WRMSR 0404 from :0001 to :. Intel machine check reporting enabled on CPU#0. general protection fault: [#1] SMP Modules linked in: Hm. Looks like

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-27 Thread Jeremy Fitzhardinge
Jan Beulich wrote: It breaks with: Intel machine check architecture supported. (XEN) traps.c:1734:d0 Domain attempted WRMSR 0404 from :0001 to :. Intel machine check reporting enabled on CPU#0. general protection fault: [#1] SMP Modules linked in:

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-27 Thread Jan Beulich
The oops and backtrace doesn't suggest it's an MSR write. Oh, right, the MSR write is being ignored, not failed. Does a crX write take the same path through the emulator as an MSR write? No, the two operations take different paths. Jan

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-26 Thread Juan Quintela
Hi, your console works great, but the rest of the patches assume: arch/x86/boot/compressed/notes-xen.c arch/x86/xen/early.c at least. It looks as if another patch is missing; could you take a look, please? Otherwise, I will take a look at what is missing. It breaks with: Intel machine