Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules
On Thu, 2007-01-18 at 17:16 -0500, Jimi Xenidis wrote: > On Jan 18, 2007, at 4:26 PM, Hollis Blanchard wrote: > > > On Thu, 2007-01-18 at 16:18 -0500, Jimi Xenidis wrote: > >> > >> Hey, wouldn't virt_addr_valid() do? > > > > I can pass a perfectly "valid" virtual address that is within a > > physically discontiguous area of memory, and this function would > > return > > 0. > > > hmm, I thought the page allocate in linux was contiguous, and: > fastcall void free_pages(unsigned long addr, unsigned int order) > { > if (addr != 0) { > BUG_ON(!virt_addr_valid((void *)addr)); > __free_pages(virt_to_page((void *)addr), order); > } > } So you're thinking that virt_addr_valid() has something to do with being physically contiguous? I couldn't find any code or comments to suggest that. -- Hollis Blanchard IBM Linux Technology Center ___ Xen-ppc-devel mailing list Xen-ppc-devel@lists.xensource.com http://lists.xensource.com/xen-ppc-devel
Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules
On Jan 18, 2007, at 4:26 PM, Hollis Blanchard wrote: On Thu, 2007-01-18 at 16:18 -0500, Jimi Xenidis wrote: Hey, wouldn't virt_addr_valid() do? I can pass a perfectly "valid" virtual address that is within a physically discontiguous area of memory, and this function would return 0. hmm, I thought the page allocate in linux was contiguous, and: fastcall void free_pages(unsigned long addr, unsigned int order) { if (addr != 0) { BUG_ON(!virt_addr_valid((void *)addr)); __free_pages(virt_to_page((void *)addr), order); } } -JX ___ Xen-ppc-devel mailing list Xen-ppc-devel@lists.xensource.com http://lists.xensource.com/xen-ppc-devel
Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules
On Thu, 2007-01-18 at 16:18 -0500, Jimi Xenidis wrote: > On Jan 18, 2007, at 1:55 PM, Hollis Blanchard wrote: > > > On Thu, 2007-01-18 at 12:17 -0500, Jimi Xenidis wrote: > >> I agree with most of Hollis's comments, but have some of my own. > >> > >> First, I do not think that the implementation of > is_phys_contiguous() > >> answers the question in its name and IMNSHO is bogus. Perhaps > >> something like: > >>mm/sparse.c vaddr_in_vmalloc_area 232 static int > >> vaddr_in_vmalloc_area(void *addr) > >> > >> And use if (!vaddr_in_vmalloc_area) > > > > The name was my suggestion. It should be commented, but think about > > it: > > we don't care if something is vmalloc or not. We care if it's > > physically > > contiguous or not, so I strongly believe that should be the name of > > the > > test. > > I'm not big on functions that do not implement what the name says it > does. It does exactly what the name says it does: it returns 0 if the area is known to be physically discontiguous, and 1 otherwise. It's called "malloc", not "allocate_from_buddy" or "allocate_first_from_bitmap". That's because you can provide any implementation of the interface, and it can change at any time, and when it does change, you don't need to change the callers. > However, the worst that can happen is a false-negative, (unless it is > an ioremap() address which would be other problems). > > Hey, wouldn't virt_addr_valid() do? I can pass a perfectly "valid" virtual address that is within a physically discontiguous area of memory, and this function would return 0. -- Hollis Blanchard IBM Linux Technology Center ___ Xen-ppc-devel mailing list Xen-ppc-devel@lists.xensource.com http://lists.xensource.com/xen-ppc-devel
Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules
On Jan 18, 2007, at 1:55 PM, Hollis Blanchard wrote: On Thu, 2007-01-18 at 12:17 -0500, Jimi Xenidis wrote: I agree with most of Hollis's comments, but have some of my own. First, I do not think that the implementation of is_phys_contiguous() answers the question in its name and IMNSHO is bogus. Perhaps something like: mm/sparse.c vaddr_in_vmalloc_area 232 static int vaddr_in_vmalloc_area(void *addr) And use if (!vaddr_in_vmalloc_area) The name was my suggestion. It should be commented, but think about it: we don't care if something is vmalloc or not. We care if it's physically contiguous or not, so I strongly believe that should be the name of the test. I'm not big on functions that do not implement what the name says it does. However, the worst that can happen is a false-negative, (unless it is an ioremap() address which would be other problems). Hey, wouldn't virt_addr_valid() do? More importantly... (and this needs to be addressed before the patch is accepted) I like the "map" name change, but not sure about "early". oops this last line slipped by.. I wrote the email over multiple sittings. What is your request? Just renaming it? My point is the "early" is not the only reason mini was used. There are 2 reasons why we had mini/inline/early: 1) because the allocator was not ready, I think this applies to a small number of hcalls 2) we cannot "sleep" (in_interrupt()) and the allocator sleeps, Mainly evtchn related and console. So early covers (1) but (2) will be problematic, I noted the ones below that may reflect (2), I for one, have not been diligent in commenting why mini/inline is actually used and I think we need to do so. What, add a comment describing a code path that may change in the future? I would object to that. Was that sarcasm? So giving this some more thought, and jerking you around even more (sorry), aside from using inline to optimize this: - We can detect (1) with slab_is_available() and use mini - We can detect (2) with in_interrupt() and try GFP_ATOMIC then mini. Not a request, but something to think about. What *is* your request then, the one that needs to be addressed? My request is that ,at a minimum, everything that was inline/mini must be _early because we cannot alloc anything while in_interrupt(). Ultimately, we call the right allocator at the right time. And since we have interfaces to detect both early and in_interrupt we can probably bring this down to a single inteligent xencomm_map() call. Mini, has always bugged me, it seems to me that this could be satisfied by a per_cpu static, or preallocated buffer to for the xen descriptor. This may not be viable because it assumes no interrupts, and though for asynchronous interrupts this may be a safe assumption, if one were to suffer a series of page faults (from a wild pointer in this path) we would have a silent hang, which is never good. So mini has always bugged you, but you agree there's no better way to do it and are just thinking out loud? yes. -JX ___ Xen-ppc-devel mailing list Xen-ppc-devel@lists.xensource.com http://lists.xensource.com/xen-ppc-devel
Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules
On Thu, 2007-01-18 at 12:17 -0500, Jimi Xenidis wrote: > I agree with most of Hollis's comments, but have some of my own. > > First, I do not think that the implementation of is_phys_contiguous() > answers the question in its name and IMNSHO is bogus. Perhaps > something like: >mm/sparse.c vaddr_in_vmalloc_area 232 static int > vaddr_in_vmalloc_area(void *addr) > > And use if (!vaddr_in_vmalloc_area) The name was my suggestion. It should be commented, but think about it: we don't care if something is vmalloc or not. We care if it's physically contiguous or not, so I strongly believe that should be the name of the test. If in the future vmalloc space becomes physically contiguous, or some other region of memory becomes physically discontiguous, we will need to change the implementation of the function, but the name remains the same. > More importantly... (and this needs to be addressed before the patch > is accepted) > I like the "map" name change, but not sure about "early". What is your request? Just renaming it? > There are 2 reasons why we had mini/inline/early: > 1) because the allocator was not ready, I think this applies to a > small number of hcalls > 2) we cannot "sleep" (in_interrupt()) and the allocator sleeps, > Mainly evtchn related and console. > > So early covers (1) but (2) will be problematic, I noted the ones > below that may reflect (2), I for one, have not been diligent in > commenting why mini/inline is actually used and I think we need to do > so. What, add a comment describing a code path that may change in the future? I would object to that. > So giving this some more thought, and jerking you around even more > (sorry), aside from using inline to optimize this: > > - We can detect (1) with slab_is_available() and use mini > - We can detect (2) with in_interrupt() and try GFP_ATOMIC then mini. > > Not a request, but something to think about. What *is* your request then, the one that needs to be addressed? > Mini, has always bugged me, it seems to me that this could be > satisfied by a per_cpu static, or preallocated buffer to for the xen > descriptor. This may not be viable because it assumes no interrupts, > and though for asynchronous interrupts this may be a safe assumption, > if one were to suffer a series of page faults (from a wild pointer in > this path) we would have a silent hang, which is never good. So mini has always bugged you, but you agree there's no better way to do it and are just thinking out loud? > See the comment below that should remove this function entirely, > Hollis please ACK. > > +/* "mini" routines, for stack-based communications: */ > > +static void *xencomm_alloc_mini(void *area, int arealen) > > +{ > > + unsigned long base = (unsigned long)area; > > + unsigned int left_in_page; > > + > > + left_in_page = PAGE_SIZE - base % PAGE_SIZE; > > + > > + /* we probably fit right at the front of area */ > > + if (left_in_page >= sizeof(struct xencomm_mini)) { > > + return area; > > + } > > + > > + /* if not, see if area is big enough to advance to the next page */ > > + if ((arealen - left_in_page) >= sizeof(struct xencomm_mini)) > > + return (void *)(base + left_in_page); > > + > > + /* area was too small */ > > + return NULL; > > +} ... > > Since we are completely automating MINI we can simplify (remove) all > the alignment logic by asking gcc for some help. > > > +#define xencomm_map_early(ptr, bytes) \ > > + ({char xc_area[XENCOMM_MINI_AREA];\ > ({char xc_area[XENCOMM_MINI_AREA] \ > __attribute__((__aligned__(sizeof(struct xencomm_mini;\ > > In fact it can now be a the struct instead of a char. Agreed. -- Hollis Blanchard IBM Linux Technology Center ___ Xen-ppc-devel mailing list Xen-ppc-devel@lists.xensource.com http://lists.xensource.com/xen-ppc-devel
Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules
I agree with most of Hollis's comments, but have some of my own. First, I do not think that the implementation of is_phys_contiguous() answers the question in its name and IMNSHO is bogus. Perhaps something like: mm/sparse.c vaddr_in_vmalloc_area 232 static int vaddr_in_vmalloc_area(void *addr) And use if (!vaddr_in_vmalloc_area) More importantly... (and this needs to be addressed before the patch is accepted) I like the "map" name change, but not sure about "early". There are 2 reasons why we had mini/inline/early: 1) because the allocator was not ready, I think this applies to a small number of hcalls 2) we cannot "sleep" (in_interrupt()) and the allocator sleeps, Mainly evtchn related and console. So early covers (1) but (2) will be problematic, I noted the ones below that may reflect (2), I for one, have not been diligent in commenting why mini/inline is actually used and I think we need to do so. So giving this some more thought, and jerking you around even more (sorry), aside from using inline to optimize this: - We can detect (1) with slab_is_available() and use mini - We can detect (2) with in_interrupt() and try GFP_ATOMIC then mini. Not a request, but something to think about. Mini, has always bugged me, it seems to me that this could be satisfied by a per_cpu static, or preallocated buffer to for the xen descriptor. This may not be viable because it assumes no interrupts, and though for asynchronous interrupts this may be a safe assumption, if one were to suffer a series of page faults (from a wild pointer in this path) we would have a silent hang, which is never good. More on the patch below. On Jan 17, 2007, at 10:20 PM, Jerone Young wrote: I haven't had a chance to fully test this patch. As I have been having problems with my blade today. But here is the code for review. I'll get back to the list once it is fully tested. Make sure you run some VIO workloads to make sure we get the in_interrupt() cases. Signed-off-by: Jerone Young <[EMAIL PROTECTED]> diff -r e5962db17966 arch/powerpc/platforms/xen/gnttab.c --- a/arch/powerpc/platforms/xen/gnttab.c Wed Jan 17 10:25:30 2007 -0600 +++ b/arch/powerpc/platforms/xen/gnttab.c Tue Jan 16 23:50:28 2007 -0600 @@ -263,8 +263,9 @@ int HYPERVISOR_grant_table_op(unsigned i memcpy(&setup, op, sizeof(setup)); argsize = sizeof(setup); - frame_list = xencomm_create_inline( - xen_guest_handle(setup.frame_list)); + frame_list = xencomm_map( + xen_guest_handle(setup.frame_list), + (sizeof(ulong) * setup.nr_frames)); set_xen_guest_handle(setup.frame_list, frame_list); memcpy(op, &setup, sizeof(setup)); @@ -286,7 +287,7 @@ int HYPERVISOR_grant_table_op(unsigned i return -ENOSYS; } I believe this is an in_interrupt() case. - desc = xencomm_create_inline(op); + desc = xencomm_map(op, argsize); ret = plpar_hcall_norets(XEN_MARK(__HYPERVISOR_grant_table_op), cmd, desc, count); diff -r e5962db17966 arch/powerpc/platforms/xen/hcall.c --- a/arch/powerpc/platforms/xen/hcall.c Wed Jan 17 10:25:30 2007 -0600 +++ b/arch/powerpc/platforms/xen/hcall.c Wed Jan 17 00:17:17 2007 -0600 @@ -50,11 +50,16 @@ * In general, we need a xencomm descriptor to cover the top-level data * structure (e.g. the domctl op), plus another for every embedded pointer to * another data structure (i.e. for every GUEST_HANDLE). + * + * Some hypercalls are made before the memory subsystem is up, so instead of + * calling xencomm_create(), we allocate XENCOMM_MINI_AREA bytes from the stack + * to hold the xencomm descriptor. */ int HYPERVISOR_console_io(int cmd, int count, char *str) { - void *desc = xencomm_create_inline(str); + + void *desc = xencomm_map_early(str, count); Early and interrupt return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_console_io), cmd, count, desc); @@ -63,7 +68,8 @@ EXPORT_SYMBOL(HYPERVISOR_console_io); int HYPERVISOR_event_channel_op(int cmd, void *op) { - void *desc = xencomm_create_inline(op); + + void *desc = xencomm_map_early(op, sizeof(evtchn_op_t)); interrupt return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_event_channel_op), cmd, desc); @@ -120,8 +126,8 @@ EXPORT_SYMBOL(HYPERVISOR_xen_version); int HYPERVISOR_xen_version(int cmd, void *arg) { - if (is_kernel_addr((unsigned long)arg)) { - void *desc = xencomm_create_inline(arg); + if (is_phys_contiguous((unsigned long)arg)) { + void *desc = xencomm_map_early(arg, sizeof(__u64)); return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_xen_version), cmd, desc); } return HYPERVISOR_xen_version_userspace(cmd, arg); @@ -12
Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules
On Wed, 2007-01-17 at 21:20 -0600, Jerone Young wrote: > I haven't had a chance to fully test this patch. As I have been having > problems with my blade today. But here is the code for review. I'll get > back to the list once it is fully tested. This looks pretty good; just a couple issues left... :) Rename HYPERVISOR_xen_version_userspace() to HYPERVISOR_xen_version(), delete the old HYPERVISOR_xen_version(), and replace the call to xencomm_create() with xencomm_map(). In fact, now that we have xencomm_map() and xencomm_map_early(), xencomm_create() and xencomm_create_mini() should become static (and don't export them in xencomm.h). Don't forget to update the copyright statement on all these files. diff -r e5962db17966 drivers/xen/core/xencomm.c --- a/drivers/xen/core/xencomm.cWed Jan 17 10:25:30 2007 -0600 +++ b/drivers/xen/core/xencomm.cWed Jan 17 01:42:11 2007 -0600 @@ -123,9 +123,87 @@ void *xencomm_create_inline(void *ptr) { unsigned long paddr; - BUG_ON(!is_kernel_addr((unsigned long)ptr)); - paddr = (unsigned long)xencomm_pa(ptr); BUG_ON(paddr & XENCOMM_INLINE_FLAG); return (void *)(paddr | XENCOMM_INLINE_FLAG); } We should change this to BUG_ON(!is_phys_contig((unsigned long)ptr)) to prevent accidents. As soon as these issues are solved and you've tested the patch with VIO modules, I'll check it in. -- Hollis Blanchard IBM Linux Technology Center ___ Xen-ppc-devel mailing list Xen-ppc-devel@lists.xensource.com http://lists.xensource.com/xen-ppc-devel
[XenPPC] [PATCH] [RFC] Xencomm patch to fix modules
I haven't had a chance to fully test this patch. As I have been having problems with my blade today. But here is the code for review. I'll get back to the list once it is fully tested. Signed-off-by: Jerone Young <[EMAIL PROTECTED]> diff -r e5962db17966 arch/powerpc/platforms/xen/gnttab.c --- a/arch/powerpc/platforms/xen/gnttab.c Wed Jan 17 10:25:30 2007 -0600 +++ b/arch/powerpc/platforms/xen/gnttab.c Tue Jan 16 23:50:28 2007 -0600 @@ -263,8 +263,9 @@ int HYPERVISOR_grant_table_op(unsigned i memcpy(&setup, op, sizeof(setup)); argsize = sizeof(setup); - frame_list = xencomm_create_inline( - xen_guest_handle(setup.frame_list)); + frame_list = xencomm_map( + xen_guest_handle(setup.frame_list), + (sizeof(ulong) * setup.nr_frames)); set_xen_guest_handle(setup.frame_list, frame_list); memcpy(op, &setup, sizeof(setup)); @@ -286,7 +287,7 @@ int HYPERVISOR_grant_table_op(unsigned i return -ENOSYS; } - desc = xencomm_create_inline(op); + desc = xencomm_map(op, argsize); ret = plpar_hcall_norets(XEN_MARK(__HYPERVISOR_grant_table_op), cmd, desc, count); diff -r e5962db17966 arch/powerpc/platforms/xen/hcall.c --- a/arch/powerpc/platforms/xen/hcall.cWed Jan 17 10:25:30 2007 -0600 +++ b/arch/powerpc/platforms/xen/hcall.cWed Jan 17 00:17:17 2007 -0600 @@ -50,11 +50,16 @@ * In general, we need a xencomm descriptor to cover the top-level data * structure (e.g. the domctl op), plus another for every embedded pointer to * another data structure (i.e. for every GUEST_HANDLE). + * + * Some hypercalls are made before the memory subsystem is up, so instead of + * calling xencomm_create(), we allocate XENCOMM_MINI_AREA bytes from the stack + * to hold the xencomm descriptor. */ int HYPERVISOR_console_io(int cmd, int count, char *str) { - void *desc = xencomm_create_inline(str); + + void *desc = xencomm_map_early(str, count); return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_console_io), cmd, count, desc); @@ -63,7 +68,8 @@ EXPORT_SYMBOL(HYPERVISOR_console_io); int HYPERVISOR_event_channel_op(int cmd, void *op) { - void *desc = xencomm_create_inline(op); + + void *desc = xencomm_map_early(op, sizeof(evtchn_op_t)); return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_event_channel_op), cmd, desc); @@ -120,8 +126,8 @@ EXPORT_SYMBOL(HYPERVISOR_xen_version); int HYPERVISOR_xen_version(int cmd, void *arg) { - if (is_kernel_addr((unsigned long)arg)) { - void *desc = xencomm_create_inline(arg); + if (is_phys_contiguous((unsigned long)arg)) { + void *desc = xencomm_map_early(arg, sizeof(__u64)); return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_xen_version), cmd, desc); } return HYPERVISOR_xen_version_userspace(cmd, arg); @@ -129,7 +135,7 @@ int HYPERVISOR_xen_version(int cmd, void int HYPERVISOR_physdev_op(int cmd, void *op) { - void *desc = xencomm_create_inline(op); + void *desc = xencomm_map(op, sizeof(physdev_op_t)); return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_physdev_op), cmd, desc); @@ -139,6 +145,7 @@ int HYPERVISOR_sched_op(int cmd, void *a int HYPERVISOR_sched_op(int cmd, void *arg) { struct xencomm_desc *desc; + int argsize = 0; switch (cmd) { case SCHEDOP_yield: @@ -151,24 +158,29 @@ int HYPERVISOR_sched_op(int cmd, void *a evtchn_port_t *ports; struct sched_poll sched_poll; + argsize = sizeof(struct sched_poll); + memcpy(&sched_poll, arg, sizeof(sched_poll)); - ports = xencomm_create_inline( - xen_guest_handle(sched_poll.ports)); + ports = xencomm_map( + xen_guest_handle(sched_poll.ports), + (sizeof(evtchn_port_t) * sched_poll.nr_ports)); set_xen_guest_handle(sched_poll.ports, ports); memcpy(arg, &sched_poll, sizeof(sched_poll)); - + } break; case SCHEDOP_shutdown: + argsize = sizeof(struct sched_shutdown); case SCHEDOP_remote_shutdown: + argsize = sizeof(struct sched_remote_shutdown); break; default: printk(KERN_ERR "%s: unknown sched op %d\n", __func__, cmd); return -ENOSYS; } - desc = xencomm_create_inline(arg); + desc = xencomm_map(arg, argsize); return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_sched_op), cmd, desc); diff -r e5962db17966 drivers/xen/core/xencomm.c --- a/drivers/xen/co