Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules

2007-01-19 Thread Hollis Blanchard
On Thu, 2007-01-18 at 17:16 -0500, Jimi Xenidis wrote:
> On Jan 18, 2007, at 4:26 PM, Hollis Blanchard wrote:
> 
> > On Thu, 2007-01-18 at 16:18 -0500, Jimi Xenidis wrote:
> >>
> >> Hey, wouldn't virt_addr_valid() do?
> >
> > I can pass a perfectly "valid" virtual address that is within a
> > physically discontiguous area of memory, and this function would
> > return
> > 0.
> >
> hmm, I thought the page allocate in linux was contiguous, and:
> fastcall void free_pages(unsigned long addr, unsigned int order)
> {
>   if (addr != 0) {
>   BUG_ON(!virt_addr_valid((void *)addr));
>   __free_pages(virt_to_page((void *)addr), order);
>   }
> }

So you're thinking that virt_addr_valid() has something to do with being
physically contiguous? I couldn't find any code or comments to suggest
that.

-- 
Hollis Blanchard
IBM Linux Technology Center


___
Xen-ppc-devel mailing list
Xen-ppc-devel@lists.xensource.com
http://lists.xensource.com/xen-ppc-devel


Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules

2007-01-18 Thread Jimi Xenidis


On Jan 18, 2007, at 4:26 PM, Hollis Blanchard wrote:


On Thu, 2007-01-18 at 16:18 -0500, Jimi Xenidis wrote:


Hey, wouldn't virt_addr_valid() do?


I can pass a perfectly "valid" virtual address that is within a
physically discontiguous area of memory, and this function would  
return

0.


hmm, I thought the page allocate in linux was contiguous, and:
fastcall void free_pages(unsigned long addr, unsigned int order)
{
if (addr != 0) {
BUG_ON(!virt_addr_valid((void *)addr));
__free_pages(virt_to_page((void *)addr), order);
}
}

-JX


___
Xen-ppc-devel mailing list
Xen-ppc-devel@lists.xensource.com
http://lists.xensource.com/xen-ppc-devel


Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules

2007-01-18 Thread Hollis Blanchard
On Thu, 2007-01-18 at 16:18 -0500, Jimi Xenidis wrote:
> On Jan 18, 2007, at 1:55 PM, Hollis Blanchard wrote:
> 
> > On Thu, 2007-01-18 at 12:17 -0500, Jimi Xenidis wrote:
> >> I agree with most of Hollis's comments, but have some of my own.
> >>
> >> First, I do not think that the implementation of
> is_phys_contiguous()
> >> answers the question in its name and IMNSHO is bogus.  Perhaps
> >> something like:
> >>mm/sparse.c vaddr_in_vmalloc_area 232 static int
> >> vaddr_in_vmalloc_area(void *addr)
> >>
> >> And use if (!vaddr_in_vmalloc_area)
> >
> > The name was my suggestion. It should be commented, but think about
> > it:
> > we don't care if something is vmalloc or not. We care if it's
> > physically
> > contiguous or not, so I strongly believe that should be the name of
> > the
> > test.
> 
> I'm not big on functions that do not implement what the name says it
> does.

It does exactly what the name says it does: it returns 0 if the area is
known to be physically discontiguous, and 1 otherwise.

It's called "malloc", not "allocate_from_buddy" or
"allocate_first_from_bitmap". That's because you can provide any
implementation of the interface, and it can change at any time, and when
it does change, you don't need to change the callers.

> However, the worst that can happen is a false-negative, (unless it is
> an ioremap() address which would be other problems).
> 
> Hey, wouldn't virt_addr_valid() do?

I can pass a perfectly "valid" virtual address that is within a
physically discontiguous area of memory, and this function would return
0.

-- 
Hollis Blanchard
IBM Linux Technology Center


___
Xen-ppc-devel mailing list
Xen-ppc-devel@lists.xensource.com
http://lists.xensource.com/xen-ppc-devel


Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules

2007-01-18 Thread Jimi Xenidis

On Jan 18, 2007, at 1:55 PM, Hollis Blanchard wrote:


On Thu, 2007-01-18 at 12:17 -0500, Jimi Xenidis wrote:

I agree with most of Hollis's comments, but have some of my own.

First, I do not think that the implementation of is_phys_contiguous()
answers the question in its name and IMNSHO is bogus.  Perhaps
something like:
   mm/sparse.c vaddr_in_vmalloc_area 232 static int
vaddr_in_vmalloc_area(void *addr)

And use if (!vaddr_in_vmalloc_area)


The name was my suggestion. It should be commented, but think about  
it:
we don't care if something is vmalloc or not. We care if it's  
physically
contiguous or not, so I strongly believe that should be the name of  
the

test.


I'm not big on functions that do not implement what the name says it  
does.
However, the worst that can happen is a false-negative, (unless it is  
an ioremap() address which would be other problems).


Hey, wouldn't virt_addr_valid() do?




More importantly... (and this needs to be addressed before the patch
is accepted)
I like the "map" name change, but not sure about "early".
oops this last line slipped by.. I wrote the email over multiple  
sittings.


What is your request? Just renaming it?


My point is the "early" is not the only reason mini was used.




There are 2 reasons why we had mini/inline/early:
1) because the allocator was not ready, I think this applies to a
small number of hcalls
2) we cannot "sleep" (in_interrupt()) and the allocator sleeps,
Mainly evtchn related and console.

So early covers (1) but (2) will be problematic, I noted the ones
below that may reflect (2), I for one, have not been diligent in
commenting why mini/inline is actually used and I think we need to do
so.


What, add a comment describing a code path that may change in the
future? I would object to that.

Was that sarcasm?




So giving this some more thought, and jerking you around even more
(sorry), aside from using inline to optimize this:

  - We can detect (1) with slab_is_available() and use mini
  - We can detect (2) with in_interrupt() and try GFP_ATOMIC then  
mini.


Not a request, but something to think about.


What *is* your request then, the one that needs to be addressed?
My request is that ,at a minimum, everything that was inline/mini  
must be _early because we cannot alloc anything while in_interrupt().  
Ultimately, we call the right allocator at the right time.
And since we have interfaces to detect both early and in_interrupt we  
can probably bring this down to a single inteligent xencomm_map() call.





Mini, has always bugged me, it seems to me that this could be
satisfied by a per_cpu static, or preallocated buffer to for the xen
descriptor.  This may not be viable because it assumes no interrupts,
and though for asynchronous interrupts this may be a safe assumption,
if one were to suffer a series of page faults (from a wild pointer in
this path) we would have a silent hang, which is never good.


So mini has always bugged you, but you agree there's no better way  
to do

it and are just thinking out loud?


yes.
-JX


___
Xen-ppc-devel mailing list
Xen-ppc-devel@lists.xensource.com
http://lists.xensource.com/xen-ppc-devel


Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules

2007-01-18 Thread Hollis Blanchard
On Thu, 2007-01-18 at 12:17 -0500, Jimi Xenidis wrote:
> I agree with most of Hollis's comments, but have some of my own.
> 
> First, I do not think that the implementation of is_phys_contiguous()
> answers the question in its name and IMNSHO is bogus.  Perhaps
> something like:
>mm/sparse.c vaddr_in_vmalloc_area 232 static int
> vaddr_in_vmalloc_area(void *addr)
> 
> And use if (!vaddr_in_vmalloc_area)

The name was my suggestion. It should be commented, but think about it:
we don't care if something is vmalloc or not. We care if it's physically
contiguous or not, so I strongly believe that should be the name of the
test.

If in the future vmalloc space becomes physically contiguous, or some
other region of memory becomes physically discontiguous, we will need to
change the implementation of the function, but the name remains the
same.

> More importantly... (and this needs to be addressed before the patch
> is accepted)
> I like the "map" name change, but not sure about "early".

What is your request? Just renaming it?

> There are 2 reasons why we had mini/inline/early:
> 1) because the allocator was not ready, I think this applies to a
> small number of hcalls
> 2) we cannot "sleep" (in_interrupt()) and the allocator sleeps,
> Mainly evtchn related and console.
> 
> So early covers (1) but (2) will be problematic, I noted the ones
> below that may reflect (2), I for one, have not been diligent in
> commenting why mini/inline is actually used and I think we need to do
> so.

What, add a comment describing a code path that may change in the
future? I would object to that.

> So giving this some more thought, and jerking you around even more
> (sorry), aside from using inline to optimize this:
> 
>   - We can detect (1) with slab_is_available() and use mini
>   - We can detect (2) with in_interrupt() and try GFP_ATOMIC then mini.
> 
> Not a request, but something to think about.

What *is* your request then, the one that needs to be addressed?

> Mini, has always bugged me, it seems to me that this could be
> satisfied by a per_cpu static, or preallocated buffer to for the xen
> descriptor.  This may not be viable because it assumes no interrupts,
> and though for asynchronous interrupts this may be a safe assumption,
> if one were to suffer a series of page faults (from a wild pointer in
> this path) we would have a silent hang, which is never good.

So mini has always bugged you, but you agree there's no better way to do
it and are just thinking out loud?

> See the comment below that should remove this function entirely,
> Hollis please ACK.
> > +/* "mini" routines, for stack-based communications: */
> > +static void *xencomm_alloc_mini(void *area, int arealen)
> > +{
> > +   unsigned long base = (unsigned long)area;
> > +   unsigned int left_in_page;
> > +
> > +   left_in_page = PAGE_SIZE - base % PAGE_SIZE;
> > +
> > +   /* we probably fit right at the front of area */
> > +   if (left_in_page >= sizeof(struct xencomm_mini)) {
> > +   return area;
> > +   }
> > +
> > +   /* if not, see if area is big enough to advance to the next page */
> > +   if ((arealen - left_in_page) >= sizeof(struct xencomm_mini))
> > +   return (void *)(base + left_in_page);
> > +
> > +   /* area was too small */
> > +   return NULL;
> > +}
...
> 
> Since we are completely automating MINI we can simplify (remove) all
> the alignment logic by asking gcc for some help.
> 
> > +#define xencomm_map_early(ptr, bytes) \
> > +   ({char xc_area[XENCOMM_MINI_AREA];\
>   ({char xc_area[XENCOMM_MINI_AREA] \
>   __attribute__((__aligned__(sizeof(struct xencomm_mini;\
> 
> In fact it can now be a the struct instead of a char.

Agreed.

-- 
Hollis Blanchard
IBM Linux Technology Center


___
Xen-ppc-devel mailing list
Xen-ppc-devel@lists.xensource.com
http://lists.xensource.com/xen-ppc-devel


Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules

2007-01-18 Thread Jimi Xenidis

I agree with most of Hollis's comments, but have some of my own.

First, I do not think that the implementation of is_phys_contiguous()  
answers the question in its name and IMNSHO is bogus.  Perhaps  
something like:
  mm/sparse.c vaddr_in_vmalloc_area 232 static int  
vaddr_in_vmalloc_area(void *addr)


And use if (!vaddr_in_vmalloc_area)

More importantly... (and this needs to be addressed before the patch  
is accepted)

I like the "map" name change, but not sure about "early".
There are 2 reasons why we had mini/inline/early:
1) because the allocator was not ready, I think this applies to a  
small number of hcalls
2) we cannot "sleep" (in_interrupt()) and the allocator sleeps,  
Mainly evtchn related and console.


So early covers (1) but (2) will be problematic, I noted the ones  
below that may reflect (2), I for one, have not been diligent in  
commenting why mini/inline is actually used and I think we need to do  
so.


So giving this some more thought, and jerking you around even more  
(sorry), aside from using inline to optimize this:


 - We can detect (1) with slab_is_available() and use mini
 - We can detect (2) with in_interrupt() and try GFP_ATOMIC then mini.

Not a request, but something to think about.
Mini, has always bugged me, it seems to me that this could be  
satisfied by a per_cpu static, or preallocated buffer to for the xen  
descriptor.  This may not be viable because it assumes no interrupts,  
and though for asynchronous interrupts this may be a safe assumption,  
if one were to suffer a series of page faults (from a wild pointer in  
this path) we would have a silent hang, which is never good.



More on the patch below.

On Jan 17, 2007, at 10:20 PM, Jerone Young wrote:


I haven't had a chance to fully test this patch. As I have been having
problems with my blade today. But here is the code for review. I'll  
get

back to the list once it is fully tested.


Make sure you run some VIO workloads to make sure we get the  
in_interrupt() cases.




Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

diff -r e5962db17966 arch/powerpc/platforms/xen/gnttab.c
--- a/arch/powerpc/platforms/xen/gnttab.c	Wed Jan 17 10:25:30 2007  
-0600
+++ b/arch/powerpc/platforms/xen/gnttab.c	Tue Jan 16 23:50:28 2007  
-0600

@@ -263,8 +263,9 @@ int HYPERVISOR_grant_table_op(unsigned i
memcpy(&setup, op, sizeof(setup));
argsize = sizeof(setup);

-   frame_list = xencomm_create_inline(
-   xen_guest_handle(setup.frame_list));
+   frame_list = xencomm_map(
+   xen_guest_handle(setup.frame_list),
+   (sizeof(ulong) * setup.nr_frames));

set_xen_guest_handle(setup.frame_list, frame_list);
memcpy(op, &setup, sizeof(setup));
@@ -286,7 +287,7 @@ int HYPERVISOR_grant_table_op(unsigned i
return -ENOSYS;
}


I believe this is an in_interrupt() case.


-   desc = xencomm_create_inline(op);
+   desc = xencomm_map(op, argsize);

ret = plpar_hcall_norets(XEN_MARK(__HYPERVISOR_grant_table_op), cmd,
 desc, count);
diff -r e5962db17966 arch/powerpc/platforms/xen/hcall.c
--- a/arch/powerpc/platforms/xen/hcall.c	Wed Jan 17 10:25:30 2007  
-0600
+++ b/arch/powerpc/platforms/xen/hcall.c	Wed Jan 17 00:17:17 2007  
-0600

@@ -50,11 +50,16 @@
  * In general, we need a xencomm descriptor to cover the top-level  
data

  * structure (e.g. the domctl op), plus another for every embedded
pointer to
  * another data structure (i.e. for every GUEST_HANDLE).
+ *
+ * Some hypercalls are made before the memory subsystem is up, so
instead of
+ * calling xencomm_create(), we allocate XENCOMM_MINI_AREA bytes from
the stack
+ * to hold the xencomm descriptor.
  */

 int HYPERVISOR_console_io(int cmd, int count, char *str)
 {
-   void *desc = xencomm_create_inline(str);
+
+   void *desc = xencomm_map_early(str, count);

Early and interrupt



return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_console_io),
  cmd, count, desc);
@@ -63,7 +68,8 @@ EXPORT_SYMBOL(HYPERVISOR_console_io);

 int HYPERVISOR_event_channel_op(int cmd, void *op)
 {
-   void *desc = xencomm_create_inline(op);
+
+   void *desc = xencomm_map_early(op, sizeof(evtchn_op_t));


interrupt



return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_event_channel_op),
cmd, desc);
@@ -120,8 +126,8 @@ EXPORT_SYMBOL(HYPERVISOR_xen_version);

 int HYPERVISOR_xen_version(int cmd, void *arg)
 {
-   if (is_kernel_addr((unsigned long)arg)) {
-   void *desc = xencomm_create_inline(arg);
+   if (is_phys_contiguous((unsigned long)arg)) {
+   void *desc = xencomm_map_early(arg, sizeof(__u64));
return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_xen_version), 
cmd,
desc);
}
return HYPERVISOR_xen_version_userspace(cmd, arg);
@@ -12

Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules

2007-01-18 Thread Hollis Blanchard
On Wed, 2007-01-17 at 21:20 -0600, Jerone Young wrote:
> I haven't had a chance to fully test this patch. As I have been having
> problems with my blade today. But here is the code for review. I'll get
> back to the list once it is fully tested.

This looks pretty good; just a couple issues left... :)

Rename HYPERVISOR_xen_version_userspace() to HYPERVISOR_xen_version(),
delete the old HYPERVISOR_xen_version(), and replace the call to
xencomm_create() with xencomm_map().

In fact, now that we have xencomm_map() and xencomm_map_early(),
xencomm_create() and xencomm_create_mini() should become static (and
don't export them in xencomm.h).

Don't forget to update the copyright statement on all these files.

diff -r e5962db17966 drivers/xen/core/xencomm.c
--- a/drivers/xen/core/xencomm.cWed Jan 17 10:25:30 2007 -0600
+++ b/drivers/xen/core/xencomm.cWed Jan 17 01:42:11 2007 -0600
@@ -123,9 +123,87 @@ void *xencomm_create_inline(void *ptr)
 {
unsigned long paddr;

-   BUG_ON(!is_kernel_addr((unsigned long)ptr));
-
paddr = (unsigned long)xencomm_pa(ptr);
BUG_ON(paddr & XENCOMM_INLINE_FLAG);
return (void *)(paddr | XENCOMM_INLINE_FLAG);
 }
We should change this to BUG_ON(!is_phys_contig((unsigned long)ptr)) to
prevent accidents.

As soon as these issues are solved and you've tested the patch with VIO
modules, I'll check it in.

-- 
Hollis Blanchard
IBM Linux Technology Center


___
Xen-ppc-devel mailing list
Xen-ppc-devel@lists.xensource.com
http://lists.xensource.com/xen-ppc-devel


[XenPPC] [PATCH] [RFC] Xencomm patch to fix modules

2007-01-17 Thread Jerone Young
I haven't had a chance to fully test this patch. As I have been having
problems with my blade today. But here is the code for review. I'll get
back to the list once it is fully tested.

Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

diff -r e5962db17966 arch/powerpc/platforms/xen/gnttab.c
--- a/arch/powerpc/platforms/xen/gnttab.c   Wed Jan 17 10:25:30 2007 -0600
+++ b/arch/powerpc/platforms/xen/gnttab.c   Tue Jan 16 23:50:28 2007 -0600
@@ -263,8 +263,9 @@ int HYPERVISOR_grant_table_op(unsigned i
memcpy(&setup, op, sizeof(setup));
argsize = sizeof(setup);
 
-   frame_list = xencomm_create_inline(
-   xen_guest_handle(setup.frame_list));
+   frame_list = xencomm_map(
+   xen_guest_handle(setup.frame_list),
+   (sizeof(ulong) * setup.nr_frames));
 
set_xen_guest_handle(setup.frame_list, frame_list);
memcpy(op, &setup, sizeof(setup));
@@ -286,7 +287,7 @@ int HYPERVISOR_grant_table_op(unsigned i
return -ENOSYS;
}
 
-   desc = xencomm_create_inline(op);
+   desc = xencomm_map(op, argsize);
 
ret = plpar_hcall_norets(XEN_MARK(__HYPERVISOR_grant_table_op), cmd,
 desc, count);
diff -r e5962db17966 arch/powerpc/platforms/xen/hcall.c
--- a/arch/powerpc/platforms/xen/hcall.cWed Jan 17 10:25:30 2007 -0600
+++ b/arch/powerpc/platforms/xen/hcall.cWed Jan 17 00:17:17 2007 -0600
@@ -50,11 +50,16 @@
  * In general, we need a xencomm descriptor to cover the top-level data
  * structure (e.g. the domctl op), plus another for every embedded
pointer to
  * another data structure (i.e. for every GUEST_HANDLE).
+ *
+ * Some hypercalls are made before the memory subsystem is up, so
instead of
+ * calling xencomm_create(), we allocate XENCOMM_MINI_AREA bytes from
the stack
+ * to hold the xencomm descriptor.
  */
 
 int HYPERVISOR_console_io(int cmd, int count, char *str)
 {
-   void *desc = xencomm_create_inline(str);
+
+   void *desc = xencomm_map_early(str, count);
 
return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_console_io),
  cmd, count, desc);
@@ -63,7 +68,8 @@ EXPORT_SYMBOL(HYPERVISOR_console_io);
 
 int HYPERVISOR_event_channel_op(int cmd, void *op)
 {
-   void *desc = xencomm_create_inline(op);
+
+   void *desc = xencomm_map_early(op, sizeof(evtchn_op_t));
 
return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_event_channel_op),
cmd, desc);
@@ -120,8 +126,8 @@ EXPORT_SYMBOL(HYPERVISOR_xen_version);
 
 int HYPERVISOR_xen_version(int cmd, void *arg)
 {
-   if (is_kernel_addr((unsigned long)arg)) {
-   void *desc = xencomm_create_inline(arg);
+   if (is_phys_contiguous((unsigned long)arg)) {
+   void *desc = xencomm_map_early(arg, sizeof(__u64));
return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_xen_version), 
cmd,
desc);
}
return HYPERVISOR_xen_version_userspace(cmd, arg);
@@ -129,7 +135,7 @@ int HYPERVISOR_xen_version(int cmd, void
 
 int HYPERVISOR_physdev_op(int cmd, void *op)
 {
-   void *desc = xencomm_create_inline(op);
+   void *desc = xencomm_map(op, sizeof(physdev_op_t));
 
return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_physdev_op),
cmd, desc);
@@ -139,6 +145,7 @@ int HYPERVISOR_sched_op(int cmd, void *a
 int HYPERVISOR_sched_op(int cmd, void *arg)
 {
struct xencomm_desc *desc;
+   int argsize = 0;
 
switch (cmd) {
case SCHEDOP_yield:
@@ -151,24 +158,29 @@ int HYPERVISOR_sched_op(int cmd, void *a
evtchn_port_t *ports;
struct sched_poll sched_poll;
 
+   argsize = sizeof(struct sched_poll);
+
memcpy(&sched_poll, arg, sizeof(sched_poll));
 
-   ports = xencomm_create_inline(
-   xen_guest_handle(sched_poll.ports));
+   ports = xencomm_map(
+   xen_guest_handle(sched_poll.ports),
+   (sizeof(evtchn_port_t) * sched_poll.nr_ports));
set_xen_guest_handle(sched_poll.ports, ports);
memcpy(arg, &sched_poll, sizeof(sched_poll));
-   
+
}
break;
case SCHEDOP_shutdown:
+   argsize = sizeof(struct sched_shutdown);
case SCHEDOP_remote_shutdown:
+   argsize = sizeof(struct sched_remote_shutdown);
break;
default:
printk(KERN_ERR "%s: unknown sched op %d\n", __func__, cmd);
return -ENOSYS;
}
 
-   desc = xencomm_create_inline(arg);
+   desc = xencomm_map(arg, argsize);
 
return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_sched_op),
cmd, desc);
diff -r e5962db17966 drivers/xen/core/xencomm.c
--- a/drivers/xen/co