Re: [Xen-devel] [PATCH 1/2] xen: vnuma support for PV guests running as domU.

2013-11-14 Thread Elena Ufimtseva
On Thu, Nov 14, 2013 at 6:48 AM, Dario Faggioli
 wrote:
> On gio, 2013-11-14 at 11:21 +, David Vrabel wrote:
>> On 14/11/13 07:26, Dario Faggioli wrote:
>> > IIRC, it's more something that was already happening (the breakage, I
>> > mean), than a "safety net" for the unforeseeable future. Might be worth
>> > giving some context about it, perhaps referencing the email thread or
>> > the git commit hash in the comment.
>>
>> Yes, a comment like:
>>
>> /*
>>  * Set a dummy node and return success.  This prevents calling any
>>  * hardware-specific initializers which do not work in a PV guest.
>>  */
>>
>> is better.  No need to refer to any specific threads.  It's pretty clear
>> that any hardware-specific init isn't appropriate for a PV guest.
>>
> Ok.
>
>> >> +  if (rc != 0) {
>> >> +  for (i = 0; i < MAX_LOCAL_APIC; i++)
>> >> +  set_apicid_to_node(i, NUMA_NO_NODE);
>> >> +  nodes_clear(numa_nodes_parsed);
>> >> +  nodes_clear(node_possible_map);
>> >> +  nodes_clear(node_online_map);
>> >> +  node_set(0, numa_nodes_parsed);
>> >> +  numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
>> >> +  }
>> >> +  return 0;
>> >>
>> > Ok, so, we always return 'success', as we were saying during last round.
>> > However, we do not call dummy_numa_init() directly, and instead we do
>> > all these stuff, with the last two statements being exactly what
>> > dummy_numa_init() does. Reason is linking, i.e., the fact that
>> > dummy_numa_init() is not exported and you can't reach it from here,
>> > right?

Ah, my bad, I left these comments and they dont make sense :)
>>
>> I think this bit is fine as-is.
>>
> Ok, cool. :-) Shouldn't we then kill or reformulate the comments where
> dummy_numa_init is explicitly referenced then?
>
> E.g., this one: /* will pass to dummy_numa_init */
>
> It might be me, but I find it rather confusing. After seeing that, I'd
> expect to see that, at some point, either the function returns failure
> (which of course we don't want), or a direct call dummy_numa_init().
>
> Dario
>
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)
>



-- 
Elena
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH 1/2] xen: vnuma support for PV guests running as domU.

2013-11-14 Thread Stefano Stabellini
On Wed, 13 Nov 2013, Elena Ufimtseva wrote:
> Issues Xen hypercall subop XENMEM_get_vnumainfo and sets the
> NUMA topology, otherwise sets dummy NUMA node and prevents
> numa_init from calling other numa initializators as they may
> break other guests.
> 
> Signed-off-by: Elena Ufimtseva 
> ---
>  arch/x86/include/asm/xen/vnuma.h |   12 
>  arch/x86/mm/numa.c   |5 ++
>  arch/x86/xen/Makefile|2 +-
>  arch/x86/xen/vnuma.c |  119 
> ++
>  include/xen/interface/memory.h   |   28 +
>  5 files changed, 165 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/include/asm/xen/vnuma.h
>  create mode 100644 arch/x86/xen/vnuma.c
> 
> diff --git a/arch/x86/include/asm/xen/vnuma.h 
> b/arch/x86/include/asm/xen/vnuma.h
> new file mode 100644
> index 000..1ba1e06
> --- /dev/null
> +++ b/arch/x86/include/asm/xen/vnuma.h
> @@ -0,0 +1,12 @@
> +#ifndef _ASM_X86_VNUMA_H
> +#define _ASM_X86_VNUMA_H
> +
> +#ifdef CONFIG_XEN
> +int xen_vnuma_supported(void);
> +int xen_numa_init(void);
> +#else
> +int xen_vnuma_supported(void) { return 0; };
> +int xen_numa_init(void) { return -1; };

static inline?

> +#endif
>
> +#endif /* _ASM_X86_VNUMA_H */
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 8bf93ba..c8a61dc 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -19,6 +19,7 @@
>  #include 
>  
>  #include "numa_internal.h"
> +#include "asm/xen/vnuma.h"
>  
>  int __initdata numa_off;
>  nodemask_t numa_nodes_parsed __initdata;
> @@ -621,6 +622,10 @@ static int __init dummy_numa_init(void)
>  void __init x86_numa_init(void)
>  {
>   if (!numa_off) {
> +#ifdef CONFIG_XEN
> + if (xen_vnuma_supported() && !numa_init(xen_numa_init))
> + return;
> +#endif

Given the non-Xen function definitions above, you can remove the ifdef
CONFIG_XEN here.


>  #ifdef CONFIG_X86_NUMAQ
>   if (!numa_init(numaq_numa_init))
>   return;
> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index 96ab2c0..de9deab 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -13,7 +13,7 @@ CFLAGS_mmu.o:= $(nostackp)
>  obj-y:= enlighten.o setup.o multicalls.o mmu.o irq.o \
>   time.o xen-asm.o xen-asm_$(BITS).o \
>   grant-table.o suspend.o platform-pci-unplug.o \
> - p2m.o
> + p2m.o vnuma.o
>  
>  obj-$(CONFIG_EVENT_TRACING) += trace.o
>  
> diff --git a/arch/x86/xen/vnuma.c b/arch/x86/xen/vnuma.c
> new file mode 100644
> index 000..b4fc667
> --- /dev/null
> +++ b/arch/x86/xen/vnuma.c
> @@ -0,0 +1,119 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#ifdef CONFIG_NUMA
> +
> +/* Checks if hypercall is suported */
   ^ supported
> +int xen_vnuma_supported()
> +{
> + return HYPERVISOR_memory_op(XENMEM_get_vnuma_info, NULL) == -ENOSYS ? 0 
> : 1;
> +}
> +
> +int __init xen_numa_init(void)
> +{
> + int rc;
> + unsigned int i, j, nr_nodes, cpu, idx, pcpus;
> + u64 physm, physd, physc;
> + unsigned int *vdistance, *cpu_to_node;
> + unsigned long mem_size, dist_size, cpu_to_node_size;

physm, physd and physc need to be initialized to 0, otherwise the
vnumaout error path below is erroneous

> + struct vmemrange *vblock;
> +
> + struct vnuma_topology_info numa_topo = {
> + .domid = DOMID_SELF,
> + .__pad = 0
> + };
> + rc = -EINVAL;
> +
> + /* For now only PV guests are supported */
> + if (!xen_pv_domain())
> + return rc;
> +
> + pcpus = num_possible_cpus();
> +
> + mem_size =  pcpus * sizeof(struct vmemrange);
> + dist_size = pcpus * pcpus * sizeof(*numa_topo.vdistance);
> + cpu_to_node_size = pcpus * sizeof(*numa_topo.cpu_to_node);
> +
> + physm = memblock_alloc(mem_size, PAGE_SIZE);
> + vblock = __va(physm);
> +
> + physd = memblock_alloc(dist_size, PAGE_SIZE);
> + vdistance  = __va(physd);
> +
> + physc = memblock_alloc(cpu_to_node_size, PAGE_SIZE);
> + cpu_to_node  = __va(physc);
> +
> + if (!physm || !physc || !physd)
> + goto vnumaout;
> +
> + set_xen_guest_handle(numa_topo.nr_nodes, _nodes);
> + set_xen_guest_handle(numa_topo.vmemblks, vblock);
> + set_xen_guest_handle(numa_topo.vdistance, vdistance);
> + set_xen_guest_handle(numa_topo.cpu_to_node, cpu_to_node);
> +
> + rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, _topo);
> +
> + if (rc < 0)
> + goto vnumaout;
> + if (*numa_topo.nr_nodes == 0) {
> + /* will pass to dummy_numa_init */
> + goto vnumaout;
> + }
> + if (*numa_topo.nr_nodes > num_possible_cpus()) {
> + pr_debug("vNUMA: Node without cpu is not supported in this 
> version.\n");
> + goto 

Re: [Xen-devel] [PATCH 1/2] xen: vnuma support for PV guests running as domU.

2013-11-14 Thread Dario Faggioli
On gio, 2013-11-14 at 11:21 +, David Vrabel wrote:
> On 14/11/13 07:26, Dario Faggioli wrote:
> > IIRC, it's more something that was already happening (the breakage, I
> > mean), than a "safety net" for the unforeseeable future. Might be worth
> > giving some context about it, perhaps referencing the email thread or
> > the git commit hash in the comment.
> 
> Yes, a comment like:
> 
> /*
>  * Set a dummy node and return success.  This prevents calling any
>  * hardware-specific initializers which do not work in a PV guest.
>  */
> 
> is better.  No need to refer to any specific threads.  It's pretty clear
> that any hardware-specific init isn't appropriate for a PV guest.
> 
Ok.

> >> +  if (rc != 0) {
> >> +  for (i = 0; i < MAX_LOCAL_APIC; i++)
> >> +  set_apicid_to_node(i, NUMA_NO_NODE);
> >> +  nodes_clear(numa_nodes_parsed);
> >> +  nodes_clear(node_possible_map);
> >> +  nodes_clear(node_online_map);
> >> +  node_set(0, numa_nodes_parsed);
> >> +  numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
> >> +  }
> >> +  return 0;
> >>
> > Ok, so, we always return 'success', as we were saying during last round.
> > However, we do not call dummy_numa_init() directly, and instead we do
> > all these stuff, with the last two statements being exactly what
> > dummy_numa_init() does. Reason is linking, i.e., the fact that
> > dummy_numa_init() is not exported and you can't reach it from here,
> > right?
> 
> I think this bit is fine as-is.
> 
Ok, cool. :-) Shouldn't we then kill or reformulate the comments where
dummy_numa_init is explicitly referenced then?

E.g., this one: /* will pass to dummy_numa_init */

It might be me, but I find it rather confusing. After seeing that, I'd
expect to see that, at some point, either the function returns failure
(which of course we don't want), or a direct call dummy_numa_init().

Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part


Re: [Xen-devel] [PATCH 1/2] xen: vnuma support for PV guests running as domU.

2013-11-14 Thread David Vrabel
On 14/11/13 07:26, Dario Faggioli wrote:
> On mer, 2013-11-13 at 22:36 -0500, Elena Ufimtseva wrote:
>> +/*
>> + * Set the "dummy" node and exit without error so Linux
>> + * will not try any NUMA init functions which might break
>> + * guests in the future. This will discard all previous
>> + * settings.
>> + */
> IIRC, it's more something that was already happening (the breakage, I
> mean), than a "safety net" for the unforeseeable future. Might be worth
> giving some context about it, perhaps referencing the email thread or
> the git commit hash in the comment.

Yes, a comment like:

/*
 * Set a dummy node and return success.  This prevents calling any
 * hardware-specific initializers which do not work in a PV guest.
 */

is better.  No need to refer to any specific threads.  It's pretty clear
that any hardware-specific init isn't appropriate for a PV guest.

>> +if (rc != 0) {
>> +for (i = 0; i < MAX_LOCAL_APIC; i++)
>> +set_apicid_to_node(i, NUMA_NO_NODE);
>> +nodes_clear(numa_nodes_parsed);
>> +nodes_clear(node_possible_map);
>> +nodes_clear(node_online_map);
>> +node_set(0, numa_nodes_parsed);
>> +numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
>> +}
>> +return 0;
>>
> Ok, so, we always return 'success', as we were saying during last round.
> However, we do not call dummy_numa_init() directly, and instead we do
> all these stuff, with the last two statements being exactly what
> dummy_numa_init() does. Reason is linking, i.e., the fact that
> dummy_numa_init() is not exported and you can't reach it from here,
> right?

I think this bit is fine as-is.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH 1/2] xen: vnuma support for PV guests running as domU.

2013-11-14 Thread David Vrabel
On 14/11/13 07:26, Dario Faggioli wrote:
 On mer, 2013-11-13 at 22:36 -0500, Elena Ufimtseva wrote:
 +/*
 + * Set the dummy node and exit without error so Linux
 + * will not try any NUMA init functions which might break
 + * guests in the future. This will discard all previous
 + * settings.
 + */
 IIRC, it's more something that was already happening (the breakage, I
 mean), than a safety net for the unforeseeable future. Might be worth
 giving some context about it, perhaps referencing the email thread or
 the git commit hash in the comment.

Yes, a comment like:

/*
 * Set a dummy node and return success.  This prevents calling any
 * hardware-specific initializers which do not work in a PV guest.
 */

is better.  No need to refer to any specific threads.  It's pretty clear
that any hardware-specific init isn't appropriate for a PV guest.

 +if (rc != 0) {
 +for (i = 0; i  MAX_LOCAL_APIC; i++)
 +set_apicid_to_node(i, NUMA_NO_NODE);
 +nodes_clear(numa_nodes_parsed);
 +nodes_clear(node_possible_map);
 +nodes_clear(node_online_map);
 +node_set(0, numa_nodes_parsed);
 +numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
 +}
 +return 0;

 Ok, so, we always return 'success', as we were saying during last round.
 However, we do not call dummy_numa_init() directly, and instead we do
 all these stuff, with the last two statements being exactly what
 dummy_numa_init() does. Reason is linking, i.e., the fact that
 dummy_numa_init() is not exported and you can't reach it from here,
 right?

I think this bit is fine as-is.

David
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH 1/2] xen: vnuma support for PV guests running as domU.

2013-11-14 Thread Dario Faggioli
On gio, 2013-11-14 at 11:21 +, David Vrabel wrote:
 On 14/11/13 07:26, Dario Faggioli wrote:
  IIRC, it's more something that was already happening (the breakage, I
  mean), than a safety net for the unforeseeable future. Might be worth
  giving some context about it, perhaps referencing the email thread or
  the git commit hash in the comment.
 
 Yes, a comment like:
 
 /*
  * Set a dummy node and return success.  This prevents calling any
  * hardware-specific initializers which do not work in a PV guest.
  */
 
 is better.  No need to refer to any specific threads.  It's pretty clear
 that any hardware-specific init isn't appropriate for a PV guest.
 
Ok.

  +  if (rc != 0) {
  +  for (i = 0; i  MAX_LOCAL_APIC; i++)
  +  set_apicid_to_node(i, NUMA_NO_NODE);
  +  nodes_clear(numa_nodes_parsed);
  +  nodes_clear(node_possible_map);
  +  nodes_clear(node_online_map);
  +  node_set(0, numa_nodes_parsed);
  +  numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
  +  }
  +  return 0;
 
  Ok, so, we always return 'success', as we were saying during last round.
  However, we do not call dummy_numa_init() directly, and instead we do
  all these stuff, with the last two statements being exactly what
  dummy_numa_init() does. Reason is linking, i.e., the fact that
  dummy_numa_init() is not exported and you can't reach it from here,
  right?
 
 I think this bit is fine as-is.
 
Ok, cool. :-) Shouldn't we then kill or reformulate the comments where
dummy_numa_init is explicitly referenced then?

E.g., this one: /* will pass to dummy_numa_init */

It might be me, but I find it rather confusing. After seeing that, I'd
expect to see that, at some point, either the function returns failure
(which of course we don't want), or a direct call dummy_numa_init().

Dario

-- 
This happens because I choose it to happen! (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part


Re: [Xen-devel] [PATCH 1/2] xen: vnuma support for PV guests running as domU.

2013-11-14 Thread Stefano Stabellini
On Wed, 13 Nov 2013, Elena Ufimtseva wrote:
 Issues Xen hypercall subop XENMEM_get_vnumainfo and sets the
 NUMA topology, otherwise sets dummy NUMA node and prevents
 numa_init from calling other numa initializators as they may
 break other guests.
 
 Signed-off-by: Elena Ufimtseva ufimts...@gmail.com
 ---
  arch/x86/include/asm/xen/vnuma.h |   12 
  arch/x86/mm/numa.c   |5 ++
  arch/x86/xen/Makefile|2 +-
  arch/x86/xen/vnuma.c |  119 
 ++
  include/xen/interface/memory.h   |   28 +
  5 files changed, 165 insertions(+), 1 deletion(-)
  create mode 100644 arch/x86/include/asm/xen/vnuma.h
  create mode 100644 arch/x86/xen/vnuma.c
 
 diff --git a/arch/x86/include/asm/xen/vnuma.h 
 b/arch/x86/include/asm/xen/vnuma.h
 new file mode 100644
 index 000..1ba1e06
 --- /dev/null
 +++ b/arch/x86/include/asm/xen/vnuma.h
 @@ -0,0 +1,12 @@
 +#ifndef _ASM_X86_VNUMA_H
 +#define _ASM_X86_VNUMA_H
 +
 +#ifdef CONFIG_XEN
 +int xen_vnuma_supported(void);
 +int xen_numa_init(void);
 +#else
 +int xen_vnuma_supported(void) { return 0; };
 +int xen_numa_init(void) { return -1; };

static inline?

 +#endif

 +#endif /* _ASM_X86_VNUMA_H */
 diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
 index 8bf93ba..c8a61dc 100644
 --- a/arch/x86/mm/numa.c
 +++ b/arch/x86/mm/numa.c
 @@ -19,6 +19,7 @@
  #include asm/amd_nb.h
  
  #include numa_internal.h
 +#include asm/xen/vnuma.h
  
  int __initdata numa_off;
  nodemask_t numa_nodes_parsed __initdata;
 @@ -621,6 +622,10 @@ static int __init dummy_numa_init(void)
  void __init x86_numa_init(void)
  {
   if (!numa_off) {
 +#ifdef CONFIG_XEN
 + if (xen_vnuma_supported()  !numa_init(xen_numa_init))
 + return;
 +#endif

Given the non-Xen function definitions above, you can remove the ifdef
CONFIG_XEN here.


  #ifdef CONFIG_X86_NUMAQ
   if (!numa_init(numaq_numa_init))
   return;
 diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
 index 96ab2c0..de9deab 100644
 --- a/arch/x86/xen/Makefile
 +++ b/arch/x86/xen/Makefile
 @@ -13,7 +13,7 @@ CFLAGS_mmu.o:= $(nostackp)
  obj-y:= enlighten.o setup.o multicalls.o mmu.o irq.o \
   time.o xen-asm.o xen-asm_$(BITS).o \
   grant-table.o suspend.o platform-pci-unplug.o \
 - p2m.o
 + p2m.o vnuma.o
  
  obj-$(CONFIG_EVENT_TRACING) += trace.o
  
 diff --git a/arch/x86/xen/vnuma.c b/arch/x86/xen/vnuma.c
 new file mode 100644
 index 000..b4fc667
 --- /dev/null
 +++ b/arch/x86/xen/vnuma.c
 @@ -0,0 +1,119 @@
 +#include linux/err.h
 +#include linux/memblock.h
 +#include xen/interface/xen.h
 +#include xen/interface/memory.h
 +#include asm/xen/interface.h
 +#include asm/xen/hypercall.h
 +#include asm/xen/vnuma.h
 +
 +#ifdef CONFIG_NUMA
 +
 +/* Checks if hypercall is suported */
   ^ supported
 +int xen_vnuma_supported()
 +{
 + return HYPERVISOR_memory_op(XENMEM_get_vnuma_info, NULL) == -ENOSYS ? 0 
 : 1;
 +}
 +
 +int __init xen_numa_init(void)
 +{
 + int rc;
 + unsigned int i, j, nr_nodes, cpu, idx, pcpus;
 + u64 physm, physd, physc;
 + unsigned int *vdistance, *cpu_to_node;
 + unsigned long mem_size, dist_size, cpu_to_node_size;

physm, physd and physc need to be initialized to 0, otherwise the
vnumaout error path below is erroneous

 + struct vmemrange *vblock;
 +
 + struct vnuma_topology_info numa_topo = {
 + .domid = DOMID_SELF,
 + .__pad = 0
 + };
 + rc = -EINVAL;
 +
 + /* For now only PV guests are supported */
 + if (!xen_pv_domain())
 + return rc;
 +
 + pcpus = num_possible_cpus();
 +
 + mem_size =  pcpus * sizeof(struct vmemrange);
 + dist_size = pcpus * pcpus * sizeof(*numa_topo.vdistance);
 + cpu_to_node_size = pcpus * sizeof(*numa_topo.cpu_to_node);
 +
 + physm = memblock_alloc(mem_size, PAGE_SIZE);
 + vblock = __va(physm);
 +
 + physd = memblock_alloc(dist_size, PAGE_SIZE);
 + vdistance  = __va(physd);
 +
 + physc = memblock_alloc(cpu_to_node_size, PAGE_SIZE);
 + cpu_to_node  = __va(physc);
 +
 + if (!physm || !physc || !physd)
 + goto vnumaout;
 +
 + set_xen_guest_handle(numa_topo.nr_nodes, nr_nodes);
 + set_xen_guest_handle(numa_topo.vmemblks, vblock);
 + set_xen_guest_handle(numa_topo.vdistance, vdistance);
 + set_xen_guest_handle(numa_topo.cpu_to_node, cpu_to_node);
 +
 + rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, numa_topo);
 +
 + if (rc  0)
 + goto vnumaout;
 + if (*numa_topo.nr_nodes == 0) {
 + /* will pass to dummy_numa_init */
 + goto vnumaout;
 + }
 + if (*numa_topo.nr_nodes  num_possible_cpus()) {
 + pr_debug(vNUMA: Node without cpu is not supported in this 
 version.\n);
 + goto vnumaout;

Re: [Xen-devel] [PATCH 1/2] xen: vnuma support for PV guests running as domU.

2013-11-14 Thread Elena Ufimtseva
On Thu, Nov 14, 2013 at 6:48 AM, Dario Faggioli
dario.faggi...@citrix.com wrote:
 On gio, 2013-11-14 at 11:21 +, David Vrabel wrote:
 On 14/11/13 07:26, Dario Faggioli wrote:
  IIRC, it's more something that was already happening (the breakage, I
  mean), than a safety net for the unforeseeable future. Might be worth
  giving some context about it, perhaps referencing the email thread or
  the git commit hash in the comment.

 Yes, a comment like:

 /*
  * Set a dummy node and return success.  This prevents calling any
  * hardware-specific initializers which do not work in a PV guest.
  */

 is better.  No need to refer to any specific threads.  It's pretty clear
 that any hardware-specific init isn't appropriate for a PV guest.

 Ok.

  +  if (rc != 0) {
  +  for (i = 0; i  MAX_LOCAL_APIC; i++)
  +  set_apicid_to_node(i, NUMA_NO_NODE);
  +  nodes_clear(numa_nodes_parsed);
  +  nodes_clear(node_possible_map);
  +  nodes_clear(node_online_map);
  +  node_set(0, numa_nodes_parsed);
  +  numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
  +  }
  +  return 0;
 
  Ok, so, we always return 'success', as we were saying during last round.
  However, we do not call dummy_numa_init() directly, and instead we do
  all these stuff, with the last two statements being exactly what
  dummy_numa_init() does. Reason is linking, i.e., the fact that
  dummy_numa_init() is not exported and you can't reach it from here,
  right?

Ah, my bad, I left these comments and they dont make sense :)

 I think this bit is fine as-is.

 Ok, cool. :-) Shouldn't we then kill or reformulate the comments where
 dummy_numa_init is explicitly referenced then?

 E.g., this one: /* will pass to dummy_numa_init */

 It might be me, but I find it rather confusing. After seeing that, I'd
 expect to see that, at some point, either the function returns failure
 (which of course we don't want), or a direct call dummy_numa_init().

 Dario

 --
 This happens because I choose it to happen! (Raistlin Majere)
 -
 Dario Faggioli, Ph.D, http://about.me/dario.faggioli
 Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK)




-- 
Elena
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH 1/2] xen: vnuma support for PV guests running as domU.

2013-11-13 Thread Dario Faggioli
On mer, 2013-11-13 at 22:36 -0500, Elena Ufimtseva wrote:
> Signed-off-by: Elena Ufimtseva 
> ---
> +++ b/arch/x86/include/asm/xen/vnuma.h
> @@ -0,0 +1,12 @@
> +#ifndef _ASM_X86_VNUMA_H
> +#define _ASM_X86_VNUMA_H
> +
> +#ifdef CONFIG_XEN
> +int xen_vnuma_supported(void);
> +int xen_numa_init(void);
> +#else
> +int xen_vnuma_supported(void) { return 0; };
> +int xen_numa_init(void) { return -1; };
> +#endif
> +
> +#endif /* _ASM_X86_VNUMA_H */
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
>

> @@ -621,6 +622,10 @@ static int __init dummy_numa_init(void)
>  void __init x86_numa_init(void)
>  {
>   if (!numa_off) {
> +#ifdef CONFIG_XEN
> + if (xen_vnuma_supported() && !numa_init(xen_numa_init))
> + return;
> +#endif
>
This looks a bit weird (looking at both the hunks above, I mean). Isn't
it that, if !CONFIG_XEN, xen_vnuma_supported() and xen_numa_init() are
never called, thus there is very few point in defining a specific
behavior for them?

Oh, having looked at the other patch, I appreciate that at least
xen_vnuma_supported() is (possibly) called even when !CONFIG_XEN. Well,
the above still holds for xen_numa_init(), I guess. :-)

> diff --git a/arch/x86/xen/vnuma.c b/arch/x86/xen/vnuma.c

> +int __init xen_numa_init(void)
> +{

> + mem_size =  pcpus * sizeof(struct vmemrange);
> + dist_size = pcpus * pcpus * sizeof(*numa_topo.vdistance);
> + cpu_to_node_size = pcpus * sizeof(*numa_topo.cpu_to_node);
> +
> + physm = memblock_alloc(mem_size, PAGE_SIZE);
> + vblock = __va(physm);
> +
> + physd = memblock_alloc(dist_size, PAGE_SIZE);
> + vdistance  = __va(physd);
> +
> + physc = memblock_alloc(cpu_to_node_size, PAGE_SIZE);
> + cpu_to_node  = __va(physc);
> +
> + if (!physm || !physc || !physd)
> + goto vnumaout;
> +
It's probably not a big deal, but I think names like 'out', 'err',
'fail', etc. are just fine for labels... No need for having that sort of
func name prefix.

> + set_xen_guest_handle(numa_topo.nr_nodes, _nodes);
> + set_xen_guest_handle(numa_topo.vmemblks, vblock);
> + set_xen_guest_handle(numa_topo.vdistance, vdistance);
> + set_xen_guest_handle(numa_topo.cpu_to_node, cpu_to_node);
> +
> + rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, _topo);
> +
> + if (rc < 0)
> + goto vnumaout;
> + if (*numa_topo.nr_nodes == 0) {
> + /* will pass to dummy_numa_init */
> + goto vnumaout;
> + }
> + if (*numa_topo.nr_nodes > num_possible_cpus()) {
> + pr_debug("vNUMA: Node without cpu is not supported in this 
> version.\n");
> + goto vnumaout;
> + }
Blank line here?

> + /*
> +  * NUMA nodes memory ranges are in pfns, constructed and
> +  * aligned based on e820 ram domain map.
> +  */
> + for (i = 0; i < *numa_topo.nr_nodes; i++) {
> + if (numa_add_memblk(i, vblock[i].start, vblock[i].end))
> + /* pass to numa_dummy_init */
> + goto vnumaout;
> + node_set(i, numa_nodes_parsed);
> + }
And here...
> + setup_nr_node_ids();
...And perhaps even here (but it's less important, I think)...
> + /* Setting the cpu, apicid to node */
> + for_each_cpu(cpu, cpu_possible_mask) {
> + set_apicid_to_node(cpu, cpu_to_node[cpu]);
> + numa_set_node(cpu, cpu_to_node[cpu]);
> + cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node[cpu]]);
> + }
...And here...
> + for (i = 0; i < *numa_topo.nr_nodes; i++) {
> + for (j = 0; j < *numa_topo.nr_nodes; j++) {
> + idx = (j * *numa_topo.nr_nodes) + i;
> + numa_set_distance(i, j, *(vdistance + idx));
> + }
> + }
...And probably here too...
> + rc = 0;
> +vnumaout:
> + if (physm)
> + memblock_free(__pa(physm), mem_size);
> + if (physd)
> + memblock_free(__pa(physd), dist_size);
> + if (physc)
> + memblock_free(__pa(physc), cpu_to_node_size);
...And here.
> + /*
> +  * Set the "dummy" node and exit without error so Linux
> +  * will not try any NUMA init functions which might break
> +  * guests in the future. This will discard all previous
> +  * settings.
> +  */
IIRC, it's more something that was already happening (the breakage, I
mean), than a "safety net" for the unforeseeable future. Might be worth
giving some context about it, perhaps referencing the email thread or
the git commit hash in the comment.

> + if (rc != 0) {
> + for (i = 0; i < MAX_LOCAL_APIC; i++)
> + set_apicid_to_node(i, NUMA_NO_NODE);
> + nodes_clear(numa_nodes_parsed);
> + nodes_clear(node_possible_map);
> + nodes_clear(node_online_map);
> + node_set(0, numa_nodes_parsed);
> + numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
> + }
> + return 0;

Re: [Xen-devel] [PATCH 1/2] xen: vnuma support for PV guests running as domU.

2013-11-13 Thread Dario Faggioli
On mer, 2013-11-13 at 22:36 -0500, Elena Ufimtseva wrote:
 Signed-off-by: Elena Ufimtseva ufimts...@gmail.com
 ---
 +++ b/arch/x86/include/asm/xen/vnuma.h
 @@ -0,0 +1,12 @@
 +#ifndef _ASM_X86_VNUMA_H
 +#define _ASM_X86_VNUMA_H
 +
 +#ifdef CONFIG_XEN
 +int xen_vnuma_supported(void);
 +int xen_numa_init(void);
 +#else
 +int xen_vnuma_supported(void) { return 0; };
 +int xen_numa_init(void) { return -1; };
 +#endif
 +
 +#endif /* _ASM_X86_VNUMA_H */
 diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c


 @@ -621,6 +622,10 @@ static int __init dummy_numa_init(void)
  void __init x86_numa_init(void)
  {
   if (!numa_off) {
 +#ifdef CONFIG_XEN
 + if (xen_vnuma_supported()  !numa_init(xen_numa_init))
 + return;
 +#endif

This looks a bit weird (looking at both the hunks above, I mean). Isn't
it that, if !CONFIG_XEN, xen_vnuma_supported() and xen_numa_init() are
never called, thus there is very few point in defining a specific
behavior for them?

Oh, having looked at the other patch, I appreciate that at least
xen_vnuma_supported() is (possibly) called even when !CONFIG_XEN. Well,
the above still holds for xen_numa_init(), I guess. :-)

 diff --git a/arch/x86/xen/vnuma.c b/arch/x86/xen/vnuma.c

 +int __init xen_numa_init(void)
 +{

 + mem_size =  pcpus * sizeof(struct vmemrange);
 + dist_size = pcpus * pcpus * sizeof(*numa_topo.vdistance);
 + cpu_to_node_size = pcpus * sizeof(*numa_topo.cpu_to_node);
 +
 + physm = memblock_alloc(mem_size, PAGE_SIZE);
 + vblock = __va(physm);
 +
 + physd = memblock_alloc(dist_size, PAGE_SIZE);
 + vdistance  = __va(physd);
 +
 + physc = memblock_alloc(cpu_to_node_size, PAGE_SIZE);
 + cpu_to_node  = __va(physc);
 +
 + if (!physm || !physc || !physd)
 + goto vnumaout;
 +
It's probably not a big deal, but I think names like 'out', 'err',
'fail', etc. are just fine for labels... No need for having that sort of
func name prefix.

 + set_xen_guest_handle(numa_topo.nr_nodes, nr_nodes);
 + set_xen_guest_handle(numa_topo.vmemblks, vblock);
 + set_xen_guest_handle(numa_topo.vdistance, vdistance);
 + set_xen_guest_handle(numa_topo.cpu_to_node, cpu_to_node);
 +
 + rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, numa_topo);
 +
 + if (rc  0)
 + goto vnumaout;
 + if (*numa_topo.nr_nodes == 0) {
 + /* will pass to dummy_numa_init */
 + goto vnumaout;
 + }
 + if (*numa_topo.nr_nodes  num_possible_cpus()) {
 + pr_debug(vNUMA: Node without cpu is not supported in this 
 version.\n);
 + goto vnumaout;
 + }
Blank line here?

 + /*
 +  * NUMA nodes memory ranges are in pfns, constructed and
 +  * aligned based on e820 ram domain map.
 +  */
 + for (i = 0; i  *numa_topo.nr_nodes; i++) {
 + if (numa_add_memblk(i, vblock[i].start, vblock[i].end))
 + /* pass to numa_dummy_init */
 + goto vnumaout;
 + node_set(i, numa_nodes_parsed);
 + }
And here...
 + setup_nr_node_ids();
...And perhaps even here (but it's less important, I think)...
 + /* Setting the cpu, apicid to node */
 + for_each_cpu(cpu, cpu_possible_mask) {
 + set_apicid_to_node(cpu, cpu_to_node[cpu]);
 + numa_set_node(cpu, cpu_to_node[cpu]);
 + cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node[cpu]]);
 + }
...And here...
 + for (i = 0; i  *numa_topo.nr_nodes; i++) {
 + for (j = 0; j  *numa_topo.nr_nodes; j++) {
 + idx = (j * *numa_topo.nr_nodes) + i;
 + numa_set_distance(i, j, *(vdistance + idx));
 + }
 + }
...And probably here too...
 + rc = 0;
 +vnumaout:
 + if (physm)
 + memblock_free(__pa(physm), mem_size);
 + if (physd)
 + memblock_free(__pa(physd), dist_size);
 + if (physc)
 + memblock_free(__pa(physc), cpu_to_node_size);
...And here.
 + /*
 +  * Set the dummy node and exit without error so Linux
 +  * will not try any NUMA init functions which might break
 +  * guests in the future. This will discard all previous
 +  * settings.
 +  */
IIRC, it's more something that was already happening (the breakage, I
mean), than a safety net for the unforeseeable future. Might be worth
giving some context about it, perhaps referencing the email thread or
the git commit hash in the comment.

 + if (rc != 0) {
 + for (i = 0; i  MAX_LOCAL_APIC; i++)
 + set_apicid_to_node(i, NUMA_NO_NODE);
 + nodes_clear(numa_nodes_parsed);
 + nodes_clear(node_possible_map);
 + nodes_clear(node_online_map);
 + node_set(0, numa_nodes_parsed);
 + numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
 + }
 + return 0;

Ok, so, we always return 'success', as we were saying during last round.
However, we do not call