Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Alexander Graf

On 01.11.2013, at 11:21, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 SLOF gets really confused if RTAS/device-tree and everything else
 what SLOF can use is not in the very first block of the very first
 memory node.
 
 This makes sure that the RMA area is where SLOF expects it to be.
 
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: Nikunj A Dadhania nik...@linux.vnet.ibm.com
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 hw/ppc/spapr.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)
 
 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
 index 09dc635..09a5d94 100644
 --- a/hw/ppc/spapr.c
 +++ b/hw/ppc/spapr.c
 @@ -1113,7 +1113,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
 int i;
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *ram = g_new(MemoryRegion, 1);
 -hwaddr rma_alloc_size;
 +hwaddr rma_alloc_size, node0_size;
 uint32_t initrd_base = 0;
 long kernel_size = 0, initrd_size = 0;
 long load_limit, rtas_limit, fw_size;
 @@ -1154,6 +1154,12 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
 spapr-rma_size = MIN(spapr-rma_size, 0x1000);
 }
 }
 +/*
 + * SLOF gets confused if RMA resides not in the first block
 + * of the first memory node so let's fix it.
 + */
 +node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
 +spapr-rma_size = MIN(spapr-rma_size, node0_size);

So if I create a NUMA node of 4MB that will be my RMA? That sounds pretty 
broken, especially on 970.

Why does SLOF have any issues with NUMA memory nodes? It can just ignore them, 
no?


Alex




Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Benjamin Herrenschmidt
On Mon, 2013-11-04 at 11:44 +0100, Alexander Graf wrote:
 On 01.11.2013, at 11:21, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
  SLOF gets really confused if RTAS/device-tree and everything else
  what SLOF can use is not in the very first block of the very first
  memory node.
  
  This makes sure that the RMA area is where SLOF expects it to be.
  
  Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
  Cc: Nikunj A Dadhania nik...@linux.vnet.ibm.com
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
  ---
  hw/ppc/spapr.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)
  
  diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
  index 09dc635..09a5d94 100644
  --- a/hw/ppc/spapr.c
  +++ b/hw/ppc/spapr.c
  @@ -1113,7 +1113,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
  int i;
  MemoryRegion *sysmem = get_system_memory();
  MemoryRegion *ram = g_new(MemoryRegion, 1);
  -hwaddr rma_alloc_size;
  +hwaddr rma_alloc_size, node0_size;
  uint32_t initrd_base = 0;
  long kernel_size = 0, initrd_size = 0;
  long load_limit, rtas_limit, fw_size;
  @@ -1154,6 +1154,12 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
  spapr-rma_size = MIN(spapr-rma_size, 0x1000);
  }
  }
  +/*
  + * SLOF gets confused if RMA resides not in the first block
  + * of the first memory node so let's fix it.
  + */
  +node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
  +spapr-rma_size = MIN(spapr-rma_size, node0_size);
 
 So if I create a NUMA node of 4MB that will be my RMA? That sounds pretty 
 broken, especially on 970.
 
 Why does SLOF have any issues with NUMA memory nodes? It can just ignore 
 them, no?

Because the only way SLOF knows about the RMA is by using the first
reg entry of the first memory node and that's *all* SLOF knows about.

If we start putting things like the DT, SLOF itself, etc... outside of
that region, it will crash.

So we constrain things to the rma that way.

Creating 4M nodes makes no sense anyway

Cheers,
Ben.





Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Alexander Graf

On 04.11.2013, at 11:55, Benjamin Herrenschmidt b...@kernel.crashing.org 
wrote:

 On Mon, 2013-11-04 at 11:44 +0100, Alexander Graf wrote:
 On 01.11.2013, at 11:21, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 SLOF gets really confused if RTAS/device-tree and everything else
 what SLOF can use is not in the very first block of the very first
 memory node.
 
 This makes sure that the RMA area is where SLOF expects it to be.
 
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: Nikunj A Dadhania nik...@linux.vnet.ibm.com
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 hw/ppc/spapr.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)
 
 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
 index 09dc635..09a5d94 100644
 --- a/hw/ppc/spapr.c
 +++ b/hw/ppc/spapr.c
 @@ -1113,7 +1113,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
int i;
MemoryRegion *sysmem = get_system_memory();
MemoryRegion *ram = g_new(MemoryRegion, 1);
 -hwaddr rma_alloc_size;
 +hwaddr rma_alloc_size, node0_size;
uint32_t initrd_base = 0;
long kernel_size = 0, initrd_size = 0;
long load_limit, rtas_limit, fw_size;
 @@ -1154,6 +1154,12 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
spapr-rma_size = MIN(spapr-rma_size, 0x1000);
}
}
 +/*
 + * SLOF gets confused if RMA resides not in the first block
 + * of the first memory node so let's fix it.
 + */
 +node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
 +spapr-rma_size = MIN(spapr-rma_size, node0_size);
 
 So if I create a NUMA node of 4MB that will be my RMA? That sounds pretty 
 broken, especially on 970.
 
 Why does SLOF have any issues with NUMA memory nodes? It can just ignore 
 them, no?
 
 Because the only way SLOF knows about the RMA is by using the first
 reg entry of the first memory node and that's *all* SLOF knows about.
 
 If we start putting things like the DT, SLOF itself, etc... outside of
 that region, it will crash.
 
 So we constrain things to the rma that way.
 
 Creating 4M nodes makes no sense anyway

So why don't we just use the limit VRMA to 256MB code always and error out of 
node0 is smaller? I don't think SLOF can run with less than 256MB anyway.


Alex




Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Thomas Huth
On Mon, 4 Nov 2013 12:28:12 +0100
Alexander Graf ag...@suse.de wrote:

 
 On 04.11.2013, at 11:55, Benjamin Herrenschmidt b...@kernel.crashing.org 
 wrote:
 
  On Mon, 2013-11-04 at 11:44 +0100, Alexander Graf wrote:
  On 01.11.2013, at 11:21, Alexey Kardashevskiy a...@ozlabs.ru wrote:
  
  SLOF gets really confused if RTAS/device-tree and everything else
  what SLOF can use is not in the very first block of the very first
  memory node.
  
  This makes sure that the RMA area is where SLOF expects it to be.
  
  Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
  Cc: Nikunj A Dadhania nik...@linux.vnet.ibm.com
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
  ---
  hw/ppc/spapr.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)
  
  diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
  index 09dc635..09a5d94 100644
  --- a/hw/ppc/spapr.c
  +++ b/hw/ppc/spapr.c
  @@ -1113,7 +1113,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs 
  *args)
 int i;
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *ram = g_new(MemoryRegion, 1);
  -hwaddr rma_alloc_size;
  +hwaddr rma_alloc_size, node0_size;
 uint32_t initrd_base = 0;
 long kernel_size = 0, initrd_size = 0;
 long load_limit, rtas_limit, fw_size;
  @@ -1154,6 +1154,12 @@ static void ppc_spapr_init(QEMUMachineInitArgs 
  *args)
 spapr-rma_size = MIN(spapr-rma_size, 0x1000);
 }
 }
  +/*
  + * SLOF gets confused if RMA resides not in the first block
  + * of the first memory node so let's fix it.
  + */
  +node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
  +spapr-rma_size = MIN(spapr-rma_size, node0_size);
  So if I create a NUMA node of 4MB that will be my RMA? That sounds pretty 
  broken, especially on 970.
  
  Why does SLOF have any issues with NUMA memory nodes? It can just ignore 
  them, no?
  
  Because the only way SLOF knows about the RMA is by using the first
  reg entry of the first memory node and that's *all* SLOF knows about.
  
  If we start putting things like the DT, SLOF itself, etc... outside of
  that region, it will crash.

Ok, the question is whether this is a bug in SLOF and should be fixed
there or whether the RMA should really be limited to the RAM of the
first node only.

Looking at the function spapr_populate_memory(), it seems there is
already similar code there, so I assume the RMA should really be
limited to that size:

/* memory node(s) */
node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
if (spapr-rma_size  node0_size) {
spapr-rma_size = node0_size;
}

Maybe this piece of code could just be done earlier instead, before
setting up the fdt_addr and rtas_addr variables, instead of adding the
similar piece of code of this patch?

  So we constrain things to the rma that way.
  
  Creating 4M nodes makes no sense anyway
 
 So why don't we just use the limit VRMA to 256MB code always and error out 
 of node0 is smaller? I don't think SLOF can run with less than 256MB anyway.

It's 128 MB nowadays ... there is even a define called MIN_RMA_SLOF for
this in the code already.

 Thomas




Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Alexey Kardashevskiy
On 11/04/2013 10:50 PM, Thomas Huth wrote:
 On Mon, 4 Nov 2013 12:28:12 +0100
 Alexander Graf ag...@suse.de wrote:
 

 On 04.11.2013, at 11:55, Benjamin Herrenschmidt b...@kernel.crashing.org 
 wrote:

 On Mon, 2013-11-04 at 11:44 +0100, Alexander Graf wrote:
 On 01.11.2013, at 11:21, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 SLOF gets really confused if RTAS/device-tree and everything else
 what SLOF can use is not in the very first block of the very first
 memory node.

 This makes sure that the RMA area is where SLOF expects it to be.

 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: Nikunj A Dadhania nik...@linux.vnet.ibm.com
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 hw/ppc/spapr.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
 index 09dc635..09a5d94 100644
 --- a/hw/ppc/spapr.c
 +++ b/hw/ppc/spapr.c
 @@ -1113,7 +1113,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs 
 *args)
int i;
MemoryRegion *sysmem = get_system_memory();
MemoryRegion *ram = g_new(MemoryRegion, 1);
 -hwaddr rma_alloc_size;
 +hwaddr rma_alloc_size, node0_size;
uint32_t initrd_base = 0;
long kernel_size = 0, initrd_size = 0;
long load_limit, rtas_limit, fw_size;
 @@ -1154,6 +1154,12 @@ static void ppc_spapr_init(QEMUMachineInitArgs 
 *args)
spapr-rma_size = MIN(spapr-rma_size, 0x1000);
}
}
 +/*
 + * SLOF gets confused if RMA resides not in the first block
 + * of the first memory node so let's fix it.
 + */
 +node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
 +spapr-rma_size = MIN(spapr-rma_size, node0_size);
 So if I create a NUMA node of 4MB that will be my RMA? That sounds pretty 
 broken, especially on 970.

 Why does SLOF have any issues with NUMA memory nodes? It can just ignore 
 them, no?

 Because the only way SLOF knows about the RMA is by using the first
 reg entry of the first memory node and that's *all* SLOF knows about.

 If we start putting things like the DT, SLOF itself, etc... outside of
 that region, it will crash.
 
 Ok, the question is whether this is a bug in SLOF and should be fixed
 there or whether the RMA should really be limited to the RAM of the
 first node only.


PAPR says in Hypervisor Call Functions:

Logical addresses start at zero. When control is initially passed to the
OS from the platform, the first region is the
single RMA. The first region has logical region identifier of zero. This
first region is specified by the first address -
length pair of the “reg” property of the /memory node of the OF device tree.


Question about english - is the single RMA equal to the only RMA?


 Looking at the function spapr_populate_memory(), it seems there is
 already similar code there, so I assume the RMA should really be
 limited to that size:
 
 /* memory node(s) */
 node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
 if (spapr-rma_size  node0_size) {
 spapr-rma_size = node0_size;
 }
 
 Maybe this piece of code could just be done earlier instead, before
 setting up the fdt_addr and rtas_addr variables, instead of adding the
 similar piece of code of this patch?
 
 So we constrain things to the rma that way.

 Creating 4M nodes makes no sense anyway

 So why don't we just use the limit VRMA to 256MB code always and error out 
 of node0 is smaller? I don't think SLOF can run with less than 256MB anyway.
 
 It's 128 MB nowadays ... there is even a define called MIN_RMA_SLOF for
 this in the code already.
 
  Thomas
 


-- 
Alexey



Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Peter Maydell
On 4 November 2013 13:11, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 PAPR says in Hypervisor Call Functions:

 Logical addresses start at zero. When control is initially passed to the
 OS from the platform, the first region is the
 single RMA. The first region has logical region identifier of zero. This
 first region is specified by the first address -
 length pair of the “reg” property of the /memory node of the OF device tree.


 Question about english - is the single RMA equal to the only RMA?

No. the single RMA is weird English and to me implies
that it's a term that's been defined earlier or at least
that there is some surrounding context that would make it
make more sense (eg some contrasting definition of
single RMA vs double RMA).

-- PMM



Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Alexey Kardashevskiy
On 11/05/2013 12:19 AM, Peter Maydell wrote:
 On 4 November 2013 13:11, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 PAPR says in Hypervisor Call Functions:

 Logical addresses start at zero. When control is initially passed to the
 OS from the platform, the first region is the
 single RMA. The first region has logical region identifier of zero. This
 first region is specified by the first address -
 length pair of the “reg” property of the /memory node of the OF device tree.


 Question about english - is the single RMA equal to the only RMA?
 
 No. the single RMA is weird English and to me implies
 that it's a term that's been defined earlier or at least
 that there is some surrounding context that would make it
 make more sense (eg some contrasting definition of
 single RMA vs double RMA).


Oh. Some more context then:

14.1.1 Real Mode Accesses
When the OS controlling an LPAR runs with address translation turned off
(MSRDR or MSRIR bit(s) =0) (real mode)
the LPAR hardware translates the memory addresses to an LPAR unique area
known as the Real Mode Area (RMA).
When control is initially passed to the OS from the platform, the RMA
starts at the LPAR's logical address 0 and is the
first logical memory block reported in the LPAR’s device tree. In general,
the RMA is a subset of the LPAR's logical
address space. Attempting a non relocated access beyond the bounds of the
RMA results in an storage interrupt
(ISI/DSI depending upon instruction or data reference). The RMA hardware
translation scheme is platform dependent.
The options are given below.


-- 
Alexey



Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Benjamin Herrenschmidt
On Tue, 2013-11-05 at 00:11 +1100, Alexey Kardashevskiy wrote:
 Question about english - is the single RMA equal to the only RMA?

Yes.

Ben.




[Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-01 Thread Alexey Kardashevskiy
SLOF gets really confused if RTAS/device-tree and everything else
what SLOF can use is not in the very first block of the very first
memory node.

This makes sure that the RMA area is where SLOF expects it to be.

Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Nikunj A Dadhania nik...@linux.vnet.ibm.com
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 hw/ppc/spapr.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 09dc635..09a5d94 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1113,7 +1113,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
 int i;
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *ram = g_new(MemoryRegion, 1);
-hwaddr rma_alloc_size;
+hwaddr rma_alloc_size, node0_size;
 uint32_t initrd_base = 0;
 long kernel_size = 0, initrd_size = 0;
 long load_limit, rtas_limit, fw_size;
@@ -1154,6 +1154,12 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
 spapr-rma_size = MIN(spapr-rma_size, 0x1000);
 }
 }
+/*
+ * SLOF gets confused if RMA resides not in the first block
+ * of the first memory node so let's fix it.
+ */
+node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
+spapr-rma_size = MIN(spapr-rma_size, node0_size);
 
 /* We place the device tree and RTAS just below either the top of the RMA,
  * or just below 2GB, whichever is lowere, so that it can be
-- 
1.8.4.rc4