Re: RFC: Reducing the number of non volatile GPRs in the ppc64 kernel
I agree with Segher. We already know we have opportunities to do a better job with shrink-wrapping (pushing this kind of useless activity down past early exits), so having examples of code to look at to improve this would be useful.

-- Bill

Bill Schmidt, Ph.D.
Linux on Power Toolchain
IBM Linux Technology Center
wschm...@us.ibm.com
(507) 319-6873

From: Segher Boessenkool seg...@kernel.crashing.org
To: Anton Blanchard an...@samba.org
Cc: linuxppc-dev@lists.ozlabs.org, Michael Gschwind/Watson/IBM@IBMUS, Alan Modra amo...@gmail.com, Bill Schmidt/Rochester/IBM@IBMUS, Ulrich Weigand ulrich.weig...@de.ibm.com, pau...@samba.org
Date: 08/05/2015 06:20 AM
Subject: Re: RFC: Reducing the number of non volatile GPRs in the ppc64 kernel

Hi Anton,

On Wed, Aug 05, 2015 at 02:03:00PM +1000, Anton Blanchard wrote:
> While looking at traces of kernel workloads, I noticed places where gcc used a large number of non volatiles. Some of these functions did very little work, and we spent most of our time saving the non volatiles to the stack and reading them back.

That is something that should be fixed in GCC -- do you have an example of such a function?

> It made me wonder if we have the right ratio of volatile to non volatile GPRs. Since the kernel is completely self contained, we could potentially change that ratio.
>
> Attached is a quick hack to gcc and the kernel to decrease the number of non volatile GPRs to 8. I'm not sure if this is a good idea (and if the volatile to non volatile ratio is right), but this gives us something to play with.

Instead of the GCC hack you can add a bunch of -fcall-used-r14 etc. options; does that not work for you?


Segher

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2 2/4] powerpc/cputable: advertise DSCR support on P7/P7+
On Fri, 2013-05-03 at 17:48 -0700, Nishanth Aravamudan wrote:

Signed-off-by: Nishanth Aravamudan n...@us.ibm.com

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index ae9f433..a792157 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -98,6 +98,7 @@ extern void __restore_cpu_e6500(void);
 			    PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | \
 			    PPC_FEATURE_TRUE_LE | \
 			    PPC_FEATURE_PSERIES_PERFMON_COMPAT)
+#define COMMON_USER2_POWER7	(PPC_FEATURE2_DSCR)
 #define COMMON_USER_POWER8	(COMMON_USER_PPC64 | PPC_FEATURE_ARCH_2_06 |\
 			    PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | \
 			    PPC_FEATURE_TRUE_LE | \
@@ -428,6 +429,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_name		= "POWER7 (architected)",
 		.cpu_features		= CPU_FTRS_POWER7,
 		.cpu_user_features	= COMMON_USER_POWER7,
+		.cpu_user_features2	= COMMON_USER2_POWER7,
 		.mmu_features		= MMU_FTRS_POWER7,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
@@ -458,6 +460,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_name		= "POWER7 (raw)",
 		.cpu_features		= CPU_FTRS_POWER7,
 		.cpu_user_features	= COMMON_USER_POWER7,
+		.cpu_user_features2	= COMMON_USER2_POWER7,
 		.mmu_features		= MMU_FTRS_POWER7,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
@@ -475,6 +478,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_name		= "POWER7+ (raw)",
 		.cpu_features		= CPU_FTRS_POWER7,
 		.cpu_user_features	= COMMON_USER_POWER7,
+		.cpu_user_features	= COMMON_USER2_POWER7,
		   ^ Oops here, I think.

Please consider applying this on top. Untested, but seems obvious. Fix a typo so that the COMMON_USER2_POWER7 bits are assigned to .cpu_user_features2 in the cpu specs table.
Signed-off-by: Will Schmidt will_schm...@vnet.ibm.com

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index c60bbec..51eecb5 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -482,7 +482,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_name		= "POWER7+ (raw)",
 		.cpu_features		= CPU_FTRS_POWER7,
 		.cpu_user_features	= COMMON_USER_POWER7,
-		.cpu_user_features	= COMMON_USER2_POWER7,
+		.cpu_user_features2	= COMMON_USER2_POWER7,
 		.mmu_features		= MMU_FTRS_POWER7,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
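The bug fixed above is easy to make because C silently accepts duplicate designated initializers: the last one wins and the earlier assignment is discarded, so `.cpu_user_features2` was quietly left zero (GCC's `-Woverride-init`, enabled by `-Wextra`, warns about it). A minimal userspace sketch of the effect, using a hypothetical `cpu_spec_demo` struct rather than the kernel's real `struct cpu_spec`:

```c
#include <assert.h>

/* Simplified stand-in for the kernel's struct cpu_spec. */
struct cpu_spec_demo {
	unsigned long cpu_user_features;
	unsigned long cpu_user_features2;
};

/* Placeholder values; the real masks come from cputable.c. */
#define COMMON_USER_POWER7	0x1UL
#define COMMON_USER2_POWER7	0x2UL	/* e.g. PPC_FEATURE2_DSCR */

static struct cpu_spec_demo make_buggy(void)
{
	/* The typo: both initializers name .cpu_user_features. */
	struct cpu_spec_demo s = {
		.cpu_user_features	= COMMON_USER_POWER7,
		.cpu_user_features	= COMMON_USER2_POWER7, /* overrides! */
	};
	return s;
}

static struct cpu_spec_demo make_fixed(void)
{
	struct cpu_spec_demo s = {
		.cpu_user_features	= COMMON_USER_POWER7,
		.cpu_user_features2	= COMMON_USER2_POWER7,
	};
	return s;
}
```

With the typo, the last initializer wins: `cpu_user_features` ends up holding the USER2 bits and `cpu_user_features2` stays zero, which is exactly why userspace never saw DSCR advertised.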
RE: NAND BBT corruption on MPC83xx
Mike:

It is not a permanent damage thing. A read disturb does no permanent damage to the chip, but if the read-disturb event involves more bits than your ECC code can correct, it can do permanent damage to the *DATA* you've stored in that block.

For this reason, a good flash management system at least occasionally reads through *ALL* of the in-use blocks in the device so that single-bit errors can be scrubbed out (read and successfully corrected) before an adjacent bit in the block also fails, which would eventually lead to a multi-bit error beyond the ECC's ability to correct.

As far as I know (and I'm sure the list will correct me if I'm wrong! ;-) ), neither UBI nor UBIFS nor any Linux layer provides this routine scrubbing; you have to code it up yourself, probably by accessing the device at the UBI (underlying block device/LEB) layer.

Atlant

-----Original Message-----
From: linux-mtd-boun...@lists.infradead.org [mailto:linux-mtd-boun...@lists.infradead.org] On Behalf Of Mike Hench
Sent: Saturday, June 18, 2011 13:55
To: Scott Wood; Matthew L. Creech
Cc: linuxppc-dev@lists.ozlabs.org; linux-...@lists.infradead.org
Subject: RE: NAND BBT corruption on MPC83xx

Scott Wood wrote:
> As for the corruption, could it be degradation from repeated reads of that one page?

Read Disturb. I did not know SLC did that. It just takes 10x as long as MLC, on the order of a million reads. Supposedly erasing the block fixes it. It is not a permanent damage thing.

I was seeing ~9 hours before failure with heavy writes. ~4 GByte/hour = 2M pages, total ~18 million reads before errors in that last block showed up.

Cool. Now we know. Thanks.
Mike Hench

______________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
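Atlant's scrubbing suggestion can be sketched as a loop over all in-use blocks: read each one, and whenever the read needed ECC correction, rewrite the block so the error count goes back to zero before a second bit flips. The following is a userspace toy model of that policy (the block struct, error counts, and function names are all illustrative, not the MTD/UBI API):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a NAND block: 'bit_errors' accumulates read-disturb
 * damage; the ECC here can correct at most ECC_STRENGTH flipped bits. */
#define ECC_STRENGTH 1

struct nand_block {
	int in_use;
	int bit_errors;		/* accumulated flipped bits */
};

enum read_status { READ_OK, READ_CORRECTED, READ_UNCORRECTABLE };

static enum read_status read_block(const struct nand_block *b)
{
	if (b->bit_errors == 0)
		return READ_OK;
	if (b->bit_errors <= ECC_STRENGTH)
		return READ_CORRECTED;
	return READ_UNCORRECTABLE;	/* data permanently lost */
}

/* Scrub pass: rewrite (erase + program) any in-use block that needed
 * ECC correction, resetting its error count before another bit fails. */
static void scrub(struct nand_block *blocks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!blocks[i].in_use)
			continue;
		if (read_block(&blocks[i]) == READ_CORRECTED)
			blocks[i].bit_errors = 0;	/* rewritten clean */
	}
}
```

The point of the model: a single-bit error caught by a scrub pass is harmless, while two accumulated errors exceed this 1-bit ECC and the data is gone — which is why the scrub has to visit *all* in-use blocks periodically, not just the ones the workload happens to read.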
[PATCH 1/1 v2 ] Add kernel parameter to disable batched hcalls
This introduces a pair of kernel parameters that can be used to disable the MULTITCE and BULK_REMOVE h-calls. By default, those hcalls are enabled, active, and good for throughput and performance. The ability to disable them will be useful for some of the PREEMPT_RT related investigation and work occurring on Power.

Signed-off-by: Will Schmidt will_schm...@vnet.ibm.com
cc: Olof Johansson o...@lixom.net
cc: Anton Blanchard an...@samba.org
cc: Benjamin Herrenschmidt b...@kernel.crashing.org
---
v2 - Per feedback from Olof, the code is reworked to utilize kernel
     parameter runtime checks, rather than CONFIG options.
   - Added relevant change to kernel-parameters.txt

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e2c7487..5c40801 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -426,6 +426,10 @@ and is between 256 and 4096 characters. It is defined in the file
 	bttv.pll=	See Documentation/video4linux/bttv/Insmod-options
 	bttv.tuner=	and Documentation/video4linux/bttv/CARDLIST
 
+	bulk_remove=off	[PPC] This parameter disables the use of the pSeries
+			firmware feature for flushing multiple hpte entries
+			at a time.
+
 	BusLogic=	[HW,SCSI]
 			See drivers/scsi/BusLogic.c, comment before function
 			BusLogic_ParseDriverOptions().
 
@@ -1499,6 +1503,10 @@ and is between 256 and 4096 characters. It is defined in the file
 	mtdparts=	[MTD]
 			See drivers/mtd/cmdlinepart.c.
 
+	multitce=off	[PPC] This parameter disables the use of the pSeries
+			firmware feature for updating multiple TCE entries
+			at a time.
+
 	onenand.bdry=	[HW,MTD] Flex-OneNAND Boundary Configuration
 			Format: [die0_boundary][,die0_lock][,die1_boundary][,die1_lock]

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 902987d..e174a2f 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -625,3 +625,19 @@ void iommu_init_early_pSeries(void)
 	set_pci_dma_ops(&dma_iommu_ops);
 }
+
+static int __init disable_multitce(char *str)
+{
+	if (strcmp(str, "off") == 0) {
+		if (firmware_has_feature(FW_FEATURE_LPAR)) {
+			if (firmware_has_feature(FW_FEATURE_MULTITCE)) {
+				printk(KERN_INFO "Disabling MULTITCE firmware feature\n");
+				ppc_md.tce_build = tce_build_pSeriesLP;
+				ppc_md.tce_free = tce_free_pSeriesLP;
+				powerpc_firmware_features &= ~FW_FEATURE_MULTITCE;
+			}
+		}
+	}
+	return 1;
+}
+
+__setup("multitce=", disable_multitce);

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 0707653..82d15e7 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -599,6 +599,19 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
 }
+
+static int __init disable_bulk_remove(char *str)
+{
+	if (strcmp(str, "off") == 0) {
+		if (firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
+			printk(KERN_INFO "Disabling BULK_REMOVE firmware feature");
+			powerpc_firmware_features &= ~FW_FEATURE_BULK_REMOVE;
+		}
+	}
+	return 1;
+}
+
+__setup("bulk_remove=", disable_bulk_remove);
+
 void __init hpte_init_lpar(void)
 {
 	ppc_md.hpte_invalidate	= pSeries_lpar_hpte_invalidate;
Re: [PATCH 1/1 v3] Add kernel parameter to disable batched hcalls
This introduces a pair of kernel parameters that can be used to disable the MULTITCE and BULK_REMOVE h-calls. By default, those hcalls are enabled, active, and good for throughput and performance. The ability to disable them will be useful for some of the PREEMPT_RT related investigation and work occurring on Power.

Signed-off-by: Will Schmidt will_schm...@vnet.ibm.com
cc: Olof Johansson o...@lixom.net
cc: Anton Blanchard an...@samba.org
cc: Benjamin Herrenschmidt b...@kernel.crashing.org
---
v2 - Per feedback from Olof, the code is reworked to utilize kernel
     parameter runtime checks, rather than CONFIG options.
   - Added relevant change to kernel-parameters.txt
v3 - ran through checkpatch and fixed whitespace issues.
   - consolidated the if() staircases to save indentation.

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e2c7487..5c40801 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -426,6 +426,10 @@ and is between 256 and 4096 characters. It is defined in the file
 	bttv.pll=	See Documentation/video4linux/bttv/Insmod-options
 	bttv.tuner=	and Documentation/video4linux/bttv/CARDLIST
 
+	bulk_remove=off	[PPC] This parameter disables the use of the pSeries
+			firmware feature for flushing multiple hpte entries
+			at a time.
+
 	BusLogic=	[HW,SCSI]
 			See drivers/scsi/BusLogic.c, comment before function
 			BusLogic_ParseDriverOptions().
 
@@ -1499,6 +1503,10 @@ and is between 256 and 4096 characters. It is defined in the file
 	mtdparts=	[MTD]
 			See drivers/mtd/cmdlinepart.c.
 
+	multitce=off	[PPC] This parameter disables the use of the pSeries
+			firmware feature for updating multiple TCE entries
+			at a time.
+
 	onenand.bdry=	[HW,MTD] Flex-OneNAND Boundary Configuration
 			Format: [die0_boundary][,die0_lock][,die1_boundary][,die1_lock]

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 902987d..e174a2f 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -625,3 +625,17 @@ void iommu_init_early_pSeries(void)
 	set_pci_dma_ops(&dma_iommu_ops);
 }
+
+static int __init disable_multitce(char *str)
+{
+	if (strcmp(str, "off") == 0 &&
+	    firmware_has_feature(FW_FEATURE_LPAR) &&
+	    firmware_has_feature(FW_FEATURE_MULTITCE)) {
+		printk(KERN_INFO "Disabling MULTITCE firmware feature\n");
+		ppc_md.tce_build = tce_build_pSeriesLP;
+		ppc_md.tce_free = tce_free_pSeriesLP;
+		powerpc_firmware_features &= ~FW_FEATURE_MULTITCE;
+	}
+	return 1;
+}
+
+__setup("multitce=", disable_multitce);

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 0707653..82d15e7 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -599,6 +599,18 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
 }
+
+static int __init disable_bulk_remove(char *str)
+{
+	if (strcmp(str, "off") == 0 &&
+	    firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
+		printk(KERN_INFO "Disabling BULK_REMOVE firmware feature");
+		powerpc_firmware_features &= ~FW_FEATURE_BULK_REMOVE;
+	}
+	return 1;
+}
+
+__setup("bulk_remove=", disable_bulk_remove);
+
 void __init hpte_init_lpar(void)
 {
 	ppc_md.hpte_invalidate	= pSeries_lpar_hpte_invalidate;
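The pattern in both `__setup()` handlers is the same: compare the boot argument against the exact string "off", and if the feature is currently advertised, clear its bit from the feature mask with `&= ~FLAG`. A self-contained userspace sketch of that logic (the feature bits and function names mimic the patch but are stand-ins, not the kernel's definitions):

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the kernel's firmware feature bits. */
#define FW_FEATURE_MULTITCE	(1UL << 0)
#define FW_FEATURE_BULK_REMOVE	(1UL << 1)

static unsigned long firmware_features =
	FW_FEATURE_MULTITCE | FW_FEATURE_BULK_REMOVE;

static int has_feature(unsigned long f)
{
	return (firmware_features & f) != 0;
}

/* Mirrors disable_multitce()/disable_bulk_remove(): only the exact
 * argument "off" disables the feature; anything else is ignored. */
static int parse_multitce(const char *str)
{
	if (strcmp(str, "off") == 0 && has_feature(FW_FEATURE_MULTITCE))
		firmware_features &= ~FW_FEATURE_MULTITCE;
	return 1;	/* __setup() handlers return 1 when the arg is consumed */
}
```

Note the deliberate asymmetry: there is no "multitce=on" path. The feature defaults to on when firmware advertises it, and the parameter can only take it away — which is all the PREEMPT_RT investigation needs.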
Re: [PATCH 1/1] Add config option for batched hcalls
On Sat, 2010-09-25 at 22:49 -0500, Olof Johansson wrote:
> On Fri, Sep 24, 2010 at 04:44:15PM -0500, Will Schmidt wrote:
> > Add a config option for the (batched) MULTITCE and BULK_REMOVE h-calls. By default, these options are on and are beneficial for performance and throughput reasons. If disabled, the code will fall back to using less optimal TCE and REMOVE hcalls. The ability to easily disable these options is useful for some of the PREEMPT_RT related investigation and work occurring on Power.
>
> Hi,
>
> I can see why it's useful to enable and disable, but these are all runtime-checked; wouldn't it be more useful to add a bootarg to handle it instead of adding some new config options that pretty much everyone will always go with the defaults on?
>
> The bits are set early, but from looking at where they're used, there doesn't seem to be any harm in disabling them later on when a bootarg is convenient to parse and deal with? It has the benefit of easier on/off testing, if that has any value for production debug down the road.

Hi Olof,

That's a good idea, let me poke at this a bit more, see if I can get bootargs for this.

Thanks,
-Will

> -Olof
[PATCH 1/1] Add config option for batched hcalls
Add a config option for the (batched) MULTITCE and BULK_REMOVE h-calls. By default, these options are on and are beneficial for performance and throughput reasons. If disabled, the code will fall back to using less optimal TCE and REMOVE hcalls. The ability to easily disable these options is useful for some of the PREEMPT_RT related investigation and work occurring on Power.

Signed-off-by: Will Schmidt will_schm...@vnet.ibm.com
cc: Anton Blanchard an...@samba.org
cc: Benjamin Herrenschmidt b...@kernel.crashing.org
---

diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index f0e6f28..0b5e6a9 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -81,3 +81,23 @@ config DTL
 	  which are accessible through a debugfs file.
 
 	  Say N if you are unsure.
+
+config BULK_REMOVE
+	bool "Enable BULK_REMOVE"
+	depends on PPC_PSERIES
+	default y
+	help
+	  Enable the BULK_REMOVE option for the hash page code.
+	  This relies on a hcall-bulk firmware feature, and
+	  should be enabled for performance throughput.
+
+config MULTITCE
+	bool "Enable MultiTCE"
+	depends on PPC_PSERIES
+	default y
+	help
+	  Enable the Multi-TCE code, allowing a single hcall to
+	  update multiple TCE entries at one time. This relies
+	  on a hcall-multi-tce firmware feature, and should be
+	  enabled for performance throughput.
+

diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index 0a4d8c..4327064 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -51,9 +51,13 @@ firmware_features_table[FIRMWARE_MAX_FEATURES] = {
 	{FW_FEATURE_VIO,		"hcall-vio"},
 	{FW_FEATURE_RDMA,		"hcall-rdma"},
 	{FW_FEATURE_LLAN,		"hcall-lLAN"},
+#if defined(CONFIG_BULK_REMOVE)
 	{FW_FEATURE_BULK_REMOVE,	"hcall-bulk"},
+#endif
 	{FW_FEATURE_XDABR,		"hcall-xdabr"},
+#if defined(CONFIG_MULTITCE)
 	{FW_FEATURE_MULTITCE,		"hcall-multi-tce"},
+#endif
 	{FW_FEATURE_SPLPAR,		"hcall-splpar"},
 };
 
 /* Build up the firmware features bitmask using the contents of
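The v1 approach works by compiling entries out of the feature table: if an hcall name is absent from the table, probing the hypervisor's advertised names can never set the corresponding bit, so the feature stays off everywhere. A userspace sketch of that table-driven probe (simplified names and a plain macro instead of Kconfig; the real code walks device-tree properties):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flip to 0 to compile the BULK_REMOVE entry out, as the
 * #if defined(CONFIG_BULK_REMOVE) guard does in firmware.c. */
#define CONFIG_BULK_REMOVE 1

#define FW_FEATURE_MULTITCE	(1UL << 0)
#define FW_FEATURE_BULK_REMOVE	(1UL << 1)

struct fw_entry {
	unsigned long val;
	const char *name;
};

static const struct fw_entry table[] = {
	{ FW_FEATURE_MULTITCE,    "hcall-multi-tce" },
#if CONFIG_BULK_REMOVE
	{ FW_FEATURE_BULK_REMOVE, "hcall-bulk" },
#endif
};

/* Build the feature mask from the hcall names firmware reports:
 * a name only sets a bit if its table entry was compiled in. */
static unsigned long probe(const char **names, size_t n)
{
	unsigned long mask = 0;

	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < sizeof(table) / sizeof(table[0]); j++)
			if (strcmp(names[i], table[j].name) == 0)
				mask |= table[j].val;
	return mask;
}
```

This also shows why Olof's bootarg suggestion is strictly more flexible: the compile-time guard fixes the table for every boot of that kernel image, while the later `__setup()` version clears the same bit at boot time with no rebuild.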
Re: [PATCH 1/2]: Powerpc: Fix EHCA driver on relocatable kernel
On Thu, 19 Aug 2010 23:08:09 -0500 Sonny Rao sonny...@us.ibm.com wrote:

Some modules (like eHCA) want to map all of kernel memory; for this to work with a relocated kernel, we need to export kernstart_addr so modules can use PHYSICAL_START, and memstart_addr so they can use MEMORY_START. Note that the 32-bit code already exports these symbols.

Signed-off-By: Sonny Rao sonny...@us.ibm.com
Acked-by: Alexander Schmidt al...@linux.vnet.ibm.com

Index: common/arch/powerpc/mm/init_64.c
===================================================================
--- common.orig/arch/powerpc/mm/init_64.c	2010-08-16 02:38:33.0 -0500
+++ common/arch/powerpc/mm/init_64.c	2010-08-16 02:39:25.0 -0500
@@ -79,7 +79,9 @@
 #endif /* CONFIG_PPC_STD_MMU_64 */
 
 phys_addr_t memstart_addr = ~0;
+EXPORT_SYMBOL(memstart_addr);
 phys_addr_t kernstart_addr;
+EXPORT_SYMBOL(kernstart_addr);
 
 void free_initmem(void)
 {
Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
dvh...@linux.vnet.ibm.com wrote on 09/02/2010 01:04:28 AM:

> Subject: Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
>
> With this in place, we no longer see the preempt_count dropping below zero. However, if I offline/online a CPU about 246 times I hit the opposite problem, a preempt_count() overflow. There appears to be a missing preempt_enable() somewhere in the offline/online paths.

This (preempt_count overflow) also occurred in mainline (with CONFIG_PREEMPT=y) in 2.6.35, but not in 2.6.36-rc3. A bisect seems to indicate it was fixed with a7c2bb8279d20d853e43c34584eaf2b039de8026 "powerpc: Re-enable preemption before cpu_die()." Which may look familiar. :-) It looks like this patch went to mainline (likely via the powerpc tree), but may not have gotten back into the -rt branch.

-Will

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
Ankita Garg ank...@in.ibm.com wrote on 08/19/2010 10:58:24 AM:

> Subject: Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
>
> Hi Darren,
>
> On Thu, Jul 22, 2010 at 11:24:13AM -0700, Darren Hart wrote:
> > With some instrumentation we were able to determine that the preempt_count() appears to change across the extended_cede_processor() call. Specifically across the plpar_hcall_norets(H_CEDE) call. On PREEMPT_RT we call this with preempt_count=1 and return with preempt_count=0x. On mainline with CONFIG_PREEMPT=y, the value is different (0x65) but is still incorrect.
>
> I was trying to reproduce this issue on a 2.6.33.7-rt29 kernel. I could easily reproduce this on the RT kernel and not the non-RT kernel. However, I hit it every single time I did a cpu online operation. I also noticed that the issue persists even when I disable H_CEDE by passing the cede_offline=0 kernel commandline parameter. Could you pl confirm if you observe the same in your setup ?

If you see it every time, double-check that you have http://patchwork.ozlabs.org/patch/60922/ or an equivalent in your tree. (The patch moves the preempt_enable_no_resched() call up above a call to cpu_die(). An earlier variation called a preempt_enable before calling start_secondary_resume()).

Darren and I have been seeing different problems (different dedicated-processor LPARs on the same physical system). He's seeing scrambled preempt_count values; I'm tending to see a system hang/death [with processor backtraces showing .pseries_mach_cpu_die or .pseries_dedicated_idle_sleep as expected, but no processes running]. I'll be curious which you end up seeing. With 2.6.33.7-rt27 as of a few minutes ago, I'm still seeing a system hang/death.

> However, the issue still remains. Will spend few cycles looking into this issue.

> > Also of interest is that this path cpu_idle()->cpu_die()->pseries_mach_cpu_die() to start_secondary() enters with a preempt_count=1 if it wasn't corrupted across the hcall.
The early boot path from _start however appears to call start_secondary() with a preempt_count of 0. The following patch is most certainly not correct, but it does eliminate the situation on mainline 100% of the time (there is still a 25% reproduction rate on PREEMPT_RT). Can someone comment on:

1) How can the preempt_count() get mangled across the H_CEDE hcall?
2) Should we call preempt_enable() in cpu_idle() prior to cpu_die() ?

Hacked-up-by: Darren Hart dvh...@us.ibm.com

Index: linux-2.6.33.6/arch/powerpc/platforms/pseries/hotplug-cpu.c
===================================================================
--- linux-2.6.33.6.orig/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ linux-2.6.33.6/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -138,6 +138,7 @@ static void pseries_mach_cpu_die(void)
 		 * Kernel stack will be reset and start_secondary()
 		 * will be called to continue the online operation.
 		 */
+		preempt_count() = 0;
 		start_secondary_resume();
 	}
 }

--
Regards,
Ankita Garg (ank...@in.ibm.com)
Linux Technology Center
IBM India Systems Technology Labs, Bangalore, India
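The padded thread_info that appears later in this thread is a classic memory-corruption localization trick: surround the suspect field with sacrificial padding so a stray store from a neighboring field lands in the pads instead of the field, and so the pads themselves record the evidence. A userspace sketch of the idea (the struct layout is illustrative, modeled on the debug thread_info, not the kernel's real one):

```c
#include <assert.h>
#include <string.h>

/* Padded layout in the spirit of the debug thread_info: a wild
 * store aimed near 'flags' or 'preempt_count' hits a pad buffer,
 * leaving 'preempt_count' intact and the corruption detectable. */
struct padded_info {
	int cpu;
	int pad_before[64];
	int preempt_count;
	int pad_after[64];
	unsigned long flags;
};

/* The pads are zero-initialized; any nonzero word means something
 * scribbled where it shouldn't have. */
static int pads_clean(const struct padded_info *ti)
{
	for (int i = 0; i < 64; i++)
		if (ti->pad_before[i] || ti->pad_after[i])
			return 0;
	return 1;
}
```

The same layout also explains the DABR frustration mentioned below: with `preempt_count` packed next to frequently-touched fields, a hardware watchpoint on its cache-line-sized region fires constantly on legitimate neighbors; the padding separates it so a watchpoint hit actually means something.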
Re: [RT,RFC] Hacks allowing -rt to run on POWER7 / Powerpc.
On Sun, 2010-07-11 at 02:49 -0500, Milton Miller wrote:
> On Fri, 09 Jul 2010 about 08:55:01 -, Will Schmidt wrote:
> > We've been seeing some issues with userspace randomly SIGSEGV'ing while running the -RT kernels on POWER7 based systems. After lots of debugging, head scratching, and experimental changes to the code, the problem has been narrowed down such that we can avoid the problems by disabling the TLB batching.
> >
> > After some input from Ben and further debug, we've found that the restoration of the batch->active value near the end of __switch_to() seems to be the key. (The -RT related changes within arch/powerpc/kernel/process.c __switch_to() do the equivalent of an arch_leave_lazy_mmu_mode() before calling _switch, use a hadbatch flag to indicate if batching was active, and then restore that batch->active value on the way out after the call to _switch. That particular code is in the -RT branch, and not found in mainline.)
> >
> > Deferring to Ben (or others in the know) for whether this is the proper solution or if there is something deeper, but..

I believe this is still on Ben's list of things to look at. Between then and now, I'll see if I can get Thomas to pick this up for the -RT tree to keep RT functional on P7 in the mean-time. A bit more debug info below.

> I looked at the patch and noticed 2 changes:
> 1) the batch is checked and cleared after local_irq_save
> 2) enabling the batch is skipped
>
> I talked to Will and had him try moving the local_irq_save above the check for the active batch. That alone did not seem to be enough. However, he confirmed that we are setting batch to active when it is already active in lazy_mmu_enter, meaning that batching is being turned on recursively.
>
> I suggested debug to check that irqs are off after the restore when re-enabling when our debug session timed out.

Based on some of the debug suggestions from Milton:

A WARN_ON for (!irqs_disabled) after local_irq_restore() did not show any hits
(while otherwise continuing to suffer from the tlb batching troubles).

---
	hard_irq_disable();
	last = _switch(old_thread, new_thread);
	local_irq_restore(flags);
	WARN_ON(!irqs_disabled());		<--
#if defined(CONFIG_PPC64) && defined(CONFIG_PREEMPT_RT)
	if (hadbatch) {
		batch = &__get_cpu_var(ppc64_tlb_batch);
		batch->active = 1;
	}
#endif

Another assortment of WARN_ONs in the arch_{enter,leave}_lazy_mmu_mode functions. As Milton stated above, the check for batch->active on the way into the arch_enter_* function did generate lots of hits; the other WARN_ONs did not.

static inline void arch_enter_lazy_mmu_mode(void)
{
	struct ppc64_tlb_batch *batch = &get_cpu_var(ppc64_tlb_batch);

	// WARN_ON(batch->active);	/* lots of hits if enabled */
	WARN_ON(irqs_disabled());	/* nothing */
	batch->active = 1;

static inline void arch_leave_lazy_mmu_mode(void)
{
	struct ppc64_tlb_batch *batch = &get_cpu_var(ppc64_tlb_batch);

	WARN_ON(!batch->active);	/* nothing */
	WARN_ON(irqs_disabled());	/* nothing */

milton

diff -aurp linux-2.6.33.5-rt23.orig/arch/powerpc/kernel/process.c linux-2.6.33.5-rt23.exp/arch/powerpc/kernel/process.c
--- linux-2.6.33.5-rt23.orig/arch/powerpc/kernel/process.c	2010-06-21 11:41:34.402513904 -0500
+++ linux-2.6.33.5-rt23.exp/arch/powerpc/kernel/process.c	2010-07-09 13:15:13.533269904 -0500
@@ -304,10 +304,6 @@ struct task_struct *__switch_to(struct t
 	struct thread_struct *new_thread, *old_thread;
 	unsigned long flags;
 	struct task_struct *last;
-#if defined(CONFIG_PPC64) && defined (CONFIG_PREEMPT_RT)
-	struct ppc64_tlb_batch *batch;
-	int hadbatch;
-#endif
 
 #ifdef CONFIG_SMP
 	/* avoid complexity of lazy save/restore of fpu
@@ -401,16 +397,6 @@ struct task_struct *__switch_to(struct t
 		new_thread->start_tb = current_tb;
 	}
 
-#ifdef CONFIG_PREEMPT_RT
-	batch = &__get_cpu_var(ppc64_tlb_batch);
-	if (batch->active) {
-		hadbatch = 1;
-		if (batch->index) {
-			__flush_tlb_pending(batch);
-		}
-		batch->active = 0;
-	}
-#endif /* #ifdef CONFIG_PREEMPT_RT */
 #endif
 
 	local_irq_save(flags);
@@ -425,16 +411,13 @@ struct task_struct *__switch_to(struct t
 	 * of sync. Hard disable here.
 	 */
 	hard_irq_disable();
-	last = _switch(old_thread, new_thread);
-
-	local_irq_restore(flags);
 
 #if defined(CONFIG_PPC64) && defined(CONFIG_PREEMPT_RT)
-	if (hadbatch) {
-		batch = &__get_cpu_var(ppc64_tlb_batch);
-		batch->active = 1;
-	}
+	arch_leave_lazy_mmu_mode();
 #endif
 
+	last = _switch(old_thread, new_thread);
+
+	local_irq_restore(flags);
 	return last
Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
dvh...@linux.vnet.ibm.com wrote on 07/22/2010 06:57:18 PM:

> Subject: Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
>
> On 07/22/2010 03:25 PM, Benjamin Herrenschmidt wrote:
> > On Thu, 2010-07-22 at 11:24 -0700, Darren Hart wrote:
> > > 1) How can the preempt_count() get mangled across the H_CEDE hcall?
> > > 2) Should we call preempt_enable() in cpu_idle() prior to cpu_die() ?
> >
> > The preempt count is on the thread info at the bottom of the stack. Can you check the stack pointers ?
>
> Hi Ben, thanks for looking. I instrumented the area around extended_cede_processor() as follows (please confirm I'm getting the stack pointer correctly).
>
> while (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
> 	asm("mr %0,1" : "=r" (sp));
> 	printk("before H_CEDE current->stack: %lx, pcnt: %x\n", sp, preempt_count());
> 	extended_cede_processor(cede_latency_hint);
> 	asm("mr %0,1" : "=r" (sp));
> 	printk("after H_CEDE current->stack: %lx, pcnt: %x\n", sp, preempt_count());
> }
>
> On Mainline (2.6.33.6, CONFIG_PREEMPT=y) I see this:
>
> Jul 22 18:37:08 igoort1 kernel: before H_CEDE current->stack: c0010e9e3ce0, pcnt: 1
> Jul 22 18:37:08 igoort1 kernel: after H_CEDE current->stack: c0010e9e3ce0, pcnt: 1
>
> This surprised me as preempt_count is 1 before and after, so no corruption appears to occur on mainline. This makes the pcnt of 65 I see without the preempt_count()=0 hack very strange. I ran several hundred off/on cycles. The issue of preempt_count being 1 is still addressed by this patch however.
>
> On PREEMPT_RT (2.6.33.5-rt23 - tglx, sorry, rt/2.6.33 next time, promise):
>
> Jul 22 18:51:11 igoort1 kernel: before H_CEDE current->stack: c00089bcfcf0, pcnt: 1
> Jul 22 18:51:11 igoort1 kernel: after H_CEDE current->stack: c00089bcfcf0, pcnt:

I'm not seeing the preempt_count value corrupted with my current set of debug; however, I have added buffers to the thread_info struct, so I wonder if I've moved the preempt_count variable out of the way of the corruption. (Still investigating that point..) Why..
because I had been trying to set a DABR on the preempt_count value to catch the corrupter, and due to hits on the nearby _flags fields, getting false positives..

struct thread_info {
	struct task_struct	*task;		/* main task structure */
	struct exec_domain	*exec_domain;	/* execution domain */
	int			cpu;		/* cpu we're on */
	int			pad_buffer[64];
	int			preempt_count;	/* 0 => preemptable, <0 => BUG */
	int			pad_buffer2[256];
	struct restart_block	restart_block;
	unsigned long		local_flags;	/* private flags for thread */
	/* low level flags - has atomic operations done on it */
	unsigned long		flags ____cacheline_aligned_in_smp;
};

In both cases the stack pointer appears unchanged. Note: there is a BUG triggered in between these statements as the preempt_count causes the printk to trigger: Badness at kernel/sched.c:5572

Thanks,

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
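The instrumentation above reads r1 (the powerpc stack pointer) with inline asm to check whether the stack moved across the hcall. A portable analogue of the same "sample the stack before and after the suspect operation" check can be written with the GCC/Clang builtin `__builtin_frame_address(0)`; this is only a sketch of the technique, not the powerpc-specific debug:

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for 'asm("mr %0,1" : "=r" (sp))': the current
 * frame address is a stable proxy for "where is my stack". */
static uintptr_t current_frame(void)
{
	return (uintptr_t)__builtin_frame_address(0);
}

/* Sample the frame address before and after an operation, as the
 * before/after printk pair does around extended_cede_processor().
 * The operation under test (H_CEDE in the mail) is elided here. */
static int stack_unchanged(void)
{
	uintptr_t before = (uintptr_t)__builtin_frame_address(0);
	/* ... operation under test would go here ... */
	uintptr_t after = (uintptr_t)__builtin_frame_address(0);

	return before == after;
}
```

The value of this check in the thread: if the stack pointer is identical across the call but preempt_count changed, the corruption is not "we came back on a different stack" — it is a stray write into the thread_info at the bottom of the same stack, which is what motivated the pad-buffer experiment.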
[PATCH, RT, RFC] Hacks allowing -rt to run on POWER7 / Powerpc.
[PATCH, RT, RFC] Hacks allowing -rt to run on POWER7 / Powerpc.

We've been seeing some issues with userspace randomly SIGSEGV'ing while running the -RT kernels on POWER7 based systems. After lots of debugging, head scratching, and experimental changes to the code, the problem has been narrowed down such that we can avoid the problems by disabling the TLB batching.

After some input from Ben and further debug, we've found that the restoration of the batch->active value near the end of __switch_to() seems to be the key. (The -RT related changes within arch/powerpc/kernel/process.c __switch_to() do the equivalent of an arch_leave_lazy_mmu_mode() before calling _switch, use a hadbatch flag to indicate if batching was active, and then restore that batch->active value on the way out after the call to _switch. That particular code is in the -RT branch, and not found in mainline.)

Deferring to Ben (or others in the know) for whether this is the proper solution or if there is something deeper, but.. IF the right answer is to simply disable the restoration of batch->active, the rest of the CONFIG_PREEMPT_RT changes in __switch_to() should then be replaceable with a single call to arch_leave_lazy_mmu_mode(). The patch here is what I am currently running with, on both POWER6 and POWER7 systems, successfully.
Signed-off-by: Will Schmidt will_schm...@vnet.ibm.com
CC: Ben Herrenschmidt b...@kernel.crashing.org
CC: Thomas Gleixner t...@linutronix.de
---

diff -aurp linux-2.6.33.5-rt23.orig/arch/powerpc/kernel/process.c linux-2.6.33.5-rt23.exp/arch/powerpc/kernel/process.c
--- linux-2.6.33.5-rt23.orig/arch/powerpc/kernel/process.c	2010-06-21 11:41:34.402513904 -0500
+++ linux-2.6.33.5-rt23.exp/arch/powerpc/kernel/process.c	2010-07-09 13:15:13.533269904 -0500
@@ -304,10 +304,6 @@ struct task_struct *__switch_to(struct t
 	struct thread_struct *new_thread, *old_thread;
 	unsigned long flags;
 	struct task_struct *last;
-#if defined(CONFIG_PPC64) && defined (CONFIG_PREEMPT_RT)
-	struct ppc64_tlb_batch *batch;
-	int hadbatch;
-#endif
 
 #ifdef CONFIG_SMP
 	/* avoid complexity of lazy save/restore of fpu
@@ -401,16 +397,6 @@ struct task_struct *__switch_to(struct t
 		new_thread->start_tb = current_tb;
 	}
 
-#ifdef CONFIG_PREEMPT_RT
-	batch = &__get_cpu_var(ppc64_tlb_batch);
-	if (batch->active) {
-		hadbatch = 1;
-		if (batch->index) {
-			__flush_tlb_pending(batch);
-		}
-		batch->active = 0;
-	}
-#endif /* #ifdef CONFIG_PREEMPT_RT */
 #endif
 
 	local_irq_save(flags);
@@ -425,16 +411,13 @@ struct task_struct *__switch_to(struct t
 	 * of sync. Hard disable here.
 	 */
 	hard_irq_disable();
-	last = _switch(old_thread, new_thread);
-
-	local_irq_restore(flags);
 
 #if defined(CONFIG_PPC64) && defined(CONFIG_PREEMPT_RT)
-	if (hadbatch) {
-		batch = &__get_cpu_var(ppc64_tlb_batch);
-		batch->active = 1;
-	}
+	arch_leave_lazy_mmu_mode();
 #endif
 
+	last = _switch(old_thread, new_thread);
+
+	local_irq_restore(flags);
 	return last;
 }
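The lazy-MMU batching the patch is wrestling with can be summarized as: while "lazy" mode is active, TLB invalidations are queued into a per-CPU batch and only issued when the batch fills or when lazy mode is left, which is why calling `arch_leave_lazy_mmu_mode()` before `_switch()` is sufficient. A userspace toy model of those semantics (names and the batch size are illustrative, loosely following `ppc64_tlb_batch`):

```c
#include <assert.h>

#define BATCH_MAX 4

/* Toy per-CPU TLB batch in the spirit of ppc64_tlb_batch. */
struct tlb_batch {
	int active;		/* inside lazy MMU mode? */
	int index;		/* queued invalidations */
	int flushed_total;	/* stats: entries actually flushed */
	int flush_calls;	/* stats: flush operations issued */
};

static void flush_tlb_pending(struct tlb_batch *b)
{
	if (b->index) {
		b->flushed_total += b->index;
		b->flush_calls++;
		b->index = 0;
	}
}

static void enter_lazy_mmu(struct tlb_batch *b) { b->active = 1; }

/* Leaving lazy mode pushes out anything still queued -- this is
 * the property __switch_to() relies on in the patch above. */
static void leave_lazy_mmu(struct tlb_batch *b)
{
	flush_tlb_pending(b);
	b->active = 0;
}

static void invalidate_page(struct tlb_batch *b)
{
	if (!b->active) {		/* not batching: flush at once */
		b->flushed_total++;
		b->flush_calls++;
		return;
	}
	b->index++;
	if (b->index == BATCH_MAX)	/* batch full: flush now */
		flush_tlb_pending(b);
}
```

The payoff is visible in the flush counts: five invalidations inside lazy mode cost two flush operations instead of five, while the leave path guarantees nothing stays queued across a context switch.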
Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
On Thu, 2010-05-20 at 11:05 +0200, Jan-Bernd Themann wrote: Hi Thomas Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY) On Thu, 20 May 2010, Jan-Bernd Themann wrote: Thought more about that. The case at hand (ehea) is nasty: The driver does _NOT_ disable the rx interrupt in the card in the rx interrupt handler - for whatever reason. Yeah I saw that, but I don't know why it's written that way. Perhaps Jan-Bernd or Doug will chime in and enlighten us? :) From our perspective there is no need to disable interrupts for the RX side as the chip does not fire further interrupts until we tell the chip to do so for a particular queue. We have multiple receive The traces tell a different story though: ehea_recv_irq_handler() napi_reschedule() eoi() ehea_poll() ... ehea_recv_irq_handler() ??? napi_reschedule() ... napi_complete() Can't tell whether you can see the same behaviour in mainline, but I don't see a reason why not. Is this the same interrupt we are seeing here, or do we see a second other interrupt popping up on the same CPU? As I said, with multiple receive queues (if enabled) you can have multiple interrupts in parallel. Same interrupt number (260). Per the trace data, the first ehea_recv_irq_handler (at 117.904525) was on cpu 0, the second (at 117.904689) was on cpu 1. 
...-2180 [000] 117.904525: .ehea_recv_irq_handler: ENTER 0 c000e8bd08b0
...-2180 [000] 117.904527: .ehea_recv_irq_handler: napi_reschedule COMpleted c000e8bd08b0
...-2180 [000] 117.904528: .ehea_recv_irq_handler: EXIT reschedule(1) 1 c000e8bd08b0
...-2180 [000] 117.904529: .xics_unmask_irq: xics: unmask virq 260 772
...-2180 [000] 117.904547: .xics_unmask_irq: xics: unmask virq pre-xive 260 772 0 status:0 ff
...-2180 [000] 117.904586: .xics_unmask_irq: xics: unmask virq post-xive 260 772 0 D:11416 status:0 5
...-2180 [000] 117.904602: .handle_fasteoi_irq: 260 8004000
...-2180 [000] 117.904603: .xics_mask_irq: xics: mask virq 260 772
...-2180 [000] 117.904634: .xics_mask_real_irq: xics: before: mask_real 772 status:0 5
...-2180 [000] 117.904668: .xics_mask_real_irq: xics: after: mask_real 772 status:0 ff
...-2180 [000] 117.904669: .handle_fasteoi_irq: pre-action: 260 8004100
...-2180 [000] 117.904671: .handle_fasteoi_irq: post-action: 260 8004100
...-2180 [000] 117.904672: .handle_fasteoi_irq: exit. 260 8004000
...-7    [000] 117.904681: .ehea_poll: ENTER 1 c000e8bd08b0 poll_counter:0 force:0
...-7    [000] 117.904683: .ehea_proc_rwqes: ehea_check_cqe 0
...-2180 [001] 117.904689: .ehea_recv_irq_handler: ENTER 1 c000e8bd08b0
...-7    [000] 117.904690: .ehea_proc_rwqes: ehea_check_cqe 0
...-2180 [001] 117.904691: .ehea_recv_irq_handler: napi_reschedule inCOMplete c000e8bd08b0
...-2180 [001] 117.904692: .ehea_recv_irq_handler: EXIT reschedule(0) 1 c000e8bd08b0
...-2180 [001] 117.904694: .xics_unmask_irq: xics: unmask virq 260 772
...-7    [000] 117.904702: .ehea_refill_rq2: ehea_refill_rq2
...-7    [000] 117.904703: .ehea_refill_rq_def: ehea_refill_rq_def
...-7    [000] 117.904704: .ehea_refill_rq3: ehea_refill_rq3
...-7    [000] 117.904705: .ehea_refill_rq_def: ehea_refill_rq_def
...-7    [000] 117.904706: .napi_complete: napi_complete: ENTER state: 1 c000e8bd08b0
...-7    [000] 117.904707: .napi_complete: napi_complete: EXIT state: 0 c000e8bd08b0
...-7    [000] 117.904710: .ehea_poll: EXIT !cqe rx(2). 0 c000e8bd08b0
...-2180 [001] 117.904719: .xics_unmask_irq: xics: unmask virq pre-xive 260 772 0 status:0 ff
...-2180 [001] 117.904761: .xics_unmask_irq: xics: unmask virq post-xive 260 772 0 D:12705 status:0 5

Please check if multiple queues are enabled. The following module parameter is used for that:

MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0 ");

No module parameters were used, should be plain old defaults. you should also see the number of used HEA interrupts in /proc/interrupts

256: 100000 00 XICS Level ehea_neq
259: 000000 00 XICS Level eth0-aff
260: 36196500000 00 XICS Level eth0-queue0

queues with an own interrupt each so that the interrupts can arrive on multiple CPUs in parallel. Interrupts are enabled again when we leave the NAPI Poll function for the corresponding receive queue. I can't see a piece of code which does that, but that's probably just lack of detailed hardware knowledge on my side.
If you mean the re-enable piece of code, it is not very obvious, you are right. Interrupts are only generated if a particular register for our completion queues is written. We do this in the following line:
Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
On Thu, 2010-05-20 at 16:45 +0200, Thomas Gleixner wrote: On Thu, 20 May 2010, Darren Hart wrote: On 05/20/2010 01:14 AM, Thomas Gleixner wrote: On Thu, 20 May 2010, Jan-Bernd Themann wrote: Thought more about that. The case at hand (ehea) is nasty: The driver does _NOT_ disable the rx interrupt in the card in the rx interrupt handler - for whatever reason. Yeah I saw that, but I don't know why it's written that way. Perhaps Jan-Bernd or Doug will chime in and enlighten us? :) From our perspective there is no need to disable interrupts for the RX side as the chip does not fire further interrupts until we tell the chip to do so for a particular queue. We have multiple receive The traces tell a different story though: ehea_recv_irq_handler() napi_reschedule() eoi() ehea_poll() ... ehea_recv_irq_handler() ??? napi_reschedule() ... napi_complete() Can't tell whether you can see the same behaviour in mainline, but I don't see a reason why not. I was going to suggest that because these are threaded handlers, perhaps they are rescheduled on a different CPU and then receive the interrupt for the other CPU/queue that Jan was mentioning. But, the handlers are affined if I remember correctly, and we aren't running with multiple receive queues. So, we're back to the same question, why are we seeing another irq. It comes in before napi_complete() and therefor before the ehea_reset*() block of calls which do the equivalent of re-enabling interrupts. Can you slap a few trace points into that driver with a stock mainline kernel and verify that ? 2.6.33.4 (non-rt kernel) with similar trace_printk hooks in place... Most data lumps look like so: idle-0 [000] 1097.685337: .handle_fasteoi_irq: ENTER 260 4000 idle-0 [000] 1097.685339: .handle_fasteoi_irq: pre-action 260 4100 idle-0 [000] 1097.685339: .ehea_recv_irq_handler: ENTER c000e8980700 idle-0 [000] 1097.685340: .ehea_recv_irq_handler: napi_schedule ... 
c000e8980700 idle-0 [000] 1097.685341: .ehea_recv_irq_handler: napi_schedule Calling __napi_schedule ... c000e8980700 idle-0 [000] 1097.685342: .ehea_recv_irq_handler: EXIT c000e8980700 idle-0 [000] 1097.685343: .handle_fasteoi_irq: post-action 260 4100 idle-0 [000] 1097.685344: .handle_fasteoi_irq: EXIT. 260 4000 idle-0 [000] 1097.685346: .ehea_poll: ENTER c000e8980700 idle-0 [000] 1097.685352: .napi_complete: napi_complete: ENTER c000e8980700 idle-0 [000] 1097.685352: .napi_complete: napi_complete: EXIT c000e8980700 idle-0 [000] 1097.685355: .ehea_poll: EXIT !cqe rx(1) c000e8980700 But I did see one like this, which shows a ehea_recv_irq_handler ENTER within a ehea_poll ENTER. (which I think is what you were expecting, or wanted to verify..) idle-0 [000] 1097.616261: .handle_fasteoi_irq: ENTER 260 4000 idle-0 [000] 1097.616262: .handle_fasteoi_irq: pre-action 260 4100 * idle-0 [000] 1097.616263: .ehea_recv_irq_handler: ENTER c000e8980700 idle-0 [000] 1097.616264: .ehea_recv_irq_handler: napi_schedule ... c000e8980700 idle-0 [000] 1097.616265: .ehea_recv_irq_handler: napi_schedule Calling __napi_schedule ... c000e8980700 idle-0 [000] 1097.616265: .ehea_recv_irq_handler: EXIT c000e8980700 idle-0 [000] 1097.616266: .handle_fasteoi_irq: post-action 260 4100 idle-0 [000] 1097.616268: .handle_fasteoi_irq: EXIT. 260 4000 * idle-0 [000] 1097.616270: .ehea_poll: ENTER c000e8980700 idle-0 [000] 1097.616282: .handle_fasteoi_irq: ENTER 260 4000 idle-0 [000] 1097.616283: .handle_fasteoi_irq: pre-action 260 4100 * idle-0 [000] 1097.616284: .ehea_recv_irq_handler: ENTER c000e8980700 idle-0 [000] 1097.616285: .ehea_recv_irq_handler: napi_schedule ... c000e8980700 idle-0 [000] 1097.616286: .ehea_recv_irq_handler: napi_schedule NOT Calling __napi_schedule... c000e8980700 idle-0 [000] 1097.616286: .ehea_recv_irq_handler: EXIT c000e8980700 idle-0 [000] 1097.616287: .handle_fasteoi_irq: post-action 260 4100 idle-0 [000] 1097.616289: .handle_fasteoi_irq: EXIT. 
260 4000 idle-0 [000] 1097.616299: .napi_complete: napi_complete: ENTER c000e8980700 idle-0 [000] 1097.616300: .napi_complete: napi_complete: EXIT c000e8980700 idle-0 [000] 1097.616302: .ehea_poll: napi_reschedule COMpleted c000e8980700 idle-0 [000]
Re: [PATCH 25/37] drivers/infiniband: use .dev.of_node instead of .node in struct of_device
On Thu, 11 Mar 2010 11:57:53 -0800 Roland Dreier rdre...@cisco.com wrote: Seems fine... adding EHCA guys just to make sure. .node is being removed Signed-off-by: Grant Likely grant.lik...@secretlab.ca Acked-by: Alexander Schmidt al...@linux.vnet.ibm.com ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: hypervisor call tracepoints hcall_stats touchup.
On Wed, 2009-11-25 at 16:19 -0500, Steven Rostedt wrote: On Wed, 2009-11-25 at 10:12 -0600, Will Schmidt wrote: Tested-by: Will Schmidt will_schm...@vnet.ibm.com Signed-off-by: Will Schmidt will_schm...@vnet.ibm.com Isn't it assumed that the one that made the patch also tested it? Well I would hope that is the norm. I like to make that assumption, but wanted to be clear on the point for anyone with doubts. :-) -- Steve ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
hypervisor call tracepoints hcall_stats touchup.
The tb_total and purr_total values reported via the hcall_stats code should be cumulative, rather than being replaced by the latest delta tb or purr value.

Tested-by: Will Schmidt will_schm...@vnet.ibm.com
Signed-off-by: Will Schmidt will_schm...@vnet.ibm.com
---
[ This is a touch-up to the [3/6] powerpc: tracing: Add hypervisor call tracepoints patch submitted by Anton a few weeks back, so I've copied folks Anton had on CC for his original patch. This fix is rather ppc specific, so it can probably go in via the ppc tree, but I've no real preference. ]

diff --git a/arch/powerpc/platforms/pseries/hvCall_inst.c b/arch/powerpc/platforms/pseries/hvCall_inst.c
index 2f58c71..1fefae7 100644
--- a/arch/powerpc/platforms/pseries/hvCall_inst.c
+++ b/arch/powerpc/platforms/pseries/hvCall_inst.c
@@ -124,8 +124,8 @@ static void probe_hcall_exit(unsigned long opcode, unsigned long retval,
 	h = &__get_cpu_var(hcall_stats)[opcode / 4];
 	h->num_calls++;
-	h->tb_total = mftb() - h->tb_start;
-	h->purr_total = mfspr(SPRN_PURR) - h->purr_start;
+	h->tb_total += mftb() - h->tb_start;
+	h->purr_total += mfspr(SPRN_PURR) - h->purr_start;
 	put_cpu_var(hcall_stats);
 }
[PATCH] ehca: use port autodetect mode as default
This patch sets the port autodetect mode as default for the ehca driver. The autodetect code has been in the kernel for several releases now and has proved to be stable.
---
Roland, please queue this change for 2.6.32 if you are okay with it.

 drivers/infiniband/hw/ehca/ehca_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_main.c
+++ infiniband.git/drivers/infiniband/hw/ehca/ehca_main.c
@@ -52,7 +52,7 @@
 #include "ehca_tools.h"
 #include "hcp_if.h"
-#define HCAD_VERSION "0028"
+#define HCAD_VERSION "0029"
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch rai...@de.ibm.com");
@@ -64,7 +64,7 @@
 static int ehca_hw_level = 0;
 static int ehca_poll_all_eqs = 1;
 int ehca_debug_level = 0;
-int ehca_nr_ports = 2;
+int ehca_nr_ports = -1;
 int ehca_use_hp_mr = 0;
 int ehca_port_act_time = 30;
 int ehca_static_rate = -1;
@@ -95,8 +95,8 @@
 MODULE_PARM_DESC(hw_level,
 		 "Hardware level (0: autosensing (default), 0x10..0x14: eHCA, 0x20..0x23: eHCA2)");
 MODULE_PARM_DESC(nr_ports,
-		 "number of connected ports (-1: autodetect, 1: port one only, 2: two ports (default)");
+		 "number of connected ports (-1: autodetect (default), 1: port one only, 2: two ports)");
 MODULE_PARM_DESC(use_hp_mr,
 		 "Use high performance MRs (default: no)");
 MODULE_PARM_DESC(port_act_time,
Re: [ewg] Re: [PATCH 2.6.31 try 2] ehca: Tolerate dynamic memory operations and huge pages
On Mon, 22 Jun 2009 22:19:21 -0700 Roland Dreier rdre...@cisco.com wrote:

> thanks, applied.
> -#define HCAD_VERSION "0026"
> +#define HCAD_VERSION "0027"
> the driver version was already 0027 (since bde2cfaf), so I dropped this chunk.

thank you for applying, we would like to increase the version number for this patch, so please also apply the following:

ehca: Increment version number for DMEM toleration

Signed-off-by: Alexander Schmidt al...@linux.vnet.ibm.com
---
 drivers/infiniband/hw/ehca/ehca_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_main.c
+++ infiniband.git/drivers/infiniband/hw/ehca/ehca_main.c
@@ -52,7 +52,7 @@
 #include "ehca_tools.h"
 #include "hcp_if.h"
-#define HCAD_VERSION "0027"
+#define HCAD_VERSION "0028"
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch rai...@de.ibm.com");
Re: [ewg] Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages
On Tue, 16 Jun 2009 09:10:39 -0700 Roland Dreier rdre...@cisco.com wrote: Yeah, the notifier code remains untouched as we still do not allow dynamic memory operations _while_ our module is loaded. The patch allows the driver to cope with DMEM operations that happened before the module was loaded, which might result in a non-contiguous memory layout. When the driver registers its global memory region in the system, the memory layout must be considered. We chose the term toleration instead of support to illustrate this. I see. So things just silently broke in some cases when the driver was loaded after operations you didn't tolerate? Anyway, thanks for the explanation. Well, things did not break silently. The registration of the MR failed with an error code which was reported to userspace. Will you push the patch for .31 or .32? Thanks, Alex ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages
Hi Roland, thank you for taking a look at the code! On Fri, 12 Jun 2009 21:50:58 -0700 Roland Dreier rdre...@cisco.com wrote: OK, one major issue with this patch and a few minor nits. First, the major issue is that I don't see anything in the patch that changes the code in ehca_mem_notifier() in ehca_main.c: case MEM_GOING_ONLINE: case MEM_GOING_OFFLINE: /* only ok if no hca is attached to the lpar */ spin_lock_irqsave(shca_list_lock, flags); if (list_empty(shca_list)) { spin_unlock_irqrestore(shca_list_lock, flags); return NOTIFY_OK; } else { spin_unlock_irqrestore(shca_list_lock, flags); if (printk_timed_ratelimit(ehca_dmem_warn_time, 30 * 1000)) ehca_gen_err(DMEM operations are not allowed in conjunction with eHCA); return NOTIFY_BAD; } But your patch description says: This patch implements toleration of dynamic memory operations But it seems you're still going to hit the same NOTIFY_BAD case above after your patch. So something doesn't compute for me. Could you explain more? Yeah, the notifier code remains untouched as we still do not allow dynamic memory operations _while_ our module is loaded. The patch allows the driver to cope with DMEM operations that happened before the module was loaded, which might result in a non-contiguous memory layout. When the driver registers its global memory region in the system, the memory layout must be considered. We chose the term toleration instead of support to illustrate this. I'll put some more details into the changelog, incorporate the other comments and send out a second version of the patch. 
Thanks, Alex Second, a nit: +#define EHCA_REG_MR 0 +#define EHCA_REG_BUSMAP_MR (~0) and you pass these as the reg_busmap parm in: int ehca_reg_mr(struct ehca_shca *shca, struct ehca_mr *e_mr, u64 *iova_start, @@ -991,7 +1031,8 @@ struct ehca_pd *e_pd, struct ehca_mr_pginfo *pginfo, u32 *lkey, /*OUT*/ - u32 *rkey) /*OUT*/ + u32 *rkey, /*OUT*/ + int reg_busmap) and test it as: + if (reg_busmap) + ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo); + else + ret = ehca_reg_mr_rpages(shca, e_mr, pginfo); So the ~0 for true looks a bit odd. One option would be to make reg_busmap a bool, since that's how you're using it, but then you lose the nice self-documenting macro where you call things. So I think it would be cleaner to do something like enum ehca_reg_type { EHCA_REG_MR, EHCA_REG_BUSMAP_MR }; and make the int reg_busmap parameter into enum ehca_reg_type reg_type and have the code become + if (reg_type == EHCA_REG_BUSMAP_MR) + ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo); + else if (reg_type == EHCA_REG_MR) + ret = ehca_reg_mr_rpages(shca, e_mr, pginfo); + else + ret = -EINVAL or something like that. +struct ib_dma_mapping_ops ehca_dma_mapping_ops = { + .mapping_error = ehca_dma_mapping_error, + .map_single = ehca_dma_map_single, + .unmap_single = ehca_dma_unmap_single, + .map_page = ehca_dma_map_page, + .unmap_page = ehca_dma_unmap_page, + .map_sg = ehca_dma_map_sg, + .unmap_sg = ehca_dma_unmap_sg, + .dma_address = ehca_dma_address, + .dma_len = ehca_dma_len, + .sync_single_for_cpu = ehca_dma_sync_single_for_cpu, + .sync_single_for_device = ehca_dma_sync_single_for_device, + .alloc_coherent = ehca_dma_alloc_coherent, + .free_coherent = ehca_dma_free_coherent, +}; I always think structures like this are easier to read if you align the '=' signs. But no big deal. 
+ ret = ehca_create_busmap(); + if (ret) { + ehca_gen_err(Cannot create busmap.); + goto module_init2; + } + ret = ibmebus_register_driver(ehca_driver); if (ret) { ehca_gen_err(Cannot register eHCA device driver); ret = -EINVAL; - goto module_init2; + goto module_init3; } ret = register_memory_notifier(ehca_mem_nb); if (ret) { ehca_gen_err(Failed registering memory add/remove notifier); - goto module_init3; + goto module_init4; Having to renumber unrelated things is when something changes is why I don't like this style of error path labels. But I think it's well and truly too late to fix that in ehca. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2.6.31 try 2] ehca: Tolerate dynamic memory operations and huge pages
From: Hannes Hering heri...@de.ibm.com This patch implements toleration of dynamic memory operations and 16 GB gigantic pages. Toleration means that the driver can cope with dynamic memory operations that happened before the driver was loaded. While using the ehca driver, dynamic memory operations are still prohibited. On module load the driver walks through available system memory, checks for available memory ranges and then registers the kernel internal memory region accordingly. The translation of address ranges is implemented via a 3-level busmap. Signed-off-by: Hannes Hering heri...@de.ibm.com --- This patch is built and tested against infiniband.git. Please apply for 2.6.31. drivers/infiniband/hw/ehca/ehca_main.c | 20 + drivers/infiniband/hw/ehca/ehca_mrmw.c | 508 - drivers/infiniband/hw/ehca/ehca_mrmw.h | 13 3 files changed, 523 insertions(+), 18 deletions(-) --- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_main.c +++ infiniband.git/drivers/infiniband/hw/ehca/ehca_main.c @@ -52,7 +52,7 @@ #include ehca_tools.h #include hcp_if.h -#define HCAD_VERSION 0026 +#define HCAD_VERSION 0027 MODULE_LICENSE(Dual BSD/GPL); MODULE_AUTHOR(Christoph Raisch rai...@de.ibm.com); @@ -506,6 +506,7 @@ static int ehca_init_device(struct ehca_ shca-ib_device.detach_mcast= ehca_detach_mcast; shca-ib_device.process_mad = ehca_process_mad; shca-ib_device.mmap= ehca_mmap; + shca-ib_device.dma_ops = ehca_dma_mapping_ops; if (EHCA_BMASK_GET(HCA_CAP_SRQ, shca-hca_cap)) { shca-ib_device.uverbs_cmd_mask |= @@ -1028,17 +1029,23 @@ static int __init ehca_module_init(void) goto module_init1; } + ret = ehca_create_busmap(); + if (ret) { + ehca_gen_err(Cannot create busmap.); + goto module_init2; + } + ret = ibmebus_register_driver(ehca_driver); if (ret) { ehca_gen_err(Cannot register eHCA device driver); ret = -EINVAL; - goto module_init2; + goto module_init3; } ret = register_memory_notifier(ehca_mem_nb); if (ret) { ehca_gen_err(Failed registering memory add/remove notifier); - goto 
module_init3; + goto module_init4; } if (ehca_poll_all_eqs != 1) { @@ -1053,9 +1060,12 @@ static int __init ehca_module_init(void) return 0; -module_init3: +module_init4: ibmebus_unregister_driver(ehca_driver); +module_init3: + ehca_destroy_busmap(); + module_init2: ehca_destroy_slab_caches(); @@ -1073,6 +1083,8 @@ static void __exit ehca_module_exit(void unregister_memory_notifier(ehca_mem_nb); + ehca_destroy_busmap(); + ehca_destroy_slab_caches(); ehca_destroy_comp_pool(); --- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_mrmw.c +++ infiniband.git/drivers/infiniband/hw/ehca/ehca_mrmw.c @@ -53,6 +53,38 @@ /* max number of rpages (per hcall register_rpages) */ #define MAX_RPAGES 512 +/* DMEM toleration management */ +#define EHCA_SECTSHIFTSECTION_SIZE_BITS +#define EHCA_SECTSIZE (1UL EHCA_SECTSHIFT) +#define EHCA_HUGEPAGESHIFT 34 +#define EHCA_HUGEPAGE_SIZE (1UL EHCA_HUGEPAGESHIFT) +#define EHCA_HUGEPAGE_PFN_MASK ((EHCA_HUGEPAGE_SIZE - 1) PAGE_SHIFT) +#define EHCA_INVAL_ADDR0xULL +#define EHCA_DIR_INDEX_SHIFT 13 /* 8k Entries in 64k block */ +#define EHCA_TOP_INDEX_SHIFT (EHCA_DIR_INDEX_SHIFT * 2) +#define EHCA_MAP_ENTRIES (1 EHCA_DIR_INDEX_SHIFT) +#define EHCA_TOP_MAP_SIZE (0x1) /* currently fixed map size */ +#define EHCA_DIR_MAP_SIZE (0x1) +#define EHCA_ENT_MAP_SIZE (0x1) +#define EHCA_INDEX_MASK (EHCA_MAP_ENTRIES - 1) + +static unsigned long ehca_mr_len; + +/* + * Memory map data structures + */ +struct ehca_dir_bmap { + u64 ent[EHCA_MAP_ENTRIES]; +}; +struct ehca_top_bmap { + struct ehca_dir_bmap *dir[EHCA_MAP_ENTRIES]; +}; +struct ehca_bmap { + struct ehca_top_bmap *top[EHCA_MAP_ENTRIES]; +}; + +static struct ehca_bmap *ehca_bmap; + static struct kmem_cache *mr_cache; static struct kmem_cache *mw_cache; @@ -68,6 +100,8 @@ enum ehca_mr_pgsize { #define EHCA_MR_PGSHIFT1M 20 #define EHCA_MR_PGSHIFT16M 24 +static u64 ehca_map_vaddr(void *caddr); + static u32 ehca_encode_hwpage_size(u32 pgsize) { int log = ilog2(pgsize); @@ -135,7 +169,8 @@ struct ib_mr 
*ehca_get_dma_mr(struct ib_ goto get_dma_mr_exit0; } - ret = ehca_reg_maxmr(shca, e_maxmr, (u64 *)KERNELBASE, + ret = ehca_reg_maxmr(shca, e_maxmr, +(void *)ehca_map_vaddr((void *)KERNELBASE), mr_access_flags, e_pd,
Re: [ewg] Re: [PATCH 1/3] IB/ehca: Replace vmalloc with kmalloc
Hi Roland, did you have a chance to take a look at the patchset and will you apply it, or are there any outstanding issues we need to address? Regards, Alex

On Wed, 22 Apr 2009 16:02:28 +0200 Stefan Roscher ossro...@linux.vnet.ibm.com wrote:

In case of large queue pairs there is the possibility of allocation failures due to memory fragmentation with kmalloc(). To ensure the memory is allocated even if kmalloc() can not find chunks which are big enough, we try to allocate the memory with vmalloc().

Signed-off-by: Stefan Roscher stefan.rosc...@de.ibm.com
---
On Tuesday 21 April 2009 07:34:30 pm Roland Dreier wrote:
> + queue->queue_pages = kmalloc(nr_of_pages * sizeof(void *), GFP_KERNEL);
> How big might this buffer be? Any chance of allocation failure due to memory fragmentation? - R.

Hey Roland, yes you are right and here is the patch to circumvent the described problem. It will apply on top of the patchset. regards Stefan

 drivers/infiniband/hw/ehca/ipz_pt_fn.c | 17 +++++++++++++----
 1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.c b/drivers/infiniband/hw/ehca/ipz_pt_fn.c
index a260559..1227c59 100644
--- a/drivers/infiniband/hw/ehca/ipz_pt_fn.c
+++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.c
@@ -222,8 +222,11 @@ int ipz_queue_ctor(struct ehca_pd *pd, struct ipz_queue *queue,
 	/* allocate queue page pointers */
 	queue->queue_pages = kmalloc(nr_of_pages * sizeof(void *), GFP_KERNEL);
 	if (!queue->queue_pages) {
-		ehca_gen_err("Couldn't allocate queue page list");
-		return 0;
+		queue->queue_pages = vmalloc(nr_of_pages * sizeof(void *));
+		if (!queue->queue_pages) {
+			ehca_gen_err("Couldn't allocate queue page list");
+			return 0;
+		}
 	}
 	memset(queue->queue_pages, 0, nr_of_pages * sizeof(void *));
@@ -240,7 +243,10 @@ int ipz_queue_ctor(struct ehca_pd *pd, struct ipz_queue *queue,
 ipz_queue_ctor_exit0:
 	ehca_gen_err("Couldn't alloc pages queue=%p nr_of_pages=%x", queue, nr_of_pages);
-	kfree(queue->queue_pages);
+	if (is_vmalloc_addr(queue->queue_pages))
+		vfree(queue->queue_pages);
+	else
+		kfree(queue->queue_pages);
 	return 0;
 }
@@ -262,7 +268,10 @@ int ipz_queue_dtor(struct ehca_pd *pd, struct ipz_queue *queue)
 		free_page((unsigned long)queue->queue_pages[i]);
 	}
-	kfree(queue->queue_pages);
+	if (is_vmalloc_addr(queue->queue_pages))
+		vfree(queue->queue_pages);
+	else
+		kfree(queue->queue_pages);
 	return 1;
 }
Re: [ewg] Re: [PATCH 1/3] IB/ehca: Replace vmalloc with kmalloc
On Tue, 28 Apr 2009 07:01:32 -0700 Roland Dreier rdre...@cisco.com wrote: did you have a chance to take a look at the patchset and will you apply it, or are there any outstanding issues we need to address? I guess it's OK, but definitely 2.6.31 material. I guess I'll stick it linux-next soon. - R. Okay with us, thank you very much! Alex ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH] ib/ehca: add flush CQE generation
When a QP goes into error state, it is required that flush CQEs are delivered to the application for any outstanding work requests. eHCA does not do this in hardware, so this patch adds software flush CQE generation to the ehca driver. Whenever a QP gets into error state, it is added to the QP error list of its respective CQ. If the error QP list of a CQ is not empty, poll_cq() generates flush CQEs before polling the actual CQ. Signed-off-by: Alexander Schmidt [EMAIL PROTECTED] --- Applies on top of 2.6.27-rc3, please consider this for 2.6.28. drivers/infiniband/hw/ehca/ehca_classes.h | 14 + drivers/infiniband/hw/ehca/ehca_cq.c |3 drivers/infiniband/hw/ehca/ehca_iverbs.h |2 drivers/infiniband/hw/ehca/ehca_qp.c | 225 -- drivers/infiniband/hw/ehca/ehca_reqs.c| 211 5 files changed, 412 insertions(+), 43 deletions(-) --- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_classes.h +++ infiniband.git/drivers/infiniband/hw/ehca/ehca_classes.h @@ -164,6 +164,13 @@ struct ehca_qmap_entry { u16 reported; }; +struct ehca_queue_map { + struct ehca_qmap_entry *map; + unsigned int entries; + unsigned int tail; + unsigned int left_to_poll; +}; + struct ehca_qp { union { struct ib_qp ib_qp; @@ -173,8 +180,9 @@ struct ehca_qp { enum ehca_ext_qp_type ext_type; enum ib_qp_state state; struct ipz_queue ipz_squeue; - struct ehca_qmap_entry *sq_map; + struct ehca_queue_map sq_map; struct ipz_queue ipz_rqueue; + struct ehca_queue_map rq_map; struct h_galpas galpas; u32 qkey; u32 real_qp_num; @@ -204,6 +212,8 @@ struct ehca_qp { atomic_t nr_events; /* events seen */ wait_queue_head_t wait_completion; int mig_armed; + struct list_head sq_err_node; + struct list_head rq_err_node; }; #define IS_SRQ(qp) (qp-ext_type == EQPT_SRQ) @@ -233,6 +243,8 @@ struct ehca_cq { /* mmap counter for resources mapped into user space */ u32 mm_count_queue; u32 mm_count_galpa; + struct list_head sqp_err_list; + struct list_head rqp_err_list; }; enum ehca_mr_flag { --- 
infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_reqs.c +++ infiniband.git/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -53,9 +53,25 @@ /* in RC traffic, insert an empty RDMA READ every this many packets */ #define ACK_CIRC_THRESHOLD 200 +static u64 replace_wr_id(u64 wr_id, u16 idx) +{ + u64 ret; + + ret = wr_id ~QMAP_IDX_MASK; + ret |= idx QMAP_IDX_MASK; + + return ret; +} + +static u16 get_app_wr_id(u64 wr_id) +{ + return wr_id QMAP_IDX_MASK; +} + static inline int ehca_write_rwqe(struct ipz_queue *ipz_rqueue, struct ehca_wqe *wqe_p, - struct ib_recv_wr *recv_wr) + struct ib_recv_wr *recv_wr, + u32 rq_map_idx) { u8 cnt_ds; if (unlikely((recv_wr-num_sge 0) || @@ -69,7 +85,7 @@ static inline int ehca_write_rwqe(struct /* clear wqe header until sglist */ memset(wqe_p, 0, offsetof(struct ehca_wqe, u.ud_av.sg_list)); - wqe_p-work_request_id = recv_wr-wr_id; + wqe_p-work_request_id = replace_wr_id(recv_wr-wr_id, rq_map_idx); wqe_p-nr_of_data_seg = recv_wr-num_sge; for (cnt_ds = 0; cnt_ds recv_wr-num_sge; cnt_ds++) { @@ -146,6 +162,7 @@ static inline int ehca_write_swqe(struct u64 dma_length; struct ehca_av *my_av; u32 remote_qkey = send_wr-wr.ud.remote_qkey; + struct ehca_qmap_entry *qmap_entry = qp-sq_map.map[sq_map_idx]; if (unlikely((send_wr-num_sge 0) || (send_wr-num_sge qp-ipz_squeue.act_nr_of_sg))) { @@ -158,11 +175,10 @@ static inline int ehca_write_swqe(struct /* clear wqe header until sglist */ memset(wqe_p, 0, offsetof(struct ehca_wqe, u.ud_av.sg_list)); - wqe_p-work_request_id = send_wr-wr_id ~QMAP_IDX_MASK; - wqe_p-work_request_id |= sq_map_idx QMAP_IDX_MASK; + wqe_p-work_request_id = replace_wr_id(send_wr-wr_id, sq_map_idx); - qp-sq_map[sq_map_idx].app_wr_id = send_wr-wr_id QMAP_IDX_MASK; - qp-sq_map[sq_map_idx].reported = 0; + qmap_entry-app_wr_id = get_app_wr_id(send_wr-wr_id); + qmap_entry-reported = 0; switch (send_wr-opcode) { case IB_WR_SEND: @@ -496,7 +512,9 @@ static int internal_post_recv(struct ehc struct ehca_wqe *wqe_p; int wqe_cnt = 0; int ret 
= 0; + u32 rq_map_idx; unsigned long flags; + struct ehca_qmap_entry *qmap_entry; if (unlikely(!HAS_RQ(my_qp))) { ehca_err(dev, QP has no RQ ehca_qp=%p qp_num=%x ext_type=%d, @@ -524,8 +542,15 @@ static int internal_post_recv(struct ehc
cache model of ppc
Dear all,

I have some questions about the PowerPC cache model. How many caches are there? How many caches are in multi-core systems? Do you have an L1 cache per cpu and one shared L2? Or is there a victim L3 cache? Can you tell me something about the path through the cache? For example, I look up data in L1, but this was a miss. Then I look up in L2, and this was also a miss, but the data is in another L1 cache of another cpu. Is there a 1.5 or 2.5 hit? Can you help me please? This is for my bachelor thesis.

Thanks a lot,
Felix
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 0/5] ib/ehca: Fix stability issues
Hi Roland, the following patchset contains four small fixes and one bigger patch (5/5) for addressing some ehca issues we found during cluster test. [1/5] update qp_state on cached modify_qp() [2/5] rename goto label in ehca_poll_cq_one() [3/5] repoll on invalid opcode instead of returning success [4/5] check idr_find() return value [5/5] discard double CQE for one WR They all apply on top of 2.6.27-rc1. If possible, we would like to get them into 2.6.27. Regards, Alexander Schmidt ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 2/5] ib/ehca: rename goto label
Rename the poll_cq_one_read_cqe goto label to what it actually does, repoll.

Signed-off-by: Alexander Schmidt [EMAIL PROTECTED]
---
 drivers/infiniband/hw/ehca/ehca_reqs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_reqs.c
+++ infiniband.git/drivers/infiniband/hw/ehca/ehca_reqs.c
@@ -589,7 +589,7 @@ static inline int ehca_poll_cq_one(struc
 	struct ehca_qp *my_qp;
 	int cqe_count = 0, is_error;
-poll_cq_one_read_cqe:
+repoll:
 	cqe = (struct ehca_cqe *)
 		ipz_qeit_get_inc_valid(&my_cq->ipz_queue);
 	if (!cqe) {
@@ -617,7 +617,7 @@ poll_cq_one_read_cqe:
 		ehca_dmp(cqe, 64, "cq_num=%x qp_num=%x", my_cq->cq_number, cqe->local_qp_number);
 		/* ignore this purged cqe */
-		goto poll_cq_one_read_cqe;
+		goto repoll;
 	}
 	spin_lock_irqsave(&qp->spinlock_s, flags);
 	purgeflag = qp->sqerr_purgeflag;
@@ -636,7 +636,7 @@ poll_cq_one_read_cqe:
 	 * that caused sqe and turn off purge flag
 	 */
 	qp->sqerr_purgeflag = 0;
-	goto poll_cq_one_read_cqe;
+	goto repoll;
 	}
 }
[PATCH 5/5] ib/ehca: discard double CQE for one WR
Under rare circumstances, the ehca hardware might erroneously generate two CQEs for the same WQE, which is not compliant with the IB spec and will cause unpredictable errors such as memory being freed twice. To avoid this problem, the driver needs to detect the second CQE and discard it.

For this purpose, introduce an array holding as many elements as the SQ of the QP, called sq_map. Each sq_map entry stores a "reported" flag for one WQE in the SQ. When a work request is posted to the SQ, the respective reported flag is set to zero. After the arrival of a CQE, the flag is set to 1, which allows the occurrence of a second CQE to be detected. The mapping between WQE / CQE and the corresponding sq_map element is implemented by replacing the lowest 16 bits of the wr_id with the index in the queue map. The original 16 bits are stored in the sq_map entry and are restored when the CQE is passed to the application.

Signed-off-by: Alexander Schmidt [EMAIL PROTECTED]
---
 drivers/infiniband/hw/ehca/ehca_classes.h |  9 +
 drivers/infiniband/hw/ehca/ehca_qes.h     |  1
 drivers/infiniband/hw/ehca/ehca_qp.c      | 34 +-
 drivers/infiniband/hw/ehca/ehca_reqs.c    | 54 +++---
 4 files changed, 78 insertions(+), 20 deletions(-)

--- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_reqs.c
+++ infiniband.git/drivers/infiniband/hw/ehca/ehca_reqs.c
@@ -139,6 +139,7 @@ static void trace_send_wr_ud(const struc
 static inline int ehca_write_swqe(struct ehca_qp *qp,
 				  struct ehca_wqe *wqe_p,
 				  const struct ib_send_wr *send_wr,
+				  u32 sq_map_idx,
 				  int hidden)
 {
 	u32 idx;
@@ -157,7 +158,11 @@ static inline int ehca_write_swqe(struct
 	/* clear wqe header until sglist */
 	memset(wqe_p, 0, offsetof(struct ehca_wqe, u.ud_av.sg_list));
 
-	wqe_p->work_request_id = send_wr->wr_id;
+	wqe_p->work_request_id = send_wr->wr_id & ~QMAP_IDX_MASK;
+	wqe_p->work_request_id |= sq_map_idx & QMAP_IDX_MASK;
+
+	qp->sq_map[sq_map_idx].app_wr_id = send_wr->wr_id & QMAP_IDX_MASK;
+	qp->sq_map[sq_map_idx].reported = 0;
 
 	switch (send_wr->opcode) {
 	case IB_WR_SEND:
@@ -381,6 +386,7 @@ static inline int post_one_send(struct e
 {
 	struct ehca_wqe *wqe_p;
 	int ret;
+	u32 sq_map_idx;
 	u64 start_offset = my_qp->ipz_squeue.current_q_offset;
 
 	/* get pointer next to free WQE */
@@ -393,8 +399,15 @@ static inline int post_one_send(struct e
 			 "qp_num=%x", my_qp->ib_qp.qp_num);
 		return -ENOMEM;
 	}
+
+	/*
+	 * Get the index of the WQE in the send queue. The same index is used
+	 * for writing into the sq_map.
+	 */
+	sq_map_idx = start_offset / my_qp->ipz_squeue.qe_size;
+
 	/* write a SEND WQE into the QUEUE */
-	ret = ehca_write_swqe(my_qp, wqe_p, cur_send_wr, hidden);
+	ret = ehca_write_swqe(my_qp, wqe_p, cur_send_wr, sq_map_idx, hidden);
 	/*
 	 * if something failed,
 	 * reset the free entry pointer to the start value
@@ -654,8 +667,34 @@ repoll:
 			 my_cq, my_cq->cq_number);
 	}
 
-	/* we got a completion! */
-	wc->wr_id = cqe->work_request_id;
+	read_lock(&ehca_qp_idr_lock);
+	my_qp = idr_find(&ehca_qp_idr, cqe->qp_token);
+	read_unlock(&ehca_qp_idr_lock);
+	if (!my_qp)
+		goto repoll;
+	wc->qp = &my_qp->ib_qp;
+
+	if (!(cqe->w_completion_flags & WC_SEND_RECEIVE_BIT)) {
+		struct ehca_qmap_entry *qmap_entry;
+		/*
+		 * We got a send completion and need to restore the original
+		 * wr_id.
+		 */
+		qmap_entry = &my_qp->sq_map[cqe->work_request_id &
+					    QMAP_IDX_MASK];
+
+		if (qmap_entry->reported) {
+			ehca_warn(cq->device, "Double cqe on qp_num=%#x",
+				  my_qp->real_qp_num);
+			/* found a double cqe, discard it and read next one */
+			goto repoll;
+		}
+		wc->wr_id = cqe->work_request_id & ~QMAP_IDX_MASK;
+		wc->wr_id |= qmap_entry->app_wr_id;
+		qmap_entry->reported = 1;
+	} else
+		/* We got a receive completion. */
+		wc->wr_id = cqe->work_request_id;
 
 	/* eval ib_wc_opcode */
 	wc->opcode = ib_wc_opcode[cqe->optype] - 1;
@@ -678,13 +717,6 @@ repoll:
 	} else
 		wc->status = IB_WC_SUCCESS;
 
-	read_lock(&ehca_qp_idr_lock);
-	my_qp = idr_find(&ehca_qp_idr, cqe->qp_token);
-	read_unlock(&ehca_qp_idr_lock);
-	if (!my_qp)
-		goto repoll;
-	wc->qp = &my_qp->ib_qp;
-
 	wc->byte_len = cqe->nr_bytes_transferred;
 	wc->pkey_index = cqe->pkey_index;
 	wc->slid = cqe->rlid
[PATCH 3/5] ib/ehca: repoll on invalid opcode
When the ehca driver detects an invalid opcode in a CQE, it currently passes the CQE to the application and returns with success. This patch changes the CQE handling to discard CQEs with invalid opcodes and to continue reading the next CQE from the CQ.

Signed-off-by: Alexander Schmidt [EMAIL PROTECTED]
---
 drivers/infiniband/hw/ehca/ehca_reqs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_reqs.c
+++ infiniband.git/drivers/infiniband/hw/ehca/ehca_reqs.c
@@ -667,7 +667,7 @@ repoll:
 		ehca_dmp(cqe, 64, "ehca_cq=%p cq_num=%x",
 			 my_cq, my_cq->cq_number);
 		/* update also queue adder to throw away this entry!!! */
-		goto poll_cq_one_exit0;
+		goto repoll;
 	}
 
 	/* eval ib_wc_status */
[PATCH 4/5] ib/ehca: check idr_find() return value
The idr_find() function may fail when trying to get the QP that is associated with a CQE, e.g. when a QP has been destroyed between the generation of a CQE and the poll request for it. In consequence, the return value of idr_find() must be checked and the CQE must be discarded when the QP cannot be found.

Signed-off-by: Alexander Schmidt [EMAIL PROTECTED]
---
 drivers/infiniband/hw/ehca/ehca_reqs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_reqs.c
+++ infiniband.git/drivers/infiniband/hw/ehca/ehca_reqs.c
@@ -680,8 +680,10 @@ repoll:
 	read_lock(&ehca_qp_idr_lock);
 	my_qp = idr_find(&ehca_qp_idr, cqe->qp_token);
-	wc->qp = &my_qp->ib_qp;
 	read_unlock(&ehca_qp_idr_lock);
+	if (!my_qp)
+		goto repoll;
+	wc->qp = &my_qp->ib_qp;
 
 	wc->byte_len = cqe->nr_bytes_transferred;
 	wc->pkey_index = cqe->pkey_index;
[PATCH 1/5 try2] ib/ehca: update qp_state on cached modify_qp()
Since the introduction of the port auto-detect mode for ehca, calls to modify_qp() may be cached in the device driver when the ports are not activated yet. When a modify_qp() call is cached, the qp state remains untouched until the port is activated, which will leave the qp in the reset state. In the reset state, however, it is not allowed to post SQ WQEs, which confuses applications like ib_mad.

The solution for this problem is to immediately set the qp state as requested by modify_qp(), even when the call is cached.

Signed-off-by: Alexander Schmidt [EMAIL PROTECTED]
---
 drivers/infiniband/hw/ehca/ehca_qp.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

--- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_qp.c
+++ infiniband.git/drivers/infiniband/hw/ehca/ehca_qp.c
@@ -1534,8 +1534,6 @@ static int internal_modify_qp(struct ib_
 	if (attr_mask & IB_QP_QKEY)
 		my_qp->qkey = attr->qkey;
 
-	my_qp->state = qp_new_state;
-
 modify_qp_exit2:
 	if (squeue_locked) { /* this means: sqe -> rts */
 		spin_unlock_irqrestore(&my_qp->spinlock_s, flags);
@@ -1551,6 +1549,8 @@ modify_qp_exit1:
 int ehca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask,
 		   struct ib_udata *udata)
 {
+	int ret = 0;
+
 	struct ehca_shca *shca = container_of(ibqp->device, struct ehca_shca,
 					      ib_device);
 	struct ehca_qp *my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
@@ -1597,12 +1597,18 @@ int ehca_modify_qp(struct ib_qp *ibqp, s
 				 attr->qp_state, my_qp->init_attr.port_num,
 				 ibqp->qp_type);
 			spin_unlock_irqrestore(&sport->mod_sqp_lock, flags);
-			return 0;
+			goto out;
 		}
 		spin_unlock_irqrestore(&sport->mod_sqp_lock, flags);
 	}
 
-	return internal_modify_qp(ibqp, attr, attr_mask, 0);
+	ret = internal_modify_qp(ibqp, attr, attr_mask, 0);
+
+out:
+	if ((ret == 0) && (attr_mask & IB_QP_STATE))
+		my_qp->state = attr->qp_state;
+
+	return ret;
 }
 
 void ehca_recover_sqp(struct ib_qp *sqp)
[PATCH 0/5 try2] ib/ehca: Fix stability issues
Hi Roland, Sorry, the first set was mangled because of a broken mailer, so here it is again, double checked... the following patchset contains four small fixes and one bigger patch (5/5) for addressing some ehca issues we found during cluster test. [1/5] update qp_state on cached modify_qp() [2/5] rename goto label in ehca_poll_cq_one() [3/5] repoll on invalid opcode instead of returning success [4/5] check idr_find() return value [5/5] discard double CQE for one WR They all apply on top of 2.6.27-rc1. If possible, we would like to get them into 2.6.27. Regards, Alexander Schmidt ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH v3] update xmon slb code.
[powerpc] update xmon slb code

This adds a bit more detail to the xmon SLB output. When the valid bit is set, this displays the ESID and VSID values, as well as decoding the segment size (1T or 256M) and displaying the LLP bits. This suppresses the output for any SLB entries that contain only zeros.

Sample output from power6 (1T segment support):
00 c800 40004f7ca3000500 1T ESID= c0 VSID= 4f7ca3 LLP:100
01 d800 4000eb71b400 1T ESID= d0 VSID= eb71b0 LLP: 0
08 1800 c8499f8ccc80 256M ESID=1 VSID=c8499f8cc LLP: 0
09 f800 d2c1a8e46c80 256M ESID=f VSID=d2c1a8e46 LLP: 0
10 4800 ca87eab1dc80 256M ESID=4 VSID=ca87eab1d LLP: 0
43 cf000800 400011b26500 1T ESID= cf VSID= 11b260 LLP:100

Sample output from power5 (notice the non-valid but non-zero entries):
10 0800 4fd0e077ac80 256M ESID=0 VSID=4fd0e077a LLP: 0
11 f800 5b085830fc80 256M ESID=f VSID=5b085830f LLP: 0
12 4800 52ce99fe6c80 256M ESID=4 VSID=52ce99fe6 LLP: 0
13 1800 50904ed95c80 256M ESID=1 VSID=50904ed95 LLP: 0
14 cf000800 d59aca40f500 256M ESID=cf000 VSID=d59aca40f LLP:100
15 c0007800 45cb97751500 256M ESID=c0007 VSID=45cb97751 LLP:100

Tested on power5 and power6.

Signed-Off-By: Will Schmidt [EMAIL PROTECTED]
---
This is a resend; this latest respin is updated to apply on top of Mikey's slb_mmu_size change. (Earlier updates were made per comments from Olof, Ben, and Paul.) This version adds padding around the ESID and VSID fields, and the LLP bits are displayed too. Counting bits, the VSID output looks to be as large as 51 bits, which requires up to 13 spaces. This doesn't count the B field bits, which are now masked off the top end of the VSID output. I'll try to follow up sometime later with code that will handle decoding page sizes. I don't have a testcase handy to properly exercise that yet.
:-)
---
 arch/powerpc/xmon/xmon.c | 29 +++++++++++++++++++++-------
 1 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 121b04d..5314db7 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2539,16 +2539,33 @@ static void xmon_print_symbol(unsigned long address, const char *mid,
 static void dump_slb(void)
 {
 	int i;
-	unsigned long tmp;
+	unsigned long esid, vsid, valid;
+	unsigned long llp;
 
 	printf("SLB contents of cpu %x\n", smp_processor_id());
 
 	for (i = 0; i < mmu_slb_size; i++) {
-		asm volatile("slbmfee  %0,%1" : "=r" (tmp) : "r" (i));
-		printf("%02d %016lx ", i, tmp);
-
-		asm volatile("slbmfev  %0,%1" : "=r" (tmp) : "r" (i));
-		printf("%016lx\n", tmp);
+		asm volatile("slbmfee  %0,%1" : "=r" (esid) : "r" (i));
+		asm volatile("slbmfev  %0,%1" : "=r" (vsid) : "r" (i));
+		valid = (esid & SLB_ESID_V);
+		if (valid | esid | vsid) {
+			printf("%02d %016lx %016lx", i, esid, vsid);
+			if (valid) {
+				llp = vsid & SLB_VSID_LLP;
+				if (vsid & SLB_VSID_B_1T) {
+					printf("  1T  ESID=%9lx  VSID=%13lx LLP:%3lx \n",
+					       GET_ESID_1T(esid),
+					       (vsid & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T,
+					       llp);
+				} else {
+					printf(" 256M ESID=%9lx  VSID=%13lx LLP:%3lx \n",
+					       GET_ESID(esid),
+					       (vsid & ~SLB_VSID_B) >> SLB_VSID_SHIFT,
+					       llp);
+				}
+			} else
+				printf("\n");
+		}
 	}
 }
Re: [PATCH] powerpc: fix os-term usage on kernel panic
On Fri, 2007-11-30 at 16:56 +1100, Stephen Rothwell wrote:

On Tue, 27 Nov 2007 18:15:59 -0600 Will Schmidt [EMAIL PROTECTED] wrote: (resending with the proper from addr this time). I'm seeing some funky behavior on power5/power6 partitions with this patch. A /sbin/reboot is now behaving much more like a /sbin/halt. Anybody else seeing this, or is it time for me to call an exorcist for my boxes?

On my Power5+ box, I get an error (code B200A101) logged every time I reboot, and about half the time the machine does not reboot but needs to be power cycled. Removing the cited patch makes the error not be logged, and reboots work fine. Paul and I have been having a look at this, and Paul has asked the architects for clarification of when os-term should be used.

The architects will have the correct answer, of course. From my reading of the PAPR, I've got the impression that there are two possibilities. First, os-term never returns, and it's up to the service processor to do whatever it's going to do (call home, dump, something else). Nothing we can do there. Second, os-term returns, and it's up to the OS to call into power-off or system-reboot. I think this is more likely. I haven't followed the path back from machine_restart to see exactly how we got there, but it probably means a bit more logic to decide whether to call into rtas_restart() or pSeries_power_off() after the call to machine_shutdown.
Re: [PATCH] powerpc: fix os-term usage on kernel panic
On Wed, 2007-11-28 at 14:18 -0600, Linas Vepstas wrote:

On Tue, Nov 27, 2007 at 06:15:59PM -0600, Will Schmidt wrote: (resending with the proper from addr this time). I'm seeing some funky behavior on power5/power6 partitions with this patch. A /sbin/reboot is now behaving much more like a /sbin/halt. Anybody else seeing this, or is it time for me to call an exorcist for my boxes?

I believe the patch http://www.nabble.com/-PATCH--powerpc-pseries:-tell-phyp-to-auto-restart-t4847604.html will cure this problem.

It does not. This code is getting called, but it still turns the box into a doorstop at /sbin/reboot. It does clear up if I apply this patch, which is a revert of part of your earlier patch. My JS2x also turns into a doorstop after /sbin/reboot. Though I'm not going through a panic path, I wonder if the panic portion is OK and this is what Olaf is hitting.

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index fdeefe5..c9fac5a 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -508,7 +508,7 @@ define_machine(pseries) {
 	.power_off		= pSeries_power_off,
 	.halt			= rtas_halt,
 	.panic			= rtas_panic_msg,
-	.machine_shutdown	= rtas_os_term,
+/*	.machine_shutdown	= rtas_os_term, */
 	.get_boot_time		= rtas_get_boot_time,
 	.get_rtc_time		= rtas_get_rtc_time,
 	.set_rtc_time		= rtas_set_rtc_time,
pseries (power3) boot hang (pageblock_nr_pages==0)
Hi Folks,

I've been seeing a boot hang/crash on power3 systems for a few weeks (hangs on a 270, drops to the SP on a p610). This afternoon I got around to tracking it down to the changes in commit d9c2340052278d8eb2ffb16b0484f8f794def4de, "Do not depend on MAX_ORDER when grouping pages by mobility".

cpu 0x0: Vector: 100 (System Reset) at [c0006e803ae0]
 pc: c009bf50: .setup_per_zone_pages_min+0x298/0x34c
 lr: c009be38: .setup_per_zone_pages_min+0x180/0x34c
[c0006e803e20] c05e3898 .init_per_zone_pages_min+0x80/0xa0
[c0006e803ea0] c05c9c04 .kernel_init+0x214/0x3d8
[c0006e803f90] c0026cac .kernel_thread+0x4c/0x68

I narrowed it down to the for loop within setup_zone_migrate_reserve(), called by setup_per_zone_pages_min(). The loop spins forever because pageblock_nr_pages is 0. I imagine this would be properly fixed with something similar to the change for iSeries. Depending on how obvious, quick and easy it is for the experts to come up with a proper fix, I'll be able to do additional debug and hacking after turkey-day. :-)

For the moment, I've hacked it with the following patch (tested on both the 270 and the p610):

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2454,6 +2454,9 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 	reserve = roundup(zone->pages_min, pageblock_nr_pages) >>
 		  pageblock_order;
 
+	/* this is a cheap and dirty bailout, probably not a proper fix. */
+	if (pageblock_nr_pages == 0) return;
+
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
 		if (!pfn_valid(pfn))
 			continue;
[PATCH] [powerpc v3] update xmon slb code
[powerpc] update xmon slb code

This adds a bit more detail to the xmon SLB output. When the valid bit is set, this displays the ESID and VSID values, as well as decoding the segment size (1T or 256M) and displaying the LLP bits. This suppresses the output for any SLB entries that contain only zeros.

Sample output from power6 (1T segment support):
00 c800 40004f7ca3000500 1T ESID= c0 VSID= 4f7ca3 LLP:100
01 d800 4000eb71b400 1T ESID= d0 VSID= eb71b0 LLP: 0
08 1800 c8499f8ccc80 256M ESID=1 VSID=c8499f8cc LLP: 0
09 f800 d2c1a8e46c80 256M ESID=f VSID=d2c1a8e46 LLP: 0
10 4800 ca87eab1dc80 256M ESID=4 VSID=ca87eab1d LLP: 0
43 cf000800 400011b26500 1T ESID= cf VSID= 11b260 LLP:100

Sample output from power5 (notice the non-valid but non-zero entries):
10 0800 4fd0e077ac80 256M ESID=0 VSID=4fd0e077a LLP: 0
11 f800 5b085830fc80 256M ESID=f VSID=5b085830f LLP: 0
12 4800 52ce99fe6c80 256M ESID=4 VSID=52ce99fe6 LLP: 0
13 1800 50904ed95c80 256M ESID=1 VSID=50904ed95 LLP: 0
14 cf000800 d59aca40f500 256M ESID=cf000 VSID=d59aca40f LLP:100
15 c0007800 45cb97751500 256M ESID=c0007 VSID=45cb97751 LLP:100

Tested on power5 and power6.

Signed-Off-By: Will Schmidt [EMAIL PROTECTED]
---
(Updates made per comments from Olof, Ben, and Paul.) This version adds padding around the ESID and VSID fields, and the LLP bits are displayed too. Counting bits, the VSID output looks to be as large as 51 bits, which requires up to 13 spaces. This doesn't count the B field bits, which are now masked off the top end of the VSID output. I'll try to follow up sometime later with code that will handle decoding page sizes. I don't have a testcase handy to properly exercise that yet.
:-)
---
 arch/powerpc/xmon/xmon.c | 29 +++++++++++++++++++++-------
 1 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 121b04d..5314db7 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2527,16 +2527,33 @@ static void xmon_print_symbol(unsigned long address, const char *mid,
 static void dump_slb(void)
 {
 	int i;
-	unsigned long tmp;
+	unsigned long esid, vsid, valid;
+	unsigned long llp;
 
 	printf("SLB contents of cpu %x\n", smp_processor_id());
 
 	for (i = 0; i < SLB_NUM_ENTRIES; i++) {
-		asm volatile("slbmfee  %0,%1" : "=r" (tmp) : "r" (i));
-		printf("%02d %016lx ", i, tmp);
-
-		asm volatile("slbmfev  %0,%1" : "=r" (tmp) : "r" (i));
-		printf("%016lx\n", tmp);
+		asm volatile("slbmfee  %0,%1" : "=r" (esid) : "r" (i));
+		asm volatile("slbmfev  %0,%1" : "=r" (vsid) : "r" (i));
+		valid = (esid & SLB_ESID_V);
+		if (valid | esid | vsid) {
+			printf("%02d %016lx %016lx", i, esid, vsid);
+			if (valid) {
+				llp = vsid & SLB_VSID_LLP;
+				if (vsid & SLB_VSID_B_1T) {
+					printf("  1T  ESID=%9lx  VSID=%13lx LLP:%3lx \n",
+					       GET_ESID_1T(esid),
+					       (vsid & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T,
+					       llp);
+				} else {
+					printf(" 256M ESID=%9lx  VSID=%13lx LLP:%3lx \n",
+					       GET_ESID(esid),
+					       (vsid & ~SLB_VSID_B) >> SLB_VSID_SHIFT,
+					       llp);
+				}
+			} else
+				printf("\n");
+		}
 	}
 }
[PATCH] [powerpc v2] update xmon slb code
[powerpc] update xmon slb code

This adds a bit more detail to the xmon SLB output. When the valid bit is set, this displays the ESID and VSID values, as well as decoding the segment size (1T or 256M). This suppresses the output for any SLB entries that contain only zeros.

Sample output from power6 (1T segment support):
00 c800 40004f7ca3000500 1T ESID= c0 VSID=40004f7ca3 LLP bits:100
01 d800 4000eb71b400 1T ESID= d0 VSID=4000eb71b0 LLP bits: 0
03 0800 628021c6ac80 256M ESID=0 VSID= 628021c6a LLP bits: 0
04 0f000800 400095c1e8000c80 1T ESID=f VSID=400095c1e8 LLP bits: 0
22 cf000800 400011b26500 1T ESID= cf VSID=400011b260 LLP bits:100
62 04000800 40005d488d000c80 1T ESID=4 VSID=40005d488d LLP bits: 0
63 1800 633f90285c80 256M ESID=1 VSID= 633f90285 LLP bits: 0

Sample output from power5 (notice the non-valid but non-zero entries):
00 c800 408f92c94500 256M ESID=c VSID= 408f92c94 LLP bits:100
01 d800 f09b89af5400 256M ESID=d VSID= f09b89af5 LLP bits: 0
03 1000 136eafb0bc80
11 0800 5928811f2c80 256M ESID=0 VSID= 5928811f2 LLP bits: 0
12 f800 645ff8d87c80 256M ESID=f VSID= 645ff8d87 LLP bits: 0
13 4800 5c263aa5ec80 256M ESID=4 VSID= 5c263aa5e LLP bits: 0
14 1800 59e7ef80dc80 256M ESID=1 VSID= 59e7ef80d LLP bits: 0
15 1000 59e7ef80dc80

Tested on power5 and power6.

Signed-Off-By: Will Schmidt [EMAIL PROTECTED]
---
This version adds padding around the ESID and VSID fields, and the LLP bits are displayed too (per request from Olof and Ben). I'll try to follow up sometime later with code that will handle decoding page sizes. I don't have a testcase handy to properly exercise that yet.
:-)
---
 arch/powerpc/xmon/xmon.c | 27 ++++++++++++++++++++-------
 1 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 121b04d..93c26c3 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2527,16 +2527,31 @@ static void xmon_print_symbol(unsigned long address, const char *mid,
 static void dump_slb(void)
 {
 	int i;
-	unsigned long tmp;
+	unsigned long esid, vsid, valid;
+	unsigned long llp_bits;
 
 	printf("SLB contents of cpu %x\n", smp_processor_id());
 
 	for (i = 0; i < SLB_NUM_ENTRIES; i++) {
-		asm volatile("slbmfee  %0,%1" : "=r" (tmp) : "r" (i));
-		printf("%02d %016lx ", i, tmp);
-
-		asm volatile("slbmfev  %0,%1" : "=r" (tmp) : "r" (i));
-		printf("%016lx\n", tmp);
+		asm volatile("slbmfee  %0,%1" : "=r" (esid) : "r" (i));
+		asm volatile("slbmfev  %0,%1" : "=r" (vsid) : "r" (i));
+		valid = (esid & SLB_ESID_V);
+		if (valid | esid | vsid) {
+			printf("%02d %016lx %016lx", i, esid, vsid);
+			if (valid) {
+				llp_bits = vsid & SLB_VSID_LLP;
+				if (vsid & SLB_VSID_B_1T) {
+					printf("  1T  ESID=%9lx  VSID=%10lx LLP bits:%3lx \n",
+					       GET_ESID_1T(esid), vsid >> SLB_VSID_SHIFT_1T,
+					       llp_bits);
+				} else {
+					printf(" 256M ESID=%9lx  VSID=%10lx LLP bits:%3lx \n",
+					       GET_ESID(esid), vsid >> SLB_VSID_SHIFT,
+					       llp_bits);
+				}
+			} else
+				printf("\n");
+		}
 	}
 }
[PATCH] [Powerpc V2] fix switch_slb handling of 1T ESID values
[Powerpc V2] fix switch_slb handling of 1T ESID values

Now that we have 1TB segment size support, we need to be using the GET_ESID_1T macro when comparing ESID values for pc, stack, and unmapped_base within switch_slb(). A new helper function called esids_match() contains the logic for deciding when to call GET_ESID and GET_ESID_1T. This also happens to fix a duplicate-slb-entry-inspired machine-check exception I was seeing when trying to run Java on a power6 partition.

Tested on power6 and power5.

Signed-Off-By: Will Schmidt [EMAIL PROTECTED]
---
 arch/powerpc/mm/slb.c | 33 ++++++++++++++++++++++++++++++---
 1 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index bbd2c51..152f4cd 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -148,6 +148,34 @@ void slb_vmalloc_update(void)
 	slb_flush_and_rebolt();
 }
 
+/* Helper function to compare esids. There are four cases to handle.
+ * 1. The system is not 1T segment size capable. Use the GET_ESID compare.
+ * 2. The system is 1T capable, both addresses are < 1T. Use the GET_ESID compare.
+ * 3. The system is 1T capable, only one of the two addresses is < 1T. This is not a match.
+ * 4. The system is 1T capable, both addresses are >= 1T. Use the GET_ESID_1T macro to compare.
+ */
+static inline int esids_match(unsigned long addr1, unsigned long addr2)
+{
+	int esid_1t_count;
+
+	/* System is not 1T segment size capable. */
+	if (!cpu_has_feature(CPU_FTR_1T_SEGMENT))
+		return (GET_ESID(addr1) == GET_ESID(addr2));
+
+	esid_1t_count = (((addr1 >> SID_SHIFT_1T) != 0) +
+			 ((addr2 >> SID_SHIFT_1T) != 0));
+
+	/* both addresses are < 1T */
+	if (esid_1t_count == 0)
+		return (GET_ESID(addr1) == GET_ESID(addr2));
+
+	/* One address < 1T, the other >= 1T. Not a match */
+	if (esid_1t_count == 1)
+		return 0;
+
+	/* Both addresses are >= 1T. */
+	return (GET_ESID_1T(addr1) == GET_ESID_1T(addr2));
+}
+
 /* Flush all user entries from the segment table of the current processor. */
 void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 {
@@ -193,15 +221,14 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 		return;
 	slb_allocate(pc);
 
-	if (GET_ESID(pc) == GET_ESID(stack))
+	if (esids_match(pc, stack))
 		return;
 
 	if (is_kernel_addr(stack))
 		return;
 	slb_allocate(stack);
 
-	if ((GET_ESID(pc) == GET_ESID(unmapped_base))
-	    || (GET_ESID(stack) == GET_ESID(unmapped_base)))
+	if (esids_match(pc, unmapped_base) || esids_match(stack, unmapped_base))
 		return;
 
 	if (is_kernel_addr(unmapped_base))
[PATCH] [Powerpc] include udbg.h when using udbg_printf
[Powerpc] include udbg.h when using udbg_printf

This fixes the error:
	error: implicit declaration of function 'udbg_printf'

We have a few spots where we reference udbg_printf() without #including udbg.h. These are within #ifdef DEBUG blocks, so they go unnoticed until we do a #define DEBUG or #define DEBUG_LOW nearby.

Signed-off-by: Will Schmidt [EMAIL PROTECTED]
---
 arch/powerpc/mm/hash_utils_64.c           | 1 +
 arch/powerpc/mm/slb.c                     | 1 +
 arch/powerpc/platforms/cell/smp.c         | 1 +
 arch/powerpc/platforms/pseries/firmware.c | 1 +
 4 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index c78dc91..94b8ca0 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -51,6 +51,7 @@
 #include <asm/cputable.h>
 #include <asm/sections.h>
 #include <asm/spu.h>
+#include <asm/udbg.h>

 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index bbd2c51..637afb2 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -25,6 +25,7 @@
 #include <asm/smp.h>
 #include <asm/firmware.h>
 #include <linux/compiler.h>
+#include <asm/udbg.h>

 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
diff --git a/arch/powerpc/platforms/cell/smp.c b/arch/powerpc/platforms/cell/smp.c
index 1c0acba..e443845 100644
--- a/arch/powerpc/platforms/cell/smp.c
+++ b/arch/powerpc/platforms/cell/smp.c
@@ -44,6 +44,7 @@
 #include <asm/rtas.h>

 #include "interrupt.h"
+#include <asm/udbg.h>

 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index 8b18a1c..b765b7c 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -25,6 +25,7 @@
 #include <asm/firmware.h>
 #include <asm/prom.h>
+#include <asm/udbg.h>

 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
[PATCH] [powerpc] update xmon slb code
[powerpc] update xmon slb code

This adds a bit more detail to the xmon SLB output. When the valid bit is set, it displays the ESID and VSID values, as well as decoding the segment size (1T or 256M). It suppresses the output for any SLB entries that contain only zeros.

I debated a bit on whether to check for just (valid) versus checking for (valid|esid|vsid). By inspection on power5, I do have SLB entries that contain values but without the valid bit set, so I opted to display any non-zero values.

Sample output from power6 (1T segment support):
00 c800 40004f7ca3000500 1T ESID=c0 VSID=40004f7ca3
01 d800 4000eb71b400 1T ESID=d0 VSID=4000eb71b0
24 cf000800 400011b26500 1T ESID=cf VSID=400011b260
25 04000800 4000a9e949000c80 1T ESID=4 VSID=4000a9e949
26 1800 5e93bfd49c80 256M ESID=1 VSID=5e93bfd49
27 0f000800 4000e262a4000c80 1T ESID=f VSID=4000e262a4
28 0800 5dd45172ec80 256M ESID=0 VSID=5dd45172e

Sample output from power5 (notice the non-valid but non-zero entries):
54 4800 cf33bb059c80 256M ESID=4 VSID=cf33bb059
55 1800 ccf56fe08c80 256M ESID=1 VSID=ccf56fe08
56 1000 dd82ce799c80
57 cf000800 d59aca40f500 256M ESID=cf000 VSID=d59aca40f
58 c0007800 45cb97751500 256M ESID=c0007 VSID=45cb97751
59 0400 61552db1bc80

Tested on power5 and power6.
Signed-off-by: Will Schmidt [EMAIL PROTECTED]
---
 arch/powerpc/xmon/xmon.c | 20 ++--
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 121b04d..97984f3 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2527,16 +2527,24 @@ static void xmon_print_symbol(unsigned long address, const char *mid,
 static void dump_slb(void)
 {
 	int i;
-	unsigned long tmp;
+	unsigned long esid, vsid, valid;

 	printf("SLB contents of cpu %x\n", smp_processor_id());

 	for (i = 0; i < SLB_NUM_ENTRIES; i++) {
-		asm volatile("slbmfee  %0,%1" : "=r" (tmp) : "r" (i));
-		printf("%02d %016lx ", i, tmp);
-
-		asm volatile("slbmfev  %0,%1" : "=r" (tmp) : "r" (i));
-		printf("%016lx\n", tmp);
+		asm volatile("slbmfee  %0,%1" : "=r" (esid) : "r" (i));
+		asm volatile("slbmfev  %0,%1" : "=r" (vsid) : "r" (i));
+		valid = (esid & SLB_ESID_V);
+		if (valid | esid | vsid) {
+			printf("%02d %016lx %016lx", i, esid, vsid);
+			if (valid) {
+				if (vsid & SLB_VSID_B_1T)
+					printf(" 1T ESID=%lx VSID=%lx \n",
+					       GET_ESID_1T(esid), vsid >> SLB_VSID_SHIFT_1T);
+				else
+					printf(" 256M ESID=%lx VSID=%lx \n",
+					       GET_ESID(esid), vsid >> SLB_VSID_SHIFT);
+			} else
+				printf("\n");
+		}
 	}
 }
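The decode step in dump_slb() can be modeled in plain C. This is a sketch with synthetic register values, not xmon itself; the constants below are assumed to match the 64-bit hash MMU headers (SLB_ESID_V is the valid bit in the slbmfee result, SLB_VSID_B_1T is the B field in the slbmfev result).

```c
#include <assert.h>

/* Constants assumed from the 64-bit hash MMU headers, for a standalone
 * sketch of xmon's dump_slb() decode step. */
#define SLB_ESID_V        0x0000000008000000UL	/* entry is valid */
#define SLB_VSID_B_1T     0x4000000000000000UL	/* B field: 1TB segment */
#define SLB_VSID_SHIFT    12
#define SLB_VSID_SHIFT_1T 24
#define GET_ESID(e)       ((e) >> 28)
#define GET_ESID_1T(e)    ((e) >> 40)

struct slb_decode {
	int valid;
	int is_1t;
	unsigned long esid;
	unsigned long vsid;
};

/* Decode one slbmfee/slbmfev register pair the way dump_slb() prints it:
 * check the valid bit, then pick the 1T or 256M shift from the B field. */
static struct slb_decode decode_slb_entry(unsigned long esid_reg,
					  unsigned long vsid_reg)
{
	struct slb_decode d = { 0 };

	d.valid = (esid_reg & SLB_ESID_V) != 0;
	if (!d.valid)
		return d;
	d.is_1t = (vsid_reg & SLB_VSID_B_1T) != 0;
	if (d.is_1t) {
		d.esid = GET_ESID_1T(esid_reg);
		d.vsid = vsid_reg >> SLB_VSID_SHIFT_1T;
	} else {
		d.esid = GET_ESID(esid_reg);
		d.vsid = vsid_reg >> SLB_VSID_SHIFT;
	}
	return d;
}
```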
Re: [PATCH] [Powerpc] fix switch_slb handling of 1T ESID values
On Sat, 2007-10-27 at 14:19 +1000, Benjamin Herrenschmidt wrote:
On Fri, 2007-10-26 at 15:46 -0500, Will Schmidt wrote:
[Powerpc] fix switch_slb handling of 1T ESID values. Now that we have 1TB segment size support, we need to be using the GET_ESID_1T macro when comparing ESID values for pc, stack, and unmapped_base within switch_slb() when we're on a CPU that supports it. This also happens to fix a duplicate-slb-entry inspired machine-check exception I was seeing when trying to run java on a power6 partition. Tested on power6 and power5. Signed-off-by: Will Schmidt [EMAIL PROTECTED]

Good catch! A minor comment is maybe you could factor out the code better, doing something like an ESID_COMPARE() macro?

Yeah, that's a good idea. I'll spin up a new patch in the next day or so. It occurred to me that I should continue to use GET_ESID when the user address is < 1T too.

--- There is a similar bit of code in stab.c switch_stab(). Should this change also be made there? ---

There is no machine that does stab and 1T segments.

Ok, thanks for the clarification. -Will

Ben.
 arch/powerpc/mm/slb.c | 19 ++-
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index bbd2c51..0c527d7 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -193,16 +193,25 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 		return;
 	slb_allocate(pc);

-	if (GET_ESID(pc) == GET_ESID(stack))
-		return;
+	if (cpu_has_feature(CPU_FTR_1T_SEGMENT)) {
+		if (GET_ESID_1T(pc) == GET_ESID_1T(stack))
+			return;
+	} else
+		if (GET_ESID(pc) == GET_ESID(stack))
+			return;

 	if (is_kernel_addr(stack))
 		return;
 	slb_allocate(stack);

-	if ((GET_ESID(pc) == GET_ESID(unmapped_base))
-	    || (GET_ESID(stack) == GET_ESID(unmapped_base)))
-		return;
+	if (cpu_has_feature(CPU_FTR_1T_SEGMENT)) {
+		if ((GET_ESID_1T(pc) == GET_ESID_1T(unmapped_base))
+		    || (GET_ESID_1T(stack) == GET_ESID_1T(unmapped_base)))
+			return;
+	} else
+		if ((GET_ESID(pc) == GET_ESID(unmapped_base))
+		    || (GET_ESID(stack) == GET_ESID(unmapped_base)))
+			return;

 	if (is_kernel_addr(unmapped_base))
 		return;
[PATCH] [Powerpc] fix switch_slb handling of 1T ESID values
[Powerpc] fix switch_slb handling of 1T ESID values

Now that we have 1TB segment size support, we need to be using the GET_ESID_1T macro when comparing ESID values for pc, stack, and unmapped_base within switch_slb() when we're on a CPU that supports it.

This also happens to fix a duplicate-slb-entry inspired machine-check exception I was seeing when trying to run java on a power6 partition.

Tested on power6 and power5.

Signed-off-by: Will Schmidt [EMAIL PROTECTED]
---
There is a similar bit of code in stab.c switch_stab(). Should this change also be made there?
---
 arch/powerpc/mm/slb.c | 19 ++-
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index bbd2c51..0c527d7 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -193,16 +193,25 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 		return;
 	slb_allocate(pc);

-	if (GET_ESID(pc) == GET_ESID(stack))
-		return;
+	if (cpu_has_feature(CPU_FTR_1T_SEGMENT)) {
+		if (GET_ESID_1T(pc) == GET_ESID_1T(stack))
+			return;
+	} else
+		if (GET_ESID(pc) == GET_ESID(stack))
+			return;

 	if (is_kernel_addr(stack))
 		return;
 	slb_allocate(stack);

-	if ((GET_ESID(pc) == GET_ESID(unmapped_base))
-	    || (GET_ESID(stack) == GET_ESID(unmapped_base)))
-		return;
+	if (cpu_has_feature(CPU_FTR_1T_SEGMENT)) {
+		if ((GET_ESID_1T(pc) == GET_ESID_1T(unmapped_base))
+		    || (GET_ESID_1T(stack) == GET_ESID_1T(unmapped_base)))
+			return;
+	} else
+		if ((GET_ESID(pc) == GET_ESID(unmapped_base))
+		    || (GET_ESID(stack) == GET_ESID(unmapped_base)))
+			return;

 	if (is_kernel_addr(unmapped_base))
 		return;
Re: [bug] block subsystem related crash on Legacy iSeries viodasd.c
On Sun, 2007-10-21 at 14:44 +0200, Jens Axboe wrote:
On Fri, Oct 19 2007, Will Schmidt wrote: Hi Jens, Stephen, and everyone else. ...

You need this, will remember to fix that up for the new branch as well.

diff --git a/drivers/block/viodasd.c b/drivers/block/viodasd.c
index e824b67..2ce3622 100644
--- a/drivers/block/viodasd.c
+++ b/drivers/block/viodasd.c
@@ -270,6 +270,7 @@ static int send_request(struct request *req)
 	d = req->rq_disk->private_data;

 	/* Now build the scatter-gather list */
+	memset(sg, 0, sizeof(sg));
 	nsg = blk_rq_map_sg(req->q, req, sg);
 	nsg = dma_map_sg(d->dev, sg, nsg, direction);

That appears to do the trick. Thanks!

Tested-by: Will Schmidt [EMAIL PROTECTED]
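The bug class behind this one-line fix — reusing an on-stack table that the mapping function only partially fills — can be shown with a toy model. Everything below is hypothetical (map_request and sg_entry are stand-ins, not the block-layer API); the point is only that entries past the mapped count keep stale contents unless the caller zeroes the table first, which is what the memset above does for the real scatterlist.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for a scatterlist entry (hypothetical, not the kernel type). */
struct sg_entry {
	unsigned long addr;
	unsigned int len;
};

/* Toy stand-in for blk_rq_map_sg(): fills in only the first nsegs
 * entries of the caller-provided table and returns that count.
 * Entries beyond nsegs are deliberately left untouched. */
static int map_request(struct sg_entry *sg, int nsegs)
{
	for (int i = 0; i < nsegs; i++) {
		sg[i].addr = 0x1000UL * (i + 1);
		sg[i].len = 4096;
	}
	return nsegs;
}
```

Without the memset, whatever garbage was on the stack in sg[2] and sg[3] would be handed to the DMA-mapping step alongside the two real segments.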
Re: [PATCH] Use 1TB segments
On Wed, 2007-10-03 at 13:13 +1000, Paul Mackerras wrote:

Will Schmidt writes: I still need to test this code for performance issues, and this version could still use some cosmetic touchups, so I don't think we want this to go into a tree yet. I am reposting this primarily to indicate the prior version isn't quite right, and so Jon can rebase to this version. :-) The way we scan the ibm,processor-segment-sizes property could be nicer.

Were there any other cosmetic touchups you were thinking of, and if so, what were they? I didn't notice any leftover debugging printks or anything else that obviously needed cleaning up.

Correct.. nothing in the patch really *needs* to be cleaned up. This is mostly me being way more nit-picky than I need to be. :-) I don't have any real issues with the patch (being candidate for) going into a tree. The only obvious one is the MMU_SEGSIZE_* #defines in mmu-hash64.h, which appear to be duplicated. The rest I can follow up on later; none of it affects the code outside of #ifdef DEBUG's, and it should be a separate patch anyway.

Thanks, -Will

Paul.
Re: [PATCH] Use 1TB segments
Hi Paul, just a few questions.

On Wed, 2007-08-01 at 12:04 +1000, Paul Mackerras wrote:

This makes the kernel use 1TB segments for all kernel mappings and for user addresses of 1TB and above, on machines which support them (currently POWER5+ and POWER6). We don't currently use 1TB segments for user addresses < 1T, since that would effectively prevent 32-bit processes from using huge pages unless we also had a way to revert to using 256MB segments.

I think I have a question about user addresses < 1T.. once I think on it a bit more it'll either click, or I'll have a question articulated. :-)

-static inline void __tlbiel(unsigned long va, unsigned int psize)
+static inline void __tlbiel(unsigned long va, int psize, int ssize)

-static inline void tlbie(unsigned long va, int psize, int local)
+static inline void tlbie(unsigned long va, int psize, int ssize, int local)

 static long native_hpte_insert(unsigned long hpte_group, unsigned long va,
 			unsigned long pa, unsigned long rflags,
-			unsigned long vflags, int psize)
+			unsigned long vflags, int psize, int ssize)

 static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
-			unsigned long va, int psize, int local)
+			unsigned long va, int psize, int ssize,
+			int local)

Is there a technical reason why the 'local' variable remains at the end of the parameter list for these? In other cases 'ssize' simply gets added to the end of the parameter list.

+static int __init htab_dt_scan_seg_sizes(unsigned long node,
+					 const char *uname, int depth,
+					 void *data)
+{
+	char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+	u32 *prop;
+	unsigned long size = 0;
+
+	/* We are scanning "cpu" nodes only */
+	if (type == NULL || strcmp(type, "cpu") != 0)
+		return 0;
+
+	prop = (u32 *)of_get_flat_dt_prop(node, "ibm,processor-segment-sizes",
+					  &size);
+	if (prop != NULL && size >= 8) {
+		if (prop[0] == 0x1c && prop[1] == 0x28) {

This is 0x1c indicating 2^28 for 256M, and 0x28 indicating 2^40 for 1TB segments.
Will there ever be a segment size between the two? Or will the representation ever vary from this? i.e., wondering if prop[0] will always be for 256M and prop[1] for 1TB.

+#define slb_vsid_shift(ssize)	\
+	((ssize) == MMU_SEGSIZE_256M ? SLB_VSID_SHIFT : SLB_VSID_SHIFT_1T)

@@ -100,12 +106,13 @@ void slb_flush_and_rebolt(void)
 	vflags = SLB_VSID_KERNEL | vmalloc_llp;

 	ksp_esid_data = mk_esid_data(get_paca()->kstack, 2);
-	if ((ksp_esid_data & ESID_MASK) == PAGE_OFFSET)
+	mask = (mmu_kernel_ssize == MMU_SEGSIZE_256M) ? ESID_MASK : ESID_MASK_1T;

Is this one worthy of a #define like the slb_vsid_shift() above?

+#define VSID_MULTIPLIER_256M	ASM_CONST(200730139)	/* 28-bit prime */
+#define VSID_MULTIPLIER_1T	ASM_CONST(12538073)	/* 24-bit prime */

Anything special in how this 24-bit prime value was selected? (The same question could be asked for the 28-bit prime, though I see that value was updated at least once a few years back.)

-Will
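The ibm,processor-segment-sizes encoding discussed above is easy to model: each cell of the property holds log2 of a supported segment size in bytes, so 0x1c means 2^28 (256MB) and 0x28 means 2^40 (1TB). The sketch below mirrors the check in htab_dt_scan_seg_sizes(); it is a standalone model, not the flattened-device-tree API.

```c
#include <assert.h>
#include <stddef.h>

/* Each property cell holds log2 of a segment size in bytes:
 * 0x1c -> 2^28 (256MB), 0x28 -> 2^40 (1TB). */
static unsigned long segment_size_bytes(unsigned int encoded)
{
	return 1UL << encoded;
}

/* Mirror of the check in htab_dt_scan_seg_sizes() quoted above: 1TB
 * segments are enabled only when the property holds at least two cells
 * (size is in bytes, 4 per cell) advertising both 256MB and 1TB. */
static int supports_1t_segments(const unsigned int *prop, unsigned long size)
{
	return prop != NULL && size >= 8 && prop[0] == 0x1c && prop[1] == 0x28;
}
```

This makes concrete why the quoted code insists on `size >= 8`: a property advertising only 256MB segments is 4 bytes long and must not be read past its end.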
Re: [PATCH] Use 1TB segments
On Fri, 2007-08-03 at 08:37 +1000, Benjamin Herrenschmidt wrote:

Is there a technical reason why the 'local' variable remains at the end of the parameter list for these? In other cases 'ssize' simply gets added to the end of the parameter list.

Looks nicer to have psize and ssize together :-)

Aah! And here I thought there was some obscure register usage optimization going on.. :-)

-Will

Ben.
Re: [PATCH] fixes for the SLB shadow buffer
On Wed, 2007-08-01 at 16:02 +1000, Michael Neuling wrote:

--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -389,7 +389,9 @@ BEGIN_FTR_SECTION
 	ld	r9,PACA_SLBSHADOWPTR(r13)
 	li	r12,0
 	std	r12,SLBSHADOW_STACKESID(r9)	/* Clear ESID */
+	eieio
 	std	r7,SLBSHADOW_STACKVSID(r9)	/* Save VSID */
+	eieio
 	std	r0,SLBSHADOW_STACKESID(r9)	/* Save ESID */

Hi Michael, I was going to ask if we really needed both of them, but I think I convinced myself that we do. Do we also want/need an eieio after the /* Save ESID */ statement, or is that somehow handled by the slbie following?

-Will
Re: [PATCH] [POWERPC] iSeries: fix section mismatch warnings
On Thu, 2007-07-26 at 11:56 +1000, Stephen Rothwell wrote:

Hi Will,

On Wed, 25 Jul 2007 11:55:31 -0500 Will Schmidt [EMAIL PROTECTED] wrote:

 	cmpwi	0,r24,0		/* Are we processor 0? */
-	beq	.__start_initialization_iSeries	/* Start up the first processor */
-	mfspr	r4,SPRN_CTRLF
+	bne	1f
+	b	.__start_initialization_iSeries	/* Start up the first processor */
+1:	mfspr	r4,SPRN_CTRLF
 	li	r5,CTRL_RUNLATCH	/* Turn off the run light */

This part isn't clicking for me.. How does changing a beq to a bne over a b fit into making __start_initialization_iSeries static?

Because I moved __start_initialization_iSeries into another section, it ends up too far away for a conditional branch, so something adds a jump table to the .text section and changes this branch to be via that table. Unfortunately, the jump table ends up at the start of the .text and ruins our carefully laid-out kernel image. By inverting the test I can turn the branch into an unconditional one, which has a larger possible offset (effectively building the jump table manually).

Gotcha, thanks for the clarification. :-) -Will