date:20110518

Re: [PATCH 19/37] powerpc: consolidate ipi message mux and demux

2011-05-18 Thread Benjamin Herrenschmidt

On Thu, 2011-05-19 at 16:57 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2011-05-11 at 00:29 -0500, Milton Miller wrote:
> > Consolidate the mux and demux of ipi messages into smp.c and call
> > a new smp_ops callback to actually trigger the ipi.
> 
>  .../...
> 
> I'm merging the whole series.  I had to do some fixups to this one and
> the one adding the CONFIG option, missing cell & wsp bits among others,
> but mostly trivial.

I forgot to mention... I dropped the change to include/linux/smp.h to
remove the unused MSG_ flags for now. It will not have been in -next
long enough to hit Linus via my tree, just in case somebody started
using the flags while we were not looking :-)

I suggest you send it to Linus directly after he pulls my tree during
the merge window.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 19/37] powerpc: consolidate ipi message mux and demux

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-11 at 00:29 -0500, Milton Miller wrote:
> Consolidate the mux and demux of ipi messages into smp.c and call
> a new smp_ops callback to actually trigger the ipi.

 .../...

I'm merging the whole series.  I had to do some fixups to this one and
the one adding the CONFIG option, missing cell & wsp bits among others,
but mostly trivial.

Cheers,
Ben.


Re: [linuxppc-dev] [PATCH][upstream] powerpc:Integrated Flash controller device tree bindings

2011-05-18 Thread Kumar Gala


On May 19, 2011, at 1:38 AM, Dipen Dudhat wrote:

> Signed-off-by: Dipen Dudhat 
> Acked-By: Scott Wood 
> ---
> Based upon
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git (branch -> master)
> .../devicetree/bindings/powerpc/fsl/ifc.txt|   76 
> 1 files changed, 76 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ifc.txt

applied to next

- k

[PATCH][upstream] powerpc:Integrated Flash controller device tree bindings

2011-05-18 Thread Dipen Dudhat

Signed-off-by: Dipen Dudhat 
Acked-By: Scott Wood 
---
Based upon git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 
(branch -> master)
 .../devicetree/bindings/powerpc/fsl/ifc.txt|   76 
 1 files changed, 76 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ifc.txt

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/ifc.txt b/Documentation/devicetree/bindings/powerpc/fsl/ifc.txt
new file mode 100644
index 0000000..939a26d
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/fsl/ifc.txt
@@ -0,0 +1,76 @@
+Integrated Flash Controller
+
+Properties:
+- name : Should be ifc
+- compatible : should contain "fsl,ifc". The version of the integrated
+   flash controller can be found in the IFC_REV register at
+   offset zero.
+
+- #address-cells : Should be either two or three.  The first cell is the
+   chipselect number, and the remaining cells are the
+   offset into the chipselect.
+- #size-cells : Either one or two, depending on how large each chipselect
+can be.
+- reg : Offset and length of the register set for the device
+- interrupts : IFC has two interrupts. The first one is the "common"
+   interrupt(CM_EVTER_STAT), and second is the NAND interrupt
+   (NAND_EVTER_STAT).
+
+- ranges : Each range corresponds to a single chipselect, and covers
+   the entire access window as configured.
+
+Child device nodes describe the devices connected to IFC such as NOR (e.g.
+cfi-flash) and NAND (fsl,ifc-nand). There might be board specific devices
+like FPGAs, CPLDs, etc.
+
+Example:
+
+	ifc@ffe1e000 {
+		compatible = "fsl,ifc", "simple-bus";
+		#address-cells = <2>;
+		#size-cells = <1>;
+		reg = <0x0 0xffe1e000 0 0x2000>;
+		interrupts = <16 2 19 2>;
+
+		/* NOR, NAND Flashes and CPLD on board */
+		ranges = <0x0 0x0 0x0 0xee00 0x0200
+			  0x1 0x0 0x0 0xffa0 0x0001
+			  0x3 0x0 0x0 0xffb0 0x0002>;
+
+		flash@0,0 {
+			#address-cells = <1>;
+			#size-cells = <1>;
+			compatible = "cfi-flash";
+			reg = <0x0 0x0 0x200>;
+			bank-width = <2>;
+			device-width = <1>;
+
+			partition@0 {
+				/* 32MB for user data */
+				reg = <0x0 0x0200>;
+				label = "NOR Data";
+			};
+		};
+
+		flash@1,0 {
+			#address-cells = <1>;
+			#size-cells = <1>;
+			compatible = "fsl,ifc-nand";
+			reg = <0x1 0x0 0x1>;
+
+			partition@0 {
+				/* This location must not be altered  */
+				/* 1MB for u-boot Bootloader Image */
+				reg = <0x0 0x0010>;
+				label = "NAND U-Boot Image";
+				read-only;
+			};
+		};
+
+		cpld@3,0 {
+			#address-cells = <1>;
+			#size-cells = <1>;
+			compatible = "fsl,p1010rdb-cpld";
+			reg = <0x3 0x0 0x01f>;
+		};
+	};
-- 
1.5.6.5
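
For readers who want to see how one "ranges" entry in the binding above breaks
down into cells, here is a minimal sketch in C. It is not part of the patch. It
assumes the cell counts from the example node (#address-cells = <2>,
#size-cells = <1>) and a parent bus with two address cells, so each entry is
five 32-bit cells: chipselect, offset, parent address (two cells), size. The
struct, macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one IFC "ranges" entry. */
struct ifc_range {
	uint32_t chipselect;   /* first child address cell */
	uint32_t offset;       /* second child address cell */
	uint64_t parent_addr;  /* two parent address cells, high cell first */
	uint32_t size;         /* one size cell */
};

/* Assumed layout: 2 child address cells + 2 parent address cells + 1 size cell. */
#define IFC_RANGE_CELLS 5

/*
 * Decode entry idx from an array of ncells host-endian cells.
 * Returns 0 on success, -1 if the entry does not fit in the array.
 */
int ifc_decode_range(const uint32_t *cells, size_t ncells,
		     size_t idx, struct ifc_range *out)
{
	const uint32_t *p;

	assert(cells != NULL && out != NULL);
	if (idx >= ncells / IFC_RANGE_CELLS)
		return -1;

	p = cells + idx * IFC_RANGE_CELLS;
	out->chipselect = p[0];
	out->offset = p[1];
	out->parent_addr = ((uint64_t)p[2] << 32) | p[3];
	out->size = p[4];
	return 0;
}
```

A real driver would read the cells through the kernel's OF helpers and convert
them from big-endian first; the sketch only shows the cell arithmetic.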



Re: [PATCH] powerpc/qoriq: Add default mode for P1020RDB USB

2011-05-18 Thread Kumar Gala


On May 4, 2011, at 8:26 AM, Ramneek Mehresh wrote:

> Add P1020 USB controller default value for "dr_mode" property
> 
> Signed-off-by: Ramneek Mehresh 
> ---
> Applies on git://git.am.freescale.net/mirrors/linux-2.6.git
> (branch master)
> arch/powerpc/boot/dts/p1020rdb.dts |   10 --
> 1 files changed, 4 insertions(+), 6 deletions(-)

Can you update the patch against
git.kernel.org/pub/scm/linux/kernel/git/galak/powerpc.git next?
Also make sure to update the p1020rdb_camp*.dts files.

thanks

- k

Re: [PATCH] powerpc/85xx:Create dts of each core in CAMP mode for P1020RDB

2011-05-18 Thread Kumar Gala


On Apr 28, 2011, at 2:00 AM, Prabhakar Kushwaha wrote:

> Create the dts files for each core and split the devices between the two
> cores for P1020RDB.
> 
> Core0 has memory, l2, i2c, spi, gpio, tdm, dma, usb, eth1, eth2,
> sdhc, crypto, global-util, message, pci0, pci1, msi.
> Core1 has l2, eth0, crypto.
> 
> MPIC is shared between the two cores, but each core protects its interrupts
> from the other core by using "protected-sources" of mpic.
> 
> Fix compatible property for global-util node of P1020si.dtsi.
> 
> Signed-off-by: Prabhakar Kushwaha 
> ---
> Based upon
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git (branch master)
> 
> This patch depends on following patch
>   "powerpc/85xx: P1020 DTS : re-organize dts files"
> 
> arch/powerpc/boot/dts/p1020rdb_camp_core0.dts |  213 +
> arch/powerpc/boot/dts/p1020rdb_camp_core1.dts |  148 +
> arch/powerpc/boot/dts/p1020si.dtsi|2 +-
> 3 files changed, 362 insertions(+), 1 deletions(-)
> create mode 100644 arch/powerpc/boot/dts/p1020rdb_camp_core0.dts
> create mode 100644 arch/powerpc/boot/dts/p1020rdb_camp_core1.dts

applied to next

- k

Re: [PATCH] powerpc/85xx: Save and restore pcie ATMU windows for PM

2011-05-18 Thread Kumar Gala


On Apr 28, 2011, at 1:38 AM, Prabhakar Kushwaha wrote:

> D3-cold state indicates removal of the clock and power. However, auxiliary
> (AUX) power may remain available even after the main power rails are powered
> down.
> 
> Wakeup from D3-cold state requires a full context restore. The pci driver
> takes care of everything except the ATMUs.
> ATMU windows need to be saved and restored during suspend and resume.
> 
> Signed-off-by: Jiang Yutang 
> Signed-off-by: Prabhakar Kushwaha 
> ---
> Based upon
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git (branch master)
> 
> arch/powerpc/sysdev/fsl_pci.c |  116 +
> arch/powerpc/sysdev/fsl_pci.h |7 ++-
> 2 files changed, 121 insertions(+), 2 deletions(-)

Is this patch for when we are a host or agent?

- k

Re: [PATCH] powerpc/85xx: add host-pci(e) bridge only for RC

2011-05-18 Thread Kumar Gala


On Apr 27, 2011, at 12:35 AM, Prabhakar Kushwaha wrote:

> FSL PCIe controller can act as agent(EP) or host(RC).
> In agent (EP) mode the controller is configured by the host, so it does not
> need to be added to the PCI(e) sub-system.
> 
> Add and configure PCIe controller only for RC mode.
> 
> Signed-off-by: Vivek Mahajan 
> Signed-off-by: Prabhakar Kushwaha 
> ---
> Based upon
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git (branch master)
> 
> arch/powerpc/sysdev/fsl_pci.c |   14 ++
> 1 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> index 68ca929..87ac11b 100644
> --- a/arch/powerpc/sysdev/fsl_pci.c
> +++ b/arch/powerpc/sysdev/fsl_pci.c
> @@ -323,6 +323,7 @@ int __init fsl_add_bridge(struct device_node *dev, int is_primary)
>   struct pci_controller *hose;
>   struct resource rsrc;
>   const int *bus_range;
> + u8 is_agent;
> 
>   if (!of_device_is_available(dev)) {
>   pr_warning("%s: disabled\n", dev->full_name);
> @@ -353,6 +354,19 @@ int __init fsl_add_bridge(struct device_node *dev, int is_primary)
> 
>   setup_indirect_pci(hose, rsrc.start, rsrc.start + 0x4,
>   PPC_INDIRECT_TYPE_BIG_ENDIAN);
> +
> + early_read_config_byte(hose, 0, 0, PCI_HEADER_TYPE, &is_agent);

Why are we looking at PCI_HEADER_TYPE?  We should look at PCI_CLASS_PROG.

> + if ((is_agent & 0x7f) == PCI_HEADER_TYPE_NORMAL) {
> + u32 temp;
> +
> + temp = (u32)hose->cfg_data & ~PAGE_MASK;
> + if (((u32)hose->cfg_data & PAGE_MASK) != (u32)hose->cfg_addr)
> + iounmap(hose->cfg_data - temp);
> + iounmap(hose->cfg_addr);
> + pcibios_free_controller(hose);
> + return 0;
> + }
> +
>   setup_pci_cmd(hose);
> 
>   /* check PCI express link status */
> -- 
> 1.7.3
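
For reference, the check in the hunk above masks the header type with 0x7f
because bit 7 of that config byte is the multi-function flag, not part of the
header layout. A minimal standalone sketch in C, with the two constants copied
by value from include/linux/pci_regs.h and hypothetical helper names. It only
illustrates the masking; it does not settle whether PCI_HEADER_TYPE or
PCI_CLASS_PROG is the right register to test.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in include/linux/pci_regs.h. */
#define PCI_HEADER_TYPE_NORMAL  0
#define PCI_HEADER_TYPE_BRIDGE  1

/* Hypothetical helper: strip the multi-function flag (bit 7). */
uint8_t header_layout(unsigned int header_type)
{
	assert(header_type <= 0xff);  /* the register is a single config byte */
	return (uint8_t)(header_type & 0x7f);
}

/* Mirrors the patch's test: a type 0 header is treated as agent (EP) mode. */
int looks_like_agent(unsigned int header_type)
{
	return header_layout(header_type) == PCI_HEADER_TYPE_NORMAL;
}
```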
> 


Re: [PATCH] powerpc/85xx:DTS: Fix PCIe IDSEL for Px020RDB

2011-05-18 Thread Kumar Gala


On Apr 19, 2011, at 11:12 PM, Prabhakar Kushwaha wrote:

> A PCIe device in legacy mode can trigger interrupts using the wires #INTA,
> #INTB, #INTC and #INTD. PCI devices are obligated to use #INTx for interrupts
> under legacy mode.  Each PCI slot or device is typically wired to different
> inputs on the interrupt controller.
> 
> So, define interrupt-map and interrupt-map-mask properties in the device tree
> to map each PCI interrupt signal to the inputs of the interrupt controller.
> 
> Signed-off-by: Prabhakar Kushwaha 
> ---
> Based upon
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git (branch master)
> 
> This patch has a dependency on the following 2 patches
>   -- powerpc/85xx: P2020 DTS: re-organize dts files
>   -- powerpc/85xx: P1020 DTS : re-organize dts files
> 
> arch/powerpc/boot/dts/p1020rdb.dts|   16 
> arch/powerpc/boot/dts/p2020rdb.dts|   16 
> arch/powerpc/boot/dts/p2020rdb_camp_core0.dts |8 
> arch/powerpc/boot/dts/p2020rdb_camp_core1.dts |8 
> 4 files changed, 48 insertions(+), 0 deletions(-)

applied to next

- k

Re: [PATCH][v2] powerpc/85xx: P1020 DTS : re-organize dts files

2011-05-18 Thread Kumar Gala


On Apr 7, 2011, at 4:10 AM, Prabhakar Kushwaha wrote:

> Creates P1020si.dtsi, containing information for the P1020 SoC. Modifies dts
> files for P1020 based systems to use dtsi file
> 
> Signed-off-by: Prabhakar Kushwaha 
> Acked-by: Kumar Gala 
> ---
> Based upon
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git (branch master)
> 
> Please see mpc5200b.dtsi for reference.
> 
> Tested on P1020RDB
> 
> Changes for v2: Incorporated Grant Likely's comment
>   -updated model name
> 
> arch/powerpc/boot/dts/p1020rdb.dts |  316 +--
> arch/powerpc/boot/dts/p1020si.dtsi |  377 
> 2 files changed, 380 insertions(+), 313 deletions(-)
> create mode 100644 arch/powerpc/boot/dts/p1020si.dtsi

applied to next

- k

Re: [PATCH] powerpc/85xx: P2020 DTS: re-organize dts files

2011-05-18 Thread Kumar Gala


On Apr 8, 2011, at 7:27 AM, Prabhakar Kushwaha wrote:

> Creates P2020si.dtsi, containing information for P2020 SoC. Modifies dts files
> for P2020 based systems to use dtsi file.
> 
> Signed-off-by: Prabhakar Kushwaha 
> ---
> Based upon
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git (branch master)
> 
> Please see mpc5200b.dtsi for reference.
> 
> Tested on P2020RDB and P2020DS
> 
> arch/powerpc/boot/dts/p2020ds.dts |  374 ++--
> arch/powerpc/boot/dts/p2020rdb.dts|  362 ++-
> arch/powerpc/boot/dts/p2020rdb_camp_core0.dts |  237 +++-
> arch/powerpc/boot/dts/p2020rdb_camp_core1.dts |  142 ++
> arch/powerpc/boot/dts/p2020si.dtsi|  382 +
> 5 files changed, 564 insertions(+), 933 deletions(-)
> create mode 100644 arch/powerpc/boot/dts/p2020si.dtsi

applied to next

- k

Re: [PATCH][upstream] powerpc: Adding bindings for flexcan controller

2011-05-18 Thread Kumar Gala


On Apr 19, 2011, at 8:58 AM, Bhaskar Upadhaya wrote:

> From: Bhaskar Upadhaya 
> 
> Signed-off-by: Bhaskar Upadhaya 
> Acked-By: Scott Wood 
> ---
> Based upon
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git (branch -> master)
> 
> .../devicetree/bindings/net/can/fsl-flexcan.txt|   61 
> 1 files changed, 61 insertions(+), 0 deletions(-)
> create mode 100755 Documentation/devicetree/bindings/net/can/fsl-flexcan.txt

applied to next

- k

Re: [PATCH 13/13] kvm/powerpc: Allow book3s_hv guests to use SMT processor modes

2011-05-18 Thread Paul Mackerras

On Tue, May 17, 2011 at 01:36:26PM +0200, Alexander Graf wrote:
>
> Just so I understand the scheme: One vcpu needs to go to MMU mode in
> KVM, it then sends IPIs to stop the other threads and finally we
> return from this wait here?

Actually, if one thread needs to get the other threads out of the
guest, it sets the HDEC to 0.  Since it's a shared register and
interrupts all threads on a 0 to -1 transition, setting it to 0 makes
all threads come out of the guest.

The IPI is for when we're going into the guest.  When we're in the
host, all the secondary threads are in nap and only the primary thread
is running.  (Offlining a cpu in the host results in the cpu/thread
going to nap mode.)  Sending an IPI to a napping thread wakes it up
and it resumes at the system reset vector with some bits set in SRR1
to say that it was previously in nap mode.

> Oh, I'm certainly fine with the scheme :). I would just like to
> understand it and see it documented somewhere, as it's slightly
> unintuitive.

It took some thought to work it out, so you're right, I should
definitely document it.

> Also, this scheme might confuse the host scheduler for a bit, as it
> might migrate threads to other host CPUs while it would prove
> beneficial for cache usage to keep them local. But since the
> scheduler doesn't know about the correlation between the threads, it
> can't be clever about it.

Well, it's not going to migrate a sleeping thread.  The accounting
gets slightly strange in that all the CPU time for running the 4 vcpus
in the vcore gets accounted to one of the vcpu threads (which one can
change over time).  However, the total across all qemu threads should
be correct.

Paul.

Re: [PATCH 02/13] powerpc/e500: SPE register saving: take arbitrary struct offset

2011-05-18 Thread Kumar Gala


On May 17, 2011, at 6:36 PM, Scott Wood wrote:

> Previously, these macros hardcoded THREAD_EVR0 as the base of the save
> area, relative to the base register passed.  This base offset is now
> passed as a separate macro parameter, allowing reuse with other SPE
> save areas, such as used by KVM.
> 
> Signed-off-by: Scott Wood 
> ---
> This is a resending of http://www.spinics.net/lists/kvm-ppc/msg02672.html
> 
> Kumar, please ack to go via kvm.
> 
> arch/powerpc/include/asm/ppc_asm.h   |   28 
> arch/powerpc/kernel/head_fsl_booke.S |6 +++---
> 2 files changed, 19 insertions(+), 15 deletions(-)

Acked-by: Kumar Gala 

[ Alex, let me know if you want this via my powerpc.git tree or your kvm tree ]

- k


Re: [PATCH 01/13] powerpc/e500: Save SPEFCSR in flush_spe_to_thread()

2011-05-18 Thread Kumar Gala


On May 17, 2011, at 6:35 PM, Scott Wood wrote:

> From: yu liu 
> 
> giveup_spe() saves the SPE state which is protected by MSR[SPE].
> However, modifying SPEFSCR does not trap when MSR[SPE]=0.
> And since SPEFSCR is already saved/restored in _switch(),
> not all the callers want to save SPEFSCR again.
> Thus, saving SPEFSCR should not belong to giveup_spe().
> 
> This patch moves SPEFSCR saving to flush_spe_to_thread(),
> and cleans up the caller that needs to save SPEFSCR accordingly.
> 
> Signed-off-by: Liu Yu 
> Signed-off-by: Scott Wood 
> ---
> This is a resending of http://patchwork.ozlabs.org/patch/88677/
> 
> Kumar, please ack to go via kvm.  This is holding up the rest of the SPE
> patches, which in turn are holding up the MMU patches due to both
> touching the MSR update code.
> 
> arch/powerpc/kernel/head_fsl_booke.S |2 --
> arch/powerpc/kernel/process.c|1 +
> arch/powerpc/kernel/traps.c  |5 +
> 3 files changed, 2 insertions(+), 6 deletions(-)

Acked-by: Kumar Gala 

[ Alex, let me know if you want this via my powerpc.git tree or your kvm tree ]

- k

Re: [PATCH 2/2] powerpc/fsl: enable verbose bug output

2011-05-18 Thread Kumar Gala


On May 10, 2011, at 1:02 PM, Scott Wood wrote:

> This debug option has no overhead other than a slight increase in
> kernel size, and makes bug reports more useful.  While some end users
> may prefer to save the space, as a default on a kernel config aimed
> primarily at development on reference boards, it should be enabled.
> 
> Signed-off-by: Scott Wood 
> ---
> arch/powerpc/configs/83xx/mpc8313_rdb_defconfig  |1 -
> arch/powerpc/configs/83xx/mpc8315_rdb_defconfig  |1 -
> arch/powerpc/configs/85xx/mpc8540_ads_defconfig  |1 -
> arch/powerpc/configs/85xx/mpc8560_ads_defconfig  |1 -
> arch/powerpc/configs/85xx/mpc85xx_cds_defconfig  |1 -
> arch/powerpc/configs/86xx/mpc8641_hpcn_defconfig |1 -
> arch/powerpc/configs/e55xx_smp_defconfig |1 -
> arch/powerpc/configs/mpc85xx_defconfig   |1 -
> arch/powerpc/configs/mpc85xx_smp_defconfig   |1 -
> arch/powerpc/configs/mpc86xx_defconfig   |1 -
> 10 files changed, 0 insertions(+), 10 deletions(-)

applied to next

- k

Re: [PATCH 4/4] powerpc/mpic: add the mpic global timer support

2011-05-18 Thread Kumar Gala


On Mar 24, 2011, at 4:43 PM, Scott Wood wrote:

> Add support for MPIC timers as requestable interrupt sources.
> 
> Based on http://patchwork.ozlabs.org/patch/20941/ by Dave Liu.
> 
> Signed-off-by: Dave Liu 
> Signed-off-by: Scott Wood 
> ---
> arch/powerpc/include/asm/mpic.h |3 +-
> arch/powerpc/sysdev/mpic.c  |   92 ---
> 2 files changed, 88 insertions(+), 7 deletions(-)

applied to next, fixed for upstream changes.

- k

Re: [PATCH 3/4] powerpc/mpic: parse 4-cell intspec types other than zero

2011-05-18 Thread Kumar Gala


On Mar 24, 2011, at 4:43 PM, Scott Wood wrote:

> Signed-off-by: Scott Wood 
> ---
> arch/powerpc/include/asm/mpic.h |2 ++
> arch/powerpc/sysdev/mpic.c  |   37 -
> 2 files changed, 38 insertions(+), 1 deletions(-)

applied to next

- k

Re: [PATCH 2/4] powerpc/p1022ds: fix broken mpic timer node

2011-05-18 Thread Kumar Gala


On Mar 24, 2011, at 4:43 PM, Scott Wood wrote:

> There is no hardware interrupt 0xf7.  But now we can express the timer
> interrupt using 4-cell interrupts.  This requires converting all of the
> other interrupt specifiers in the tree as well.
> 
> Also add the second timer group, and fix the reg property to only
> describe the timer registers.
> 
> Signed-off-by: Scott Wood 
> ---
> arch/powerpc/boot/dts/p1022ds.dts |  106 
> 1 files changed, 59 insertions(+), 47 deletions(-)

applied to next

- k

Re: [PATCH 1/4] powerpc: Add fsl mpic timer binding

2011-05-18 Thread Kumar Gala


On Mar 24, 2011, at 4:43 PM, Scott Wood wrote:

> Update the existing example in the general mpic binding to have a
> separate TCRx region.  Currently the example doesn't describe TCRx at
> all.  The one upstream device tree with an mpic timer node (p1022ds)
> uses one large reg region to describe both, even though there are other
> unrelated registers in between.  That device tree also contains a bogus
> interrupt specifier, and there's no upstream software that uses this yet,
> so changing this shouldn't be a problem.
> 
> Add a full binding for the MPIC timer node, not just an example of
> 4-cell interrupts in the MPIC binding.
> 
> Add fsl,available-ranges, similar to msi-available-ranges.
> 
> Signed-off-by: Scott Wood 
> ---
> .../devicetree/bindings/powerpc/fsl/mpic-timer.txt |   38 
> .../devicetree/bindings/powerpc/fsl/mpic.txt   |2 +-
> 2 files changed, 39 insertions(+), 1 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/mpic-timer.txt

applied to next

- k

Re: [PATCH 0/13] Hypervisor-mode KVM on POWER7

2011-05-18 Thread Alexander Graf


On 19.05.2011, at 07:22, Paul Mackerras wrote:

> On Tue, May 17, 2011 at 02:42:08PM +0300, Avi Kivity wrote:
>> On 05/17/2011 02:38 PM, Alexander Graf wrote:
> >>>> 
> >>>> What would be the path for these patches to get upstream?  Would this
> >>>> stuff normally go through Avi's tree?  There is a bit of a
> >>>> complication in that they are based on Ben's next branch.  Would Avi
> >>>> pull Ben's next branch, or would they go in via Ben's tree?
>>> 
>>> Usually the ppc tree gets merged into Avi's tree and goes on from
>>> there. When we have interdependencies, we can certainly do it
>>> differently though. We can also shove them through Ben's tree this
>>> time around, as there are more dependencies on ppc code than KVM
>>> code.
>>> 
>> 
>> Yes, both options are fine.  If it goes through kvm.git I can merge
>> Ben's tree (provided it is append-only) and apply the kvm-ppc
>> patches on top.
> 
> OK, the easiest thing is for them to go via Ben's tree, I think, since
> they depend so much on other stuff in Ben's tree.
> 
> Alex, could you give Ben an acked-by for patches 1-8 of the series?
> There haven't been any changes requested for them.

Let me give them a spin on a G5, so I can at least verify nothing breaks ;). 
I'll hopefully get to this before next week.

Alex


Re: [PATCH 1/2] powerpc,e5500: add networking to defconfig

2011-05-18 Thread Kumar Gala


On May 10, 2011, at 1:01 PM, Scott Wood wrote:

> Even though support for the p5020's on-chip ethernet is not yet upstream,
> it is not appropriate to disable all networking support (including
> loopback, unix domain sockets, external ethernet devices, etc) in the
> defconfig.  The networking settings are taken from mpc85xx_smp_defconfig,
> minus the drivers for ethernet devices not found on any current e5500
> chip.
> 
> The other changes are the result of running "make savedefconfig".
> 
> Signed-off-by: Scott Wood 
> ---
> arch/powerpc/configs/e55xx_smp_defconfig |   38 ++---
> 1 files changed, 29 insertions(+), 9 deletions(-)

applied to next

- k

Re: [PATCH 3/7] [RFC] add support for BlueGene/P FPU

2011-05-18 Thread Michael Neuling

Eric,

> This patch adds save/restore register support for the BlueGene/P
> double hummer FPU.

What does this mean?  Needs more details here.

> Signed-off-by: Eric Van Hensbergen 
> ---
>  arch/powerpc/include/asm/ppc_asm.h |   39 ---
>  arch/powerpc/kernel/fpu.S  |8 +++---
>  arch/powerpc/platforms/44x/Kconfig |9 
>  3 files changed, 40 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
> index 9821006..daa22bb 100644
> --- a/arch/powerpc/include/asm/ppc_asm.h
> +++ b/arch/powerpc/include/asm/ppc_asm.h
> @@ -88,6 +88,13 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
>   REST_10GPRS(22, base)
>  #endif
>  
> +#ifdef CONFIG_BGP
> +#define LFPDX(frt, ra, rb)   .long (31<<26)|((frt)<<21)|((ra)<<16)| \
> + ((rb)<<11)|(462<<1)
> +#define STFPDX(frt, ra, rb)  .long (31<<26)|((frt)<<21)|((ra)<<16)| \
> + ((rb)<<11)|(974<<1)
> +#endif /* CONFIG_BGP */

Put these in arch/powerpc/include/asm/ppc-opcode.h and reformat to fit
what's there already.

Also, don't need to put these defines inside a #ifdef.
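
To make the bit layout of those two macros concrete, here is a minimal sketch
in plain C that builds the same 32-bit instruction word: primary opcode 31 in
the top six bits, three 5-bit register fields, and the extended opcode shifted
left by one. The extended opcode values 462 and 974 are taken from the quoted
patch; the function and macro names are hypothetical and do not follow the
existing ppc-opcode.h conventions.

```c
#include <assert.h>
#include <stdint.h>

/* Extended opcodes as used in the quoted patch. */
#define XO_LFPDX   462u
#define XO_STFPDX  974u

/* Build an X-form style word: opcode 31 | frt | ra | rb | xo << 1. */
uint32_t xform_encode(uint32_t xo, uint32_t frt, uint32_t ra, uint32_t rb)
{
	assert(frt < 32 && ra < 32 && rb < 32);  /* 5-bit register fields */
	assert(xo < 1024);                       /* 10-bit extended opcode */
	return (31u << 26) | (frt << 21) | (ra << 16) | (rb << 11) | (xo << 1);
}
```

With frt=1, ra=2, rb=3 this yields 0x7C221B9C for the load form and 0x7C221F9C
for the store form, which is what the .long expressions in the patch expand to
for the same operands.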

> +
>  #define SAVE_2GPRS(n, base)  SAVE_GPR(n, base); SAVE_GPR(n+1, base)
>  #define SAVE_4GPRS(n, base)  SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
>  #define SAVE_8GPRS(n, base)  SAVE_4GPRS(n, base); SAVE_4GPRS(n+4, base)
> @@ -97,18 +104,26 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
>  #define REST_8GPRS(n, base)  REST_4GPRS(n, base); REST_4GPRS(n+4, base)
>  #define REST_10GPRS(n, base) REST_8GPRS(n, base); REST_2GPRS(n+8, base)
>  
> -#define SAVE_FPR(n, base)    stfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> -#define SAVE_2FPRS(n, base)  SAVE_FPR(n, base); SAVE_FPR(n+1, base)
> -#define SAVE_4FPRS(n, base)  SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
> -#define SAVE_8FPRS(n, base)  SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
> -#define SAVE_16FPRS(n, base) SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
> -#define SAVE_32FPRS(n, base) SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
> -#define REST_FPR(n, base)    lfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> -#define REST_2FPRS(n, base)  REST_FPR(n, base); REST_FPR(n+1, base)
> -#define REST_4FPRS(n, base)  REST_2FPRS(n, base); REST_2FPRS(n+2, base)
> -#define REST_8FPRS(n, base)  REST_4FPRS(n, base); REST_4FPRS(n+4, base)
> -#define REST_16FPRS(n, base) REST_8FPRS(n, base); REST_8FPRS(n+8, base)
> -#define REST_32FPRS(n, base) REST_16FPRS(n, base); REST_16FPRS(n+16, base)
> +#ifdef CONFIG_BGP
> +#define SAVE_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); STFPDX(n, base, b)
> +#define REST_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); LFPDX(n, base, b)

16*?  Are these FP regs 64 or 128 bits wide?  If 128 you are going to
have to play with TS_FPRWIDTH to get the size of the FPs correct in the
thread_struct.

I think there's a bug here.
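
A rough way to see the concern, as a sketch in C rather than anything from the
patch: compare the bytes the save area reserves (32 registers, 8 bytes times
TS_FPRWIDTH each) with the last byte touched when each register is stored at a
16-byte stride. It assumes each STFPDX stores 16 bytes, which the 16*(n) stride
suggests but the thread does not confirm. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FPRS 32

/* Bytes reserved for FPRs, given TS_FPRWIDTH doublewords per register. */
size_t fpr_area_size(size_t ts_fprwidth)
{
	assert(ts_fprwidth >= 1);
	return NUM_FPRS * 8 * ts_fprwidth;
}

/* One past the last byte written when saving register n. */
size_t fpr_save_end(size_t n, size_t stride, size_t width)
{
	assert(n < NUM_FPRS);
	return n * stride + width;
}

/* Nonzero if saving all 32 registers stays inside the reserved area. */
int fpr_saves_fit(size_t ts_fprwidth, size_t stride, size_t width)
{
	return fpr_save_end(NUM_FPRS - 1, stride, width)
		<= fpr_area_size(ts_fprwidth);
}
```

Under these assumptions a 16-byte stride only fits if TS_FPRWIDTH is 2; with
TS_FPRWIDTH of 1 the stores would run past the 256 bytes reserved.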

> +#else /* CONFIG_BGP */
> +#define SAVE_FPR(n, b, base) (stfd   n, THREAD_FPR0+8*TS_FPRWIDTH*(n)(base))
> +#define REST_FPR(n, b, base) (lfd    n, THREAD_FPR0+8*TS_FPRWIDTH*(n)(base))
> +#endif /* CONFIG_BGP */
> +
> +#define SAVE_2FPRS(n, b, base)   SAVE_FPR(n, b, base); SAVE_FPR(n+1, b, base)
> +#define SAVE_4FPRS(n, b, base)   SAVE_2FPRS(n, b, base); SAVE_2FPRS(n+2, b, base)
> +#define SAVE_8FPRS(n, b, base)   SAVE_4FPRS(n, b, base); SAVE_4FPRS(n+4, b, base)
> +#define SAVE_16FPRS(n, b, base)  SAVE_8FPRS(n, b, base); SAVE_8FPRS(n+8, b, base)
> +#define SAVE_32FPRS(n, b, base)  SAVE_16FPRS(n, b, base); \
> + SAVE_16FPRS(n+16, b, base)
> +#define REST_2FPRS(n, b, base)   REST_FPR(n, b, base); REST_FPR(n+1, b, base)
> +#define REST_4FPRS(n, b, base)   REST_2FPRS(n, b, base); REST_2FPRS(n+2, b, base)
> +#define REST_8FPRS(n, b, base)   REST_4FPRS(n, b, base); REST_4FPRS(n+4, b, base)
> +#define REST_16FPRS(n, b, base)  REST_8FPRS(n, b, base); REST_8FPRS(n+8, b, base)
> +#define REST_32FPRS(n, b, base)  REST_16FPRS(n, b, base); \
> + REST_16FPRS(n+16, b, base)
>  
>  #define SAVE_VR(n,b,base)li b,THREAD_VR0+(16*(n));  stvx n,base,b
>  #define SAVE_2VRS(n,b,base)  SAVE_VR(n,b,base); SAVE_VR(n+1,b,base)
> diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
> index de36955..9f11c66 100644
> --- a/arch/powerpc/kernel/fpu.S
> +++ b/arch/powerpc/kernel/fpu.S
> @@ -30,7 +30,7 @@
>  BEGIN_FTR_SECTION\
>   b   2f; \
>  END_FTR_SECTION_IFSET(CPU_FTR_VSX);  \
> - REST_32FPRS(n,base);\
> + REST_32FPRS(n,c,base);  \
>   b   3f; \
>  2:   REST_32VSRS(n,c,base);

Re: [PATCH] powerpc/86xx: don't pretend that we support 8-bit pixels on the MPC8610 HPCD

2011-05-18 Thread Kumar Gala


On May 9, 2011, at 2:29 PM, Timur Tabi wrote:

> If the video mode is set to 16-, 24-, or 32-bit pixels, then the pixel data
> contains actual levels of red, blue, and green.  However, if the video mode is
> set to 8-bit pixels, then the 8-bit value represents an index into a color
> table.
> This is called "palette mode" on the Freescale DIU video controller.
> 
> The DIU driver does not currently support palette mode, but the MPC8610 HPCD
> board file returned a non-zero (although incorrect) pixel format value for
> 8-bit mode.
> 
> Signed-off-by: Timur Tabi 
> ---
> arch/powerpc/platforms/86xx/mpc8610_hpcd.c |   97 ++--
> 1 files changed, 64 insertions(+), 33 deletions(-)

applied to next

- k

Re: [PATCH] powerpc/mpc8610_hpcd: Do not use "/" in interrupt names

2011-05-18 Thread Kumar Gala


On May 4, 2011, at 9:29 AM, Geert Uytterhoeven wrote:

> It may trigger a warning in fs/proc/generic.c:__xlate_proc_name() when
> trying to add an entry for the interrupt handler to sysfs.
> 
> Signed-off-by: Geert Uytterhoeven 
> ---
> arch/powerpc/platforms/86xx/mpc8610_hpcd.c |2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)

applied to next, will CC for stable (2.6.39.1)

- k

Re: [PATCH] powerpc/e5500: set non-base IVORs

2011-05-18 Thread Kumar Gala


On May 9, 2011, at 4:26 PM, Scott Wood wrote:

> Without this, we attempt to use doorbells for IPIs, and end up
> branching to some bad address.  Plus, even for the exceptions
> we don't implement, it's good to handle it and get a message out.
> 
> Signed-off-by: Scott Wood 
> ---
> arch/powerpc/include/asm/reg_booke.h  |4 ++
> arch/powerpc/kernel/cpu_setup_fsl_booke.S |3 ++
> arch/powerpc/kernel/exceptions-64e.S  |   47 +
> 3 files changed, 54 insertions(+), 0 deletions(-)

applied to next

- k


Re: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread Benjamin Herrenschmidt

On Thu, 2011-05-19 at 08:46 +0400, James Bottomley wrote:
> This can't really be done generically.  There are several considerations
> to do with hardware requirements.  I can see some hw requiring a
> specific write order (I think this applies more to read order, though).

Right. Or there can be a need for a completely different access pattern
to do 32-bit, or maybe write only one half because both might have a
side effect etc etc etc ...

Also a global lock would be suboptimal vs. a per device lock buried in
the driver.
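
As a rough user-space sketch of the per-device lock idea (not mpt2sas code): the struct, the `fake_*` helpers and the latch-on-high-half behaviour below are all assumptions made for illustration, and a C11 `atomic_flag` stands in for a driver spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical device: a 64-bit register fed by two 32-bit writes.
 * The value is latched when the high half arrives. */
struct fake_dev {
	atomic_flag lock;	/* per-device lock, kept in driver state */
	uint32_t pending_lo;	/* low half waiting for its high half */
	int have_lo;
	uint64_t latched;	/* value the "hardware" finally sees */
};

static void fake_dev_init(struct fake_dev *d)
{
	atomic_flag_clear(&d->lock);
	d->pending_lo = 0;
	d->have_lo = 0;
	d->latched = 0;
}

static void fake_dev_lock(struct fake_dev *d)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
		;
}

static void fake_dev_unlock(struct fake_dev *d)
{
	atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Stand-ins for 32-bit MMIO writes. */
static void fake_write_lo(struct fake_dev *d, uint32_t v)
{
	d->pending_lo = v;
	d->have_lo = 1;
}

static void fake_write_hi(struct fake_dev *d, uint32_t v)
{
	assert(d->have_lo);	/* a high half with no low half is a bug */
	d->latched = ((uint64_t)v << 32) | d->pending_lo;
	d->have_lo = 0;
}

/* Both halves go out under this device's lock only, so other devices
 * are never serialized against this one. */
static void fake_dev_write64(struct fake_dev *d, uint64_t v)
{
	fake_dev_lock(d);
	fake_write_lo(d, (uint32_t)v);
	fake_write_hi(d, (uint32_t)(v >> 32));
	fake_dev_unlock(d);
}
```

The lock lives in the per-device struct, so two unrelated devices never contend, which is the drawback a single global lock would have.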

> The specific mpt2sas problem is that if we write a 64 bit register non
> atomically, we can't allow any interleaving writes for any other region
> on the chip, otherwise the HW will take the write as complete in the 64
> bit register and latch the wrong value.  The only way to achieve that
> given the semantics of writeq is a global static spinlock.
> 
> > How do you think about them? If you cannot agree with the above two
> > solutions, I'll agree with reverting them.
> 
> Having x86 roll its own never made any sense, so I think they need
> reverting anyway. 

Agreed.

>  This is a driver/platform bus problem not an
> architecture problem.  The assumption we can make is that the platform
> CPU can write atomically at its chip width.  We *may* be able to make
> the assumption that the bus controller can translate an atomic chip
> width transaction to a single atomic bus transaction; I think that
> assumption holds true for at least PCI and on the parisc legacy busses,
> so if we can agree on semantics, this should be a global define
> somewhere.  If there are problems with the bus assumption, we'll likely
> need some type of opt-in (or just not bother).

And we want a well defined #ifdef drivers test to know whether there's a
writeq/readq (just #define writeq/readq itself is fine even if it's an
inline function, we do that elsewhere) so they can have a fallback
scenario.

This is important as these can be used in very performance-critical code
paths.
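
A minimal sketch of what such a test could look like from the driver side, written as plain user-space C. `writel_sim`, `drv_write64` and `DRV_HAS_NATIVE_WRITEQ` are made-up names, and the low-then-high order in the fallback is an assumption; a real device dictates its own order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A platform with a real 64-bit store would provide writeq() and also
 * "#define writeq writeq" so drivers can test for it. This sketch
 * deliberately defines neither, so the fallback branch is compiled. */

/* Stand-in for a 32-bit MMIO write. */
static inline void writel_sim(uint32_t v, volatile uint32_t *addr)
{
	*addr = v;
}

#ifdef writeq
#define DRV_HAS_NATIVE_WRITEQ 1
static inline void drv_write64(uint64_t v, volatile uint32_t *reg)
{
	writeq(v, reg);
}
#else
#define DRV_HAS_NATIVE_WRITEQ 0
/* Fallback scenario: the driver, which knows its hardware, picks the
 * access pattern (here: low half first, then high half). */
static inline void drv_write64(uint64_t v, volatile uint32_t *reg)
{
	assert(reg != NULL);
	writel_sim((uint32_t)v, &reg[0]);
	writel_sim((uint32_t)(v >> 32), &reg[1]);
}
#endif
```

The compile-time test costs nothing at run time, which matters for the performance-critical paths mentioned above.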

Cheers,
Ben.


Re: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 21:16 -0700, Roland Dreier wrote:
> On Wed, May 18, 2011 at 11:31 AM, Milton Miller  wrote:
> > So the real question should be why is x86-32 supplying a broken writeq
> > instead of letting drivers work out what to do when needed?
> 
> Sounds a lot like what I was asking a couple of years ago :)
> http://lkml.org/lkml/2009/4/19/164
> 
> But Ingo insisted that non-atomic writeq would be fine:
> http://lkml.org/lkml/2009/4/19/167

Yuck... Ingo, I think that was very wrong.

Those are for MMIO, which must almost ALWAYS know precisely what the
resulting access size is going to be. It's not even about atomicity
between multiple CPUs. I have seen plenty of HW for which a 64-bit
access to a register is -not- equivalent to two 32-bit ones. In fact, in
some cases, you can get the side effects twice ... or none at all.

The only case where you can be lax is when you explicitly know that
there are no side effects -and- the HW copes with different access sizes.
This is not the general case and drivers need at the very least a way to
know what the behaviour will be.
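
To make the "side effects twice" case concrete, here is a toy model, not taken from any real device: `fake_doorbell` and its helpers are hypothetical, and the rule that every write access fires the side effect once is an assumption chosen for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register: each write access, whatever its size,
 * triggers the side effect once. */
struct fake_doorbell {
	uint64_t value;
	unsigned int fired;	/* how many times the side effect ran */
};

/* One 64-bit access: one side effect. */
static void fake_db_write64(struct fake_doorbell *db, uint64_t v)
{
	db->value = v;
	db->fired++;
}

/* One 32-bit access to half 0 (low) or 1 (high): also one side effect. */
static void fake_db_write32(struct fake_doorbell *db, unsigned int half,
			    uint32_t v)
{
	unsigned int shift;

	assert(half < 2);
	shift = half * 32;
	db->value &= ~((uint64_t)0xffffffffu << shift);
	db->value |= (uint64_t)v << shift;
	db->fired++;
}
```

Both paths end with the same register contents, but the split path fires the side effect twice, which is why the access size cannot be changed silently.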

Cheers,
Ben.


Re: powerpc: mpc85xx regression since 2.6.39-rc2, one cpu core lame

2011-05-18 Thread Kumar Gala


On May 17, 2011, at 4:40 PM, Benjamin Herrenschmidt wrote:

> On Tue, 2011-05-17 at 18:28 +0200, Richard Cochran wrote:
>> Ben,
>> 
>> Recent 2.6.39-rc kernels behave strangely on the Freescale dual core
>> mpc8572 and p2020. There is a long pause (like 2 seconds) in the boot
>> sequence after "mpic: requesting IPIs..."
>> 
>> When the system comes up, only one core shows in /proc/cpuinfo. Later
>> on, lots of messages appear like the following:
>> 
>>   INFO: task ksoftirqd/1:9 blocked for more than 120 seconds.
>> 
>> I bisected [1] the problem to:
>> 
>>   commit c56e58537d504706954a06570b4034c04e5b7500
>>   Author: Benjamin Herrenschmidt 
>>   Date:   Tue Mar 8 14:40:04 2011 +1100
>> 
>>   powerpc/smp: Create idle threads on demand and properly reset them
>> 
>> I don't see from that commit what had gone wrong. Perhaps you can
>> help resolve this?
> 
> Hrm, odd. Kumar, care to have a look ? That's what happens when you
> don't get me HW to test with :-)

I'm trying to work on it ;)

- k

Re: powerpc: mpc85xx regression since 2.6.39-rc2, one cpu core lame

2011-05-18 Thread Kumar Gala


On May 18, 2011, at 4:48 PM, Benjamin Herrenschmidt wrote:

> 
>> (I get the feeling that I am the only one testing recent kernels with
>> the mpc85xx.)
>> 
>> Anyhow, I see that this commit was one of a series. For my own use,
>> can I simply revert this one commit independently?
> 
> For your own use sure :-) But I'd still like to get to the bottom of
> this !
> 
> Cheers,
> Ben.

Tested the 'merge' branch and it appears to fix the issues with secondary
cores coming up.

- k

Re: [git pull] Please pull powerpc.git merge branch

2011-05-18 Thread Kumar Gala


On May 18, 2011, at 11:06 PM, Benjamin Herrenschmidt wrote:

> Hi Linus
> 
> Dunno if it's too late or not yet but here's 3 fixes for powerpc that
> would be welcome to have in before the release. If not I'll send them
> first thing next (one of them is already in -next in fact).
> 
> Those are regression fixes and a build breakage.
> 
> Cheers,
> Ben.
> 
> The following changes since commit fce519588acfac249e8fdc1f5016c73d617de315:
> 
>  Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6 (2011-05-18 13:25:57 -0700)
> 
> are available in the git repository at:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge
> 
> Ben Hutchings (1):
>  powerpc/kexec: Fix build failure on 32-bit SMP
> 
> Benjamin Herrenschmidt (1):
>  powerpc/smp: Make start_secondary_resume available to all CPU variants
> 
> kerstin jonsson (1):
>  powerpc/4xx: Fix regression in SMP on 476
> 
> arch/powerpc/kernel/crash.c   |   59 +
> arch/powerpc/kernel/head_32.S |9 --
> arch/powerpc/kernel/misc_32.S |   11 +++
> arch/powerpc/kernel/smp.c |4 +-
> 4 files changed, 43 insertions(+), 40 deletions(-)
> 

Can you pull this into next?

- k

Re: [git pull] Please pull powerpc.git merge branch

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 21:11 -0700, Linus Torvalds wrote:
> On Wed, May 18, 2011 at 9:06 PM, Benjamin Herrenschmidt
>  wrote:
> >
> > Dunno if it's too late or not yet but here's 3 fixes for powerpc that
> > would be welcome to have in before the release. If not I'll send them
> > first thing next (one of them is already in -next in fact).
> 
> Gah. I just cut 2.6.39.

Bah, no biggie. I'll stick some CC: stable and put them in -next :-)

Cheers,
Ben.



Re: [PATCH 0/13] Hypervisor-mode KVM on POWER7

2011-05-18 Thread Paul Mackerras

On Tue, May 17, 2011 at 02:42:08PM +0300, Avi Kivity wrote:
> On 05/17/2011 02:38 PM, Alexander Graf wrote:
> >>
> >>  What would be the path for these patches to get upstream?  Would this
> >>  stuff normally go through Avi's tree?  There is a bit of a
> >>  complication in that they are based on Ben's next branch.  Would Avi
> >>  pull Ben's next branch, or would they go in via Ben's tree?
> >
> >Usually the ppc tree gets merged into Avi's tree and goes on from
> >there. When we have interdependencies, we can certainly do it
> >differently though. We can also shove them through Ben's tree this
> >time around, as there are more dependencies on ppc code than KVM
> >code.
> >
> 
> Yes, both options are fine.  If it goes through kvm.git I can merge
> Ben's tree (provided it is append-only) and apply the kvm-ppc
> patches on top.

OK, the easiest thing is for them to go via Ben's tree, I think, since
they depend so much on other stuff in Ben's tree.

Alex, could you give Ben an acked-by for patches 1-8 of the series?
There haven't been any changes requested for them.

Thanks,
Paul.

Re: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread James Bottomley

On Thu, 2011-05-19 at 13:08 +0900, Hitoshi Mitake wrote:
> On Thu, May 19, 2011 at 04:11, Moore, Eric  wrote:
> > On Wednesday, May 18, 2011 12:31 PM Milton Miller wrote:
> >> Ingo I would propose the following commits added in 2.6.29 be reverted.
> >> I think the current consensus is drivers must know if the writeq is
> >> not atomic so they can provide their own locking or other workaround.
> >>
> >
> >
> > Exactly.
> >
> 
> The original motivation of preparing common readq/writeq is that letting
> each driver have their own readq/writeq is bad for maintenance of source code.
> 
> But if you really dislike them, there might be two solutions:
> 
> 1. changing the name of readq/writeq to readq_nonatomic/writeq_nonatomic

This is fine, but not really very useful

> 2. adding new C file to somewhere and defining spinlock for them.
> With spin_lock_irqsave() and spin_unlock_irqrestore() on the spinlock,
> readq/writeq can be atomic.

This can't really be done generically.  There are several considerations
to do with hardware requirements.  I can see some hw requiring a
specific write order (I think this applies more to read order, though).

The specific mpt2sas problem is that if we write a 64 bit register non
atomically, we can't allow any interleaving writes for any other region
on the chip, otherwise the HW will take the write as complete in the 64
bit register and latch the wrong value.  The only way to achieve that
given the semantics of writeq is a global static spinlock.

> How do you think about them? If you cannot agree with the above two
> solutions, I'll agree with reverting them.

Having x86 roll its own never made any sense, so I think they need
reverting anyway.  This is a driver/platform bus problem not an
architecture problem.  The assumption we can make is that the platform
CPU can write atomically at its chip width.  We *may* be able to make
the assumption that the bus controller can translate an atomic chip
width transaction to a single atomic bus transaction; I think that
assumption holds true for at least PCI and on the parisc legacy busses,
so if we can agree on semantics, this should be a global define
somewhere.  If there are problems with the bus assumption, we'll likely
need some type of opt-in (or just not bother).

James


Re: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread Roland Dreier

On Wed, May 18, 2011 at 11:31 AM, Milton Miller  wrote:
> So the real question should be why is x86-32 supplying a broken writeq
> instead of letting drivers work out what to do when needed?

Sounds a lot like what I was asking a couple of years ago :)
http://lkml.org/lkml/2009/4/19/164

But Ingo insisted that non-atomic writeq would be fine:
http://lkml.org/lkml/2009/4/19/167

 - R.

Re: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread Hitoshi Mitake

On Thu, May 19, 2011 at 04:11, Moore, Eric  wrote:
> On Wednesday, May 18, 2011 12:31 PM Milton Miller wrote:
>> Ingo I would propose the following commits added in 2.6.29 be reverted.
>> I think the current consensus is drivers must know if the writeq is
>> not atomic so they can provide their own locking or other workaround.
>>
>
>
> Exactly.
>

The original motivation of preparing common readq/writeq is that letting
each driver have their own readq/writeq is bad for maintenance of source code.

But if you really dislike them, there might be two solutions:

1. changing the name of readq/writeq to readq_nonatomic/writeq_nonatomic
2. adding new C file to somewhere and defining spinlock for them.
With spin_lock_irqsave() and spin_unlock_irqrestore() on the spinlock,
readq/writeq can be atomic.

How do you think about them? If you cannot agree with the above two solutions,
I'll agree with reverting them.

-- 
Hitoshi Mitake
h.mit...@gmail.com

Re: [git pull] Please pull powerpc.git merge branch

2011-05-18 Thread David Miller

From: Linus Torvalds 
Date: Wed, 18 May 2011 21:11:47 -0700

> On Wed, May 18, 2011 at 9:06 PM, Benjamin Herrenschmidt
>  wrote:
>>
>> Dunno if it's too late or not yet but here's 3 fixes for powerpc that
>> would be welcome to have in before the release. If not I'll send them
>> first thing next (one of them is already in -next in fact).
> 
> Gah. I just cut 2.6.39.

I know we can't let these things go forever, but in my opinion
we should have given this one or two more -rc's.

Re: [PATCH RFCv7 0/2] CARMA Board Support

2011-05-18 Thread Benjamin Herrenschmidt

On Fri, 2011-02-11 at 15:34 -0800, Ira W. Snyder wrote:
> Hello everyone,
> 
> This is the seventh posting of these drivers, taking into account comments
> from earlier postings. I've made sure that the drivers both pass checkpatch
> without any errors or warnings. I would appreciate as much review as you
> can offer, so that these can get into the next merge cycle. They've been
> sitting outside mainline for far too long.

This has been bitrotting for way too long indeed. I'm sticking this into
powerpc -next today.

Cheers,
Ben.

> RFCv6 -> RFCv7:
> - reference count private data structure (to support unbind)
> - use #defines instead of hex values for registers
> - keep lines <=80 characters
> 
> RFCv5 -> RFCv6:
> - change locking in several functions
> - use list_move_tail() to simplify code
> - remove unused helper functions
> 
> RFCv4 -> RFCv5:
> - remove unnecessary locking per review comments
> - do not clobber return values from *_interruptible()
> - explicitly track buffer DMA mapping
> - use #defines instead of raw hex addresses
> - change enable sysfs attribute to root-writeable only
> 
> RFCv3 -> RFCv4:
> - updates for DATA-FPGA version 2
> 
> RFCv2 -> RFCv3:
> - use miscdevice framework (removing the carma class)
> - add bitfile readback capability to the programmer
> 
> RFCv1 -> RFCv2:
> - change comments to kerneldoc format
> - Kconfig improvements
> - use the videobuf_dma_sg API in the programmer
> - updates for Freescale DMAEngine DMA_SLAVE API changes
> 
> KNOWN ISSUES:
> - untested with a setup that can generate interrupts (will get access soon)
> - does not handle runtime "unbind"
> 
> Information about the CARMA board:
> 
> The CARMA board is essentially an MPC8349EA MDS reference design with a
> 1GHz ADC and 4 high powered data processing FPGAs connected to the local
> bus. It is all packed into a compact PCI form factor. It is used at the
> Owens Valley Radio Observatory as the main component in the correlator
> system.
> 
> For board information, see:
> http://www.mmarray.org/~dwh/carma_board/index.html
> 
> For DATA-FPGA register layout, see:
> http://www.mmarray.org/memos/carma_memo46.pdf
> 
> These drivers are the necessary pieces to get the data processing FPGAs
> working and producing data. Despite the fact that the hardware is custom
> and we are the only users, I'd still like to get the drivers upstream.
> Several people have suggested that this is possible.
> 
> Some further patches will be forthcoming. I have a driver for the LED
> subsystem and the PPS subsystem. The LED register layout is expected to
> change soon, so I won't post the driver until that is finished. The PPS
> driver will be posted separately from this patch series; it is very
> generic.
> 
> Thanks to everyone who has provided comments on earlier versions!
> 
> Ira W. Snyder (2):
>   misc: add CARMA DATA-FPGA Access Driver
>   misc: add CARMA DATA-FPGA Programmer support
> 
>  drivers/misc/Kconfig|1 +
>  drivers/misc/Makefile   |1 +
>  drivers/misc/carma/Kconfig  |   18 +
>  drivers/misc/carma/Makefile |2 +
>  drivers/misc/carma/carma-fpga-program.c | 1141
>  drivers/misc/carma/carma-fpga.c | 1433
>  6 files changed, 2596 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/misc/carma/Kconfig
>  create mode 100644 drivers/misc/carma/Makefile
>  create mode 100644 drivers/misc/carma/carma-fpga-program.c
>  create mode 100644 drivers/misc/carma/carma-fpga.c
> 



Re: [git pull] Please pull powerpc.git merge branch

2011-05-18 Thread Linus Torvalds

On Wed, May 18, 2011 at 9:06 PM, Benjamin Herrenschmidt
 wrote:
>
> Dunno if it's too late or not yet but here's 3 fixes for powerpc that
> would be welcome to have in before the release. If not I'll send them
> first thing next (one of them is already in -next in fact).

Gah. I just cut 2.6.39.

  Linus

Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering

2011-05-18 Thread Will Drewry

On Tue, May 17, 2011 at 6:19 AM, Ingo Molnar  wrote:
>
> * Steven Rostedt  wrote:
>
>> On Tue, 2011-05-17 at 14:42 +0200, Ingo Molnar wrote:
>> > * Steven Rostedt  wrote:
>> >
>> > > On Mon, 2011-05-16 at 18:52 +0200, Ingo Molnar wrote:
>> > > > * Steven Rostedt  wrote:
>> > > >
>> > > > > I'm a bit nervous about the 'active' role of (trace_)events, because
>> > > > > of the way multiple callbacks can be registered. How would:
>> > > > >
>> > > > >       err = event_x();
>> > > > >       if (err == -EACCESS) {
>> > > > >
>> > > > > be handled? [...]
>> > > >
>> > > > The default behavior would be something obvious: to trigger all
>> > > > callbacks and use the first non-zero return value.
>> > >
>> > > But how do we know which callback that was from? There's no ordering of
>> > > what callbacks are called first.
>> >
>> > We do not have to know that - nor do the calling sites care in general. Do
>> > you have some specific usecase in mind where the identity of the callback
>> > that generates a match matters?
>>
>> Maybe I'm confused. I was thinking that these event_*() are what we
>> currently call trace_*(), but the event_*(), I assume, can return a
>> value if a call back returns one.
>
> Yeah - and the call site can treat it as:
>
>  - Ugh, if i get an error i need to abort whatever i was about to do
>
> or (more advanced future use):
>
>  - If i get a positive value i need to re-evaluate the parameters that were
>   passed in, they were changed

Do event_* that return non-void exist in the tree at all now?  I've
looked at the various tracepoint macros as well as some of the other
handlers (trace_function, perf_tp_event, etc) and I'm not seeing any
places where a return value is honored nor could be.  At best, the
perf_tp_event can be short-circuited in the hlist_for_each, but
it'd still need a way to bubble up a failure and result in not calling
the trace/event that the hook precedes.

Am I missing something really obvious?  I don't feel I've gotten a
good handle on exactly how all the tracing code gets triggered, so
perhaps I'm still a level (or three) too shallow. (I can see the asm
hooks for trace functions and I can see where that translates to
registered calls - like trace_function - but I don't see how the
hooked calls can be trivially aborted).
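
For reference, the semantics proposed in the quoted text (run every registered callback, hand the first non-zero return value back to the call site) could be sketched in plain C as follows. This is purely illustrative: `event_cb`, `event_dispatch` and the sample callbacks are invented names, not anything found in the tree.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: returns 0 to allow, non-zero to object. */
typedef int (*event_cb)(void *ctx);

/* Call every callback in registration order and return the first
 * non-zero result seen, or 0 if nobody objected. Later callbacks
 * still run after an objection. */
static int event_dispatch(const event_cb *cbs, size_t n, void *ctx)
{
	int first = 0;
	size_t i;

	assert(cbs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int ret = cbs[i](ctx);

		if (ret != 0 && first == 0)
			first = ret;
	}
	return first;
}

/* Sample callbacks; ctx points at an int counting invocations. */
static int cb_allow(void *ctx)
{
	(*(int *)ctx)++;
	return 0;
}

static int cb_deny_perm(void *ctx)
{
	(*(int *)ctx)++;
	return -EPERM;
}

static int cb_deny_acces(void *ctx)
{
	(*(int *)ctx)++;
	return -EACCES;
}
```

With callbacks registered as allow, deny_perm, deny_acces, the call site sees -EPERM and has no way to learn that a second callback also objected, which is the ordering concern raised in the quoted exchange.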

As is, I'm not sure how the perf and ftrace infrastructure could be
reused cleanly without a fair number of hacks to the interface and a
good bit of reworking.  I can already see a number of challenges
around reusing the sys_perf_event_open interface and the fact that
reimplementing something even as simple as seccomp mode=1 seems to
require a fair amount of tweaking to avoid being leaky.  (E.g.,
enabling all TRACE_EVENT()s for syscalls will miss unhooked syscalls
so either acceptance matching needs to be propagated up the stack
along with some seccomp-like task modality or seccomp-on-perf would
have to depend on sys_enter events with syscall number predicate
matching and fail when a filter discard applies to all active events.)

At present, I'm leaning back towards the v2 series (plus the requested
minor changes) for the benefit of code clarity and its fail-secure
behavior.  Even just considering the reduced case of seccomp mode 1
being implemented on the shared infrastructure, I feel like I'm missing
something that makes it viable.  Any clues?

If not, I don't think a seccomp mode 2 interface via prctl would be
intractable if the long term movement is to a ftrace/perf backend - it
just means that the in-kernel code would change to wrap whatever the
final design ended up being.

Thanks and sorry if I'm being dense!

>> Thus, we now have the ability to dynamically attach function calls to
>> arbitrary points in the kernel that can have an effect on the code that
>> called it. Right now, we only have the ability to attach function calls to
>> these locations that have passive effects (tracing/profiling).
>
> Well, they can only have the effect that the calling site accepts and handles.
> So the 'effect' is not arbitrary and not defined by the callbacks, it is
> controlled and handled by the calling code.
>
> We do not want invisible side-effects, opaque hooks, etc.
>
> Instead of that we want (this is the getname() example i cited in the thread)
> explicit effects, like:
>
>  if (event_vfs_getname(result))
>        return ERR_PTR(-EPERM);
>
>> But you say, "nor do the calling sites care in general". Then what do
>> these calling sites do with the return code? Are we limiting these
>> actions to security only? Or can we have some other feature. [...]
>
> Yeah, not just security. One other example that came up recently is whether to
> panic the box on certain (bad) events such as NMI errors. This too could be
> made flexible via the event filter code: we already capture many events, so
> places that might conceivably do some policy could do so based on a filter
> condition.

This sounds great - I just wish I could

[git pull] Please pull powerpc.git merge branch

2011-05-18 Thread Benjamin Herrenschmidt

Hi Linus

Dunno if it's too late or not yet but here's 3 fixes for powerpc that
would be welcome to have in before the release. If not I'll send them
first thing next (one of them is already in -next in fact).

Those are regression fixes and a build breakage.

Cheers,
Ben.

The following changes since commit fce519588acfac249e8fdc1f5016c73d617de315:

  Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6 (2011-05-18 13:25:57 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

Ben Hutchings (1):
  powerpc/kexec: Fix build failure on 32-bit SMP

Benjamin Herrenschmidt (1):
  powerpc/smp: Make start_secondary_resume available to all CPU variants

kerstin jonsson (1):
  powerpc/4xx: Fix regression in SMP on 476

 arch/powerpc/kernel/crash.c   |   59 +
 arch/powerpc/kernel/head_32.S |9 --
 arch/powerpc/kernel/misc_32.S |   11 +++
 arch/powerpc/kernel/smp.c |4 +-
 4 files changed, 43 insertions(+), 40 deletions(-)



Re: [PATCH] lib: Consolidate DEBUG_STACK_USAGE option

2011-05-18 Thread Benjamin Herrenschmidt

On Fri, 2011-05-06 at 22:57 -0700, Stephen Boyd wrote:
> Most arches define CONFIG_DEBUG_STACK_USAGE exactly the same way.
> Move it to lib/Kconfig.debug so each arch doesn't have to define
> it. This obviously makes the option generic, but that's fine
> because the config is already used in generic code.
> 
> It's not obvious to me that sysrq-P actually does anything
> different with this option enabled, but I erred on the side of
> caution by keeping the most inclusive wording.

Sorry for the delay...

For powerpc:

Acked-by: Benjamin Herrenschmidt 

Cheers,
Ben.



linux-next: build warning after merge of the powerpc tree

2011-05-18 Thread Stephen Rothwell

Hi all,

After merging the powerpc tree, today's linux-next build (powerpc
allyesconfig) produced this warning:

WARNING: arch/powerpc/sysdev/built-in.o(.text+0x10eb8): Section mismatch in 
reference from the function .ics_rtas_init() to the function 
.init.text:.xics_register_ics()
The function .ics_rtas_init() references
the function __init .xics_register_ics().
This is often because .ics_rtas_init lacks a __init 
annotation or the annotation of .xics_register_ics is wrong.

Introduced by commit 0b05ac6e2480 ("powerpc/xics: Rewrite XICS driver").

ics_rtas_init() is only called from xics_init() which is marked __init,
so ics_rtas_init() should be as well.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/



Re: [PATCH] PPC_47x SMP fix

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 11:57 +0200, Kerstin Jonsson wrote:
> commit c56e58537d504706954a06570b4034c04e5b7500 breaks SMP support in PPC_47x
> chip.
>  secondary_ti must be set to current thread info before calling kick_cpu or
>  else start_secondary_47x will jump into void when trying to return to c-code.
>  In the current setup secondary_ti is initialized before the CPU idle task is
>  started and only the boot core will start. I am not sure this is the correct
>  solution, but it makes SMP possible in my chip.
>  Note! The HOTPLUG support probably needs some fixing too. There is no
>  trampoline code available in head_44x.S - start_secondary_resume?

Sending to Linus now. I've also committed a fix for the latter, moving
the 32-bit definition of start_secondary_resume to misc_32.S

Thanks !

Cheers,
Ben. 
> 
> Signed-off-by: Kerstin Jonsson 
> Cc: Paul Mackerras 
> Cc: Michael Neuling 
> Cc: Will Schmidt 
> ---
>  arch/powerpc/kernel/smp.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index cbdbb14..f2dcab7 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -410,8 +410,6 @@ int __cpuinit __cpu_up(unsigned int cpu)
>  {
>   int rc, c;
>  
> - secondary_ti = current_set[cpu];
> -
>   if (smp_ops == NULL ||
>   (smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)))
>   return -EINVAL;
> @@ -421,6 +419,8 @@ int __cpuinit __cpu_up(unsigned int cpu)
>   if (rc)
>   return rc;
>  
> + secondary_ti = current_set[cpu];
> +
>   /* Make sure callin-map entry is 0 (can be leftover a CPU
>* hotplug
>*/



Re: mmotm threatens ppc preemption again

2011-05-18 Thread Jeremy Fitzhardinge

On 03/31/2011 01:38 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2011-03-31 at 10:21 -0700, Jeremy Fitzhardinge wrote:
>> No, it's the same accessors for both, since the need to distinguish them
>> hasn't really come up.  Could you put a "if (preemptable()) return;"
>> guard in your implementations?
> That would be a band-aid but would probably do the trick for now
> for !-rt, tho it wouldn't do the right thing for -rt... 

Hi Ben,

Have you had a chance to look at doing a workaround/fix for these power
problems?  I believe that's the only holdup to putting in the batching
changes.  I'd like to get them in for the next window if possible, since
they're a pretty significant performance win for us.

Thanks,
J

Re: [PATCH 6/7] powerpc/mm: 64-bit: tlb handler micro-optimization

2011-05-18 Thread Scott Wood

On Thu, 19 May 2011 07:54:48 +1000
Benjamin Herrenschmidt  wrote:

> On Wed, 2011-05-18 at 16:51 -0500, Scott Wood wrote:
> > On Thu, 19 May 2011 07:37:47 +1000
> > Benjamin Herrenschmidt  wrote:
> > 
> > > On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > > > A little more speed up measured on e5500.
> > > > 
> > > > Setting of U0-3 is dropped as it is not used by Linux as far as I can
> > > > see.
> > > 
> > > Please keep them for now. If your core doesn't have them, make them an
> > > MMU feature.
> > 
> > We have them, it was just an attempt to clean out unused things to speed up
> > the miss handler.  I'll drop that part if you think we'll use it in the
> > future.
> 
> I never know for sure ... damn research people ... :-)
> 
> I'd rather keep them for now, does it make a significant difference ?

It was minor but measurable (wouldn't have been worthwhile except as part
of a series of small things that add up), but upon trying again I was able
to reorder slightly and fit it in without seeing an impact.

-Scott


RE: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread Moore, Eric

On Wednesday, May 18, 2011 3:30 PM, Benjamin Herrenschmidt wrote:
> 
> You may also want to look at Milton's comments, it looks like the way
> you do init_completion followed immediately by wait_completion is racy.
> 
> You should init the completion before you do the IO that will eventually
> trigger complete() to be called.
> 

I agree.  The init_completion needs to be done prior to posting the smid.  I'm 
not sure why I did it that way.
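
The ordering problem can be shown with a deliberately simplified, single-threaded model. The `*_sim` names are hypothetical stand-ins, not the kernel completion API, and the "device" is assumed to signal completion immediately after the I/O is posted.

```c
#include <assert.h>
#include <stddef.h>

/* Toy completion: just a flag. */
struct completion_sim {
	int done;
};

static void init_completion_sim(struct completion_sim *c)
{
	assert(c != NULL);
	c->done = 0;
}

/* What the interrupt handler would do when the I/O finishes. */
static void complete_sim(struct completion_sim *c)
{
	assert(c != NULL);
	c->done = 1;
}

/* Non-zero if a waiter would sleep because no completion is recorded. */
static int would_block_sim(const struct completion_sim *c)
{
	assert(c != NULL);
	return !c->done;
}

/* Racy order: post the I/O, the fast device completes, then init. */
static int racy_order(void)
{
	struct completion_sim c = { 0 };

	complete_sim(&c);		/* I/O posted, device already signalled */
	init_completion_sim(&c);	/* too late: wipes out the signal */
	return would_block_sim(&c);	/* the waiter would now hang */
}

/* Safe order: init first, then post the I/O. */
static int safe_order(void)
{
	struct completion_sim c;

	init_completion_sim(&c);
	complete_sim(&c);		/* I/O posted, device signals */
	return would_block_sim(&c);
}
```

In this model `racy_order()` reports that the waiter would block forever while `safe_order()` does not, matching the advice to initialize the completion before posting the request.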

Re: [PATCH 7/7] powerpc/e5500: set MMU_FTR_USE_PAIRED_MAS

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 16:52 -0500, Scott Wood wrote:
> On Thu, 19 May 2011 07:38:19 +1000
> Benjamin Herrenschmidt  wrote:
> 
> > On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > > Signed-off-by: Scott Wood 
> > > ---
> > > Is there any 64-bit book3e chip that doesn't support this?  It
> > > doesn't appear to be optional in the ISA.
> > 
> > Not afaik.
> 
> Any objection to just removing the feature bit?

Nope. Wasn't it added by Kumar in the first place ?

Cheers,
Ben.



Re: [PATCH 6/7] powerpc/mm: 64-bit: tlb handler micro-optimization

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 16:51 -0500, Scott Wood wrote:
> On Thu, 19 May 2011 07:37:47 +1000
> Benjamin Herrenschmidt  wrote:
> 
> > On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > > A little more speed up measured on e5500.
> > > 
> > > Setting of U0-3 is dropped as it is not used by Linux as far as I can
> > > see.
> > 
> > Please keep them for now. If your core doesn't have them, make them an
> > MMU feature.
> 
> We have them, it was just an attempt to clean out unused things to speed up
> the miss handler.  I'll drop that part if you think we'll use it in the
> future.

I never know for sure ... damn research people ... :-)

I'd rather keep them for now, does it make a significant difference ?

Cheers,
Ben.



Re: [PATCH 5/7] powerpc/mm: 64-bit: don't handle non-standard page sizes

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 16:50 -0500, Scott Wood wrote:
> On Thu, 19 May 2011 07:36:04 +1000
> Benjamin Herrenschmidt  wrote:
> 
> > On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > > I don't see where any non-standard page size will be set in the
> > > kernel page tables, so don't waste time checking for it.  It wouldn't
> > > work with TLB0 on an FSL MMU anyway, so if there's something I missed
> > > (or which is out-of-tree), it's relying on implementation-specific
> > > behavior.  If there's an out-of-tree need for occasional 4K mappings
> > > with CONFIG_PPC_64K_PAGES, perhaps this check could only be done when
> > > that is defined.
> > > 
> > > Signed-off-by: Scott Wood 
> > > ---
> > 
> > Do you use that in the hugetlbfs code ? Can you publish that code ? It's
> > long overdue...
> 
> hugetlbfs entries don't get loaded by this code.  It branches to a slow
> path based on seeing a positive value in a pgd/pud/pmd entry.

BTW. The long overdue was aimed at David to get A2 hugetlbfs out :-)

Cheers,
Ben.


Re: [PATCH 7/7] powerpc/e5500: set MMU_FTR_USE_PAIRED_MAS

2011-05-18 Thread Scott Wood

On Thu, 19 May 2011 07:38:19 +1000
Benjamin Herrenschmidt  wrote:

> On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > Signed-off-by: Scott Wood 
> > ---
> > Is there any 64-bit book3e chip that doesn't support this?  It
> > doesn't appear to be optional in the ISA.
> 
> Not afaik.

Any objection to just removing the feature bit?

-Scott


Re: [PATCH 1/7] powerpc/mm: 64-bit 4k: use page-sized PMDs

2011-05-18 Thread Benjamin Herrenschmidt


> > Why do you want to create a virtual page table at the PMD level ? Also,
> > you are changing the geometry of the page tables which I think we don't
> > want. We chose that geometry so that the levels match the segment sizes
> > on server, I think it may have an impact with the hugetlbfs code (check
> > with David), it also was meant as a way to implement shared page tables
> > on hash64 tho we never published that.
> 
> The number of virtual page table misses was very high on certain loads.
> Cutting back to a virtual PMD eliminates most of that for the benchmark I
> tested, though it could still be painful for access patterns that are
> extremely spread out through the 64-bit address space.  I'll try a full
> 4-level walk and see what the performance is like; I was aiming for a
> compromise between random access and linear/localized access.

Let's get more numbers first then :-)

> Why does it need to match segment sizes on server?

I'm not sure whether we have a dependency with hugetlbfs there, I need
to check (remember we have one page size per segment there). For sharing
page tables that came from us using the PMD pointer as a base to
calculate the VSIDs. But I don't think we have plans to revive those
patches in the immediate future.

Cheers,
Ben.

> As for hugetlbfs, it merged easily enough with Becky's patches (you'll have
> to ask her when they'll be published).
> 
> -Scott



Re: [PATCH 6/7] powerpc/mm: 64-bit: tlb handler micro-optimization

2011-05-18 Thread Scott Wood

On Thu, 19 May 2011 07:37:47 +1000
Benjamin Herrenschmidt  wrote:

> On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > A little more speed up measured on e5500.
> > 
> > Setting of U0-3 is dropped as it is not used by Linux as far as I can
> > see.
> 
> Please keep them for now. If your core doesn't have them, make them an
> MMU feature.

We have them, it was just an attempt to clean out unused things to speed up
the miss handler.  I'll drop that part if you think we'll use it in the
future.

-Scott


Re: [PATCH 5/7] powerpc/mm: 64-bit: don't handle non-standard page sizes

2011-05-18 Thread Scott Wood

On Thu, 19 May 2011 07:36:04 +1000
Benjamin Herrenschmidt  wrote:

> On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > I don't see where any non-standard page size will be set in the
> > kernel page tables, so don't waste time checking for it.  It wouldn't
> > work with TLB0 on an FSL MMU anyway, so if there's something I missed
> > (or which is out-of-tree), it's relying on implementation-specific
> > behavior.  If there's an out-of-tree need for occasional 4K mappings
> > with CONFIG_PPC_64K_PAGES, perhaps this check could only be done when
> > that is defined.
> > 
> > Signed-off-by: Scott Wood 
> > ---
> 
> Do you use that in the hugetlbfs code ? Can you publish that code ? It's
> long overdue...

hugetlbfs entries don't get loaded by this code.  It branches to a slow
path based on seeing a positive value in a pgd/pud/pmd entry.

-Scott
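
A minimal C sketch of the dispatch described above, as one reading of it:
a directory entry is treated as a signed 64-bit value, where zero means
"not present", a negative value (top bit set, as a kernel virtual address
would have) means an ordinary next-level table, and a positive value sends
the handler to the slow path.  The enum and function names are made up for
illustration and do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the patch. */
enum walk_result { WALK_FAULT, WALK_NORMAL, WALK_SLOW_HUGE };

/*
 * Classify a pgd/pud/pmd entry by sign.  Assumption: normal entries hold
 * kernel virtual addresses (top bit set) and huge-page entries do not.
 */
static enum walk_result classify_dir_entry(uint64_t entry)
{
    int64_t v = (int64_t)entry;

    if (v == 0)
        return WALK_FAULT;      /* nothing mapped at this level */
    if (v > 0)
        return WALK_SLOW_HUGE;  /* positive: branch to the slow path */
    return WALK_NORMAL;         /* negative: ordinary table pointer */
}
```

The point of the sign test is that the fast path pays for a single compare
per level and never has to decode the huge-page entry itself.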


Re: powerpc: mpc85xx regression since 2.6.39-rc2, one cpu core lame

2011-05-18 Thread Benjamin Herrenschmidt


> (I get the feeling that I am the only one testing recent kernels with
> the mpc85xx.)
> 
> Anyhow, I see that this commit was one of a series. For my own use,
> can I simply revert this one commit independently?

For your own use sure :-) But I'd still like to get to the bottom of
this !

Cheers,
Ben.


Re: powerpc: mpc85xx regression since 2.6.39-rc2, one cpu core lame

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 12:19 -0500, Milton Miller wrote:
> Does this patch help?  If so please reply to that thread so patchwork
> will see it in addition to here.
> 
> http://patchwork.ozlabs.org/patch/96146/

Interesting. I'll have a closer look today. Unfortunately, I don't have
any 32-bit BookE SMP at hand at the moment so I couldn't test those
configs.

Cheers,
Ben.


Re: [PATCH 1/7] powerpc/mm: 64-bit 4k: use page-sized PMDs

2011-05-18 Thread Scott Wood

On Thu, 19 May 2011 07:32:41 +1000
Benjamin Herrenschmidt  wrote:

> On Wed, 2011-05-18 at 16:04 -0500, Scott Wood wrote:
> > This allows a virtual page table to be used at the PMD rather than
> > the PTE level.
> > 
> > Rather than adjust the constant in pgd_index() (or ignore it, as
> > too-large values don't hurt as long as overly large addresses aren't
> > passed in), go back to using PTRS_PER_PGD.  The overflow comment seems to
> > apply to a very old implementation of free_pgtables that used pgd_index()
> > (unfortunately the commit message, if you seek it out in the historic
> > tree, doesn't mention any details about the overflow).  The existing
> > value was numerically identical to the old 4K-page PTRS_PER_PGD, so
> > using it shouldn't produce an overflow where it's not otherwise possible.
> > 
> > Also get rid of the incorrect comment at the top of pgtable-ppc64-4k.h.
> 
> Why do you want to create a virtual page table at the PMD level ? Also,
> you are changing the geometry of the page tables which I think we don't
> want. We chose that geometry so that the levels match the segment sizes
> on server, I think it may have an impact with the hugetlbfs code (check
> with David), it also was meant as a way to implement shared page tables
> on hash64 tho we never published that.

The number of virtual page table misses was very high on certain loads.
Cutting back to a virtual PMD eliminates most of that for the benchmark I
tested, though it could still be painful for access patterns that are
extremely spread out through the 64-bit address space.  I'll try a full
4-level walk and see what the performance is like; I was aiming for a
compromise between random access and linear/localized access.

Why does it need to match segment sizes on server?

As for hugetlbfs, it merged easily enough with Becky's patches (you'll have
to ask her when they'll be published).

-Scott


Re: [PATCH 7/7] powerpc/e5500: set MMU_FTR_USE_PAIRED_MAS

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> Signed-off-by: Scott Wood 
> ---
> Is there any 64-bit book3e chip that doesn't support this?  It
> doesn't appear to be optional in the ISA.

Not afaik.

Cheers,
Ben.

>  arch/powerpc/kernel/cputable.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
> index 34d2722..a3b8eeb 100644
> --- a/arch/powerpc/kernel/cputable.c
> +++ b/arch/powerpc/kernel/cputable.c
> @@ -1981,7 +1981,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
>   .cpu_features   = CPU_FTRS_E5500,
>   .cpu_user_features  = COMMON_USER_BOOKE,
>   .mmu_features   = MMU_FTR_TYPE_FSL_E | MMU_FTR_BIG_PHYS |
> - MMU_FTR_USE_TLBILX,
> + MMU_FTR_USE_TLBILX | MMU_FTR_USE_PAIRED_MAS,
>   .icache_bsize   = 64,
>   .dcache_bsize   = 64,
>   .num_pmcs   = 4,



Re: [PATCH 6/7] powerpc/mm: 64-bit: tlb handler micro-optimization

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> A little more speed up measured on e5500.
> 
> Setting of U0-3 is dropped as it is not used by Linux as far as I can
> see.

Please keep them for now. If your core doesn't have them, make them an
MMU feature.

Cheers,
Ben.

> Signed-off-by: Scott Wood 
> ---
>  arch/powerpc/mm/tlb_low_64e.S |   21 -
>  1 files changed, 8 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
> index e782023..a94c87b 100644
> --- a/arch/powerpc/mm/tlb_low_64e.S
> +++ b/arch/powerpc/mm/tlb_low_64e.S
> @@ -47,10 +47,10 @@
>* We could probably also optimize by not saving SRR0/1 in the
>* linear mapping case but I'll leave that for later
>*/
> - mfspr   r14,SPRN_ESR
>   mfspr   r16,SPRN_DEAR   /* get faulting address */
>   srdi    r15,r16,60  /* get region */
>   cmpldi  cr0,r15,0xc /* linear mapping ? */
> + mfspr   r14,SPRN_ESR
>   TLB_MISS_STATS_SAVE_INFO
>   beq tlb_load_linear /* yes -> go to linear map load */
>  
> @@ -62,11 +62,11 @@
>   andi.   r10,r15,0x1
>   bne-    virt_page_table_tlb_miss
>  
> - std r14,EX_TLB_ESR(r12);/* save ESR */
> - std r16,EX_TLB_DEAR(r12);   /* save DEAR */
> + /* We need _PAGE_PRESENT and  _PAGE_ACCESSED set */
>  
> -  /* We need _PAGE_PRESENT and  _PAGE_ACCESSED set */
> + std r14,EX_TLB_ESR(r12);/* save ESR */
>   li  r11,_PAGE_PRESENT
> + std r16,EX_TLB_DEAR(r12);   /* save DEAR */
>   oris    r11,r11,_PAGE_ACCESSED@h
>  
>   /* We do the user/kernel test for the PID here along with the RW test
> @@ -225,21 +225,16 @@ finish_normal_tlb_miss:
>* yet implemented for now
>* MAS 2   :Defaults not useful, need to be redone
>* MAS 3+7 :Needs to be done
> -  *
> -  * TODO: mix up code below for better scheduling
>*/
>   clrrdi  r11,r16,12  /* Clear low crap in EA */
> + rldicr  r15,r14,64-(PTE_RPN_SHIFT-PAGE_SHIFT),63-PAGE_SHIFT
>   rlwimi  r11,r14,32-19,27,31 /* Insert WIMGE */
> + clrldi  r15,r15,12  /* Clear crap at the top */
>   mtspr   SPRN_MAS2,r11
> -
> - /* Move RPN in position */
> - rldicr  r11,r14,64-(PTE_RPN_SHIFT-PAGE_SHIFT),63-PAGE_SHIFT
> - clrldi  r15,r11,12  /* Clear crap at the top */
> - rlwimi  r15,r14,32-8,22,25  /* Move in U bits */
> + andi.   r11,r14,_PAGE_DIRTY
>   rlwimi  r15,r14,32-2,26,31  /* Move in BAP bits */
>  
>   /* Mask out SW and UW if !DIRTY (XXX optimize this !) */
> - andi.   r11,r14,_PAGE_DIRTY
>   bne 1f
>   li  r11,MAS3_SW|MAS3_UW
>   andc    r15,r15,r11
> @@ -483,10 +478,10 @@ virt_page_table_tlb_miss_whacko_fault:
>* We could probably also optimize by not saving SRR0/1 in the
>* linear mapping case but I'll leave that for later
>*/
> - mfspr   r14,SPRN_ESR
>   mfspr   r16,SPRN_DEAR   /* get faulting address */
>   srdi    r11,r16,60  /* get region */
>   cmpldi  cr0,r11,0xc /* linear mapping ? */
> + mfspr   r14,SPRN_ESR
>   TLB_MISS_STATS_SAVE_INFO
>   beq tlb_load_linear /* yes -> go to linear map load */
>  



Re: [PATCH 5/7] powerpc/mm: 64-bit: don't handle non-standard page sizes

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> I don't see where any non-standard page size will be set in the
> kernel page tables, so don't waste time checking for it.  It wouldn't
> work with TLB0 on an FSL MMU anyway, so if there's something I missed
> (or which is out-of-tree), it's relying on implementation-specific
> behavior.  If there's an out-of-tree need for occasional 4K mappings
> with CONFIG_PPC_64K_PAGES, perhaps this check could only be done when
> that is defined.
> 
> Signed-off-by: Scott Wood 
> ---

Do you use that in the hugetlbfs code ? Can you publish that code ? It's
long overdue...

Cheers,
Ben.

>  arch/powerpc/mm/tlb_low_64e.S |   13 -
>  1 files changed, 0 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
> index 922fece..e782023 100644
> --- a/arch/powerpc/mm/tlb_low_64e.S
> +++ b/arch/powerpc/mm/tlb_low_64e.S
> @@ -232,19 +232,6 @@ finish_normal_tlb_miss:
>   rlwimi  r11,r14,32-19,27,31 /* Insert WIMGE */
>   mtspr   SPRN_MAS2,r11
>  
> - /* Check page size, if not standard, update MAS1 */
> - rldicl  r11,r14,64-8,64-8
> -#ifdef CONFIG_PPC_64K_PAGES
> - cmpldi  cr0,r11,BOOK3E_PAGESZ_64K
> -#else
> - cmpldi  cr0,r11,BOOK3E_PAGESZ_4K
> -#endif
> - beq-    1f
> - mfspr   r11,SPRN_MAS1
> - rlwimi  r11,r14,31,21,24
> - rlwinm  r11,r11,0,21,19
> - mtspr   SPRN_MAS1,r11
> -1:
>   /* Move RPN in position */
>   rldicr  r11,r14,64-(PTE_RPN_SHIFT-PAGE_SHIFT),63-PAGE_SHIFT
>   clrldi  r15,r11,12  /* Clear crap at the top */



Re: [PATCH 2/7] powerpc/mm: 64-bit 4k: use a PMD-based virtual page table

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> Loads with non-linear access patterns were producing a very high
> ratio of recursive pt faults to regular tlb misses.  Rather than
> choose between a 4-level table walk or a 1-level virtual page table
> lookup, use a hybrid scheme with a virtual linear pmd, followed by a
> 2-level lookup in the normal handler.
> 
> This adds about 5 cycles (assuming no cache misses, and e5500 timing)
> to a normal TLB miss, but greatly reduces the recursive fault rate
> for loads which don't have locality within 2 MiB regions but do have
> significant locality within 1 GiB regions.  Improvements of close to 50%
> were seen on such benchmarks.
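
The 2 MiB and 1 GiB figures in the description above follow from the 4K
page geometry.  A small worked example, assuming PAGE_SHIFT is 12 and using
the index sizes that patch 1/7 of this series sets in pgtable-ppc64-4k.h
(the helper function names are illustrative only):

```c
#include <stdint.h>

/* PAGE_SHIFT assumes 4K pages; index sizes are those set by patch 1/7. */
#define PAGE_SHIFT      12
#define PTE_INDEX_SIZE  9
#define PMD_INDEX_SIZE  9

/* Address space covered by one page of PTEs: 512 entries of 4 KiB each. */
static uint64_t pte_page_span(void)
{
    return (uint64_t)1 << (PAGE_SHIFT + PTE_INDEX_SIZE);
}

/* Address space covered by one page of PMD entries: 512 PTE pages. */
static uint64_t pmd_page_span(void)
{
    return (uint64_t)1 << (PAGE_SHIFT + PTE_INDEX_SIZE + PMD_INDEX_SIZE);
}
```

pte_page_span() is 2 MiB and pmd_page_span() is 1 GiB, so a single TLB
entry for the virtual linear table goes from covering 2 MiB of user
address space (PTE level) to 1 GiB (PMD level).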

Can you publish benchmarks that compare these two with no virtual at all
(4 full loads) ?

Cheers,
Ben.

> Signed-off-by: Scott Wood 
> ---
>  arch/powerpc/mm/tlb_low_64e.S |   23 +++
>  1 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
> index af08922..17726d3 100644
> --- a/arch/powerpc/mm/tlb_low_64e.S
> +++ b/arch/powerpc/mm/tlb_low_64e.S
> @@ -24,7 +24,7 @@
>  #ifdef CONFIG_PPC_64K_PAGES
>  #define VPTE_PMD_SHIFT   (PTE_INDEX_SIZE+1)
>  #else
> -#define VPTE_PMD_SHIFT   (PTE_INDEX_SIZE)
> +#define VPTE_PMD_SHIFT   0
>  #endif
>  #define VPTE_PUD_SHIFT   (VPTE_PMD_SHIFT + PMD_INDEX_SIZE)
>  #define VPTE_PGD_SHIFT   (VPTE_PUD_SHIFT + PUD_INDEX_SIZE)
> @@ -185,7 +185,7 @@ normal_tlb_miss:
>   /* Insert the bottom bits in */
>   rlwimi  r14,r15,0,16,31
>  #else
> - rldicl  r14,r16,64-(PAGE_SHIFT-3),PAGE_SHIFT-3+4
> + rldicl  r14,r16,64-(PMD_SHIFT-3),PMD_SHIFT-3+4
>  #endif
>   sldi    r15,r10,60
>   clrrdi  r14,r14,3
> @@ -202,6 +202,16 @@ MMU_FTR_SECTION_ELSE
>   ld  r14,0(r10)
>  ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_USE_TLBRSRV)
>  
> +#ifndef CONFIG_PPC_64K_PAGES
> + rldicl  r15,r16,64-PAGE_SHIFT+3,64-PTE_INDEX_SIZE-3
> + clrrdi  r15,r15,3
> +
> + cmpldi  cr0,r14,0
> + beq normal_tlb_miss_access_fault
> +
> + ldx r14,r14,r15
> +#endif
> +
>  finish_normal_tlb_miss:
>   /* Check if required permissions are met */
>   andc.   r15,r11,r14
> @@ -353,14 +363,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_TLBRSRV)
>  #ifndef CONFIG_PPC_64K_PAGES
>   /* Get to PUD entry */
>   rldicl  r11,r16,64-VPTE_PUD_SHIFT,64-PUD_INDEX_SIZE-3
> - clrrdi  r10,r11,3
> - ldx r15,r10,r15
> - cmpldi  cr0,r15,0
> - beq virt_page_table_tlb_miss_fault
> -#endif /* CONFIG_PPC_64K_PAGES */
> -
> +#else
>   /* Get to PMD entry */
>   rldicl  r11,r16,64-VPTE_PMD_SHIFT,64-PMD_INDEX_SIZE-3
> +#endif
> +
>   clrrdi  r10,r11,3
>   ldx r15,r10,r15
>   cmpldi  cr0,r15,0



Re: [PATCH 1/7] powerpc/mm: 64-bit 4k: use page-sized PMDs

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 16:04 -0500, Scott Wood wrote:
> This allows a virtual page table to be used at the PMD rather than
> the PTE level.
> 
> Rather than adjust the constant in pgd_index() (or ignore it, as
> too-large values don't hurt as long as overly large addresses aren't
> passed in), go back to using PTRS_PER_PGD.  The overflow comment seems to
> apply to a very old implementation of free_pgtables that used pgd_index()
> (unfortunately the commit message, if you seek it out in the historic
> tree, doesn't mention any details about the overflow).  The existing
> value was numerically identical to the old 4K-page PTRS_PER_PGD, so
> using it shouldn't produce an overflow where it's not otherwise possible.
> 
> Also get rid of the incorrect comment at the top of pgtable-ppc64-4k.h.

Why do you want to create a virtual page table at the PMD level ? Also,
you are changing the geometry of the page tables which I think we don't
want. We chose that geometry so that the levels match the segment sizes
on server, I think it may have an impact with the hugetlbfs code (check
with David), it also was meant as a way to implement shared page tables
on hash64 tho we never published that.

Cheers,
Ben.

> Signed-off-by: Scott Wood 
> ---
>  arch/powerpc/include/asm/pgtable-ppc64-4k.h |   12 
>  arch/powerpc/include/asm/pgtable-ppc64.h|3 +--
>  2 files changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pgtable-ppc64-4k.h b/arch/powerpc/include/asm/pgtable-ppc64-4k.h
> index 6eefdcf..194005e 100644
> --- a/arch/powerpc/include/asm/pgtable-ppc64-4k.h
> +++ b/arch/powerpc/include/asm/pgtable-ppc64-4k.h
> @@ -1,14 +1,10 @@
>  #ifndef _ASM_POWERPC_PGTABLE_PPC64_4K_H
>  #define _ASM_POWERPC_PGTABLE_PPC64_4K_H
> -/*
> - * Entries per page directory level.  The PTE level must use a 64b record
> - * for each page table entry.  The PMD and PGD level use a 32b record for
> - * each entry by assuming that each entry is page aligned.
> - */
> +
>  #define PTE_INDEX_SIZE  9
> -#define PMD_INDEX_SIZE  7
> +#define PMD_INDEX_SIZE  9
>  #define PUD_INDEX_SIZE  7
> -#define PGD_INDEX_SIZE  9
> +#define PGD_INDEX_SIZE  7
>  
>  #ifndef __ASSEMBLY__
>  #define PTE_TABLE_SIZE   (sizeof(pte_t) << PTE_INDEX_SIZE)
> @@ -19,7 +15,7 @@
>  
>  #define PTRS_PER_PTE (1 << PTE_INDEX_SIZE)
>  #define PTRS_PER_PMD (1 << PMD_INDEX_SIZE)
> -#define PTRS_PER_PUD (1 << PMD_INDEX_SIZE)
> +#define PTRS_PER_PUD (1 << PUD_INDEX_SIZE)
>  #define PTRS_PER_PGD (1 << PGD_INDEX_SIZE)
>  
>  /* PMD_SHIFT determines what a second-level page table entry can map */
> diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
> index 2b09cd5..8bd1cd9 100644
> --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> @@ -181,8 +181,7 @@
>   * Find an entry in a page-table-directory.  We combine the address region
>   * (the high order N bits) and the pgd portion of the address.
>   */
> -/* to avoid overflow in free_pgtables we don't use PTRS_PER_PGD here */
> -#define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & 0x1ff)
> +#define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & (PTRS_PER_PGD - 1))
>  
>  #define pgd_offset(mm, address)   ((mm)->pgd + pgd_index(address))
>  
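
To make the geometry change in the patch above concrete, here is a
userspace sketch of how an address splits under the new index sizes.  It
assumes 4K pages (PAGE_SHIFT of 12) and that the shift constants are
derived by stacking the index sizes in the usual way; pgd_index() is
written as a function here rather than the kernel's macro, and
mapped_bits() is an illustrative helper.

```c
#include <stdint.h>

#define PAGE_SHIFT      12  /* assumed: 4K pages */
#define PTE_INDEX_SIZE  9
#define PMD_INDEX_SIZE  9   /* was 7 before the patch */
#define PUD_INDEX_SIZE  7
#define PGD_INDEX_SIZE  7   /* was 9 before the patch */

#define PMD_SHIFT    (PAGE_SHIFT + PTE_INDEX_SIZE)
#define PUD_SHIFT    (PMD_SHIFT + PMD_INDEX_SIZE)
#define PGDIR_SHIFT  (PUD_SHIFT + PUD_INDEX_SIZE)
#define PTRS_PER_PGD (1 << PGD_INDEX_SIZE)

/* Same masking as the patched macro: keep only PGD_INDEX_SIZE bits. */
static unsigned long pgd_index(uint64_t address)
{
    return (unsigned long)((address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* Total number of address bits covered by the four levels plus the page offset. */
static unsigned int mapped_bits(void)
{
    return PGDIR_SHIFT + PGD_INDEX_SIZE;
}
```

Both the old (9/7/7/9) and new (9/9/7/7) splits add up to 44 bits, so the
mapped range is unchanged; only the per-level fan-out moves.  With the old
PGD_INDEX_SIZE of 9, PTRS_PER_PGD - 1 is 0x1ff, which is the constant the
patch removes.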



RE: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread Benjamin Herrenschmidt

On Wed, 2011-05-18 at 09:35 -0600, Moore, Eric wrote:
> I worked the original defect a couple of months ago, and Kashyap is now
> getting around to posting my patches.
> 
> This original defect has nothing to do with PPC64.  The original
> problem was only on x86.  It only became a problem on PPC64 when I
> tried to fix the original x86 issue by copying the writeq code from
> the linux headers, then it broke PPC64.  I doubt that broken patch
> was ever posted.  Anyway, back to the original defect.  The reason it
> became a problem for x86 is that the kernel headers had an
> implementation of writeq in the arch/x86 headers, which means our
> internal implementation of writeq is not being used.  The writeq
> implementation in the kernel is totally wrong for arch/x86 because it
> does not have spin locks, and if two processors are simultaneously doing
> two separate 32-bit PCI writes, then what is received by the controller
> firmware is out of order.  This change occurred between Red Hat RHEL5
> and RHEL6.  In RHEL5, writeq was not implemented in the arch/x86
> headers, and our driver's internal implementation of writeq was used.
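
The interleaving problem described in the quoted text can be shown with a
deterministic toy model.  Everything here is an assumption for
illustration: the device is modelled as latching the low 32-bit write and
completing a request on the high 32-bit write, which is not something the
thread states about the real controller, and all toy_* names are made up.

```c
#include <stdint.h>

/* Toy device: low-half writes are latched, a high-half write completes a request. */
struct toy_dev {
    uint32_t latch_lo;
    uint64_t received[4];
    int count;
};

static void toy_write_lo(struct toy_dev *d, uint32_t lo)
{
    d->latch_lo = lo;
}

static void toy_write_hi(struct toy_dev *d, uint32_t hi)
{
    if (d->count < 4)
        d->received[d->count++] = ((uint64_t)hi << 32) | d->latch_lo;
}

/* No lock: the split writes of two CPUs interleave as A.lo, B.lo, A.hi, B.hi. */
static void toy_unlocked(struct toy_dev *d, uint64_t a, uint64_t b)
{
    toy_write_lo(d, (uint32_t)a);
    toy_write_lo(d, (uint32_t)b);
    toy_write_hi(d, (uint32_t)(a >> 32));
    toy_write_hi(d, (uint32_t)(b >> 32));
}

/* With a lock held around each pair, the halves of one value stay together. */
static void toy_locked(struct toy_dev *d, uint64_t a, uint64_t b)
{
    toy_write_lo(d, (uint32_t)a);
    toy_write_hi(d, (uint32_t)(a >> 32));
    toy_write_lo(d, (uint32_t)b);
    toy_write_hi(d, (uint32_t)(b >> 32));
}
```

In the unlocked ordering the first value the toy device assembles mixes
the high half of A with the low half of B, which is the kind of corruption
a spin lock around the two 32-bit writes is meant to prevent.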

You may also want to look at Milton's comments, it looks like the way
you do init_completion followed immediately by wait_completion is racy.

You should init the completion before you do the IO that will eventually
trigger complete() to be called.

Cheers,
Ben.
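
The ordering issue can be illustrated with a single-threaded toy model.
This is not the kernel completion API: the toy_* names are stand-ins
invented for this sketch, and the "I/O" is assumed to complete before the
submitter gets to run again, which is the worst case for the race.

```c
/* Toy stand-in for a completion: a counter of pending completions. */
struct toy_completion {
    unsigned int done;
};

static void toy_init_completion(struct toy_completion *c)
{
    c->done = 0;
}

static void toy_complete(struct toy_completion *c)
{
    c->done++;
}

/* Returns 1 if a waiter would wake up, 0 if it would block forever. */
static int toy_try_wait(struct toy_completion *c)
{
    if (!c->done)
        return 0;
    c->done--;
    return 1;
}

/* Stand-in for I/O whose completion handler fires immediately. */
static void toy_submit_io(struct toy_completion *c)
{
    toy_complete(c);
}

/* Racy order: init after the I/O is started wipes a completion that already fired. */
static int racy_order(void)
{
    struct toy_completion c = { 0 };

    toy_submit_io(&c);
    toy_init_completion(&c);
    return toy_try_wait(&c);
}

/* Safe order: init first, then start the I/O, then wait. */
static int safe_order(void)
{
    struct toy_completion c;

    toy_init_completion(&c);
    toy_submit_io(&c);
    return toy_try_wait(&c);
}
```

In this model racy_order() reports that the waiter would block, while
safe_order() reports that it wakes.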


[PATCH 7/7] [RFC] SMP support code

2011-05-18 Thread Eric Van Hensbergen

This patch adds the necessary core code to enable SMP support on BlueGene/P

Signed-off-by: Eric Van Hensbergen 
---
 arch/powerpc/kernel/head_44x.S |   72 +
 arch/powerpc/mm/fault.c|   77 
 arch/powerpc/platforms/Kconfig.cputype |2 +-
 3 files changed, 150 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 1f7ae60..57d4483 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1133,6 +1133,70 @@ clear_utlb_entry:
 
 #endif /* CONFIG_PPC_47x */
 
+#if defined(CONFIG_BGP) && defined(CONFIG_SMP)
+_GLOBAL(start_secondary_bgp)
+   /* U2 will be enabled in TLBs. */
+lis r7,PPC44x_MMUCR_U2@h
+mtspr   SPRN_MMUCR,r7
+li  r7,0
+mtspr   SPRN_PID,r7
+sync
+lis r8,KERNELBASE@h
+
+/* The tlb_44x_hwater global var (setup by cpu#0) reveals how many
+ * 256M TLBs we need to map.
+ */
+lis r9, tlb_44x_hwater@ha
+lwz r9, tlb_44x_hwater@l(r9)
+
+li  r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+   PPC44x_TLB_M|PPC44x_TLB_U2)
+oris    r5, r5, PPC44x_TLB_WL1@h
+
+/* tlb_44x_hwater is the biggest TLB slot number for regular TLBs.
+   TLB 63 covers kernel base mapping(256MB) and TLB 62 covers CNS.
+   With 768MB lowmem, it is set to 59.
+*/
+2:
+addi    r9, r9, 1
+cmpwi   r9,62  /* Stop at entry 62 which is the fw */
+beq 3f
+addis   r7,r7,0x1000   /* add 256M */
+addis   r8,r8,0x1000
+ori r6,r8,PPC44x_TLB_VALID | PPC44x_TLB_256M
+
+tlbwe   r6,r9,PPC44x_TLB_PAGEID /* Load the pageid fields */
+tlbwe   r7,r9,PPC44x_TLB_XLAT   /* Load the translation fields */
+tlbwe   r5,r9,PPC44x_TLB_ATTRIB /* Load the attrib/access fields */
+b   2b
+
+3:  isync
+
+/* Setup context from global var secondary_ti */
+lis r1, secondary_ti@ha
+lwz r1, secondary_ti@l(r1)
+lwz r2, TI_TASK(r1) /*  r2 = task_info */
+
+addi    r3,r2,THREAD    /* init task's THREAD */
+mtspr   SPRN_SPRG3,r3
+
+li  r0,0
+stwu    r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
+
+/* Let's move on */
+lis r4,start_secondary@h
+ori r4,r4,start_secondary@l
+lis r3,MSR_KERNEL@h
+ori r3,r3,MSR_KERNEL@l
+mtspr   SPRN_SRR0,r4
+mtspr   SPRN_SRR1,r3
+rfi /* change context and jump to start_secondary */
+
+_GLOBAL(start_secondary_resume)
+   /* I don't think this currently happens on BGP */
+   b   .
+#endif /* CONFIG_BGP && CONFIG_SMP */
+
 /*
  * Here we are back to code that is common between 44x and 47x
  *
@@ -1144,6 +1208,14 @@ head_start_common:
lis r4,interrupt_base@h /* IVPR only uses the high 16-bits */
mtspr   SPRN_IVPR,r4
 
+#if defined(CONFIG_BGP) && defined(CONFIG_SMP)
+   /* are we an additional CPU */
+   li  r0, 0
+   mfspr   r4, SPRN_PIR
+   cmpw    r4, r0
+   bgt start_secondary_bgp
+#endif /* CONFIG_BGP && CONFIG_SMP */
+
addis   r22,r22,KERNELBASE@h
mtlr    r22
isync
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 54f4fb9..0e73244 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -103,6 +103,77 @@ static int store_updates_sp(struct pt_regs *regs)
return 0;
 }
 
+#ifdef CONFIG_BGP
+/*
+ * The icbi instruction does not broadcast to all cpus in the ppc450
+ * processor used by Blue Gene/P.  It is unlikely this problem will
+ * be exhibited in other processors so this remains ifdef'ed for BGP
+ * specifically.
+ *
+ * We deal with this by marking executable pages either writable, or
+ * executable, but never both.  The permissions will fault back and
+ * forth if the thread is actively writing to executable sections.
+ * Each time we fault to become executable we flush the dcache into
+ * icache on all cpus.
+ */
+struct bgp_fixup_parm {
+   struct page *page;
+   unsigned long   address;
+   struct vm_area_struct   *vma;
+};
+
+static void bgp_fixup_cache_tlb(void *parm)
+{
+   struct bgp_fixup_parm   *p = parm;
+
+   if (!PageHighMem(p->page))
+   flush_dcache_icache_page(p->page);
+   local_flush_tlb_page(p->vma, p->address);
+}
+
+static void bgp_fixup_access_perms(struct vm_area_struct *vma,
+ unsigned long address,
+ int is_write, int is_exec)
+{
+   struct mm_struct *mm = vma->vm_mm;
+   pte_t *ptep = NULL;
+   pmd_t *pmdp;
+
+   if (get_pteptr(mm, address, &ptep, &pmdp)) {
+   spinlock_t *ptl
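
The write-or-execute policy described in the fault.c comment above can be
sketched as a small state machine.  This is a toy model under stated
assumptions, not a reconstruction of the truncated function: the toy_*
names and flag values are invented, and the flush counter only stands in
for the cross-CPU dcache-to-icache flush the comment mentions.

```c
#include <stdbool.h>

#define TOY_PERM_WRITE 0x1u
#define TOY_PERM_EXEC  0x2u

/* Toy page state: current permissions and how often a flush was needed. */
struct toy_page {
    unsigned int perms;
    unsigned int icache_flushes;
};

/*
 * On a permission fault, flip the page to the mode that was requested.
 * A page is writable or executable, never both.
 */
static void toy_fixup_access_perms(struct toy_page *pg, bool is_write, bool is_exec)
{
    if (is_write) {
        pg->perms = (pg->perms & ~TOY_PERM_EXEC) | TOY_PERM_WRITE;
    } else if (is_exec) {
        pg->perms = (pg->perms & ~TOY_PERM_WRITE) | TOY_PERM_EXEC;
        pg->icache_flushes++;  /* stands in for the flush on all CPUs */
    }
}
```

A thread that alternates between writing and executing the same page pays
one flush per switch back to executable, which is the "fault back and
forth" cost the comment warns about.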

[PATCH 5/7] [RFC] force 32-byte aligned kmallocs

2011-05-18 Thread Eric Van Hensbergen

For BGP, it is convenient for 'kmalloc' to come back with 32-byte
aligned units for torus DMA

Signed-off-by: Eric Van Hensbergen 
---
 arch/powerpc/include/asm/page_32.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/page_32.h b/arch/powerpc/include/asm/page_32.h
index 68d73b2..fb0a7ae 100644
--- a/arch/powerpc/include/asm/page_32.h
+++ b/arch/powerpc/include/asm/page_32.h
@@ -9,7 +9,7 @@
 
 #define VM_DATA_DEFAULT_FLAGS  VM_DATA_DEFAULT_FLAGS32
 
-#ifdef CONFIG_NOT_COHERENT_CACHE
+#if defined(CONFIG_NOT_COHERENT_CACHE) || defined(CONFIG_BGP)
 #define ARCH_DMA_MINALIGN  L1_CACHE_BYTES
 #endif
 
-- 
1.7.4.1
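
As a rough illustration of what a 32-byte minimum alignment means for
allocation sizes and addresses, here is a userspace sketch.  It assumes
L1_CACHE_BYTES is 32 on this core, as the patch description implies; the
toy_* helpers are invented for this example and are not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_DMA_MINALIGN 32  /* assumed value of L1_CACHE_BYTES here */

/* Round a request size up to the next multiple of the minimum alignment. */
static size_t toy_align_up(size_t size)
{
    return (size + TOY_DMA_MINALIGN - 1) & ~(size_t)(TOY_DMA_MINALIGN - 1);
}

/* Check whether an address is usable as a 32-byte aligned DMA unit. */
static int toy_is_dma_aligned(uintptr_t addr)
{
    return (addr & (TOY_DMA_MINALIGN - 1)) == 0;
}
```

With this rounding, a 1-byte request occupies a full 32-byte unit, so no
two allocations share a 32-byte block.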


[PATCH 6/7] [RFC] enable early TLBs for BG/P

2011-05-18 Thread Eric Van Hensbergen

BG/P maps firmware with an early TLB

Signed-off-by: Eric Van Hensbergen 
---
 arch/powerpc/include/asm/mmu-44x.h |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
index ca1b90c..2807d6e 100644
--- a/arch/powerpc/include/asm/mmu-44x.h
+++ b/arch/powerpc/include/asm/mmu-44x.h
@@ -115,8 +115,12 @@ typedef struct {
 #endif /* !__ASSEMBLY__ */
 
 #ifndef CONFIG_PPC_EARLY_DEBUG_44x
+#ifndef CONFIG_BGP
 #define PPC44x_EARLY_TLBS  1
-#else
+#else /* CONFIG_BGP */
+#define PPC44x_EARLY_TLBS  2
+#endif /* CONFIG_BGP */
+#else /* CONFIG_PPC_EARLY_DEBUG_44x */
 #define PPC44x_EARLY_TLBS  2
 #define PPC44x_EARLY_DEBUG_VIRTADDR(ASM_CONST(0xf000) \
| (ASM_CONST(CONFIG_PPC_EARLY_DEBUG_44x_PHYSLOW) & 0x))
-- 
1.7.4.1


[PATCH 3/7] [RFC] add support for BlueGene/P FPU

2011-05-18 Thread Eric Van Hensbergen

This patch adds save/restore register support for the BlueGene/P
double hummer FPU.

Signed-off-by: Eric Van Hensbergen 
---
 arch/powerpc/include/asm/ppc_asm.h |   39 ---
 arch/powerpc/kernel/fpu.S  |8 +++---
 arch/powerpc/platforms/44x/Kconfig |9 
 3 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 9821006..daa22bb 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -88,6 +88,13 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
REST_10GPRS(22, base)
 #endif
 
+#ifdef CONFIG_BGP
+#define LFPDX(frt, ra, rb) .long (31<<26)|((frt)<<21)|((ra)<<16)| \
+   ((rb)<<11)|(462<<1)
+#define STFPDX(frt, ra, rb)    .long (31<<26)|((frt)<<21)|((ra)<<16)| \
+   ((rb)<<11)|(974<<1)
+#endif /* CONFIG_BGP */
+
 #define SAVE_2GPRS(n, base)SAVE_GPR(n, base); SAVE_GPR(n+1, base)
 #define SAVE_4GPRS(n, base)SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
 #define SAVE_8GPRS(n, base)SAVE_4GPRS(n, base); SAVE_4GPRS(n+4, base)
@@ -97,18 +104,26 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 #define REST_8GPRS(n, base)REST_4GPRS(n, base); REST_4GPRS(n+4, base)
 #define REST_10GPRS(n, base)   REST_8GPRS(n, base); REST_2GPRS(n+8, base)
 
-#define SAVE_FPR(n, base)  stfd    n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
-#define SAVE_2FPRS(n, base)SAVE_FPR(n, base); SAVE_FPR(n+1, base)
-#define SAVE_4FPRS(n, base)SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
-#define SAVE_8FPRS(n, base)SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
-#define SAVE_16FPRS(n, base)   SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
-#define SAVE_32FPRS(n, base)   SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base)  lfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
-#define REST_2FPRS(n, base)REST_FPR(n, base); REST_FPR(n+1, base)
-#define REST_4FPRS(n, base)REST_2FPRS(n, base); REST_2FPRS(n+2, base)
-#define REST_8FPRS(n, base)REST_4FPRS(n, base); REST_4FPRS(n+4, base)
-#define REST_16FPRS(n, base)   REST_8FPRS(n, base); REST_8FPRS(n+8, base)
-#define REST_32FPRS(n, base)   REST_16FPRS(n, base); REST_16FPRS(n+16, base)
+#ifdef CONFIG_BGP
+#define SAVE_FPR(n, b, base)   li b, THREAD_FPR0+(16*(n)); STFPDX(n, base, b)
+#define REST_FPR(n, b, base)   li b, THREAD_FPR0+(16*(n)); LFPDX(n, base, b)
+#else /* CONFIG_BGP */
+#define SAVE_FPR(n, b, base)   (stfd   n, THREAD_FPR0+8*TS_FPRWIDTH*(n)(base))
+#define REST_FPR(n, b, base)   (lfd    n, THREAD_FPR0+8*TS_FPRWIDTH*(n)(base))
+#endif /* CONFIG_BGP */
+
+#define SAVE_2FPRS(n, b, base) SAVE_FPR(n, b, base); SAVE_FPR(n+1, b, base)
+#define SAVE_4FPRS(n, b, base) SAVE_2FPRS(n, b, base); SAVE_2FPRS(n+2, b, base)
+#define SAVE_8FPRS(n, b, base) SAVE_4FPRS(n, b, base); SAVE_4FPRS(n+4, b, base)
+#define SAVE_16FPRS(n, b, base)    SAVE_8FPRS(n, b, base); SAVE_8FPRS(n+8, b, base)
+#define SAVE_32FPRS(n, b, base)    SAVE_16FPRS(n, b, base); \
+   SAVE_16FPRS(n+16, b, base)
+#define REST_2FPRS(n, b, base) REST_FPR(n, b, base); REST_FPR(n+1, b, base)
+#define REST_4FPRS(n, b, base) REST_2FPRS(n, b, base); REST_2FPRS(n+2, b, base)
+#define REST_8FPRS(n, b, base) REST_4FPRS(n, b, base); REST_4FPRS(n+4, b, base)
+#define REST_16FPRS(n, b, base)    REST_8FPRS(n, b, base); REST_8FPRS(n+8, b, base)
+#define REST_32FPRS(n, b, base)    REST_16FPRS(n, b, base); \
+   REST_16FPRS(n+16, b, base)
 
 #define SAVE_VR(n,b,base)  li b,THREAD_VR0+(16*(n));  stvx n,base,b
 #define SAVE_2VRS(n,b,base)SAVE_VR(n,b,base); SAVE_VR(n+1,b,base)
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index de36955..9f11c66 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -30,7 +30,7 @@
 BEGIN_FTR_SECTION  \
b   2f; \
 END_FTR_SECTION_IFSET(CPU_FTR_VSX);\
-   REST_32FPRS(n,base);\
+   REST_32FPRS(n,c,base);  \
b   3f; \
 2: REST_32VSRS(n,c,base);  \
 3:
@@ -39,13 +39,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
 BEGIN_FTR_SECTION  \
b   2f; \
 END_FTR_SECTION_IFSET(CPU_FTR_VSX);\
-   SAVE_32FPRS(n,base);\
+   SAVE_32FPRS(n,c,base);  \
b   3f;

[PATCH 4/7] [RFC] enable L1_WRITETHROUGH mode for BG/P

2011-05-18 Thread Eric Van Hensbergen

BG/P nodes need to be configured for writethrough to work in SMP
configurations.  This patch adds the right hooks in the MMU code
to make sure L1_WRITETHROUGH configurations are set up for BG/P.

Signed-off-by: Eric Van Hensbergen 
---
 arch/powerpc/include/asm/mmu-44x.h |2 ++
 arch/powerpc/kernel/head_44x.S |   24 ++--
 arch/powerpc/kernel/misc_32.S  |   15 +++
 arch/powerpc/lib/copy_32.S |   10 ++
 arch/powerpc/mm/44x_mmu.c  |7 +--
 arch/powerpc/platforms/Kconfig |5 +
 arch/powerpc/platforms/Kconfig.cputype |4 
 7 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
index bf52d70..ca1b90c 100644
--- a/arch/powerpc/include/asm/mmu-44x.h
+++ b/arch/powerpc/include/asm/mmu-44x.h
@@ -8,6 +8,7 @@
 
 #define PPC44x_MMUCR_TID   0x00ff
 #define PPC44x_MMUCR_STS   0x0001
+#define PPC44x_MMUCR_U2    0x0020
 
 #define PPC44x_TLB_PAGEID   0
 #define PPC44x_TLB_XLAT 1
@@ -32,6 +33,7 @@
 
 /* Storage attribute and access control fields */
 #define PPC44x_TLB_ATTR_MASK   0xff80
+#define PPC44x_TLB_WL1 0x0010  /* Write-through L1 */
 #define PPC44x_TLB_U0  0x8000  /* User 0 */
 #define PPC44x_TLB_U1  0x4000  /* User 1 */
 #define PPC44x_TLB_U2  0x2000  /* User 2 */
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 5e12b74..1f7ae60 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -429,7 +429,16 @@ finish_tlb_load_44x:
andi.   r10,r12,_PAGE_USER  /* User page ? */
beq 1f  /* nope, leave U bits empty */
rlwimi  r11,r11,3,26,28 /* yes, copy S bits to U */
-1: tlbwe   r11,r13,PPC44x_TLB_ATTRIB   /* Write ATTRIB */
+1:
+#ifdef CONFIG_L1_WRITETHROUGH
+   andi.   r10, r11, PPC44x_TLB_I
+   bne 2f
+   oris    r11,r11,PPC44x_TLB_WL1@h    /* Add coherency for */
+   /* non-inhibited */
+   ori r11,r11,PPC44x_TLB_U2|PPC44x_TLB_M
+2:
+#endif /* CONFIG_L1_WRITETHROUGH */
+   tlbwe   r11,r13,PPC44x_TLB_ATTRIB   /* Write ATTRIB */
 
/* Done...restore registers and get out of here.
*/
@@ -799,7 +808,11 @@ skpinv:    addi    r4,r4,1 /* Increment */
sync
 
/* Initialize MMUCR */
+#ifdef CONFIG_L1_WRITETHROUGH
+   lis r5, PPC44x_MMUCR_U2@h
+#else
li  r5,0
+#endif /* CONFIG_L1_WRITETHROUGH */
mtspr   SPRN_MMUCR,r5
sync
 
@@ -814,7 +827,14 @@ skpinv:    addi    r4,r4,1 /* Increment */
/* attrib fields */
/* Added guarded bit to protect against speculative loads/stores */
li  r5,0
-   ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G)
+#ifdef CONFIG_L1_WRITETHROUGH
+   ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+   PPC44x_TLB_G | PPC44x_TLB_U2)
+   oris    r5,r5,PPC44x_TLB_WL1@h
+#else
+   ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+   PPC44x_TLB_G)
+#endif /* CONFIG_L1_WRITETHROUGH */
 
 li  r0,63/* TLB slot 63 */
 
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 094bd98..d88369b 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -506,7 +506,20 @@ _GLOBAL(clear_pages)
li  r0,PAGE_SIZE/L1_CACHE_BYTES
slw r0,r0,r4
mtctr   r0
+#ifdef CONFIG_L1_WRITETHROUGH
+   /* assuming 32 byte cacheline */
+   li  r4, 0
+1: stw r4, 0(r3)
+   stw r4, 4(r3)
+   stw r4, 8(r3)
+   stw r4, 12(r3)
+   stw r4, 16(r3)
+   stw r4, 20(r3)
+   stw r4, 24(r3)
+   stw r4, 28(r3)
+#else
 1: dcbz    0,r3
+#endif /* CONFIG_L1_WRITETHROUGH */
addi    r3,r3,L1_CACHE_BYTES
bdnz    1b
blr
@@ -550,7 +563,9 @@ _GLOBAL(copy_page)
mtctr   r0
 1:
dcbt    r11,r4
+#ifndef CONFIG_L1_WRITETHROUGH
dcbz    r5,r3
+#endif
COPY_16_BYTES
 #if L1_CACHE_BYTES >= 32
COPY_16_BYTES
diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
index 55f19f9..98a07e3 100644
--- a/arch/powerpc/lib/copy_32.S
+++ b/arch/powerpc/lib/copy_32.S
@@ -98,7 +98,11 @@ _GLOBAL(cacheable_memzero)
bdnz    4b
 3: mtctr   r9
li  r7,4
+#ifdef CONFIG_L1_WRITETHROUGH
+10:
+#else
 10:    dcbz    r7,r6
+#endif /* CONFIG_L1_WRITETHROUGH */
addi    r6,r6,CACHELINE_BYTES
bdnz    10b
clrlwi  r5,r8,32-LG_CACHELINE_BYTES
@@ -187,7 +191,9 @@ _GLOBAL(cacheable_me

[PATCH 2/7] [RFC] add bluegene entry to cputable

2011-05-18 Thread Eric Van Hensbergen

Signed-off-by: Eric Van Hensbergen 
---
 arch/powerpc/kernel/cputable.c |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index b9602ee..0eb245e 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -1732,6 +1732,20 @@ static struct cpu_spec __initdata cpu_specs[] = {
.machine_check  = machine_check_440A,
.platform   = "ppc440",
},
+   { /* Blue Gene/P */
+   .pvr_mask   = 0xfff0,
+   .pvr_value  = 0x52131880,
+   .cpu_name   = "450 Blue Gene/P",
+   .cpu_features   = CPU_FTRS_440x6,
+   .cpu_user_features  = COMMON_USER_BOOKE |
+   PPC_FEATURE_HAS_FPU,
+   .mmu_features   = MMU_FTR_TYPE_44x,
+   .icache_bsize   = 32,
+   .dcache_bsize   = 32,
+   .cpu_setup  = __setup_cpu_460gt,
+   .machine_check  = machine_check_440A,
+   .platform   = "ppc440",
+   },
{ /* 460EX */
.pvr_mask   = 0x0006,
.pvr_value  = 0x13020002,
-- 
1.7.4.1
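
For anyone following along, here is a minimal userspace sketch of how a cputable entry like the one above is selected: the processor version register (PVR) is masked and compared against each entry in turn. The names `cpu_spec_demo` and `demo_identify_cpu` are hypothetical stand-ins, not the kernel's own, and the mask 0xfffffff0 is an assumption made only so the example is self-consistent; the mask as printed in the archived patch looks truncated and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the kernel's struct cpu_spec. */
struct cpu_spec_demo {
    uint32_t pvr_mask;
    uint32_t pvr_value;
    const char *cpu_name;
};

/*
 * Illustrative table.  The mask 0xfffffff0 is an assumption, not a value
 * taken from the patch; the last entry has a zero mask so it matches any
 * PVR and acts as a catch-all.
 */
static const struct cpu_spec_demo demo_specs[] = {
    { 0xfffffff0u, 0x52131880u, "450 Blue Gene/P" },
    { 0x00000000u, 0x00000000u, "(generic fallback)" },
};

/* Return the first entry whose masked PVR equals pvr_value, or NULL. */
const struct cpu_spec_demo *demo_identify_cpu(uint32_t pvr)
{
    size_t i;

    for (i = 0; i < sizeof(demo_specs) / sizeof(demo_specs[0]); i++) {
        if ((pvr & demo_specs[i].pvr_mask) == demo_specs[i].pvr_value)
            return &demo_specs[i];
    }
    return NULL;
}
```

Because the first match wins, a more specific entry has to sit ahead of any broader one in the table, which is presumably why the new entry is added before the 460EX block rather than at the end.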


[PATCH 1/7] [RFC] Mainline BG/P platform support

2011-05-18 Thread Eric Van Hensbergen

The Linux kernel patches for the IBM BlueGene/P have been open-sourced
for quite some time, but haven't been integrated into the mainline Linux
kernel source tree.  This is the first patch series of several where I
will attempt to clean up and mainline the already public patches.  I
welcome feedback as well as any help I can get.  I'm drawing on
the patches available for the IBM Compute Node kernel, the ZeptoOS project
and the Kittyhawk project.
(all available from http://wiki.bg.anl-external.org)

I'll be prioritizing core patches which are harder to keep current with
mainline due to merge conflicts and then slowly incorporating the drivers
and other extensions (if acceptable after community review).

I'll be maintaining the patchset in my kernel.org repository
(/pub/scm/linux/kernel/git/ericvh/bluegene.git) under the bluegene
branch with the source repos (zepto, kittyhawk, ibmcn) available in
respective branches.  Ben - if you would prefer me to send pull requests
once we get rolling, I can switch to that -- otherwise I'll stick to
just submitting patches to the list assuming you'll pull them when they
become acceptable.  Thanks for your attention reviewing these patches.

Signed-off-by: Eric Van Hensbergen 
---
 MAINTAINERS |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 69f19f1..3ffca88 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3863,6 +3863,14 @@ S:   Maintained
 F: arch/powerpc/platforms/40x/
 F: arch/powerpc/platforms/44x/
 
+LINUX FOR POWERPC BLUEGENE/P
+M: Eric Van Hensbergen 
+W: http://bg-linux.anl-external.org/wiki/index.php/Main_Page
+L: bg-li...@lists.anl-external.org
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/bluegene.git
+S: Maintained
+F: arch/powerpc/platforms/44x/bgp*
+
 LINUX FOR POWERPC EMBEDDED XILINX VIRTEX
 M: Grant Likely 
 W: http://wiki.secretlab.ca/index.php/Linux_on_Xilinx_Virtex
-- 
1.7.4.1


[PATCH 5/7] powerpc/mm: 64-bit: don't handle non-standard page sizes

2011-05-18 Thread Scott Wood

I don't see where any non-standard page size will be set in the
kernel page tables, so don't waste time checking for it.  It wouldn't
work with TLB0 on an FSL MMU anyway, so if there's something I missed
(or which is out-of-tree), it's relying on implementation-specific
behavior.  If there's an out-of-tree need for occasional 4K mappings
with CONFIG_PPC_64K_PAGES, perhaps this check could only be done when
that is defined.

Signed-off-by: Scott Wood 
---
 arch/powerpc/mm/tlb_low_64e.S |   13 -
 1 files changed, 0 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index 922fece..e782023 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -232,19 +232,6 @@ finish_normal_tlb_miss:
rlwimi  r11,r14,32-19,27,31 /* Insert WIMGE */
mtspr   SPRN_MAS2,r11
 
-   /* Check page size, if not standard, update MAS1 */
-   rldicl  r11,r14,64-8,64-8
-#ifdef CONFIG_PPC_64K_PAGES
-   cmpldi  cr0,r11,BOOK3E_PAGESZ_64K
-#else
-   cmpldi  cr0,r11,BOOK3E_PAGESZ_4K
-#endif
-   beq-    1f
-   mfspr   r11,SPRN_MAS1
-   rlwimi  r11,r14,31,21,24
-   rlwinm  r11,r11,0,21,19
-   mtspr   SPRN_MAS1,r11
-1:
/* Move RPN in position */
rldicr  r11,r14,64-(PTE_RPN_SHIFT-PAGE_SHIFT),63-PAGE_SHIFT
clrldi  r15,r11,12  /* Clear crap at the top */
-- 
1.7.4.1



[PATCH 4/7] powerpc/mm: 64-bit: Don't load PACA in normal TLB miss exceptions

2011-05-18 Thread Scott Wood

Load it only when needed, in recursive/linear/indirect faults,
and in the stats code.

Signed-off-by: Scott Wood 
---
 arch/powerpc/include/asm/exception-64e.h |   28 +-
 arch/powerpc/mm/tlb_low_64e.S|   43 +
 2 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h
index 6921261..9b57a27 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -80,9 +80,9 @@ exc_##label##_book3e:
  *
  * This prolog handles re-entrancy (up to 3 levels supported in the PACA
  * though we currently don't test for overflow). It provides you with a
- * re-entrancy safe working space of r10...r16 and CR with r12 being used
- * as the exception area pointer in the PACA for that level of re-entrancy
- * and r13 containing the PACA pointer.
+ * re-entrancy safe working space of r10...r16 (except r13) and CR with r12
+ * being used as the exception area pointer in the PACA for that level of
+ * re-entrancy.
  *
  * SRR0 and SRR1 are saved, but DEAR and ESR are not, since they don't apply
  * as-is for instruction exceptions. It's up to the actual exception code
@@ -95,8 +95,6 @@ exc_##label##_book3e:
mfcr    r10;    \
std r11,EX_TLB_R11(r12);\
mfspr   r11,SPRN_SPRG_TLB_SCRATCH;  \
-   std r13,EX_TLB_R13(r12);\
-   ld  r13,EX_TLB_PACA(r12);   \
std r14,EX_TLB_R14(r12);\
addi    r14,r12,EX_TLB_SIZE;    \
std r15,EX_TLB_R15(r12);\
@@ -135,7 +133,6 @@ exc_##label##_book3e:
mtspr   SPRN_SPRG_TLB_EXFRAME,freg; \
ld  r11,EX_TLB_R11(r12);\
mtcr    r14;    \
-   ld  r13,EX_TLB_R13(r12);\
ld  r14,EX_TLB_R14(r12);\
mtspr   SPRN_SRR0,r15;  \
ld  r15,EX_TLB_R15(r12);\
@@ -148,11 +145,13 @@ exc_##label##_book3e:
TLB_MISS_RESTORE(r12)
 
 #define TLB_MISS_EPILOG_ERROR  \
-   addi    r12,r13,PACA_EXTLB; \
+   ld  r10,EX_TLB_PACA(r12);   \
+   addi    r12,r10,PACA_EXTLB; \
TLB_MISS_RESTORE(r12)
 
 #define TLB_MISS_EPILOG_ERROR_SPECIAL  \
-   addi    r11,r13,PACA_EXTLB; \
+   ld  r10,EX_TLB_PACA(r12);   \
+   addi    r11,r10,PACA_EXTLB; \
TLB_MISS_RESTORE(r11)
 
 #ifdef CONFIG_BOOK3E_MMU_TLB_STATS
@@ -160,25 +159,26 @@ exc_##label##_book3e:
mflr    r10;    \
std r8,EX_TLB_R8(r12);  \
std r9,EX_TLB_R9(r12);  \
-   std r10,EX_TLB_LR(r12);
+   std r10,EX_TLB_LR(r12); \
+   ld  r9,EX_TLB_PACA(r12);
 #define TLB_MISS_RESTORE_STATS \
ld  r16,EX_TLB_LR(r12); \
ld  r9,EX_TLB_R9(r12);  \
ld  r8,EX_TLB_R8(r12);  \
mtlr    r16;
 #define TLB_MISS_STATS_D(name) \
-   addi    r9,r13,MMSTAT_DSTATS+name;  \
+   addi    r9,r9,MMSTAT_DSTATS+name;   \
bl  .tlb_stat_inc;
 #define TLB_MISS_STATS_I(name) \
-   addi    r9,r13,MMSTAT_ISTATS+name;  \
+   addi    r9,r9,MMSTAT_ISTATS+name;   \
bl  .tlb_stat_inc;
 #define TLB_MISS_STATS_X(name) \
-   ld  r8,PACA_EXTLB+EX_TLB_ESR(r13);  \
+   ld  r8,PACA_EXTLB+EX_TLB_ESR(r9);   \
cmpdi   cr2,r8,-1;  \
beq cr2,61f;\
-   addi

[PATCH 7/7] powerpc/e5500: set MMU_FTR_USE_PAIRED_MAS

2011-05-18 Thread Scott Wood

Signed-off-by: Scott Wood 
---
Is there any 64-bit book3e chip that doesn't support this?  It
doesn't appear to be optional in the ISA.

 arch/powerpc/kernel/cputable.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 34d2722..a3b8eeb 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -1981,7 +1981,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
.cpu_features   = CPU_FTRS_E5500,
.cpu_user_features  = COMMON_USER_BOOKE,
.mmu_features   = MMU_FTR_TYPE_FSL_E | MMU_FTR_BIG_PHYS |
-   MMU_FTR_USE_TLBILX,
+   MMU_FTR_USE_TLBILX | MMU_FTR_USE_PAIRED_MAS,
.icache_bsize   = 64,
.dcache_bsize   = 64,
.num_pmcs   = 4,
-- 
1.7.4.1


[PATCH 6/7] powerpc/mm: 64-bit: tlb handler micro-optimization

2011-05-18 Thread Scott Wood

A little more speed up measured on e5500.

Setting of U0-3 is dropped as it is not used by Linux as far as I can
see.

Signed-off-by: Scott Wood 
---
 arch/powerpc/mm/tlb_low_64e.S |   21 -
 1 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index e782023..a94c87b 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -47,10 +47,10 @@
 * We could probably also optimize by not saving SRR0/1 in the
 * linear mapping case but I'll leave that for later
 */
-   mfspr   r14,SPRN_ESR
mfspr   r16,SPRN_DEAR   /* get faulting address */
srdi    r15,r16,60  /* get region */
cmpldi  cr0,r15,0xc /* linear mapping ? */
+   mfspr   r14,SPRN_ESR
TLB_MISS_STATS_SAVE_INFO
beq tlb_load_linear /* yes -> go to linear map load */
 
@@ -62,11 +62,11 @@
andi.   r10,r15,0x1
bne-    virt_page_table_tlb_miss
 
-   std r14,EX_TLB_ESR(r12);/* save ESR */
-   std r16,EX_TLB_DEAR(r12);   /* save DEAR */
+   /* We need _PAGE_PRESENT and  _PAGE_ACCESSED set */
 
-/* We need _PAGE_PRESENT and  _PAGE_ACCESSED set */
+   std r14,EX_TLB_ESR(r12);/* save ESR */
li  r11,_PAGE_PRESENT
+   std r16,EX_TLB_DEAR(r12);   /* save DEAR */
oris    r11,r11,_PAGE_ACCESSED@h
 
/* We do the user/kernel test for the PID here along with the RW test
@@ -225,21 +225,16 @@ finish_normal_tlb_miss:
 * yet implemented for now
 * MAS 2   :Defaults not useful, need to be redone
 * MAS 3+7 :Needs to be done
-*
-* TODO: mix up code below for better scheduling
 */
clrrdi  r11,r16,12  /* Clear low crap in EA */
+   rldicr  r15,r14,64-(PTE_RPN_SHIFT-PAGE_SHIFT),63-PAGE_SHIFT
rlwimi  r11,r14,32-19,27,31 /* Insert WIMGE */
+   clrldi  r15,r15,12  /* Clear crap at the top */
mtspr   SPRN_MAS2,r11
-
-   /* Move RPN in position */
-   rldicr  r11,r14,64-(PTE_RPN_SHIFT-PAGE_SHIFT),63-PAGE_SHIFT
-   clrldi  r15,r11,12  /* Clear crap at the top */
-   rlwimi  r15,r14,32-8,22,25  /* Move in U bits */
+   andi.   r11,r14,_PAGE_DIRTY
rlwimi  r15,r14,32-2,26,31  /* Move in BAP bits */
 
/* Mask out SW and UW if !DIRTY (XXX optimize this !) */
-   andi.   r11,r14,_PAGE_DIRTY
bne 1f
li  r11,MAS3_SW|MAS3_UW
andc    r15,r15,r11
@@ -483,10 +478,10 @@ virt_page_table_tlb_miss_whacko_fault:
 * We could probably also optimize by not saving SRR0/1 in the
 * linear mapping case but I'll leave that for later
 */
-   mfspr   r14,SPRN_ESR
mfspr   r16,SPRN_DEAR   /* get faulting address */
srdi    r11,r16,60  /* get region */
cmpldi  cr0,r11,0xc /* linear mapping ? */
+   mfspr   r14,SPRN_ESR
TLB_MISS_STATS_SAVE_INFO
beq tlb_load_linear /* yes -> go to linear map load */
 
-- 
1.7.4.1



[PATCH 2/7] powerpc/mm: 64-bit 4k: use a PMD-based virtual page table

2011-05-18 Thread Scott Wood

Loads with non-linear access patterns were producing a very high
ratio of recursive pt faults to regular tlb misses.  Rather than
choose between a 4-level table walk or a 1-level virtual page table
lookup, use a hybrid scheme with a virtual linear pmd, followed by a
2-level lookup in the normal handler.

This adds about 5 cycles (assuming no cache misses, and e5500 timing)
to a normal TLB miss, but greatly reduces the recursive fault rate
for loads which don't have locality within 2 MiB regions but do have
significant locality within 1 GiB regions.  Improvements of close to 50%
were seen on such benchmarks.

Signed-off-by: Scott Wood 
---
 arch/powerpc/mm/tlb_low_64e.S |   23 +++
 1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index af08922..17726d3 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -24,7 +24,7 @@
 #ifdef CONFIG_PPC_64K_PAGES
 #define VPTE_PMD_SHIFT (PTE_INDEX_SIZE+1)
 #else
-#define VPTE_PMD_SHIFT (PTE_INDEX_SIZE)
+#define VPTE_PMD_SHIFT 0
 #endif
 #define VPTE_PUD_SHIFT (VPTE_PMD_SHIFT + PMD_INDEX_SIZE)
 #define VPTE_PGD_SHIFT (VPTE_PUD_SHIFT + PUD_INDEX_SIZE)
@@ -185,7 +185,7 @@ normal_tlb_miss:
/* Insert the bottom bits in */
rlwimi  r14,r15,0,16,31
 #else
-   rldicl  r14,r16,64-(PAGE_SHIFT-3),PAGE_SHIFT-3+4
+   rldicl  r14,r16,64-(PMD_SHIFT-3),PMD_SHIFT-3+4
 #endif
sldi    r15,r10,60
clrrdi  r14,r14,3
@@ -202,6 +202,16 @@ MMU_FTR_SECTION_ELSE
ld  r14,0(r10)
 ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_USE_TLBRSRV)
 
+#ifndef CONFIG_PPC_64K_PAGES
+   rldicl  r15,r16,64-PAGE_SHIFT+3,64-PTE_INDEX_SIZE-3
+   clrrdi  r15,r15,3
+
+   cmpldi  cr0,r14,0
+   beq normal_tlb_miss_access_fault
+
+   ldx r14,r14,r15
+#endif
+
 finish_normal_tlb_miss:
/* Check if required permissions are met */
andc.   r15,r11,r14
@@ -353,14 +363,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_TLBRSRV)
 #ifndef CONFIG_PPC_64K_PAGES
/* Get to PUD entry */
rldicl  r11,r16,64-VPTE_PUD_SHIFT,64-PUD_INDEX_SIZE-3
-   clrrdi  r10,r11,3
-   ldx r15,r10,r15
-   cmpldi  cr0,r15,0
-   beq virt_page_table_tlb_miss_fault
-#endif /* CONFIG_PPC_64K_PAGES */
-
+#else
/* Get to PMD entry */
rldicl  r11,r16,64-VPTE_PMD_SHIFT,64-PMD_INDEX_SIZE-3
+#endif
+
clrrdi  r10,r11,3
ldx r15,r10,r15
cmpldi  cr0,r15,0
-- 
1.7.4.1
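
As a sanity check on the two instructions added before `finish_normal_tlb_miss`, here is a small userspace sketch that emulates `rldicl` and `clrrdi` in C and compares the result with the obvious "PTE index times 8" expression. The helper names (`demo_rldicl`, `demo_clrrdi`, `pte_offset_asm`, `pte_offset_c`) are made up for illustration, and the constants assume the 4K-page configuration (PAGE_SHIFT 12, PTE_INDEX_SIZE 9) with 8-byte PTEs.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT      12  /* 4K pages (assumed configuration) */
#define PTE_INDEX_SIZE  9   /* 512 PTEs per PTE page */

/* Rotate a 64-bit value left by sh bits; sh must be in 0..63. */
uint64_t demo_rotl64(uint64_t x, unsigned sh)
{
    return sh ? (x << sh) | (x >> (64 - sh)) : x;
}

/* rldicl: rotate left by sh, then clear the mb most-significant bits. */
uint64_t demo_rldicl(uint64_t x, unsigned sh, unsigned mb)
{
    return demo_rotl64(x, sh) & (~0ULL >> mb);
}

/* clrrdi: clear the n least-significant bits. */
uint64_t demo_clrrdi(uint64_t x, unsigned n)
{
    return x & ~((1ULL << n) - 1);
}

/* Mirrors the added rldicl/clrrdi pair operating on the faulting address. */
uint64_t pte_offset_asm(uint64_t ea)
{
    uint64_t r15 = demo_rldicl(ea, 64 - PAGE_SHIFT + 3,
                               64 - PTE_INDEX_SIZE - 3);
    return demo_clrrdi(r15, 3);
}

/* The same quantity in plain C: PTE index within the page, times 8 bytes. */
uint64_t pte_offset_c(uint64_t ea)
{
    return ((ea >> PAGE_SHIFT) & ((1ULL << PTE_INDEX_SIZE) - 1)) * 8;
}
```

The rotate by 64-PAGE_SHIFT+3 is effectively a right shift by 9 here, which leaves the index already scaled by 8; the `clrrdi` then drops the three low bits that came from the page offset.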



[PATCH 3/7] powerpc/mm: 64-bit tlb miss: get PACA from memory rather than SPR

2011-05-18 Thread Scott Wood

This saves a few cycles, at least on e5500.

Signed-off-by: Scott Wood 
---
 arch/powerpc/include/asm/exception-64e.h |   16 +++-
 arch/powerpc/kernel/paca.c   |5 +
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h
index 6d53f31..6921261 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -62,16 +62,14 @@
 #define EX_TLB_ESR ( 9 * 8) /* Level 0 and 2 only */
 #define EX_TLB_SRR0    (10 * 8)
 #define EX_TLB_SRR1    (11 * 8)
-#define EX_TLB_MMUCR0  (12 * 8) /* Level 0 */
-#define EX_TLB_MAS1    (12 * 8) /* Level 0 */
-#define EX_TLB_MAS2    (13 * 8) /* Level 0 */
+#define EX_TLB_PACA    (12 * 8)
 #ifdef CONFIG_BOOK3E_MMU_TLB_STATS
-#define EX_TLB_R8      (14 * 8)
-#define EX_TLB_R9      (15 * 8)
-#define EX_TLB_LR      (16 * 8)
-#define EX_TLB_SIZE    (17 * 8)
+#define EX_TLB_R8      (13 * 8)
+#define EX_TLB_R9      (14 * 8)
+#define EX_TLB_LR      (15 * 8)
+#define EX_TLB_SIZE    (16 * 8)
 #else
-#define EX_TLB_SIZE    (14 * 8)
+#define EX_TLB_SIZE    (13 * 8)
 #endif
 
 #define START_EXCEPTION(label)  \
@@ -98,7 +96,7 @@ exc_##label##_book3e:
std r11,EX_TLB_R11(r12);\
mfspr   r11,SPRN_SPRG_TLB_SCRATCH;  \
std r13,EX_TLB_R13(r12);\
-   mfspr   r13,SPRN_SPRG_PACA; \
+   ld  r13,EX_TLB_PACA(r12);   \
std r14,EX_TLB_R14(r12);\
addi    r14,r12,EX_TLB_SIZE;    \
std r15,EX_TLB_R15(r12);\
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 102244e..814dae2 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -151,6 +151,11 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
 #ifdef CONFIG_PPC_STD_MMU_64
new_paca->slb_shadow_ptr = &slb_shadow[cpu];
 #endif /* CONFIG_PPC_STD_MMU_64 */
+#ifdef CONFIG_PPC_BOOK3E
+   new_paca->extlb[0][EX_TLB_PACA / 8] = (u64)new_paca;
+   new_paca->extlb[1][EX_TLB_PACA / 8] = (u64)new_paca;
+   new_paca->extlb[2][EX_TLB_PACA / 8] = (u64)new_paca;
+#endif
 }
 
 /* Put the paca pointer into r13 and SPRG_PACA */
-- 
1.7.4.1
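
To make the idea concrete, here is a rough userspace model of what the paca.c hunk sets up: each of the three exception frames carries a back-pointer to its owning PACA, so the TLB miss prolog can recover the PACA with one load relative to the frame pointer instead of reading an SPR. `struct paca_demo` and both helper functions are hypothetical; only the EX_TLB_PACA and EX_TLB_SIZE offsets (the non-stats values) are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EX_TLB_PACA    (12 * 8)  /* byte offset of the back-pointer slot */
#define EX_TLB_SIZE    (13 * 8)  /* frame size without TLB stats */
#define EX_TLB_LEVELS  3         /* re-entrancy levels, per the header comment */

/* Hypothetical stand-in for struct paca_struct. */
struct paca_demo {
    int cpu;
    uint64_t extlb[EX_TLB_LEVELS][EX_TLB_SIZE / 8];
};

/* Same shape as the three assignments added to initialise_paca(). */
void demo_init_extlb(struct paca_demo *paca, int cpu)
{
    int level;

    paca->cpu = cpu;
    for (level = 0; level < EX_TLB_LEVELS; level++)
        paca->extlb[level][EX_TLB_PACA / 8] = (uint64_t)(uintptr_t)paca;
}

/* What "ld r13,EX_TLB_PACA(r12)" yields, given a frame pointer in r12. */
struct paca_demo *demo_paca_from_frame(const uint64_t *frame)
{
    return (struct paca_demo *)(uintptr_t)frame[EX_TLB_PACA / 8];
}
```

The slot is written once at init time and never changes afterwards, so the handler needs no synchronization to read it.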



[PATCH 1/7] powerpc/mm: 64-bit 4k: use page-sized PMDs

2011-05-18 Thread Scott Wood

This allows a virtual page table to be used at the PMD rather than
the PTE level.

Rather than adjust the constant in pgd_index() (or ignore it, as
too-large values don't hurt as long as overly large addresses aren't
passed in), go back to using PTRS_PER_PGD.  The overflow comment seems to
apply to a very old implementation of free_pgtables that used pgd_index()
(unfortunately the commit message, if you seek it out in the historic
tree, doesn't mention any details about the overflow).  The existing
value was numerically identical to the old 4K-page PTRS_PER_PGD, so
using it shouldn't produce an overflow where it's not otherwise possible.

Also get rid of the incorrect comment at the top of pgtable-ppc64-4k.h.

Signed-off-by: Scott Wood 
---
 arch/powerpc/include/asm/pgtable-ppc64-4k.h |   12 
 arch/powerpc/include/asm/pgtable-ppc64.h|3 +--
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64-4k.h b/arch/powerpc/include/asm/pgtable-ppc64-4k.h
index 6eefdcf..194005e 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64-4k.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64-4k.h
@@ -1,14 +1,10 @@
 #ifndef _ASM_POWERPC_PGTABLE_PPC64_4K_H
 #define _ASM_POWERPC_PGTABLE_PPC64_4K_H
-/*
- * Entries per page directory level.  The PTE level must use a 64b record
- * for each page table entry.  The PMD and PGD level use a 32b record for
- * each entry by assuming that each entry is page aligned.
- */
+
 #define PTE_INDEX_SIZE  9
-#define PMD_INDEX_SIZE  7
+#define PMD_INDEX_SIZE  9
 #define PUD_INDEX_SIZE  7
-#define PGD_INDEX_SIZE  9
+#define PGD_INDEX_SIZE  7
 
 #ifndef __ASSEMBLY__
 #define PTE_TABLE_SIZE (sizeof(pte_t) << PTE_INDEX_SIZE)
@@ -19,7 +15,7 @@
 
 #define PTRS_PER_PTE   (1 << PTE_INDEX_SIZE)
 #define PTRS_PER_PMD   (1 << PMD_INDEX_SIZE)
-#define PTRS_PER_PUD   (1 << PMD_INDEX_SIZE)
+#define PTRS_PER_PUD   (1 << PUD_INDEX_SIZE)
 #define PTRS_PER_PGD   (1 << PGD_INDEX_SIZE)
 
 /* PMD_SHIFT determines what a second-level page table entry can map */
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 2b09cd5..8bd1cd9 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -181,8 +181,7 @@
  * Find an entry in a page-table-directory.  We combine the address region
  * (the high order N bits) and the pgd portion of the address.
  */
-/* to avoid overflow in free_pgtables we don't use PTRS_PER_PGD here */
-#define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & 0x1ff)
+#define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & (PTRS_PER_PGD - 1))
 
 #define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))
 
-- 
1.7.4.1
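
A quick arithmetic check of the new geometry, as a userspace sketch rather than kernel code. It assumes 4K pages (PAGE_SHIFT 12) and 8-byte entries at every level; the function names are invented for the example, while the index sizes are the post-patch values from pgtable-ppc64-4k.h.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT      12  /* assumed: 4K pages */
#define PTE_INDEX_SIZE  9
#define PMD_INDEX_SIZE  9   /* was 7 before the patch */
#define PUD_INDEX_SIZE  7
#define PGD_INDEX_SIZE  7   /* was 9 before the patch */

#define PMD_SHIFT     (PAGE_SHIFT + PTE_INDEX_SIZE)   /* 21: 2 MiB per PMD entry */
#define PUD_SHIFT     (PMD_SHIFT + PMD_INDEX_SIZE)    /* 30: 1 GiB per PUD entry */
#define PGDIR_SHIFT   (PUD_SHIFT + PUD_INDEX_SIZE)    /* 37 */
#define PTRS_PER_PGD  (1 << PGD_INDEX_SIZE)

/* Bytes in one PMD table, assuming 8-byte entries. */
uint64_t demo_pmd_table_bytes(void)
{
    return 8ULL << PMD_INDEX_SIZE;
}

/* Total virtual address bits covered by the four levels plus page offset. */
unsigned demo_va_bits(void)
{
    return PGDIR_SHIFT + PGD_INDEX_SIZE;
}

/* pgd_index() as rewritten by the patch, using PTRS_PER_PGD for the mask. */
uint64_t demo_pgd_index(uint64_t address)
{
    return (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}
```

With these values a PMD table is exactly one 4096-byte page, and the total stays at 44 bits because the two bits added to the PMD index are taken from the PGD index.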



RE: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread Moore, Eric

On Wednesday, May 18, 2011 12:31 PM Milton Miller wrote:
> Ingo I would propose the following commits added in 2.6.29 be reverted.
> I think the current consensus is drivers must know if the writeq is
> not atomic so they can provide their own locking or other workaround.
> 


Exactly.
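
For illustration only, a userspace model of the "provide their own locking" option: when a 64-bit register write has to be split into two 32-bit writes, the pair is wrapped in a lock so that two writers cannot interleave their halves. Everything here is a stand-in (`struct demo_dev`, `demo_writeq_locked`, and a C11 `atomic_flag` playing the role of a spinlock); it is not the mpt2sas code and it does not touch real MMIO.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical device model: one 64-bit register seen as two 32-bit halves. */
struct demo_dev {
    atomic_flag lock;       /* stands in for a driver spinlock */
    volatile uint32_t lo;   /* low word, written first */
    volatile uint32_t hi;   /* high word, written second */
};

void demo_dev_init(struct demo_dev *dev)
{
    atomic_flag_clear(&dev->lock);
    dev->lo = 0;
    dev->hi = 0;
}

/* Split 64-bit write; the lock keeps the two halves from interleaving. */
void demo_writeq_locked(struct demo_dev *dev, uint64_t val)
{
    while (atomic_flag_test_and_set_explicit(&dev->lock,
                                             memory_order_acquire))
        ;   /* spin until the lock is free */
    dev->lo = (uint32_t)val;
    dev->hi = (uint32_t)(val >> 32);
    atomic_flag_clear_explicit(&dev->lock, memory_order_release);
}

/* Reassemble the register value from its two halves. */
uint64_t demo_read_pair(const struct demo_dev *dev)
{
    return ((uint64_t)dev->hi << 32) | dev->lo;
}
```

Without the lock, the A/B/A/B interleaving described in the quoted patch description could leave the device with the low half from one CPU and the high half from another.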

RE: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread Milton Miller

On Wed, 18 May 2011 about 09:35:56 -0600, Eric Moore wrote:
> On Wednesday, May 18, 2011 2:24 AM, Milton Miller wrote:
> > On Wed, 18 May 2011 around 17:00:10 +1000, Benjamin Herrenschmidt wrote:
> > > (Just adding Milton to the CC list, he suspects races in the
> > > driver instead).
> > >
> > > On Wed, 2011-05-18 at 08:23 +0400, James Bottomley wrote:
> > > > On Tue, 2011-05-17 at 22:15 -0600, Matthew Wilcox wrote:
> > > > > On Wed, May 18, 2011 at 09:37:08AM +0530, Desai, Kashyap wrote:
> > > > > > On Wed, 2011-05-04 at 17:23 +0530, Kashyap, Desai wrote:
> > > > > > > The following code seems to be there in
> > /usr/src/linux/arch/x86/include/asm/io.h.
> > > > > > > This is not going to work.
> > > > > > >
> > > > > > > static inline void writeq(__u64 val, volatile void __iomem *addr)
> > > > > > > {
> > > > > > > writel(val, addr);
> > > > > > > writel(val >> 32, addr+4);
> > > > > > > }
> > > > > > >
> > > > > > > So with this code turned on in the kernel, there is going to be
> > race condition
> > > > > > > where multiple cpus can be writing to the request descriptor at
> > the same time.
> > > > > > >
> > > > > > > Meaning this could happen:
> > > > > > > > (A) CPU A does 32 bit write
> > > > > > > (B) CPU B does 32 bit write
> > > > > > > (C) CPU A does 32 bit write
> > > > > > > (D) CPU B does 32 bit write
> > > > > > >
> > > > > > > We need the 64 bit completed in one access pci memory write, else
> > spin lock is required.
> > > > > > > Since it's going to be difficult to know which writeq was
> > implemented in the kernel,
> > > > > > > the driver is going to have to always acquire a spin lock each
> > time we do 64bit write.
> > > > > > >
> > > > > > > Cc: sta...@kernle.org
> > > > > > > Signed-off-by: Kashyap Desai 
> > > > > > > ---
> > > > > > > diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c
> > b/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > > > index efa0255..5778334 100644
> > > > > > > --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > > > +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > > > @@ -1558,7 +1558,6 @@ mpt2sas_base_free_smid(struct
> > MPT2SAS_ADAPTER *ioc, u16 smid)
> > > > > > >   * care of 32 bit environment where its not quarenteed to send
> > the entire word
> > > > > > >   * in one transfer.
> > > > > > >   */
> > > > > > > -#ifndef writeq
> > > > > >
> > > > > > Why not make this #ifndef CONFIG_64BIT?  You know that all 64 bit
> > > > > > systems have writeq implemented correctly; you suspect 32 bit
> > systems
> > > > > > don't.
> > > > > >
> > > > > > James
> > > > > >
> > > > > > James, This issue was observed on PPC64 system. So what you have
> > suggested will not solve this issue.
> > > > > > If we are sure that writeq() is atomic across all architecture, we
> > can use it safely. As we have seen issue on ppc64, we are not confident to
> > use
> > > > > > "writeq" call.
> > > > >
> > > > > So have you told the powerpc people that they have a broken writeq?
> > > >
> > > > I'm just in the process of finding them now on IRC so I can demand an
> > > > explanation: this is a really serious API problem because writeq is
> > > > supposed to be atomic on 64 bit.
> > > >
> > > > > And why do you obfuscate your report by talking about i386 when it's
> > > > > really about powerpc64?
> > > >
> > > > James
> >
> > I checked the assembly for my compiled output and it ends up with
> > a single std (store doubleword aka 64 bits) instruction with offset
> > 192 decimal (0xc0) from the base register obtained from the structure.
> >
> > An aligned doubleword store is atomic on 64 bit powerpc.
> >
> > So I would really like more details if you are blaming 64 bit
> > powerpc for a non-atomic store.
> >
> > That said, the patch will affect the code by adding barriers.
> > Specifically, while powerpc has a sync before doing the store as part
> > of writeq, wrapping in a spinlock adds a sync before releasing the lock
> > whenever a writeq (or writex x=b,w,d,q) was issued inside the lock.
> >
> > (sync orders all reads and all writes to both memory and devices from
> > that cpu).
> >
> > But looking further at the code, I see such things as:
> >
> > drivers/scsi/mpt2sas/mpt2sas_base.c  line 2944
> >
> > mpt2sas_base_put_smid_default(ioc, smid);
> > init_completion(&ioc->base_cmds.done);
> > timeleft = wait_for_completion_timeout(&ioc->base_cmds.done,
> >
> > where mpt2sas_base_put_smid_default is a routine that has a call to
> > _base_writeq.  This will initiate io to the adapter, then initialize
> > the completion, then hope that the timeout is long enough to let the io
> > complete and be marked done but short enough to not be a problem when
> > the timeout occurs because we initialized the completion after the irq
> > came in.
> >
> > The code then looks at a status flag, but there is no indication how
> > the access to that field is serialized between the interrupt handler
> > and the submission routine.  It may mostly work due to barrier

Re: powerpc: mpc85xx regression since 2.6.39-rc2, one cpu core lame

2011-05-18 Thread Milton Miller

Does this patch help?  If so please reply to that thread so patchwork
will see it in addition to here.

http://patchwork.ozlabs.org/patch/96146/

milton
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread Moore, Eric

On Wednesday, May 18, 2011 2:24 AM, Milton Miller wrote:
> On Wed, 18 May 2011 around 17:00:10 +1000, Benjamin Herrenschmidt wrote:
> > (Just adding Milton to the CC list, he suspects races in the
> > driver instead).
> >
> > On Wed, 2011-05-18 at 08:23 +0400, James Bottomley wrote:
> > > On Tue, 2011-05-17 at 22:15 -0600, Matthew Wilcox wrote:
> > > > On Wed, May 18, 2011 at 09:37:08AM +0530, Desai, Kashyap wrote:
> > > > > On Wed, 2011-05-04 at 17:23 +0530, Kashyap, Desai wrote:
> > > > > > The following code seems to be there in
> /usr/src/linux/arch/x86/include/asm/io.h.
> > > > > > This is not going to work.
> > > > > >
> > > > > > static inline void writeq(__u64 val, volatile void __iomem *addr)
> > > > > > {
> > > > > > writel(val, addr);
> > > > > > writel(val >> 32, addr+4);
> > > > > > }
> > > > > >
> > > > > > So with this code turned on in the kernel, there is going to be
> race condition
> > > > > > where multiple cpus can be writing to the request descriptor at
> the same time.
> > > > > >
> > > > > > Meaning this could happen:
> > > > > > > (A) CPU A does 32 bit write
> > > > > > (B) CPU B does 32 bit write
> > > > > > (C) CPU A does 32 bit write
> > > > > > (D) CPU B does 32 bit write
> > > > > >
> > > > > > We need the 64 bit completed in one access pci memory write, else
> spin lock is required.
> > > > > > Since it's going to be difficult to know which writeq was
> implemented in the kernel,
> > > > > > the driver is going to have to always acquire a spin lock each
> time we do 64bit write.
> > > > > >
> > > > > > Cc: sta...@kernle.org
> > > > > > Signed-off-by: Kashyap Desai 
> > > > > > ---
> > > > > > diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c
> b/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > > index efa0255..5778334 100644
> > > > > > --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > > +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > > @@ -1558,7 +1558,6 @@ mpt2sas_base_free_smid(struct
> MPT2SAS_ADAPTER *ioc, u16 smid)
> > > > > >   * care of 32 bit environment where its not quarenteed to send
> the entire word
> > > > > >   * in one transfer.
> > > > > >   */
> > > > > > -#ifndef writeq
> > > > >
> > > > > Why not make this #ifndef CONFIG_64BIT?  You know that all 64 bit
> > > > > systems have writeq implemented correctly; you suspect 32 bit systems
> > > > > don't.
> > > > >
> > > > > James
> > > > >
> > > > > James, This issue was observed on PPC64 system. So what you have suggested will not solve this issue.
> > > > > If we are sure that writeq() is atomic across all architecture, we can use it safely. As we have seen issue on ppc64, we are not confident to use
> > > > > "writeq" call.
> > > >
> > > > So have you told the powerpc people that they have a broken writeq?
> > >
> > > I'm just in the process of finding them now on IRC so I can demand an
> > > explanation: this is a really serious API problem because writeq is
> > > supposed to be atomic on 64 bit.
> > >
> > > > And why do you obfuscate your report by talking about i386 when it's
> > > > really about powerpc64?
> > >
> > > James
> 
> I checked the assembly for my compiled output and it ends up with
> a single std (store doubleword aka 64 bits) instruction with offset
> 192 decimal (0xc0) from the base register obtained from the structure.
> 
> An aligned doubleword store is atomic on 64 bit powerpc.
> 
> So I would really like more details if you are blaming 64 bit
> powerpc of a non-atomic store.
> 
> That said, the patch will affect the code by adding barriers.
> Specifically, while powerpc has a sync before doing the store as part
> of writeq, wrapping in a spinlock adds a sync before releasing the lock
> whenever a writeq (or writex x=b,w,d,q) was issued inside the lock.
> 
> (sync orders all reads and all writes to both memory and devices from
> that cpu).
> 
> But looking further at the code, I see such things as:
> 
> drivers/scsi/mpt2sas/mpt2sas_base.c  line 2944
> 
> mpt2sas_base_put_smid_default(ioc, smid);
> init_completion(&ioc->base_cmds.done);
> timeleft = wait_for_completion_timeout(&ioc->base_cmds.done,
> 
> where mpt2sas_base_put_smid_default is a routine that has a call to
> _base_writeq.  This will initiate io to the adapter, then initialize
> the completion, then hope that the timeout is long enough to let the io
> complete and be marked done but short enough to not be a problem when
> the timeout occurs because we initialized the completion after the irq
> came in.
> 
> The code then looks at a status flag, but there is no indication how
> the access to that field is serialized between the interrupt handler
> and the submission routine.  It may mostly work due to barriers in
> the primitives but I don't see any statement of rules.
> 
> Also, while I see a few wmb before writel in _base_interrupt, I don't
> see any rmb, which I would expect between establishing that an element is
> valid and reading other fields in that element.
> 
> So I'd really

Re: Kernel cannot see PCI device

2011-05-18 Thread Bjorn Helgaas

On Wed, May 18, 2011 at 4:02 AM, Prashant Bhole
 wrote:
> On Mon, May 2, 2011 at 10:21 AM, Prashant Bhole
>  wrote:
>>
>> Hi,
>> I have a custom made powerpc 460EX board. On that board u-boot
>> can see a PCI device but Linux kernel cannot see it. What could be the 
>> problem?
>>
>> In u-boot, the "pci 2" command displays the following device:
>> Scanning PCI devices on bus 2
>> BusDevFun  VendorId   DeviceId   Device Class       Sub-Class
>> _
>> 02.00.00   0x1000     0x0072     Mass storage controller 0x00
>>
>> And when the kernel is booted, there is only one pci device (bridge):
>> #ls /sys/bus/pci/devices
>> :80:00.0
>>
>
> I am still facing this problem.
>
> A call to pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, &l) returns
> a positive value in the function pci_scan_device(), which means reading
> VENDOR_ID failed. I could not find the reason. Any hints?

Hmm...  probably powerpc-related, so I added linuxppc-dev.

My guess would be that Linux didn't find the host bridge to the
hierarchy containing bus 2.  I would guess the host bridge info is
supposed to come from OF.  More information, like the complete u-boot
PCI scan and the kernel dmesg log, would be useful.  And maybe u-boot
has a way to dump the OF device tree?
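
For reference, a rough sketch of how the first config dword is usually
interpreted once the read itself succeeds. This is a userspace
illustration, not the kernel's code: the toy_* helper names are made up,
and the list of "empty slot" values is what the PCI probe code is
generally understood to reject, so treat it as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * The dword at config offset 0 (PCI_VENDOR_ID) carries the vendor ID in
 * the low 16 bits and the device ID in the high 16 bits.
 * Assumption: these four patterns are treated as "no device here".
 */
static bool toy_vendor_dword_valid(uint32_t l)
{
	return !(l == 0xffffffffu || l == 0x00000000u ||
		 l == 0x0000ffffu || l == 0xffff0000u);
}

/* Low half of the dword: vendor ID. */
static uint16_t toy_vendor_id(uint32_t l)
{
	return (uint16_t)(l & 0xffffu);
}

/* High half of the dword: device ID. */
static uint16_t toy_device_id(uint32_t l)
{
	return (uint16_t)(l >> 16);
}
```

With the IDs from the u-boot listing above, the dword would read
0x00721000, which this check accepts. A non-zero return from the config
accessor is a different failure: the access never reached the device,
which is why the host bridge setup is the first thing to look at.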
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] [klibc] ppc64: Fix build failure with stricter as

2011-05-18 Thread maximilian attems

From: Matthias Klose 

Landed in Ubuntu klibc version 1.5.20-1ubuntu3.


Signed-off-by: maximilian attems 
---
 usr/klibc/arch/ppc64/crt0.S |   17 +
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/usr/klibc/arch/ppc64/crt0.S b/usr/klibc/arch/ppc64/crt0.S
index a7776a1..c976d5c 100644
--- a/usr/klibc/arch/ppc64/crt0.S
+++ b/usr/klibc/arch/ppc64/crt0.S
@@ -12,16 +12,17 @@
.section ".toc","aw"
 .LC0:  .tc environ[TC],environ
 
+   .text
+   .align 4
+
.section ".opd","aw"
-   .align 3
-   .globl _start
 _start:
-   .quad   ._start
-   .quad   .TOC.@tocbase, 0
-
-   .text
-   .globl  ._start
+   .quad   ._start, .TOC.@tocbase, 0
+   .previous
+   .size   _start, 24
.type   ._start,@function
+   .globl  _start
+   .globl  ._start
 ._start:
stdu%r1,-32(%r1)
addi%r3,%r1,32
@@ -29,4 +30,4 @@ _start:
b   .__libc_init
nop
 
-   .size _start,.-_start
+   .size ._start,.-._start
-- 
1.7.4.4


Re: powerpc: mpc85xx regression since 2.6.39-rc2, one cpu core lame

2011-05-18 Thread Richard Cochran

On Wed, May 18, 2011 at 07:40:16AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2011-05-17 at 18:28 +0200, Richard Cochran wrote:
> > Ben,
> > 
> > Recent 2.6.39-rc kernels behave strangely on the Freescale dual core
> > mpc8572 and p2020. There is a long pause (like 2 seconds) in the boot
> > sequence after "mpic: requesting IPIs..."
> > 
> > When the system comes up, only one core shows in /proc/cpuinfo. Later
> > on, lots of messages appear like the following:
> > 
> >INFO: task ksoftirqd/1:9 blocked for more than 120 seconds.
> > 
> > I bisected [1] the problem to:
> > 
> >commit c56e58537d504706954a06570b4034c04e5b7500
> >Author: Benjamin Herrenschmidt 
> >Date:   Tue Mar 8 14:40:04 2011 +1100
> > 
> >powerpc/smp: Create idle threads on demand and properly reset them
> > 
> > I don't see from that commit what had gone wrong. Perhaps you can
> > help resolve this?
> 
> Hrm, odd. Kumar, care to have a look ? That's what happens when you
> don't get me HW to test with :-)

(I get the feeling that I am the only one testing recent kernels with
the mpc85xx.)

Anyhow, I see that this commit was one of a series. For my own use,
can I simply revert this one commit independently?

Thanks,
Richard

[PATCH] PPC_47x SMP fix

2011-05-18 Thread Kerstin Jonsson

 commit c56e58537d504706954a06570b4034c04e5b7500 breaks SMP support in PPC_47x chip.
 secondary_ti must be set to current thread info before calling kick_cpu or else
 start_secondary_47x will jump into void when trying to return to c-code.
 In the current setup secondary_ti is initialized before the CPU idle task is started
 and only the boot core will start. I am not sure this is the correct solution, but it
 makes SMP possible in my chip.
 Note! The HOTPLUG support probably needs some fixing too. There is no trampoline code
 available in head_44x.S - start_secondary_resume?


Signed-off-by: Kerstin Jonsson 
Cc: Paul Mackerras 
Cc: Michael Neuling 
Cc: Will Schmidt 
---
 arch/powerpc/kernel/smp.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index cbdbb14..f2dcab7 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -410,8 +410,6 @@ int __cpuinit __cpu_up(unsigned int cpu)
 {
int rc, c;
 
-   secondary_ti = current_set[cpu];
-
if (smp_ops == NULL ||
(smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)))
return -EINVAL;
@@ -421,6 +419,8 @@ int __cpuinit __cpu_up(unsigned int cpu)
if (rc)
return rc;
 
+   secondary_ti = current_set[cpu];
+
/* Make sure callin-map entry is 0 (can be leftover a CPU
 * hotplug
 */
-- 
1.7.2.3


[PATCH] PPC_47x SMP fix

2011-05-18 Thread Kerstin Jonsson

 commit c56e58537d504706954a06570b4034c04e5b7500 breaks SMP support in PPC_47x chip.
 secondary_ti must be set to current thread info before calling kick_cpu or else
 start_secondary_47x will jump into void when trying to return to c-code.
 In the current setup secondary_ti is initialized before the CPU idle task is started
 and only the boot core will start. I am not sure this is the correct solution, but it
 makes SMP possible in my chip.
 Note! The HOTPLUG support probably needs some fixing too. There is no trampoline code
 available in head_44x.S - start_secondary_resume?


Signed-off-by: Kerstin Jonsson 
Cc: Paul Mackerras 
Cc: Michael Neuling 
Cc: Darren Hart 
Cc: Will Schmidt 
---
 arch/powerpc/kernel/smp.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index cbdbb14..f2dcab7 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -410,8 +410,6 @@ int __cpuinit __cpu_up(unsigned int cpu)
 {
int rc, c;
 
-   secondary_ti = current_set[cpu];
-
if (smp_ops == NULL ||
(smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)))
return -EINVAL;
@@ -421,6 +419,8 @@ int __cpuinit __cpu_up(unsigned int cpu)
if (rc)
return rc;
 
+   secondary_ti = current_set[cpu];
+
/* Make sure callin-map entry is 0 (can be leftover a CPU
 * hotplug
 */
-- 
1.7.2.3


Re: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread Milton Miller

On Wed, 18 May 2011 around 17:00:10 +1000, Benjamin Herrenschmidt wrote:
> (Just adding Milton to the CC list, he suspects races in the
> driver instead).
>
> On Wed, 2011-05-18 at 08:23 +0400, James Bottomley wrote:
> > On Tue, 2011-05-17 at 22:15 -0600, Matthew Wilcox wrote:
> > > On Wed, May 18, 2011 at 09:37:08AM +0530, Desai, Kashyap wrote:
> > > > On Wed, 2011-05-04 at 17:23 +0530, Kashyap, Desai wrote:
> > > > > The following code seems to be there in 
> > > > > /usr/src/linux/arch/x86/include/asm/io.h.
> > > > > This is not going to work.
> > > > > 
> > > > > static inline void writeq(__u64 val, volatile void __iomem *addr)
> > > > > {
> > > > > writel(val, addr);
> > > > > writel(val >> 32, addr+4);
> > > > > }
> > > > > 
> > > > > So with this code turned on in the kernel, there is going to be race 
> > > > > condition 
> > > > > where multiple cpus can be writing to the request descriptor at the 
> > > > > same time.
> > > > > 
> > > > > Meaning this could happen:
> > > > > > (A) CPU A does 32 bit write
> > > > > (B) CPU B does 32 bit write
> > > > > (C) CPU A does 32 bit write
> > > > > (D) CPU B does 32 bit write
> > > > > 
> > > > > We need the 64 bit completed in one access pci memory write, else 
> > > > > spin lock is required.
> > > > > Since it's going to be difficult to know which writeq was implemented 
> > > > > in the kernel, 
> > > > > the driver is going to have to always acquire a spin lock each time 
> > > > > we do 64bit write.
> > > > > 
> > > > > Cc: sta...@kernle.org
> > > > > Signed-off-by: Kashyap Desai 
> > > > > ---
> > > > > diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c 
> > > > > b/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > index efa0255..5778334 100644
> > > > > --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > @@ -1558,7 +1558,6 @@ mpt2sas_base_free_smid(struct MPT2SAS_ADAPTER 
> > > > > *ioc, u16 smid)
> > > > >   * care of 32 bit environment where its not quarenteed to send the 
> > > > > entire word
> > > > >   * in one transfer.
> > > > >   */
> > > > > -#ifndef writeq
> > > > 
> > > > Why not make this #ifndef CONFIG_64BIT?  You know that all 64 bit
> > > > systems have writeq implemented correctly; you suspect 32 bit systems
> > > > don't.
> > > > 
> > > > James
> > > > 
> > > > James, This issue was observed on PPC64 system. So what you have 
> > > > suggested will not solve this issue.
> > > > If we are sure that writeq() is atomic across all architecture, we can 
> > > > use it safely. As we have seen issue on ppc64, we are not confident to 
> > > > use
> > > > "writeq" call.
> > > 
> > > So have you told the powerpc people that they have a broken writeq?
> > 
> > I'm just in the process of finding them now on IRC so I can demand an
> > explanation: this is a really serious API problem because writeq is
> > supposed to be atomic on 64 bit.
> > 
> > > And why do you obfuscate your report by talking about i386 when it's
> > > really about powerpc64?
> > 
> > James

I checked the assembly for my compiled output and it ends up with
a single std (store doubleword aka 64 bits) instruction with offset
192 decimal (0xc0) from the base register obtained from the structure.

An aligned doubleword store is atomic on 64 bit powerpc.

So I would really like more details if you are blaming 64 bit
powerpc of a non-atomic store.

That said, the patch will affect the code by adding barriers.
Specifically, while powerpc has a sync before doing the store as part
of writeq, wrapping in a spinlock adds a sync before releasing the lock
whenever a writeq (or writex x=b,w,d,q) was issued inside the lock.

(sync orders all reads and all writes to both memory and devices from
that cpu).

But looking further at the code, I see such things as:

drivers/scsi/mpt2sas/mpt2sas_base.c  line 2944

mpt2sas_base_put_smid_default(ioc, smid);
init_completion(&ioc->base_cmds.done);
timeleft = wait_for_completion_timeout(&ioc->base_cmds.done,

where mpt2sas_base_put_smid_default is a routine that has a call to
_base_writeq.  This will initiate io to the adapter, then initialize
the completion, then hope that the timeout is long enough to let the io
complete and be marked done but short enough to not be a problem when
the timeout occurs because we initialized the completion after the irq
came in.
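
To make the ordering problem concrete, here is a toy userspace model.
It is not the driver code and not the kernel completion API: the toy_*
names are invented for illustration, and the completion is reduced to a
plain counter so the two orderings can be compared deterministically.

```c
#include <stdbool.h>

/* Toy stand-in for a completion: just a count of pending signals. */
struct toy_completion {
	unsigned int done;
};

static void toy_init_completion(struct toy_completion *c) { c->done = 0; }
static void toy_complete(struct toy_completion *c) { c->done++; }

/* Non-blocking check: consume one pending signal if there is one. */
static bool toy_try_wait(struct toy_completion *c)
{
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}

/*
 * Ordering quoted above: post the request first, initialize afterwards.
 * irq_before_init models the interrupt landing inside that window.
 * Returns false when the waiter would end up timing out.
 */
static bool toy_submit_then_init(bool irq_before_init)
{
	struct toy_completion c = { 0 };

	/* the request is posted to the adapter at this point */
	if (irq_before_init)
		toy_complete(&c);	/* handler signals completion */
	toy_init_completion(&c);	/* and the signal is wiped here */
	if (!irq_before_init)
		toy_complete(&c);	/* handler ran late enough */
	return toy_try_wait(&c);
}

/* Initialize first, then post: the signal cannot be lost. */
static bool toy_init_then_submit(void)
{
	struct toy_completion c;

	toy_init_completion(&c);
	/* the request is posted to the adapter at this point */
	toy_complete(&c);		/* handler may run any time after */
	return toy_try_wait(&c);
}
```

In this model toy_submit_then_init(true) reports a lost completion,
while toy_init_then_submit() always sees it, whenever the handler runs.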

The code then looks at a status flag, but there is no indication how
the access to that field is serialized between the interrupt handler
and the submission routine.  It may mostly work due to barriers in
the primitives but I don't see any statement of rules.

Also, while I see a few wmb before writel in _base_interrupt, I don't
see any rmb, which I would expect between establishing that an element is
valid and reading other fields in that element.

So I'd really like to hear more about what your symptoms were and how
you determined writeq on 64 bit powerpc was not atomic.

milton

RE: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread David Laight

 

> > > > static inline void writeq(__u64 val, volatile void __iomem *addr)
> > > > {
> > > > writel(val, addr);
> > > > writel(val >> 32, addr+4);
> > > > }
...
> > > > We need the 64 bit completed in one access pci memory write, else spin lock is required.
> > > > Since it's going to be difficult to know which writeq was implemented in the kernel,
> > > > the driver is going to have to always acquire a spin lock each time we do 64bit write.
...
> I'm just in the process of finding them now on IRC so I can demand an
> explanation: this is a really serious API problem because writeq is
> supposed to be atomic on 64 bit.

Most 32 bit systems don't have atomic 64bit writes.
I'd also have thought there would be code which wouldn't mind the
write being done as two cycles.

I'm not sure that some of the ppc soc systems are capable of
doing a 64bit data pci/pcie cycle except by dma.
So your driver is probably doomed to require a lock.
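
A small userspace model of why the split write needs a lock. Everything
here is hypothetical: the toy_* names are invented, and the assumption
that the adapter latches the low half and consumes the descriptor when
the high half arrives is only for illustration, not taken from the
mpt2sas documentation.

```c
#include <stdint.h>

/* Hypothetical device: a 64-bit descriptor register fed by two 32-bit writes. */
struct toy_dev {
	uint32_t lo;		/* last low half written */
	uint64_t posted[4];	/* descriptors the device consumed */
	int nposted;
};

static void toy_writel_lo(struct toy_dev *d, uint32_t v)
{
	d->lo = v;
}

/* Assumed behaviour: the high-half write triggers consumption. */
static void toy_writel_hi(struct toy_dev *d, uint32_t v)
{
	if (d->nposted < 4)
		d->posted[d->nposted++] = ((uint64_t)v << 32) | d->lo;
}

/* The (A)(B)(C)(D) interleaving: A low, B low, A high, B high. */
static void toy_interleaved(struct toy_dev *d, uint64_t a, uint64_t b)
{
	toy_writel_lo(d, (uint32_t)a);
	toy_writel_lo(d, (uint32_t)b);
	toy_writel_hi(d, (uint32_t)(a >> 32));
	toy_writel_hi(d, (uint32_t)(b >> 32));
}

/* What a lock around each pair guarantees: halves stay together. */
static void toy_serialized(struct toy_dev *d, uint64_t a, uint64_t b)
{
	toy_writel_lo(d, (uint32_t)a);
	toy_writel_hi(d, (uint32_t)(a >> 32));
	toy_writel_lo(d, (uint32_t)b);
	toy_writel_hi(d, (uint32_t)(b >> 32));
}
```

In the interleaved case the first descriptor the device sees is the
high half of A glued to the low half of B, which neither CPU wrote.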

David



Re: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread Benjamin Herrenschmidt

(Just adding Milton to the CC list, he suspects races in the driver
instead).

On Wed, 2011-05-18 at 08:23 +0400, James Bottomley wrote:
> On Tue, 2011-05-17 at 22:15 -0600, Matthew Wilcox wrote:
> > On Wed, May 18, 2011 at 09:37:08AM +0530, Desai, Kashyap wrote:
> > > On Wed, 2011-05-04 at 17:23 +0530, Kashyap, Desai wrote:
> > > > The following code seems to be there in 
> > > > /usr/src/linux/arch/x86/include/asm/io.h.
> > > > This is not going to work.
> > > > 
> > > > static inline void writeq(__u64 val, volatile void __iomem *addr)
> > > > {
> > > > writel(val, addr);
> > > > writel(val >> 32, addr+4);
> > > > }
> > > > 
> > > > So with this code turned on in the kernel, there is going to be race 
> > > > condition 
> > > > where multiple cpus can be writing to the request descriptor at the 
> > > > same time.
> > > > 
> > > > Meaning this could happen:
> > > > > (A) CPU A does 32 bit write
> > > > (B) CPU B does 32 bit write
> > > > (C) CPU A does 32 bit write
> > > > (D) CPU B does 32 bit write
> > > > 
> > > > We need the 64 bit completed in one access pci memory write, else spin 
> > > > lock is required.
> > > > Since it's going to be difficult to know which writeq was implemented 
> > > > in the kernel, 
> > > > the driver is going to have to always acquire a spin lock each time we 
> > > > do 64bit write.
> > > > 
> > > > Cc: sta...@kernle.org
> > > > Signed-off-by: Kashyap Desai 
> > > > ---
> > > > diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c 
> > > > b/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > index efa0255..5778334 100644
> > > > --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > @@ -1558,7 +1558,6 @@ mpt2sas_base_free_smid(struct MPT2SAS_ADAPTER 
> > > > *ioc, u16 smid)
> > > >   * care of 32 bit environment where its not quarenteed to send the 
> > > > entire word
> > > >   * in one transfer.
> > > >   */
> > > > -#ifndef writeq
> > > 
> > > Why not make this #ifndef CONFIG_64BIT?  You know that all 64 bit
> > > systems have writeq implemented correctly; you suspect 32 bit systems
> > > don't.
> > > 
> > > James
> > > 
> > > James, This issue was observed on PPC64 system. So what you have 
> > > suggested will not solve this issue.
> > > If we are sure that writeq() is atomic across all architecture, we can 
> > > use it safely. As we have seen issue on ppc64, we are not confident to use
> > > "writeq" call.
> > 
> > So have you told the powerpc people that they have a broken writeq?
> 
> I'm just in the process of finding them now on IRC so I can demand an
> explanation: this is a really serious API problem because writeq is
> supposed to be atomic on 64 bit.
> 
> > And why do you obfuscate your report by talking about i386 when it's
> > really about powerpc64?
> 
> James
> 



RE: book to learn ppc assembly and architecture

2011-05-18 Thread David Laight

 
> > On Mon, 2011-05-16 at 16:37 +1000, Michael Neuling wrote:
> >> > what is the best book to learn assembly and architecture.
> 
> Assuming you have a powerpc compiler available you can use the -S
> option to produce assembly listings.

With gcc add -fverbose-asm for more info.

For a general background, look at something much simpler than ppc,
even if you don't write/run any code.
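
Something as small as this is enough to start reading listings. The
file name add_scaled.c and the function are made up for the example;
the flags are the ones mentioned above.

```c
/*
 * add_scaled.c (hypothetical file name)
 *
 * Produce an annotated listing in add_scaled.s with:
 *     gcc -O2 -S -fverbose-asm add_scaled.c
 */

/* Address-style arithmetic: base plus an index scaled by 8. */
long add_scaled(long base, long index)
{
	return base + index * 8;
}
```

Compare the listing at -O0 and -O2 to see how much of the output is
stack traffic and how much is the actual arithmetic.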

David


