from:"Linas Vepstas"

Re: [PATCH] docs: move powerpc under arch

2023-10-03 Thread Linas Vepstas

Hi Jon,

Got the message; I'm not an active maintainer, haven't been for over a
decade, and cannot comment on style issues. But if all the other arches are
doing this, I see no reason why not. Feel free to interpret this as an
Acked-by: if that's appropriate.

-- linas

On Tue, Oct 3, 2023 at 11:05 AM Jonathan Corbet wrote:

> Costa Shulyupin  writes:
>
> > and fix all in-tree references.
> >
> > Architecture-specific documentation is being moved into
> > Documentation/arch/ as a way of cleaning up the top-level documentation
> > directory and making the docs hierarchy more closely match the source
> > hierarchy.
> >
> > Signed-off-by: Costa Shulyupin 
>
> So this patch appears to have not been picked up, and to have received
> no comments.  I'll happily carry it in docs-next, but it would be nice
> to have an ack from the powerpc folks...?
>
> Thanks,
>
> jon
>

-- 
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.

Re: [PATCH 0/5] s390/pci: automatic error recovery

2021-09-06 Thread Linas Vepstas

On Mon, Sep 6, 2021 at 4:49 AM Niklas Schnelle wrote:

>  I believe we might be the first
> implementation of PCI device recovery in a virtualized setting requiring
> us to coordinate the device reset with the hypervisor platform by issuing
> a disable and re-enable to the platform as well as starting the recovery
> following a platform event.
>

I recall none of the details, but SRIOV is a standardized system for
sharing a PCI device across multiple virtual machines. It has detailed info
on what the hypervisor must do, and what the local OS instance must do to
accomplish this.  It's part of the PCI standard, and it's more than a decade
old now, maybe two. Being a part of the PCI standard, it was interoperable
with error recovery, to the best of my recollection. At the time it was
introduced, it got pushed very aggressively.  The x86 hypervisor vendors
were aiming at the heart of zseries, and were militant about it.

-- Linas


Re: [PATCH] Documentation PCI: Fix typo in pci-error-recovery.rst

2021-05-31 Thread Linas Vepstas

Signed-off-by: Linas Vepstas 

On Mon, May 31, 2021 at 3:12 AM Wesley Sheng  wrote:

> Replace "It" with "If", since it is a conditional statement.
>
> Signed-off-by: Wesley Sheng 
> ---
>  Documentation/PCI/pci-error-recovery.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
> index 84ceebb08cac..187f43a03200 100644
> --- a/Documentation/PCI/pci-error-recovery.rst
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -295,7 +295,7 @@ and let the driver restart normal I/O processing.
>  A driver can still return a critical failure for this function if
>  it can't get the device operational after reset.  If the platform
>  previously tried a soft reset, it might now try a hard reset (power
> -cycle) and then call slot_reset() again.  It the device still can't
> +cycle) and then call slot_reset() again.  If the device still can't
>  be recovered, there is nothing more that can be done;  the platform
>  will typically report a "permanent failure" in such a case.  The
>  device will be considered "dead" in this case.
> --
> 2.25.1
>
>
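
The escalation described in the patched paragraph (try a soft reset, then a
hard reset, then give up) can be sketched in plain user-space C. This is only
an illustration under stated assumptions: recover_slot(), the enum names and
the callback type are hypothetical and are not the kernel's pci_error_handlers
interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for this sketch; these are not kernel types. */
enum reset_kind { RESET_SOFT, RESET_HARD };
enum recovery_result { RECOVERED, PERMANENT_FAILURE };

/* Stand-in for a driver's slot_reset() callback: true if the device works. */
typedef bool (*slot_reset_fn)(enum reset_kind kind, void *ctx);

/* Platform side: escalate from soft to hard reset, then declare it dead. */
enum recovery_result recover_slot(slot_reset_fn slot_reset, void *ctx)
{
    if (slot_reset(RESET_SOFT, ctx))
        return RECOVERED;
    if (slot_reset(RESET_HARD, ctx))   /* e.g. a power cycle */
        return RECOVERED;
    return PERMANENT_FAILURE;          /* nothing more can be done */
}

/* Example driver that only comes back after a hard reset. */
bool needs_power_cycle(enum reset_kind kind, void *ctx)
{
    (void)ctx;
    return kind == RESET_HARD;
}

/* Example driver whose device never recovers. */
bool always_dead(enum reset_kind kind, void *ctx)
{
    (void)kind;
    (void)ctx;
    return false;
}
```

Calling recover_slot() with needs_power_cycle walks through both reset
attempts; with always_dead it ends in the "permanent failure" case.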


Re: [PATCH] powerpc/eeh: Update MAINTAINERS

2013-06-28 Thread Linas Vepstas

Hi,

On 27 June 2013 21:11, Benjamin Herrenschmidt b...@kernel.crashing.org wrote:
 On Fri, 2013-06-28 at 09:59 +0800, Gavin Shan wrote:
 Update MAINTAINERS to reflect recent changes.

 Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
 ---
  MAINTAINERS |    4 ++++
  1 files changed, 4 insertions(+), 0 deletions(-)

 diff --git a/MAINTAINERS b/MAINTAINERS
 index 5be702c..b447392 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -6146,10 +6146,14 @@ F:   drivers/firmware/pcdp.*

  PCI ERROR RECOVERY
  M:
 +M:   Gavin Shan sha...@linux.vnet.ibm.com

 Remove Linas, he isn't involved anymore as far as I can tell
 (are you ?)

Not involved any more; I don't have access to equipment, don't have
time, expertise is fading.

  L:   linux-...@vger.kernel.org
 +L:   linuxppc-dev@lists.ozlabs.org
  S:   Supported
  F:   Documentation/PCI/pci-error-recovery.txt
  F:   Documentation/powerpc/eeh-pci-error-recovery.txt
 +F:   arch/powerpc/kernel/eeh*.c
 +F:   drivers/pci/pcie/aer/

 Not sure about the AER code. You are not maintaining *that* at least :-)
 Maybe we should split EEH from the rest ?

Based on recent discussions (a month ago?) regarding AER, it's clear
that at least some of the AER code is mis-designed, and that some of
the patches being submitted against it were making things worse.   I
suggest keeping an eye on that ... the problem is that both AER and
EEH share a common framework in the PCI subsystem. As bugs in AER get
discovered, there's a chance that someone will submit a patch to the
common framework, or possibly start modifying assorted drivers, which
will then break EEH ... so I don't think it is wise/safe to ignore
AER.

(The point is that AER and EEH really should work exactly the same;
they differ merely by how they talk to the root port).

-- Linas
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH RFC] Simplify the Linux kernel by reducing its state space

2012-03-31 Thread Linas Vepstas


Hi,

I didn't actually try to compile the patch below; it didn't look like
C code so I wasn't sure what compiler to run it through.  I guess maybe
it's Python?  However, I'm very sure that the patches are completely
correct, because I read them, and I also know Paul.  And I've heard of
Thomas Gleixner.

Thus, please add my ack --

Ack'ed by: Linas Vepstas linasveps...@gmail.com


On Sun, Apr 01, 2012 at 12:33:21AM +0800, Paul E. McKenney was heard to remark:
 Although there have been numerous complaints about the complexity of
 parallel programming (especially over the past 5-10 years), the plain
 truth is that the incremental complexity of parallel programming over
 that of sequential programming is not as large as is commonly believed.
 Despite that you might have heard, the mind-numbing complexity of modern
 computer systems is not due so much to there being multiple CPUs, but
 rather to there being any CPUs at all.  In short, for the ultimate in
 computer-system simplicity, the optimal choice is NR_CPUS=0.
 
 This commit therefore limits kernel builds to zero CPUs.  This change
 has the beneficial side effect of rendering all kernel bugs harmless.
 Furthermore, this commit enables additional beneficial changes, for
 example, the removal of those parts of the kernel that are not needed
 when there are zero CPUs.
 
 Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
 Reviewed-by: Thomas Gleixner t...@linutronix.de
 ---
 
  alpha/Kconfig |   11 ++-
  arm/Kconfig   |6 +++---
  blackfin/Kconfig  |3 ++-
  hexagon/Kconfig   |9 +
  ia64/Kconfig  |9 +
  m32r/Kconfig  |   10 ++
  mips/Kconfig  |   21 +++--
  mn10300/Kconfig   |3 ++-
  parisc/Kconfig|6 +++---
  powerpc/platforms/Kconfig.cputype |8 
  s390/Kconfig  |   12 +++-
  sh/Kconfig|   11 ++-
  sparc/Kconfig |8 
  tile/Kconfig  |9 +
  x86/Kconfig   |   16 +---
  15 files changed, 78 insertions(+), 64 deletions(-)
 
 diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
 index 56a4df9..1766b4a 100644
 --- a/arch/alpha/Kconfig
 +++ b/arch/alpha/Kconfig
 @@ -541,14 +541,15 @@ config HAVE_DEC_LOCK
   default y
  
  config NR_CPUS
 - int "Maximum number of CPUs (2-32)"
 - range 2 32
 + int "Maximum number of CPUs (0-0)"
 + range 0 0
   depends on SMP
 - default 32 if ALPHA_GENERIC || ALPHA_MARVEL
 - default 4 if !ALPHA_GENERIC && !ALPHA_MARVEL
 + default 0 if ALPHA_GENERIC || ALPHA_MARVEL
 + default 0 if !ALPHA_GENERIC && !ALPHA_MARVEL
   help
 MARVEL support can handle a maximum of 32 CPUs, all the others
 -  with working support have a maximum of 4 CPUs.
 +  with working support have a maximum of 4 CPUs.  But why take
 +   chances?  Just stick with zero CPUs.
  
  config ARCH_DISCONTIGMEM_ENABLE
   bool "Discontiguous Memory Support (EXPERIMENTAL)"
 diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
 index a48aecc..1f07a3a 100644
 --- a/arch/arm/Kconfig
 +++ b/arch/arm/Kconfig
 @@ -1551,10 +1551,10 @@ config PAGE_OFFSET
   default 0xC000
  
  config NR_CPUS
 - int "Maximum number of CPUs (2-32)"
 - range 2 32
 + int "Maximum number of CPUs (0-0)"
 + range 0 0
   depends on SMP
 - default 4
 + default 0
  
  config HOTPLUG_CPU
   bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
 diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
 index abe5a9e..6a78549 100644
 --- a/arch/blackfin/Kconfig
 +++ b/arch/blackfin/Kconfig
 @@ -241,7 +241,8 @@ config SMP
  config NR_CPUS
   int
   depends on SMP
 - default 2 if BF561
 + range 0 0
 + default 0 if BF561
  
  config HOTPLUG_CPU
   bool "Support for hot-pluggable CPUs"
 diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
 index 9059e39..daab009 100644
 --- a/arch/hexagon/Kconfig
 +++ b/arch/hexagon/Kconfig
 @@ -158,13 +158,14 @@ config SMP
  
  config NR_CPUS
   int "Maximum number of CPUs" if SMP
 - range 2 6 if SMP
 - default 1 if !SMP
 - default 6 if SMP
 + range 0 0 if SMP
 + default 0 if !SMP
 + default 0 if SMP
   ---help---
 This allows you to specify the maximum number of CPUs which this
 kernel will support.  The maximum supported value is 6 and the
 -   minimum value which makes sense is 2.
 +   minimum value which makes sense is 2.  But a limit of zero is
 +   so much safer!
  
 This is purely to save memory - each supported CPU adds
 approximately eight kilobytes to the kernel image.
 diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
 index bd72669..fea0e6d 100644
 --- a/arch/ia64/Kconfig

Re: [PATCH RFC] Simplify the Linux kernel by reducing its state space

2012-03-31 Thread Linas Vepstas

Hi,

I didn't actually try to compile the patch below; it didn't look like C
code so I wasn't sure what compiler to run it through.  I guess maybe it's
Python?  However, I'm very sure that the patches are completely correct,
because I read them, and I also know that Paul is a trustworthy programmer.
 Thus, please add my ack

Ack'ed by: Linas Vepstas linasveps...@gmail.com


On 31 March 2012 11:33, Paul E. McKenney paul...@linux.vnet.ibm.com wrote:

 Although there have been numerous complaints about the complexity of
 parallel programming (especially over the past 5-10 years), the plain
 truth is that the incremental complexity of parallel programming over
 that of sequential programming is not as large as is commonly believed.
 Despite that you might have heard, the mind-numbing complexity of modern
 computer systems is not due so much to there being multiple CPUs, but
 rather to there being any CPUs at all.  In short, for the ultimate in
 computer-system simplicity, the optimal choice is NR_CPUS=0.

 This commit therefore limits kernel builds to zero CPUs.  This change
 has the beneficial side effect of rendering all kernel bugs harmless.
 Furthermore, this commit enables additional beneficial changes, for
 example, the removal of those parts of the kernel that are not needed
 when there are zero CPUs.

 Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
 Reviewed-by: Thomas Gleixner t...@linutronix.de
 ---

  alpha/Kconfig |   11 ++-
  arm/Kconfig   |6 +++---
  blackfin/Kconfig  |3 ++-
  hexagon/Kconfig   |9 +
  ia64/Kconfig  |9 +
  m32r/Kconfig  |   10 ++
  mips/Kconfig  |   21 +++--
  mn10300/Kconfig   |3 ++-
  parisc/Kconfig|6 +++---
  powerpc/platforms/Kconfig.cputype |8 
  s390/Kconfig  |   12 +++-
  sh/Kconfig|   11 ++-
  sparc/Kconfig |8 
  tile/Kconfig  |9 +
  x86/Kconfig   |   16 +---
  15 files changed, 78 insertions(+), 64 deletions(-)

 diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
 index 56a4df9..1766b4a 100644
 --- a/arch/alpha/Kconfig
 +++ b/arch/alpha/Kconfig
 @@ -541,14 +541,15 @@ config HAVE_DEC_LOCK
default y

  config NR_CPUS
 -   int "Maximum number of CPUs (2-32)"
 -   range 2 32
 +   int "Maximum number of CPUs (0-0)"
 +   range 0 0
depends on SMP
 -   default 32 if ALPHA_GENERIC || ALPHA_MARVEL
 -   default 4 if !ALPHA_GENERIC && !ALPHA_MARVEL
 +   default 0 if ALPHA_GENERIC || ALPHA_MARVEL
 +   default 0 if !ALPHA_GENERIC && !ALPHA_MARVEL
help
  MARVEL support can handle a maximum of 32 CPUs, all the others
 -  with working support have a maximum of 4 CPUs.
 +  with working support have a maximum of 4 CPUs.  But why take
 + chances?  Just stick with zero CPUs.

  config ARCH_DISCONTIGMEM_ENABLE
bool "Discontiguous Memory Support (EXPERIMENTAL)"
 diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
 index a48aecc..1f07a3a 100644
 --- a/arch/arm/Kconfig
 +++ b/arch/arm/Kconfig
 @@ -1551,10 +1551,10 @@ config PAGE_OFFSET
default 0xC000

  config NR_CPUS
 -   int "Maximum number of CPUs (2-32)"
 -   range 2 32
 +   int "Maximum number of CPUs (0-0)"
 +   range 0 0
depends on SMP
 -   default 4
 +   default 0

  config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
 diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
 index abe5a9e..6a78549 100644
 --- a/arch/blackfin/Kconfig
 +++ b/arch/blackfin/Kconfig
 @@ -241,7 +241,8 @@ config SMP
  config NR_CPUS
int
depends on SMP
 -   default 2 if BF561
 +   range 0 0
 +   default 0 if BF561

  config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
 diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
 index 9059e39..daab009 100644
 --- a/arch/hexagon/Kconfig
 +++ b/arch/hexagon/Kconfig
 @@ -158,13 +158,14 @@ config SMP

  config NR_CPUS
int "Maximum number of CPUs" if SMP
 -   range 2 6 if SMP
 -   default 1 if !SMP
 -   default 6 if SMP
 +   range 0 0 if SMP
 +   default 0 if !SMP
 +   default 0 if SMP
---help---
  This allows you to specify the maximum number of CPUs which this
  kernel will support.  The maximum supported value is 6 and the
 - minimum value which makes sense is 2.
 + minimum value which makes sense is 2.  But a limit of zero is
 + so much safer!

  This is purely to save memory - each supported CPU adds
  approximately eight kilobytes to the kernel image.
 diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
 index bd72669

Re: [RFC PATCH] ppc: don't override CONFIG_PPC_PSERIES_DEBUG

2010-10-14 Thread Linas Vepstas

On 14 October 2010 12:48, Nishanth Aravamudan n...@us.ibm.com wrote:
 These files undef DEBUG, but I think they were added before the ability
 to control this from Kconfig.

Right.

 It's really annoying to only get some of
 the debug messages!

I don't get the big picture.  Will there be some CONFIG_DEBUG_EEH in Kconfig,
or just some option to turn on DEBUG for all powerpc-related files?

Or maybe I am demonstrating my utter ignorance of some new whiz-bang
Kconfig technology?

Anyway, I see no harm in the EEH portion of the patch.

--linas

Re: [PATCH v2] pseries: don't override CONFIG_PPC_PSERIES_DEBUG

2010-10-14 Thread Linas Vepstas

On 14 October 2010 19:48, Nishanth Aravamudan n...@us.ibm.com wrote:
 eeh and pci_dlpar #undef DEBUG, but I think they were added before the
 ability to control this from Kconfig. It's really annoying to only get
 some of the debug messages from these files. Leave the lpar.c #undef
 alone as it produces so much output as to make the kernel unusable.
 Update the Kconfig text to indicate this particular quirk :)

 Signed-off-by: Nishanth Aravamudan n...@us.ibm.com

OK, ignore my last email.

Acked by: Linas Vepstas linasveps...@gmail.com


 --- a/arch/powerpc/platforms/pseries/Kconfig
 +++ b/arch/powerpc/platforms/pseries/Kconfig
 @@ -47,6 +47,12 @@ config LPARCFG
  config PPC_PSERIES_DEBUG
         depends on PPC_PSERIES && PPC_EARLY_DEBUG
         bool "Enable extra debug logging in platforms/pseries"
 +        help
 +         Say Y here if you want the pseries core to produce a bunch of
 +         debug messages to the system log. Select this if you are having a
 +         problem with the pseries core and want to see more of what is
 +         going on. This does not enable debugging in lpar.c, which must
 +         be manually done due to its verbosity.
        default y

Umm, I see default y and you are not changing this but ... default y
?? Really?

Also, I am guessing that the lpar spam is due only to a handful of printk's,
while most of the rest will be infrequent.  Just knock out the
high-frequency ones...

--linas
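
To make the "#undef DEBUG" problem concrete, here is a small user-space
sketch. It assumes a hypothetical dbg() macro standing in for the kernel's
debug print helpers, and a plain #define standing in for the compiler flag
that a Kconfig option would add. The point is only that a per-file #undef
wins over whatever the build system asked for.

```c
#include <stdio.h>

/* Stands in for -DDEBUG, which the build would add when the option is set. */
#ifndef DEBUG
#define DEBUG 1
#endif

/* The per-file override discussed in the thread: this silences the file
 * no matter what the configuration requested. */
#undef DEBUG

#ifdef DEBUG
#define dbg(...) fprintf(stderr, __VA_ARGS__)
#define DEBUG_COMPILED_IN 1
#else
#define dbg(...) do { } while (0)
#define DEBUG_COMPILED_IN 0
#endif

/* Returns 1 if debug output was compiled into this file, 0 otherwise. */
int debug_compiled_in(void)
{
    dbg("debug logging is compiled in\n");
    return DEBUG_COMPILED_IN;
}
```

With the #undef in place debug_compiled_in() returns 0; deleting that one
line lets the configured setting take effect again.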

Re: [PATCH 01/15] ppc: fix return type of BUID_{HI,LO} macros

2010-09-16 Thread Linas Vepstas

Acked-by: Linas Vepstas linasveps...@gmail.com

I'm guessing this worked up until now because the rtas_call function prototype
was telling the compiler to cast these to 32-bit before passing them as args.
(And since these would still get passed as one arg per 64-bit reg, it
still wouldn't go wrong.)

What I'm wondering about is why there was no compiler warning about an
implicit cast of a 64-bit int to a 32-bit int?  Surely, this is something that
should be warned about!

-- Linas

On 15 September 2010 13:13, Nishanth Aravamudan n...@us.ibm.com wrote:
 BUID_HI and BUID_LO are used to pass data to call_rtas, which expects
 ints or u32s. But the macro doesn't cast the return, so the result is
 still u64. Use the upper_32_bits and lower_32_bits macros that have been
 added to kernel.h.

 Found by getting printf format errors trying to debug print the args, no
 actual code change for 64 bit kernels where the macros are actually
 used.

 Signed-off-by: Milton Miller milt...@bga.com
 Signed-off-by: Nishanth Aravamudan n...@us.ibm.com
 ---
  arch/powerpc/include/asm/ppc-pci.h |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
 index 42fdff0..43268f1 100644
 --- a/arch/powerpc/include/asm/ppc-pci.h
 +++ b/arch/powerpc/include/asm/ppc-pci.h
 @@ -28,8 +28,8 @@ extern void find_and_init_phbs(void);
  extern struct pci_dev *isa_bridge_pcidev;      /* may be NULL if no ISA bus */

  /** Bus Unit ID macros; get low and hi 32-bits of the 64-bit BUID */
 -#define BUID_HI(buid) ((buid) >> 32)
 -#define BUID_LO(buid) ((buid) & 0xffffffff)
 +#define BUID_HI(buid) upper_32_bits(buid)
 +#define BUID_LO(buid) lower_32_bits(buid)

  /* PCI device_node operations */
  struct device_node;
 --
 1.7.0.4
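
The type issue behind this patch can be reproduced outside the kernel. The
sketch below is an illustration only: upper_32_bits() and lower_32_bits() are
re-created here for user space (the kernel has its own definitions in
kernel.h), the OLD_ macros are a shift-and-mask form like the one the patch
removes, and example_buid is a made-up value.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space re-creations for illustration; not copied from kernel.h. */
#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
#define lower_32_bits(n) ((uint32_t)(n))

/* Shift-and-mask style: the result keeps the 64-bit type of the argument. */
#define OLD_BUID_HI(buid) ((buid) >> 32)
#define OLD_BUID_LO(buid) ((buid) & 0xffffffff)

/* Cast style: the result really is a 32-bit value. */
#define BUID_HI(buid) upper_32_bits(buid)
#define BUID_LO(buid) lower_32_bits(buid)

/* Made-up 64-bit bus unit ID used only for this example. */
static const uint64_t example_buid = 0x0123456789abcdefULL;

/* Size in bytes of what each style of macro evaluates to. */
size_t old_hi_size(void) { return sizeof(OLD_BUID_HI(example_buid)); }
size_t new_hi_size(void) { return sizeof(BUID_HI(example_buid)); }

uint32_t example_hi(void) { return BUID_HI(example_buid); }
uint32_t example_lo(void) { return BUID_LO(example_buid); }
```

Both styles produce the same numbers, which is why nothing misbehaved; only
the type differs, and that is what a printf-style format check notices.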



Re: [PATCH] powerpc: eeh: Fix oops when probing in early boot

2010-05-11 Thread Linas Vepstas

On 10 May 2010 20:38, Anton Blanchard an...@samba.org wrote:

 If we take an EEH early enough, we oops:


 Call Trace:
 [c00010483770] [c0013ee4] .show_stack+0xd8/0x218 (unreliable)
 [c00010483850] [c0658940] .dump_stack+0x28/0x3c
 [c000104838d0] [c0057a68] .eeh_dn_check_failure+0x2b8/0x304
 [c00010483990] [c00259c8] .rtas_read_config+0x120/0x168
 [c00010483a40] [c0025af4] .rtas_pci_read_config+0xe4/0x124
 [c00010483af0] [c037af18] .pci_bus_read_config_word+0xac/0x104
 [c00010483bc0] [c08fec98] .pcibios_allocate_resources+0x7c/0x220
 [c00010483c90] [c08feed8] .pcibios_resource_survey+0x9c/0x418
 [c00010483d80] [c08fea10] .pcibios_init+0xbc/0xf4
 [c00010483e20] [c0009844] .do_one_initcall+0x98/0x1d8
 [c00010483ed0] [c08f0560] .kernel_init+0x228/0x2e8
 [c00010483f90] [c0031a08] .kernel_thread+0x54/0x70
 EEH: Detected PCI bus error on device null
 EEH: This PCI device has failed 1 times in the last hour:
 EEH: location=U78A5.001.WIH8464-P1 driver= pci addr=0001:00:01.0
 EEH: of node=/p...@8002209/u...@1
 EEH: PCI device/vendor: 00351033
 EEH: PCI cmd/status register: 12100146

 Unable to handle kernel paging request for data at address 0x0468
 Oops: Kernel access of bad area, sig: 11 [#1]
 
 NIP [c0057610] .rtas_set_slot_reset+0x38/0x10c
 LR [c0058724] .eeh_reset_device+0x5c/0x124
 Call Trace:
 [cbc6bd00] [c005a0e0] .pcibios_remove_pci_devices+0x7c/0xb0 (unreliable)
 [cbc6bd90] [c0058724] .eeh_reset_device+0x5c/0x124
 [cbc6be40] [c00589c0] .handle_eeh_events+0x1d4/0x39c
 [cbc6bf00] [c0059124] .eeh_event_handler+0xf0/0x188
 [cbc6bf90] [c0031a08] .kernel_thread+0x54/0x70


 We called rtas_set_slot_reset while scanning the bus and before the pci_dn
 to pcidev mapping has been created. Since we only need the pcidev to work
 out the type of reset and that only gets set after the module for the
 device loads, lets just do a hot reset if the pcidev is NULL.

 Signed-off-by: Anton Blanchard an...@samba.org
 ---


Acked-by: Linas Vepstas linasveps...@gmail.com

I'm cc'ing Brian King, he's the one who figured out the proper fix
for a hot-reset/fundamental-reset hardware feature that added
this line of code.

The question is -- when the system finishes booting, and the
module finally loads, will the device be found in a usable state
and/or will it automatically reset to a usable state?

--linas


 Index: linux-2.6/arch/powerpc/platforms/pseries/eeh.c
 ===
 --- linux-2.6.orig/arch/powerpc/platforms/pseries/eeh.c 2010-05-10 17:25:10.703453565 +1000
 +++ linux-2.6/arch/powerpc/platforms/pseries/eeh.c      2010-05-10 17:25:24.034323030 +1000
 @@ -749,7 +749,7 @@ static void __rtas_set_slot_reset(struct
        /* Determine type of EEH reset required by device,
         * default hot reset or fundamental reset
         */
 -       if (dev->needs_freset)
 +       if (dev && dev->needs_freset)
                rtas_pci_slot_reset(pdn, 3);
        else
                rtas_pci_slot_reset(pdn, 1);
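
The decision this hunk makes can be isolated into a few lines of user-space
C. This is a sketch under assumptions: struct fake_pci_dev and
choose_reset_type() are hypothetical stand-ins, and the codes 1 and 3 are
simply the values passed to rtas_pci_slot_reset() in the quoted patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the one struct pci_dev field used here. */
struct fake_pci_dev {
    unsigned int needs_freset:1;
};

/* Reset codes as passed to rtas_pci_slot_reset() in the quoted patch. */
enum { RESET_HOT = 1, RESET_FUNDAMENTAL = 3 };

/* Pick the reset type, tolerating a device that has no pci_dev yet. */
int choose_reset_type(const struct fake_pci_dev *dev)
{
    /* dev may be NULL early in boot, before the pci_dn to pcidev
     * mapping exists; in that case fall back to a hot reset. */
    if (dev && dev->needs_freset)
        return RESET_FUNDAMENTAL;
    return RESET_HOT;
}
```

A NULL device and a device without the flag both get the hot reset; only a
known device that asked for it gets the fundamental reset.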



Re: [PATCH] eeh: Fixing a bug when pci structure is null

2010-02-19 Thread Linas Vepstas

Hi Paul, Breno,

Some confusion -- I've been out of the loop for a while -- I assume
it's still Paul who is pushing these patches upstream, and not Ben?  So
Breno, maybe you should resend the patch to Paul?

--linas

On 19 February 2010 10:43, Breno Leitao lei...@linux.vnet.ibm.com wrote:
 Hi Ben,

 I'd like to ask about this patch ? Should I re-submit ?

 Thanks,

 Breno Leitao wrote:
 During an EEH recovery, the pci_dev structure can be null, mainly if an
 eeh event is detected during a PCI config operation. In this case, the
 pci_dev will not be known (and will be null) and the kernel will crash
 with the following message:


Re: [PATCH 2/2] eeh: fixing pci_dev dependency

2010-01-29 Thread Linas Vepstas

On 28 January 2010 18:04, Benjamin Herrenschmidt
b...@kernel.crashing.org wrote:
 On Wed, 2010-01-27 at 12:43 -0600, lei...@linux.vnet.ibm.com wrote:
 Currently pci_dev can be null when EEH is in action. This patch
 just assure that we pci_dev is not NULL before calling pci_dev_put.

 Like all variants of *_put(), it already checks for a NULL argument
 afaik. So that patch should be unnecessary.

Ah, OK, I panicked when I saw that and assumed the worst

--linas
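
The convention Ben refers to, a put() that quietly accepts NULL, looks
roughly like this in user-space C. The names struct counted, counted_get()
and counted_put() are hypothetical and only illustrate the pattern; this is
not the pci_dev_put() implementation.

```c
#include <stddef.h>

/* Hypothetical reference-counted object for this sketch. */
struct counted {
    int refcount;
    int released;   /* set when the last reference is dropped */
};

/* Take a reference; passing NULL is allowed and returns NULL. */
struct counted *counted_get(struct counted *obj)
{
    if (obj)
        obj->refcount++;
    return obj;
}

/* Drop a reference; NULL is accepted so callers need no check. */
void counted_put(struct counted *obj)
{
    if (!obj)
        return;
    if (--obj->refcount == 0)
        obj->released = 1;   /* a real implementation would free here */
}
```

Because the NULL test lives inside counted_put(), a caller-side check before
the call adds nothing.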

Re: [PATCH 1/3] Support for PCI Express reset type

2009-08-01 Thread Linas Vepstas

Hi Andi,

2009/7/31 Andi Kleen a...@firstfloor.org:
 Mike Mason mm...@us.ibm.com writes:

 These patches supersede the previously submitted patch that
 implemented a fundamental reset bit field.

 Please review and let me know of any concerns.

 Any plans to implement that for x86 too? Right now it seems to be a PPC
 specific hack.

I've found the PCIE chip spec somewhat daunting, but was under the
impression that much if not most of what was needed was specified
there.

See, for example:
Documentation/PCI/pcieaer-howto.txt

which states:
|||   The PCI Express Advanced Error Reporting Driver Guide HOWTO
|||           T. Long Nguyen  tom.l.ngu...@intel.com
|||           Yanmin Zhang    yanmin.zh...@intel.com
|||           07/29/2006
[..]
||| The PCI Express AER driver provides the infrastructure to support PCI
||| Express Advanced Error Reporting capability. The PCI Express AER
||| driver provides three basic functions:
|||
||| -   Gathers the comprehensive error information if errors occurred.
||| -   Reports error to the users.
||| -   Performs error recovery actions.

I presume the last bullet point means that the AER code works and
actually does more or less the same thing as the PPC EEH code,
but in a more architecture-independent way, as it only assumes
that PCI AER is there (and is correctly implemented in the PCI chipset).
The AER code uses the same core infrastructure as the EEH code;
at the time, I did exchange emails w/ the above authors discussing
this stuff.

Whether the x86 server vendors are actually selling something
with AER in it, and whether any of them are actually testing this stuff,
is unclear.

FWIW IBM has pretty much no incentive to lobby other server vendors
to get on the ball ...as this is viewed as one of those things that lets
IBM charge premium prices for PPC hardware.

--linas

Re: [PATCH 1/3] Support for PCI Express reset type

2009-08-01 Thread Linas Vepstas

2009/7/30 Mike Mason mm...@us.ibm.com:
 This is the first of three patches that implement a bit field that PCI
 Express device drivers can use to indicate they need a fundamental reset
 during error recovery.

 By default, the EEH framework on powerpc does what's known as a hot reset
 during recovery of a PCI Express device.  We've found a case where the
 device needs a fundamental reset to recover properly.  The current PCI
 error recovery and EEH frameworks do not support this distinction.

 The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev
 that indicates whether the device requires a fundamental reset during
 recovery.

 These patches supersede the previously submitted patch that implemented a
 fundamental reset bit field.
 Please review and let me know of any concerns.

 Signed-off-by: Mike Mason mm...@us.ibm.com
 Signed-off-by: Richard Lary rl...@us.ibm.com


Signed-off-by: Linas Vepstas linasveps...@gmail.com

Re: [PATCH 2/3] Support for PCI Express reset type

2009-08-01 Thread Linas Vepstas

2009/7/30 Mike Mason mm...@us.ibm.com:
 This is the second of three patches that implement a bit field that PCI
 Express device drivers can use to indicate they need a fundamental reset
 during error recovery.

 By default, the EEH framework on powerpc does what's known as a hot reset
 during recovery of a PCI Express device.  We've found a case where the
 device needs a fundamental reset to recover properly.  The current PCI
 error recovery and EEH frameworks do not support this distinction.

 The attached patch updates the Documentation/PCI/pci-error-recovery.txt file
 with changes related to this new bit field, as well a few unrelated updates.

 These patches supersede the previously submitted patch that implemented a
 fundamental reset bit field.
 Please review and let me know of any concerns.

 Signed-off-by: Mike Mason mm...@us.ibm.com
 Signed-off-by: Richard Lary rl...@us.ibm.com

FWIW,

Signed-off-by: Linas Vepstas linasveps...@gmail.com

Re: [PATCH 3/3] Support for PCI Express reset type

2009-08-01 Thread Linas Vepstas

2009/7/30 Mike Mason mm...@us.ibm.com:
 This is the third of three patches that implement a bit field that PCI
 Express device drivers can use to indicate they need a fundamental reset
 during error recovery.

 By default, the EEH framework on powerpc does what's known as a hot reset
 during recovery of a PCI Express device.  We've found a case where the
 device needs a fundamental reset to recover properly.  The current PCI
 error recovery and EEH frameworks do not support this distinction.

 The attached patch makes changes to EEH to utilize the new bit field.

 These patches supersede the previously submitted patch that implemented a
 fundamental reset bit field.

 Please review and let me know of any concerns.

 Signed-off-by: Mike Mason mm...@us.ibm.com
 Signed-off-by: Richard Lary rl...@us.ibm.com

Signed-off-by: Linas Vepstas linasveps...@gmail.com

 +       /* Determine type of EEH reset required by device,
 +        * default hot reset or fundamental reset
 +        */
 +       if (dev->needs_freset)
 +               rtas_pci_slot_reset(pdn, 3);
 +       else
 +               rtas_pci_slot_reset(pdn, 1);

Gack!  I remember deluges of emails and conference calls
where the hardware guys went on about this; and I admit I didn't
quite get it, which I guess is why this patch is showing up many
years late.

FWIW some of the variants of the IPR chipset almost surely
need the freset bit set.

--linas
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] Support for PCI Express reset type in EEH

2009-07-24 Thread Linas Vepstas

2009/7/24 Richard Lary rl...@us.ibm.com:
 Linas Vepstas linasveps...@gmail.com wrote on 07/23/2009 07:44:33 AM:

 2009/7/15 Mike Mason mm...@us.ibm.com:
  By default, EEH does what's known as a hot reset during error recovery of
  a PCI Express device.  We've found a case where the device needs a
  fundamental reset to recover properly.  The current PCI error recovery and
  EEH frameworks do not support this distinction.
 
  The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev
  that indicates whether the device requires a fundamental reset during error
  recovery.  This bit can be checked by EEH to determine which reset type is
  required.
 
  This patch supersedes the previously submitted patch that implemented a
  reset type callback.
 
  Please review and let me know of any concerns.

 I like this patch a *lot* better .. it is vastly simpler, more direct.


  diff -uNrp a/include/linux/pci.h b/include/linux/pci.h
  --- a/include/linux/pci.h       2009-07-13 14:25:37.0 -0700
  +++ b/include/linux/pci.h       2009-07-15 10:25:37.0 -0700
  @@ -273,6 +273,7 @@ struct pci_dev {
         unsigned int    ari_enabled:1;  /* ARI forwarding */
         unsigned int    is_managed:1;
         unsigned int    is_pcie:1;
  +       unsigned int    fndmntl_rst_rqd:1; /* Dev requires fundamental reset */
         unsigned int    state_saved:1;
         unsigned int    is_physfn:1;
         unsigned int    is_virtfn:1;

 As Ben points out, the name is awkward.  How about needs_freset ?

 I am OK with name change.


 Since this affects the entire pci subsystem, it should be documented
 properly.  The pci error recovery subsystem was designed to be
 usable in other architectures, and so the error recovery docs should
 take at least a paragraph to describe what this flag means, and when
 it's supposed to be used.

 I will update the documentation, are you referring to
 Documentation/powerpc/eeh-pci-error-recovery.txt
 or some other documentation?

No, I'm thinking
Documentation/PCI/pci-error-recovery.txt

because the flag is not powerpc-specific.

--linas


 Providing the docs patch together with the pci.h patch *only* would
 probably simplify acceptance by the PCI community.

 --linas


Re: [PATCH] Hold reference to device_node during EEH event handling

2009-07-23 Thread Linas Vepstas

2009/7/16 Michael Ellerman mich...@ellerman.id.au:
 On Thu, 2009-07-16 at 09:33 -0700, Mike Mason wrote:
 Michael Ellerman wrote:
  On Wed, 2009-07-15 at 14:43 -0700, Mike Mason wrote:
  This patch increments the device_node reference counter when an EEH
  error occurs and decrements the counter when the event has been
  handled.  This is to prevent the device_node from being released until
  eeh_event_handler() has had a chance to deal with the event.  We've
  seen cases where the device_node is released too soon when an EEH
  event occurs during a dlpar remove, causing the event handler to
  attempt to access bad memory locations.
 
  Please review and let me know of any concerns.
 
  Taking a reference sounds sane, but ...
 
  Signed-off-by: Mike Mason mm...@us.ibm.com
 
  --- a/arch/powerpc/platforms/pseries/eeh_event.c   2008-10-09 
  15:13:53.0 -0700
  +++ b/arch/powerpc/platforms/pseries/eeh_event.c   2009-07-14 
  14:14:00.0 -0700
  @@ -75,6 +75,14 @@ static int eeh_event_handler(void * dumm
     if (event == NULL)
             return 0;
 
  +  /* EEH holds a reference to the device_node, so if it
  +   * equals 1 it's no longer valid and the event should
  +   * be ignored */
  +  if (atomic_read(&event->dn->kref.refcount) == 1) {
  +          of_node_put(event->dn);
  +          return 0;
  +  }
 
  That's really gross :)

 Agreed.  I'll look for another way to determine if device is gone and
 the event should be ignored.  Suggestions are welcome :-)

 Benh and I had a quick chat about it, and were wondering whether what
 you really should be doing is taking a reference to the pci device
 (perhaps as well as the device node).

 @@ -140,7 +149,7 @@ int eeh_send_failure_event (struct devic
        if (dev)
                pci_dev_get(dev);

 -       event->dn = dn;
 +       event->dn = of_node_get(dn);
        event->dev = dev;

 pci devs are refcounted too, see pci_dev_get(), so taking a reference
 there would be the right thing to do - otherwise there's no guarantee
 it still exists later, unless there's some other trick in the EEH code.

I thought that the eeh code did pci gets and puts in the right locations,
perhaps I (incorrectly) assumed that this meant that the of_dn use count
never dropped to zero ...

I think my logic was:
-- pci device init does of_node_get
-- pci device shutdown does of_node_put
-- pci device shutdown can never run as long as pci use count is > 0

Thus, explicit of_node_get was usually not needed.

So, for example, see above: I was figuring that the pci_dev_get(dev);
was enough to protect the dn too .. although maybe if dev is null,
then things go wrong ...

--linas
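
The pairing discussed above (take a reference on both the device node and the pci device when the event is queued, drop both once it has been handled) can be sketched in plain userland C. This is only an illustration: `demo_obj`, `demo_get()`, `demo_put()` and `demo_event` are made-up stand-ins, not kernel APIs, and the counter is a plain int rather than the atomic kref the kernel uses.

```c
#include <stddef.h>

/* Hypothetical refcounted object; stands in for a device_node or pci_dev. */
struct demo_obj {
	int refcount;
};

/* Take a reference; NULL is passed through untouched. */
static struct demo_obj *demo_get(struct demo_obj *o)
{
	if (o)
		o->refcount++;
	return o;
}

/* Drop a reference; NULL is ignored. */
static void demo_put(struct demo_obj *o)
{
	if (o)
		o->refcount--;
}

/* Hypothetical event that must keep both objects alive until handled. */
struct demo_event {
	struct demo_obj *dn;
	struct demo_obj *dev;
};

/* Queue time: pin both objects, even when dev is NULL. */
static void demo_event_init(struct demo_event *ev,
			    struct demo_obj *dn, struct demo_obj *dev)
{
	ev->dn = demo_get(dn);
	ev->dev = demo_get(dev);
}

/* Handler exit: release exactly what was pinned at queue time. */
static void demo_event_done(struct demo_event *ev)
{
	demo_put(ev->dev);
	demo_put(ev->dn);
	ev->dev = NULL;
	ev->dn = NULL;
}
```

Because the node is pinned independently of the device, the dev == NULL case no longer leaves the node unprotected.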

Re: [PATCH] Support for PCI Express reset type in EEH

2009-07-23 Thread Linas Vepstas

2009/7/15 Mike Mason mm...@us.ibm.com:
 By default, EEH does what's known as a hot reset during error recovery of
 a PCI Express device.  We've found a case where the device needs a
 fundamental reset to recover properly.  The current PCI error recovery and
 EEH frameworks do not support this distinction.

 The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev
 that indicates whether the device requires a fundamental reset during error
 recovery.  This bit can be checked by EEH to determine which reset type is
 required.

 This patch supersedes the previously submitted patch that implemented a
 reset type callback.

 Please review and let me know of any concerns.

I like this patch a *lot* better .. it is vastly simpler, more direct.


 diff -uNrp a/include/linux/pci.h b/include/linux/pci.h
 --- a/include/linux/pci.h       2009-07-13 14:25:37.0 -0700
 +++ b/include/linux/pci.h       2009-07-15 10:25:37.0 -0700
 @@ -273,6 +273,7 @@ struct pci_dev {
        unsigned int    ari_enabled:1;  /* ARI forwarding */
        unsigned int    is_managed:1;
        unsigned int    is_pcie:1;
 +       unsigned int    fndmntl_rst_rqd:1; /* Dev requires fundamental reset */
        unsigned int    state_saved:1;
        unsigned int    is_physfn:1;
        unsigned int    is_virtfn:1;

As Ben points out, the name is awkward.  How about needs_freset ?

Since this affects the entire pci subsystem, it should be documented
properly.  The pci error recovery subsystem was designed to be
usable in other architectures, and so the error recovery docs should
take at least a paragraph to describe what this flag means, and when
it's supposed to be used.

Providing the docs patch together with the pci.h patch *only* would
probably simplify acceptance by the PCI community.

--linas
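
A rough sketch of how such a one-bit flag might be consumed by recovery code follows. It assumes the `needs_freset` name suggested above; `demo_pci_dev`, `demo_reset_type` and `demo_pick_reset()` are invented for illustration and are not the kernel's structures or functions.

```c
/* Illustrative stand-in for the relevant bit of struct pci_dev. */
struct demo_pci_dev {
	unsigned int needs_freset:1;	/* device requires a fundamental reset */
};

/* The two reset flavours discussed in the thread. */
enum demo_reset_type {
	DEMO_RESET_HOT,
	DEMO_RESET_FUNDAMENTAL,
};

/* Recovery code checks the flag instead of calling back into the driver. */
static enum demo_reset_type demo_pick_reset(const struct demo_pci_dev *dev)
{
	return dev->needs_freset ? DEMO_RESET_FUNDAMENTAL : DEMO_RESET_HOT;
}
```

A driver that knows its hardware needs the stronger reset would set the bit once, for example at probe time, and every platform's recovery code could then read it.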

Re: [PATCH] Set error_state to pci_channel_io_normal in eeh_report_reset()

2009-04-14 Thread Linas Vepstas

Hi,

2009/4/10 Mike Mason mm...@us.ibm.com:
 While adding native EEH support to Emulex and Qlogic drivers, it was
 discovered that dev->error_state was set to pci_channel_io_normal too late
 in the recovery process. These drivers rely on error_state to determine if
 they can access the device in their slot_reset callback, thus error_state
 needs to be set to pci_channel_io_normal in eeh_report_reset(). Below is a
 detailed explanation (courtesy of Richard Lary) as to why this is necessary.

 Background:
 PCI MMIO or DMA accesses to a frozen slot generate additional EEH errors. If
 the number of additional EEH errors exceeds EEH_MAX_FAILS the adapter will
 be shutdown. To avoid triggering excessive EEH errors and an undesirable
 adapter shutdown, some drivers use the pci_channel_offline(dev) wrapper
 function to return a Boolean value based on the value of
 pci_dev->error_state to determine if PCI MMIO or DMA accesses are safe. If
 the wrapper returns TRUE, drivers must not make PCI MMIO or DMA access to
 their hardware.

 The pci_dev structure member error_state reflects one of three values, 1)
 pci_channel_io_normal, 2) pci_channel_io_frozen, 3)
 pci_channel_io_perm_failure. Function pci_channel_offline(dev) returns TRUE
 if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure.

 The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the
 point where the PCI slot is frozen. Currently, the EEH driver restores
 dev->error_state to pci_channel_io_normal in eeh_report_resume() before
 calling the driver's resume callback. However, when the EEH driver calls the
 driver's slot_reset callback() from eeh_report_reset(), it incorrectly
 indicates the error state is still pci_channel_io_frozen.

 Waiting until eeh_report_resume() to restore dev->error_state to
 pci_channel_io_normal is too late for Emulex and QLogic FC drivers and any
 other drivers which are designed to use common code paths in these two
 cases: i) those called after the driver's slot_reset callback() and ii)
 those called after the PCI slot is frozen but before the driver's slot_reset
 callback is called. Case i) all driver paths executed to reinitialize the
 hardware after a reset and case ii) all code paths executed by driver kernel
 threads that run asynchronous to the main driver thread, such as interrupt
 handlers and worker threads to process driver work queues.

 Emulex and QLogic FC drivers are designed with common code paths which
 require that pci_channel_offline(dev) reflect the true state of the
 hardware. The state transitions that the hardware takes from Normal
 Operations to Slot Frozen to Reset to Normal Operations are documented in
 the Power Architecture™ Platform Requirements+ (PAPR+) in Table 75. PE State
 Control.

 PAPR defines the following 3 states:

 0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed
 (Normal Operations)
 1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled
 2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled (Slot
 Frozen)

 An EEH error places the slot in state 2 (Frozen) and the adapter driver is
 notified that an EEH error was detected. If the adapter driver returns
 PCI_ERS_RESULT_NEED_RESET, the EEH driver calls eeh_reset_device() to place
 the slot into state 1 (Reset) and eeh_reset_device completes by placing the
 slot into State 0 (Normal Operations). Upon return from eeh_reset_device(),
 the EEH driver calls eeh_report_reset, which then calls the adapter's
 slot_reset callback. At the time the adapter's slot_reset callback is
 called, the true state of the hardware is Normal Operations and should be
 accurately reflected by setting dev->error_state to pci_channel_io_normal.

 The current implementation of EEH driver does not do so and requires the
 following patch to correct this deficiency.

 Signed-off-by: Mike Mason mm...@us.ibm.com

Yes, the analysis sounds correct; this looks like the
right thing to do.

I'm rather surprised, as this is an obvious bug,
and should have been long gone.  I thought that
Emulex, QLogic  were not the only ones that were
using pci_channel_offline(dev) in this fashion.
I thought that the symbios scsi and the e1000 did,
too ... Hmm. Perhaps these used their own, private
flag for the same purpose, and reset it at the earlier,
correct time.

Thanks for the fix!

Signed-off-by: Linas Vepstas linasveps...@gmail.com
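
The ordering problem described in the quoted analysis can be modelled in a few lines of userland C. Everything named `demo_*` is hypothetical: the enum only mirrors the three error_state values listed above, `demo_channel_offline()` imitates the wrapper's behaviour as described there, and `demo_hw_init()` stands in for a driver's common code path that is also reached from its slot_reset callback.

```c
#include <stdbool.h>

/* Mirrors the three states named in the quoted explanation. */
enum demo_channel_state {
	DEMO_IO_NORMAL = 1,
	DEMO_IO_FROZEN,
	DEMO_IO_PERM_FAILURE,
};

struct demo_dev {
	enum demo_channel_state error_state;
	int mmio_touched;	/* counts simulated hardware accesses */
};

/* True when MMIO/DMA must not be attempted. */
static bool demo_channel_offline(const struct demo_dev *dev)
{
	return dev->error_state != DEMO_IO_NORMAL;
}

/* Common driver path: refuses to touch hardware while offline. */
static bool demo_hw_init(struct demo_dev *dev)
{
	if (demo_channel_offline(dev))
		return false;
	dev->mmio_touched++;
	return true;
}

/* Fixed ordering: mark the channel normal, then run the reset callback. */
static bool demo_report_reset(struct demo_dev *dev)
{
	dev->error_state = DEMO_IO_NORMAL;
	return demo_hw_init(dev);	/* stands in for slot_reset */
}
```

With the old ordering, the callback would run while the state still read frozen, and the common path would bail out even though the slot had already been reset.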
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] Only disable/enable LSI interrupts in EEH

2009-02-10 Thread Linas Vepstas

2009/2/9 Mike Mason mm...@us.ibm.com:
 The EEH code disables and enables interrupts during the
 device recovery process.  This is unnecessary for MSI
 and MSI-X interrupts because they are effectively disabled
 by the DMA Stopped state when an EEH error occurs.  The current code is also
 incorrect for MSI-X interrupts.  It
 doesn't take into account that MSI-X interrupts are tracked
 in a different way than LSI/MSI interrupts.  This patch ensures only LSI
 interrupts are disabled/enabled.

 The patch also includes a couple minor formatting fixes.


 Signed-off-by: Mike Mason mm...@us.ibm.com

Looks good to me.
Acked-by: Linas Vepstas linasveps...@gmail.com

On a somewhat-related note: there was an issue (I forget
the details) where the kernel needed to shadow some sort
of MSI state so that it could be correctly, um, kept-track-of,
after an EEH reset (it didn't need to be restored, because
firmware did this(?)).  After some digging around and
discussion, we concluded that some generic PPC MSI
code needed to be altered to track this state, and/or
the main kernel MSI code needed to be changed to
(not?) track this state.  Mike Ellerman seemed to best
grasp this area ... was this ever fixed?

Or perhaps this is an alternate fix for that bug? It may
well have been that calling the MSI disable triggered
the problem, I don't remember now.

--linas

Re: [PATCH] Only disable/enable LSI interrupts in EEH

2009-02-10 Thread Linas Vepstas

2009/2/10 Michael Ellerman mich...@ellerman.id.au:
 On Tue, 2009-02-10 at 11:14 -0600, Linas Vepstas wrote:
 On a somewhat-related note: there was an issue (I forget
 the details) where the kernel needed to shadow some sort
 of MSI state so that it could be correctly, um, kept-track-of,
 after an EEH reset (it didn't need to be restored, because
 firmware did this(?)).  After some digging around and
 discussion, we concluded that some generic PPC MSI
 code needed to be altered to track this state, and/or
 the main kernel MSI code needed to be changed to
 (not?) track this state.  Mike Ellerman seemed to best
 grasp this area ... was this ever fixed?

 Or perhaps this is an alternate fix for that bug? It may
 well have been that calling the MSI disable triggered
 the problem, I don't remember now.

 I'm pretty sure you're referring to this patch, which you acked :)

 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1db3e890aed3ac39cded30d6e94618bda086f7ce

 I don't know of anything else that fits your description?

Yes, that's the one. I wasn't sure if it ever made it in or
not, and I just wanted to make sure it wasn't what was
biting you.

--linas

Re: [PATCH] Restore PERR/SERR bit settings during EEH device recovery

2008-07-08 Thread Linas Vepstas

2008/7/7 Mike Mason [EMAIL PROTECTED]:
 The following patch restores the PERR and SERR bits in the PCI
 command register during an EEH device recovery.
 We have found at least one case (an Agilent test card) where the
 PERR/SERR bits are set to 1 by firmware at boot time, but are
 not restored to 1 during EEH recovery.

Any chance they should be zero, and were accidentally set to 1?
In which case, you'd need an else clause, below.

 The patch fixes the
 Agilent card problem.  It has been tested on several other EEH-enabled cards
 with no regressions.

 Signed-off-by: Mike Mason [EMAIL PROTECTED]

 --- linux-2.6.26-rc9/arch/powerpc/platforms/pseries/eeh.c   2008-07-07
 16:06:57.0 -0700
 +++ linux-2.6.26-rc9-new/arch/powerpc/platforms/pseries/eeh.c   2008-07-07
 16:11:10.0 -0700
 @@ -812,6 +812,7 @@
 static inline void __restore_bars (struct pci_dn *pdn)
 {
int i;
 +   u32 cmd;

if (NULL==pdn->phb) return;
for (i=4; i<10; i++) {
 @@ -832,6 +833,15 @@

/* max latency, min grant, interrupt pin and line */
rtas_write_config(pdn, 15*4, 4, pdn->config_space[15]);
 +
 +   /* Restore PERR & SERR bits, some devices require it,
 +  don't touch the other command bits */
 +   rtas_read_config(pdn, PCI_COMMAND, 4, &cmd);
 +   if (pdn->config_space[1] & PCI_COMMAND_PARITY)
 +   cmd |= PCI_COMMAND_PARITY;

else cmd &= ~PCI_COMMAND_PARITY;

 +   if (pdn->config_space[1] & PCI_COMMAND_SERR)
 +   cmd |= PCI_COMMAND_SERR;

else cmd &= ~PCI_COMMAND_SERR;

 +   rtas_write_config(pdn, PCI_COMMAND, 4, cmd);
 }

Other than that, I'll add an

Acked-by: Linas Vepstas [EMAIL PROTECTED]

--linas
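
The if/else pairs suggested above amount to copying two bits from the saved command word into the live one while leaving every other bit alone. A compact way to express that, as a sketch only: `demo_restore_err_bits()` is a made-up helper, and the two mask values are assumed to match PCI_COMMAND_PARITY (0x40) and PCI_COMMAND_SERR (0x100) from the kernel's pci_regs.h.

```c
#include <stdint.h>

/* Assumed to match PCI_COMMAND_PARITY and PCI_COMMAND_SERR. */
#define DEMO_COMMAND_PARITY	0x40u
#define DEMO_COMMAND_SERR	0x100u

/*
 * Return the live command word with its PERR/SERR bits forced to the
 * saved values, whether those were 1 or 0; other bits are untouched.
 */
static uint32_t demo_restore_err_bits(uint32_t cmd, uint32_t saved)
{
	const uint32_t mask = DEMO_COMMAND_PARITY | DEMO_COMMAND_SERR;

	return (cmd & ~mask) | (saved & mask);
}
```

This handles both directions at once: bits that were set at boot are set again, and bits that were clear at boot are cleared if something set them in the meantime.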

Re: [PATCH] pseries: phyp dump: Variable size reserve space.

2008-04-16 Thread Linas Vepstas

On 07/04/2008, Manish Ahuja [EMAIL PROTECTED] wrote:
 A small proposed change in the amount of reserve space we allocate during boot.
  Currently we reserve 256MB only.
  The proposed change does one of the 3 things.

  A. It checks to see if there is cmdline variable set and if found sets the
value to it. OR
  B. It computes 5% of total ram and rounds it down to multiples of 256MB. AND
  C. Compares the rounded down value and returns larger of two values, the new
computed value or 256MB.

  Again this is for large systems who have excess memory.

[...]
   early_param("phyp_dump", early_phyp_dump_enabled);

I'm pretty sure you will want to document this boot param in the documentation,
as well as add a few words about why it might be interesting to users (i.e.
that it's for large systems...)

--linas
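
For readers following the A/B/C rule quoted above, here is a small sketch of the sizing arithmetic. `demo_reserve_size()` and its parameters are hypothetical names, and a `cmdline_value` of 0 is assumed to mean "no boot parameter given".

```c
#include <stdint.h>

#define DEMO_MB			(1024ULL * 1024ULL)
#define DEMO_MIN_RESERVE	(256ULL * DEMO_MB)

/*
 * A: a non-zero command-line value wins outright.
 * B: otherwise take 5% of RAM, rounded down to a multiple of 256MB.
 * C: never return less than 256MB.
 */
static uint64_t demo_reserve_size(uint64_t total_ram, uint64_t cmdline_value)
{
	uint64_t size;

	if (cmdline_value)
		return cmdline_value;

	size = total_ram / 20;			/* 5% of total RAM */
	size -= size % DEMO_MIN_RESERVE;	/* round down to 256MB */

	return size > DEMO_MIN_RESERVE ? size : DEMO_MIN_RESERVE;
}
```

On a 4GB machine 5% is about 205MB, which rounds down to zero, so the 256MB floor applies; on a 64GB machine 5% is about 3277MB, which rounds down to 3072MB.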

Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem

2008-03-12 Thread Linas Vepstas

On 11/03/2008, Paul Mackerras [EMAIL PROTECTED] wrote:

   --

 This line needs to be exactly 3 dashes, because otherwise the tools
  include the diffstat into the commit message.  Putting 4 or more
  dashes was an annoying habit Linas had, and it means I have to fix it
  manually (usually after I have committed the patches, and then notice
  that the commit message has the extra stuff in it, so I have to go
  back and fix the separators, reset my tree and re-commit the patches.)

Sorry, I had no idea!  If I didn't have enough dashes, then quilt would
sometimes wipe out the comment at the top, so paranoia made me
add lots of dashes.

--linas

Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept

2008-03-12 Thread Linas Vepstas

On 10/03/2008, Michael Ellerman [EMAIL PROTECTED] wrote:
 On Thu, 2008-02-28 at 18:24 -0600, Manish Ahuja wrote:

   +
   +/* Global, used to communicate data between early boot and late boot */
   +static struct phyp_dump phyp_dump_global;
   +struct phyp_dump *phyp_dump_info = &phyp_dump_global;

 I don't see the point of this. You have a static (ie. non-global) struct
  called phyp_dump_global, then you create a pointer to it and pass that
  around.

I did this. This is a style used to minimize disruption due to future
design changes. Basically, the idea is that, at some later time, for
some unknown reason, we decide that this structure shouldn't
be global, or maybe shouldn't be statically allocated, or maybe
should be per-cpu, or who knows.  By creating a pointer, and
just passing that around, you isolate other code from this change.

I learned this trick after spending too many months of my life hunting
down globals and replacing them by dynamically allocated structs.
It's a long and painful process, on many levels, often requiring major
code restructuring.  Code that touches globals directly is often
poorly thought out, designed.  But going in the opposite direction
is easy: if your code always passes everything it needs as args
to subroutines,  then you are free & clear ... if one of those args
just happens to be a pointer to a global, there's no loss (not even
a performance loss -- the arg passing overhead is about the same
as a global TOC lookup!)

So it may look weird if you're not used to seeing it; but the alternative
is almost always worse.

--linas
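
A minimal sketch of the style described above, with made-up names (`demo_dump`, `demo_dump_info`, `demo_mark_active()`): the worker function only ever sees a pointer, so it works unchanged whether the structure is a static global, a stack variable, or something allocated later.

```c
/* Hypothetical state shared between early boot and late boot. */
struct demo_dump {
	unsigned long reserve_size;
	int active;
};

/* The one static instance, reached only through the pointer below. */
static struct demo_dump demo_dump_global;
struct demo_dump *demo_dump_info = &demo_dump_global;

/* Takes its state as an argument; never names the global directly. */
static void demo_mark_active(struct demo_dump *pd, unsigned long size)
{
	pd->reserve_size = size;
	pd->active = 1;
}
```

If the storage later moves to a dynamic or per-cpu allocation, only the initialisation of `demo_dump_info` changes; callers such as `demo_mark_active(demo_dump_info, size)` stay as they are.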

Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem

2008-02-15 Thread Linas Vepstas

On 14/02/2008, Tony Breeds [EMAIL PROTECTED] wrote:
 On Tue, Feb 12, 2008 at 01:11:58AM -0600, Manish Ahuja wrote:

   +static ssize_t
   +show_release_region(struct kset * kset, char *buf)
   +{
   + return sprintf(buf, "ola\n");
   +}
   +
   +static struct subsys_attribute rr = __ATTR(release_region, 0600,
   +  show_release_region,
   +  store_release_region);


 Any reason this sysfs attribute can't be write only? The show method
  doesn't seem needed.

This was supposed to be a place-holder; a later patch would add detailed
info.  The goal was to  have user-land tools that would operate these files
to progressively dump and release memory regions; however, until these
userland tools get written, the proper interface remains murky  (e.g.
real addresses? virtual addresses? just delta's or a whole memory map?
some sort of numa flags or whatever?)

--linas

Re: [PATCH 1/8] pseries: phyp dump: Docmentation

2008-01-14 Thread Linas Vepstas

On 13/01/2008, Olof Johansson [EMAIL PROTECTED] wrote:

 How do you expect to have it in full production if you don't have all
 resources available for it? It's not until the dump has finished that you
 can return all memory to the production environment and use it.

With the PHYP dump, each chunk of RAM is returned for
general use immediately after being dumped; so it's not
an all-or-nothing proposition.  Production systems don't
often hit 100% RAM use right out of the gate, they often
take hours or days to get there, so again, there should
be time to dump.

 This can very easily be argued in both direction, with no clear winner:
 If the crash is stress-induced (say a slashdotted website), for those
 cases it seems more rational to take the time, collect _good data_ even
 if it takes a little longer, and then go back into production. Especially
 if the alternative is to go back into production immediately, collect
 about half of the data, and then crash again. Rinse and repeat.

Again, the mode of operation for the phyp dump  is that you'll
always have all of the data from the *first* crash, even if there
are multiple crashes. That's because the as-yet undumped
RAM is not put back into production.

 really surprises me that there's no way to reset a device through PHYP
 though. Seems like such a fundamental feature.

I don't know who said that; that's not right. The EEH function
certainly does allow you to halt/restart PCI traffic to a particular
device and also to reset the device.  So, yes, the pSeries
kexec code should call into the eeh subsystem to rationalize
the device state.

 I think people are overly optimistic if they think it'll be possible
 to do all of this reliably (as in with consistent performance) without
 a second reboot though.

The NUMA issues do concern me. But then, the whole virtualized,
fractional-cpu, tickless operation stuff sounds like a performance
tuning nightmare to begin with.

 At least without similar amounts of work being
 done as it would have taken to fix kdump's reliability in the first place.

:-)


 Speaking of reboots. PHYP isn't known for being quick at rebooting a
 partition, it used to take in the order of minutes even on a small
 machine. Has that been fixed?

Dunno.  Probably not.

  If not, the avoiding an extra reboot
 argument hardly seems like a benefit versus kdump+kexec, which reboots
 nearly instantly and without involvement from PHYP.

OK, let me tell you what I'm up against right now.
I'm dealing with sporadic corruption on my home box.

About a month ago, I bought a whizzy ASUS M2NE
motherboard & an AMD64 2-core cpu, and two sticks
of RAM, 1GB per stick. I have one new hard drive,
SATA, and one old hard drive, from my old machine,
the PATA.  The two disks are mirrored in a RAID-1
config. Running Ubuntu.

During install/upgrade a month ago, I noticed some of
the install files seemed to have gotten corrupted, but
that downloading them again got me a working version.
This put a serious frown on my face: maybe a bad ethernet
card or connection !?

Two weeks ago, gcc stopped working one morning, although
it worked fine the night before. I'd done nothing in the interim
but sleep. Reinstalling it made it work again. Yesterday,
something else stopped working.  I found the offending
library, I compared file checksums against a known-good
version, and they were off. (!!!) Disk corruption?

Then apt-get stopped working. The /var/lib/dpkg/status file
had randomly corrupted single bytes. It's ASCII, so I hand
repaired it; it had maybe 10 bad bytes out of 2MB total size.

I installed tripwire. Between the first run of tripwire, and the
second, less than an hour later, it reported several dozen
files have changed checksums. Manual inspection of some
of these files against known-good versions show that, at least
this morning, that's no longer the case.

System hasn't crashed in a month, since first boot.  So
what's going on? Is it possible that one of the two disks
is serving up bad data, which explains the funny checksum
behaviour? Or maybe its bad RAM, so that a fresh disk
read shows good data?  If its bad ram, why doesn't the
system crash?  I forced fsck last night, fsck came back
spotless.

So ... moral of the story: If phyp is doing some sort of
hardware checks and validation, that's great. I wish I could
afford a pSeries system for my home computer, because
my impression is that they are very stable, and don't do
things like data corruption.  I'm such a friggin cheapskate
that I can't bear to spend many thousands instead of many
hundreds of dollars. However, I will trade a longer boot
for the dream of higher reliability.

--linas

Re: [PATCH 1/8] pseries: phyp dump: Docmentation

2008-01-11 Thread Linas Vepstas

On 10/01/2008, Nathan Lynch [EMAIL PROTECTED] wrote:
 Mike Strosaker wrote:
 
  At the risk of repeating what others have already said, the PHYP-assistance
  method provides some advantages that the kexec method cannot:
   - Availability of the system for production use before the dump data is
  collected.  As was mentioned before, some production systems may choose not
  to operate with the limited memory initially available after the reboot,
  but it sure is nice to provide the option.

 I'm more concerned that this design encourages the user to resume a
 workload *which is almost certainly known to result in a system crash*
 before collection of crash data is complete.  Maybe the gamble will
 pay off most of the time, but I wouldn't want to be working support
 when it doesn't.

Workloads that cause crashes within hours of startup tend to be
weeded-out/discovered during pre-production test of the system
to be deployed. Since it's pre-production test, dumps can be
taken in a leisurely manner. Heck, even a session at the
xmon prompt can be contemplated.

The problem is when the crash only reproduces after days or
weeks of uptime, on a production machine.  Since the machine
is in production, it's got to be brought back up ASAP.  Since
it's crashing only after days/weeks, the dump should have
plenty of time to complete.  (And if it crashes quickly after
that reboot ... well, support people always welcome ways
in which a bug can be reproduced more quickly/easily).

--linas

Re: [PATCH 1/8] pseries: phyp dump: Docmentation

2008-01-10 Thread Linas Vepstas

On 10/01/2008, Olof Johansson [EMAIL PROTECTED] wrote:
 On Wed, Jan 09, 2008 at 10:12:13PM -0600, Linas Vepstas wrote:
  On 09/01/2008, Olof Johansson [EMAIL PROTECTED] wrote:
   On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote:
  
Heh. That's the elbow-grease of this thing.  The easy part is to get
the core function working. The hard part is to test these various configs,
and when they don't work, figure out what went wrong. That will take
perseverance and brains.
  
   This just sounds like a whole lot of extra work to get a feature that
   already exists.
 
  Well, no. kexec is horribly ill-behaved with respect to PCI. The
  kexec kernel starts running with PCI devices in some random
  state; maybe they're DMA'ing or who knows what. kexec tries
  real hard to whack a few needed pci devices into submission
  but it has been hit-n-miss, and the source of 90% of the kexec
  headaches and debugging effort. It's not pretty.

 It surprises me that this hasn't been possible to resolve with less than
 architecting a completely new interface, given that the platform has
 all this fancy support for isolating and resetting adapters. After all,
 the exact same thing has to be done by the hypervisor before rebooting
 the partition.

OK, point taken.

-- The phyp interfaces are there for AIX, which I guess must
   not have kexec-like ability. So this is a case of Linux leveraging
  a feature architected for AIX.

-- There's also this idea, somewhat weak, that the crash may
   have corrupted the ram where the  kexec kernel sits.
   For someone who is used to seeing crashes due to
   null pointer deref's, this seems fairly unlikely. But perhaps
   crashes in production systems are more mind-bending.
   (we did have a case where a USB stick used for boot
   continued to scribble on memory long after it was
   supposed to be quiet and unused. This resulted in
   a very hard to debug crash.)

   A solution to a corrupted
   kexec kernel would be to disable memory access to
   where kexec sits, e.g. un-mapping or making r/o the
   pages where it lies. This begs the questions of who
   unhides the kexec kernel, and what if this 'who' gets
   corrupted?

   In short, the kexec kernel does not boot
   exactly the same as a cold boot, and so this opens
   a can of worms about "well, what's different, how do
   we minimize these differences", etc. and I think that
   led AIX to punt, and say let's just use one single,
   well-known boot loader/ boot sequence instead of
   inventing a new one, thus leading to the phyp design.

   But that's just my guess.. :-)

--linas

Re: [PATCH 1/8] pseries: phyp dump: Docmentation

2008-01-09 Thread Linas Vepstas

On 09/01/2008, Nathan Lynch [EMAIL PROTECTED] wrote:
 Hi Linas,

 Linas Vepstas wrote:
 
  As a side effect, the system is in
  production *while* the dump is being taken;

 A dubious feature IMO.

Hmm.  Take it up with Ken Rozendal, this is supposed to be
one of the two main selling points of this thing.

 Seems that the design potentially trades
 reliability of first failure data capture for availability.
 E.g. system crashes, reboots, resumes processing while copying dump,
 crashes again before dump procedure is complete.  How is that handled,
 if at all?

It's handled by the hypervisor.  phyp maintains the copy of the
RMO of the first crash, until such time that the OS declares the
dump of the RMO to be complete. So you'll always have
the RMO of the first crash.

For the rest of RAM, it will come in two parts: some portion
will have been dumped already. The rest has not yet been dumped,
and it will still be there, preserved across the second crash.

So you get both RMO and all of RAM from the first crash.

  with kdump,
  you can't go into production until after the dump is finished,
  and the system has been rebooted a second time.  On
  systems with terabytes of RAM, the time difference can be
  hours.

 The difference in time it takes to resume the normal workload may be
 significant, yes.  But the time it takes to get a usable dump image
 would seem to be the basically the same.

Yes.

 Since you bring up large systems... a system with terabytes of RAM is
 practically guaranteed to be a NUMA configuration with dozens of cpus.
 When processing a dump on such a system, I wonder how well we fare:
 can we successfully boot with (say) 128 cpus and 256MB of usable
 memory?  Do we have to hot-online nodes as system memory is freed up
 (and does that even work)?  We need to be able to restore the system
 to its optimal topology when the dump is finished; if the best we can
 do is a degraded configuration, the workload will suffer and the
 system admin is likely to just reboot the machine again so the kernel
 will have the right NUMA topology.

Heh. That's the elbow-grease of this thing.  The easy part is to get
the core function working. The hard part is to test these various configs,
and when they don't work, figure out what went wrong. That will take
perseverance and brains.

--linas

Re: [PATCH 1/8] pseries: phyp dump: Docmentation

2008-01-09 Thread Linas Vepstas

On 09/01/2008, Michael Ellerman [EMAIL PROTECTED] wrote:

   Only if you can get at rtas, but you can't get at rtas at that point.

 AFAICT you don't need to get at RTAS, you just need to look at the
 device tree to see if the property is present, and that is trivial.

 You probably just need to add a check in early_init_dt_scan_rtas() which
 sets a flag for the PHYP dump stuff, or add your own scan routine if you
 need.

I no longer remember the details. I do remember spending a lot of time
trying to figure out how to do this. I know I didn't want to write my own scan
routine; maybe that's what stopped me.  As it happens, we also did most
of the development on a broken phyp which simply did not even have
this property, no matter what, and so that may have brain-damaged me.

I went for the most elegant solution, where most elegant is defined
as fewest lines of code, least effort, etc.

Manish may need some hands-on help to extract this token during
early boot.  Hopefully, he'll let us know.

--linas

Re: [PATCH 1/8] pseries: phyp dump: Docmentation

2008-01-09 Thread Linas Vepstas

On 09/01/2008, Olof Johansson [EMAIL PROTECTED] wrote:
 On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote:

  Heh. That's the elbow-grease of this thing.  The easy part is to get
  the core function working. The hard part is to test these various configs,
  and when they don't work, figure out what went wrong. That will take
  perseverance and brains.

 This just sounds like a whole lot of extra work to get a feature that
 already exists.

Well, no. kexec is horribly ill-behaved with respect to PCI. The
kexec kernel starts running with PCI devices in some random
state; maybe they're DMA'ing or who knows what. kexec tries
real hard to whack a few needed pci devices into submission
but it has been hit-n-miss, and the source of 90% of the kexec
headaches and debugging effort. It's not pretty.

If all pci-host bridges could shut-down or settle the bus, and
raise the #RST line high, and then if all BIOS'es supported
this, you'd be right. But they can't 

--linas

Re: [PATCH 4/8] pseries: phyp dump: use sysfs to release reserved mem

2008-01-08 Thread Linas Vepstas

On 07/01/2008, Stephen Rothwell [EMAIL PROTECTED] wrote:
 On Mon, 07 Jan 2008 18:21:57 -0600 Manish Ahuja [EMAIL PROTECTED] wrote:
 
  +static int __init phyp_dump_setup(void)
  +{
 
  + /* Is there dump data waiting for us? */
  + rtas = of_find_node_by_path("/rtas");
  + dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
^^^
 You could pass NULL here as header_len appears to be unused. Also you
 need of_node_put(rtas) somewhere (probably just here would do).

Perhaps the routine should have been of_get_node_by_path() ?

In ye olden days, finds didn't require put, but gets did. I'm guessing
that this has now all been fixed up for the of_xxx routines, but I think
that pci_find_xxx still does not require a pci_put.
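
The find/put pairing discussed above can be illustrated with a toy userspace model. This is only a sketch of the reference-counting convention, not the kernel's of_* implementation; the names toy_node, toy_node_get, toy_node_put and toy_find_node_by_path are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device-tree node with a reference count. */
struct toy_node {
	const char *path;
	int refcount;
};

/* Take a reference; returns the node so calls can be chained. */
static struct toy_node *toy_node_get(struct toy_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Drop a reference taken by toy_node_get() or toy_find_node_by_path(). */
static void toy_node_put(struct toy_node *n)
{
	if (!n)
		return;
	assert(n->refcount > 0);
	n->refcount--;
}

/*
 * "find" that behaves like a "get": the returned node carries a
 * reference, so the caller owes a matching toy_node_put().
 */
static struct toy_node *toy_find_node_by_path(struct toy_node *nodes,
					      size_t count, const char *path)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (strcmp(nodes[i].path, path) == 0)
			return toy_node_get(&nodes[i]);
	return NULL;
}
```

In this model, a caller that looks up "/rtas", reads a property and returns without calling toy_node_put() leaves the count at 1 forever, which is the leak Stephen is pointing at.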

Why did I bother to write this email? I dunno...
--linas

Re: [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept

2008-01-07 Thread Linas Vepstas

On 07/01/2008, Arnd Bergmann [EMAIL PROTECTED] wrote:
 On Tuesday 08 January 2008, Manish Ahuja wrote:

  Initial patch for reserving memory in early boot, and freeing it later.
  If the previous boot had ended with a crash, the reserved memory would 
  contain
  a copy of the crashed kernel data.
 
  Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
  Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 I think the signed-off-by chain needs to be modified. The way it appears,
 you handled the patch first, then sent it to Linas, who forwarded it
 to whoever will take the patches from the list.

Well,
-- there was dual authorship. I remangled the patches while Manish wrote
code & tested. And I'd mailed them out the first time around, so you could
say I forwarded after heavy editing.


 This obviously isn't true, since you are actually the one who is sending
 out the patches. Moreover, I believe that the [EMAIL PROTECTED]
 address is now dead, and shouldn't be used for this any more.

Hmm. I wanted to indicate that the work was done while I was at IBM;
clearly, no one is going through git and changing old, expired email
addrs, and so submission based on the old addr seemed appropriate.

I'm taking the Signed-off-by line as a quasi-legal thing: a fancy ID string,
identifying the author(s),  rather than a new way to manage email
address books.

 So, depending on which of you two wrote the majority of a patch, I think
 it should be either

I'm not sure there was a clear majority. I think Manish did more work
in general, but we hacked this together side by side. I got him to create
working tested code; I busted it up into individual, clean, documented,
mailing-list ready chunks.

--linas

Re: [PATCH] powerpc: fix os-term usage on kernel panic

2007-12-03 Thread Linas Vepstas

On Thu, Nov 29, 2007 at 11:41:47AM +0100, Olaf Hering wrote:
 On Wed, Nov 28, Linas Vepstas wrote:
 
  On Wed, Nov 28, 2007 at 12:00:37PM +0100, Olaf Hering wrote:
   On Tue, Nov 27, Will Schmidt wrote:
 
 - if (panic_timeout)
 - return;
   
   This change is wrong. Booting with panic=123 really means the system
   has to reboot in 123 seconds after a panic.
  
  And it does.
 
 Have you ever tried it? Current state is that the JS20 hangs after
 panic, 

It should print out "Rebooting in <timeout_wait> seconds ...".
Then it should wait <timeout_wait> seconds, as usual,
and *then* call the hypervisor.

 simply because it calls into the hypervisor (or whatever).

The hypervisor is not supposed to return at this point. It's supposed
to reboot. Apparently, it's not rebooting. Either we are using it
wrong, or the hypervisor is buggy on some systems. It did work on
the machines I was on; but I did not try power5's or blades.

 So, please restore the panic_timeout check.

The problem with this check was that the value was never
ever set, and so the branch was never ever taken.

--linas

Re: [PATCH] powerpc: fix os-term usage on kernel panic

2007-11-28 Thread Linas Vepstas

On Tue, Nov 27, 2007 at 06:15:59PM -0600, Will Schmidt wrote:
 (resending with the proper from addr this time). 
 
 
 I'm seeing some funky behavior on power5/power6 partitions with this
 patch.  A /sbin/reboot is now behaving much more like a
 /sbin/halt.
 
 Anybody else seeing this, or is it time for me to call an exorcist for
 my boxes? 

I believe the patch
http://www.nabble.com/-PATCH--powerpc-pseries:-tell-phyp-to-auto-restart-t4847604.html

will cure this problem.

From that patch:

+/**
+ * pSeries_auto_restart - tell hypervisor that boot succeeded.
+ *
+ * The pseries hypervisor attempts to detect and prevent an
+ * infinite loop of kernel crashes and auto-reboots. It does
+ * so by refusing to auto-reboot unless we indicate that the
+ * current boot was successful.  So, indicate success late in
+ * the boot sequence.
+ */ 

FYI, I am leaving IBM in just a few days now, and won't really
have much of a chance to debug this, if there are other problems.

This pair of patches was required to make hypervisor-assisted 
dump work, viz, we need to tell the hypervisor about when we 
crashed, or didn't crash, so that if we crashed, the dump can 
be taken appropriately.

It occurs to me, as I write this, that maybe the xmon 'zr'
command should be modified to call pSeries_auto_restart just
in case, so that it actually reboots.  There might be another
funky code path that I can't think of right now.

--linas


Re: [PATCH] powerpc: fix os-term usage on kernel panic

2007-11-28 Thread Linas Vepstas

On Wed, Nov 28, 2007 at 12:00:37PM +0100, Olaf Hering wrote:
 On Tue, Nov 27, Will Schmidt wrote:
 
   -void rtas_os_term(char *str)
   +void rtas_panic_msg(char *str)
 
   - if (panic_timeout)
   - return;
 
 This change is wrong. Booting with panic=123 really means the system
 has to reboot in 123 seconds after a panic.

And it does.

 But, maybe this panic_timeout check was moved elsewhere.

It was *always* somewhere else; the check here was always wrong.

This change makes the os-term call happen after the panic
timeout amount of time has elapsed.

--linas


[PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump

2007-11-21 Thread Linas Vepstas


The following series of patches implement a basic framework
for hypervisor-assisted dump. The very first patch provides 
documentation explaining what this is :-). Yes, it's supposed
to be an improvement over kdump.

The patches mostly sort-of work; a list of open issues
is included in the documentation.  It also appears that
the not-yet-released firmware versions this was tested 
on are still, ahem, incomplete; this work is also pending.

-- Linas & Manish

[PATCH/RFC 1/6]: phyp dump: Documentation

2007-11-21 Thread Linas Vepstas


Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 Documentation/powerpc/phyp-assisted-dump.txt |  126 +++
 1 file changed, 126 insertions(+)

Index: linux-2.6.24-rc3-git1/Documentation/powerpc/phyp-assisted-dump.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.24-rc3-git1/Documentation/powerpc/phyp-assisted-dump.txt  
2007-11-21 16:26:44.0 -0600
@@ -0,0 +1,126 @@
+
+   Hypervisor-Assisted Dump
+   
+   November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal, production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will read /proc/kcore to obtain the
+   contents of memory, which holds the previous crashed
+   kernel. The userspace tools may copy this info to
+   disk, or network, nas, san, iscsi, etc. as desired.
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+ echo 0x40000000 0x10000000 > /sys/kernel/release_region
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+--
+In order for this scheme to work, memory needs to be reserved
+quite early in the boot cycle. However, access to the device
+tree this early in the boot cycle is difficult, and device-tree
+access is needed to determine if there is crash data waiting.
+To work around this problem, all but 256MB of RAM is reserved
+during early boot. A short while later in boot, a check is made
+to determine if there is dump data waiting. If there isn't,
+then the reserved memory is released to general kernel use.
+If there is dump data, then the /sys/kernel/release_region
+file is created, and the reserved memory is held.
+
+If there is no waiting dump data, then all but 256MB of the
+reserved ram will be released for general kernel use. The
+highest 256 MB of RAM will *not* be released: this region
+will be kept permanently reserved, so that it can act as
+a receptacle for a copy of the low 256MB in the case a crash
+does occur. See, however, open issues below, as to whether
+such a reserved region is really needed.
+
+General notes:
+--
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues:
+
+ o User-space dump tool integration is completely unresolved.
+
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o The real-virtual mapping is awkward and unaddressed. There
+   is currently no clear way of matching up the contents of
+   /proc/kcore to the values that need to be sent to
+   /sys/kernel/release_region
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead? There is a dump_subsys being created by the s390 code,
+   perhaps the pseries code should use a similar layout as well.
+
+ o

[PATCH/RFC 2/6]: phyp dump: config file

2007-11-21 Thread Linas Vepstas


Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

-
 arch/powerpc/Kconfig |   11 +++
 1 file changed, 11 insertions(+)

Index: linux-2.6.24-rc2-git4/arch/powerpc/Kconfig
===
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/Kconfig 2007-11-14 
16:39:20.0 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/Kconfig  2007-11-15 14:27:33.0 
-0600
@@ -261,6 +261,17 @@ config CRASH_DUMP
 
  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+   bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+   depends on PPC_PSERIES && EXPERIMENTAL
+   default y
+   help
+ Hypervisor-assisted dump is meant to be a kdump replacement
+ offering robustness and speed not possible without system
+ hypervisor assistance.
+
+ If unsure, say Y
+
 config PPCBUG_NVRAM
bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
default y if PPC_PREP

[PATCH/RFC 4/6]: phyp dump: use sysfs to release reserved mem

2007-11-21 Thread Linas Vepstas


Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space 
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB.  The released memory
becomes free for general use.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

--
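
The parse-and-clamp step described in the commit message can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the patch's code: struct region, parse_release_request and bytes_to_pages are hypothetical names, the reserved bounds are passed in rather than read from phyp_dump_info, and a 4KB page size is assumed (pseries kernels may be built with 64KB pages).

```c
#include <assert.h>
#include <stdio.h>

#define ASSUMED_PAGE_SHIFT 12UL	/* assumption: 4KB pages */

/* Hypothetical helper type: a byte range in physical memory. */
struct region {
	unsigned long start;	/* first byte */
	unsigned long length;	/* size in bytes */
};

/*
 * Parse "start length" (both hex) and clamp the request so that it
 * lies inside the reserved region. Returns 0 on success, -1 if the
 * input does not contain two hex numbers.
 */
static int parse_release_request(const char *buf,
				 const struct region *reserved,
				 struct region *out)
{
	unsigned long start, length, end, reserved_end;

	assert(buf && reserved && out);

	if (sscanf(buf, "%lx %lx", &start, &length) != 2)
		return -1;

	reserved_end = reserved->start + reserved->length;

	/* Never release memory below or above the reserved region. */
	if (start < reserved->start)
		start = reserved->start;
	if (start > reserved_end)
		start = reserved_end;

	end = start + length;
	if (end < start || end > reserved_end)	/* wrapped or too long */
		end = reserved_end;

	out->start = start;
	out->length = end - start;
	return 0;
}

/* Convert a byte count to whole pages, rounding down. */
static unsigned long bytes_to_pages(unsigned long bytes)
{
	return bytes >> ASSUMED_PAGE_SHIFT;
}
```

With a reserved region covering 256MB up to 2GB, the string "0x40000000 0x10000000" yields a 256MB range starting at 1GB, which is the example given in the commit message.
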
 arch/powerpc/platforms/pseries/phyp_dump.c |  101 +++--
 1 file changed, 96 insertions(+), 5 deletions(-)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c   
2007-11-21 13:15:05.0 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
2007-11-21 13:24:30.0 -0600
@@ -12,17 +12,24 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+
+/* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -52,18 +59,102 @@ release_memory_range(unsigned long start
}
 }
 
-static int __init phyp_dump_setup(void)
+/* - */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t
+store_release_region(struct kset *kset, const char *buf, size_t count)
 {
+   unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
+   ssize_t ret;
 
-   /* If no memory was reserved in early boot, there is nothing to do */
-   if (phyp_dump_info->init_reserve_size == 0)
-   return 0;
+   ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+   if (ret != 2)
+   return -EINVAL;
+
+   /* Range-check - don't free any reserved memory that
+* wasn't reserved for phyp-dump */
+   if (start_addr < phyp_dump_info->init_reserve_start)
+   start_addr = phyp_dump_info->init_reserve_start;
+
+   end_addr = phyp_dump_info->init_reserve_start +
+   phyp_dump_info->init_reserve_size;
+   if (start_addr+length > end_addr)
+   length = end_addr - start_addr;
+
+   /* Release the region of memory passed in by user */
+   start_pfn = PFN_DOWN(start_addr);
+   nr_pages = PFN_DOWN(length);
+   release_memory_range (start_pfn, nr_pages);
+
+   return count;
+}
+
+static ssize_t
+show_release_region(struct kset * kset, char *buf)
+{
+   return sprintf(buf, "ola\n");
+}
+
+static struct subsys_attribute rr = __ATTR(release_region, 0600,
+show_release_region,
+store_release_region);
+
+/* - */
+
+static void release_all (void)
+{
+   unsigned long start_pfn, nr_pages;
 
-   /* Release memory that was reserved in early boot */
+   /* Release all memory that was reserved in early boot */
start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
release_memory_range(start_pfn, nr_pages);
+}
+
+static int __init phyp_dump_setup(void)
+{
+   struct device_node *rtas;
+   const int *dump_header;
+   int header_len = 0;
+   int rc;
+
+   /* If no memory was reserved in early boot, there is nothing to do */
+   if (phyp_dump_info->init_reserve_size == 0)
+   return 0;
+
+   /* Return if phyp dump not supported */
+   ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump");
+   if (ibm_configure_kernel_dump == RTAS_UNKNOWN_SERVICE) {
+   release_all();
+   return -ENOSYS;
+   }
+
+   /* Is there dump data waiting for us? */
+   rtas = of_find_node_by_path("/rtas");
+   dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+   if (dump_header == NULL) {
+   release_all();
+   return 0;
+   }
+
+   /* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+   rc = subsys_create_file(&kernel_subsys, &rr);
+   if (rc

[PATCH/RFC 5/6]: phyp dump: register the dump area

2007-11-21 Thread Linas Vepstas


Set up the actual dump header, register it with the hypervisor.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

--
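
The way the patch lays the three dump sections out back to back, and later rebases them onto the real save-area address, can be sketched in standalone C. This is an illustration under assumptions: the structures are simplified copies of the ones in the diff using stdint types, and struct dump_header, layout_sections and rebase_sections are hypothetical names. Nothing here talks to RTAS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field widths copied from the patch; the type names are stand-ins. */
struct dump_section {
	uint32_t dump_flags;
	uint16_t source_type;
	uint16_t error_flags;
	uint64_t source_address;
	uint64_t source_length;
	uint64_t length_copied;
	uint64_t destination_address;
};

struct dump_header {
	uint32_t version;
	uint16_t num_of_sections;
	uint16_t status;
	uint32_t first_offset_section;
	uint32_t dump_disk_section;
	uint64_t block_num_dd;
	uint64_t num_of_blocks_dd;
	uint32_t offset_dd;
	uint32_t maxtime_to_auto;
	struct dump_section cpu_data;
	struct dump_section hpte_data;
	struct dump_section kernel_data;
};

/*
 * Place the sections consecutively, starting at relative offset 0.
 * Returns the total size of the save area that has to be provided.
 */
static uint64_t layout_sections(struct dump_header *h, uint64_t cpu_size,
				uint64_t hpte_size, uint64_t kernel_size)
{
	uint64_t offset = 0;

	assert(h);

	/* offsetof() is the portable spelling of the null-pointer cast */
	h->first_offset_section =
		(uint32_t)offsetof(struct dump_header, cpu_data);

	h->cpu_data.source_length = cpu_size;
	h->cpu_data.destination_address = offset;
	offset += cpu_size;

	h->hpte_data.source_length = hpte_size;
	h->hpte_data.destination_address = offset;
	offset += hpte_size;

	h->kernel_data.source_length = kernel_size;
	h->kernel_data.destination_address = offset;
	offset += kernel_size;

	return offset;
}

/* Turn the relative destinations into absolute addresses. */
static void rebase_sections(struct dump_header *h, uint64_t base)
{
	assert(h);
	h->cpu_data.destination_address += base;
	h->hpte_data.destination_address += base;
	h->kernel_data.destination_address += base;
}
```

Splitting the work this way mirrors the patch: the total length is known before the save area has been chosen, and the base address is only added at registration time.
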
 arch/powerpc/platforms/pseries/phyp_dump.c |  169 +++--
 1 file changed, 163 insertions(+), 6 deletions(-)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c   
2007-11-21 15:55:37.0 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
2007-11-21 16:06:52.0 -0600
@@ -30,6 +30,134 @@ struct phyp_dump *phyp_dump_info = &phyp
 static int ibm_configure_kernel_dump;
 
 /* - */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+   u32 dump_flags;
+   u16 source_type;
+   u16 error_flags;
+   u64 source_address;
+   u64 source_length;
+   u64 length_copied;
+   u64 destination_address;
+};
+
+struct phyp_dump_header {
+   u32 version;
+   u16 num_of_sections;
+   u16 status;
+
+   u32 first_offset_section;
+   u32 dump_disk_section;
+   u64 block_num_dd;
+   u64 num_of_blocks_dd;
+   u32 offset_dd;
+   u32 maxtime_to_auto;
+   /* No dump disk path string used */
+
+   struct dump_section cpu_data;
+   struct dump_section hpte_data;
+   struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO  0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+   struct device_node *rtas;
+   const unsigned int *sizes;
+   int len;
+   unsigned long cpu_state_size = 0;
+   unsigned long hpte_region_size = 0;
+   unsigned long addr_offset = 0;
+
+   /* Get the required dump region sizes */
+   rtas = of_find_node_by_path("/rtas");
+   sizes = of_get_property(rtas, "ibm,configure-kernel-dump-sizes", &len);
+   if (!sizes || len < 20)
+   return 0;
+
+   if (sizes[0] == 1)
+   cpu_state_size = *((unsigned long *) &sizes[1]);
+
+   if (sizes[3] == 2)
+   hpte_region_size = *((unsigned long *) &sizes[4]);
+
+   /* Set up the dump header */
+   ph->version = DUMP_HEADER_VERSION;
+   ph->num_of_sections = NUM_DUMP_SECTIONS;
+   ph->status = 0;
+
+   ph->first_offset_section =
+   (u32) &(((struct phyp_dump_header *) 0)->cpu_data);
+   ph->dump_disk_section = 0;
+   ph->block_num_dd = 0;
+   ph->num_of_blocks_dd = 0;
+   ph->offset_dd = 0;
+
+   ph->maxtime_to_auto = 0; /* disabled */
+
+   /* The first two sections are mandatory */
+   ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+   ph->cpu_data.source_address = 0;
+   ph->cpu_data.source_length = cpu_state_size;
+   ph->cpu_data.destination_address = addr_offset;
+   addr_offset += cpu_state_size;
+
+   ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+   ph->hpte_data.source_address = 0;
+   ph->hpte_data.source_length = hpte_region_size;
+   ph->hpte_data.destination_address = addr_offset;
+   addr_offset += hpte_region_size;
+
+   /* This section describes the low kernel region */
+   ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+   ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+   ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+   ph->kernel_data.destination_address = addr_offset;
+   addr_offset += ph->kernel_data.source_length;
+
+   return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+   int rc;
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+
+   do {
+   rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+  1, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc)
+   {
+   printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+   }
+}
+
+/* - */
 /**
  * release_memory_range -- release memory previously

[PATCH] powerpc/pseries: tell phyp to auto-restart

2007-11-20 Thread Linas Vepstas


The pseries hypervisor attempts to detect and prevent an 
infinite loop of kernel crashes and auto-reboots. It does 
so by refusing to auto-reboot unless we indicate that the
current boot was successful.  So, indicate success late in
the boot sequence.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


Sigh. This is a side-effect of the patch I sent yesterday.
It's supposed to simplify the management of large numbers of
partitions. 
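
For readers unfamiliar with the buffer the patch passes to ibm,set-system-parameter, here is a small userspace sketch of how those three bytes are laid out. It assumes, based only on the three assignments in the patch, that the buffer starts with a two-byte length (high byte first) followed by the value; encode_sysparm is a hypothetical helper, and no RTAS call is made.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parameter number used by the patch for partition_auto_restart. */
#define PARTITION_AUTO_RESTART_PARAM 21

/*
 * Write "length (2 bytes, high byte first), then payload" into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t encode_sysparm(uint8_t *buf, size_t buflen,
			     const uint8_t *payload, uint16_t payload_len)
{
	assert(buf && (payload || payload_len == 0));

	if (buflen < (size_t)payload_len + 2)
		return 0;

	buf[0] = (uint8_t)(payload_len >> 8);
	buf[1] = (uint8_t)(payload_len & 0xff);
	if (payload_len)
		memcpy(buf + 2, payload, payload_len);

	return (size_t)payload_len + 2;
}
```

Encoding a one-byte payload of 1 produces the bytes 0, 1, 1, matching buff[0], buff[1] and buff[2] in the patch.
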

 arch/powerpc/platforms/pseries/setup.c |   31 +++
 1 file changed, 31 insertions(+)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/setup.c
===
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/setup.c   
2007-11-20 18:37:14.0 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/setup.c
2007-11-20 19:08:12.0 -0600
@@ -491,6 +491,37 @@ void pSeries_power_off(void)
for (;;);
 }
 
+/**
+ * pSeries_auto_restart - tell hypervisor that boot succeeded.
+ *
+ * The pseries hypervisor attempts to detect and prevent an
+ * infinite loop of kernel crashes and auto-reboots. It does
+ * so by refusing to auto-reboot unless we indicate that the
+ * current boot was successful.  So, indicate success late in
+ * the boot sequence.
+ */
+static int __init pSeries_auto_restart(void)
+{
+   static char buff[3]; /* static so that it's in RMO region */
+   int rc;
+   int token = rtas_token("ibm,set-system-parameter");
+   if (!token)
+   return 0;
+
+   /* partition_auto_restart is 21; set to 1 to auto-restart the OS. */
+   buff[0] = 0;
+   buff[1] = 1; /* length */
+   buff[2] = 1; /* value */
+   do {
+   rc = rtas_call (token, 2, 1, NULL, 21, buff);
+   } while (rtas_busy_delay(rc));
+   if (rc)
+   printk(KERN_INFO "pSeries_auto_restart(): "
+  "unable to setup autorestart, rc=%d\n", rc);
+   return 0;
+}
+late_initcall(pSeries_auto_restart);
+
 #ifndef CONFIG_PCI
 void pSeries_final_fixup(void) { }
 #endif

Re: hangs after Freeing unused kernel memory

2007-11-19 Thread Linas Vepstas

On Thu, Nov 15, 2007 at 04:00:09PM -0800, Siva Prasad wrote:
 Hi,
 
 This sounds like a familiar problem, but could not get answers in posts
 that came up in google search.
 
 My system hangs after printing the message "Freeing unused kernel
 memory". It should execute init after that, but not sure what exactly is
 happening. Appreciate if some one can throw few ideas to try out.

It might not be a hang, it might be simply that you lose the console.
If this is a redhat system, and you didn't tweak initrd and udev just 
right, this can happen.

Try doing this:

  mount --bind / /mnt
  cp -a /dev/null /mnt/dev
  cp -a /dev/console /mnt/dev
  cp -a /dev/hv* /mnt/dev
  umount /mnt

 Seems it is actually hanging when it makes the call 
 run_init_process(ramdisk_execute_command) in init/main.c

Then again, your initrd might be corrupted.

--linas

[PATCH] powerpc: fix os-term usage on kernel panic

2007-11-19 Thread Linas Vepstas


The rtas_os_term() routine was being called at the wrong time.
The actual rtas call os-term will not ever return, and so
calling it from the panic notifier is too early.  Instead,
call it from the machine_reset() call.

The patch splits the rtas_os_term() routine into two: one 
part to capture the kernel panic message, invoked during
the panic notifier, and another part that is invoked during 
machine_reset().

Prior to this patch, the os-term call was never being made,
because panic_timeout was always non-zero. Calling os-term 
helps keep the hypervisor happy! We have to keep the hypervisor
happy to avoid service, dump and error reporting problems.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


One could make a strong argument to move all of this code 
from kernel/rtas.c to platforms/pseries/setup.c.  I did not
do this, just so as to make the changes minimal.
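
The ordering this patch is after (record the message in the panic notifier, wait out the panic timeout, and only then make the call that does not return) can be modelled in a few lines of userspace C. This is a toy simulation, not kernel code: panic_msg, os_term and simulated_panic are hypothetical stand-ins, the delay is recorded rather than slept, and the "firmware call" merely increments a counter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char os_term_buf[2048];	/* message handed to the firmware */
static int os_term_calls;	/* counts simulated firmware calls */

/* Panic-notifier half: only record the message, then return. */
static void panic_msg(const char *str)
{
	assert(str);
	snprintf(os_term_buf, sizeof(os_term_buf), "OS panic: %s", str);
}

/* Shutdown half: stand-in for the call that never returns on hardware. */
static void os_term(void)
{
	os_term_calls++;
}

/*
 * Simulated panic path: notifier first, then the timeout, and the
 * terminating call last. seconds_waited records the delay that a
 * real system would sleep for.
 */
static void simulated_panic(const char *why, int panic_timeout,
			    int *seconds_waited)
{
	assert(seconds_waited);
	panic_msg(why);
	*seconds_waited = panic_timeout;
	os_term();
}
```

Calling os_term() from inside panic_msg() instead would reproduce the original problem: on real hardware the timeout would never get a chance to run.
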

 arch/powerpc/kernel/rtas.c |   12 ++--
 arch/powerpc/platforms/pseries/setup.c |3 ++-
 include/asm-powerpc/rtas.h |3 ++-
 3 files changed, 10 insertions(+), 8 deletions(-)

Index: linux-2.6.24-rc2-git4/arch/powerpc/kernel/rtas.c
===
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/kernel/rtas.c   2007-11-19 
18:58:53.0 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/kernel/rtas.c2007-11-19 
19:01:10.0 -0600
@@ -631,18 +631,18 @@ void rtas_halt(void)
 /* Must be in the RMO region, so we place it here */
 static char rtas_os_term_buf[2048];
 
-void rtas_os_term(char *str)
+void rtas_panic_msg(char *str)
 {
-   int status;
+   snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
+}
 
-   if (panic_timeout)
-   return;
+void rtas_os_term(void)
+{
+   int status;
 
if (RTAS_UNKNOWN_SERVICE == rtas_token("ibm,os-term"))
return;
 
-   snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
-
do {
status = rtas_call(rtas_token("ibm,os-term"), 1, 1, NULL,
   __pa(rtas_os_term_buf));
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/setup.c
===
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/platforms/pseries/setup.c   
2007-11-19 18:58:53.0 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/setup.c
2007-11-19 19:01:10.0 -0600
@@ -507,7 +507,8 @@ define_machine(pseries) {
.restart= rtas_restart,
.power_off  = pSeries_power_off,
.halt   = rtas_halt,
-   .panic  = rtas_os_term,
+   .panic  = rtas_panic_msg,
+   .machine_shutdown   = rtas_os_term,
.get_boot_time  = rtas_get_boot_time,
.get_rtc_time   = rtas_get_rtc_time,
.set_rtc_time   = rtas_set_rtc_time,
Index: linux-2.6.24-rc2-git4/include/asm-powerpc/rtas.h
===
--- linux-2.6.24-rc2-git4.orig/include/asm-powerpc/rtas.h   2007-11-19 
18:58:53.0 -0600
+++ linux-2.6.24-rc2-git4/include/asm-powerpc/rtas.h2007-11-19 
19:01:10.0 -0600
@@ -164,7 +164,8 @@ extern int rtas_call(int token, int, int
 extern void rtas_restart(char *cmd);
 extern void rtas_power_off(void);
 extern void rtas_halt(void);
-extern void rtas_os_term(char *str);
+extern void rtas_panic_msg(char *str);
+extern void rtas_os_term(void);
 extern int rtas_get_sensor(int sensor, int index, int *state);
 extern int rtas_get_power_level(int powerdomain, int *level);
 extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);

[PATCH 1/3] powerpc: EEH: work with device endpoint, always

2007-11-15 Thread Linas Vepstas


Perform all error checking at the partitionable endpoint
of the device.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 arch/powerpc/platforms/pseries/eeh.c |1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/eeh.c  
2007-11-09 16:54:04.0 -0600
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh.c   2007-11-09 
16:56:39.0 -0600
@@ -482,6 +482,7 @@ int eeh_dn_check_failure(struct device_n
no_dn++;
return 0;
}
+   dn = find_device_pe (dn);
pdn = PCI_DN(dn);
 
/* Access to IO BARs might get this far and still not want checking. */

[PATCH 3/3]: powerpc/eeh: report errors as soon as possible.

2007-11-15 Thread Linas Vepstas


Do not wait for the pci slot status before reporting an error
to the device driver. Some systems may take many seconds to
report the slot status, and this can confuse unsuspecting 
device drivers.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 arch/powerpc/platforms/pseries/eeh_driver.c |   15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh_driver.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/eeh_driver.c   
2007-11-09 17:28:58.0 -0600
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh_driver.c
2007-11-09 17:36:51.0 -0600
@@ -354,13 +354,6 @@ struct pci_dn * handle_eeh_events (struc
if (frozen_pdn->eeh_freeze_count > EEH_MAX_ALLOWED_FREEZES)
goto excess_failures;
 
-   /* Get the current PCI slot state. */
-   rc = eeh_wait_for_slot_status (frozen_pdn, MAX_WAIT_FOR_RECOVERY*1000);
-   if (rc < 0) {
-   printk(KERN_WARNING "EEH: Permanent failure\n");
-   goto hard_fail;
-   }
-
printk(KERN_WARNING
   "EEH: This PCI device has failed %d times in the last hour:\n",
frozen_pdn->eeh_freeze_count);
@@ -376,6 +369,14 @@ struct pci_dn * handle_eeh_events (struc
 */
pci_walk_bus(frozen_bus, eeh_report_error, &result);
 
+   /* Get the current PCI slot state. This can take a long time,
+* sometimes over 3 seconds for certain systems. */
+   rc = eeh_wait_for_slot_status (frozen_pdn, MAX_WAIT_FOR_RECOVERY*1000);
+   if (rc < 0) {
+   printk(KERN_WARNING "EEH: Permanent failure\n");
+   goto hard_fail;
+   }
+
/* Since rtas may enable MMIO when posting the error log,
 * don't post the error log until after all dev drivers
 * have been informed.

Re: [PATCH] [POWERPC] pSeries: make pseries_defconfig minus PCI build again

2007-11-14 Thread Linas Vepstas

On Wed, Nov 14, 2007 at 03:07:39PM +1100, Stephen Rothwell wrote:
>
> Signed-off-by: Stephen Rothwell [EMAIL PROTECTED]
Acked-by: Linas Vepstas [EMAIL PROTECTED]

> ---
>  arch/powerpc/platforms/pseries/Kconfig |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> --
> Cheers,
> Stephen Rothwell                    [EMAIL PROTECTED]
>
> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
> index 16e4e40..306a9d0 100644
> --- a/arch/powerpc/platforms/pseries/Kconfig
> +++ b/arch/powerpc/platforms/pseries/Kconfig
> @@ -21,7 +21,7 @@ config PPC_SPLPAR
>
>  config EEH
>  	bool "PCI Extended Error Handling (EEH)" if EMBEDDED
> -	depends on PPC_PSERIES
> +	depends on PPC_PSERIES && PCI
>  	default y if !EMBEDDED
>
>  config SCANLOG
> --
> 1.5.3.5

[PATCH v2] pci hotplug: fix rpaphp directory naming

2007-11-14 Thread Linas Vepstas

Fix presentation of the slot number in the /sys/bus/pci/slots
directory to match that used in the majority of other drivers.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

--
On Tue, Nov 13, 2007 at 07:26:07PM -0800, Greg KH wrote:
> We need a signed-off-by: to be able to apply this...

Whoops. See above. Same patch as last time, no changes.

On Tue, Nov 13, 2007 at 02:58:30PM -0700, Matthew Wilcox wrote:
> On Tue, Nov 13, 2007 at 03:41:21PM -0600, Linas Vepstas wrote:
> > /sys/bus/pci/slots
> > /sys/bus/pci/slots/control
> > /sys/bus/pci/slots/control/remove_slot
> > /sys/bus/pci/slots/control/add_slot
> > /sys/bus/pci/slots/0001:00:02.0
> > /sys/bus/pci/slots/0001:00:02.0/phy_location
>
> Ugh.  Almost two years ago, paulus promised me he was going to fix the
> slot name for rpaphp.  Guess he didn't.

You have to ask the right person. :-) I've been de facto maintaining
the rpaphp code for umpteen years now. On the other hand, I am also
much, much better at promising than delivering.

> This is one of the hateful things about the current design -- hotplug
> drivers do too much.  Instead of being just the interface between the
> Linux PCI code and the hardware, they create sysfs directories, add
> files, and generally have far too much freedom.

I chopped out several hundred LOC from rpaphp a year ago,
and hopefully that might make further simplification easier
someday.

> We have four different schemes currently for naming in slots/,
> 1. slot number.  Used by cpqphp, ibmphp, acpiphp, pciehp, shpc.
> 2. domain:bus:dev:fn.  Used by fakephp.
> 3a. domain:bus:dev.  Used by rpaphp and sgihp.
> 3b. Except that rpaphp uses phy_location to present the information
> that should be in the name and sgihp uses path.
>
> ... I've forgotten what cpci uses.  And yenta doesn't use it.
>
> How is anyone supposed to write sane manageability tools in the
> presence of such anarchy?
>
> > ~ # cat /sys/bus/pci/slots/:00:02.2/phy_location
> > U787A.001.DNZ00Z5-P1-C2
>
> Right.  This should look like:
>
> # cat /sys/bus/pci/slots/U787A.001.DNZ00Z5-P1-C2/address
> :00:02

This patch implements exactly what you describe. Boot tested.
I assume you really mean it -- if so, then please review and
ack the patch !?

I have absolutely no clue if this breaks any existing IBM tools.
I'm pretty sure it doesn't ... but attention Mike Strosaker! does it?


 drivers/pci/hotplug/rpaphp.h  |1 
 drivers/pci/hotplug/rpaphp_pci.c  |   14 ---
 drivers/pci/hotplug/rpaphp_slot.c |   47 +++---
 3 files changed, 24 insertions(+), 38 deletions(-)

Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_pci.c  2007-07-08 
18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c   2007-11-13 
17:52:10.0 -0600
@@ -64,19 +64,6 @@ int rpaphp_get_sensor_state(struct slot 
 	return rc;
 }
 
-static void set_slot_name(struct slot *slot)
-{
-	struct pci_bus *bus = slot->bus;
-	struct pci_dev *bridge;
-
-	bridge = bus->self;
-	if (bridge)
-		strcpy(slot->name, pci_name(bridge));
-	else
-		sprintf(slot->name, "%04x:%02x:00.0", pci_domain_nr(bus),
-			bus->number);
-}
-
 /**
  * rpaphp_enable_slot - record slot state, config pci device
  *
@@ -114,7 +101,6 @@ int rpaphp_enable_slot(struct slot *slot
 	info->adapter_status = EMPTY;
 	slot->bus = bus;
 	slot->pci_devs = &bus->devices;
-	set_slot_name(slot);
 
 	/* if there's an adapter in the slot, go add the pci devices */
 	if (state == PRESENT) {
Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_slot.c 2007-07-08 
18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c  2007-11-13 
18:05:13.0 -0600
@@ -33,23 +33,31 @@
 #include <asm/rtas.h>
 #include "rpaphp.h"
 
-static ssize_t location_read_file (struct hotplug_slot *php_slot, char *buf)
+static ssize_t address_read_file (struct hotplug_slot *php_slot, char *buf)
 {
-	char *value;
-	int retval = -ENOENT;
+	int retval;
 	struct slot *slot = (struct slot *)php_slot->private;
+	struct pci_bus *bus;
 
 	if (!slot)
-		return retval;
+		return -ENOENT;
 
-	value = slot->location;
-	retval = sprintf (buf, "%s\n", value);
+	bus = slot->bus;
+	if (!bus)
+		return -ENOENT;
+
+	if (bus->self)
+		retval = sprintf(buf, pci_name(bus->self));
+	else
+		retval = sprintf(buf, "%04x:%02x:00.0",
+		                 pci_domain_nr(bus), bus->number);
+
 	return retval;
 }
 
-static struct hotplug_slot_attribute php_attr_location = {
-	.attr = {.name = "phy_location", .mode

[PATCH] pci hotplug: fix rpaphp directory naming

2007-11-13 Thread Linas Vepstas



Fix presentation of the slot number in the /sys/bus/pci/slots
directory to match that used in the majority of other drivers.

--

On Tue, Nov 13, 2007 at 02:58:30PM -0700, Matthew Wilcox wrote:
> On Tue, Nov 13, 2007 at 03:41:21PM -0600, Linas Vepstas wrote:
> > /sys/bus/pci/slots
> > /sys/bus/pci/slots/control
> > /sys/bus/pci/slots/control/remove_slot
> > /sys/bus/pci/slots/control/add_slot
> > /sys/bus/pci/slots/0001:00:02.0
> > /sys/bus/pci/slots/0001:00:02.0/phy_location
>
> Ugh.  Almost two years ago, paulus promised me he was going to fix the
> slot name for rpaphp.  Guess he didn't.

You have to ask the right person. :-) I've been de facto maintaining
the rpaphp code for umpteen years now. On the other hand, I am also
much, much better at promising than delivering.

> This is one of the hateful things about the current design -- hotplug
> drivers do too much.  Instead of being just the interface between the
> Linux PCI code and the hardware, they create sysfs directories, add
> files, and generally have far too much freedom.

I chopped out several hundred LOC from rpaphp a year ago,
and hopefully that might make further simplification easier
someday.

> We have four different schemes currently for naming in slots/,
> 1. slot number.  Used by cpqphp, ibmphp, acpiphp, pciehp, shpc.
> 2. domain:bus:dev:fn.  Used by fakephp.
> 3a. domain:bus:dev.  Used by rpaphp and sgihp.
> 3b. Except that rpaphp uses phy_location to present the information
> that should be in the name and sgihp uses path.
>
> ... I've forgotten what cpci uses.  And yenta doesn't use it.
>
> How is anyone supposed to write sane manageability tools in the
> presence of such anarchy?
>
> > ~ # cat /sys/bus/pci/slots/:00:02.2/phy_location
> > U787A.001.DNZ00Z5-P1-C2
>
> Right.  This should look like:
>
> # cat /sys/bus/pci/slots/U787A.001.DNZ00Z5-P1-C2/address
> :00:02

This patch implements exactly what you describe. Boot tested.
I assume you really mean it -- if so, then please review and
ack the patch !?

I have absolutely no clue if this breaks any existing IBM tools.
I'm pretty sure it doesn't ... but attention Mike Strosaker! does it?


 drivers/pci/hotplug/rpaphp.h  |1 
 drivers/pci/hotplug/rpaphp_pci.c  |   14 ---
 drivers/pci/hotplug/rpaphp_slot.c |   47 +++---
 3 files changed, 24 insertions(+), 38 deletions(-)

Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_pci.c  2007-07-08 
18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c   2007-11-13 
17:52:10.0 -0600
@@ -64,19 +64,6 @@ int rpaphp_get_sensor_state(struct slot 
 	return rc;
 }
 
-static void set_slot_name(struct slot *slot)
-{
-	struct pci_bus *bus = slot->bus;
-	struct pci_dev *bridge;
-
-	bridge = bus->self;
-	if (bridge)
-		strcpy(slot->name, pci_name(bridge));
-	else
-		sprintf(slot->name, "%04x:%02x:00.0", pci_domain_nr(bus),
-			bus->number);
-}
-
 /**
  * rpaphp_enable_slot - record slot state, config pci device
  *
@@ -114,7 +101,6 @@ int rpaphp_enable_slot(struct slot *slot
 	info->adapter_status = EMPTY;
 	slot->bus = bus;
 	slot->pci_devs = &bus->devices;
-	set_slot_name(slot);
 
 	/* if there's an adapter in the slot, go add the pci devices */
 	if (state == PRESENT) {
Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_slot.c 2007-07-08 
18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c  2007-11-13 
18:05:13.0 -0600
@@ -33,23 +33,31 @@
 #include <asm/rtas.h>
 #include "rpaphp.h"
 
-static ssize_t location_read_file (struct hotplug_slot *php_slot, char *buf)
+static ssize_t address_read_file (struct hotplug_slot *php_slot, char *buf)
 {
-	char *value;
-	int retval = -ENOENT;
+	int retval;
 	struct slot *slot = (struct slot *)php_slot->private;
+	struct pci_bus *bus;
 
 	if (!slot)
-		return retval;
+		return -ENOENT;
 
-	value = slot->location;
-	retval = sprintf (buf, "%s\n", value);
+	bus = slot->bus;
+	if (!bus)
+		return -ENOENT;
+
+	if (bus->self)
+		retval = sprintf(buf, pci_name(bus->self));
+	else
+		retval = sprintf(buf, "%04x:%02x:00.0",
+		                 pci_domain_nr(bus), bus->number);
+
 	return retval;
 }
 
-static struct hotplug_slot_attribute php_attr_location = {
-	.attr = {.name = "phy_location", .mode = S_IFREG | S_IRUGO},
-	.show = location_read_file,
+static struct hotplug_slot_attribute php_attr_address = {
+	.attr = {.name = "address", .mode = S_IFREG | S_IRUGO},
+	.show = address_read_file

[PATCH] pci hotplug: rm bogus item in rpaphp struct

2007-11-13 Thread Linas Vepstas


Remove unused struct element.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 drivers/pci/hotplug/rpaphp.h |1 -
 drivers/pci/hotplug/rpaphp_pci.c |1 -
 2 files changed, 2 deletions(-)

Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp.h
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp.h  2007-11-13 
18:37:31.0 -0600
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp.h   2007-11-13 
19:00:42.0 -0600
@@ -76,7 +76,6 @@ struct slot {
char *name;
struct device_node *dn;
struct pci_bus *bus;
-   struct list_head *pci_devs;
struct hotplug_slot *hotplug_slot;
 };
 
Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_pci.c  2007-11-13 
18:37:31.0 -0600
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c   2007-11-13 
19:00:13.0 -0600
@@ -100,7 +100,6 @@ int rpaphp_enable_slot(struct slot *slot
 
 	info->adapter_status = EMPTY;
 	slot->bus = bus;
-	slot->pci_devs = &bus->devices;
 
/* if there's an adapter in the slot, go add the pci devices */
if (state == PRESENT) {

[PATCH] powerpc/eeh: make sure warning message is printed.

2007-11-07 Thread Linas Vepstas


Fix old buglet; a warning message should have been printed 
when a hardware reset takes too long. 

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 arch/powerpc/platforms/pseries/eeh.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/eeh.c  
2007-11-05 16:22:44.0 -0600
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh.c   2007-11-05 
16:24:17.0 -0600
@@ -325,7 +325,7 @@ eeh_wait_for_slot_status(struct pci_dn *
 
if (rets[2] == 0) return -1; /* permanently unavailable */
 
-	if (max_wait_msecs <= 0) return -1;
+	if (max_wait_msecs <= 0) break;
 
 	mwait = rets[2];
 	if (mwait <= 0) {
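
Why swapping `return -1` for `break` makes the warning appear can be shown with a small user-space sketch. It assumes the warning is printed just after the polling loop, which is what the commit message implies. Every name here (`wait_for_status`, `poll_slot_busy`, `poll_slot_ready`, the busy code 5, the fixed step) is a hypothetical stand-in, not the kernel's eeh_wait_for_slot_status().

```c
#include <stdio.h>

static int warnings_printed;

/* Hypothetical pollers: 5 stands for "busy, try again later",
 * 0 for "status is available". */
static int poll_slot_busy(void)  { return 5; }
static int poll_slot_ready(void) { return 0; }

/* Poll until a real status arrives or the time budget runs out. */
static int wait_for_status(int (*poll)(void), int max_wait_msecs,
			   int step_msecs)
{
	while (1) {
		int state = poll();

		if (state != 5)
			return state;	/* real status, hand it back */

		/* Returning -1 here would skip the warning below;
		 * break falls through to it instead. */
		if (max_wait_msecs <= 0)
			break;

		/* A real loop would sleep for step_msecs here. */
		max_wait_msecs -= step_msecs;
	}

	fprintf(stderr, "timed out waiting for slot status\n");
	warnings_printed++;
	return -1;
}
```

The caller sees -1 in both variants; only the `break` form leaves a trace in the log.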

Re: [PATCH 11/16] Use of_get_next_child() in eeh_restore_bars()

2007-11-05 Thread Linas Vepstas

On Mon, Oct 29, 2007 at 02:46:13PM +1100, Michael Ellerman wrote:
>
> On Fri, 2007-10-26 at 17:29 +1000, Stephen Rothwell wrote:
> > On Fri, 26 Oct 2007 16:54:43 +1000 (EST) Michael Ellerman [EMAIL PROTECTED] wrote:
> >
> > > - dn = pdn->node->child;
> > > - while (dn) {
> > > + for (dn = NULL; (dn = of_get_next_child(pdn->node, dn));)
> >
> > Just wondering if we need
> >
> > #define for_each_child_node(dn, parent) \
> > 	for (dn = of_get_next_child(parent, NULL); dn; \
> > 		dn = of_get_next_child(parent, dn))

Yes, I like this much better too, if for no other reason than
the for-loop structure is more orthodox.

> Should we perhaps make it for_each_child_device_node() ?

foreach_of_device_node_child() or

of_foreach_device_node_child()
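
A minimal user-space sketch of the iterator macro under discussion. The `struct node` type and `get_next_child()` are hypothetical mocks; the real of_get_next_child() also manages reference counts, which the mock leaves out.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct device_node. */
struct node {
	struct node *child;	/* first child */
	struct node *sibling;	/* next sibling */
};

/* Mock of of_get_next_child(): the child after prev, or the first
 * child when prev is NULL. No reference counting in this sketch. */
static struct node *get_next_child(const struct node *parent,
				   struct node *prev)
{
	return prev ? prev->sibling : parent->child;
}

/* The proposed macro, written against the mock helper. */
#define for_each_child_node(dn, parent) \
	for (dn = get_next_child(parent, NULL); dn; \
	     dn = get_next_child(parent, dn))

/* Example caller: count the direct children of a node. */
static int count_children(const struct node *parent)
{
	struct node *dn;
	int n = 0;

	for_each_child_node(dn, parent)
		n++;
	return n;
}
```

The caller reads like an ordinary for loop, which is the "more orthodox" structure preferred in the reply above.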


[PATCH 0/3] powerpc eeh: bug fixes for crashes, bad handling

2007-11-02 Thread Linas Vepstas

Hi Paul,

Please forward upstream the following three tiny patches
for EEH bugs, including one crash, and one failure to
reset correctly.

(I was planning on blasting you many many more patches,
involving MSI, but have had nothing but broken hardware
for the last few weeks, and so have nothing to show. 
Dang, cause I needed the msi fixes for 2.6.24. Oh well.)

--linas


[PATCH 1/3] powerpc eeh: cleanup comments

2007-11-02 Thread Linas Vepstas


Clean up commentary, remove dead code.

Signed-off-by Linas Vepstas [EMAIL PROTECTED]


 arch/powerpc/platforms/pseries/eeh_driver.c |8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh_driver.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/eeh_driver.c   
2007-10-16 11:39:18.0 -0500
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh_driver.c
2007-10-16 11:46:30.0 -0500
@@ -113,9 +113,9 @@ static void eeh_report_error(struct pci_
 /**
  * eeh_report_mmio_enabled - tell drivers that MMIO has been enabled
  *
- * Report an EEH error to each device driver, collect up and
- * merge the device driver responses. Cumulative response
- * passed back in userdata.
+ * Tells each device driver that IO ports, MMIO and config space I/O
+ * are now enabled. Collects up and merges the device driver responses.
+ * Cumulative response passed back in userdata.
  */
 
 static void eeh_report_mmio_enabled(struct pci_dev *dev, void *userdata)
@@ -123,8 +123,6 @@ static void eeh_report_mmio_enabled(stru
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver = dev->driver;
 
-	// dev->error_state = pci_channel_mmio_enabled;
-
 	if (!driver ||
 	    !driver->err_handler ||
 	    !driver->err_handler->mmio_enabled)

[PATCH 2/3]: powerpc eeh: drivers that need reset trump others

2007-11-02 Thread Linas Vepstas


Bugfix: if a driver controlling one part of a multi-function
pci card has asked for a reset, honor that request above all
others.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 arch/powerpc/platforms/pseries/eeh_driver.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh_driver.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/eeh_driver.c   
2007-10-16 11:46:30.0 -0500
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh_driver.c
2007-10-16 11:54:27.0 -0500
@@ -105,9 +105,10 @@ static void eeh_report_error(struct pci_
 		return;
 
 	rc = driver->err_handler->error_detected (dev, pci_channel_io_frozen);
+
+	/* A driver that needs a reset trumps all others */
+	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
-	if (*res == PCI_ERS_RESULT_DISCONNECT &&
-	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
 }
 
 /**
@@ -129,9 +130,10 @@ static void eeh_report_mmio_enabled(stru
 		return;
 
 	rc = driver->err_handler->mmio_enabled (dev);
+
+	/* A driver that needs a reset trumps all others */
+	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
-	if (*res == PCI_ERS_RESULT_DISCONNECT &&
-	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
 }
 
 /**
 
 /**
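
The merge rule from this patch can be tried in isolation. Below is a minimal sketch; the enum and function names are hypothetical stand-ins for the kernel's pci_ers_result values, and only the two merging statements mirror the patch.

```c
/* Hypothetical stand-ins for the kernel's enum pci_ers_result. */
enum ers_result {
	ERS_NONE,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED
};

/* Fold one driver's answer into the cumulative result. */
static void merge_result(enum ers_result *res, enum ers_result rc)
{
	/* A driver that needs a reset trumps all others. */
	if (rc == ERS_NEED_RESET)
		*res = rc;
	/* Otherwise the first answer received is kept. */
	if (*res == ERS_NONE)
		*res = rc;
}

/* Merge the answers of every function on a multi-function card. */
static enum ers_result merge_all(const enum ers_result *rcs, int n)
{
	enum ers_result res = ERS_NONE;
	int i;

	for (i = 0; i < n; i++)
		merge_result(&res, rcs[i]);
	return res;
}
```

With the old test, a reset request only replaced an earlier DISCONNECT, so a card whose first function answered "can recover" would never be reset even if the second function asked for it.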

[PATCH 3/3]: powerpc eeh: avoid crash on null device.

2007-11-02 Thread Linas Vepstas


Bugfix: avoid crash if there's no pci device for a given
openfirmware node.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 arch/powerpc/platforms/pseries/eeh.c |   11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/eeh.c  
2007-10-16 13:55:03.0 -0500
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh.c   2007-10-16 
14:04:39.0 -0500
@@ -186,6 +186,11 @@ static size_t gather_pci_data(struct pci
 	n += scnprintf(buf+n, len-n, "cmd/stat:%x\n", cfg);
 	printk(KERN_WARNING "EEH: PCI cmd/status register: %08x\n", cfg);
 
+	if (!dev) {
+		printk(KERN_WARNING "EEH: no PCI device for this of node\n");
+		return n;
+	}
+
 	/* Gather bridge-specific registers */
 	if (dev->class >> 16 == PCI_BASE_CLASS_BRIDGE) {
 		rtas_read_config(pdn, PCI_SEC_STATUS, 2, &cfg);
@@ -198,7 +203,7 @@ static size_t gather_pci_data(struct pci
 	}
 
 	/* Dump out the PCI-X command and status regs */
-	cap = pci_find_capability(pdn->pcidev, PCI_CAP_ID_PCIX);
+	cap = pci_find_capability(dev, PCI_CAP_ID_PCIX);
 	if (cap) {
 		rtas_read_config(pdn, cap, 4, &cfg);
 		n += scnprintf(buf+n, len-n, "pcix-cmd:%x\n", cfg);
@@ -210,7 +215,7 @@ static size_t gather_pci_data(struct pci
 	}
 
 	/* If PCI-E capable, dump PCI-E cap 10, and the AER */
-	cap = pci_find_capability(pdn->pcidev, PCI_CAP_ID_EXP);
+	cap = pci_find_capability(dev, PCI_CAP_ID_EXP);
 	if (cap) {
 		n += scnprintf(buf+n, len-n, "pci-e cap10:\n");
 		printk(KERN_WARNING
@@ -222,7 +227,7 @@ static size_t gather_pci_data(struct pci
 			printk(KERN_WARNING "EEH: PCI-E %02x: %08x\n", i, cfg);
 		}
 
-		cap = pci_find_ext_capability(pdn->pcidev, PCI_EXT_CAP_ID_ERR);
+		cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
 		if (cap) {
 			n += scnprintf(buf+n, len-n, "pci-e AER:\n");
 			printk(KERN_WARNING

Re: [BUG] powerpc does not save msi state [was Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function

2007-10-22 Thread Linas Vepstas

On Fri, Oct 19, 2007 at 05:53:08PM -0700, David Miller wrote:
> From: [EMAIL PROTECTED] (Linas Vepstas)
> Date: Fri, 19 Oct 2007 19:46:10 -0500
>
> > FWIW, it looks like not all that many arches do this; the output
> > for "grep -r address_hi *" is pretty thin. Then, looking at
> > i386/kernel/io_apic.c as an example, one can see that the
> > msi state save happens by accident if CONFIG_SMP is enabled;
> > and so it's surely broken on uniprocessor machines.
>
> I don't see this, in all cases write_msi_msg() will transfer
> the given *msg to entry->msg by this assignment in
> drivers/pci/msi.c:
>
> void write_msi_msg(unsigned int irq, struct msi_msg *msg)
> {
> 	...
> 	entry->msg = *msg;
> }
>
> So as long as write_msi_msg() is invoked, it will be saved
> properly.

As Michael Ellerman points out, the pseries msi setup is done
by firmware, and so this bit never happens.

As discussed in the other thread, I'll try to set up a patch
for an arch callback for restoring msi state.

-linas
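
The disagreement above comes down to who writes the cached copy. A minimal user-space sketch, assuming a cached message per entry that restore reads back. All types and functions are hypothetical mocks, loosely named after the calls in the thread; the `hw` field stands in for the device's config space.

```c
#include <stdint.h>

/* Hypothetical mock of an MSI message. */
struct msi_msg_sketch {
	uint32_t address_lo, address_hi, data;
};

/* Hypothetical mock of a per-interrupt entry. */
struct msi_entry_sketch {
	struct msi_msg_sketch msg;	/* cached copy used by restore */
	struct msi_msg_sketch hw;	/* stands in for config space */
};

/* OS path: program the device and remember what was written. */
static void write_msg(struct msi_entry_sketch *e,
		      const struct msi_msg_sketch *m)
{
	e->hw = *m;
	e->msg = *m;	/* the save that restore depends on */
}

/* Firmware path: the device is programmed behind the OS's back,
 * so the cached copy is never filled in. */
static void firmware_setup(struct msi_entry_sketch *e,
			   const struct msi_msg_sketch *m)
{
	e->hw = *m;
}

/* A device reset wipes the programmed message. */
static void device_reset(struct msi_entry_sketch *e)
{
	e->hw = (struct msi_msg_sketch){ 0, 0, 0 };
}

/* Restore can only replay what was cached earlier. */
static void restore_msg(struct msi_entry_sketch *e)
{
	e->hw = e->msg;
}
```

After `write_msg()`, a reset followed by `restore_msg()` brings the message back. After `firmware_setup()`, the same sequence restores zeros, which is the pseries situation described above.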

Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function

2007-10-22 Thread Linas Vepstas

On Tue, Oct 23, 2007 at 07:24:27AM +1000, Benjamin Herrenschmidt wrote:
>
> On Mon, 2007-10-22 at 13:13 -0500, Linas Vepstas wrote:
> > On Mon, Oct 22, 2007 at 11:49:24AM +1000, Michael Ellerman wrote:
> > >
> > > On pseries there's a chance it will work for PCI error recovery, but if
> > > so it's just lucky that firmware has left everything configured the same
> > > way.
> >
> > ? The papr is quite clear that it is up to the OS to restore the msi
> > state after an eeh error.
>
> Via direct config space access or via firmware change-msi calls ?

Direct config space access. It says that the OS is supposed to read the
MSI config (after it's been set up), save it, and restore it (via direct
config space writes) if the device is ever reset.

> I don't know why you keep talking about powerpc laptops here ...

Well, there are Apple laptops, right?  Aren't those the powermac 
platform?  Now, I don't know if they support MSI, but if they do,
I get the impression that they might not restore msi state correctly,
after being put into hardware suspend.  But perhaps I'm mistaken;
I was simply grepping for various msi-related functions in various
arch subdirectories, comparing x86 to other arches, and noticed 
that code that would restore msi state seems to be missing for
most arches and most powerpc platforms.

--linas


[BUG] powerpc does not save msi state [was Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function

2007-10-19 Thread Linas Vepstas

Hi,

On Fri, Oct 19, 2007 at 05:27:06PM -0700, David Miller wrote:
> From: [EMAIL PROTECTED] (Linas Vepstas)
> Date: Fri, 19 Oct 2007 19:04:21 -0500
>
> > I'm working in linux-2.6.23-rc8-mm1 at the moment, and I don't see
> > that happening. viz. read_msi_msg() is not called anywhere, and I need
> > to have valid msg->address_lo and msg->address_hi and msg->data
> > in order to be able to restore.
>
> See the pci_restore_msi_state() call done from pci_restore_state()
> in drivers/pci/pci.c, that pci_restore_msi_state() code in
> drivers/pci/msi.c very much relies upon the entry->msg values
> being up to date and valid.
>
> The MSI arch layer code is supposed to fill the entry->msg values in
> via arch_setup_msi_irq().  Perhaps the pseries code is forgetting to
> do that.

Yep.  Thank you for confirming the correct location for the fix.

FWIW, it looks like not all that many arches do this; the output
for "grep -r address_hi *" is pretty thin. Then, looking at
i386/kernel/io_apic.c as an example, one can see that the
msi state save happens by accident if CONFIG_SMP is enabled;
and so it's surely broken on uniprocessor machines.

I'm cc'ing the powerpc mailing list to point this out:
it looks like only cell/axon_msi.c and mpic_u3msi.c
bother to do anything.  I guess that there aren't any old
macintosh laptops that have msi on them? Because without
this, suspend and resume breaks.

Paul,
On the off chance you're reading this, I'll send a pseries
patch on Monday, with luck (and some other patches too).
I'm not touching any of the other platforms, you and benh
would know those better.

--linas

Re: [PATCH 2/2] Use of_get_pci_dev_node() in axon_msi.c

2007-10-18 Thread Linas Vepstas

On Thu, Oct 18, 2007 at 11:27:23AM +1000, Michael Ellerman wrote:
>
> It does what pci_device_to_OF_node() does, but in the right way.
>
> The plan is to remove pci_device_to_OF_node() once all the callers have
> been converted to properly handle the refcounting.

Oh. Yes. well, of course, then. Excellent reason. I didn't get 
that from the patch commit comments. So, FWIW:

Ack'ed-by: Linas Vepstas [EMAIL PROTECTED]

--linas
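
What "properly handle the refcounting" means for a caller can be sketched in user space. Every name below is a hypothetical mock; only the shape (a plain lookup wrapped by a getter that takes a reference) follows the helper quoted in this thread.

```c
#include <stddef.h>

/* Hypothetical mock of a reference-counted device-tree node. */
struct dnode {
	int refcount;
};

/* Hypothetical mock of a PCI device that points at its node. */
struct pdev {
	struct dnode *node;
};

/* Take a reference; the caller must drop it with node_put(). */
static struct dnode *node_get(struct dnode *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Drop a reference taken with node_get(). */
static void node_put(struct dnode *n)
{
	if (n)
		n->refcount--;
}

/* Plain lookup: hands back the pointer without taking a reference. */
static struct dnode *dev_to_node(struct pdev *p)
{
	return p->node;
}

/* Getter in the style of the proposed helper: lookup plus reference. */
static struct dnode *get_dev_node(struct pdev *p)
{
	return node_get(dev_to_node(p));
}
```

A caller of `get_dev_node()` owns one reference and pairs it with `node_put()` when done; a caller of the plain lookup owns nothing, which is the hazard the conversion is meant to remove.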


Re: Merge dtc

2007-10-17 Thread Linas Vepstas

On Wed, Oct 17, 2007 at 02:59:04PM -0500, Timur Tabi wrote:
> Kumar Gala wrote:
>
> > Just out of interest who's complaining?  We don't include mkimage for
> > u-boot related builds and I haven't seen any gripes related to that.
>
> I think we should include mkimage *and* dtc.  But then, I'm not sure how much
> weight my opinion has. :-)

Isn't anyone concerned about the de facto fork-of-source-code that
this causes? Which will be the official version? How will the code
bases be kept in sync?

--linas

Re: [PATCH 2/2] Use of_get_pci_dev_node() in axon_msi.c

2007-10-17 Thread Linas Vepstas

On Wed, Oct 17, 2007 at 05:12:27PM +1000, Michael Ellerman wrote:

> +struct device_node *of_get_pci_dev_node(struct pci_dev *pdev)
> +{
> +	return of_node_get(pci_device_to_OF_node(pdev));
> +}

[...]

> -	dn = of_node_get(pci_device_to_OF_node(dev));
> +	dn = of_get_pci_dev_node(dev);

Is this really useful or wise?

As a matter of personal taste, I find stuff like this clutters
and confuses my mind. I go to read new code, and I run across some
routine I haven't heard of before ... e.g. of_get_pci_dev_node(),
so now I have to look it up to see what it does.  A few minutes later,
I realize that it's just a pair of old friends (of_node_get and
pci_device_to_OF_node) and so now I have to make mental room for it.

Tomorrow, or 3 days later, I'm again looking at of_get_pci_dev_node()
and I'm thinking "gee, what did that thing do again??"

I don't much like this style, and I've been known to submit
patches that remove stuff like this ...

--linas

Re: [PATCH v4 or so] Use 1TB segments

2007-10-11 Thread Linas Vepstas

On Thu, Oct 11, 2007 at 08:37:10PM +1000, Paul Mackerras wrote:
> This makes the kernel use 1TB segments for all kernel mappings and for
> user addresses of 1TB and above, on machines which support them
> (currently POWER5+, POWER6 and PA6T).

Gack. A system dump might take a while on these machines ... 

--linas

Re: [patch v2] PS3: Add os-area database routines

2007-10-09 Thread Linas Vepstas

On Mon, Oct 08, 2007 at 06:07:24PM -0700, Geoff Levand wrote:
> Subject: PS3: Add os-area database routines
>
> Add support for a simple tagged database in the PS3 flash rom
> os-area.  The database allows the flash rom os-area to be shared
> between a bootloader and installed operating systems.   The
> application ps3-flash-util or the library libps3-utils from the
> ps3-utils package can be used for userspace database operations.

Perhaps I missed the discussion; but .. out of general curiosity,
what is the relation between this and the ppc_md.nvram_* system?
I note that pseries, powermac, chrp, celleb implement the nvram calls,
but cell and ps3 do not. So clearly, whatever this is, it's not
layered on top of nvram?

FWIW, I don't quite understand the nvram system; it seems to have
partitions; one part is an os area, and a chunk of it is set
up as a file system.

So I'm wondering -- wouldn't the DB os-area be generically
interesting to other powerpc platforms? Maybe even other arches?
And why isn't this built on top of the nvram structure? ... etc?

--linas


Re: Hard hang in hypervisor!?

2007-10-09 Thread Linas Vepstas

On Tue, Oct 09, 2007 at 04:18:19PM -0500, Nathan Lynch wrote:
> Linas Vepstas wrote:
> >
> > I was futzing with linux-2.6.23-rc8-mm1 in a power6 lpar when,
> > for whatever reason, a spinlock locked up. The bizarre thing
> > was that the rest of system locked up as well: an ssh terminal,
> > and also an hvc console.
> >
> > Breaking into the debugger showed 4 cpus, 1 of which was
> > deadlocked in the spinlock, and the other 3 in
> > .pseries_dedicated_idle_sleep
> >
> > This was, ahhh, unexpected.  What's up with that? Can
> > anyone provide any insight?
>
> Sounds consistent with a task trying to double-acquire the lock, or an
> interrupt handler attempting to acquire a lock that the current task
> holds.  Or maybe even an uninitialized spinlock.  Do you know which
> lock it was?

Not sure .. trying to find out now. But why would that kill the
ssh session, and the console? Sure, so maybe one cpu is spinning,
but the other three can still take interrupts, right?  The ssh session
should have been generating ethernet card interrupts, and the console
should have been generating hvc interrupts.  

Err ..  it was cpu 0 that was spinlocked.  Are interrupts not
distributed?

Perhaps I should IRC this ... 

--linas

never mind .. [was Re: Hard hang in hypervisor!?

2007-10-09 Thread Linas Vepstas

On Tue, Oct 09, 2007 at 04:28:10PM -0500, Linas Vepstas wrote:
>
> Perhaps I should IRC this ...

yeah. I guess I'd forgotten how funky things can get. So never mind ... 

--linas


Re: [PATCH] Eval boards should not need to mess with ROOT_DEV

2007-10-08 Thread Linas Vepstas

On Mon, Oct 08, 2007 at 02:41:54PM -0500, Kumar Gala wrote:
>
> On Oct 8, 2007, at 2:03 PM, Grant Likely wrote:
>
> > I can't see a good reason for eval board platform code to mess with
> > the ROOT_DEV value instead of using the default behaviour (so I'm
>
> > Powermac and pseries also do this weirdness.  Should it be removed
> > from there too?
>
> We need benh to make a comment about powermac.
>
> I think it's ok to remove everywhere but we should see if he has any
> issue.

Ack. I see no problems in removing it.

--linas

Re: [PATCH] Eval boards should not need to mess with ROOT_DEV

2007-10-08 Thread Linas Vepstas

On Mon, Oct 08, 2007 at 03:42:21PM -0500, Linas Vepstas wrote:
> On Mon, Oct 08, 2007 at 02:41:54PM -0500, Kumar Gala wrote:
> >
> > On Oct 8, 2007, at 2:03 PM, Grant Likely wrote:
> >
> > > I can't see a good reason for eval board platform code to mess with
> > > the ROOT_DEV value instead of using the default behaviour (so I'm
> >
> > > Powermac and pseries also do this weirdness.  Should it be removed
> > > from there too?
> >
> > We need benh to make a comment about powermac.
> >
> > I think it's ok to remove everywhere but we should see if he has any
> > issue.
>
> Ack. I see no problems in removing it.

Err. I meant my comment to be of limited scope: for pseries. 
I know nothing of other platforms.

--linas

Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-05 Thread Linas Vepstas

On Thu, Oct 04, 2007 at 05:01:47PM -0700, Nish Aravamudan wrote:
> On 10/2/07, Tony Breeds [EMAIL PROTECTED] wrote:
> > On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:
>
> > > I realise it'll make the patch bigger, but this doesn't seem like a
> > > particularly good name for the variable anymore.
>
> > Sure, what about?
>
> > Clarify when RTAS logging is enabled.
>
> > Signed-off-by: Tony Breeds [EMAIL PROTECTED]
>
> For what it's worth, on a different ppc64 box, this resolves a similar
> panic for me.
>
> Tested-by: Nishanth Aravamudan [EMAIL PROTECTED]

For the reasons explained, I'd really like to nack Tony's patch.

--linas

Re: Stdout console clogging = 300ms blocked

2007-10-04 Thread Linas Vepstas

Hi Bernard,

On Wed, Oct 03, 2007 at 08:49:12PM +, Hollis Blanchard wrote:
 On Tue, 02 Oct 2007 09:41:28 +0200, Willaert, Bernard wrote:
 
  Problem:
  When we log debug output via the serial console on a multithreaded
  application, the console throughput may get clogged and then we
  experience a 300ms deadlock.
  
  #define THREAD_DELAY 1000
  usleep(THREAD_DELAY);
  fprintf(stdout, - thread 1\n);

[...]
  
  usleep(THREAD_DELAY);
  fprintf(stdout, - thread 2\n);
  
  baudrate=115200

OK, let's do the math. 115200 baud == approx 115200 bits per second;
assuming 8N1 for stop & parity bits, I get approx 9 bits per byte,
so your serial port is capable of 115.2/9 = 12.8KBytes per second.

Now, every millisecond, you are attempting to print

 - thread 1\n

Let's see, that's 17 bytes. And also  - thread 2\n for
a grand total of 34 bytes per millisecond.

And you are attempting to jam this through a serial line capable
of 12.8 Bytes per millisecond?  Well, of course it won't fit!

  Real output on the console:
  
   /\ 
   - thread 1
   - thread 2
   - thread 1
   - thread 2
   - thread 1
   - thread 2
  !!! thread2 interval timeout = 335 ms

Well, thread 1 clearly also had a delay of 335 milliseconds
for a total of 670 milliseconds delay.

Now, theoretically, we should have seen a delay equal to 
   (34 - 12.8)/34 = 0.623 seconds

I'd say that theory and practice match up pretty damned well;
I see no evidence of any problem at all.
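
The back-of-the-envelope numbers above can be reproduced with a small C sketch. The function names are made up for illustration, and the 9 bits per byte is the approximation used above (a strict 8N1 frame, counting its start bit, would be 10 bits, which only makes the shortfall larger).

```c
/* Bytes per millisecond that a serial line can carry. */
double line_bytes_per_ms(double baud, double bits_per_byte)
{
	return baud / bits_per_byte / 1000.0;
}

/*
 * Fraction of wall-clock time the writers must spend blocked when they
 * offer more bytes per millisecond than the line can drain.
 */
double blocked_fraction(double offered_bytes_per_ms, double line_capacity)
{
	if (offered_bytes_per_ms <= line_capacity)
		return 0.0;
	return (offered_bytes_per_ms - line_capacity) / offered_bytes_per_ms;
}
```

With 115200 baud and 9 bits per byte, line_bytes_per_ms() gives 12.8, and blocked_fraction(34.0, 12.8) gives roughly 0.62, matching the estimate above.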

 Could you not post HTML please? Thanks.

Agreed.

--linas

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-10-04 Thread Linas Vepstas

On Mon, Oct 01, 2007 at 07:27:30PM -0600, Matthew Wilcox wrote:
 
 The thing to remember is that sym2 is in transition from being a dual
 BSD/Linux driver to being a purely Linux driver. 

I was wondering about that; couldn't tell if the split in the code
was historical, or being intentionally maintained.

  My gut instinct is to say ack, although prudence dictates that 
  I should test first. Which might take a few days...
 
 Fine by me.  

I tested the patch, it worked great. It also seemed to recover 
much more quickly -- so quickly, in fact, that I thought something 
had gone wrong.

I reviewed it one more time, it really does look good. A formal
submission and acked by's at earliest convenience would be good. 

--linas


[PATCH] powerpc: fix crash in rtas during early boot.

2007-10-03 Thread Linas Vepstas


RTAS messages can occur very early during boot, before the error
message buffer has been allocated. The current code will lead to 
a null-pointer deref. Explicitly protect against this.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Cc: Andy Whitcroft [EMAIL PROTECTED]


Andy Whitcroft's crash was apparently due to firmware complaining
about lost power, (actually, lost power supply redundancy!), which
occurred very early during boot. 

Type0040 (EPOW)
Status: bypassed new
Residual error from previous boot.
EPOW Sensor Value:  0002
EPOW warning due to loss of redundancy.
EPOW general power fault.

I've no clue why firmware thought it was OK to report this 
during one of the earliest calls to RTAS; I'm still investigating 
that.

 arch/powerpc/platforms/pseries/rtasd.c |6 ++
 1 file changed, 6 insertions(+)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/rtasd.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/rtasd.c	2007-09-26 15:06:49.0 -0500
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/rtasd.c	2007-10-03 11:58:09.0 -0500
@@ -235,6 +235,12 @@ void pSeries_log_error(char *buf, unsign
 		return;
 	}
 
+	/* During early boot, the log buffer hasn't been allocated yet. */
+	if (rtas_log_buf == NULL) {
+		spin_unlock_irqrestore(&rtasd_log_lock, s);
+		return;
+	}
+
 	/* call type specific method for error */
 	switch (err_type & ERR_TYPE_MASK) {
 	case ERR_TYPE_RTAS_LOG:

Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-03 Thread Linas Vepstas

On Wed, Oct 03, 2007 at 02:09:46PM +1000, Michael Ellerman wrote:
 
 Until we initialise what exactly?

Until we allocate the error log buffer. The original crash was 
for a null-pointer deref of the unallocated buffer. I just sent 
out a patch to fix this; it's a bit simpler than the below.

In that email, I remarked:

Andy Whitcroft's crash was apparently due to firmware complaining
about lost power, (actually, lost power supply redundancy!), which
occurred very early during boot.

Type0040 (EPOW)
Status: bypassed new
Residual error from previous boot.
EPOW Sensor Value:  0002
EPOW warning due to loss of redundancy.
EPOW general power fault.

I've no clue why firmware thought it was OK to report this
during one of the earliest calls to RTAS; I'm still investigating
that.
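
The guard against an unallocated log buffer can be illustrated with a minimal userspace sketch. Everything here (log_buf_model, log_error_model, the buffer size) is hypothetical and only models the idea; the kernel code additionally has to release its spinlock before returning early.

```c
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SIZE_MODEL 64

char *log_buf_model;		/* stays NULL until "boot" allocates it */
int dropped_early_model;	/* messages that arrived too early */

/* Returns 1 if the message was stored, 0 if it was dropped. */
int log_error_model(const char *msg)
{
	/* An event can arrive before the buffer exists; do not touch it. */
	if (log_buf_model == NULL) {
		dropped_early_model++;
		return 0;
	}

	strncpy(log_buf_model, msg, LOG_BUF_SIZE_MODEL - 1);
	log_buf_model[LOG_BUF_SIZE_MODEL - 1] = '\0';
	return 1;
}
```

Without the NULL test, the first early call would write through a null pointer, which is the class of crash reported here.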

--linas

[PATCH] powerpc: another use of zalloc_maybe_bootmem()

2007-10-02 Thread Linas Vepstas


Use alloc_maybe_bootmem() which wraps the if(mem_init_done)
malloc clause.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


On Tue, Oct 02, 2007 at 01:37:53PM +1000, Stephen Rothwell wrote:
 This patch introduces zalloc_maybe_bootmem and uses it so that we don't
 have to mark a whole (largish) routine as __init_ref_ok.

sfr missed a spot -- may as well get rid of this one too.
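
The pattern being wrapped can be sketched in userspace C. This is a model under stated assumptions, not the kernel helper: mem_init_done_model, the bump allocator and the arena size are all invented for illustration, and the real helper also takes a gfp mask.

```c
#include <stddef.h>
#include <stdlib.h>

#define BOOT_ARENA_SIZE 4096

int mem_init_done_model;	/* 0 during "early boot", 1 afterwards */
_Alignas(16) unsigned char boot_arena[BOOT_ARENA_SIZE];
size_t boot_used;

/* Bump allocator standing in for the boot-time allocator. */
void *boot_alloc_model(size_t size)
{
	size_t rounded = (size + 15) & ~(size_t)15;
	void *p;

	if (rounded < size || rounded > BOOT_ARENA_SIZE - boot_used)
		return NULL;
	p = boot_arena + boot_used;
	boot_used += rounded;
	return p;
}

/* One entry point; callers no longer repeat the if/else themselves. */
void *alloc_maybe_boot_model(size_t size)
{
	if (mem_init_done_model)
		return malloc(size);
	return boot_alloc_model(size);
}
```

Folding the test into one helper keeps the boot-time special case in a single place, so callers such as a controller-allocation routine need no annotation of their own.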


 arch/powerpc/kernel/pci-common.c |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/kernel/pci-common.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/kernel/pci-common.c	2007-09-26 15:02:41.0 -0500
+++ linux-2.6.23-rc8-mm1/arch/powerpc/kernel/pci-common.c	2007-10-02 16:28:16.0 -0500
@@ -65,14 +65,11 @@ static void __devinit pci_setup_pci_cont
 	spin_unlock(&hose_spinlock);
 }
 
-__init_refok struct pci_controller * pcibios_alloc_controller(struct device_node *dev)
+struct pci_controller * pcibios_alloc_controller(struct device_node *dev)
 {
 	struct pci_controller *phb;
 
-	if (mem_init_done)
-		phb = kmalloc(sizeof(struct pci_controller), GFP_KERNEL);
-	else
-		phb = alloc_bootmem(sizeof (struct pci_controller));
+	phb = alloc_maybe_bootmem(sizeof(struct pci_controller), GFP_KERNEL);
 	if (phb == NULL)
 		return NULL;
 	pci_setup_pci_controller(phb);

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-10-02 Thread Linas Vepstas

On Mon, Oct 01, 2007 at 07:27:30PM -0600, Matthew Wilcox wrote:
 
 Fine by me.  Do you have the ability to produce failures on a whim on
 your platforms?  

Yes, although it is very platform specific -- there are actually
transistors in the pci bridge chip, which actually short out lines,
and so, from the point of view of the rest of the chip, it did
actually see a real error. It's supposed to be a very realistic 
test.

 I've been vaguely musing a PCI device failure patch for
 x86, just so people can test driver failure paths.

That would be good ... I've recently agreed to accept a fedex
to test someone else's card for them, which is outside my usual
activities.

There's also supposed to be some PCI-X riser card out there, 
(never seen one) which has the ability to inject actual pci 
errors. It's the Agilent PCI BestX card; I got the impression 
they might not sell it anymore; dunno.

One guy in the lab used to brush a grounding strap across
the pins; this usually got a rise out of the audience.

--linas


Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Linas Vepstas

On Mon, Sep 24, 2007 at 01:35:31PM +0100, Andy Whitcroft wrote:
 Seeing the following from an older power LPAR, pretty sure we had
 this in the previous -mm also:

I haven't forgotten about this ... and am looking at it now.
Seems that whenever I go to reserve the machine pSeries-102,
someone else is using it :-)

--linas

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-10-01 Thread Linas Vepstas

On Mon, Oct 01, 2007 at 02:12:47PM -0600, Matthew Wilcox wrote:
 
 I think the fundamental problem is that completions aren't really
 supposed to be used like this.  Here's one attempt at using completions
 perhaps a little more the way they're supposed to be used, 

Yes, that looks very good to me.  I see it solves a bug that
I hadn't been quite aware of. I don't understand why 
struct host_data is preferable to struct sym_shcb (is it because 
this is the structure that is naturally protected by the 
spinlock?)

My gut instinct is to say ack, although prudence dictates that 
I should test first. Which might take a few days...

 although now
 I've written it, I wonder if we shouldn't just use a waitqueue instead.

I thought that earlier versions of the driver used waitqueues (I vaguely
remember eh_wait in the code), which were later converted to 
completions (I also vaguely recall thinking that the new code was
more elegant/simpler). I converted my patch to use the completions 
likewise, and, as you've clearly shown, did a rather sloppy job in 
the conversion.

I'm tempted to go with this patch; but if you prod, I could attempt
a wait-queue based patch.

--linas


Re: Help! Debian ppc64

2007-09-27 Thread Linas Vepstas

On Thu, Sep 27, 2007 at 09:57:02AM -0400, Cesar Bello wrote:
 Hi, I'm writing from Venezuela. I have to prepare a presentation about
 Debian on IBM pSeries Servers with Power 5+ processors. My first question
 is what are the advantages of using Debian GNU/Linux on pSeries Servers?

Advantages as compared to what?

-- Debian on Intel? 
   +++ powerpc has better RAS features than Intel, for example, 
   my favorite, PCI error handling and recovery, or hotplug cpu,
   dynamic LPAR configuration, etc. 

-- SuSE/RedHat on PowerPC?
   +++ SuSE/RedHat offer formal support, for $$$, which debian/ubuntu do not

-- AIX on pSeries?
   +++ AIX has various enterprise features that Debian does not.

You might try talking to RedHat/SuSE product support, and also to IBM
pSeries sales.

Linas Vepstas

Re: [PATCH 5/7] Celleb: Supports VFD on Celleb 2

2007-09-27 Thread Linas Vepstas

On Thu, Sep 27, 2007 at 11:07:33AM +0200, Arnd Bergmann wrote:
 On Thursday 27 September 2007, Ishizaki Kou wrote:
  This is a patch to support VFD on Celleb 2.
  VFD is a small LCD to show miscellaneous messages.
  
  Signed-off-by: Kou Ishizaki [EMAIL PROTECTED]
 
   My feeling is that your interface should better be
   implemented as a character device, or be integrated into some other
   existing message interface, if we can find one.
 
 * The firmware seems to implement the generic rtas interface for
   display-character and set-indicator, but your driver is celleb specific.
   I'd feel more comfortable if we could come up with a driver that also
   works on other systems that implement the same rtas calls.

Yep, I think I agree. Most pseries systems have a small two-line
LCD display.  Right now, the code that talks to it is implemented in
rtas_progress(). It has this name because it's used only for printing
out boot progress messages. This is great for debugging hangs, but 
it's not otherwise used.

I suppose it would be nice to have a generic interface to the thing, 
and, after a quickie skim of the code, the celleb display looks similar
enough that this abstraction could be made.

--linas

Re: tg3: PCI error recovery

2007-09-27 Thread Linas Vepstas


During a private conversation about how to save and restore 
device state after a pci error is detected, and the device is reset,
the following came up:

On Wed, Sep 26, 2007 at 04:48:38PM -0700, Michael Chan wrote:
  
   1b) If so, is it safe to call pci_save_state() in
   tg3_io_error_detected(), or are we to assume they've been corrupted?
  
  My conservative approach is to assume that anything and everything has
  been corrupted. (e.g. temporary undervoltage on the bus might scramble
  multiple registers)
 
 In that case, we should call pci_restore_msi_state() to restore the MSI
 state, but this call is only defined if CONFIG_PM is defined.

There seem to be two choices:
1) enable CONFIG_PM in those arches that care about recovering from PCI
   errors. (Yuck)

2) remove the ifdef CONFIG_PM from around pci_restore_msi_state() in 
   drivers/pci/msi.c

I'd go for choice 2, but I thought I'd ask first ...

--linas


Re: 2.6.23-rc8 dies somewhere during boot!?

2007-09-27 Thread Linas Vepstas

On Thu, Sep 27, 2007 at 09:12:33PM +0200, Gerhard Pircher wrote:
 Hi,
 
 I'm working on a 2.6.23 kernel for the AmigaOne.

[...]

 6PCI: Probing PCI hardware.
 7PCI: Scanning bus 0...
 ...00:00:07.0.
 7PCI: Calling quirk...
 ...CI: Found :00:07.2 [1106/303...


Any chance that this thing has an e100 ethernet card in it? 
If so, edit drivers/pci/quirks.c and ifdef out the readb()
in the e100_quirk routine.

We're debating the proper fix on the pci mailing list now.

--linas


Re: 2.6.23-rc8 dies somewhere during boot!?

2007-09-27 Thread Linas Vepstas

On Thu, Sep 27, 2007 at 09:31:31PM +0200, Gerhard Pircher wrote:
 
  Subject: Re: 2.6.23-rc8 dies somewhere during boot!?
   
   I'm working on a 2.6.23 kernel for the AmigaOne.

Have you tried 2.6.22, or does that fail also?

--linas

Re: 2.6.23-rc8 dies somewhere during boot!?

2007-09-27 Thread Linas Vepstas

On Thu, Sep 27, 2007 at 11:17:00PM +0200, Gerhard Pircher wrote:
  Subject: Re: 2.6.23-rc8 dies somewhere during boot!?
 
 Do you have an idea how to debug it?

Not particularly. What caught my eye was the failure right near the
PCI quirk stuff, as I was having problems there as well (but apparently,
for very different reasons).  Based on your boot messages, it looks 
like you are failing somewhere in pci probe.  My olde-fashioned, slow,
but-usually-works method is to sprinkle enough printk's into the code
to catch it in the act.

--linas


Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-09-27 Thread Linas Vepstas

On Wed, Sep 26, 2007 at 09:02:16AM -0600, Matthew Wilcox wrote:
 On Fri, Apr 20, 2007 at 03:47:20PM -0500, Linas Vepstas wrote:
  Implement the so-called first failure data capture (FFDC) for the
  symbios PCI error recovery.  After a PCI error event is reported,
  the driver requests that MMIO be enabled. Once enabled, it 
  then reads and dumps assorted status registers, and concludes
  by requesting the usual reset sequence.
 
  +   /* Request that MMIO be enabled, so register dump can be taken. */
  +   return PCI_ERS_RESULT_CAN_RECOVER;
  +}
 
 I'm a little concerned by the mention of MMIO.  It's entirely possible
 for the sym2 driver to be using ioports to access the card rather than
 MMIO.  Is it simply that it can't on the platform you test on?

The comment is misleading. I've been in the bad habit of calling
it mmio whenever it's not DMA.

The habit is because there are two distinct enable bits in the 
pci-host bridge during error recovery: one to enable mmio/ioports, 
and the other to enable DMA. If the adapter has gone crazy, I don't 
want to enable DMA, so that it doesn't scribble to bad places. But, 
by enabling mmio/ioports, perhaps it can be finessed back into a 
semi-sane state, e.g. sane enough to perform a dump of its internal
state.
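
The sequence described here (enable register access but not DMA, dump state, then ask for a reset) can be modelled in a few lines of plain C. This is a sketch under stated assumptions: every name below is hypothetical and only mirrors the roles of the kernel's error_detected, mmio_enabled and slot_reset callbacks and their PCI_ERS_RESULT_* replies.

```c
enum ers_result_model {
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_RECOVERED,
	ERS_DISCONNECT
};

struct slot_model {
	int mmio_enabled;
	int dma_enabled;
	int was_reset;
	int state_dumped;
	int dma_on_during_dump;
};

/* Driver side: ask for register access so a dump can be taken. */
enum ers_result_model drv_error_detected(void)
{
	return ERS_CAN_RECOVER;
}

/* Driver side: dump status registers, then request the usual reset. */
enum ers_result_model drv_mmio_enabled(struct slot_model *slot)
{
	if (slot->mmio_enabled) {
		slot->state_dumped = 1;
		slot->dma_on_during_dump = slot->dma_enabled;
	}
	return ERS_NEED_RESET;
}

/* Driver side: re-initialise the adapter after the reset. */
enum ers_result_model drv_slot_reset(struct slot_model *slot)
{
	return slot->was_reset ? ERS_RECOVERED : ERS_DISCONNECT;
}

/* Platform side: walk the driver through the recovery steps. */
enum ers_result_model recover_model(struct slot_model *slot)
{
	enum ers_result_model r = drv_error_detected();

	if (r == ERS_CAN_RECOVER) {
		slot->mmio_enabled = 1;	/* DMA deliberately stays off */
		r = drv_mmio_enabled(slot);
	}
	if (r == ERS_NEED_RESET) {
		slot->was_reset = 1;
		slot->mmio_enabled = 1;
		slot->dma_enabled = 1;
		r = drv_slot_reset(slot);
	}
	return r;
}
```

The point of the two separate enable bits shows up in dma_on_during_dump: the register dump happens while the adapter still cannot scribble on memory.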

--linas

Re: 2.6.23-rc8 dies somewhere during boot!?

2007-09-27 Thread Linas Vepstas

On Thu, Sep 27, 2007 at 11:57:35PM +0200, Gerhard Pircher wrote:
 
  Based on your boot messages, it looks like you are failing somewhere in
  pci probe.  My olde-fashioned, slow, but-usually-works method is to
  sprinkle enough printk's into the code to catch it in the act.
 I guess the code in arch/powerpc/pci*.c is the right place to sprinkle
 some printk's into the code?

The last identifiable message I saw was

7PCI: Calling quirk...

which is from drivers/pci/quirks.c

...CI: Found :00:07.2 [1106/303...

and this is from pci_setup_device() in drivers/pci/probe.c  So I'd look
to see if pci_setup_device() ever returned, and then I'd look to see
what happened next.

--linas


Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-09-27 Thread Linas Vepstas

On Thu, Sep 27, 2007 at 04:10:31PM -0600, Matthew Wilcox wrote:
 In the error handler, we wait_for_completion(io_reset_wait).
 In sym2_io_error_detected, we init_completion(io_reset_wait).
 Isn't it possible that we hit the error handler before we hit the
 io_error_detected path, and thus the completion wait is lost?
 Since the completion is already initialised in sym_attach(), I don't
 think we need to initialise it in sym2_io_error_detected().
 Makes sense to just delete it?

Good catch. But no ... and I had to study this a bit. Bear with me:

It is enough to call init_completion() once, and not once per use:
it initializes spinlocks, which shouldn't be initialized twice. 

But, that completion might be used multiple times when there are
multiple errors, and so, before using it a second time, one must 
set completion->done = 0.  The INIT_COMPLETION() macro does this. 

One must have completion->done = 0 before every use, as otherwise, 
wait_for_completion() won't actually wait. And since complete_all()
sets x->done += UINT_MAX/2, I'm pretty sure x->done won't be zero
the next time we use it, unless we make it so.
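
The behaviour of the done counter can be shown with a tiny model. This is a sketch, assuming only what is described above: the struct and function names are hypothetical, and the wait queue and spinlock of a real completion are left out entirely.

```c
#include <limits.h>

struct completion_model {
	unsigned int done;
};

/* One-time setup; a real completion also sets up its wait queue here. */
void init_completion_model(struct completion_model *x)
{
	x->done = 0;
}

/* Wake everybody: leaves a large count behind. */
void complete_all_model(struct completion_model *x)
{
	x->done += UINT_MAX / 2;
}

/*
 * Returns 1 if a waiter would have to sleep, 0 if it would return at
 * once (consuming one count on the way out).
 */
int wait_would_block_model(struct completion_model *x)
{
	if (x->done == 0)
		return 1;
	x->done--;
	return 0;
}

/* Re-arm for the next use: only the counter is reset. */
void rearm_completion_model(struct completion_model *x)
{
	x->done = 0;
}
```

After complete_all_model(), every later wait falls straight through until the counter is reset, which is why the second and subsequent errors need a re-arm step.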

So I need to find a place to safely call INIT_COMPLETION() again, 
after the completion has been used. At the moment, I'm stumped
as to where to do this. 

 [think ... think ... think] 

I think the race you describe above is harmless. The first time
that sym_eh_handler() will run, it will be with SYM_EH_ABORT, 
and it doesn't matter if we lose that, since the device is hosed
anyway. At some later time, it will run with SYM_EH_DEVICE_RESET
and then SYM_EH_BUS_RESET and then SYM_EH_HOST_RESET, and we won't 
miss those, since, by now, sym2_io_error_detected() will have run.

So, by my reading, I'd say that init_completion() in
sym2_io_error_detected() has to stay (although perhaps
it should be replaced by the INIT_COMPLETION() macro.)
Removing it will prevent correct operation on the second 
and subsequent errors.

--Linas


Re: [EMAIL PROTECTED]: 2.6.23-rc6-mm1 -- powerpc pSeries_log_error panic in rtas_call/early_enable_eeh]

2007-09-24 Thread Linas Vepstas

I just got back from vacation.

I'll give this a whirl shortly.

--linas

On Sun, Sep 23, 2007 at 11:17:40AM -0500, Anton Blanchard wrote:
 
 Hi Linas,
 
 Looks like EEH could be involved :)
 
 Anton
 
 - Forwarded message from Andy Whitcroft [EMAIL PROTECTED] -
 
 From: Andy Whitcroft [EMAIL PROTECTED]
 To: Andrew Morton [EMAIL PROTECTED]
 Subject: 2.6.23-rc6-mm1 -- powerpc pSeries_log_error panic in
   rtas_call/early_enable_eeh
 X-SPF-Guess: neutral
 Cc: linuxppc-dev@ozlabs.org, [EMAIL PROTECTED]
 
 Seeing the following panic booting an old powerpc LPAR:
 
 Unable to handle kernel paging request for data at address 0x
 Faulting instruction address: 0xc0047b48
 cpu 0x0: Vector: 300 (Data Access) at [c06a3750]
 pc: c0047b48: .pSeries_log_error+0x364/0x420
 lr: c0047acc: .pSeries_log_error+0x2e8/0x420
 sp: c06a39d0
msr: 80001032
dar: 0
  dsisr: 4200
   current = 0xc05acab0
   paca= 0xc05ad700
 pid   = 0, comm = swapper
 enter ? for help
 [c06a3af0] c0021164 .rtas_call+0x200/0x250
 [c06a3ba0] c0049d50 .early_enable_eeh+0x168/0x360
 [c06a3c70] c002f674 .traverse_pci_devices+0x8c/0x138
 [c06a3d10] c0560ce8 .eeh_init+0x1a8/0x200
 [c06a3db0] c055fb70 .pSeries_setup_arch+0x128/0x234
 [c06a3e40] c054f830 .setup_arch+0x214/0x24c
 [c06a3ee0] c0546a38 .start_kernel+0xd4/0x3e4
 [c06a3f90] c045adc4 .start_here_common+0x54/0x58
 0:mon
 
 This machine is:
 
 # cat /proc/cpuinfo
 processor   : 0
 cpu : POWER4+ (gq)
 clock   : 1703.965296MHz
 revision: 19.0
 
 [...]
 machine : CHRP IBM,7040-681
 
 -apw
 ___
 Linuxppc-dev mailing list
 Linuxppc-dev@ozlabs.org
 https://ozlabs.org/mailman/listinfo/linuxppc-dev
 
 - End forwarded message -

Re: Keep On Debugging You

2007-09-06 Thread Linas Vepstas

On Thu, Sep 06, 2007 at 10:21:43AM -0500, Timur Tabi wrote:
 Zhang Wei-r63237 wrote:
  Oops!
  
  Could you give us a live show version? :D  
 
 Sorry, I'm booked up for the rest of the year.

Hmm. Maybe someone could sneak a videocam into one of the
venues, and, you know, post a pirated, illegal copy on
youtube or something.

--linas

[PATCH 0/3] powerpc: whitespace cleanup, grammar corrections

2007-09-06 Thread Linas Vepstas


These popped out at me while I was reading code.
It's all janitorial.

--linas

[PATCH 2/3] powerpc: prom whitespace cleanup

2007-09-06 Thread Linas Vepstas


Whitespace cleanup: badly indented lines.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]



 arch/powerpc/kernel/prom.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

Index: linux-2.6.22-git2/arch/powerpc/kernel/prom.c
===
--- linux-2.6.22-git2.orig/arch/powerpc/kernel/prom.c	2007-08-29 14:14:12.0 -0500
+++ linux-2.6.22-git2/arch/powerpc/kernel/prom.c	2007-08-29 14:15:10.0 -0500
@@ -782,13 +782,13 @@ static int __init early_init_dt_scan_cho
 #endif
 
 #ifdef CONFIG_KEXEC
-   lprop = (u64*)of_get_flat_dt_prop(node, linux,crashkernel-base, NULL);
-   if (lprop)
-   crashk_res.start = *lprop;
-
-   lprop = (u64*)of_get_flat_dt_prop(node, linux,crashkernel-size, NULL);
-   if (lprop)
-   crashk_res.end = crashk_res.start + *lprop - 1;
+   lprop = (u64*)of_get_flat_dt_prop(node, linux,crashkernel-base, NULL);
+   if (lprop)
+   crashk_res.start = *lprop;
+
+   lprop = (u64*)of_get_flat_dt_prop(node, linux,crashkernel-size, NULL);
+   if (lprop)
+   crashk_res.end = crashk_res.start + *lprop - 1;
 #endif
 
early_init_dt_check_for_initrd(node);

[PATCH 1/3] powerpc: prom_init whitespace cleanup, typo fix.

2007-09-06 Thread Linas Vepstas


Whitespace cleanup: badly indented lines.
Typo in comment.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]



 arch/powerpc/kernel/prom_init.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-2.6.22-git2/arch/powerpc/kernel/prom_init.c
===
--- linux-2.6.22-git2.orig/arch/powerpc/kernel/prom_init.c	2007-07-08 18:32:17.0 -0500
+++ linux-2.6.22-git2/arch/powerpc/kernel/prom_init.c	2007-08-28 16:40:26.0 -0500
@@ -1197,7 +1197,7 @@ static void __init prom_initialize_tce_t
 		if ((type[0] == 0) || (strstr(type, RELOC("pci")) == NULL))
 			continue;
 
-		/* Keep the old logic in tack to avoid regression. */
+		/* Keep the old logic intact to avoid regression. */
 		if (compatible[0] != 0) {
 			if ((strstr(compatible, RELOC("python")) == NULL) &&
 			    (strstr(compatible, RELOC("Speedwagon")) == NULL) &&
@@ -2224,7 +2224,7 @@ static void __init fixup_device_tree(voi
 
 static void __init prom_find_boot_cpu(void)
 {
-   struct prom_t *_prom = RELOC(prom);
+   struct prom_t *_prom = RELOC(prom);
u32 getprop_rval;
ihandle prom_cpu;
phandle cpu_pkg;
@@ -2244,7 +2244,7 @@ static void __init prom_find_boot_cpu(vo
 static void __init prom_check_initrd(unsigned long r3, unsigned long r4)
 {
 #ifdef CONFIG_BLK_DEV_INITRD
-   struct prom_t *_prom = RELOC(prom);
+   struct prom_t *_prom = RELOC(prom);
 
 	if (r3 && r4 && r4 != 0xdeadbeef) {
unsigned long val;
@@ -2277,7 +2277,7 @@ unsigned long __init prom_init(unsigned 
   unsigned long pp,
   unsigned long r6, unsigned long r7)
 {  
-   struct prom_t *_prom;
+   struct prom_t *_prom;
unsigned long hdr;
unsigned long offset = reloc_offset();
 
@@ -2336,8 +2336,8 @@ unsigned long __init prom_init(unsigned 
/*
 * Copy the CPU hold code
 */
-   if (RELOC(of_platform) != PLATFORM_POWERMAC)
-   copy_and_flush(0, KERNELBASE + offset, 0x100, 0);
+   if (RELOC(of_platform) != PLATFORM_POWERMAC)
+   copy_and_flush(0, KERNELBASE + offset, 0x100, 0);
 
/*
 * Do early parsing of command line

[PATCH 3/3] powerpc: setup_64 comment cleanup.

2007-09-06 Thread Linas Vepstas


Grammatical corrections to comments.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]



 arch/powerpc/kernel/prom.c |8 +---
 arch/powerpc/kernel/setup_64.c |6 +++---
 2 files changed, 8 insertions(+), 6 deletions(-)

Index: linux-2.6.22-git2/arch/powerpc/kernel/setup_64.c
===
--- linux-2.6.22-git2.orig/arch/powerpc/kernel/setup_64.c	2007-09-04 17:29:36.0 -0500
+++ linux-2.6.22-git2/arch/powerpc/kernel/setup_64.c	2007-09-05 14:12:23.0 -0500
@@ -181,9 +181,9 @@ void __init early_setup(unsigned long dt
DBG( - early_setup(), dt_ptr: 0x%lx\n, dt_ptr);
 
/*
-* Do early initializations using the flattened device
-* tree, like retreiving the physical memory map or
-* calculating/retreiving the hash table size
+* Do early initialization using the flattened device
+* tree, such as retrieving the physical memory map or
+* calculating/retrieving the hash table size.
 */
early_init_devtree(__va(dt_ptr));
 
Index: linux-2.6.22-git2/arch/powerpc/kernel/prom.c
===
--- linux-2.6.22-git2.orig/arch/powerpc/kernel/prom.c	2007-09-05 14:23:06.0 -0500
+++ linux-2.6.22-git2/arch/powerpc/kernel/prom.c	2007-09-05 14:24:49.0 -0500
@@ -433,9 +433,11 @@ static int __init early_parse_mem(char *
 }
 early_param(mem, early_parse_mem);
 
-/*
- * The device tree may be allocated below our memory limit, or inside the
- * crash kernel region for kdump. If so, move it out now.
+/**
+ * move_device_tree - move tree to an unused area, if needed.
+ *
+ * The device tree may be allocated beyond our memory limit, or inside the
+ * crash kernel region for kdump. If so, move it out of the way.
  */
 static void move_device_tree(void)
 {

Re: [PATCH 2.6.23] ibmebus: Prevent bus_id collisions

2007-08-30 Thread Linas Vepstas

On Thu, Aug 30, 2007 at 04:00:56PM +0200, Joachim Fenkes wrote:
 
 Plus, I rather like using 
 the full_name since it also contains a descriptive name as opposed to 
 being just nondescript numbers, helping the layman (ie user) to make sense 
 out of a dev_id.

Yes, well, but no. The location code is useful as a geographical
location: slots and devices are physically labelled with stickers 
so you can tell which is which.  Handy when you have to unplug stuff. 
By contrast, the device-tree full_name is mostly just gobbledygook, 
with some crazy phb numbering in there that, after four years of 
staring at them, I still can't reliably do anything useful with.  
Location codes are nice. 

--linas

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
 3) On modern systems the incoming packets are processed very fast. Especially
    on SMP systems when we use multiple queues we process only a few packets
    per napi poll cycle. So NAPI does not work very well here and the 
 interrupt 
    rate is still high. 

I saw this too, on a system that is modern but not terribly fast, and
only slightly (2-way) smp. (the spidernet)

I experimented with various solutions, none were terribly exciting.  The
thing that killed all of them was a crazy test case that someone sprung on
me:  They had written a worst-case network ping-pong app: send one
packet, wait for reply, send one packet, etc.  

If I waited (indefinitely) for a second packet to show up, the test case 
completely stalled (since no second packet would ever arrive).  And if I 
introduced a timer to wait for a second packet, then I just increased 
the latency in the response to the first packet, and this was noticed, 
and folks complained.  

In the end, I just let it be, and let the system work as a busy-beaver, 
with the high interrupt rate. Is this a wise thing to do?  I was
thinking that, if the system is under heavy load, then the interrupt
rate would fall, since (for less pathological network loads) more 
packets would queue up before the poll was serviced.  But I did not
actually measure the interrupt rate under heavy load ... 

--linas

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 08:52:03AM -0700, Stephen Hemminger wrote:
 
 You need hardware support for deferred interrupts. Most devices have it 
 (e1000, sky2, tg3)
 and it interacts well with NAPI. It is not a generic thing you want done by 
 the stack,
 you want the hardware to hold off interrupts until X packets or Y usecs have 
 expired.

Just to be clear, in the previous email I posted on this thread, I
described a worst-case network ping-pong test case (send a packet, wait
for reply), and found out that a deferred interrupt scheme just damaged
the performance of the test case.  Since the folks who came up with the
test case were adamant, I turned off the deferred interrupts.  
While deferred interrupts are an obvious solution, I decided that 
they weren't a good solution. (And I have no other solution to offer).

--linas


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 09:04:56PM +0200, Bodo Eggert wrote:
> Linas Vepstas [EMAIL PROTECTED] wrote:
> > On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> > > 3) On modern systems the incoming packets are processed very fast.
> > > Especially on SMP systems when we use multiple queues we process only
> > > a few packets per napi poll cycle. So NAPI does not work very well here
> > > and the interrupt rate is still high.
> >
> > worst-case network ping-pong app: send one
> > packet, wait for reply, send one packet, etc.
>
> Possible solution / possible brainfart:
>
> Introduce a timer, but don't start to use it to combine packets unless you
> receive n packets within the timeframe. If you receive less than m packets
> within one timeframe, stop using the timer. The system should now have a
> decent response time when the network is idle, and when the network is
> busy, nobody will complain about the latency.-)

Ohh, that was inspirational. Let me free-associate some wild ideas.

Suppose we keep a running average of the recent packet arrival rate.
Let's say it's 10 per millisecond (typical for a gigabit eth running
flat-out).  If we could poll the driver at a rate of 10-20 per
millisecond (i.e. letting the OS do other useful work for 0.05 millisec),
then we could potentially service the card without ever having to enable
interrupts on the card, and without hurting latency.

If the packet arrival rate becomes slow enough, we go back to an
interrupt-driven scheme (to keep latency down).
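
As a rough illustration of the idea above (not driver code), here is a
user-space toy model in Python. The class name, the two thresholds and the
smoothing factor are all made-up values for the sketch; nothing here comes
from an actual NAPI or eHEA implementation. It keeps a running average of
the arrival rate, switches to polling when the average is high, and falls
back to interrupts when it drops, with two thresholds so the mode does not
flap.

```python
class AdaptiveRxMode:
    """Toy model: choose between interrupt-driven and polled receive."""

    def __init__(self, enter_poll_rate=8.0, exit_poll_rate=2.0, alpha=0.25):
        # Thresholds are in packets per millisecond. All three defaults
        # are hypothetical values chosen only for this sketch.
        if exit_poll_rate >= enter_poll_rate:
            raise ValueError("exit threshold must be below enter threshold")
        self.enter_poll_rate = enter_poll_rate
        self.exit_poll_rate = exit_poll_rate
        self.alpha = alpha
        self.avg_rate = 0.0
        self.polling = False

    def observe(self, packets, interval_ms):
        """Fold one measurement into the running average and pick a mode."""
        rate = packets / interval_ms
        # Exponentially weighted running average of the arrival rate.
        self.avg_rate = self.alpha * rate + (1.0 - self.alpha) * self.avg_rate
        if not self.polling and self.avg_rate >= self.enter_poll_rate:
            self.polling = True
        elif self.polling and self.avg_rate <= self.exit_poll_rate:
            self.polling = False
        return "poll" if self.polling else "interrupt"

    def poll_interval_ms(self):
        """While polling, poll at about twice the average arrival rate."""
        if not self.polling or self.avg_rate <= 0.0:
            return None
        return 1.0 / (2.0 * self.avg_rate)
```

With a steady 10 packets per millisecond the model settles into polling
about every 0.05 millisec; once traffic stops, the average decays and it
returns to interrupt mode.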

The main problem here is that, even for HZ=1000 machines, this amounts
to 10-20 polls per jiffy.  Which, if implemented in the kernel, requires
using the high-resolution timers. And, umm, don't the HR timers require
a cpu timer interrupt to make them go? So it's not clear that this is much
of a win.
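
The polls-per-jiffy figure is just a unit conversion, spelled out below.
The function name is invented for this sketch.

```python
def polls_per_jiffy(polls_per_ms, hz):
    """Number of polls that must fit into one jiffy (1/hz seconds)."""
    jiffy_ms = 1000.0 / hz  # length of one jiffy in milliseconds
    return polls_per_ms * jiffy_ms
```

At HZ=1000 a jiffy is one millisecond, so 10-20 polls per millisecond is
10-20 polls per jiffy; at a lower HZ the count per jiffy only grows.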

The eHEA is a 10 gigabit device, so it can expect 80-100 packets per
millisecond for large packets, and even more, say 1K packets per
millisec, for small packets. (Even the spec for my 1Gb spidernet card
claims its internal rate is 1M packets/sec.) 

Another possibility is to set HZ to 5000 or 2 or something humongous
... after all, CPUs are now faster! But, since this might be wasteful,
maybe we could make HZ be dynamically variable: have high HZ rates when
there's lots of network/disk activity, and low HZ rates when not. That
means a non-constant jiffy.

If all drivers used interrupt mitigation, then the variable, high-frequency
jiffy could take their place, and be more fair to everyone.
Most drivers would be polled most of the time when they're busy, and
only use interrupts when they're not.
 
--linas

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas

On Fri, Aug 24, 2007 at 02:44:36PM -0700, David Miller wrote:
> From: David Stevens [EMAIL PROTECTED]
> Date: Fri, 24 Aug 2007 09:50:58 -0700
>
> > Problem is if it increases rapidly, you may drop packets
> > before you notice that the ring is full in the current estimated
> > interval.
>
> This is one of many reasons why hardware interrupt mitigation
> is really needed for this.

When turning off interrupts, don't turn them *all* off.
Leave the queue-full interrupt always on.
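
A minimal sketch of that masking rule, in Python for brevity. The bit
names and values are hypothetical and do not correspond to any real
device's interrupt register.

```python
# Hypothetical interrupt-enable bits, invented for this sketch.
RX_PACKET = 0x1      # per-packet receive interrupt
TX_DONE = 0x2        # transmit-complete interrupt
RX_QUEUE_FULL = 0x4  # receive ring is full

def enter_polling(enabled):
    """Mask only the per-packet interrupt; force queue-full to stay on."""
    return (enabled & ~RX_PACKET) | RX_QUEUE_FULL

def leave_polling(enabled):
    """Re-enable the per-packet interrupt; queue-full stays on as before."""
    return enabled | RX_PACKET | RX_QUEUE_FULL
```

The point is that a sudden burst still raises an interrupt even while the
driver is being polled at a rate estimated from older traffic.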

--linas

Re: How to port PReP to arch/powerpc?

2007-08-22 Thread Linas Vepstas

On Wed, Aug 22, 2007 at 07:29:56AM -0500, Josh Boyer wrote:
 
> David Gibson and Rob Landley had a quite interesting discussion about
> PReP last night on IRC.

?? Where? I scrolled back on #ppc64 on freenode, and see no such
conversation.

--linas

