Re: [PATCH] docs: move powerpc under arch
Hi Jon,

Got the message; I'm not an active maintainer, haven't been for over a decade, and cannot comment on style issues. But if all the other arches are doing this, I see no reason why not. Feel free to interpret this as an Acked-by: if that's appropriate.

-- linas

On Tue, Oct 3, 2023 at 11:05 AM Jonathan Corbet wrote:
> Costa Shulyupin writes:
>
> > and fix all in-tree references.
> >
> > Architecture-specific documentation is being moved into Documentation/arch/
> > as a way of cleaning up the top-level documentation directory and making
> > the docs hierarchy more closely match the source hierarchy.
> >
> > Signed-off-by: Costa Shulyupin
>
> So this patch appears to have not been picked up, and to have received
> no comments. I'll happily carry it in docs-next, but it would be nice
> to have an ack from the powerpc folks...?
>
> Thanks,
>
> jon

--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
Re: [PATCH 0/5] s390/pci: automatic error recovery
On Mon, Sep 6, 2021 at 4:49 AM Niklas Schnelle wrote:
> I believe we might be the first implementation of PCI device recovery
> in a virtualized setting requiring us to coordinate the device reset
> with the hypervisor platform by issuing a disable and re-enable to the
> platform as well as starting the recovery following a platform event.

I recall none of the details, but SRIOV is a standardized system for sharing a PCI device across multiple virtual machines. It has detailed info on what the hypervisor must do, and what the local OS instance must do to accomplish this. It's part of the PCI standard, and it's more than a decade old now, maybe two. Being a part of the PCI standard, it was interoperable with error recovery, to the best of my recollection.

At the time it was introduced, it got pushed very aggressively. The x86 hypervisor vendors were aiming at the heart of zseries, and were militant about it.

-- Linas
Re: [PATCH] Documentation PCI: Fix typo in pci-error-recovery.rst
Signed-off-by: Linas Vepstas

On Mon, May 31, 2021 at 3:12 AM Wesley Sheng wrote:
> Replace "It" with "If", since it is a conditional statement.
>
> Signed-off-by: Wesley Sheng
> ---
>  Documentation/PCI/pci-error-recovery.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
> index 84ceebb08cac..187f43a03200 100644
> --- a/Documentation/PCI/pci-error-recovery.rst
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -295,7 +295,7 @@ and let the driver restart normal I/O processing.
>  A driver can still return a critical failure for this function if
>  it can't get the device operational after reset. If the platform
>  previously tried a soft reset, it might now try a hard reset (power
> -cycle) and then call slot_reset() again. It the device still can't
> +cycle) and then call slot_reset() again. If the device still can't
>  be recovered, there is nothing more that can be done; the platform
>  will typically report a "permanent failure" in such a case. The
>  device will be considered "dead" in this case.
> --
> 2.25.1
Re: [PATCH] powerpc/eeh: Update MAINTAINERS
Hi,

On 27 June 2013 21:11, Benjamin Herrenschmidt b...@kernel.crashing.org wrote:
> On Fri, 2013-06-28 at 09:59 +0800, Gavin Shan wrote:
> > Update MAINTAINERS to reflect recent changes.
> >
> > Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
> > ---
> > MAINTAINERS |4
> > 1 files changed, 4 insertions(+), 0 deletions(-)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 5be702c..b447392 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -6146,10 +6146,14 @@ F:drivers/firmware/pcdp.*
> > PCI ERROR RECOVERY
> > M:
> > +M: Gavin Shan sha...@linux.vnet.ibm.com
>
> Remove Linas, he isn't involved anymore as far as I can tell (are you ?)

Not involved any more; I don't have access to equipment, don't have time, expertise is fading.

> > L: linux-...@vger.kernel.org
> > +L: linuxppc-dev@lists.ozlabs.org
> > S: Supported
> > F: Documentation/PCI/pci-error-recovery.txt
> > F: Documentation/powerpc/eeh-pci-error-recovery.txt
> > +F: arch/powerpc/kernel/eeh*.c
> > +F: drivers/pci/pcie/aer/
>
> Not sure about the AER code. You are not maintaining *that* at least :-)
>
> Maybe we should split EEH from the rest ?

Based on recent discussions (a month ago?) regarding AER, it's clear that at least some of the AER code is mis-designed, and that some of the patches being submitted against it were making things worse. I suggest keeping an eye on that ... the problem is that both AER and EEH share a common framework in the PCI subsystem. As bugs in AER get discovered, there's a chance that someone will submit a patch to the common framework, or possibly start modifying assorted drivers, which will then break EEH ... so I don't think it is wise/safe to ignore AER.

(The point is that AER and EEH really should work exactly the same; they differ merely by how they talk to the root port).

-- Linas

___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH RFC] Simplify the Linux kernel by reducing its state space
Hi,

I didn't actually try to compile the patch below; it didn't look like C code so I wasn't sure what compiler to run it through. I guess maybe it's Python? However, I'm very sure that the patches are completely correct, because I read them, and I also know Paul. And I've heard of Thomas Gleixner. Thus, please add my ack --

Ack'ed by: Linas Vepstas linasveps...@gmail.com

On Sun, Apr 01, 2012 at 12:33:21AM +0800, Paul E. McKenney was heard to remark:

Although there have been numerous complaints about the complexity of parallel programming (especially over the past 5-10 years), the plain truth is that the incremental complexity of parallel programming over that of sequential programming is not as large as is commonly believed. Despite what you might have heard, the mind-numbing complexity of modern computer systems is not due so much to there being multiple CPUs, but rather to there being any CPUs at all.

In short, for the ultimate in computer-system simplicity, the optimal choice is NR_CPUS=0. This commit therefore limits kernel builds to zero CPUs. This change has the beneficial side effect of rendering all kernel bugs harmless. Furthermore, this commit enables additional beneficial changes, for example, the removal of those parts of the kernel that are not needed when there are zero CPUs.

Signed-off-by: Paul E.
McKenney paul...@linux.vnet.ibm.com Reviewed-by: Thomas Gleixner t...@linutronix.de --- alpha/Kconfig | 11 ++- arm/Kconfig |6 +++--- blackfin/Kconfig |3 ++- hexagon/Kconfig |9 + ia64/Kconfig |9 + m32r/Kconfig | 10 ++ mips/Kconfig | 21 +++-- mn10300/Kconfig |3 ++- parisc/Kconfig|6 +++--- powerpc/platforms/Kconfig.cputype |8 s390/Kconfig | 12 +++- sh/Kconfig| 11 ++- sparc/Kconfig |8 tile/Kconfig |9 + x86/Kconfig | 16 +--- 15 files changed, 78 insertions(+), 64 deletions(-) diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index 56a4df9..1766b4a 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -541,14 +541,15 @@ config HAVE_DEC_LOCK default y config NR_CPUS - int Maximum number of CPUs (2-32) - range 2 32 + int Maximum number of CPUs (0-0) + range 0 0 depends on SMP - default 32 if ALPHA_GENERIC || ALPHA_MARVEL - default 4 if !ALPHA_GENERIC !ALPHA_MARVEL + default 0 if ALPHA_GENERIC || ALPHA_MARVEL + default 0 if !ALPHA_GENERIC !ALPHA_MARVEL help MARVEL support can handle a maximum of 32 CPUs, all the others - with working support have a maximum of 4 CPUs. + with working support have a maximum of 4 CPUs. But why take + chances? Just stick with zero CPUs. 
config ARCH_DISCONTIGMEM_ENABLE bool Discontiguous Memory Support (EXPERIMENTAL) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index a48aecc..1f07a3a 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1551,10 +1551,10 @@ config PAGE_OFFSET default 0xC000 config NR_CPUS - int Maximum number of CPUs (2-32) - range 2 32 + int Maximum number of CPUs (0-0) + range 0 0 depends on SMP - default 4 + default 0 config HOTPLUG_CPU bool Support for hot-pluggable CPUs (EXPERIMENTAL) diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig index abe5a9e..6a78549 100644 --- a/arch/blackfin/Kconfig +++ b/arch/blackfin/Kconfig @@ -241,7 +241,8 @@ config SMP config NR_CPUS int depends on SMP - default 2 if BF561 + range 0 0 + default 0 if BF561 config HOTPLUG_CPU bool Support for hot-pluggable CPUs diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig index 9059e39..daab009 100644 --- a/arch/hexagon/Kconfig +++ b/arch/hexagon/Kconfig @@ -158,13 +158,14 @@ config SMP config NR_CPUS int Maximum number of CPUs if SMP - range 2 6 if SMP - default 1 if !SMP - default 6 if SMP + range 0 0 if SMP + default 0 if !SMP + default 0 if SMP ---help--- This allows you to specify the maximum number of CPUs which this kernel will support. The maximum supported value is 6 and the - minimum value which makes sense is 2. + minimum value which makes sense is 2. But a limit of zero is + so much safer! This is purely to save memory - each supported CPU adds approximately eight kilobytes to the kernel image. diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index bd72669..fea0e6d 100644 --- a/arch/ia64/Kconfig
Re: [PATCH RFC] Simplify the Linux kernel by reducing its state space
Hi,

I didn't actually try to compile the patch below; it didn't look like C code so I wasn't sure what compiler to run it through. I guess maybe it's Python? However, I'm very sure that the patches are completely correct, because I read them, and I also know that Paul is a trustworthy programmer. Thus, please add my ack

Ack'ed by: Linas Vepstas linasveps...@gmail.com

On 31 March 2012 11:33, Paul E. McKenney paul...@linux.vnet.ibm.com wrote:

Although there have been numerous complaints about the complexity of parallel programming (especially over the past 5-10 years), the plain truth is that the incremental complexity of parallel programming over that of sequential programming is not as large as is commonly believed. Despite what you might have heard, the mind-numbing complexity of modern computer systems is not due so much to there being multiple CPUs, but rather to there being any CPUs at all.

In short, for the ultimate in computer-system simplicity, the optimal choice is NR_CPUS=0. This commit therefore limits kernel builds to zero CPUs. This change has the beneficial side effect of rendering all kernel bugs harmless. Furthermore, this commit enables additional beneficial changes, for example, the removal of those parts of the kernel that are not needed when there are zero CPUs.

Signed-off-by: Paul E.
McKenney paul...@linux.vnet.ibm.com Reviewed-by: Thomas Gleixner t...@linutronix.de --- alpha/Kconfig | 11 ++- arm/Kconfig |6 +++--- blackfin/Kconfig |3 ++- hexagon/Kconfig |9 + ia64/Kconfig |9 + m32r/Kconfig | 10 ++ mips/Kconfig | 21 +++-- mn10300/Kconfig |3 ++- parisc/Kconfig|6 +++--- powerpc/platforms/Kconfig.cputype |8 s390/Kconfig | 12 +++- sh/Kconfig| 11 ++- sparc/Kconfig |8 tile/Kconfig |9 + x86/Kconfig | 16 +--- 15 files changed, 78 insertions(+), 64 deletions(-) diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index 56a4df9..1766b4a 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -541,14 +541,15 @@ config HAVE_DEC_LOCK default y config NR_CPUS - int Maximum number of CPUs (2-32) - range 2 32 + int Maximum number of CPUs (0-0) + range 0 0 depends on SMP - default 32 if ALPHA_GENERIC || ALPHA_MARVEL - default 4 if !ALPHA_GENERIC !ALPHA_MARVEL + default 0 if ALPHA_GENERIC || ALPHA_MARVEL + default 0 if !ALPHA_GENERIC !ALPHA_MARVEL help MARVEL support can handle a maximum of 32 CPUs, all the others - with working support have a maximum of 4 CPUs. + with working support have a maximum of 4 CPUs. But why take + chances? Just stick with zero CPUs. 
config ARCH_DISCONTIGMEM_ENABLE bool Discontiguous Memory Support (EXPERIMENTAL) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index a48aecc..1f07a3a 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1551,10 +1551,10 @@ config PAGE_OFFSET default 0xC000 config NR_CPUS - int Maximum number of CPUs (2-32) - range 2 32 + int Maximum number of CPUs (0-0) + range 0 0 depends on SMP - default 4 + default 0 config HOTPLUG_CPU bool Support for hot-pluggable CPUs (EXPERIMENTAL) diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig index abe5a9e..6a78549 100644 --- a/arch/blackfin/Kconfig +++ b/arch/blackfin/Kconfig @@ -241,7 +241,8 @@ config SMP config NR_CPUS int depends on SMP - default 2 if BF561 + range 0 0 + default 0 if BF561 config HOTPLUG_CPU bool Support for hot-pluggable CPUs diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig index 9059e39..daab009 100644 --- a/arch/hexagon/Kconfig +++ b/arch/hexagon/Kconfig @@ -158,13 +158,14 @@ config SMP config NR_CPUS int Maximum number of CPUs if SMP - range 2 6 if SMP - default 1 if !SMP - default 6 if SMP + range 0 0 if SMP + default 0 if !SMP + default 0 if SMP ---help--- This allows you to specify the maximum number of CPUs which this kernel will support. The maximum supported value is 6 and the - minimum value which makes sense is 2. + minimum value which makes sense is 2. But a limit of zero is + so much safer! This is purely to save memory - each supported CPU adds approximately eight kilobytes to the kernel image. diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index bd72669
Re: [RFC PATCH] ppc: don't override CONFIG_PPC_PSERIES_DEBUG
On 14 October 2010 12:48, Nishanth Aravamudan n...@us.ibm.com wrote:
> These files undef DEBUG, but I think they were added before the ability to control this from Kconfig.

Right.

> It's really annoying to only get some of the debug messages!

I don't get the big picture. Will there be some CONFIG_DEBUG_EEH in Kconfig? Or just some option to turn on DEBUG for all powerpc-related files? Or maybe I am demonstrating my utter ignorance of some new whiz-bang Kconfig technology?

Anyway, I see no harm in the EEH portion of the patch.

--linas
Re: [PATCH v2] pseries: don't override CONFIG_PPC_PSERIES_DEBUG
On 14 October 2010 19:48, Nishanth Aravamudan n...@us.ibm.com wrote:
> eeh and pci_dlpar #undef DEBUG, but I think they were added before the
> ability to control this from Kconfig. It's really annoying to only get
> some of the debug messages from these files. Leave the lpar.c #undef
> alone as it produces so much output as to make the kernel unusable.
> Update the Kconfig text to indicate this particular quirk :)
>
> Signed-off-by: Nishanth Aravamudan n...@us.ibm.com

OK, ignore my last email.

Acked-by: Linas Vepstas linasveps...@gmail.com

> --- a/arch/powerpc/platforms/pseries/Kconfig
> +++ b/arch/powerpc/platforms/pseries/Kconfig
> @@ -47,6 +47,12 @@ config LPARCFG
>  config PPC_PSERIES_DEBUG
>  	depends on PPC_PSERIES && PPC_EARLY_DEBUG
>  	bool "Enable extra debug logging in platforms/pseries"
> +	help
> +	  Say Y here if you want the pseries core to produce a bunch of
> +	  debug messages to the system log. Select this if you are having a
> +	  problem with the pseries core and want to see more of what is
> +	  going on. This does not enable debugging in lpar.c, which must
> +	  be manually done due to its verbosity.
>  	default y

Umm, I see default y and you are not changing this but ... default y ?? Really?

Also, I am guessing that the lpar spam is due only to a handful of printk's, while most of the rest will be infrequent. Just knock out the high-frequency ones...

--linas
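The effect of a per-file `#undef DEBUG` can be sketched in standalone C. This is not the pseries code: `dbg()` is a hypothetical stand-in for a `pr_debug()`-style macro, and the leading `#define DEBUG` only imitates what a Kconfig option would presumably arrange through a `-DDEBUG` compiler flag.

```c
#include <stdio.h>

/* Imitates the build system turning debugging on for this file
 * (assumption: the Kconfig option ends up as -DDEBUG). */
#define DEBUG

/* The per-file override the patch removes from eeh and pci_dlpar:
 * it silently cancels the global setting. */
#undef DEBUG

static int messages_emitted;	/* counts debug messages actually printed */

#ifdef DEBUG
#define dbg(...) \
	do { messages_emitted++; fprintf(stderr, __VA_ARGS__); } while (0)
#else
#define dbg(...) do { } while (0)	/* compiled out entirely */
#endif

/* Hypothetical function that would like to log what it is doing. */
static int probe_slot(int slot)
{
	dbg("probing slot %d\n", slot);
	return slot >= 0;
}

static int debug_message_count(void)
{
	return messages_emitted;
}
```

With the `#undef` in place the message is never emitted, whatever the configuration says; deleting that one line restores it, which is roughly what the patch does for the two files it touches.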
Re: [PATCH 01/15] ppc: fix return type of BUID_{HI,LO} macros
Acked-by: Linas Vepstas linasveps...@gmail.com

I'm guessing this worked up til now because the rtas_call function prototype was telling the compiler to cast these to 32-bit before passing them as args. (And since these would still get passed as one arg per 64-bit reg, it still wouldn't go wrong.)

What I'm wondering about is why there was no compiler warning about an implicit cast of a 64-bit int to a 32-bit int? Surely, this is something that should be warned about!

-- Linas

On 15 September 2010 13:13, Nishanth Aravamudan n...@us.ibm.com wrote:
> BUID_HI and BUID_LO are used to pass data to call_rtas, which expects
> ints or u32s. But the macro doesn't cast the return, so the result is
> still u64. Use the upper_32_bits and lower_32_bits macros that have
> been added to kernel.h.
>
> Found by getting printf format errors trying to debug print the args,
> no actual code change for 64 bit kernels where the macros are actually
> used.
>
> Signed-off-by: Milton Miller milt...@bga.com
> Signed-off-by: Nishanth Aravamudan n...@us.ibm.com
> ---
>  arch/powerpc/include/asm/ppc-pci.h | 4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
> index 42fdff0..43268f1 100644
> --- a/arch/powerpc/include/asm/ppc-pci.h
> +++ b/arch/powerpc/include/asm/ppc-pci.h
> @@ -28,8 +28,8 @@ extern void find_and_init_phbs(void);
>  extern struct pci_dev *isa_bridge_pcidev; /* may be NULL if no ISA bus */
>
>  /** Bus Unit ID macros; get low and hi 32-bits of the 64-bit BUID */
> -#define BUID_HI(buid) ((buid) >> 32)
> -#define BUID_LO(buid) ((buid) & 0xffffffff)
> +#define BUID_HI(buid) upper_32_bits(buid)
> +#define BUID_LO(buid) lower_32_bits(buid)
>
>  /* PCI device_node operations */
>  struct device_node;
> --
> 1.7.0.4
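The type difference behind this patch can be shown in a small userspace sketch. The `_OLD` macro names, the sample value and `takes_u32()` are made up for illustration, and the two `*_32_bits` macros are local stand-ins modelled on the kernel helpers rather than the kernel's own definitions.

```c
#include <stdint.h>

/* Old form: the result keeps the 64-bit type of the argument. */
#define BUID_HI_OLD(buid)	((buid) >> 32)
#define BUID_LO_OLD(buid)	((buid) & 0xffffffff)

/* Userspace stand-ins for upper_32_bits()/lower_32_bits(). */
#define upper_32_bits(n)	((uint32_t)(((n) >> 16) >> 16))
#define lower_32_bits(n)	((uint32_t)(n))

/* New form: the result really is a 32-bit value. */
#define BUID_HI(buid)	upper_32_bits(buid)
#define BUID_LO(buid)	lower_32_bits(buid)

/* Made-up 64-bit identifier, chosen so both halves are recognisable. */
static const uint64_t example_buid = 0x0123456789abcdefULL;

/* Plays the role of a callee with 32-bit parameters: a 64-bit
 * argument is converted implicitly at the call site. */
static uint32_t takes_u32(uint32_t v)
{
	return v;
}
```

Passing the old macros to a function with 32-bit parameters yields the same values, which fits the guess that the prototype hid the problem. As for the missing warning: C permits this narrowing conversion implicitly, and GCC, for one, only reports it when `-Wconversion` is requested. A printf-style format check, by contrast, compares the argument type against the format string, which is where the mismatch surfaced.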
Re: [PATCH] powerpc: eeh: Fix oops when probing in early boot
On 10 May 2010 20:38, Anton Blanchard an...@samba.org wrote: If we take an EEH early enough, we oops: Call Trace: [c00010483770] [c0013ee4] .show_stack+0xd8/0x218 (unreliable) [c00010483850] [c0658940] .dump_stack+0x28/0x3c [c000104838d0] [c0057a68] .eeh_dn_check_failure+0x2b8/0x304 [c00010483990] [c00259c8] .rtas_read_config+0x120/0x168 [c00010483a40] [c0025af4] .rtas_pci_read_config+0xe4/0x124 [c00010483af0] [c037af18] .pci_bus_read_config_word+0xac/0x104 [c00010483bc0] [c08fec98] .pcibios_allocate_resources+0x7c/0x220 [c00010483c90] [c08feed8] .pcibios_resource_survey+0x9c/0x418 [c00010483d80] [c08fea10] .pcibios_init+0xbc/0xf4 [c00010483e20] [c0009844] .do_one_initcall+0x98/0x1d8 [c00010483ed0] [c08f0560] .kernel_init+0x228/0x2e8 [c00010483f90] [c0031a08] .kernel_thread+0x54/0x70 EEH: Detected PCI bus error on device null EEH: This PCI device has failed 1 times in the last hour: EEH: location=U78A5.001.WIH8464-P1 driver= pci addr=0001:00:01.0 EEH: of node=/p...@8002209/u...@1 EEH: PCI device/vendor: 00351033 EEH: PCI cmd/status register: 12100146 Unable to handle kernel paging request for data at address 0x0468 Oops: Kernel access of bad area, sig: 11 [#1] NIP [c0057610] .rtas_set_slot_reset+0x38/0x10c LR [c0058724] .eeh_reset_device+0x5c/0x124 Call Trace: [cbc6bd00] [c005a0e0] .pcibios_remove_pci_devices+0x7c/0xb0 (unreliable) [cbc6bd90] [c0058724] .eeh_reset_device+0x5c/0x124 [cbc6be40] [c00589c0] .handle_eeh_events+0x1d4/0x39c [cbc6bf00] [c0059124] .eeh_event_handler+0xf0/0x188 [cbc6bf90] [c0031a08] .kernel_thread+0x54/0x70 We called rtas_set_slot_reset while scanning the bus and before the pci_dn to pcidev mapping has been created. Since we only need the pcidev to work out the type of reset and that only gets set after the module for the device loads, lets just do a hot reset if the pcidev is NULL. 
Signed-off-by: Anton Blanchard an...@samba.org
---

Acked-by: Linas Vepstas linasveps...@gmail.com

I'm cc'ing Brian King, he's the one who figured out the proper fix for a hot-reset/fundamental-reset hardware feature that added this line of code.

The question is -- when the system finishes booting, and the module finally loads, will the device be found in a usable state and/or will it automatically reset to a usable state?

--linas

Index: linux-2.6/arch/powerpc/platforms/pseries/eeh.c
===
--- linux-2.6.orig/arch/powerpc/platforms/pseries/eeh.c 2010-05-10 17:25:10.703453565 +1000
+++ linux-2.6/arch/powerpc/platforms/pseries/eeh.c 2010-05-10 17:25:24.034323030 +1000
@@ -749,7 +749,7 @@ static void __rtas_set_slot_reset(struct
 	/* Determine type of EEH reset required by device,
 	 * default hot reset or fundamental reset
 	 */
-	if (dev->needs_freset)
+	if (dev && dev->needs_freset)
 		rtas_pci_slot_reset(pdn, 3);
 	else
 		rtas_pci_slot_reset(pdn, 1);
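The one-line fix amounts to a NULL-tolerant choice of reset type, sketched below in standalone C. `struct pci_dev_sketch`, the enum names and `choose_reset_type()` are hypothetical; only the values 1 and 3 and the `needs_freset` field come from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in holding the single pci_dev field the check reads. */
struct pci_dev_sketch {
	unsigned int needs_freset:1;
};

/* The two values the patch passes to rtas_pci_slot_reset(). */
enum {
	RESET_HOT = 1,
	RESET_FUNDAMENTAL = 3,
};

/* Fall back to a hot reset when no pci_dev is known yet, as happens
 * when an error is detected while the bus is still being scanned. */
static int choose_reset_type(const struct pci_dev_sketch *dev)
{
	if (dev && dev->needs_freset)
		return RESET_FUNDAMENTAL;
	return RESET_HOT;
}
```

The sketch deliberately leaves the question raised above open: a device that needs a fundamental reset but fails before its driver has loaded gets only a hot reset here.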
Re: [PATCH] eeh: Fixing a bug when pci structure is null
Hi Paul, Breno,

Some confusion -- I've been out of the loop for a while -- I assume it's still Paul who is pushing these patches upstream, and not Ben? So Breno, maybe you should resend the patch to Paul?

--linas

On 19 February 2010 10:43, Breno Leitao lei...@linux.vnet.ibm.com wrote:
> Hi Ben,
>
> I'd like to ask about this patch ? Should I re-submit ?
>
> Thanks,
>
> Breno Leitao wrote:
> > During an EEH recovery, the pci_dev structure can be null, mainly if an eeh event is detected during a pci config operation. In this case, the pci_dev will not be known (and will be null) and the kernel will crash with the following message:
Re: [PATCH 2/2] eeh: fixing pci_dev dependency
On 28 January 2010 18:04, Benjamin Herrenschmidt b...@kernel.crashing.org wrote:
> On Wed, 2010-01-27 at 12:43 -0600, lei...@linux.vnet.ibm.com wrote:
> > Currently pci_dev can be null when EEH is in action. This patch just ensures that pci_dev is not NULL before calling pci_dev_put.
>
> Like all variants of *_put(), it already checks for a NULL argument afaik. So that patch should be unnecessary.

Ah, OK, I panicked when I saw that and assumed the worst.

--linas
Re: [PATCH 1/3] Support for PCI Express reset type
Hi Andi,

2009/7/31 Andi Kleen a...@firstfloor.org:
> Mike Mason mm...@us.ibm.com writes:
> > These patches supersede the previously submitted patch that implemented a fundamental reset bit field. Please review and let me know of any concerns.
>
> Any plans to implement that for x86 too? Right now it seems to be a PPC specific hack.

I've found the PCIE chipspec somewhat daunting, but was under the impression that much if not most of what was needed was specified there. See, for example, Documentation/PCI/pcieaer-howto.txt, which states:

||| The PCI Express Advanced Error Reporting Driver Guide HOWTO
||| T. Long Nguyen tom.l.ngu...@intel.com
||| Yanmin Zhang yanmin.zh...@intel.com
||| 07/29/2006
[..]
||| The PCI Express AER driver provides the infrastructure to support PCI
||| Express Advanced Error Reporting capability. The PCI Express AER
||| driver provides three basic functions:
|||
||| - Gathers the comprehensive error information if errors occurred.
||| - Reports error to the users.
||| - Performs error recovery actions.

I presume the last bullet point means that the AER code works and actually does more or less the same thing as the PPC EEH code, but in a more architecture-independent way, as it only assumes that PCI AER is there (and is correctly implemented in the PCI chipset).

The AER code uses the same core infrastructure as the EEH code; at the time, I did exchange emails w/ the above authors discussing this stuff.

As to whether the x86 server vendors are actually selling something with AER in it, and whether any of them are actually testing this stuff is unclear. FWIW IBM has pretty much no incentive to lobby other server vendors to get on the ball ... as this is viewed as one of those things that lets IBM charge premium prices for PPC hardware.

--linas
Re: [PATCH 1/3] Support for PCI Express reset type
2009/7/30 Mike Mason mm...@us.ibm.com:
> This is the first of three patches that implement a bit field that PCI Express device drivers can use to indicate they need a fundamental reset during error recovery.
>
> By default, the EEH framework on powerpc does what's known as a hot reset during recovery of a PCI Express device. We've found a case where the device needs a fundamental reset to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction.
>
> The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev that indicates whether the device requires a fundamental reset during recovery.
>
> These patches supersede the previously submitted patch that implemented a fundamental reset bit field.
>
> Please review and let me know of any concerns.
>
> Signed-off-by: Mike Mason mm...@us.ibm.com
> Signed-off-by: Richard Lary rl...@us.ibm.com

Signed-off-by: Linas Vepstas linasveps...@gmail.com
Re: [PATCH 2/3] Support for PCI Express reset type
2009/7/30 Mike Mason mm...@us.ibm.com:
> This is the second of three patches that implement a bit field that PCI Express device drivers can use to indicate they need a fundamental reset during error recovery.
>
> By default, the EEH framework on powerpc does what's known as a hot reset during recovery of a PCI Express device. We've found a case where the device needs a fundamental reset to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction.
>
> The attached patch updates the Documentation/PCI/pci-error-recovery.txt file with changes related to this new bit field, as well as a few unrelated updates.
>
> These patches supersede the previously submitted patch that implemented a fundamental reset bit field.
>
> Please review and let me know of any concerns.
>
> Signed-off-by: Mike Mason mm...@us.ibm.com
> Signed-off-by: Richard Lary rl...@us.ibm.com

FWIW,
Signed-off-by: Linas Vepstas linasveps...@gmail.com
Re: [PATCH 3/3] Support for PCI Express reset type
2009/7/30 Mike Mason mm...@us.ibm.com:
> This is the third of three patches that implement a bit field that PCI Express device drivers can use to indicate they need a fundamental reset during error recovery.
>
> By default, the EEH framework on powerpc does what's known as a hot reset during recovery of a PCI Express device. We've found a case where the device needs a fundamental reset to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction.
>
> The attached patch makes changes to EEH to utilize the new bit field.
>
> These patches supersede the previously submitted patch that implemented a fundamental reset bit field.
>
> Please review and let me know of any concerns.
>
> Signed-off-by: Mike Mason mm...@us.ibm.com
> Signed-off-by: Richard Lary rl...@us.ibm.com

Signed-off-by: Linas Vepstas linasveps...@gmail.com

> +	/* Determine type of EEH reset required by device,
> +	 * default hot reset or fundamental reset
> +	 */
> +	if (dev->needs_freset)
> +		rtas_pci_slot_reset(pdn, 3);
> +	else
> +		rtas_pci_slot_reset(pdn, 1);

Gack! I remember deluges of emails and conference calls where the hardware guys went on about this; and I admit I didn't quite get it, which I guess is why this patch is showing up many years late.

FWIW some of the variants of the IPR chipset almost surely need the freset bit set.

--linas
Re: [PATCH] Support for PCI Express reset type in EEH
2009/7/24 Richard Lary rl...@us.ibm.com:
> Linas Vepstas linasveps...@gmail.com wrote on 07/23/2009 07:44:33 AM:
> > 2009/7/15 Mike Mason mm...@us.ibm.com:
> > > By default, EEH does what's known as a hot reset during error recovery of a PCI Express device. We've found a case where the device needs a fundamental reset to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction.
> > >
> > > The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev that indicates whether the device requires a fundamental reset during error recovery. This bit can be checked by EEH to determine which reset type is required.
> > >
> > > This patch supersedes the previously submitted patch that implemented a reset type callback.
> > >
> > > Please review and let me know of any concerns.
> >
> > I like this patch a *lot* better .. it is vastly simpler, more direct.
> >
> > > diff -uNrp a/include/linux/pci.h b/include/linux/pci.h
> > > --- a/include/linux/pci.h 2009-07-13 14:25:37.0 -0700
> > > +++ b/include/linux/pci.h 2009-07-15 10:25:37.0 -0700
> > > @@ -273,6 +273,7 @@ struct pci_dev {
> > >  	unsigned int ari_enabled:1; /* ARI forwarding */
> > >  	unsigned int is_managed:1;
> > >  	unsigned int is_pcie:1;
> > > +	unsigned int fndmntl_rst_rqd:1; /* Dev requires fundamental reset */
> > >  	unsigned int state_saved:1;
> > >  	unsigned int is_physfn:1;
> > >  	unsigned int is_virtfn:1;
> >
> > As Ben points out, the name is awkward. How about needs_freset ?
>
> I am OK with name change.
>
> > Since this affects the entire pci subsystem, it should be documented properly. The pci error recovery subsystem was designed to be usable in other architectures, and so the error recovery docs should take at least a paragraph to describe what this flag means, and when it's supposed to be used.
>
> I will update the documentation, are you referring to Documentation/powerpc/eeh-pci-error-recovery.txt or some other documentation?

No, I'm thinking Documentation/PCI/pci-error-recovery.txt because the flag is not powerpc-specific.

--linas

> > Providing the docs patch together with the pci.h patch *only* would probably simplify acceptance by the PCI community.
> >
> > --linas
Re: [PATCH] Hold reference to device_node during EEH event handling
2009/7/16 Michael Ellerman mich...@ellerman.id.au:
On Thu, 2009-07-16 at 09:33 -0700, Mike Mason wrote:
Michael Ellerman wrote:
On Wed, 2009-07-15 at 14:43 -0700, Mike Mason wrote:

This patch increments the device_node reference counter when an EEH error occurs and decrements the counter when the event has been handled. This is to prevent the device_node from being released until eeh_event_handler() has had a chance to deal with the event. We've seen cases where the device_node is released too soon when an EEH event occurs during a dlpar remove, causing the event handler to attempt to access bad memory locations.

Please review and let me know of any concerns.

Taking a reference sounds sane, but ...

Signed-off-by: Mike Mason mm...@us.ibm.com

--- a/arch/powerpc/platforms/pseries/eeh_event.c 2008-10-09 15:13:53.0 -0700
+++ b/arch/powerpc/platforms/pseries/eeh_event.c 2009-07-14 14:14:00.0 -0700
@@ -75,6 +75,14 @@ static int eeh_event_handler(void * dumm
 	if (event == NULL)
 		return 0;

+	/* EEH holds a reference to the device_node, so if it
+	 * equals 1 it's no longer valid and the event should
+	 * be ignored */
+	if (atomic_read(&event->dn->kref.refcount) == 1) {
+		of_node_put(event->dn);
+		return 0;
+	}

That's really gross :)

Agreed. I'll look for another way to determine if device is gone and the event should be ignored. Suggestions are welcome :-)

Benh and I had a quick chat about it, and were wondering whether what you really should be doing is taking a reference to the pci device (perhaps as well as the device node).

@@ -140,7 +149,7 @@ int eeh_send_failure_event (struct devic
 	if (dev)
 		pci_dev_get(dev);

-	event->dn = dn;
+	event->dn = of_node_get(dn);
 	event->dev = dev;

pci devs are refcounted too, see pci_dev_get(), so taking a reference there would be the right thing to do - otherwise there's no guarantee it still exists later, unless there's some other trick in the EEH code.

I thought that the eeh code did pci gets and puts in the right locations; perhaps I (incorrectly) assumed that this meant that the of_dn use count never dropped to zero ... I think my logic was:

-- pci device init does of_node_get
-- pci device shutdown does of_node_put
-- pci device shutdown can never run as long as pci use count is > 0

Thus, explicit of_node_get was usually not needed.

So, for example, see above: I was figuring that the pci_dev_get(dev); was enough to protect the dn too .. although maybe if dev is null, then things go wrong ...

--linas
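The get/put pairing under discussion can be sketched with a toy reference count. Everything here is hypothetical (`struct node`, `node_get()`, `node_put()`, the event helpers); the kernel uses kref-based atomic counters, whereas this sketch uses a plain int and is not thread-safe.

```c
#include <stddef.h>

/* Toy refcounted object standing in for a device_node. */
struct node {
	int refcount;
	int released;	/* set once the last reference is dropped */
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Like the *_put() variants mentioned in this thread, tolerate NULL. */
static void node_put(struct node *n)
{
	if (n && --n->refcount == 0)
		n->released = 1;	/* real code would free the object here */
}

struct event {
	struct node *dn;
};

/* Queueing an event takes its own reference on the node ... */
static void send_event(struct event *ev, struct node *dn)
{
	ev->dn = node_get(dn);
}

/* ... and the handler drops it only after it has finished with the node. */
static void handle_event(struct event *ev)
{
	/* recovery work using ev->dn would go here */
	node_put(ev->dn);
	ev->dn = NULL;
}
```

Because the event owns a reference, a removal that drops the original reference in between cannot release the node early, and the handler has no need to inspect the counter value at all.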
Re: [PATCH] Support for PCI Express reset type in EEH
2009/7/15 Mike Mason mm...@us.ibm.com:
> By default, EEH does what's known as a hot reset during error recovery of a PCI Express device. We've found a case where the device needs a fundamental reset to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction.
>
> The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev that indicates whether the device requires a fundamental reset during error recovery. This bit can be checked by EEH to determine which reset type is required.
>
> This patch supersedes the previously submitted patch that implemented a reset type callback.
>
> Please review and let me know of any concerns.

I like this patch a *lot* better .. it is vastly simpler, more direct.

> diff -uNrp a/include/linux/pci.h b/include/linux/pci.h
> --- a/include/linux/pci.h 2009-07-13 14:25:37.0 -0700
> +++ b/include/linux/pci.h 2009-07-15 10:25:37.0 -0700
> @@ -273,6 +273,7 @@ struct pci_dev {
>      unsigned int ari_enabled:1; /* ARI forwarding */
>      unsigned int is_managed:1;
>      unsigned int is_pcie:1;
> +    unsigned int fndmntl_rst_rqd:1; /* Dev requires fundamental reset */
>      unsigned int state_saved:1;
>      unsigned int is_physfn:1;
>      unsigned int is_virtfn:1;

As Ben points out, the name is awkward. How about needs_freset ?

Since this affects the entire pci subsystem, it should be documented properly. The pci error recovery subsystem was designed to be usable in other architectures, and so the error recovery docs should take at least a paragraph to describe what this flag means, and when it's supposed to be used.

Providing the docs patch together with the pci.h patch *only* would probably simplify acceptance by the PCI community.

--linas
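As a rough picture of how such a one-bit flag would be consumed, here is a small stand-alone sketch. Everything in it is an assumption for illustration: struct dev_flags is a cut-down stand-in for struct pci_dev, the field uses the needs_freset name suggested above, and pick_reset() with its enum is hypothetical rather than anything in the EEH code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct pci_dev. */
struct dev_flags {
    unsigned int is_pcie:1;
    unsigned int needs_freset:1;  /* device requires a fundamental reset */
};

/* The two reset flavors discussed in the patch description. */
enum reset_type { RESET_HOT, RESET_FUNDAMENTAL };

/* Pick the reset flavor a recovery path would request for this device. */
static enum reset_type pick_reset(const struct dev_flags *d)
{
    assert(d != NULL);
    return d->needs_freset ? RESET_FUNDAMENTAL : RESET_HOT;
}
```

A driver would set the bit once at probe time; the recovery code only ever reads it, which is why a bit field is enough and no callback is needed.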
Re: [PATCH] Set error_state to pci_channel_io_normal in eeh_report_reset()
Hi,

2009/4/10 Mike Mason mm...@us.ibm.com:
> While adding native EEH support to Emulex and QLogic drivers, it was discovered that dev->error_state was set to pci_channel_io_normal too late in the recovery process. These drivers rely on error_state to determine if they can access the device in their slot_reset callback, thus error_state needs to be set to pci_channel_io_normal in eeh_report_reset(). Below is a detailed explanation (courtesy of Richard Lary) as to why this is necessary.
>
> Background:
>
> PCI MMIO or DMA accesses to a frozen slot generate additional EEH errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS the adapter will be shutdown. To avoid triggering excessive EEH errors and an undesirable adapter shutdown, some drivers use the pci_channel_offline(dev) wrapper function to return a Boolean value based on the value of pci_dev->error_state to determine if PCI MMIO or DMA accesses are safe. If the wrapper returns TRUE, drivers must not make PCI MMIO or DMA access to their hardware.
>
> The pci_dev structure member error_state reflects one of three values: 1) pci_channel_io_normal, 2) pci_channel_io_frozen, 3) pci_channel_io_perm_failure. Function pci_channel_offline(dev) returns TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure.
>
> The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the point where the PCI slot is frozen. Currently, the EEH driver restores dev->error_state to pci_channel_io_normal in eeh_report_resume() before calling the driver's resume callback. However, when the EEH driver calls the driver's slot_reset callback() from eeh_report_reset(), it incorrectly indicates the error state is still pci_channel_io_frozen.
>
> Waiting until eeh_report_resume() to restore dev->error_state to pci_channel_io_normal is too late for Emulex and QLogic FC drivers and any other drivers which are designed to use common code paths in these two cases: i) those called after the driver's slot_reset callback() and ii) those called after the PCI slot is frozen but before the driver's slot_reset callback is called. Case i) all driver paths executed to reinitialize the hardware after a reset and case ii) all code paths executed by driver kernel threads that run asynchronous to the main driver thread, such as interrupt handlers and worker threads to process driver work queues.
>
> Emulex and QLogic FC drivers are designed with common code paths which require that pci_channel_offline(dev) reflect the true state of the hardware. The state transitions that the hardware takes from Normal Operations to Slot Frozen to Reset to Normal Operations are documented in the Power Architecture™ Platform Requirements+ (PAPR+) in Table 75. PE State Control. PAPR defines the following 3 states:
>
> 0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed (Normal Operations)
> 1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled
> 2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled (Slot Frozen)
>
> An EEH error places the slot in state 2 (Frozen) and the adapter driver is notified that an EEH error was detected. If the adapter driver returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls eeh_reset_device() to place the slot into state 1 (Reset) and eeh_reset_device() completes by placing the slot into state 0 (Normal Operations). Upon return from eeh_reset_device(), the EEH driver calls eeh_report_reset(), which then calls the adapter's slot_reset callback. At the time the adapter's slot_reset callback is called, the true state of the hardware is Normal Operations and should be accurately reflected by setting dev->error_state to pci_channel_io_normal. The current implementation of EEH driver does not do so and requires the following patch to correct this deficiency.
>
> Signed-off-by: Mike Mason mm...@us.ibm.com

Yes, the analysis sounds correct; this looks like the right thing to do.

I'm rather surprised, as this is an obvious bug, and should have been long gone. I thought that Emulex, QLogic were not the only ones that were using pci_channel_offline(dev) in this fashion. I thought that the symbios scsi and the e1000 did, too ... Hmm. Perhaps these used their own, private flag for the same purpose, and reset it at the earlier, correct time.

Thanks for the fix!

Signed-off-by: Linas Vepstas linasveps...@gmail.com
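The ordering problem described above can be reproduced in a few lines of user-space C. This is a toy model, not the EEH code: struct fake_dev, channel_offline(), driver_slot_reset() and report_reset() are hypothetical names, and the enum values are illustrative. It shows a driver callback that shares a code path guarded by the offline check, and how that callback does nothing useful unless the state is flipped to normal before it is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the three states named in the quoted text; values are illustrative. */
enum channel_state {
    CHANNEL_IO_NORMAL = 1,
    CHANNEL_IO_FROZEN,
    CHANNEL_IO_PERM_FAILURE,
};

struct fake_dev {
    enum channel_state error_state;
    int mmio_attempts;    /* counts hardware accesses made by the callback */
};

/* Same rule as described for pci_channel_offline(): offline unless normal. */
static bool channel_offline(const struct fake_dev *d)
{
    assert(d != NULL);
    return d->error_state != CHANNEL_IO_NORMAL;
}

/* Hypothetical driver callback that reuses a guarded common code path. */
static void driver_slot_reset(struct fake_dev *d)
{
    if (channel_offline(d))
        return;             /* refuses to touch the hardware */
    d->mmio_attempts++;     /* stands in for re-initializing the adapter */
}

/* Recovery step: with fixed == true the state is restored before the callback. */
static void report_reset(struct fake_dev *d, bool fixed)
{
    if (fixed)
        d->error_state = CHANNEL_IO_NORMAL;
    driver_slot_reset(d);
}
```

Calling report_reset() with fixed == false models the old behavior, where the callback sees a frozen channel and skips re-initialization even though the slot has already been reset.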
Re: [PATCH] Only disable/enable LSI interrupts in EEH
2009/2/9 Mike Mason mm...@us.ibm.com:
> The EEH code disables and enables interrupts during the device recovery process. This is unnecessary for MSI and MSI-X interrupts because they are effectively disabled by the DMA Stopped state when an EEH error occurs. The current code is also incorrect for MSI-X interrupts. It doesn't take into account that MSI-X interrupts are tracked in a different way than LSI/MSI interrupts. This patch ensures only LSI interrupts are disabled/enabled.
>
> The patch also includes a couple minor formatting fixes.
>
> Signed-off-by: Mike Mason mm...@us.ibm.com

Looks good to me.

Acked-by: Linas Vepstas linasveps...@gmail.com

On a somewhat-related note: there was an issue (I forget the details) where the kernel needed to shadow some sort of MSI state so that it could be correctly, um, kept-track-of, after an EEH reset (it didn't need to be restored, because firmware did this(?)). After some digging around and discussion, we concluded that some generic PPC MSI code needed to be altered to track this state, and/or the main kernel MSI code needed to be changed to (not?) track this state. Mike Ellerman seemed to best grasp this area ... was this ever fixed? Or perhaps this is an alternate fix for that bug? It may well have been that calling the MSI disable triggered the problem, I don't remember now.

--linas
Re: [PATCH] Only disable/enable LSI interrupts in EEH
2009/2/10 Michael Ellerman mich...@ellerman.id.au:
> On Tue, 2009-02-10 at 11:14 -0600, Linas Vepstas wrote:
> > On a somewhat-related note: there was an issue (I forget the details) where the kernel needed to shadow some sort of MSI state so that it could be correctly, um, kept-track-of, after an EEH reset (it didn't need to be restored, because firmware did this(?)). After some digging around and discussion, we concluded that some generic PPC MSI code needed to be altered to track this state, and/or the main kernel MSI code needed to be changed to (not?) track this state. Mike Ellerman seemed to best grasp this area ... was this ever fixed? Or perhaps this is an alternate fix for that bug? It may well have been that calling the MSI disable triggered the problem, I don't remember now.
>
> I'm pretty sure you're referring to this patch, which you acked :)
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1db3e890aed3ac39cded30d6e94618bda086f7ce
>
> I don't know of anything else that fits your description?

Yes, that's the one. I wasn't sure if it ever made it in or not, and I just wanted to make sure it wasn't what was biting you.

--linas
Re: [PATCH] Restore PERR/SERR bit settings during EEH device recovery
2008/7/7 Mike Mason [EMAIL PROTECTED]:
> The following patch restores the PERR and SERR bits in the PCI command register during an EEH device recovery. We have found at least one case (an Agilent test card) where the PERR/SERR bits are set to 1 by firmware at boot time, but are not restored to 1 during EEH recovery.

Any chance they should be zero, and were accidentally set to 1? In which case, you'd need an else clause, below.

> The patch fixes the Agilent card problem. It has been tested on several other EEH-enabled cards with no regressions.
>
> Signed-off-by: Mike Mason [EMAIL PROTECTED]
>
> --- linux-2.6.26-rc9/arch/powerpc/platforms/pseries/eeh.c 2008-07-07 16:06:57.0 -0700
> +++ linux-2.6.26-rc9-new/arch/powerpc/platforms/pseries/eeh.c 2008-07-07 16:11:10.0 -0700
> @@ -812,6 +812,7 @@
>  static inline void __restore_bars (struct pci_dn *pdn)
>  {
>      int i;
> +    u32 cmd;
>
>      if (NULL==pdn->phb) return;
>      for (i=4; i<10; i++) {
> @@ -832,6 +833,15 @@
>
>      /* max latency, min grant, interrupt pin and line */
>      rtas_write_config(pdn, 15*4, 4, pdn->config_space[15]);
> +
> +    /* Restore PERR & SERR bits, some devices require it,
> +       don't touch the other command bits */
> +    rtas_read_config(pdn, PCI_COMMAND, 4, &cmd);
> +    if (pdn->config_space[1] & PCI_COMMAND_PARITY)
> +        cmd |= PCI_COMMAND_PARITY;

     else cmd &= ~PCI_COMMAND_PARITY;

> +    if (pdn->config_space[1] & PCI_COMMAND_SERR)
> +        cmd |= PCI_COMMAND_SERR;

     else cmd &= ~PCI_COMMAND_SERR;

> +    rtas_write_config(pdn, PCI_COMMAND, 4, cmd);
>  }

Other than that, I'll add an

Acked-by: Linas Vepstas [EMAIL PROTECTED]

--linas
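The "set it or clear it" logic suggested above collapses into a single mask operation. The sketch below is stand-alone user-space C, not the eeh.c code: restore_perr_serr() is a hypothetical helper, and it works on plain integers instead of calling rtas_read_config()/rtas_write_config(). The two bit values are the standard PCI command register definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as defined for the PCI command register. */
#define PCI_COMMAND_PARITY 0x40   /* parity error response (PERR) */
#define PCI_COMMAND_SERR   0x100  /* SERR# enable */

/*
 * Copy only the PERR/SERR bits from the saved command value into the
 * current one, setting or clearing each as needed; every other bit of
 * *cmd is left exactly as it was.
 */
static void restore_perr_serr(uint32_t *cmd, uint32_t saved)
{
    const uint32_t mask = PCI_COMMAND_PARITY | PCI_COMMAND_SERR;

    assert(cmd != NULL);
    *cmd = (*cmd & ~mask) | (saved & mask);
}
```

Handling both directions means a bit that firmware left at zero is also restored to zero, which is the case the else clauses were meant to cover.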
Re: [PATCH] pseries: phyp dump: Variable size reserve space.
On 07/04/2008, Manish Ahuja [EMAIL PROTECTED] wrote:
> A small proposed change in the amount of reserve space we allocate during boot. Currently we reserve 256MB only. The proposed change does one of the 3 things.
>
> A. It checks to see if there is cmdline variable set and if found sets the value to it. OR
> B. It computes 5% of total ram and rounds it down to multiples of 256MB. AND
> C. Compares the rounded down value and returns larger of two values, the new computed value or 256MB.
>
> Again this is for large systems who have excess memory.

[...]

> early_param("phyp_dump", early_phyp_dump_enabled);

I'm pretty sure you will want to document this boot param in the documentation, as well as add a few words about why it might be interesting to users (i.e. that it's for large systems...)

--linas
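For anyone trying to picture rules A, B and C together, here is a small user-space sketch of the arithmetic. It is an illustration only: reserve_size() and its parameters are hypothetical, "0 means no boot parameter was given" is an assumption, and the real patch works on kernel memory-size values rather than plain integers.

```c
#include <assert.h>
#include <stdint.h>

#define MB          (1024ULL * 1024ULL)
#define MIN_RESERVE (256ULL * MB)

/*
 * cmdline_bytes: size requested via a boot parameter, 0 if none (assumption).
 * ram_bytes:     total system RAM in bytes.
 */
static uint64_t reserve_size(uint64_t cmdline_bytes, uint64_t ram_bytes)
{
    uint64_t five_pct;

    assert(ram_bytes != 0);

    if (cmdline_bytes)
        return cmdline_bytes;               /* A: explicit override wins */

    five_pct = ram_bytes / 20;              /* B: 5% of total RAM ... */
    five_pct -= five_pct % MIN_RESERVE;     /*    ... rounded down to 256MB */

    /* C: never reserve less than the old fixed 256MB */
    return five_pct > MIN_RESERVE ? five_pct : MIN_RESERVE;
}
```

On a 4GB machine 5% is about 205MB, which rounds down to zero, so the 256MB floor applies; on a 64GB machine 5% is about 3277MB, which rounds down to 3072MB. That difference is the "large systems" case the boot parameter documentation should spell out.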
Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
On 11/03/2008, Paul Mackerras [EMAIL PROTECTED] wrote:
> > --
>
> This line needs to be exactly 3 dashes, because otherwise the tools include the diffstat into the commit message. Putting 4 or more dashes was an annoying habit Linas had, and it means I have to fix it manually (usually after I have committed the patches, and then notice that the commit message has the extra stuff in it, so I have to go back and fix the separators, reset my tree and re-commit the patches.)

Sorry, I had no idea! If I didn't have enough dashes, then quilt would sometimes wipe out the comment at the top, so paranoia made me add lots of dashes.

--linas
Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
On 10/03/2008, Michael Ellerman [EMAIL PROTECTED] wrote:
> On Thu, 2008-02-28 at 18:24 -0600, Manish Ahuja wrote:
> > +
> > +/* Global, used to communicate data between early boot and late boot */
> > +static struct phyp_dump phyp_dump_global;
> > +struct phyp_dump *phyp_dump_info = &phyp_dump_global;
>
> I don't see the point of this. You have a static (ie. non-global) struct called phyp_dump_global, then you create a pointer to it and pass that around.

I did this. This is a style used to minimize disruption due to future design changes. Basically, the idea is that, at some later time, for some unknown reason, we decide that this structure shouldn't be global, or maybe shouldn't be statically allocated, or maybe should be per-cpu, or who knows. By creating a pointer, and just passing that around, you isolate other code from this change.

I learned this trick after spending too many months of my life hunting down globals and replacing them by dynamically allocated structs. It's a long and painful process, on many levels, often requiring major code restructuring. Code that touches globals directly is often poorly thought out, designed.

But going in the opposite direction is easy: if your code always passes everything it needs as args to subroutines, then you are free & clear ... if one of those args just happens to be a pointer to a global, there's no loss (not even a performance loss -- the arg passing overhead is about the same as a global TOC lookup!)

So it may look weird if you're not used to seeing it; but the alternative is almost always worse.

--linas
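A compact way to see the argument is to write the pattern out in user-space C. The names below (struct dump_state, dump_state_storage, dump_info, dump_reserve(), dump_release()) are all invented for this sketch and are not the phyp dump code; only the shape, a static object reached solely through a pointer that is passed as an argument, comes from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state block; field names are made up for illustration. */
struct dump_state {
    unsigned long reserved_bytes;
    int active;
};

/* The only two lines that know the storage happens to be a static object. */
static struct dump_state dump_state_storage;
static struct dump_state *dump_info = &dump_state_storage;

/* Callers receive the pointer and never name the global directly. */
static void dump_reserve(struct dump_state *st, unsigned long bytes)
{
    assert(st != NULL);
    st->reserved_bytes = bytes;
    st->active = 1;
}

/* Returns how many bytes were handed back. */
static unsigned long dump_release(struct dump_state *st)
{
    unsigned long freed;

    assert(st != NULL);
    freed = st->reserved_bytes;
    st->reserved_bytes = 0;
    st->active = 0;
    return freed;
}
```

Because dump_reserve() and dump_release() only see a pointer, they work unchanged on a stack or heap instance, which is the kind of later redesign the pattern is meant to keep cheap.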
Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
On 14/02/2008, Tony Breeds [EMAIL PROTECTED] wrote:
> On Tue, Feb 12, 2008 at 01:11:58AM -0600, Manish Ahuja wrote:
> > +static ssize_t
> > +show_release_region(struct kset * kset, char *buf)
> > +{
> > +    return sprintf(buf, "ola\n");
> > +}
> > +
> > +static struct subsys_attribute rr = __ATTR(release_region, 0600,
> > +                                           show_release_region,
> > +                                           store_release_region);
>
> Any reason this sysfs attribute can't be write only? The show method doesn't seem needed.

This was supposed to be a place-holder; a later patch would add detailed info. The goal was to have user-land tools that would operate these files to progressively dump and release memory regions; however, until these userland tools get written, the proper interface remains murky (e.g. real addresses? virtual addresses? just delta's or a whole memory map? some sort of numa flags or whatever?)

--linas
Re: [PATCH 1/8] pseries: phyp dump: Docmentation
On 13/01/2008, Olof Johansson [EMAIL PROTECTED] wrote:
> How do you expect to have it in full production if you don't have all resources available for it? It's not until the dump has finished that you can return all memory to the production environment and use it.

With the PHYP dump, each chunk of RAM is returned for general use immediately after being dumped; so it's not an all-or-nothing proposition. Production systems don't often hit 100% RAM use right out of the gate, they often take hours or days to get there, so again, there should be time to dump.

> This can very easily be argued in both direction, with no clear winner: If the crash is stress-induced (say a slashdotted website), for those cases it seems more rational to take the time, collect _good data_ even if it takes a little longer, and then go back into production. Especially if the alternative is to go back into production immediately, collect about half of the data, and then crash again. Rinse and repeat.

Again, the mode of operation for the phyp dump is that you'll always have all of the data from the *first* crash, even if there are multiple crashes. That's because the as-yet undumped RAM is not put back into production.

> really surprises me that there's no way to reset a device through PHYP though. Seems like such a fundamental feature.

I don't know who said that; that's not right. The EEH function certainly does allow you to halt/restart PCI traffic to a particular device and also to reset the device. So, yes, the pSeries kexec code should call into the eeh subsystem to rationalize the device state.

> I think people are overly optimistic if they think it'll be possible to do all of this reliably (as in with consistent performance) without a second reboot though.

The NUMA issues do concern me. But then, the whole virtualized, fractional-cpu, tickless operation stuff sounds like a performance tuning nightmare to begin with.

> At least without similar amounts of work being done as it would have taken to fix kdump's reliability in the first place. :-)
>
> Speaking of reboots. PHYP isn't known for being quick at rebooting a partition, it used to take in the order of minutes even on a small machine. Has that been fixed?

Dunno. Probably not.

> If not, the avoiding an extra reboot argument hardly seems like a benefit versus kdump+kexec, which reboots nearly instantly and without involvement from PHYP.

OK, let me tell you what I'm up against right now. I'm dealing with sporadic corruption on my home box. About a month ago, I bought a whizzy ASUS M2NE motherboard, an AMD64 2-core cpu, and two sticks of RAM, 1GB per stick. I have one new hard drive, SATA, and one old hard drive, from my old machine, the PATA. The two disks are mirrored in a RAID-1 config. Running Ubuntu.

During install/upgrade a month ago, I noticed some of the install files seemed to have gotten corrupted, but that downloading them again got me a working version. This put a serious frown on my face: maybe a bad ethernet card or connection !?

Two weeks ago, gcc stopped working one morning, although it worked fine the night before. I'd done nothing in the interim but sleep. Reinstalling it made it work again. Yesterday, something else stopped working. I found the offending library, I compared file checksums against a known-good version, and they were off. (!!!) Disk corruption?

Then apt-get stopped working. The /var/lib/dpkg/status file had randomly corrupted single bytes. It's ASCII, I hand repaired it; it had maybe 10 bad bytes out of 2MB total size.

I installed tripwire. Between the first run of tripwire, and the second, less than an hour later, it reported several dozen files have changed checksums. Manual inspection of some of these files against known-good versions show that, at least this morning, that's no longer the case. System hasn't crashed in a month, since first boot.

So what's going on? Is it possible that one of the two disks is serving up bad data, which explains the funny checksum behaviour? Or maybe it's bad RAM, so that a fresh disk read shows good data? If it's bad RAM, why doesn't the system crash? I forced fsck last night, fsck came back spotless.

So ... moral of the story: If phyp is doing some sort of hardware checks and validation, that's great. I wish I could afford a pSeries system for my home computer, because my impression is that they are very stable, and don't do things like data corruption. I'm such a friggin cheapskate that I can't bear to spend many thousands instead of many hundreds of dollars. However, I will trade a longer boot for the dream of higher reliability.

--linas
Re: [PATCH 1/8] pseries: phyp dump: Docmentation
On 10/01/2008, Nathan Lynch [EMAIL PROTECTED] wrote:
> Mike Strosaker wrote:
> > At the risk of repeating what others have already said, the PHYP-assistance method provides some advantages that the kexec method cannot:
> > - Availability of the system for production use before the dump data is collected. As was mentioned before, some production systems may choose not to operate with the limited memory initially available after the reboot, but it sure is nice to provide the option.
>
> I'm more concerned that this design encourages the user to resume a workload *which is almost certainly known to result in a system crash* before collection of crash data is complete. Maybe the gamble will pay off most of the time, but I wouldn't want to be working support when it doesn't.

Workloads that cause crashes within hours of startup tend to be weeded-out/discovered during pre-production test of the system to be deployed. Since it's pre-production test, dumps can be taken in a leisurely manner. Heck, even a session at the xmon prompt can be contemplated.

The problem is when the crash only reproduces after days or weeks of uptime, on a production machine. Since the machine is in production, it's got to be brought back up ASAP. Since it's crashing only after days/weeks, the dump should have plenty of time to complete. (And if it crashes quickly after that reboot ... well, support people always welcome ways in which a bug can be reproduced more quickly/easily).

--linas
Re: [PATCH 1/8] pseries: phyp dump: Docmentation
On 10/01/2008, Olof Johansson [EMAIL PROTECTED] wrote:
> On Wed, Jan 09, 2008 at 10:12:13PM -0600, Linas Vepstas wrote:
> > On 09/01/2008, Olof Johansson [EMAIL PROTECTED] wrote:
> > > On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote:
> > > > Heh. That's the elbow-grease of this thing. The easy part is to get the core function working. The hard part is to test these various configs, and when they don't work, figure out what went wrong. That will take perseverance and brains.
> > >
> > > This just sounds like a whole lot of extra work to get a feature that already exists.
> >
> > Well, no. kexec is horribly ill-behaved with respect to PCI. The kexec kernel starts running with PCI devices in some random state; maybe they're DMA'ing or who knows what. kexec tries real hard to whack a few needed pci devices into submission but it has been hit-n-miss, and the source of 90% of the kexec headaches and debugging effort. It's not pretty.
>
> It surprises me that this hasn't been possible to resolve with less than architecting a completely new interface, given that the platform has all this fancy support for isolating and resetting adapters. After all, the exact same thing has to be done by the hypervisor before rebooting the partition.

OK, point taken.

-- The phyp interfaces are there for AIX, which I guess must not have kexec-like ability. So this is a case of Linux leveraging a feature architected for AIX.

-- There's also this idea, somewhat weak, that the crash may have corrupted the ram where the kexec kernel sits. For someone who is used to seeing crashes due to null pointer deref's, this seems fairly unlikely. But perhaps crashes in production systems are more mind-bending. (We did have a case where a USB stick used for boot continued to scribble on memory long after it was supposed to be quiet and unused. This resulted in a very hard to debug crash.)

A solution to a corrupted kexec kernel would be to disable memory access to where kexec sits, e.g. un-mapping or making r/o the pages where it lies. This begs the question of who unhides the kexec kernel, and what if this 'who' gets corrupted?

In short, the kexec kernel does not boot exactly the same as a cold boot, and so this opens a can of worms about "well, what's different, how do we minimize these differences, etc." and I think that led AIX to punt, and say "let's just use one single, well-known boot loader/boot sequence instead of inventing a new one", thus leading to the phyp design.

But that's just my guess.. :-)

--linas
Re: [PATCH 1/8] pseries: phyp dump: Docmentation
On 09/01/2008, Nathan Lynch [EMAIL PROTECTED] wrote:
> Hi Linas,
>
> Linas Vepstas wrote:
> > As a side effect, the system is in production *while* the dump is being taken;
>
> A dubious feature IMO.

Hmm. Take it up with Ken Rozendal, this is supposed to be one of the two main selling points of this thing.

> Seems that the design potentially trades reliability of first failure data capture for availability. E.g. system crashes, reboots, resumes processing while copying dump, crashes again before dump procedure is complete. How is that handled, if at all?

It's handled by the hypervisor. phyp maintains the copy of the RMO of first crash, until such time that the OS declares the dump of the RMO to be complete. So you'll always have the RMO of the first crash.

For the rest of RAM, it will come in two parts: some portion will have been dumped already. The rest has not yet been dumped, and it will still be there, preserved across the second crash.

So you get both RMO and all of RAM from the first crash.

> > with kdump, you can't go into production until after the dump is finished, and the system has been rebooted a second time. On systems with terabytes of RAM, the time difference can be hours.
>
> The difference in time it takes to resume the normal workload may be significant, yes. But the time it takes to get a usable dump image would seem to be basically the same.

Yes.

> Since you bring up large systems... a system with terabytes of RAM is practically guaranteed to be a NUMA configuration with dozens of cpus. When processing a dump on such a system, I wonder how well we fare: can we successfully boot with (say) 128 cpus and 256MB of usable memory? Do we have to hot-online nodes as system memory is freed up (and does that even work)? We need to be able to restore the system to its optimal topology when the dump is finished; if the best we can do is a degraded configuration, the workload will suffer and the system admin is likely to just reboot the machine again so the kernel will have the right NUMA topology.

Heh. That's the elbow-grease of this thing. The easy part is to get the core function working. The hard part is to test these various configs, and when they don't work, figure out what went wrong. That will take perseverance and brains.

--linas
Re: [PATCH 1/8] pseries: phyp dump: Docmentation
On 09/01/2008, Michael Ellerman [EMAIL PROTECTED] wrote:
> > Only if you can get at rtas, but you can't get at rtas at that point.
>
> AFAICT you don't need to get at RTAS, you just need to look at the device tree to see if the property is present, and that is trivial.
>
> You probably just need to add a check in early_init_dt_scan_rtas() which sets a flag for the PHYP dump stuff, or add your own scan routine if you need.

I no longer remember the details. I do remember spending a lot of time trying to figure out how to do this. I know I didn't want to write my own scan routine; maybe that's what stopped me. As it happens, we also did most of the development on a broken phyp which simply did not even have this property, no matter what, and so that may have brain-damaged me.

I went for the most elegant solution, where "most elegant" is defined as "fewest lines of code", "least effort", etc.

Manish may need some hands-on help to extract this token during early boot. Hopefully, he'll let us know.

--linas
Re: [PATCH 1/8] pseries: phyp dump: Docmentation
On 09/01/2008, Olof Johansson [EMAIL PROTECTED] wrote:
> On Wed, Jan 09, 2008 at 08:33:53PM -0600, Linas Vepstas wrote:
> > Heh. That's the elbow-grease of this thing. The easy part is to get the core function working. The hard part is to test these various configs, and when they don't work, figure out what went wrong. That will take perseverance and brains.
>
> This just sounds like a whole lot of extra work to get a feature that already exists.

Well, no. kexec is horribly ill-behaved with respect to PCI. The kexec kernel starts running with PCI devices in some random state; maybe they're DMA'ing or who knows what. kexec tries real hard to whack a few needed pci devices into submission but it has been hit-n-miss, and the source of 90% of the kexec headaches and debugging effort. It's not pretty.

If all pci-host bridges could shut-down or settle the bus, and raise the #RST line high, and then if all BIOS'es supported this, you'd be right. But they can't

--linas
Re: [PATCH 4/8] pseries: phyp dump: use sysfs to release reserved mem
On 07/01/2008, Stephen Rothwell [EMAIL PROTECTED] wrote:
> On Mon, 07 Jan 2008 18:21:57 -0600 Manish Ahuja [EMAIL PROTECTED] wrote:
> > +static int __init phyp_dump_setup(void)
> > +{
> > +    /* Is there dump data waiting for us? */
> > +    rtas = of_find_node_by_path("/rtas");
> > +    dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
>                                                                ^^^
> You could pass NULL here as header_len appears to be unused.
>
> Also you need of_node_put(rtas) somewhere (probably just here would do).

Perhaps the routine should have been of_get_node_by_path() ?

In ye olden days, finds didn't require put, but gets did. I'm guessing that this has now all been fixed up for the of_xxx routines, but I think that pci_find_xxx still does not require a pci_put.

Why did I bother to write this email? I dunno...

--linas
Re: [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept
On 07/01/2008, Arnd Bergmann [EMAIL PROTECTED] wrote:
> On Tuesday 08 January 2008, Manish Ahuja wrote:
> > Initial patch for reserving memory in early boot, and freeing it later. If the previous boot had ended with a crash, the reserved memory would contain a copy of the crashed kernel data.
> >
> > Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
> > Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
>
> I think the signed-off-by chain needs to be modified. The way it appears, you handled the patch first, then sent it to Linas, who forwarded it to whoever will take the patches from the list.

Well, -- there was dual authorship. I remangled the patches while Manish wrote code & tested. And I'd mailed them out the first time around, so you could say I forwarded after heavy editing.

> This obviously isn't true, since you are actually the one who is sending out the patches. Moreover, I believe that the [EMAIL PROTECTED] address is now dead, and shouldn't be used for this any more.

Hmm. I wanted to indicate that the work was done while I was at IBM; clearly, no one is going through git and changing old, expired email addrs, and so submission based on the old addr seemed appropriate. I'm taking the Signed-off-by line as a quasi-legal thing: a fancy ID string, identifying the author(s), rather than a new way to manage email address books.

> So, depending on which of you two wrote the majority of a patch, I think it should be either

I'm not sure there was a clear majority. I think Manish did more work in general, but we hacked this together side by side. I got him to create working tested code; I busted it up into individual, clean, documented, mailing-list ready chunks.

--linas
Re: [PATCH] powerpc: fix os-term usage on kernel panic
On Thu, Nov 29, 2007 at 11:41:47AM +0100, Olaf Hering wrote:
> On Wed, Nov 28, Linas Vepstas wrote:
> > On Wed, Nov 28, 2007 at 12:00:37PM +0100, Olaf Hering wrote:
> > > On Tue, Nov 27, Will Schmidt wrote:
> > > > -    if (panic_timeout)
> > > > -        return;
> > >
> > > This change is wrong. Booting with panic=123 really means the system has to reboot in 123 seconds after a panic.
> >
> > And it does.
>
> Have you ever tried it? Current state is that the JS20 hangs after panic,

It should printout the "Rebooting in timeout_wait seconds ..." Then it should wait timeout_wait number of seconds, as usual, and *then* call the hypervisor.

> simply because it calls into the hypervisor (or whatever).

The hypervisor is not supposed to return at this point. It's supposed to reboot. Apparently, it's not rebooting. Either we are using it wrong, or the hypervisor is buggy on some systems. It did work on the machines I was on; but I did not try power5's or blades.

> So, please restore the panic_timeout check.

The problem with this check was that the value was never ever set, and so the branch was never ever taken.

--linas
Re: [PATCH] powerpc: fix os-term usage on kernel panic
On Tue, Nov 27, 2007 at 06:15:59PM -0600, Will Schmidt wrote:
> (resending with the proper from addr this time).
>
> I'm seeing some funky behavior on power5/power6 partitions with this
> patch. A /sbin/reboot is now behaving much more like a /sbin/halt.
>
> Anybody else seeing this, or is it time for me to call an exorcist for
> my boxes?

I believe the patch
http://www.nabble.com/-PATCH--powerpc-pseries:-tell-phyp-to-auto-restart-t4847604.html
will cure this problem. From that patch:

+/**
+ * pSeries_auto_restart - tell hypervisor that boot succeeded.
+ *
+ * The pseries hypervisor attempts to detect and prevent an
+ * infinite loop of kernel crashes and auto-reboots. It does
+ * so by refusing to auto-reboot unless we indicate that the
+ * current boot was successful. So, indicate success late in
+ * the boot sequence.
+ */

FYI, I am leaving IBM in just a few days now, and won't really have much of a chance to debug this, if there are other problems.

This pair of patches was required to make hypervisor-assisted dump work, viz, we need to tell the hypervisor about when we crashed, or didn't crash, so that if we crashed, the dump can be taken appropriately.

It occurs to me, as I write this, that maybe the xmon 'zr' command should be modified to call pSeries_auto_restart just in case, so that it actually reboots. There might be another funky code path that I can't think of right now.

--linas
Re: [PATCH] powerpc: fix os-term usage on kernel panic
On Wed, Nov 28, 2007 at 12:00:37PM +0100, Olaf Hering wrote:
> On Tue, Nov 27, Will Schmidt wrote:
> > -void rtas_os_term(char *str)
> > +void rtas_panic_msg(char *str)
>
> > -	if (panic_timeout)
> > -		return;
>
> This change is wrong. Booting with panic=123 really means the system has
> to reboot in 123 seconds after a panic.

And it does.

> But, maybe this panic_timeout check was moved elsewhere.

It was *always* somewhere else; the check here was always wrong. This change makes the os-term call happen after the panic timeout amount of time has elapsed.

--linas
[PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump
The following series of patches implement a basic framework for hypervisor-assisted dump. The very first patch provides documentation explaining what this is :-). Yes, it's supposed to be an improvement over kdump.

The patches mostly sort-of work; a list of open issues is included in the documentation. It also appears that the not-yet-released firmware versions this was tested on are still, ahem, incomplete; this work is also pending.

-- Linas & Manish
[PATCH/RFC 1/6]: phyp dump: Documentation
Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 Documentation/powerpc/phyp-assisted-dump.txt | 126 +++
 1 file changed, 126 insertions(+)

Index: linux-2.6.24-rc3-git1/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.0 +
+++ linux-2.6.24-rc3-git1/Documentation/powerpc/phyp-assisted-dump.txt 2007-11-21 16:26:44.0 -0600
@@ -0,0 +1,126 @@
+
+                   Hypervisor-Assisted Dump
+                   ------------------------
+                       November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel. In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal, production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will read /proc/kcore to obtain the
+   contents of memory, which holds the previous crashed
+   kernel. The userspace tools may copy this info to
+   disk, or network, nas, san, iscsi, etc. as desired.
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     echo 0x40000000 0x10000000 > /sys/kernel/release_region
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+-----------------------
+In order for this scheme to work, memory needs to be reserved
+quite early in the boot cycle. However, access to the device
+tree this early in the boot cycle is difficult, and device-tree
+access is needed to determine if there is crash data waiting.
+To work around this problem, all but 256MB of RAM is reserved
+during early boot. A short while later in boot, a check is made
+to determine if there is dump data waiting. If there isn't,
+then the reserved memory is released to general kernel use.
+If there is dump data, then the /sys/kernel/release_region
+file is created, and the reserved memory is held.
+
+If there is no waiting dump data, then all but 256MB of the
+reserved ram will be released for general kernel use. The
+highest 256 MB of RAM will *not* be released: this region
+will be kept permanently reserved, so that it can act as
+a receptacle for a copy of the low 256MB in the case a crash
+does occur. See, however, open issues below, as to whether
+such a reserved region is really needed.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues:
+
+ o User-space dump tool integration is completely unresolved.
+
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o The real-virtual mapping is awkward and unaddressed. There
+   is currently no clear way of matching up the contents of
+   /proc/kcore to the values that need to be sent to
+   /sys/kernel/release_region
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead? There is a dump_subsys being created by the s390 code,
+   perhaps the pseries code should use a similar layout as well.
+
+ o
[PATCH/RFC 2/6]: phyp dump: config file
Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/Kconfig | 11 +++
 1 file changed, 11 insertions(+)

Index: linux-2.6.24-rc2-git4/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/Kconfig 2007-11-14 16:39:20.0 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/Kconfig 2007-11-15 14:27:33.0 -0600
@@ -261,6 +261,17 @@ config CRASH_DUMP
 
 	  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+	depends on PPC_PSERIES && EXPERIMENTAL
+	default y
+	help
+	  Hypervisor-assisted dump is meant to be a kdump replacement
+	  offering robustness and speed not possible without system
+	  hypervisor assistance.
+
+	  If unsure, say Y
+
 config PPCBUG_NVRAM
 	bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
 	default y if PPC_PREP
[PATCH/RFC 4/6]: phyp dump: use sysfs to release reserved mem
Check to see if there actually is data from a previously crashed kernel waiting. If so, allow user-space tools to grab the data (by reading /proc/kcore). When user-space finishes dumping a section, it must release that memory by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB. The released memory becomes free for general use.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c | 101 +++--
 1 file changed, 96 insertions(+), 5 deletions(-)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 13:15:05.0 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 13:24:30.0 -0600
@@ -12,17 +12,24 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+
+/* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -52,18 +59,102 @@ release_memory_range(unsigned long start
 	}
 }
 
-static int __init phyp_dump_setup(void)
+/* ------------------------------------------------- */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t
+store_release_region(struct kset *kset, const char *buf, size_t count)
 {
+	unsigned long start_addr, length, end_addr;
 	unsigned long start_pfn, nr_pages;
+	ssize_t ret;
 
-	/* If no memory was reserved in early boot, there is nothing to do */
-	if (phyp_dump_info->init_reserve_size == 0)
-		return 0;
+	ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+	if (ret != 2)
+		return -EINVAL;
+
+	/* Range-check - don't free any reserved memory that
+	 * wasn't reserved for phyp-dump */
+	if (start_addr < phyp_dump_info->init_reserve_start)
+		start_addr = phyp_dump_info->init_reserve_start;
+
+	end_addr = phyp_dump_info->init_reserve_start +
+			phyp_dump_info->init_reserve_size;
+	if (start_addr+length > end_addr)
+		length = end_addr - start_addr;
+
+	/* Release the region of memory passed in by user */
+	start_pfn = PFN_DOWN(start_addr);
+	nr_pages = PFN_DOWN(length);
+	release_memory_range (start_pfn, nr_pages);
+
+	return count;
+}
+
+static ssize_t
+show_release_region(struct kset * kset, char *buf)
+{
+	return sprintf(buf, "ola\n");
+}
+
+static struct subsys_attribute rr = __ATTR(release_region, 0600,
+					 show_release_region,
+					 store_release_region);
+
+/* ------------------------------------------------- */
+
+static void release_all (void)
+{
+	unsigned long start_pfn, nr_pages;
 
-	/* Release memory that was reserved in early boot */
+	/* Release all memory that was reserved in early boot */
 	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
 	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
 	release_memory_range(start_pfn, nr_pages);
+}
+
+static int __init phyp_dump_setup(void)
+{
+	struct device_node *rtas;
+	const int *dump_header;
+	int header_len = 0;
+	int rc;
+
+	/* If no memory was reserved in early boot, there is nothing to do */
+	if (phyp_dump_info->init_reserve_size == 0)
+		return 0;
+
+	/* Return if phyp dump not supported */
+	ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump");
+	if (ibm_configure_kernel_dump == RTAS_UNKNOWN_SERVICE) {
+		release_all();
+		return -ENOSYS;
+	}
+
+	/* Is there dump data waiting for us? */
+	rtas = of_find_node_by_path("/rtas");
+	dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+	if (dump_header == NULL) {
+		release_all();
+		return 0;
+	}
+
+	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+	rc = subsys_create_file(&kernel_subsys, &rr);
+	if (rc
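The range check in store_release_region() can be tried out in userspace. What follows is only a sketch, not the kernel code: struct reserved_region and parse_release_request() are made-up names, and unlike the patch it also shrinks the length when the start address is clamped upward and rejects requests that lie entirely outside the reserved area.

```c
#include <stdio.h>

/* Hypothetical userspace stand-in for the reserved range kept in phyp_dump_info. */
struct reserved_region {
	unsigned long start;	/* first reserved byte */
	unsigned long size;	/* number of reserved bytes */
};

/*
 * Parse "<start> <length>" (both hex) and clamp the request to the
 * reserved region. Returns 0 on success, -1 if the input is malformed
 * or the request does not overlap the reserved region at all.
 */
int parse_release_request(const char *buf, const struct reserved_region *r,
			  unsigned long *start, unsigned long *length)
{
	unsigned long s, len;
	unsigned long end = r->start + r->size;

	if (sscanf(buf, "%lx %lx", &s, &len) != 2)
		return -1;

	if (s < r->start) {
		/* Drop the part of the request that lies below the region. */
		unsigned long skip = r->start - s;

		if (len <= skip)
			return -1;
		len -= skip;
		s = r->start;
	}

	if (s >= end)
		return -1;

	/* Trim anything that runs past the end of the region. */
	if (len > end - s)
		len = end - s;

	*start = s;
	*length = len;
	return 0;
}
```

With a region that starts at 256MB and spans 0x70000000 bytes, the string "0x40000000 0x10000000" from the patch description passes through unchanged, while a request starting at 0 is moved up to the start of the region.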
[PATCH/RFC 5/6]: phyp dump: register the dump area
Set up the actual dump header, register it with the hypervisor.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c | 169 +++--
 1 file changed, 163 insertions(+), 6 deletions(-)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 15:55:37.0 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 16:06:52.0 -0600
@@ -30,6 +30,134 @@ struct phyp_dump *phyp_dump_info = &phyp
 static int ibm_configure_kernel_dump;
 
 /* ------------------------------------------------- */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+	u32 dump_flags;
+	u16 source_type;
+	u16 error_flags;
+	u64 source_address;
+	u64 source_length;
+	u64 length_copied;
+	u64 destination_address;
+};
+
+struct phyp_dump_header {
+	u32 version;
+	u16 num_of_sections;
+	u16 status;
+
+	u32 first_offset_section;
+	u32 dump_disk_section;
+	u64 block_num_dd;
+	u64 num_of_blocks_dd;
+	u32 offset_dd;
+	u32 maxtime_to_auto;
+	/* No dump disk path string used */
+
+	struct dump_section cpu_data;
+	struct dump_section hpte_data;
+	struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO 0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+	struct device_node *rtas;
+	const unsigned int *sizes;
+	int len;
+	unsigned long cpu_state_size = 0;
+	unsigned long hpte_region_size = 0;
+	unsigned long addr_offset = 0;
+
+	/* Get the required dump region sizes */
+	rtas = of_find_node_by_path("/rtas");
+	sizes = of_get_property(rtas, "ibm,configure-kernel-dump-sizes", &len);
+	if (!sizes || len < 20)
+		return 0;
+
+	if (sizes[0] == 1)
+		cpu_state_size = *((unsigned long *) &sizes[1]);
+
+	if (sizes[3] == 2)
+		hpte_region_size = *((unsigned long *) &sizes[4]);
+
+	/* Set up the dump header */
+	ph->version = DUMP_HEADER_VERSION;
+	ph->num_of_sections = NUM_DUMP_SECTIONS;
+	ph->status = 0;
+
+	ph->first_offset_section =
+		(u32) &(((struct phyp_dump_header *) 0)->cpu_data);
+	ph->dump_disk_section = 0;
+	ph->block_num_dd = 0;
+	ph->num_of_blocks_dd = 0;
+	ph->offset_dd = 0;
+
+	ph->maxtime_to_auto = 0; /* disabled */
+
+	/* The first two sections are mandatory */
+	ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+	ph->cpu_data.source_address = 0;
+	ph->cpu_data.source_length = cpu_state_size;
+	ph->cpu_data.destination_address = addr_offset;
+	addr_offset += cpu_state_size;
+
+	ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+	ph->hpte_data.source_address = 0;
+	ph->hpte_data.source_length = hpte_region_size;
+	ph->hpte_data.destination_address = addr_offset;
+	addr_offset += hpte_region_size;
+
+	/* This section describes the low kernel region */
+	ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+	ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+	ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+	ph->kernel_data.destination_address = addr_offset;
+	addr_offset += ph->kernel_data.source_length;
+
+	return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+	ph->cpu_data.destination_address += addr;
+	ph->hpte_data.destination_address += addr;
+	ph->kernel_data.destination_address += addr;
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				1, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc)
+	{
+		printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+	}
+}
+
+/* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously
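The way init_dump_header() and register_dump_area() place the three save sections back to back can be shown with a small userspace sketch. The struct section type and layout_sections() below are illustrative names only, and the sizes in the usage note are invented; the real sizes come from the ibm,configure-kernel-dump-sizes property.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct dump_section. */
struct section {
	uint64_t source_length;		/* bytes the hypervisor will save */
	uint64_t destination_address;	/* where the saved copy will land */
};

/*
 * Place the sections contiguously starting at 'base', in array order.
 * Returns the total length of the save area, which corresponds to the
 * value the patch returns from init_dump_header().
 */
uint64_t layout_sections(struct section *sec, size_t n, uint64_t base)
{
	uint64_t offset = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		sec[i].destination_address = base + offset;
		offset += sec[i].source_length;
	}
	return offset;
}
```

For example, with made-up lengths of 0x1000 (CPU state), 0x2000 (HPTE) and 0x10000000 (low kernel memory) and a base of 0x70000000, the sections land at 0x70000000, 0x70001000 and 0x70003000, and the save area is 0x10003000 bytes long.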
[PATCH] powerpc/pseries: tell phyp to auto-restart
The pseries hypervisor attempts to detect and prevent an infinite loop of kernel crashes and auto-reboots. It does so by refusing to auto-reboot unless we indicate that the current boot was successful. So, indicate success late in the boot sequence.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
Sigh. This is a side-effect of the patch I sent yesterday. It's supposed to simplify the management of large numbers of partitions.

 arch/powerpc/platforms/pseries/setup.c | 31 +++
 1 file changed, 31 insertions(+)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/setup.c 2007-11-20 18:37:14.0 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/setup.c 2007-11-20 19:08:12.0 -0600
@@ -491,6 +491,37 @@ void pSeries_power_off(void)
 	for (;;);
 }
 
+/**
+ * pSeries_auto_restart - tell hypervisor that boot succeeded.
+ *
+ * The pseries hypervisor attempts to detect and prevent an
+ * infinite loop of kernel crashes and auto-reboots. It does
+ * so by refusing to auto-reboot unless we indicate that the
+ * current boot was successful. So, indicate success late in
+ * the boot sequence.
+ */
+static int __init pSeries_auto_restart(void)
+{
+	static char buff[3]; /* static so that it's in RMO region */
+	int rc;
+	int token = rtas_token("ibm,set-system-parameter");
+	if (!token)
+		return 0;
+
+	/* partition_auto_restart is 21; set to 1 to auto-restart the OS. */
+	buff[0] = 0;
+	buff[1] = 1; /* length */
+	buff[2] = 1; /* value */
+	do {
+		rc = rtas_call (token, 2, 1, NULL, 21, buff);
+	} while (rtas_busy_delay(rc));
+	if (rc)
+		printk(KERN_INFO "pSeries_auto_restart(): "
+			"unable to setup autorestart, rc=%d\n", rc);
+	return 0;
+}
+late_initcall(pSeries_auto_restart);
+
 #ifndef CONFIG_PCI
 void pSeries_final_fixup(void) { }
 #endif
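The three-byte buffer in the patch reads as a length prefix followed by the parameter value. A minimal sketch of that layout follows, assuming the two leading bytes form a big-endian 16-bit length (which is what buff[0] = 0, buff[1] = 1 suggests); build_sysparm_buf() is a hypothetical helper, not an existing kernel or RTAS function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill 'buf' with a 2-byte big-endian length followed by 'len' bytes
 * of parameter data. Returns the number of bytes written, or 0 if
 * 'buf' is too small to hold the prefix plus the data.
 */
size_t build_sysparm_buf(uint8_t *buf, size_t bufsize,
			 const uint8_t *data, uint16_t len)
{
	if (bufsize < (size_t)len + 2)
		return 0;

	buf[0] = (uint8_t)(len >> 8);	/* length, high byte */
	buf[1] = (uint8_t)(len & 0xff);	/* length, low byte */
	memcpy(buf + 2, data, len);
	return (size_t)len + 2;
}
```

Called with a single data byte of value 1, this produces the same bytes the patch writes by hand: 0, 1, 1.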
Re: hangs after Freeing unused kernel memory
On Thu, Nov 15, 2007 at 04:00:09PM -0800, Siva Prasad wrote:
> Hi,
>
> This sounds like a familiar problem, but could not get answers in posts
> that came up in google search.
>
> My system hangs after printing the message "Freeing unused kernel
> memory". It should execute init after that, but not sure what exactly is
> happening. Appreciate if some one can throw few ideas to try out.

It might not be a hang, it might be simply that you lose the console. If this is a redhat system, and you didn't tweak initrd and udev just right, this can happen. Try doing this:

mount --bind / /mnt
cp -a /dev/null /mnt/dev
cp -a /dev/console /mnt/dev
cp -a /dev/hv* /mnt/dev
umount /mnt

> Seems it is actually hanging when it makes the call
> run_init_process(ramdisk_execute_command) in init/main.c

Then again, your initrd might be corrupted.

--linas
[PATCH] powerpc: fix os-term usage on kernel panic
The rtas_os_term() routine was being called at the wrong time. The actual rtas call "os-term" will not ever return, and so calling it from the panic notifier is too early. Instead, call it from the machine_reset() call.

The patch splits the rtas_os_term() routine into two: one part to capture the kernel panic message, invoked during the panic notifier, and another part that is invoked during machine_reset().

Prior to this patch, the os-term call was never being made, because panic_timeout was always non-zero. Calling os-term helps keep the hypervisor happy! We have to keep the hypervisor happy to avoid service, dump and error reporting problems.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
One could make a strong argument to move all of this code from kernel/rtas.c to platforms/pseries/setup.c. I did not do this, just so as to make the changes minimal.

 arch/powerpc/kernel/rtas.c             | 12 ++--
 arch/powerpc/platforms/pseries/setup.c |  3 ++-
 include/asm-powerpc/rtas.h             |  3 ++-
 3 files changed, 10 insertions(+), 8 deletions(-)

Index: linux-2.6.24-rc2-git4/arch/powerpc/kernel/rtas.c
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/kernel/rtas.c 2007-11-19 18:58:53.0 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/kernel/rtas.c 2007-11-19 19:01:10.0 -0600
@@ -631,18 +631,18 @@ void rtas_halt(void)
 /* Must be in the RMO region, so we place it here */
 static char rtas_os_term_buf[2048];
 
-void rtas_os_term(char *str)
+void rtas_panic_msg(char *str)
 {
-	int status;
+	snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
+}
 
-	if (panic_timeout)
-		return;
+void rtas_os_term(void)
+{
+	int status;
 
 	if (RTAS_UNKNOWN_SERVICE == rtas_token("ibm,os-term"))
 		return;
 
-	snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
-
 	do {
 		status = rtas_call(rtas_token("ibm,os-term"), 1, 1, NULL,
 				   __pa(rtas_os_term_buf));
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/platforms/pseries/setup.c 2007-11-19 18:58:53.0 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/setup.c 2007-11-19 19:01:10.0 -0600
@@ -507,7 +507,8 @@ define_machine(pseries) {
 	.restart		= rtas_restart,
 	.power_off		= pSeries_power_off,
 	.halt			= rtas_halt,
-	.panic			= rtas_os_term,
+	.panic			= rtas_panic_msg,
+	.machine_shutdown	= rtas_os_term,
 	.get_boot_time		= rtas_get_boot_time,
 	.get_rtc_time		= rtas_get_rtc_time,
 	.set_rtc_time		= rtas_set_rtc_time,
Index: linux-2.6.24-rc2-git4/include/asm-powerpc/rtas.h
===================================================================
--- linux-2.6.24-rc2-git4.orig/include/asm-powerpc/rtas.h 2007-11-19 18:58:53.0 -0600
+++ linux-2.6.24-rc2-git4/include/asm-powerpc/rtas.h 2007-11-19 19:01:10.0 -0600
@@ -164,7 +164,8 @@ extern int rtas_call(int token, int, int
 extern void rtas_restart(char *cmd);
 extern void rtas_power_off(void);
 extern void rtas_halt(void);
-extern void rtas_os_term(char *str);
+extern void rtas_panic_msg(char *str);
+extern void rtas_os_term(void);
 extern int rtas_get_sensor(int sensor, int index, int *state);
 extern int rtas_get_power_level(int powerdomain, int *level);
 extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
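The split described in this patch, capture the message first and make the non-returning call later, can be mimicked in plain userspace C. This is only a sketch of the first half: capture_panic_msg() and pending_os_term_msg() are made-up names, and no firmware call is modelled.

```c
#include <stdio.h>

/* Static buffer, standing in for rtas_os_term_buf in the patch. */
static char os_term_buf[2048];

/* Phase one: record the panic text while the caller still has it. */
void capture_panic_msg(const char *str)
{
	/* snprintf truncates and NUL-terminates if str is too long. */
	snprintf(os_term_buf, sizeof(os_term_buf), "OS panic: %s", str);
}

/* Phase two would hand this buffer to the firmware call that never returns. */
const char *pending_os_term_msg(void)
{
	return os_term_buf;
}
```

Because the text is already saved, the second phase can run as late as needed, for instance after the panic timeout has elapsed, without needing the original panic string.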
[PATCH 1/3] powerpc: EEH: work with device endpoint, always
Perform all error checking at the partitionable endpoint of the device.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/eeh.c | 1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/eeh.c 2007-11-09 16:54:04.0 -0600
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh.c 2007-11-09 16:56:39.0 -0600
@@ -482,6 +482,7 @@ int eeh_dn_check_failure(struct device_n
 		no_dn++;
 		return 0;
 	}
+	dn = find_device_pe (dn);
 	pdn = PCI_DN(dn);
 
 	/* Access to IO BARs might get this far and still not want checking. */
[PATCH 3/3]: powerpc/eeh: report errors as soon as possible.
Do not wait for the pci slot status before reporting an error to the device driver. Some systems may take many seconds to report the slot status, and this can confuse unsuspecting device drivers.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/eeh_driver.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh_driver.c
===================================================================
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/eeh_driver.c 2007-11-09 17:28:58.0 -0600
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh_driver.c 2007-11-09 17:36:51.0 -0600
@@ -354,13 +354,6 @@ struct pci_dn * handle_eeh_events (struc
 	if (frozen_pdn->eeh_freeze_count > EEH_MAX_ALLOWED_FREEZES)
 		goto excess_failures;
 
-	/* Get the current PCI slot state. */
-	rc = eeh_wait_for_slot_status (frozen_pdn, MAX_WAIT_FOR_RECOVERY*1000);
-	if (rc < 0) {
-		printk(KERN_WARNING "EEH: Permanent failure\n");
-		goto hard_fail;
-	}
-
 	printk(KERN_WARNING
 	   "EEH: This PCI device has failed %d times in the last hour:\n",
 		frozen_pdn->eeh_freeze_count);
@@ -376,6 +369,14 @@ struct pci_dn * handle_eeh_events (struc
 	 */
 	pci_walk_bus(frozen_bus, eeh_report_error, &result);
 
+	/* Get the current PCI slot state. This can take a long time,
+	 * sometimes over 3 seconds for certain systems. */
+	rc = eeh_wait_for_slot_status (frozen_pdn, MAX_WAIT_FOR_RECOVERY*1000);
+	if (rc < 0) {
+		printk(KERN_WARNING "EEH: Permanent failure\n");
+		goto hard_fail;
+	}
+
 	/* Since rtas may enable MMIO when posting the error log,
 	 * don't post the error log until after all dev drivers
 	 * have been informed.
Re: [PATCH] [POWERPC] pSeries: make pseries_defconfig minus PCI build again
On Wed, Nov 14, 2007 at 03:07:39PM +1100, Stephen Rothwell wrote:
> Signed-off-by: Stephen Rothwell [EMAIL PROTECTED]

Acked-by: Linas Vepstas [EMAIL PROTECTED]

> ---
>  arch/powerpc/platforms/pseries/Kconfig | 2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> --
> Cheers,
> Stephen Rothwell [EMAIL PROTECTED]
>
> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
> index 16e4e40..306a9d0 100644
> --- a/arch/powerpc/platforms/pseries/Kconfig
> +++ b/arch/powerpc/platforms/pseries/Kconfig
> @@ -21,7 +21,7 @@ config PPC_SPLPAR
>
>  config EEH
>  	bool "PCI Extended Error Handling (EEH)" if EMBEDDED
> -	depends on PPC_PSERIES
> +	depends on PPC_PSERIES && PCI
>  	default y if !EMBEDDED
>
>  config SCANLOG
> --
> 1.5.3.5
[PATCH v2] pci hotplug: fix rpaphp directory naming
Fix presentation of the slot number in the /sys/bus/pci/slots directory to match that used in the majority of other drivers. Signed-off-by: Linas Vepstas [EMAIL PROTECTED] -- On Tue, Nov 13, 2007 at 07:26:07PM -0800, Greg KH wrote: We need a signed-off-by: to be able to apply this... Whoops. See above. Same patch as list time, no changes. On Tue, Nov 13, 2007 at 02:58:30PM -0700, Matthew Wilcox wrote: On Tue, Nov 13, 2007 at 03:41:21PM -0600, Linas Vepstas wrote: /sys/bus/pci/slots /sys/bus/pci/slots/control /sys/bus/pci/slots/control/remove_slot /sys/bus/pci/slots/control/add_slot /sys/bus/pci/slots/0001:00:02.0 /sys/bus/pci/slots/0001:00:02.0/phy_location Ugh. Almost two years ago, paulus promised me he was going to fix the slot name for rpaphp. Guess he didn't. You have to ask the right person. :-) I've been defacto mainaining the rpaphp code for unpteen years now. On the other hand, I am also much, much better at promising than delivering. This is one of the hateful things about the current design -- hotplug drivers do too much. Instead of being just the interface between the Linux PCI code and the hardware, they create sysfs directories, add files, and generally have far too much freedom. I chopped out several hundred LOC from rpaphp a year ago, and hopefuly that might make furthre simplification easier someday. We have four different schemes currently for naming in slots/, 1. slot number. Used by cpqphp, ibmphp, acpiphp, pciehp, shpc. 2. domain:bus:dev:fn. Used by fakephp. 3a. domain:bus:dev. Used by rpaphp and sgihp. 3b. Except that rpaphp uses phy_location to present the information that should be in the name and sgihp uses path. ... I've forgotten what cpci uses. And yenta doesn't use it. How is anyone supposed to write sane managability tools in the presence of such anarchy? ~ # cat /sys/bus/pci/slots/:00:02.2/phy_location U787A.001.DNZ00Z5-P1-C2 Right. 
This should look like: # cat /sys/bus/pci/slots/U787A.001.DNZ00Z5-P1-C2/address :00:02 This patch implements exactly what you describe. Boot tested. I assume you really mean it -- if so, then please review and ack the patch !? I have absolutely no clue if this breaks any existing IBM tools. I'm pretty sure it doesn't ... but attention Mike Strosaker! does it? drivers/pci/hotplug/rpaphp.h |1 drivers/pci/hotplug/rpaphp_pci.c | 14 --- drivers/pci/hotplug/rpaphp_slot.c | 47 +++--- 3 files changed, 24 insertions(+), 38 deletions(-) Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c === --- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_pci.c 2007-07-08 18:32:17.0 -0500 +++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c 2007-11-13 17:52:10.0 -0600 @@ -64,19 +64,6 @@ int rpaphp_get_sensor_state(struct slot return rc; } -static void set_slot_name(struct slot *slot) -{ - struct pci_bus *bus = slot-bus; - struct pci_dev *bridge; - - bridge = bus-self; - if (bridge) - strcpy(slot-name, pci_name(bridge)); - else - sprintf(slot-name, %04x:%02x:00.0, pci_domain_nr(bus), - bus-number); -} - /** * rpaphp_enable_slot - record slot state, config pci device * @@ -114,7 +101,6 @@ int rpaphp_enable_slot(struct slot *slot info-adapter_status = EMPTY; slot-bus = bus; slot-pci_devs = bus-devices; - set_slot_name(slot); /* if there's an adapter in the slot, go add the pci devices */ if (state == PRESENT) { Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c === --- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_slot.c 2007-07-08 18:32:17.0 -0500 +++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c 2007-11-13 18:05:13.0 -0600 @@ -33,23 +33,31 @@ #include asm/rtas.h #include rpaphp.h -static ssize_t location_read_file (struct hotplug_slot *php_slot, char *buf) +static ssize_t address_read_file (struct hotplug_slot *php_slot, char *buf) { - char *value; - int retval = -ENOENT; + int retval; struct slot *slot = (struct slot *)php_slot-private; + 
struct pci_bus *bus; if (!slot) - return retval; + return -ENOENT; - value = slot-location; - retval = sprintf (buf, %s\n, value); + bus = slot-bus; + if (!bus) + return -ENOENT; + + if (bus-self) + retval = sprintf(buf, pci_name(bus-self)); + else + retval = sprintf(buf, %04x:%02x:00.0, + pci_domain_nr(bus), bus-number); + return retval; } -static struct hotplug_slot_attribute php_attr_location = { - .attr = {.name = phy_location, .mode
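The new address_read_file() prints either the bridge's PCI name or a synthesized domain:bus:00.0 string. A userspace sketch of the same formatting follows; format_slot_address() is a hypothetical helper, and unlike the patch it passes the name through an explicit "%s" format, bounds the write with snprintf, and appends a newline.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write the slot's PCI address into 'buf'. If the bus has a bridge
 * device, its name is used as-is; otherwise the address is built from
 * the domain and bus numbers. Returns what snprintf returns: the
 * length of the full string, excluding the terminating NUL.
 */
int format_slot_address(char *buf, size_t size, const char *bridge_name,
			unsigned int domain, unsigned int busnr)
{
	if (bridge_name)
		return snprintf(buf, size, "%s\n", bridge_name);

	return snprintf(buf, size, "%04x:%02x:00.0\n", domain, busnr);
}
```

Using "%s" rather than passing the name as the format string keeps a stray '%' in a device name from being interpreted as a conversion.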
[PATCH] pci hotplug: fix rpaphp directory naming
Fix presentation of the slot number in the /sys/bus/pci/slots directory to match that used in the majority of other drivers.

--

On Tue, Nov 13, 2007 at 02:58:30PM -0700, Matthew Wilcox wrote:
On Tue, Nov 13, 2007 at 03:41:21PM -0600, Linas Vepstas wrote:

/sys/bus/pci/slots
/sys/bus/pci/slots/control
/sys/bus/pci/slots/control/remove_slot
/sys/bus/pci/slots/control/add_slot
/sys/bus/pci/slots/0001:00:02.0
/sys/bus/pci/slots/0001:00:02.0/phy_location

Ugh. Almost two years ago, paulus promised me he was going to fix the slot name for rpaphp. Guess he didn't.

You have to ask the right person. :-) I've been de facto maintaining the rpaphp code for umpteen years now. On the other hand, I am also much, much better at promising than delivering.

This is one of the hateful things about the current design -- hotplug drivers do too much. Instead of being just the interface between the Linux PCI code and the hardware, they create sysfs directories, add files, and generally have far too much freedom.

I chopped out several hundred LOC from rpaphp a year ago, and hopefully that might make further simplification easier someday.

We have four different schemes currently for naming in slots/,
1. slot number. Used by cpqphp, ibmphp, acpiphp, pciehp, shpc.
2. domain:bus:dev:fn. Used by fakephp.
3a. domain:bus:dev. Used by rpaphp and sgihp.
3b. Except that rpaphp uses phy_location to present the information that should be in the name and sgihp uses path.
... I've forgotten what cpci uses. And yenta doesn't use it.

How is anyone supposed to write sane manageability tools in the presence of such anarchy?

~ # cat /sys/bus/pci/slots/:00:02.2/phy_location
U787A.001.DNZ00Z5-P1-C2

Right. This should look like:

# cat /sys/bus/pci/slots/U787A.001.DNZ00Z5-P1-C2/address
:00:02

This patch implements exactly what you describe. Boot tested. I assume you really mean it -- if so, then please review and ack the patch !?

I have absolutely no clue if this breaks any existing IBM tools. I'm pretty sure it doesn't ... but attention Mike Strosaker! does it?

 drivers/pci/hotplug/rpaphp.h      |    1
 drivers/pci/hotplug/rpaphp_pci.c  |   14 ---
 drivers/pci/hotplug/rpaphp_slot.c |   47 +++---
 3 files changed, 24 insertions(+), 38 deletions(-)

Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_pci.c 2007-07-08 18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c 2007-11-13 17:52:10.0 -0600
@@ -64,19 +64,6 @@ int rpaphp_get_sensor_state(struct slot
 	return rc;
 }

-static void set_slot_name(struct slot *slot)
-{
-	struct pci_bus *bus = slot->bus;
-	struct pci_dev *bridge;
-
-	bridge = bus->self;
-	if (bridge)
-		strcpy(slot->name, pci_name(bridge));
-	else
-		sprintf(slot->name, "%04x:%02x:00.0", pci_domain_nr(bus),
-			bus->number);
-}
-
 /**
  * rpaphp_enable_slot - record slot state, config pci device
  *
@@ -114,7 +101,6 @@ int rpaphp_enable_slot(struct slot *slot
 	info->adapter_status = EMPTY;
 	slot->bus = bus;
 	slot->pci_devs = &bus->devices;
-	set_slot_name(slot);

 	/* if there's an adapter in the slot, go add the pci devices */
 	if (state == PRESENT) {
Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_slot.c 2007-07-08 18:32:17.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_slot.c 2007-11-13 18:05:13.0 -0600
@@ -33,23 +33,31 @@
 #include <asm/rtas.h>
 #include "rpaphp.h"

-static ssize_t location_read_file (struct hotplug_slot *php_slot, char *buf)
+static ssize_t address_read_file (struct hotplug_slot *php_slot, char *buf)
 {
-	char *value;
-	int retval = -ENOENT;
+	int retval;
 	struct slot *slot = (struct slot *)php_slot->private;
+	struct pci_bus *bus;

 	if (!slot)
-		return retval;
+		return -ENOENT;

-	value = slot->location;
-	retval = sprintf (buf, "%s\n", value);
+	bus = slot->bus;
+	if (!bus)
+		return -ENOENT;
+
+	if (bus->self)
+		retval = sprintf(buf, pci_name(bus->self));
+	else
+		retval = sprintf(buf, "%04x:%02x:00.0",
+			pci_domain_nr(bus), bus->number);
+
 	return retval;
 }

-static struct hotplug_slot_attribute php_attr_location = {
-	.attr = {.name = "phy_location", .mode = S_IFREG | S_IRUGO},
-	.show = location_read_file,
+static struct hotplug_slot_attribute php_attr_address = {
+	.attr = {.name = "address", .mode = S_IFREG | S_IRUGO},
+	.show = address_read_file
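For readers who want to see the address formatting in isolation, here is a minimal userspace sketch of what address_read_file() in the patch above appears to do: use the bridge's name if there is one, otherwise synthesize domain:bus:00.0. The function name format_slot_address() and its parameters are hypothetical stand-ins, not kernel API. The sketch also passes the bridge name through a literal "%s" format instead of using it as the format string; that is a defensive choice of this sketch, not something the patch does.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for the patch's address_read_file().
 * bridge_name plays the role of pci_name(bus->self); pass NULL when the
 * bus has no bridge device, and the address is built from the numbers.
 * Returns the number of characters snprintf() would have written.
 */
int format_slot_address(char *buf, size_t len, const char *bridge_name,
			unsigned int domain, unsigned int bus)
{
	if (bridge_name)
		return snprintf(buf, len, "%s", bridge_name);

	/* Same layout as the "%04x:%02x:00.0" fallback in the patch */
	return snprintf(buf, len, "%04x:%02x:00.0", domain, bus);
}
```

Calling it with bridge_name == NULL, domain 1 and bus 0 fills the buffer with 0001:00:00.0.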
[PATCH] pci hotplug: rm bogus item in rpaphp struct
Remove unused struct element.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 drivers/pci/hotplug/rpaphp.h     |    1 -
 drivers/pci/hotplug/rpaphp_pci.c |    1 -
 2 files changed, 2 deletions(-)

Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp.h
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp.h 2007-11-13 18:37:31.0 -0600
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp.h 2007-11-13 19:00:42.0 -0600
@@ -76,7 +76,6 @@ struct slot {
 	char *name;
 	struct device_node *dn;
 	struct pci_bus *bus;
-	struct list_head *pci_devs;
 	struct hotplug_slot *hotplug_slot;
 };

Index: linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/hotplug/rpaphp_pci.c 2007-11-13 18:37:31.0 -0600
+++ linux-2.6.23-rc8-mm1/drivers/pci/hotplug/rpaphp_pci.c 2007-11-13 19:00:13.0 -0600
@@ -100,7 +100,6 @@ int rpaphp_enable_slot(struct slot *slot
 	info->adapter_status = EMPTY;
 	slot->bus = bus;
-	slot->pci_devs = &bus->devices;

 	/* if there's an adapter in the slot, go add the pci devices */
 	if (state == PRESENT) {
___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH] powerpc/eeh: make sure warning message is printed.
Fix old buglet; a warning message should have been printed when a hardware reset takes too long.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 arch/powerpc/platforms/pseries/eeh.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/eeh.c 2007-11-05 16:22:44.0 -0600
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh.c 2007-11-05 16:24:17.0 -0600
@@ -325,7 +325,7 @@ eeh_wait_for_slot_status(struct pci_dn *

 		if (rets[2] == 0) return -1; /* permanently unavailable */

-		if (max_wait_msecs <= 0) return -1;
+		if (max_wait_msecs <= 0) break;

 		mwait = rets[2];
 		if (mwait <= 0) {
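The one-line change above is easier to see in a stripped-down loop. The following is only a sketch under assumptions: wait_for_ready(), the poll callback type and the two demo callbacks are hypothetical, and the real eeh_wait_for_slot_status() talks to RTAS rather than a callback. The point it illustrates is that leaving the loop with break, rather than returning from inside it, lets the timeout path reach the warning that sits after the loop.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical poll callback: returns 0 when ready, else msecs to wait. */
typedef int (*poll_fn)(void *ctx);

/* Demo callback that never becomes ready. */
int never_ready(void *ctx)
{
	(void)ctx;
	return 100;
}

/* Demo callback that becomes ready after *ctx polls. */
int ready_after(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return 0;
	(*remaining)--;
	return 100;
}

/*
 * Poll until ready or until the time budget is used up.
 * Returns 0 on success, -1 on timeout; *warned records whether the
 * timeout warning after the loop was actually reached.
 */
int wait_for_ready(poll_fn poll, void *ctx, int max_wait_msecs, int *warned)
{
	*warned = 0;

	for (;;) {
		int mwait = poll(ctx);

		if (mwait == 0)
			return 0;

		/* "return -1" here would skip the warning below */
		if (max_wait_msecs <= 0)
			break;

		max_wait_msecs -= mwait;	/* a real caller would sleep here */
	}

	fprintf(stderr, "warning: timed out waiting for device\n");
	*warned = 1;
	return -1;
}
```

With never_ready and a 300 msec budget the function returns -1 and sets *warned, which is the behaviour the patch description asks for.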
Re: [PATCH 11/16] Use of_get_next_child() in eeh_restore_bars()
On Mon, Oct 29, 2007 at 02:46:13PM +1100, Michael Ellerman wrote:
On Fri, 2007-10-26 at 17:29 +1000, Stephen Rothwell wrote:
On Fri, 26 Oct 2007 16:54:43 +1000 (EST) Michael Ellerman [EMAIL PROTECTED] wrote:

-	dn = pdn->node->child;
-	while (dn) {
+	for (dn = NULL; (dn = of_get_next_child(pdn->node, dn));)

Just wondering if we need

#define for_each_child_node(dn, parent) \
	for (dn = of_get_next_child(parent, NULL); dn; \
	     dn = of_get_next_child(parent, dn))

Yes, I like this much better too, if for no other reason than the for-loop structure is more orthodox.

Should we perhaps make it for_each_child_device_node() ?

foreach_of_device_node_child() or of_foreach_device_node_child()
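To make the proposed iterator concrete, here is a small self-contained sketch of the macro from the message above, run against a mock tree. struct node and get_next_child() are hypothetical stand-ins for struct device_node and of_get_next_child(); in particular the mock does no reference counting, whereas the real of_get_next_child() takes a reference on the node it returns and drops the one on the previous node, so breaking out of a real loop early would leave a reference to drop.

```c
#include <stddef.h>

/* Mock of a device-tree node: first child plus next sibling. */
struct node {
	struct node *child;
	struct node *sibling;
};

/*
 * Mock of of_get_next_child(): with prev == NULL return the first child,
 * otherwise the sibling after prev. No refcounting in this sketch.
 */
struct node *get_next_child(const struct node *parent, struct node *prev)
{
	return prev ? prev->sibling : parent->child;
}

/* Same shape as the macro proposed in the thread. */
#define for_each_child_node(dn, parent) \
	for (dn = get_next_child(parent, NULL); dn; \
	     dn = get_next_child(parent, dn))

/* Example user: count the direct children of a node. */
int count_children(const struct node *parent)
{
	struct node *dn;
	int n = 0;

	for_each_child_node(dn, parent)
		n++;

	return n;
}
```

The orthodox three-clause for loop is the whole appeal: the caller writes only the loop body, and the initial NULL and the advance step cannot be forgotten.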
[PATCH 0/3] powerpc eeh: bug fixes for crashes, bad handling
Hi Paul,

Please forward upstream the following three tiny patches for EEH bugs, including one crash, and one failure to reset correctly.

(I was planning on blasting you many many more patches, involving MSI, but have had nothing but broken hardware for the last few weeks, and so have nothing to show. Dang, cause I needed the msi fixes for 2.6.24. Oh well.)

--linas
[PATCH 1/3] powerpc eeh: cleanup comments
Clean up commentary, remove dead code.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 arch/powerpc/platforms/pseries/eeh_driver.c |    8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh_driver.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/eeh_driver.c 2007-10-16 11:39:18.0 -0500
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh_driver.c 2007-10-16 11:46:30.0 -0500
@@ -113,9 +113,9 @@ static void eeh_report_error(struct pci_
 /**
  * eeh_report_mmio_enabled - tell drivers that MMIO has been enabled
  *
- * Report an EEH error to each device driver, collect up and
- * merge the device driver responses. Cumulative response
- * passed back in userdata.
+ * Tells each device driver that IO ports, MMIO and config space I/O
+ * are now enabled. Collects up and merges the device driver responses.
+ * Cumulative response passed back in userdata.
  */

 static void eeh_report_mmio_enabled(struct pci_dev *dev, void *userdata)
@@ -123,8 +123,6 @@ static void eeh_report_mmio_enabled(stru
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver = dev->driver;

-	// dev->error_state = pci_channel_mmio_enabled;
-
 	if (!driver ||
 	    !driver->err_handler ||
 	    !driver->err_handler->mmio_enabled)
[PATCH 2/3]: powerpc eeh: drivers that need reset trump others
Bugfix: if a driver controlling one part of a multi-function pci card has asked for a reset, honor that request above all others.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 arch/powerpc/platforms/pseries/eeh_driver.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh_driver.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/eeh_driver.c 2007-10-16 11:46:30.0 -0500
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh_driver.c 2007-10-16 11:54:27.0 -0500
@@ -105,9 +105,10 @@ static void eeh_report_error(struct pci_
 		return;

 	rc = driver->err_handler->error_detected (dev, pci_channel_io_frozen);
+
+	/* A driver that needs a reset trumps all others */
+	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
-	if (*res == PCI_ERS_RESULT_DISCONNECT &&
-	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
 }

 /**
@@ -129,9 +130,10 @@ static void eeh_report_mmio_enabled(stru
 		return;

 	rc = driver->err_handler->mmio_enabled (dev);
+
+	/* A driver that needs a reset trumps all others */
+	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
-	if (*res == PCI_ERS_RESULT_DISCONNECT &&
-	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
 }

 /**
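The merging rule in this patch can be pulled out into a tiny pure function, which makes the before/after difference easy to check. This is a sketch only: enum ers_result and merge_result() are local, hypothetical stand-ins for the kernel's pci_ers_result values and the inline *res updates, and the enum values are arbitrary.

```c
/* Local stand-ins for the PCI_ERS_RESULT_* values; numbering is arbitrary. */
enum ers_result {
	ERS_NONE = 1,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED
};

/*
 * Fold one driver's answer (rc) into the cumulative answer, following
 * the patched logic:
 *   - a driver that needs a reset trumps all others;
 *   - otherwise the first real answer replaces NONE;
 *   - otherwise the cumulative answer is kept.
 */
enum ers_result merge_result(enum ers_result cumulative, enum ers_result rc)
{
	if (rc == ERS_NEED_RESET)
		return rc;
	if (cumulative == ERS_NONE)
		return rc;
	return cumulative;
}
```

Under the old code a cumulative CAN_RECOVER followed by a NEED_RESET stayed CAN_RECOVER, because the reset request was only honored when the cumulative answer was DISCONNECT; with the rule above it becomes NEED_RESET.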
[PATCH 3/3]: powerpc eeh: avoid crash on null device.
Bugfix: avoid crash if there's no pci device for a given openfirmware node.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 arch/powerpc/platforms/pseries/eeh.c |   11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/eeh.c 2007-10-16 13:55:03.0 -0500
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/eeh.c 2007-10-16 14:04:39.0 -0500
@@ -186,6 +186,11 @@ static size_t gather_pci_data(struct pci
 	n += scnprintf(buf+n, len-n, "cmd/stat:%x\n", cfg);
 	printk(KERN_WARNING "EEH: PCI cmd/status register: %08x\n", cfg);

+	if (!dev) {
+		printk(KERN_WARNING "EEH: no PCI device for this of node\n");
+		return n;
+	}
+
 	/* Gather bridge-specific registers */
 	if (dev->class >> 16 == PCI_BASE_CLASS_BRIDGE) {
 		rtas_read_config(pdn, PCI_SEC_STATUS, 2, &cfg);
@@ -198,7 +203,7 @@ static size_t gather_pci_data(struct pci
 	}

 	/* Dump out the PCI-X command and status regs */
-	cap = pci_find_capability(pdn->pcidev, PCI_CAP_ID_PCIX);
+	cap = pci_find_capability(dev, PCI_CAP_ID_PCIX);
 	if (cap) {
 		rtas_read_config(pdn, cap, 4, &cfg);
 		n += scnprintf(buf+n, len-n, "pcix-cmd:%x\n", cfg);
@@ -210,7 +215,7 @@ static size_t gather_pci_data(struct pci
 	}

 	/* If PCI-E capable, dump PCI-E cap 10, and the AER */
-	cap = pci_find_capability(pdn->pcidev, PCI_CAP_ID_EXP);
+	cap = pci_find_capability(dev, PCI_CAP_ID_EXP);
 	if (cap) {
 		n += scnprintf(buf+n, len-n, "pci-e cap10:\n");
 		printk(KERN_WARNING
@@ -222,7 +227,7 @@ static size_t gather_pci_data(struct pci
 			printk(KERN_WARNING "EEH: PCI-E %02x: %08x\n", i, cfg);
 		}

-		cap = pci_find_ext_capability(pdn->pcidev, PCI_EXT_CAP_ID_ERR);
+		cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
 		if (cap) {
 			n += scnprintf(buf+n, len-n, "pci-e AER:\n");
 			printk(KERN_WARNING
Re: [BUG] powerpc does not save msi state [was Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function
On Fri, Oct 19, 2007 at 05:53:08PM -0700, David Miller wrote:
From: [EMAIL PROTECTED] (Linas Vepstas)
Date: Fri, 19 Oct 2007 19:46:10 -0500

FWIW, it looks like not all that many arches do this; the output for grep -r address_hi * is pretty thin. Then, looking at i386/kernel/io_apic.c as an example, one can see that the msi state save happens by accident if CONFIG_SMP is enabled; and so it's surely broken on uniprocessor machines.

I don't see this, in all cases write_msi_msg() will transfer the given *msg to entry->msg by this assignment in drivers/pci/msi.c:

void write_msi_msg(unsigned int irq, struct msi_msg *msg)
{
	...
	entry->msg = *msg;
}

So as long as write_msi_msg() is invoked, it will be saved properly.

As Michael Ellerman points out, the pseries msi setup is done by firmware, and so this bit never happens. As discussed in the other thread, I'll try to set up a patch for an arch callback for restoring msi state.

-linas
Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function
On Tue, Oct 23, 2007 at 07:24:27AM +1000, Benjamin Herrenschmidt wrote:
On Mon, 2007-10-22 at 13:13 -0500, Linas Vepstas wrote:
On Mon, Oct 22, 2007 at 11:49:24AM +1000, Michael Ellerman wrote:

On pseries there's a chance it will work for PCI error recovery, but if so it's just lucky that firmware has left everything configured the same way.

? The papr is quite clear that it is up to the OS to restore the msi state after an eeh error.

Via direct config space access or via firmware change-msi calls ?

Direct config space access. It says that the OS is supposed to read the MSI config (after it's been set up), save it, and restore it (via direct config space writes) if the device is ever reset.

I don't know why you keep talking about powerpc laptops here ...

Well, there are Apple laptops, right? Aren't those the powermac platform? Now, I don't know if they support MSI, but if they do, I get the impression that they might not restore msi state correctly, after being put into hardware suspend.

But perhaps I'm mistaken; I was simply grepping for various msi-related functions in various arch subdirectories, comparing x86 to other arches, and noticed that code that would restore msi state seems to be missing for most arches and most powerpc platforms.

--linas
[BUG] powerpc does not save msi state [was Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function
Hi,

On Fri, Oct 19, 2007 at 05:27:06PM -0700, David Miller wrote:
From: [EMAIL PROTECTED] (Linas Vepstas)
Date: Fri, 19 Oct 2007 19:04:21 -0500

I'm working in linux-2.6.23-rc8-mm1 at the moment, and I don't see that happening. viz. read_msi_msg() is not called anywhere, and I need to have valid msg->address_lo and msg->address_hi and msg->data in order to be able to restore.

See the pci_restore_msi_state() call done from pci_restore_state() in drivers/pci/pci.c, that pci_restore_msi_state() code in drivers/pci/msi.c very much relies upon the entry->msg values being up to date and valid.

The MSI arch layer code is supposed to fill the entry->msg values in via arch_setup_msi_irq(). Perhaps the pseries code is forgetting to do that.

Yep. Thank you for confirming the correct location for the fix.

FWIW, it looks like not all that many arches do this; the output for grep -r address_hi * is pretty thin. Then, looking at i386/kernel/io_apic.c as an example, one can see that the msi state save happens by accident if CONFIG_SMP is enabled; and so it's surely broken on uniprocessor machines.

I'm cc'ing the powerpc mailing list to point this out: it looks like only cell/axon_msi.c and mpic_u3msi.c bother to do anything. I guess that there aren't any old macintosh laptops that have msi on them? Because without this, suspend and resume breaks.

Paul, on the off chance you're reading this, I'll send a pseries patch on Monday, with luck (and some other patches too). I'm not touching any of the other platforms, you and benh would know those better.

--linas
Re: [PATCH 2/2] Use of_get_pci_dev_node() in axon_msi.c
On Thu, Oct 18, 2007 at 11:27:23AM +1000, Michael Ellerman wrote:

It does what pci_device_to_OF_node() does, but in the right way. The plan is to remove pci_device_to_OF_node() once all the callers have been converted to properly handle the refcounting.

Oh. Yes. Well, of course, then. Excellent reason. I didn't get that from the patch commit comments. So, FWIW:

Ack'ed-by: Linas Vepstas [EMAIL PROTECTED]

--linas
Re: Merge dtc
On Wed, Oct 17, 2007 at 02:59:04PM -0500, Timur Tabi wrote:
Kumar Gala wrote:

Just out of interest who's complaining? We don't include mkimage for u-boot related builds and I haven't seen any gripes related to that.

I think we should include mkimage *and* dtc. But then, I'm not sure how much weight my opinion has. :-)

Isn't anyone concerned about the de facto fork-of-source-code that this causes? Which will be the official version? How will the code bases be kept in sync?

--linas
Re: [PATCH 2/2] Use of_get_pci_dev_node() in axon_msi.c
On Wed, Oct 17, 2007 at 05:12:27PM +1000, Michael Ellerman wrote:

+struct device_node *of_get_pci_dev_node(struct pci_dev *pdev)
+{
+	return of_node_get(pci_device_to_OF_node(pdev));
+}
[...]
-	dn = of_node_get(pci_device_to_OF_node(dev));
+	dn = of_get_pci_dev_node(dev);

Is this really useful or wise? As a matter of personal taste, I find stuff like this clutters and confuses my mind. I go to read new code, and I run across some routine I haven't heard of before ... e.g. of_get_pci_dev_node(), so now I have to look it up to see what it does. A few minutes later, I realize that it's just a pair of old friends (of_node_get and pci_device_to_OF_node) and so now I have to make mental room for it. Tomorrow, or 3 days later, I'm again looking at of_get_pci_dev_node() and I'm thinking "gee, what did that thing do again??"

I don't much like this style, and I've been known to submit patches that remove stuff like this ...

--linas
Re: [PATCH v4 or so] Use 1TB segments
On Thu, Oct 11, 2007 at 08:37:10PM +1000, Paul Mackerras wrote:

This makes the kernel use 1TB segments for all kernel mappings and for user addresses of 1TB and above, on machines which support them (currently POWER5+, POWER6 and PA6T).

Gack. A system dump might take a while on these machines ...

--linas
Re: [patch v2] PS3: Add os-area database routines
On Mon, Oct 08, 2007 at 06:07:24PM -0700, Geoff Levand wrote:

Subject: PS3: Add os-area database routines

Add support for a simple tagged database in the PS3 flash rom os-area. The database allows the flash rom os-area to be shared between a bootloader and installed operating systems. The application ps3-flash-util or the library libps3-utils from the ps3-utils package can be used for userspace database operations.

Perhaps I missed the discussion; but .. out of general curiosity, what is the relation between this and the ppc_md.nvram_* system? I note that pseries, powermac, chrp, celleb implement the nvram calls, but cell and ps3 do not. So clearly, whatever this is, it's not layered on top of nvram?

FWIW, I don't quite understand the nvram system; it seems to have partitions; one part is an os area, and a chunk of it is set up as a file system.

So I'm wondering -- wouldn't the DB os-area be generically interesting to other powerpc platforms? Maybe even other arches? And why isn't this built on top of the nvram structure? ... etc?

--linas
Re: Hard hang in hypervisor!?
On Tue, Oct 09, 2007 at 04:18:19PM -0500, Nathan Lynch wrote:
Linas Vepstas wrote:

I was futzing with linux-2.6.23-rc8-mm1 in a power6 lpar when, for whatever reason, a spinlock locked up. The bizarre thing was that the rest of the system locked up as well: an ssh terminal, and also an hvc console. Breaking into the debugger showed 4 cpus, 1 of which was deadlocked in the spinlock, and the other 3 in .pseries_dedicated_idle_sleep

This was, ahhh, unexpected. What's up with that? Can anyone provide any insight?

Sounds consistent with a task trying to double-acquire the lock, or an interrupt handler attempting to acquire a lock that the current task holds. Or maybe even an uninitialized spinlock. Do you know which lock it was?

Not sure .. trying to find out now. But why would that kill the ssh session, and the console? Sure, so maybe one cpu is spinning, but the other three can still take interrupts, right? The ssh session should have been generating ethernet card interrupts, and the console should have been generating hvc interrupts.

Err .. it was cpu 0 that was spinlocked. Are interrupts not distributed?

Perhaps I should IRC this ...

--linas
never mind .. [was Re: Hard hang in hypervisor!?
On Tue, Oct 09, 2007 at 04:28:10PM -0500, Linas Vepstas wrote:

Perhaps I should IRC this ...

yeah. I guess I'd forgotten how funky things can get. So never mind ...

--linas
Re: [PATCH] Eval boards should not need to mess with ROOT_DEV
On Mon, Oct 08, 2007 at 02:41:54PM -0500, Kumar Gala wrote:
On Oct 8, 2007, at 2:03 PM, Grant Likely wrote:

I can't see a good reason for eval board platform code to mess with the ROOT_DEV value instead of using the default behaviour (so I'm

Powermac and pseries also do this weirdness. Should it be removed from there too?

We need benh to make a comment about powermac. I think it's ok to remove everywhere but we should see if he has any issue.

Ack. I see no problems in removing it.

--linas
Re: [PATCH] Eval boards should not need to mess with ROOT_DEV
On Mon, Oct 08, 2007 at 03:42:21PM -0500, Linas Vepstas wrote:
On Mon, Oct 08, 2007 at 02:41:54PM -0500, Kumar Gala wrote:
On Oct 8, 2007, at 2:03 PM, Grant Likely wrote:

I can't see a good reason for eval board platform code to mess with the ROOT_DEV value instead of using the default behaviour (so I'm

Powermac and pseries also do this weirdness. Should it be removed from there too?

We need benh to make a comment about powermac. I think it's ok to remove everywhere but we should see if he has any issue.

Ack. I see no problems in removing it.

Err. I meant my comment to be of limited scope: for pseries. I know nothing of other platforms.

--linas
Re: 2.6.23-rc7-mm1 -- powerpc rtas panic
On Thu, Oct 04, 2007 at 05:01:47PM -0700, Nish Aravamudan wrote:
On 10/2/07, Tony Breeds [EMAIL PROTECTED] wrote:
On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:

I realise it'll make the patch bigger, but this doesn't seem like a particularly good name for the variable anymore.

Sure, what about?

Clarify when RTAS logging is enabled.

Signed-off-by: Tony Breeds [EMAIL PROTECTED]

For what it's worth, on a different ppc64 box, this resolves a similar panic for me.

Tested-by: Nishanth Aravamudan [EMAIL PROTECTED]

For the reasons explained, I'd really like to nack Tony's patch.

--linas
Re: Stdout console clogging = 300ms blocked
Hi Bernard,

On Wed, Oct 03, 2007 at 08:49:12PM +, Hollis Blanchard wrote:
On Tue, 02 Oct 2007 09:41:28 +0200, Willaert, Bernard wrote:

Problem: When we log debug output via the serial console on a multithreaded application, the console throughput may get clogged and then we experience a 300ms deadlock.

#define THREAD_DELAY 1000
usleep(THREAD_DELAY);
fprintf(stdout, "- thread 1\n");
[...]
usleep(THREAD_DELAY);
fprintf(stdout, "- thread 2\n");

baudrate=115200

OK, let's do the math. 115200 baud == approx 115200 bits per second. Assuming 8N1 for stop parity bits, I get approx 9 bits per byte, so your serial port is capable of 115.2/9 = 12.8 KBytes per second.

Now, every millisecond, you are attempting to print "- thread 1\n". Let's see, that's 17 bytes. And also "- thread 2\n", for a grand total of 34 bytes per millisecond. And you are attempting to jam this through a serial line capable of 12.8 Bytes per millisecond? Well, of course it won't fit!

Real output on the console:
/\
- thread 1
- thread 2
- thread 1
- thread 2
- thread 1
- thread 2
!!! thread2 interval timeout = 335 ms

Well, thread 1 clearly also had a delay of 335 milliseconds, for a total of 670 milliseconds delay. Now, theoretically, we should have seen a delay equal to (34 - 12.8)/34 = 0.623 seconds. I'd say that theory and practice match up pretty damned well; I see no evidence of any problem at all.

Could you not post HTML please? Thanks.

Agreed.

--linas
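The back-of-the-envelope numbers above can be rechecked with a few lines of C. This is a sketch, not anything from the thread: line_bytes_per_ms() and stalled_fraction() are hypothetical helper names, and the 9 bits per byte and 34 bytes per millisecond are simply the figures used in the message. (Strictly, 8N1 framing puts 10 bits on the wire per byte, one start bit, eight data bits and one stop bit, which would give 11.52 bytes per millisecond and only strengthens the conclusion.)

```c
/*
 * Payload bytes per millisecond that a serial line can carry.
 * bits_per_byte is the number of bits on the wire per payload byte.
 */
double line_bytes_per_ms(double baud, double bits_per_byte)
{
	return baud / bits_per_byte / 1000.0;
}

/*
 * Fraction of wall-clock time the writers must spend blocked when they
 * try to push "demand" bytes/ms through a line that carries "capacity".
 */
double stalled_fraction(double demand, double capacity)
{
	if (demand <= capacity)
		return 0.0;
	return (demand - capacity) / demand;
}
```

With the figures from the message, line_bytes_per_ms(115200, 9) gives 12.8 and stalled_fraction(34, 12.8) gives roughly 0.62, i.e. the threads should expect to be blocked for a bit over 600 msec out of every second.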
Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure
On Mon, Oct 01, 2007 at 07:27:30PM -0600, Matthew Wilcox wrote:

The thing to remember is that sym2 is in transition from being a dual BSD/Linux driver to being a purely Linux driver.

I was wondering about that; couldn't tell if the split in the code was historical, or being intentionally maintained.

My gut instinct is to say ack, although prudence dictates that I should test first. Which might take a few days...

Fine by me.

I tested the patch, it worked great. It also seemed to recover much more quickly -- so quickly, in fact, that I thought something had gone wrong. I reviewed it one more time, it really does look good.

A formal submission and acked by's at earliest convenience would be good.

--linas
[PATCH] powerpc: fix crash in rtas during early boot.
RTAS messages can occur very early during boot, before the error message buffer has been allocated. The current code will lead to a null-pointer deref. Explicitly protect against this.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Cc: Andy Whitcroft [EMAIL PROTECTED]

Andy Whitcroft's crash was apparently due to firmware complaining about lost power (actually, lost power supply redundancy!), which occurred very early during boot.

Type0040 (EPOW)
Status: bypassed new
Residual error from previous boot.
EPOW Sensor Value: 0002
EPOW warning due to loss of redundancy.
EPOW general power fault.

I've no clue why firmware thought it was OK to report this during one of the earliest calls to RTAS; I'm still investigating that.

 arch/powerpc/platforms/pseries/rtasd.c |    6 ++
 1 file changed, 6 insertions(+)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/rtasd.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/platforms/pseries/rtasd.c 2007-09-26 15:06:49.0 -0500
+++ linux-2.6.23-rc8-mm1/arch/powerpc/platforms/pseries/rtasd.c 2007-10-03 11:58:09.0 -0500
@@ -235,6 +235,12 @@ void pSeries_log_error(char *buf, unsign
 		return;
 	}

+	/* During early boot, the log buffer hasn't been allocated yet. */
+	if (rtas_log_buf == NULL) {
+		spin_unlock_irqrestore(&rtasd_log_lock, s);
+		return;
+	}
+
 	/* call type specific method for error */
 	switch (err_type & ERR_TYPE_MASK) {
 	case ERR_TYPE_RTAS_LOG:
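The shape of the fix, stripped of locking and RTAS details, is just "refuse to log until the buffer exists". A minimal userspace sketch of that guard follows; log_init(), log_error() and the static buffer pointer are hypothetical names invented for illustration, and the real code additionally has to drop its spinlock before returning, as the patch shows.

```c
#include <stddef.h>
#include <string.h>

/* Stays NULL until log_init() runs, like an early-boot log buffer. */
static char *log_buf;
static size_t log_buf_len;

/* Hand the logger its storage; before this, every message is dropped. */
void log_init(char *storage, size_t len)
{
	log_buf = storage;
	log_buf_len = len;
}

/*
 * Copy msg into the log buffer, truncating if needed.
 * Returns 0 on success, -1 if the buffer is not available yet.
 */
int log_error(const char *msg)
{
	size_t n;

	/* Early callers arrive before allocation: bail out, don't deref */
	if (log_buf == NULL || log_buf_len == 0)
		return -1;

	n = strlen(msg);
	if (n >= log_buf_len)
		n = log_buf_len - 1;

	memcpy(log_buf, msg, n);
	log_buf[n] = '\0';
	return 0;
}
```

A message that arrives before log_init() is silently dropped rather than crashing, which matches the intent of the patch; whether dropping is acceptable for a given error type is a separate question.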
Re: 2.6.23-rc7-mm1 -- powerpc rtas panic
On Wed, Oct 03, 2007 at 02:09:46PM +1000, Michael Ellerman wrote:

Until we initialise what exactly?

Until we allocate the error log buffer. The original crash was for a null-pointer deref of the unallocated buffer. I just sent out a patch to fix this; it's a bit simpler than the below. In that email, I remarked:

Andy Whitcroft's crash was apparently due to firmware complaining about lost power (actually, lost power supply redundancy!), which occurred very early during boot.

Type0040 (EPOW)
Status: bypassed new
Residual error from previous boot.
EPOW Sensor Value: 0002
EPOW warning due to loss of redundancy.
EPOW general power fault.

I've no clue why firmware thought it was OK to report this during one of the earliest calls to RTAS; I'm still investigating that.

--linas
[PATCH] powerpc: another use of zalloc_maybe_bootmem()
Use alloc_maybe_bootmem() which wraps the if(mem_init_done) malloc clause.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

On Tue, Oct 02, 2007 at 01:37:53PM +1000, Stephen Rothwell wrote:

This patch introduces zalloc_maybe_bootmem and uses it so that we don't have to mark a whole (largish) routine as __init_ref_ok.

sfr missed a spot -- may as well get rid of this one too.

 arch/powerpc/kernel/pci-common.c |    7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

Index: linux-2.6.23-rc8-mm1/arch/powerpc/kernel/pci-common.c
===
--- linux-2.6.23-rc8-mm1.orig/arch/powerpc/kernel/pci-common.c 2007-09-26 15:02:41.0 -0500
+++ linux-2.6.23-rc8-mm1/arch/powerpc/kernel/pci-common.c 2007-10-02 16:28:16.0 -0500
@@ -65,14 +65,11 @@ static void __devinit pci_setup_pci_cont
 	spin_unlock(&hose_spinlock);
 }

-__init_refok struct pci_controller * pcibios_alloc_controller(struct device_node *dev)
+struct pci_controller * pcibios_alloc_controller(struct device_node *dev)
 {
 	struct pci_controller *phb;

-	if (mem_init_done)
-		phb = kmalloc(sizeof(struct pci_controller), GFP_KERNEL);
-	else
-		phb = alloc_bootmem(sizeof (struct pci_controller));
+	phb = alloc_maybe_bootmem(sizeof(struct pci_controller), GFP_KERNEL);
 	if (phb == NULL)
 		return NULL;
 	pci_setup_pci_controller(phb);
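For anyone unfamiliar with the pattern, here is a userspace sketch of what a "maybe bootmem" wrapper does: pick the allocator based on whether the normal one is up yet, so callers no longer carry the if/else themselves. Everything here is a stand-in under stated assumptions: alloc_maybe_boot(), boot_alloc(), the static arena and the mem_init_done flag imitate the idea only, with malloc() playing the role of kmalloc() and a bump allocator playing the role of alloc_bootmem(); none of it is the kernel implementation.

```c
#include <stdlib.h>
#include <stddef.h>

/* Set to nonzero once the "real" allocator is usable. */
int mem_init_done;

/* Tiny fixed arena standing in for boot-time memory. */
static union {
	unsigned char bytes[4096];
	long double align;	/* forces reasonable alignment of the arena */
} boot_arena;

size_t boot_used;

/* Bump allocator: hands out 16-byte-rounded chunks, never frees. */
void *boot_alloc(size_t size)
{
	size_t rounded = (size + 15) & ~(size_t)15;
	void *p;

	if (rounded < size || rounded > sizeof(boot_arena.bytes) - boot_used)
		return NULL;

	p = boot_arena.bytes + boot_used;
	boot_used += rounded;
	return p;
}

/* The wrapper: one call site, allocator chosen by mem_init_done. */
void *alloc_maybe_boot(size_t size)
{
	if (mem_init_done)
		return malloc(size);
	return boot_alloc(size);
}
```

The design benefit named in the quoted message carries over: only the small wrapper needs to know about the boot-time allocator, so the larger caller does not have to be annotated for it.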
Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure
On Mon, Oct 01, 2007 at 07:27:30PM -0600, Matthew Wilcox wrote:

Fine by me. Do you have the ability to produce failures on a whim on your platforms?

Yes, although it is very platform specific -- there are actually transistors in the pci bridge chip, which actually short out lines, and so, from the point of view of the rest of the chip, it did actually see a real error. It's supposed to be a very realistic test.

I've been vaguely musing a PCI device failure patch for x86, just so people can test driver failure paths.

That would be good ... I've recently agreed to accept a fedex to test someone else's card for them, which is outside my usual activities.

There's also supposed to be some PCI-X riser card out there (never seen one), which has the ability to inject actual pci errors. It's the Agilent PCI BestX card; I got the impression they might not sell it anymore; dunno.

One guy in the lab used to brush a grounding strap across the pins; this usually got a rise out of the audience.

--linas
Re: 2.6.23-rc7-mm1 -- powerpc rtas panic
On Mon, Sep 24, 2007 at 01:35:31PM +0100, Andy Whitcroft wrote:

Seeing the following from an older power LPAR, pretty sure we had this in the previous -mm also:

I haven't forgotten about this ... and am looking at it now. Seems that whenever I go to reserve the machine pSeries-102, someone else is using it :-)

--linas
Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure
On Mon, Oct 01, 2007 at 02:12:47PM -0600, Matthew Wilcox wrote:

I think the fundamental problem is that completions aren't really supposed to be used like this. Here's one attempt at using completions perhaps a little more the way they're supposed to be used,

Yes, that looks very good to me. I see it solves a bug that I hadn't been quite aware of. I don't understand why struct host_data is preferable to struct sym_shcb (is it because this is the structure that is naturally protected by the spinlock?)

My gut instinct is to say ack, although prudence dictates that I should test first. Which might take a few days...

although now I've written it, I wonder if we shouldn't just use a waitqueue instead.

I thought that earlier versions of the driver used waitqueues (I vaguely remember eh_wait in the code), which were later converted to completions (I also vaguely recall thinking that the new code was more elegant/simpler). I converted my patch to use the completions likewise, and, as you've clearly shown, did a rather sloppy job in the conversion.

I'm tempted to go with this patch; but if you prod, I could attempt a wait-queue based patch.

--linas
Re: Help! Debian ppc64
On Thu, Sep 27, 2007 at 09:57:02AM -0400, Cesar Bello wrote:
> Hi, I'm writing from Venezuela. I have to prepare a presentation about Debian on IBM pSeries servers with Power 5+ processors. My first question is: what are the advantages of using Debian GNU/Linux on pSeries servers?

Advantages as compared to what?

-- Debian on Intel?
   +++ powerpc has better RAS features than Intel; for example, my favorite, PCI error handling and recovery, or hotplug CPU, dynamic LPAR configuration, etc.
-- SuSE/RedHat on PowerPC?
   +++ SuSE/RedHat offer formal support, for $$$, which Debian/Ubuntu do not.
-- AIX on pSeries?
   +++ AIX has various enterprise features that Debian does not.

You might try talking to RedHat/SuSE product support, and also to IBM pSeries sales.

Linas Vepstas
Re: [PATCH 5/7] Celleb: Supports VFD on Celleb 2
On Thu, Sep 27, 2007 at 11:07:33AM +0200, Arnd Bergmann wrote:
> On Thursday 27 September 2007, Ishizaki Kou wrote:
> > This is a patch to support VFD on Celleb 2. VFD is a small LCD to show miscellaneous messages.
> >
> > Signed-off-by: Kou Ishizaki [EMAIL PROTECTED]
>
> My feeling is that your interface should better be implemented as a character device, or be integrated into some other existing message interface, if we can find one.
>
> * The firmware seems to implement the generic rtas interface for display-character and set-indicator, but your driver is celleb specific. I'd feel more comfortable if we could come up with a driver that also works on other systems that implement the same rtas calls.

Yep, I think I agree. Most pseries systems have a small two-line LCD display. Right now, the code that talks to it is implemented in rtas_progress(). It has this name because it's used only for printing out boot progress messages. This is great for debugging hangs, but it's not otherwise used.

I suppose it would be nice to have a generic interface to the thing, and, after a quickie skim of the code, the celleb display looks similar enough that this abstraction could be made.

--linas
Re: tg3: PCI error recovery
During a private conversation about how to save and restore device state after a pci error is detected, and the device is reset, the following came up: On Wed, Sep 26, 2007 at 04:48:38PM -0700, Michael Chan wrote: 1b) If so, is it safe to call pci_save_state() in tg3_io_error_detected(), or are we to assume they've been corrupted? My conservative approach is to assume that anything and everything has been corrupted. (e.g. temporary undervoltage on the bus might scramble multiple registers) In that case, we should call pci_restore_msi_state() to restore the MSI state, but this call is only defined if CONFIG_PM is defined. There seem to be two choices: 1) enable CONFIG_PM in those arches that care about recovering from PCI errors. (Yuck) 2) remove the ifdef CONFIG_PM from around pci_restore_msi_state() in drivers/pci/msi.c I'd go for choice 2, but I thought I'd ask first ... --linas
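A minimal userspace sketch of the "assume everything has been corrupted" approach, for illustration only. The struct and function names below are hypothetical stand-ins, not the tg3 driver or the kernel PCI API: the point is simply that the snapshot is taken while the device is known good (e.g. at probe time), and the error handler never takes a fresh one.

```c
#include <string.h>

/* Hypothetical stand-in for a few config-space registers. */
struct cfg_regs {
    unsigned int bar0;
    unsigned int msi_addr;
    unsigned int msi_data;
};

struct dev_model {
    struct cfg_regs live;   /* what the hardware currently holds */
    struct cfg_regs saved;  /* snapshot taken while the device was known good */
};

/* Called once at probe time, when the registers can be trusted. */
void model_probe_save(struct dev_model *d)
{
    d->saved = d->live;
}

/* A bus error may scramble any register; no snapshot is taken here. */
void model_error_detected(struct dev_model *d)
{
    memset(&d->live, 0xff, sizeof(d->live));
}

/* After the slot reset, rewrite everything from the known-good snapshot. */
void model_slot_reset(struct dev_model *d)
{
    d->live = d->saved;
}
```

Had model_error_detected() refreshed the snapshot instead, the later restore would have written the scrambled values straight back.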
Re: 2.6.23-rc8 dies somewhere during boot!?
On Thu, Sep 27, 2007 at 09:12:33PM +0200, Gerhard Pircher wrote: Hi, I'm working on a 2.6.23 kernel for the AmigaOne. [...] 6PCI: Probing PCI hardware. 7PCI: Scanning bus 0... ...00:00:07.0. 7PCI: Calling quirk... ...CI: Found :00:07.2 [1106/303... Any chance that this thing has an e100 ethernet card in it? If so, edit drivers/pci/quirks.c and ifdef out the readb() in the e100_quirk routine. We're debating the proper fix on the pci mailing list now. --linas ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: 2.6.23-rc8 dies somewhere during boot!?
On Thu, Sep 27, 2007 at 09:31:31PM +0200, Gerhard Pircher wrote:
> I'm working on a 2.6.23 kernel for the AmigaOne.

Have you tried 2.6.22, or does that fail also?

--linas
Re: 2.6.23-rc8 dies somewhere during boot!?
On Thu, Sep 27, 2007 at 11:17:00PM +0200, Gerhard Pircher wrote:
> Do you have an idea how to debug it?

Not particularly. What caught my eye was the failure right near the PCI quirk stuff, as I was having problems there as well (but apparently, for very different reasons).

Based on your boot messages, it looks like you are failing somewhere in pci probe. My olde-fashioned, slow, but-usually-works method is to sprinkle enough printk's into the code to catch it in the act.

--linas
Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure
On Wed, Sep 26, 2007 at 09:02:16AM -0600, Matthew Wilcox wrote:
> On Fri, Apr 20, 2007 at 03:47:20PM -0500, Linas Vepstas wrote:
> > Implement the so-called first failure data capture (FFDC) for the symbios PCI error recovery. After a PCI error event is reported, the driver requests that MMIO be enabled. Once enabled, it then reads and dumps assorted status registers, and concludes by requesting the usual reset sequence.
> >
> > + /* Request that MMIO be enabled, so register dump can be taken. */
> > + return PCI_ERS_RESULT_CAN_RECOVER;
> > +}
>
> I'm a little concerned by the mention of MMIO. It's entirely possible for the sym2 driver to be using ioports to access the card rather than MMIO. Is it simply that it can't on the platform you test on?

The comment is misleading. I've been in the bad habit of calling it mmio whenever it's not DMA. The habit is because there are two distinct enable bits in the pci-host bridge during error recovery: one to enable mmio/ioports, and the other to enable DMA. If the adapter has gone crazy, I don't want to enable DMA, so that it doesn't scribble to bad places. But, by enabling mmio/ioports, perhaps it can be finessed back into a semi-sane state, e.g. sane enough to perform a dump of its internal state.

--linas
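The two-enable-bit scheme described above can be sketched as plain bit manipulation. This is only a model under stated assumptions: the flag names and values are made up for illustration and do not correspond to any real bridge register layout or kernel interface.

```c
/* Hypothetical bridge enable flags; names and values are illustrative only. */
#define BRIDGE_ENABLE_PIO 0x1u  /* MMIO and ioport accesses reach the device */
#define BRIDGE_ENABLE_DMA 0x2u  /* the device may master the bus */

/* First recovery stage: allow register reads for the dump, keep DMA off
 * so that a confused adapter cannot scribble on memory. */
unsigned int bridge_enable_for_dump(unsigned int flags)
{
    flags |= BRIDGE_ENABLE_PIO;
    flags &= ~BRIDGE_ENABLE_DMA;
    return flags;
}

/* Final stage, after a successful reset: both paths are enabled again. */
unsigned int bridge_enable_after_reset(unsigned int flags)
{
    return flags | BRIDGE_ENABLE_PIO | BRIDGE_ENABLE_DMA;
}
```

Whether the driver then touches the card through MMIO or through ioports does not matter to this stage; both sit behind the same enable bit.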
Re: 2.6.23-rc8 dies somewhere during boot!?
On Thu, Sep 27, 2007 at 11:57:35PM +0200, Gerhard Pircher wrote:
> > Based on your boot messages, it looks like you are failing somewhere in pci probe. My olde-fashioned, slow, but-usually-works method is to sprinkle enough printk's into the code to catch it in the act.
>
> I guess the code in arch/powerpc/pci*.c is the right place to sprinkle some printk's into the code?

The last identifiable message I saw was

> 7PCI: Calling quirk...

which is from drivers/pci/quirks.c

> ...CI: Found :00:07.2 [1106/303...

and this is from pci_setup_device() in drivers/pci/probe.c

So I'd look to see if pci_setup_device() ever returned, and then I'd look to see what happened next.

--linas
Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure
On Thu, Sep 27, 2007 at 04:10:31PM -0600, Matthew Wilcox wrote:
> In the error handler, we wait_for_completion(io_reset_wait). In sym2_io_error_detected, we init_completion(io_reset_wait). Isn't it possible that we hit the error handler before we hit the io_error_detected path, and thus the completion wait is lost? Since the completion is already initialised in sym_attach(), I don't think we need to initialise it in sym2_io_error_detected(). Makes sense to just delete it?

Good catch. But no ... and I had to study this a bit. Bear with me:

It is enough to call init_completion() once, and not once per use: it initializes spinlocks, which shouldn't be initialized twice. But, that completion might be used multiple times when there are multiple errors, and so, before using it a second time, one must set completion->done = 0. The INIT_COMPLETION() macro does this. One must have completion->done = 0 before every use, as otherwise, wait_for_completion() won't actually wait. And since complete_all() sets x->done += UINT_MAX/2, I'm pretty sure x->done won't be zero the next time we use it, unless we make it so.

So I need to find a place to safely call INIT_COMPLETION() again, after the completion has been used. At the moment, I'm stumped as to where to do this.

[think ... think ... think]

I think the race you describe above is harmless. The first time that sym_eh_handler() will run, it will be with SYM_EH_ABORT, and it doesn't matter if we lose that, since the device is hosed anyway. At some later time, it will run with SYM_EH_DEVICE_RESET and then SYM_EH_BUS_RESET and then SYM_EH_HOST_RESET, and we won't miss those, since, by now, sym2_io_error_detected() will have run.

So, by my reading, I'd say that init_completion() in sym2_io_error_detected() has to stay (although perhaps it should be replaced by the INIT_COMPLETION() macro.) Removing it will prevent correct operation on the second and subsequent errors.

--Linas
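The argument about the done counter can be checked with a tiny userspace model. This is a sketch under assumptions: it models only the counter as described in the message (complete_all() adding UINT_MAX/2, INIT_COMPLETION() zeroing it), with no locking and no wait queue, and the model_* names are invented for the illustration.

```c
#include <limits.h>

/* Userspace model of the 'done' counter only; no spinlock, no wait queue. */
struct completion_model {
    unsigned int done;
};

/* Models init_completion(): the real one also sets up the wait queue
 * and its lock, which is why it should run only once. */
void model_init_completion(struct completion_model *x)
{
    x->done = 0;
}

/* Models the INIT_COMPLETION() macro: re-arm by zeroing the counter. */
void model_rearm(struct completion_model *x)
{
    x->done = 0;
}

/* Models complete_all() as described in the message above. */
void model_complete_all(struct completion_model *x)
{
    x->done += UINT_MAX / 2;
}

/* Returns 1 if a waiter would actually block, 0 if it would fall through. */
int model_would_block(const struct completion_model *x)
{
    return x->done == 0;
}
```

After the first model_complete_all(), model_would_block() stays 0 until model_rearm() is called, which is the "second and subsequent errors" problem in miniature.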
Re: [EMAIL PROTECTED]: 2.6.23-rc6-mm1 -- powerpc pSeries_log_error panic in rtas_call/early_enable_eeh]
I just got back from vacation. I'll give this a whirl shortly.

--linas

On Sun, Sep 23, 2007 at 11:17:40AM -0500, Anton Blanchard wrote:
> Hi Linas,
>
> Looks like EEH could be involved :)
>
> Anton
>
> - Forwarded message from Andy Whitcroft [EMAIL PROTECTED] -
>
> From: Andy Whitcroft [EMAIL PROTECTED]
> To: Andrew Morton [EMAIL PROTECTED]
> Subject: 2.6.23-rc6-mm1 -- powerpc pSeries_log_error panic in rtas_call/early_enable_eeh
> Cc: linuxppc-dev@ozlabs.org, [EMAIL PROTECTED]
>
> Seeing the following panic booting an old powerpc LPAR:
>
> Unable to handle kernel paging request for data at address 0x
> Faulting instruction address: 0xc0047b48
> cpu 0x0: Vector: 300 (Data Access) at [c06a3750]
>     pc: c0047b48: .pSeries_log_error+0x364/0x420
>     lr: c0047acc: .pSeries_log_error+0x2e8/0x420
>     sp: c06a39d0
>    msr: 80001032
>    dar: 0
>  dsisr: 4200
>   current = 0xc05acab0
>   paca    = 0xc05ad700
>     pid   = 0, comm = swapper
> enter ? for help
> [c06a3af0] c0021164 .rtas_call+0x200/0x250
> [c06a3ba0] c0049d50 .early_enable_eeh+0x168/0x360
> [c06a3c70] c002f674 .traverse_pci_devices+0x8c/0x138
> [c06a3d10] c0560ce8 .eeh_init+0x1a8/0x200
> [c06a3db0] c055fb70 .pSeries_setup_arch+0x128/0x234
> [c06a3e40] c054f830 .setup_arch+0x214/0x24c
> [c06a3ee0] c0546a38 .start_kernel+0xd4/0x3e4
> [c06a3f90] c045adc4 .start_here_common+0x54/0x58
> 0:mon
>
> This machine is:
>
> # cat /proc/cpuinfo
> processor : 0
> cpu : POWER4+ (gq)
> clock : 1703.965296MHz
> revision : 19.0
> [...]
> machine : CHRP IBM,7040-681
>
> -apw
>
> - End forwarded message -
Re: Keep On Debugging You
On Thu, Sep 06, 2007 at 10:21:43AM -0500, Timur Tabi wrote: Zhang Wei-r63237 wrote: Oops! Could you give us a live show version? :D Sorry, I'm booked up for the rest of the year. Hmm. Maybe someone could sneak a videocam into one of the venues, and, you know, post a pirated, illegal copy on youtube or something. --linas ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 0/3] powerpc: whitespace cleanup, grammar corrections
These popped out at me while I was reading code. It's all janitorial.

--linas
[PATCH 2/3] powerpc: prom whitespace cleanup
Whitespace cleanup: badly indented lines.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 arch/powerpc/kernel/prom.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

Index: linux-2.6.22-git2/arch/powerpc/kernel/prom.c
===
--- linux-2.6.22-git2.orig/arch/powerpc/kernel/prom.c 2007-08-29 14:14:12.0 -0500
+++ linux-2.6.22-git2/arch/powerpc/kernel/prom.c 2007-08-29 14:15:10.0 -0500
@@ -782,13 +782,13 @@ static int __init early_init_dt_scan_cho
 #endif
 
 #ifdef CONFIG_KEXEC
-	lprop = (u64*)of_get_flat_dt_prop(node, "linux,crashkernel-base", NULL);
-	if (lprop)
-		crashk_res.start = *lprop;
-
-	lprop = (u64*)of_get_flat_dt_prop(node, "linux,crashkernel-size", NULL);
-	if (lprop)
-		crashk_res.end = crashk_res.start + *lprop - 1;
+	lprop = (u64*)of_get_flat_dt_prop(node, "linux,crashkernel-base", NULL);
+	if (lprop)
+		crashk_res.start = *lprop;
+
+	lprop = (u64*)of_get_flat_dt_prop(node, "linux,crashkernel-size", NULL);
+	if (lprop)
+		crashk_res.end = crashk_res.start + *lprop - 1;
 #endif
 
 	early_init_dt_check_for_initrd(node);
[PATCH 1/3] powerpc: prom_init whitespace cleanup, typo fix.
Whitespace cleanup: badly indented lines. Typo in comment.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 arch/powerpc/kernel/prom_init.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-2.6.22-git2/arch/powerpc/kernel/prom_init.c
===
--- linux-2.6.22-git2.orig/arch/powerpc/kernel/prom_init.c 2007-07-08 18:32:17.0 -0500
+++ linux-2.6.22-git2/arch/powerpc/kernel/prom_init.c 2007-08-28 16:40:26.0 -0500
@@ -1197,7 +1197,7 @@ static void __init prom_initialize_tce_t
 		if ((type[0] == 0) || (strstr(type, RELOC("pci")) == NULL))
 			continue;
 
-		/* Keep the old logic in tack to avoid regression. */
+		/* Keep the old logic intact to avoid regression. */
 		if (compatible[0] != 0) {
 			if ((strstr(compatible, RELOC("python")) == NULL) &&
 			    (strstr(compatible, RELOC("Speedwagon")) == NULL) &&
@@ -2224,7 +2224,7 @@ static void __init fixup_device_tree(voi
 
 static void __init prom_find_boot_cpu(void)
 {
-	struct prom_t *_prom = RELOC(prom);
+	struct prom_t *_prom = RELOC(prom);
 	u32 getprop_rval;
 	ihandle prom_cpu;
 	phandle cpu_pkg;
@@ -2244,7 +2244,7 @@ static void __init prom_find_boot_cpu(vo
 static void __init prom_check_initrd(unsigned long r3, unsigned long r4)
 {
 #ifdef CONFIG_BLK_DEV_INITRD
-	struct prom_t *_prom = RELOC(prom);
+	struct prom_t *_prom = RELOC(prom);
 
 	if (r3 && r4 && r4 != 0xdeadbeef) {
 		unsigned long val;
@@ -2277,7 +2277,7 @@ unsigned long __init prom_init(unsigned
 			       unsigned long pp,
 			       unsigned long r6, unsigned long r7)
 {
-	struct prom_t *_prom;
+	struct prom_t *_prom;
 	unsigned long hdr;
 	unsigned long offset = reloc_offset();
 
@@ -2336,8 +2336,8 @@ unsigned long __init prom_init(unsigned
 	/*
 	 * Copy the CPU hold code
 	 */
-	if (RELOC(of_platform) != PLATFORM_POWERMAC)
-		copy_and_flush(0, KERNELBASE + offset, 0x100, 0);
+	if (RELOC(of_platform) != PLATFORM_POWERMAC)
+		copy_and_flush(0, KERNELBASE + offset, 0x100, 0);
 
 	/*
 	 * Do early parsing of command line
[PATCH 3/3] powerpc: setup_64 comment cleanup.
Grammatical corrections to comments.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 arch/powerpc/kernel/prom.c     |8 +---
 arch/powerpc/kernel/setup_64.c |6 +++---
 2 files changed, 8 insertions(+), 6 deletions(-)

Index: linux-2.6.22-git2/arch/powerpc/kernel/setup_64.c
===
--- linux-2.6.22-git2.orig/arch/powerpc/kernel/setup_64.c 2007-09-04 17:29:36.0 -0500
+++ linux-2.6.22-git2/arch/powerpc/kernel/setup_64.c 2007-09-05 14:12:23.0 -0500
@@ -181,9 +181,9 @@ void __init early_setup(unsigned long dt
 	DBG(" -> early_setup(), dt_ptr: 0x%lx\n", dt_ptr);
 
 	/*
-	 * Do early initializations using the flattened device
-	 * tree, like retreiving the physical memory map or
-	 * calculating/retreiving the hash table size
+	 * Do early initialization using the flattened device
+	 * tree, such as retrieving the physical memory map or
+	 * calculating/retrieving the hash table size.
 	 */
 	early_init_devtree(__va(dt_ptr));
 
Index: linux-2.6.22-git2/arch/powerpc/kernel/prom.c
===
--- linux-2.6.22-git2.orig/arch/powerpc/kernel/prom.c 2007-09-05 14:23:06.0 -0500
+++ linux-2.6.22-git2/arch/powerpc/kernel/prom.c 2007-09-05 14:24:49.0 -0500
@@ -433,9 +433,11 @@ static int __init early_parse_mem(char *
 }
 early_param("mem", early_parse_mem);
 
-/*
- * The device tree may be allocated below our memory limit, or inside the
- * crash kernel region for kdump. If so, move it out now.
+/**
+ * move_device_tree - move tree to an unused area, if needed.
+ *
+ * The device tree may be allocated beyond our memory limit, or inside the
+ * crash kernel region for kdump. If so, move it out of the way.
  */
 static void move_device_tree(void)
 {
Re: [PATCH 2.6.23] ibmebus: Prevent bus_id collisions
On Thu, Aug 30, 2007 at 04:00:56PM +0200, Joachim Fenkes wrote:
> Plus, I rather like using the full_name since it also contains a descriptive name as opposed to being just nondescript numbers, helping the layman (ie user) to make sense out of a dev_id.

Yes, well, but no. The location code is useful as a geographical location: slots and devices are physically labelled with stickers so you can tell which is which. Handy when you have to unplug stuff.

By contrast, the device-tree full_name is mostly just gobbledygook, with some crazy phb numbering in there that, after four years of staring at them, I still can't reliably do anything useful with. Location codes are nice.

--linas
Re: RFC: issues concerning the next NAPI interface
On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> 3) On modern systems the incoming packets are processed very fast. Especially on SMP systems when we use multiple queues we process only a few packets per napi poll cycle. So NAPI does not work very well here and the interrupt rate is still high.

I saw this too, on a system that is modern but not terribly fast, and only slightly (2-way) SMP (the spidernet).

I experimented with various solutions; none were terribly exciting. The thing that killed all of them was a crazy test case that someone sprung on me: they had written a worst-case network ping-pong app: send one packet, wait for reply, send one packet, etc.

If I waited (indefinitely) for a second packet to show up, the test case completely stalled (since no second packet would ever arrive). And if I introduced a timer to wait for a second packet, then I just increased the latency in the response to the first packet, and this was noticed, and folks complained.

In the end, I just let it be, and let the system work as a busy-beaver, with the high interrupt rate. Is this a wise thing to do? I was thinking that, if the system is under heavy load, then the interrupt rate would fall, since (for less pathological network loads) more packets would queue up before the poll was serviced. But I did not actually measure the interrupt rate under heavy load ...

--linas
Re: RFC: issues concerning the next NAPI interface
On Fri, Aug 24, 2007 at 08:52:03AM -0700, Stephen Hemminger wrote:
> You need hardware support for deferred interrupts. Most devices have it (e1000, sky2, tg3) and it interacts well with NAPI. It is not a generic thing you want done by the stack, you want the hardware to hold off interrupts until X packets or Y usecs have expired.

Just to be clear, in the previous email I posted on this thread, I described a worst-case network ping-pong test case (send a packet, wait for reply), and found out that a deferred interrupt scheme just damaged the performance of the test case. Since the folks who came up with the test case were adamant, I turned off the deferred interrupts. While deferred interrupts are an obvious solution, I decided that they weren't a good solution. (And I have no other solution to offer).

--linas
Re: RFC: issues concerning the next NAPI interface
On Fri, Aug 24, 2007 at 09:04:56PM +0200, Bodo Eggert wrote:
> Linas Vepstas [EMAIL PROTECTED] wrote:
> > On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> > > 3) On modern systems the incoming packets are processed very fast. Especially on SMP systems when we use multiple queues we process only a few packets per napi poll cycle. So NAPI does not work very well here and the interrupt rate is still high.
> >
> > worst-case network ping-pong app: send one packet, wait for reply, send one packet, etc.
>
> Possible solution / possible brainfart:
>
> Introduce a timer, but don't start to use it to combine packets unless you receive n packets within the timeframe. If you receive less than m packets within one timeframe, stop using the timer. The system should now have a decent response time when the network is idle, and when the network is busy, nobody will complain about the latency.-)

Ohh, that was inspirational. Let me free-associate some wild ideas.

Suppose we keep a running average of the recent packet arrival rate. Let's say it's 10 per millisecond (typical for a gigabit eth running flat-out). If we could poll the driver at a rate of 10-20 per millisecond (i.e. letting the OS do other useful work for 0.05 millisec), then we could potentially service the card without ever having to enable interrupts on the card, and without hurting latency.

If the packet arrival rate becomes slow enough, we go back to an interrupt-driven scheme (to keep latency down).

The main problem here is that, even for HZ=1000 machines, this amounts to 10-20 polls per jiffy. Which, if implemented in kernel, requires using the high-resolution timers. And, umm, don't the HR timers require a cpu timer interrupt to make them go? So it's not clear that this is much of a win.

The eHEA is a 10 gigabit device, so it can expect 80-100 packets per millisecond for large packets, and even more, say 1K packets per millisec, for small packets. (Even the spec for my 1Gb spidernet card claims its internal rate is 1M packets/sec.)

Another possibility is to set HZ to 5000 or 2 or something humongous ... after all cpu's are now faster! But, since this might be wasteful, maybe we could make HZ be dynamically variable: have high HZ rates when there's lots of network/disk activity, and low HZ rates when not. That means a non-constant jiffy.

If all drivers used interrupt mitigation, then the variable-high frequency jiffy could take their place, and be more fair to everyone. Most drivers would be polled most of the time when they're busy, and only use interrupts when they're not.

--linas
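The n/m timer idea quoted above is a two-threshold hysteresis, and can be sketched in a few lines. This is an illustrative model only, not NAPI code: the type and function names are hypothetical, and the caller is assumed to count packets per fixed time window.

```c
/* Receive-path mode: interrupt per packet, or timer-driven batching. */
enum rx_mode { RX_INTERRUPT, RX_TIMER };

struct rx_policy {
    enum rx_mode mode;
    unsigned int enter_timer;  /* n: packets per window needed to start batching */
    unsigned int leave_timer;  /* m: fewer packets than this per window stops it */
};

/* Called at the end of each window with the number of packets seen in it.
 * Returns the mode to use for the next window. */
enum rx_mode rx_policy_update(struct rx_policy *p, unsigned int pkts_in_window)
{
    if (p->mode == RX_INTERRUPT && pkts_in_window >= p->enter_timer)
        p->mode = RX_TIMER;
    else if (p->mode == RX_TIMER && pkts_in_window < p->leave_timer)
        p->mode = RX_INTERRUPT;
    return p->mode;
}
```

With n = 8 and m = 2, say, the ping-pong case (one packet per window) never leaves interrupt mode, so its latency is untouched; a busy link crosses n, switches to batching, and stays there until traffic drops below m.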
Re: RFC: issues concerning the next NAPI interface
On Fri, Aug 24, 2007 at 02:44:36PM -0700, David Miller wrote:
> From: David Stevens [EMAIL PROTECTED]
> Date: Fri, 24 Aug 2007 09:50:58 -0700
>
> > Problem is if it increases rapidly, you may drop packets before you notice that the ring is full in the current estimated interval.
>
> This is one of many reasons why hardware interrupt mitigation is really needed for this.

When turning off interrupts, don't turn them *all* off. Leave the queue-full interrupt always on.

--linas
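As a sketch of that suggestion, assuming a device with separate enable bits per interrupt source (the bit names and values here are hypothetical; real NICs define their own layout):

```c
/* Hypothetical interrupt-enable bits; real hardware defines its own layout. */
#define IRQ_RX_PACKET    0x1u  /* one interrupt per received packet */
#define IRQ_TX_DONE      0x2u  /* transmit completion */
#define IRQ_RX_RING_FULL 0x4u  /* receive ring has no free descriptors */

/* Entering polled mode: mask the per-packet receive interrupt, but make
 * sure the ring-full interrupt stays enabled as a safety net. */
unsigned int irq_mask_for_polling(unsigned int enabled)
{
    enabled &= ~IRQ_RX_PACKET;
    return enabled | IRQ_RX_RING_FULL;
}
```

If the arrival rate jumps between two polls, the ring-full interrupt fires and the driver can drain the ring before packets are dropped.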
Re: How to port PReP to arch/powerpc?
On Wed, Aug 22, 2007 at 07:29:56AM -0500, Josh Boyer wrote: David Gibson and Rob Landley had a quite interesting discussion about PReP last night on IRC. ?? Where? I scrolled back on #ppc64 on freenode, and see no such conversation. --linas ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev