Re: [PATCH 2/4] edac: mpc85xx add mpc83xx support
Ira or Kumar,

Can you address Andrew's concerns below, along with what was raised in prior posts on this?

thanks

doug t

--- On Wed, 7/15/09, Andrew Morton a...@linux-foundation.org wrote:

> From: Andrew Morton a...@linux-foundation.org
> Subject: Re: [PATCH 2/4] edac: mpc85xx add mpc83xx support
> To: dougthomp...@xmission.com
> Cc: bluesmoke-de...@lists.sourceforge.net, linux-ker...@vger.kernel.org
> Date: Wednesday, July 15, 2009, 1:52 PM
>
> On Wed, 15 Jul 2009 11:38:49 -0600 dougthomp...@xmission.com wrote:
>
> > Add support for the Freescale MPC83xx memory controller to the existing
> > driver for the Freescale MPC85xx memory controller.
> >
> > The only difference between the two processors is in the CS_BNDS register
> > parsing code, which has been changed so it will work on both processors.
> >
> > The L2 cache controller does not exist on the MPC83xx, but the OF subsystem
> > will not use the driver if the device is not present in the OF device tree.
> >
> > Kumar, I had to change the nr_pages calculation to make the math work
> > out. I checked it on my board and did the math by hand for a 64GB 85xx
> > using 64K pages. In both cases, nr_pages * PAGE_SIZE comes out to the
> > correct value. Thanks for the help.
> >
> > v1 - v2:
> >   * Use PAGE_SHIFT to parse cs_bnds regardless of board type
> >   * Remove special-casing for the 83xx processor
> >
> > ...
> >
> > @@ -789,19 +791,20 @@ static void __devinit mpc85xx_init_csrow
> >          csrow = &mci->csrows[index];
> >          cs_bnds = in_be32(pdata->mc_vbase + MPC85XX_MC_CS_BNDS_0 +
> >                    (index * MPC85XX_MC_CS_BNDS_OFS));
> > -        start = (cs_bnds & 0xfff0000) << 4;
> > -        end = ((cs_bnds & 0xfff) << 20);
> > -        if (start)
> > -            start |= 0xfffff;
> > -        if (end)
> > -            end |= 0xfffff;
> > +
> > +        start = (cs_bnds & 0xffff0000) >> 16;
> > +        end   = (cs_bnds & 0x0000ffff);
> >
> >          if (start == end)
> >              continue;    /* not populated */
> >
> > +        start <<= (24 - PAGE_SHIFT);
> > +        end   <<= (24 - PAGE_SHIFT);
> > +        end    |= (1 << (24 - PAGE_SHIFT)) - 1;
>
> *stares for a while*
>
> That looks like the original code was really really wrong.
>
> The setting of all the lower bits in `end' is funny-looking. What's
> happening here? Should it be commented?
>
> >          csrow->first_page = start >> PAGE_SHIFT;
> >          csrow->last_page = end >> PAGE_SHIFT;
> > -        csrow->nr_pages = csrow->last_page + 1 - csrow->first_page;
> > +        csrow->nr_pages = end + 1 - start;
> >          csrow->grain = 8;
> >          csrow->mtype = mtype;
> >          csrow->dtype = DEV_UNKNOWN;
> > @@ -985,6 +988,7 @@ static struct of_device_id mpc85xx_mc_er
> >      { .compatible = "fsl,mpc8560-memory-controller", },
> >      { .compatible = "fsl,mpc8568-memory-controller", },
> >      { .compatible = "fsl,mpc8572-memory-controller", },
> > +    { .compatible = "fsl,mpc8349-memory-controller", },
> >      { .compatible = "fsl,p2020-memory-controller", },
> >      {},
> >  };
> > @@ -1001,13 +1005,13 @@ static struct of_platform_driver mpc85xx
> >             },
> >  };
> >
> > -
> > +#ifdef CONFIG_MPC85xx
> >  static void __init mpc85xx_mc_clear_rfxe(void *data)
> >  {
> >      orig_hid1[smp_processor_id()] = mfspr(SPRN_HID1);
> >      mtspr(SPRN_HID1, (orig_hid1[smp_processor_id()] & ~0x20000));
> >  }
> > -
> > +#endif
> >
> >  static int __init mpc85xx_mc_init(void)
> >  {
> > @@ -1040,26 +1044,32 @@ static int __init mpc85xx_mc_init(void)
> >          printk(KERN_WARNING EDAC_MOD_STR "PCI fails to register\n");
> >  #endif
> >
> > +#ifdef CONFIG_MPC85xx
> >      /*
> >       * need to clear HID1[RFXE] to disable machine check int
> >       * so we can catch it
> >       */
> >      if (edac_op_state == EDAC_OPSTATE_INT)
> >          on_each_cpu(mpc85xx_mc_clear_rfxe, NULL, 0);
> > +#endif
> >
> >      return 0;
> >  }
>
> The patch adds lots of ifdefs :(
>
> >  module_init(mpc85xx_mc_init);
> >
> > +#ifdef CONFIG_MPC85xx
> >  static void __exit mpc85xx_mc_restore_hid1(void *data)
> >  {
> >      mtspr(SPRN_HID1, orig_hid1[smp_processor_id()]);
> >  }
> > +#endif
>
> afaict this will run smp_processor_id() from within preemptible code,
> which is often buggy on preemptible kernels and will cause runtime
> warnings on at least some architectures.
>
> >  static void __exit mpc85xx_mc_exit(void)
> >  {
> > +#ifdef CONFIG_MPC85xx
> >      on_each_cpu(mpc85xx_mc_restore_hid1, NULL, 0);
> > +#endif
> >  #ifdef CONFIG_PCI
> >      of_unregister_platform_driver(&mpc85xx_pci_err_driver);
> >  #endif

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
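Editor's sketch, on Andrew's question about the low bits of `end'. The following is a minimal userspace illustration, not the driver code. It assumes that each CS_BNDS field is 16 bits wide and counts 16 MiB (1 << 24 byte) units, and that the end field names the last unit inclusively. Under those assumptions, OR-ing in the low bits turns "first page of the last unit" into "last page of the last unit", which is why nr_pages = end + 1 - start comes out right. The names decode_cs_bnds and struct page_range are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: an inclusive page range for one chip select. */
struct page_range {
    uint64_t first_page;
    uint64_t last_page;
    uint64_t nr_pages;
};

/*
 * Decode one CS_BNDS value into pages.
 * Assumption: start is in bits 31..16, end is in bits 15..0, and both
 * count 16 MiB units, with the end unit included in the range.
 * Returns 1 if the chip select looks populated, 0 otherwise.
 */
static int decode_cs_bnds(uint32_t cs_bnds, unsigned page_shift,
                          struct page_range *out)
{
    uint64_t start = (cs_bnds & 0xffff0000u) >> 16;
    uint64_t end = cs_bnds & 0x0000ffffu;

    assert(page_shift <= 24);

    if (start == end)
        return 0; /* treated as not populated, mirroring the patch */

    /* Convert 16 MiB units into page numbers. */
    start <<= (24 - page_shift);
    end <<= (24 - page_shift);

    /* Move from the first page of the last unit to its last page. */
    end |= (1ull << (24 - page_shift)) - 1;

    out->first_page = start;
    out->last_page = end;
    out->nr_pages = end + 1 - start;
    return 1;
}
```

With 4K pages, a range covering units 0 through 15 decodes to 0x10000 pages (256 MiB). With 64K pages, units 0 through 0xfff decode to 0x100000 pages, which is the 64GB case Ira says he checked by hand.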
Re: [v2 PATCH 2/3] EDAC: Add edac_device_alloc_index()
--- On Wed, 4/15/09, Andrew Morton a...@linux-foundation.org wrote:

> From: Andrew Morton a...@linux-foundation.org
> Subject: Re: [v2 PATCH 2/3] EDAC: Add edac_device_alloc_index()
> To: Harry Ciao qingtao@windriver.com
> Cc: nor...@yahoo.com, mich...@ellerman.id.au, bluesmoke-de...@lists.sourceforge.net, linuxppc-dev@ozlabs.org, linux-ker...@vger.kernel.org
> Date: Wednesday, April 15, 2009, 4:27 PM
>
> On Mon, 13 Apr 2009 14:05:15 +0800 Harry Ciao qingtao@windriver.com wrote:
>
> > Add edac_device_alloc_index(), because for MAPLE platform there may
> > exist several EDAC driver modules that could make use of
> > edac_device_ctl_info structure at the same time. The index allocation
> > for these structures should be taken care of by EDAC core.
>
> From: Andrew Morton a...@linux-foundation.org
>
> keep things neat. Also avoids having global identifier device_index
> shadowed by local identifier device_index.
>
> Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
> Acked-by: Doug Thompson dougthomp...@xmission.com
> Cc: Harry Ciao qingtao@windriver.com
> Cc: Kumar Gala ga...@gate.crashing.org
> Cc: Michael Ellerman mich...@ellerman.id.au
> Cc: Paul Mackerras pau...@samba.org
> Signed-off-by: Andrew Morton a...@linux-foundation.org
> ---
>
>  drivers/edac/edac_device.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff -puN drivers/edac/amd8111_edac.c~edac-add-edac_device_alloc_index-cleanup drivers/edac/amd8111_edac.c
> diff -puN drivers/edac/edac_core.h~edac-add-edac_device_alloc_index-cleanup drivers/edac/edac_core.h
> diff -puN drivers/edac/edac_device.c~edac-add-edac_device_alloc_index-cleanup drivers/edac/edac_device.c
> --- a/drivers/edac/edac_device.c~edac-add-edac_device_alloc_index-cleanup
> +++ a/drivers/edac/edac_device.c
> @@ -37,7 +37,6 @@
>   */
>  static DEFINE_MUTEX(device_ctls_mutex);
>  static LIST_HEAD(edac_device_list);
> -static atomic_t device_indexes = ATOMIC_INIT(0);
>
>  #ifdef CONFIG_EDAC_DEBUG
>  static void edac_device_dump_device(struct edac_device_ctl_info *edac_dev)
> @@ -499,6 +498,8 @@ void edac_device_reset_delay_period(stru
>   */
>  int edac_device_alloc_index(void)
>  {
> +    static atomic_t device_indexes = ATOMIC_INIT(0);
> +
>      return atomic_inc_return(&device_indexes) - 1;
>  }
>  EXPORT_SYMBOL_GPL(edac_device_alloc_index);
> _
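Editor's sketch: the cleanup above moves the counter into the only function that uses it. A rough userspace analogue, assuming C11 atomics as a stand-in for the kernel's atomic_t (alloc_index and next_index are hypothetical names, not EDAC symbols), might look like this:

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hand out 0, 1, 2, ... to successive callers.
 * The counter is function-local, so no other code can touch it and no
 * local variable elsewhere can shadow it.
 */
static int alloc_index(void)
{
    /* Static storage is zero-initialized, a valid state for an atomic. */
    static atomic_int next_index;

    /*
     * atomic_fetch_add returns the value held before the increment,
     * which matches the kernel idiom atomic_inc_return(...) - 1.
     */
    int idx = atomic_fetch_add(&next_index, 1);

    assert(idx >= 0);
    return idx;
}
```

Two drivers calling this concurrently would still get distinct indexes, because the read-and-increment is a single atomic operation.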
Re: [PATCH v2] edac: mpc85xx: Add support for MPC8572
Dave Jiang [EMAIL PROTECTED] wrote:

> There's an SVN+quilt tree via sourceforge for EDAC. I have asked Doug to
> push this patch upstream to the mm tree.
>
> Kumar Gala wrote:
> > On Sep 19, 2008, at 6:20 PM, Nate Case wrote:
> >
> > > From: Andrew Kilkenny
> > >
> > > This adds support for the dual-core MPC8572 processor. We have
> > > to support making SPR changes on each core. Also, since we can
> > > have multiple memory controllers sharing an interrupt, flag the
> > > interrupts with IRQF_SHARED.
> > >
> > > Signed-off-by: Andrew Kilkenny
> > > Signed-off-by: Nate Case
> > > ---
> > >  drivers/edac/mpc85xx_edac.c |   28 +++-
> > >  1 files changed, 23 insertions(+), 5 deletions(-)
> >
> > Acked-by: Kumar Gala
> >
> > Guys, is there an edac git tree or something to create patches
> > against? I've got one I've been sitting on but it should be updated
> > based on Nate's patch.
> >
> > - k

The SVN repo is:

svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk

The info page is http://bluesmoke.sourceforge.net/

doug t

W1DUG
Re: [patch 1/9] powerpc/cell/edac: log a syndrome code in case of correctable error
--- Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:

> Arnd, Maxim, please, next time, send that patch or at least CC the
> bluesmoke-devel list for EDAC related bits.
>
> Doug, if you are ok with this patch, I'll merge it via the powerpc tree.

fine with me, acked below

doug t

> Cheers,
> Ben.
>
> On Tue, 2008-07-15 at 21:51 +0200, [EMAIL PROTECTED] wrote:
> > From: Maxim Shchetynin [EMAIL PROTECTED]
> >
> > If correctable error occurs the syndrome code was logged as 0.
> > This patch lets EDAC to log a correct syndrome code to make
> > problem investigation easier.
> >
> > Signed-off-by: Maxim Shchetynin [EMAIL PROTECTED]
> > Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]

Acked-by: Doug Thompson [EMAIL PROTECTED]

> > ---
> >  drivers/edac/cell_edac.c |    5 +++--
> >  1 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/edac/cell_edac.c b/drivers/edac/cell_edac.c
> > index b54112f..0e024fe 100644
> > --- a/drivers/edac/cell_edac.c
> > +++ b/drivers/edac/cell_edac.c
> > @@ -33,7 +33,7 @@ static void cell_edac_count_ce(struct mem_ctl_info *mci, int chan, u64 ar)
> >  {
> >      struct cell_edac_priv *priv = mci->pvt_info;
> >      struct csrow_info *csrow = &mci->csrows[0];
> > -    unsigned long address, pfn, offset;
> > +    unsigned long address, pfn, offset, syndrome;
> >
> >      dev_dbg(mci->dev, "ECC CE err on node %d, channel %d, ar = 0x%016lx\n",
> >          priv->node, chan, ar);
> > @@ -44,10 +44,11 @@ static void cell_edac_count_ce(struct mem_ctl_info *mci, int chan, u64 ar)
> >      address = (address << 1) | chan;
> >      pfn = address >> PAGE_SHIFT;
> >      offset = address & ~PAGE_MASK;
> > +    syndrome = (ar & 0x000000001fe00000ul) >> 21;
> >
> >      /* TODO: Decoding of the error addresss */
> >      edac_mc_handle_ce(mci, csrow->first_page + pfn, offset,
> > -              0, 0, chan, "");
> > +              syndrome, 0, chan, "");
> >  }
> >
> >  static void cell_edac_count_ue(struct mem_ctl_info *mci, int chan, u64 ar)
> > --
> > 1.5.4.3

W1DUG
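Editor's sketch of the syndrome extraction in the patch above. This is an illustration only: the mask is an assumption read off the archived constant, which with a shift of 21 would select an 8-bit field in bits 21 through 28 of the register value. The authoritative field layout is in the Cell hardware documentation, and extract_syndrome is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field position: 8 bits starting at bit 21. */
#define SYNDROME_MASK  0x000000001fe00000ull
#define SYNDROME_SHIFT 21

/* Pull the ECC syndrome out of a raw error-register value. */
static unsigned extract_syndrome(uint64_t ar)
{
    unsigned syndrome = (unsigned)((ar & SYNDROME_MASK) >> SYNDROME_SHIFT);

    /* An 8-bit field can never exceed 0xff. */
    assert(syndrome <= 0xff);
    return syndrome;
}
```

Bits outside the masked field, above or below it, do not affect the result, so the value passed on to the logging call depends only on the syndrome field.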
Re: [PATCH] [POWERPC] pasemi: Broaden specific references to 1682M
I assume then that this patch will move upstream via the POWERPC path, is that right?

Signed-off-by: Doug Thompson [EMAIL PROTECTED]

--- Olof Johansson [EMAIL PROTECTED] wrote:

> [POWERPC] pasemi: Broaden specific references to 1682M
>
> There will be more product numbers in the future than just PA6T-1682M,
> but they will share much of the features. Remove some of the explicit
> references and compatibility checks with 1682M, and replace most of
> them with the more generic term PWRficient.
>
> Signed-off-by: Olof Johansson [EMAIL PROTECTED]
>
> ---
>
> This one touches drivers/char/hw_random and drivers/edac, but I'd
> prefer to just merge it up through the powerpc merge path since the
> changes are trivial. (Michael, Doug, if you disagree let me know and
> I can submit separate patches. This is 2.6.25 material anyway).
>
> -Olof
>
> diff --git a/arch/powerpc/platforms/pasemi/Kconfig b/arch/powerpc/platforms/pasemi/Kconfig
> index 735e153..2f4dd6e 100644
> --- a/arch/powerpc/platforms/pasemi/Kconfig
> +++ b/arch/powerpc/platforms/pasemi/Kconfig
> @@ -17,7 +17,7 @@ config PPC_PASEMI_IOMMU
>      bool "PA Semi IOMMU support"
>      depends on PPC_PASEMI
>      help
> -      IOMMU support for PA6T-1682M
> +      IOMMU support for PA Semi PWRficient
>
>  config PPC_PASEMI_IOMMU_DMA_FORCE
>      bool "Force DMA engine to use IOMMU"
> diff --git a/arch/powerpc/platforms/pasemi/cpufreq.c b/arch/powerpc/platforms/pasemi/cpufreq.c
> index 1cfb8b0..8caa166 100644
> --- a/arch/powerpc/platforms/pasemi/cpufreq.c
> +++ b/arch/powerpc/platforms/pasemi/cpufreq.c
> @@ -147,7 +147,10 @@ static int pas_cpufreq_cpu_init(struct cpufreq_policy *policy)
>      if (!cpu)
>          goto out;
>
> -    dn = of_find_compatible_node(NULL, "sdc", "1682m-sdc");
> +    dn = of_find_compatible_node(NULL, NULL, "1682m-sdc");
> +    if (!dn)
> +        dn = of_find_compatible_node(NULL, NULL,
> +                         "pasemi,pwrficient-sdc");
>      if (!dn)
>          goto out;
>      err = of_address_to_resource(dn, 0, &res);
> @@ -160,7 +163,10 @@ static int pas_cpufreq_cpu_init(struct cpufreq_policy *policy)
>          goto out;
>      }
>
> -    dn = of_find_compatible_node(NULL, "gizmo", "1682m-gizmo");
> +    dn = of_find_compatible_node(NULL, NULL, "1682m-gizmo");
> +    if (!dn)
> +        dn = of_find_compatible_node(NULL, NULL,
> +                         "pasemi,pwrficient-gizmo");
>      if (!dn) {
>          err = -ENODEV;
>          goto out_unmap_sdcasr;
> @@ -292,7 +298,8 @@ static struct cpufreq_driver pas_cpufreq_driver = {
>
>  static int __init pas_cpufreq_init(void)
>  {
> -    if (!machine_is_compatible("PA6T-1682M"))
> +    if (!machine_is_compatible("PA6T-1682M") &&
> +        !machine_is_compatible("pasemi,pwrficient"))
>          return -ENODEV;
>
>      return cpufreq_register_driver(&pas_cpufreq_driver);
> diff --git a/arch/powerpc/platforms/pasemi/gpio_mdio.c b/arch/powerpc/platforms/pasemi/gpio_mdio.c
> index 95d0c78..b029804 100644
> --- a/arch/powerpc/platforms/pasemi/gpio_mdio.c
> +++ b/arch/powerpc/platforms/pasemi/gpio_mdio.c
> @@ -333,7 +333,10 @@ int gpio_mdio_init(void)
>  {
>      struct device_node *np;
>
> -    np = of_find_compatible_node(NULL, "gpio", "1682m-gpio");
> +    np = of_find_compatible_node(NULL, NULL, "1682m-gpio");
> +    if (!np)
> +        np = of_find_compatible_node(NULL, NULL,
> +                         "pasemi,pwrficient-gpio");
>      if (!np)
>          return -ENODEV;
>      gpio_regs = of_iomap(np, 0);
> diff --git a/arch/powerpc/platforms/pasemi/setup.c b/arch/powerpc/platforms/pasemi/setup.c
> index 3a5d112..aeafe98 100644
> --- a/arch/powerpc/platforms/pasemi/setup.c
> +++ b/arch/powerpc/platforms/pasemi/setup.c
> @@ -362,8 +362,11 @@ static inline void pasemi_pcmcia_init(void)
>
>
>  static struct of_device_id pasemi_bus_ids[] = {
> +    /* Unfortunately needed for legacy firmwares */
>      { .type = "localbus", },
>      { .type = "sdc", },
> +    { .compatible = "pasemi,localbus", },
> +    { .compatible = "pasemi,sdc", },
>      {},
>  };
>
> @@ -389,7 +392,8 @@ static int __init pas_probe(void)
>  {
>      unsigned long root = of_get_flat_dt_root();
>
> -    if (!of_flat_dt_is_compatible(root, "PA6T-1682M"))
> +    if (!of_flat_dt_is_compatible(root, "PA6T-1682M") &&
> +        !of_flat_dt_is_compatible(root, "pasemi,pwrficient"))
>          return 0;
>
>      hpte_init_native();
> @@ -400,7 +404,7 @@ static int __init pas_probe(void)
>  }
>
>  define_machine(pasemi) {
> -    .name = "PA Semi PA6T-1682M",
> +    .name = "PA Semi PWRficient",
>      .probe = pas_probe,
>      .setup_arch = pas_setup_arch,
>      .init_early = pas_init_early,
> diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig
> index 2d7cd48..6bbd4fa 100644
> --- a/drivers/char/hw_random/Kconfig
> +++ b/drivers/char/hw_random/Kconfig
> @@ -98,7 +98,7
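Editor's sketch of the compatibility check Olof's patch repeats in several places: accept the legacy product-specific name, and fall back to the generic one. This is a userspace illustration only. struct node, node_is_compatible and is_pasemi are hypothetical stand-ins, and the exact string comparison is a simplification that may differ from how the kernel's device-tree helpers match names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device-tree node's "compatible" property. */
struct node {
    const char *const *compatible; /* NULL-terminated list of names */
};

/* Return 1 if the node lists the given name, 0 otherwise. */
static int node_is_compatible(const struct node *n, const char *name)
{
    assert(n != NULL && name != NULL);

    for (const char *const *c = n->compatible; *c != NULL; c++)
        if (strcmp(*c, name) == 0)
            return 1;
    return 0;
}

/*
 * Accept either the legacy name or the generic one. The patch writes the
 * same test in negated form: bail out only if neither name matches.
 */
static int is_pasemi(const struct node *root)
{
    return node_is_compatible(root, "PA6T-1682M") ||
           node_is_compatible(root, "pasemi,pwrficient");
}
```

Keeping the legacy name in the check is what lets older firmware, which only advertises the 1682M name, keep working after the rename.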
Re: EDAC stats PCI error recovery (was Re: [PATCH 2/2] powerpc: MPC85xx EDAC device driver)
--- Linas Vepstas [EMAIL PROTECTED] wrote:

> On Mon, Jul 30, 2007 at 03:47:05PM -0700, Doug Thompson wrote:
> >
> > --- Linas Vepstas [EMAIL PROTECTED] wrote:
> > > Also: please note that the linux kernel has a pci error recovery
> > > mechanism built in; its used by pseries and PCI-E. I'm not clear
> > > on what any of this has to do with EDAC, which I thought was supposed
> > > to be for RAM only. (The EDAC project once talked about doing pci error
> > > recovery, but that was years ago, and there is a separate system for
> > > that, now.)
> >
> > no, edac can/does harvest PCI bus errors, via polling and other
> > hardware error detectors.
>
> Ehh! I had no idea. A few years ago, when I was working on the PCI error
> recovery, I sent a number of emails to the various EDAC people and
> mailing lists that I could find, and never got a response. I assumed
> the project was dead. I guess its not ...

No its not, just some company layoffs stirred the pot, at least for me, for a while.

I did see the ibm patches go by, but didn't have the time to check up at that time. I actually didn't know the recovery interface had gotten into the kernel (my failure to watch for them), so I was pleasantly surprised at this last OLS to attend the presentation.

> > But at the current time, few PCI device drivers initialize those
> > callback functions and thus errors are lost and some IO transactions fail.
>
> There are patches for 6 drivers in mainline (e100, e1000, ixgb, s2io,
> ipr, lpfc), and two more pending (sym53cxxx, tg3). So far, I've written
> all of them.

Great. EDAC does nothing for recovery, just logging and stats gathering and presentation.

> > Over time, as drivers get updated (might take some time) then drivers
> > can take some sort of action FOR THEMSELVES
>
> I think I need to do more to raise awareness and interest.

good point

> > Yet, there is no tracking of errors - except for a log message in the
> > log file. There is NO meter on frequency of errors, etc. One must grep
> > the log file and that is not a very cycle friendly mechanism.
>
> Yeah, there was low interest in stats. There's a core set of stats in
> /proc/ppc64/eeh, but these are clearly arch-specific. I'd like to move
> away from those. Some recent patches added stats to the /sys tree,
> under the individual pci bridge and device nodes. Again, these are
> arch-specific; I'd like to move to some general/standardized
> presentation.

the memory error consumers really like the stats of EDAC. Allows them to track trends. Cluster types, with thousands of nodes, like the monitoring for both memory and PCI, as well as some newer hardware detector harvesting.

> > The reason I added PCI parity/error device scanning, was that when I
> > was at Linux Networx, we had parity errors on the PCI-X bus, but didn't
> > know the cause. After we discovered that a simple PCI-X riser card had
> > manufacturing problems (quality) and didn't drive lines properly, it
> > caused parity errors.
>
> Heh. Not unusual. I've seen/heard of cases with voltages being low,
> and/or ground-bounce in slots near the end. There's a whole zoo of
> hardware/firmware bugs that we've had to painfully crawl through and
> fix. That's why the IBM boxes cost big $$$; here's to hoping that
> customers understand why.

I understand

> > This feature allowed us to track nodes that were having parity
> > problems, but we had no METER to know it. Recovery is a good thing,
> > BUT how do you know you having LOTS of errors/recovery events? You
> > need a meter. EDAC provides that METER
>
> I'm lazy. What source code should I be looking at? I'm concerned about
> duplication of function and proliferation of interfaces. I've got my
> metering data under (for example)
> /sys/bus/pci/devices/0001:c0:01.0/eeh_*, mostly very arch specific.
> The code for this is in arch/powerpc/platforms/pseries/eeh_sysfs.c

http://bluesmoke.sourceforge.net/ is the SF project zone (bluesmoke was the out-of-tree name, changed to EDAC when it came into tree, and source forge doesn't allow renaming)

EDAC info is under:

/sys/devices/system/edac/
    mc  for memory controllers
    pci for pci info

very basic, just counters and some controls

> > I met with Yanmin Zhang of Intel at OLS after his paper presentation
> > on PCI Express Advanced Error Reporting in the Kernel, and we talked
> > about this same thing. I am talking with him on having the recovery
> > code present information into EDAC sysfs area. (hopefully, anyway)
>
> Hmm. OK, where's that? Back when, I'd talked to Yanmin about coming up
> with a generic, arch-indep way of driving the recovery routines. But
> this wasn't exactly easy, and we were still grappling with just getting
> things working. Now that things are working, its time to broaden
> horizons.

Not very far, but I see the potential. When EDAC was received, it was placed where it was in the sysfs from various kernel developers as a good spot on its own.

> Can you point me to the current edac code?

find
Re: [PATCH 2/2] powerpc: MPC85xx EDAC device driver
--- Linas Vepstas [EMAIL PROTECTED] wrote:

> On Mon, Jul 30, 2007 at 01:17:40PM -0700, Dave Jiang wrote:
> > Arnd Bergmann wrote:
> > > The best solution may be to look at how it's structured at the
> > > register level. If the PCI EDAC registers are implemented
> > > separately from the regular PCI registers, a device tree entry
> > > would be appropriate. If not, your idea of registering a
> > > platform_device from fsl_add_bridge is probably more sensible.
> >
> > We can probably do either. From looking at the 8560 and 8548 manuals,
> > the PCI error registers are 0xe00 offset of the start of PCI
> > registers. For example, the PCI registers would start at 0x8000
> > offset. And the PCI error registers would be at 0xe00 offset from
> > there and would be the very last block of registers.
>
> Anywhere I can easily get an overview of these PCI error registers?
>
> Also: please note that the linux kernel has a pci error recovery
> mechanism built in; its used by pseries and PCI-E. I'm not clear
> on what any of this has to do with EDAC, which I thought was supposed
> to be for RAM only. (The EDAC project once talked about doing pci error
> recovery, but that was years ago, and there is a separate system for
> that, now.)

no, edac can/does harvest PCI bus errors, via polling and other hardware error detectors.

The pci error recovery added a couple of NEW device callback functions in the driver interface, which the bus layer can call to notify drivers that a PCI bus error occurred. Then the driver can do some action on the event.

But at the current time, few PCI device drivers initialize those callback functions and thus errors are lost and some IO transactions fail.

Over time, as drivers get updated (might take some time) then drivers can take some sort of action FOR THEMSELVES

Yet, there is no tracking of errors - except for a log message in the log file. There is NO meter on frequency of errors, etc. One must grep the log file and that is not a very cycle friendly mechanism.

The reason I added PCI parity/error device scanning, was that when I was at Linux Networx, we had parity errors on the PCI-X bus, but didn't know the cause. After we discovered that a simple PCI-X riser card had manufacturing problems (quality) and didn't drive lines properly, it caused parity errors.

This feature allowed us to track nodes that were having parity problems, but we had no METER to know it. Recovery is a good thing, BUT how do you know you are having LOTS of errors/recovery events? You need a meter. EDAC provides that METER

I met with Yanmin Zhang of Intel at OLS after his paper presentation on PCI Express Advanced Error Reporting in the Kernel, and we talked about this same thing. I am talking with him on having the recovery code present information into EDAC sysfs area. (hopefully, anyway)

The recovery generates log messages BUT having to periodically 'grep' the log file looking for errors is not a good use of CPU cycles. grep once for a count and then grep later for a count and then compare the counts for a delta count per unit time. ugly.

The EDAC solution is to be able to have a Listener thread in user space that can be notified (via poll()) that an event has occurred.

There is more than one consumer of error events:

1) driver recovery after a transaction (which is the recovery consumer above)
2) Management agents for health of a node
3) Maintenance agents for predictive component replacement

Rates of change of errors can be gathered as well.

EDAC allows for presentation of error counts via sysfs entries, which user space programs can harvest for over-time profiling

We have MEMORY (edac_mc) devices for chipsets now, but via the new edac_device class, such things as ECC error tracking on DMA error checkers, FABRIC switches, L1 and L2 cache ECC events, core CPU data ECC checkers, etc can be done. I have an out of kernel tree MIPS driver that does just this.

Other types of harvesters can be generated as well for other and/or new hardware error detectors.

doug thompson

> --linas
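Editor's sketch of the "meter" idea: a user-space harvester can sample a counter file twice and turn the difference into a rate, instead of grepping logs. This is an illustration, not part of EDAC. read_counter and errors_per_second are hypothetical helpers, and the counter file path is left to the caller because the thread only names the /sys/devices/system/edac/ directory with its mc and pci subtrees, not individual files.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one unsigned decimal counter from a text file, as a sysfs
 * attribute would present it. Returns 0 on success, -1 on failure.
 */
static int read_counter(const char *path, unsigned long long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%llu", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Turn two samples of a monotonically increasing counter into a rate.
 * If the counter went backwards (for example, it was reset between the
 * samples), report 0 rather than a bogus huge value.
 */
static double errors_per_second(unsigned long long before,
                                unsigned long long after,
                                double seconds)
{
    assert(seconds > 0.0);

    if (after < before)
        return 0.0;
    return (double)(after - before) / seconds;
}
```

A monitoring agent would call read_counter on a schedule, keep the previous sample, and alert when errors_per_second crosses a threshold chosen for the site.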