[PATCH] powerpc/mm: Fix hang accessing top of vmalloc space
On pSeries, we always force the IO space to be mapped using 4K pages, even with a 64K base page size, to cope with some limitations in the HV interface to some devices.

However, the SLB miss handler code that discriminates between vmalloc and ioremap space uses a CPU feature section, such that the code is nop'ed out when the processor supports large-page non-cacheable mappings.

Thus, we end up always using the ioremap page size for vmalloc segments on such processors, causing a discrepancy between the segment and the hash table, and thus a hang continuously hashing the page.

It works for the first segment of the vmalloc space since that segment is bolted in by C code correctly, and thankfully we almost never use the vmalloc space beyond the first segment, but the new percpu code made the bug happen.

This fixes it by removing the feature section from the assembly; we now always do the comparison between vmalloc and ioremap.

Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---

Sachin, can you verify that works for you ?

diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S
index bc44dc4..95ce355 100644
--- a/arch/powerpc/mm/slb_low.S
+++ b/arch/powerpc/mm/slb_low.S
@@ -72,19 +72,17 @@ _GLOBAL(slb_miss_kernel_load_vmemmap)
 1:
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
-	/* vmalloc/ioremap mapping encoding bits, the li instructions below
-	 * will be patched by the kernel at boot
+	/* vmalloc mapping gets the encoding from the PACA as the mapping
+	 * can be demoted from 64K -> 4K dynamically on some machines
 	 */
-BEGIN_FTR_SECTION
-	/* check whether this is in vmalloc or ioremap space */
 	clrldi	r11,r10,48
 	cmpldi	r11,(VMALLOC_SIZE >> 28) - 1
 	bgt	5f
 	lhz	r11,PACAVMALLOCSLLP(r13)
 	b	6f
 5:
-END_FTR_SECTION_IFCLR(CPU_FTR_CI_LARGE_PAGE)
-_GLOBAL(slb_miss_kernel_load_io)
+	/* IO mapping */
+	_GLOBAL(slb_miss_kernel_load_io)
 	li	r11,0
 6:
 BEGIN_FTR_SECTION
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
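For readers who do not read PowerPC assembly, the check that the patch makes unconditional can be sketched in C roughly as follows. This is only an illustration under stated assumptions: that r10 holds the ESID of the faulting address, that segments are 256MB (a shift of 28), and that the IO encoding is the plain 0 loaded by `li r11,0`. The struct `paca_sketch` and the function name are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SID_SHIFT 28	/* assumed 256MB segments, matching the ">> 28" in the patch */

/* Hypothetical stand-in for the PACA field read via PACAVMALLOCSLLP. */
struct paca_sketch {
	uint16_t vmalloc_sllp;
};

/*
 * Pick the page size encoding for a kernel-region segment.
 * Segments below VMALLOC_SIZE belong to vmalloc and take the encoding
 * stored in the PACA; anything above is treated as ioremap space.
 */
static uint16_t kernel_region_sllp(uint64_t esid, uint64_t vmalloc_size,
				   const struct paca_sketch *paca)
{
	uint64_t seg = esid & 0xffff;			/* clrldi r11,r10,48 */

	assert(vmalloc_size >= (1ULL << SID_SHIFT));

	if (seg > (vmalloc_size >> SID_SHIFT) - 1)	/* cmpldi ; bgt 5f */
		return 0;				/* li r11,0 (IO mapping) */

	return paca->vmalloc_sllp;			/* lhz r11,PACAVMALLOCSLLP(r13) */
}
```

Before the patch, the comparison was skipped on processors with CPU_FTR_CI_LARGE_PAGE, so every segment took the IO branch regardless of where it fell.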
Re: [PATCH 0/8] gianfar: Add support for hibernation
From: Anton Vorontsov avoront...@ru.mvista.com
Date: Mon, 12 Oct 2009 20:00:00 +0400

> Here are few patches that add support for hibernation for gianfar driver.
>
> Technically, we could just do gfar_close() and then gfar_enet_open() sequence to restore gianfar functionality after hibernation, but close/open does so many unneeded things (e.g. BDs buffers freeing and allocation, IRQ freeing and requesting), that I felt it would be much better to cleanup and refactor some code to make the hibernation [and not only hibernation] code a little bit prettier.

I applied all of this, it's a really nice patch set. If there are any problems we can deal with them using follow-on fixups.

I noticed something in patch #3, where you remove the spurious wrap bit setting in startup_gfar(). It looks like that was not only spurious but was doing it wrong too. It's writing garbage into the status word, because it's not using the BD_LFLAG() macro to shift the value up 16 bits.
Re: [PATCH] powerpc/mm: Fix hang accessing top of vmalloc space
Benjamin Herrenschmidt wrote:
> On pSeries, we always force the IO space to be mapped using 4K pages, even with a 64K base page size, to cope with some limitations in the HV interface to some devices.
>
> However, the SLB miss handler code that discriminates between vmalloc and ioremap space uses a CPU feature section, such that the code is nop'ed out when the processor supports large-page non-cacheable mappings.
>
> Thus, we end up always using the ioremap page size for vmalloc segments on such processors, causing a discrepancy between the segment and the hash table, and thus a hang continuously hashing the page.
>
> It works for the first segment of the vmalloc space since that segment is bolted in by C code correctly, and thankfully we almost never use the vmalloc space beyond the first segment, but the new percpu code made the bug happen.
>
> This fixes it by removing the feature section from the assembly; we now always do the comparison between vmalloc and ioremap.
>
> Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org
> ---
>
> Sachin, can you verify that works for you ?

Works great. Thanks Ben.

Tested-by: Sachin Sant sach...@in.ibm.com

Regards
-Sachin

-- 
---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------
Re: [PATCH 2/8] bitmap: Introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area
My user-space testing exposed an off-by-one error in find_next_zero_area in iommu-helper. Because of this bug, a zero area that ends exactly at the end of the bitmap cannot be found.

Subject: [PATCH] Fix off-by-one error in find_next_zero_area

Signed-off-by: Akinobu Mita akinobu.m...@gmail.com
---
 lib/iommu-helper.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/iommu-helper.c b/lib/iommu-helper.c
index 75dbda0..afc58bc 100644
--- a/lib/iommu-helper.c
+++ b/lib/iommu-helper.c
@@ -19,7 +19,7 @@ again:
 	index = (index + align_mask) & ~align_mask;
 
 	end = index + nr;
-	if (end >= size)
+	if (end > size)
 		return -1;
 	for (i = index; i < end; i++) {
 		if (test_bit(i, map)) {
-- 
1.5.4.3
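The effect of the one-character change can be reproduced in user space with a simplified search. The sketch below is not the kernel function: it ignores alignment, uses one byte per bit instead of test_bit(), and takes a hypothetical `buggy` flag that selects the old `>=` comparison so both behaviours can be compared side by side.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Find the first run of nr clear entries in map[0..size), starting at
 * start. Returns the index of the run, or -1 if none exists.
 * With buggy == true the old "end >= size" test is used.
 */
static long find_zero_area(const unsigned char *map, unsigned long size,
			   unsigned long start, unsigned int nr, bool buggy)
{
	unsigned long index = start;

	assert(nr > 0);

	for (;;) {
		unsigned long end = index + nr;
		unsigned long i;

		/* end is one past the last entry of the candidate area */
		if (buggy ? end >= size : end > size)
			return -1;

		for (i = index; i < end; i++)
			if (map[i])
				break;

		if (i == end)
			return (long)index;	/* whole area is clear */

		index = i + 1;			/* restart after the set entry */
	}
}
```

With a four-entry map whose last two entries are clear, the fixed comparison finds the area at index 2, while the old comparison rejects it because `end` equals `size`.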
i2c-powermac fails
Hi Ben, Paul,

I had a report by Tim Shepard (Cc'd) that the therm_adt746x driver sometimes fails to initialize on his PowerBook G4 running kernel 2.6.31. The following error message can be seen in the logs when the failure happens:

therm_adt746x 7-002e: Thermostat failed to read config!

After enabling low-level i2c debugging, it turns out that the problem is caused by low-level errors at the I2C bus level:

PowerMac i2c bus pmu 2 registered
PowerMac i2c bus pmu 1 registered
PowerMac i2c bus mac-io 0 registered
low_i2c:xfer() chan=0, addrdir=0x5d, mode=4, subsize=1, subaddr=0x0, 1 bytes, bus /un...@f800/i...@f8001000/i2c-...@0
low_i2c:kw_handle_interrupt(state_addr, isr: 6)
low_i2c:KW: NAK on address
low_i2c:xfer error -6
i2c-adapter i2c-7: I2C transfer at 0x2e failed, size 2, err -6
therm_adt746x 7-002e: Thermostat failed to read config!
PowerMac i2c bus uni-n 0 registered

So apparently the I2C controller doesn't see the ack from the ADT7467. However the ADT7467 is an SMBus-compliant device, so it must always ack its address.

It is worth noting that many other I2C errors happen and go unnoticed. Below is the log of a successful therm_adt746x registration:

PowerMac i2c bus pmu 2 registered
PowerMac i2c bus pmu 1 registered
PowerMac i2c bus mac-io 0 registered
low_i2c:xfer() chan=0, addrdir=0x5d, mode=4, subsize=1, subaddr=0x0, 1 bytes, bus /un...@f800/i...@f8001000/i2c-...@0
low_i2c:kw_handle_interrupt(state_addr, isr: 2)
low_i2c:kw_handle_interrupt(state_read, isr: 5)
adt746x: ADT7467 initializing
low_i2c:xfer() chan=0, addrdir=0x5d, mode=4, subsize=1, subaddr=0x6b, 1 bytes, bus /un...@f800/i...@f8001000/i2c-...@0
low_i2c:kw_handle_interrupt(state_addr, isr: 2)
low_i2c:kw_handle_interrupt(state_read, isr: 5)
low_i2c:xfer() chan=0, addrdir=0x5c, mode=3, subsize=1, subaddr=0x6b, 1 bytes, bus /un...@f800/i...@f8001000/i2c-...@0
low_i2c:kw_handle_interrupt(state_addr, isr: 6)
low_i2c:KW: NAK on address
low_i2c:xfer error -6
i2c-adapter i2c-7: I2C transfer at 0x2e failed, size 2, err -6
low_i2c:xfer() chan=0, addrdir=0x5d, mode=4, subsize=1, subaddr=0x6a, 1 bytes, bus /un...@f800/i...@f8001000/i2c-...@0
low_i2c:kw_handle_interrupt(state_addr, isr: 2)
low_i2c:kw_handle_interrupt(state_read, isr: 1)
low_i2c:kw_handle_interrupt(state_stop, isr: 4)
low_i2c:xfer() chan=0, addrdir=0x5c, mode=3, subsize=1, subaddr=0x6a, 1 bytes, bus /un...@f800/i...@f8001000/i2c-...@0
ieee1394: Host added: ID:BUS[0-00:1023] GUID[001124fffed61a88]
low_i2c:kw_handle_interrupt(state_addr, isr: 6)
low_i2c:KW: NAK on address
low_i2c:xfer error -6
i2c-adapter i2c-7: I2C transfer at 0x2e failed, size 2, err -6
low_i2c:xfer() chan=0, addrdir=0x5d, mode=4, subsize=1, subaddr=0x6c, 1 bytes, bus /un...@f800/i...@f8001000/i2c-...@0
low_i2c:kw_handle_interrupt(state_addr, isr: 2)
low_i2c:kw_handle_interrupt(state_read, isr: 5)
low_i2c:xfer() chan=0, addrdir=0x5c, mode=3, subsize=1, subaddr=0x6c, 1 bytes, bus /un...@f800/i...@f8001000/i2c-...@0
low_i2c:kw_handle_interrupt(state_addr, isr: 2)
low_i2c:kw_handle_interrupt(state_write, isr: 1)
low_i2c:kw_handle_interrupt(state_stop, isr: 4)
adt746x: Lowering max temperatures from 81, 80, 87 to 70, 50, 70
low_i2c:xfer() chan=0, addrdir=0x5d, mode=4, subsize=1, subaddr=0x5c, 1 bytes, bus /un...@f800/i...@f8001000/i2c-...@0
eth0: Link is up at 1000 Mbps, full-duplex.
low_i2c:kw_handle_interrupt(state_addr, isr: 6)
low_i2c:KW: NAK on address
low_i2c:xfer error -6
i2c-adapter i2c-7: I2C transfer at 0x2e failed, size 2, err -6
low_i2c:xfer() chan=0, addrdir=0x5c, mode=3, subsize=1, subaddr=0x5c, 1 bytes, bus /un...@f800/i...@f8001000/i2c-...@0
low_i2c:kw_handle_interrupt(state_addr, isr: 6)
low_i2c:KW: NAK on address
low_i2c:xfer error -6
i2c-adapter i2c-7: I2C transfer at 0x2e failed, size 2, err -6
low_i2c:xfer() chan=0, addrdir=0x5c, mode=3, subsize=1, subaddr=0x30, 1 bytes, bus /un...@f800/i...@f8001000/i2c-...@0
low_i2c:kw_handle_interrupt(state_addr, isr: 2)
low_i2c:kw_handle_interrupt(state_write, isr: 1)
low_i2c:kw_handle_interrupt(state_stop, isr: 4)
PowerMac i2c bus uni-n 0 registered

As you can see there are 4 errors, but the config register read doesn't fail, so this is considered a success.

Ever heard of this problem?

One very interesting thing I've noticed is that therm_adt746x register access _after_ the initialization works reliably. Errors only happen in probe_thermostat(). This makes me suspect that the problem is either a low-level initialization happening too late, or another initialization step happening in parallel and interfering with probe_thermostat().

Tim found evidence in older boot logs that the problem isn't new and was already present back in kernel 2.6.24 at least.

Any idea what the problem can be and/or how to debug it further?

-- 
Jean Delvare
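When reading the low_i2c traces, it helps to decode the addrdir byte. The sketch below assumes the usual I2C wire convention, a 7-bit slave address shifted left by one with the read/write flag in bit 0. The traces do not state this explicitly, but it is consistent with them: 0x5c and 0x5d both map to the 0x2e address reported by the i2c-adapter error lines. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded form of an addrdir byte; names are illustrative only. */
struct i2c_addrdir {
	uint8_t addr;	/* 7-bit slave address */
	int is_read;	/* 1 = read transfer, 0 = write transfer */
};

/* Build the on-wire byte from a 7-bit address and a direction flag. */
static uint8_t encode_addrdir(uint8_t addr, int is_read)
{
	assert(addr < 0x80);	/* must fit in 7 bits */
	return (uint8_t)((addr << 1) | (is_read ? 1 : 0));
}

/* Split the on-wire byte back into address and direction. */
static struct i2c_addrdir decode_addrdir(uint8_t addrdir)
{
	struct i2c_addrdir d;

	d.addr = addrdir >> 1;
	d.is_read = addrdir & 1;
	return d;
}
```

Under that assumption, every addrdir=0x5d line is a read from 0x2e and every addrdir=0x5c line is a write to 0x2e, so the four failures in the successful run are three writes and one read.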
Re: i2c-powermac fails
On Tue, 2009-10-13 at 11:23 +0200, Jean Delvare wrote:
> Hi Ben, Paul,
>
> I had a report by Tim Shepard (Cc'd) that the therm_adt746x driver sometimes fails to initialize on his PowerBook G4 running kernel 2.6.31. The following error message can be seen in the logs when the failure happens:
>
> therm_adt746x 7-002e: Thermostat failed to read config!
>
> After enabling low-level i2c debugging, it turns out that the problem is caused by low-level errors at the I2C bus level:

Nothing comes to mind immediately, but I'll have another look tomorrow. Maybe we are configuring the i2c bus too fast ? Another possibility would be that the device needs some retries ...

Ben.
Re: i2c-powermac fails
On Tue, 13 Oct 2009 20:32:28 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2009-10-13 at 11:23 +0200, Jean Delvare wrote:
> > Hi Ben, Paul,
> >
> > I had a report by Tim Shepard (Cc'd) that the therm_adt746x driver sometimes fails to initialize on his PowerBook G4 running kernel 2.6.31. The following error message can be seen in the logs when the failure happens:
> >
> > therm_adt746x 7-002e: Thermostat failed to read config!
> >
> > After enabling low-level i2c debugging, it turns out that the problem is caused by low-level errors at the I2C bus level:
>
> Nothing comes to mind immediately, but I'll have another look tomorrow. Maybe we are configuring the i2c bus too fast ? Another possibility would be that the device needs some retries ...

I guess that retrying would work around the problem, yes. But I do not think this is the proper solution. If retries were needed, they would be needed all the time, not just at initialization time. And as I said, the SMBus specification says that devices always have to ack their slave address (they can always delay the transaction later if they need more time), so I am reasonably certain that the ADT7467 always acks its address. If it seems otherwise, this suggests that either the message was not properly sent on the bus (so the ADT7467 did not have anything to ack), or the ADT7467's ack went on the bus but the I2C master didn't see it.

The I2C bus being set up too fast sounds more likely. It might be worth adding an arbitrary delay after initialization, just to see if it helps. Not sure where though, as I'm not familiar with the Powermac initialization steps. Maybe right before i2c_add_adapter() in i2c_powermac_probe?

-- 
Jean Delvare
Re: [U-Boot] Linux seamless booting
On Mon, 2009-10-12 at 15:54 +0200, Fortini Matteo wrote:
> Yes, that's what we're currently using, but the problem is a little broader: I should answer to CAN messages in at most 100-200ms from powerup, and that can be done in u-boot.

If you are in that interval you definitely need to go to a more exotic start sequence than usual.

One solution would be to do as you suggest and write a special driver that lives outside of the kernel during startup. You still need to hack into the interrupt code to let your external driver handle the CAN. Then you need to hack up the ordinary driver to take over from yours. I have not seen this solution on any project I worked on, but it should be doable.

Optimizing the boot time of Linux so it starts up in 200ms is probably going to be quite hard. I got to 2 seconds until /sbin/init was started from an IDE drive without too much trouble. Removing the IDE and going to a root on NOR would probably get closer to 1.5, but getting down to 200ms would probably mean removing most of u-boot and keeping only the DRAM setup. Then you probably need to remove most of the drivers from the kernel and load them later as modules.

I have never really tried to do an insanely fast boot like this, so I'm not sure what problems you will run up against. Maybe it's possible, but 200ms feels a bit too optimistic.

> However, handing CAN transmission control over to Linux is quite complicated nowadays, since it would involve passing structures in memory and hacking through device init. It'd be nice to have a framework with which u-boot could hand-over devices to Linux in a clean and defined way.

Not likely to happen as a generic solution. Much better to just remove the boot loader then and work on optimizing the Linux startup code.
Re: [Patch] powerpc: Fix memory leak in axon_msi.c
On Tuesday 13 October 2009, Michael Ellerman wrote:
> > cppcheck found a memory leak in axon_msi: if dcr_base or dcr_len are zero, we have already allocated msic, so we should free it in the error path.
> >
> > Signed-off-by: Eric Sesterhenn eric.sesterh...@lsexperts.de
>
> Acked-by: Michael Ellerman mich...@ellerman.id.au

Acked-by: Arnd Bergmann a...@arndb.de
Re: [PATCH 1/2][v2] mm: add notifier in pageblock isolation for balloon drivers
On Fri, Oct 09, 2009 at 21:43:26 +0100, Mel Gorman wrote:
> As you have tested this recently, would you be willing to post the results? While it's not a requirement of the patch, it would be nice to have an idea of how the effectiveness of memory hot-remove is improved when used with the powerpc balloon. This might convince other developers for balloons to register with the notifier.

I did ten test runs without my patches and ten test runs with my patches on a 2.6.32-rc3 kernel.

Without the patch: 6 out of 10 memory-remove operations removed 1 LMB (64 MB); the rest of the memory-remove attempts failed to remove any LMBs.

With the patch: All of the memory-remove operations removed some LMBs. The average removed was just over 11 LMBs (704 MB) per attempt.

Linux was given 2 GB of memory. During the test runs the average memory in use was 140 MB, not including cache and buffers, and the average amount consumed by the balloon was 1217 MB. The system was idle while the memory remove operation was performed. After each attempt the system was rebooted and allowed ~10 minutes to settle after boot. With a 2 GB configuration on POWER the LMB size is 64 MB. The drmgr command (part of powerpc-utils) was used to remove memory by LMB, just as an end-user would.

Below is a list of the runs and the number of LMBs removed.
Stock kernel (v2.6.32-rc3)
--------------------------
   LMBs
removed    Used kb   Loaned kb
      0     135232     1257280
      0     151168     1231744
      1     152128     1234176
      1     150976     1239232
      1     151808     1232064
      0     136064     1249152
      0     137088     1246976
      1     135296     1289984
      1     136384     1263104
      1     152960     1243904
===============================
   0.60     143910     1248762   Average
   0.49       7929       16960   StdDev

Patched kernel
--------------
   LMBs
removed    Used kb   Loaned kb
     12     134336     1294336
     10     152192     1250432
      9     152832     1235520
     15     153152     1237952
     12     152320     1232704
     13     135360     1252224
     11     154176     1237056
     10     153920     1243264
     10     150720     1236416
     13     151040     1230848
===============================
  11.50     149005     1245075   Average
   1.75       7158       17738   StdDev

Regards,
Robert Jennings
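The summary rows can be rechecked from the per-run LMB counts. The sketch below assumes the StdDev rows are population standard deviations (dividing by n rather than n - 1), which is what reproduces the 0.49 and 1.75 figures for the LMB column. The helper names are hypothetical, and a small Newton iteration stands in for sqrt() so the example needs nothing beyond the C standard headers.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n integer samples. */
static double mean(const int *v, size_t n)
{
	long sum = 0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return (double)sum / (double)n;
}

/* Population variance: mean squared deviation, divided by n. */
static double variance_pop(const int *v, size_t n)
{
	double m = mean(v, n);
	double acc = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		double d = (double)v[i] - m;
		acc += d * d;
	}
	return acc / (double)n;
}

/* Square root by Newton's method; adequate for small positive inputs. */
static double sqrt_newton(double x)
{
	double g;
	int i;

	assert(x >= 0.0);
	if (x == 0.0)
		return 0.0;
	g = x > 1.0 ? x : 1.0;
	for (i = 0; i < 60; i++)
		g = 0.5 * (g + x / g);
	return g;
}

/* Convert an LMB count to megabytes for a given LMB size in MB. */
static unsigned long lmbs_to_mb(unsigned long lmbs, unsigned long lmb_mb)
{
	return lmbs * lmb_mb;
}
```

For the patched kernel the ten counts sum to 115, giving the 11.50 average, and 11 LMBs at 64 MB each correspond to the 704 MB quoted in the message.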
From: Tim Abbott tabb...@ksplice.com
There is already an architecture-independent __page_aligned_data macro for this purpose, so removing the powerpc-specific macro should be harmless.

Signed-off-by: Tim Abbott tabb...@ksplice.com
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Cc: linuxppc-...@ozlabs.org
Cc: Sam Ravnborg s...@ravnborg.org
---
 arch/powerpc/include/asm/page_64.h |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/page_64.h b/arch/powerpc/include/asm/page_64.h
index 3f17b83..3c7118f 100644
--- a/arch/powerpc/include/asm/page_64.h
+++ b/arch/powerpc/include/asm/page_64.h
@@ -162,14 +162,6 @@ do { \
 
 #endif /* !CONFIG_HUGETLB_PAGE */
 
-#ifdef MODULE
-#define __page_aligned __attribute__((__aligned__(PAGE_SIZE)))
-#else
-#define __page_aligned \
-	__attribute__((__aligned__(PAGE_SIZE), \
-		__section__(".data.page_aligned")))
-#endif
-
 #define VM_DATA_DEFAULT_FLAGS \
 	(test_thread_flag(TIF_32BIT) ? \
 	 VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64)
-- 
1.6.4.3
Re: From: Tim Abbott tabb...@ksplice.com
Well, I think I just found a bug in git-send-email. I'll resend with the actual subject line.

-Tim Abbott

On Tue, 13 Oct 2009, Tim Abbott wrote:
> There is already an architecture-independent __page_aligned_data macro for this purpose, so removing the powerpc-specific macro should be harmless.
>
> Signed-off-by: Tim Abbott tabb...@ksplice.com
> Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
> Cc: Paul Mackerras pau...@samba.org
> Cc: linuxppc-...@ozlabs.org
> Cc: Sam Ravnborg s...@ravnborg.org
> ---
>  arch/powerpc/include/asm/page_64.h |    8 --------
>  1 files changed, 0 insertions(+), 8 deletions(-)
Re: [Cbe-oss-dev] [PATCH] spufs: Fix test in spufs_switch_log_read()
On Tuesday 13 October 2009, Jeremy Kerr wrote:
> > Or can this test be removed?
>
> I'd prefer just to remove the test.

Yes, sounds good.

Arnd
Re: [PATCH 0/8] gianfar: Add support for hibernation
On Oct 13, 2009, at 1:57 AM, David Miller wrote:
> From: Anton Vorontsov avoront...@ru.mvista.com
> Date: Mon, 12 Oct 2009 20:00:00 +0400
>
> > Here are few patches that add support for hibernation for gianfar driver.
> >
> > Technically, we could just do gfar_close() and then gfar_enet_open() sequence to restore gianfar functionality after hibernation, but close/open does so many unneeded things (e.g. BDs buffers freeing and allocation, IRQ freeing and requesting), that I felt it would be much better to cleanup and refactor some code to make the hibernation [and not only hibernation] code a little bit prettier.
>
> I applied all of this, it's a really nice patch set. If there are any problems we can deal with it using follow-on fixups.
>
> I noticed something, in patch #3 where you remove the spurious wrap bit setting in startup_gfar(). It looks like that was not only spurious but it was doing it wrong too. It's writing garbage into the status word, because it's not using the BD_LFLAG() macro to shift the value up 16 bits.

No, it was fine (though made unnecessary by other patches). The BD has a union:

	struct {
		u16 status;	/* Status Fields */
		u16 length;	/* Buffer length */
	};
	u32 lstatus;

so when you write lstatus, you need to use the BD_LFLAG() macro, but when you write status, you are just setting the status bits.

Andy
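The distinction Andy describes can be illustrated without the driver headers. The sketch below models the 32-bit lstatus word with plain shifts instead of a union, so that it behaves the same on any host; on the big-endian hardware the status field is assumed to overlay the upper 16 bits. The BD_LFLAG() definition follows the "shift the value up 16 bits" description in the thread, the TXBD_WRAP value is used purely for illustration, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TXBD_WRAP	0x2000			/* illustrative flag value */
#define BD_LFLAG(flags)	((uint32_t)(flags) << 16)	/* move flags into the status half */

/* Compose the 32-bit word from the two 16-bit fields. */
static uint32_t make_lstatus(uint16_t status, uint16_t length)
{
	return BD_LFLAG(status) | length;
}

/* Upper half: what the 16-bit status member would read back. */
static uint16_t lstatus_status(uint32_t lstatus)
{
	return (uint16_t)(lstatus >> 16);
}

/* Lower half: what the 16-bit length member would read back. */
static uint16_t lstatus_length(uint32_t lstatus)
{
	return (uint16_t)(lstatus & 0xffff);
}
```

Writing TXBD_WRAP directly into the 16-bit status field is therefore correct as-is, whereas writing the unshifted value into lstatus would land in the length half, which is the mistake David initially suspected.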
Re: [PATCH 1/5 v3] dynamic logical partitioning infrastructure
This patch provides the kernel DLPAR infrastructure in a new file named dlpar.c. The functionality provided is for acquiring and releasing a resource from firmware and the parsing of information returned from the ibm,configure-connector rtas call. Additionally, this exports the pSeries reconfiguration notifier chain so that it can be invoked when device tree updates are made.

Updated to remove an extraneous of_node_put() in the removal of a device tree node path.

Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
---

Index: powerpc/arch/powerpc/platforms/pseries/dlpar.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.0 +
+++ powerpc/arch/powerpc/platforms/pseries/dlpar.c	2009-10-08 11:08:42.0 -0500
@@ -0,0 +1,414 @@
+/*
+ * dlpar.c - support for dynamic reconfiguration (including PCI
+ * Hotplug and Dynamic Logical Partitioning on RPA platforms).
+ *
+ * Copyright (C) 2009 Nathan Fontenot
+ * Copyright (C) 2009 IBM Corporation
+ *
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kref.h>
+#include <linux/notifier.h>
+#include <linux/proc_fs.h>
+#include <linux/spinlock.h>
+
+#include <asm/prom.h>
+#include <asm/machdep.h>
+#include <asm/uaccess.h>
+#include <asm/rtas.h>
+#include <asm/pSeries_reconfig.h>
+
+#define CFG_CONN_WORK_SIZE	4096
+static char workarea[CFG_CONN_WORK_SIZE];
+static DEFINE_SPINLOCK(workarea_lock);
+
+struct cc_workarea {
+	u32	drc_index;
+	u32	zero;
+	u32	name_offset;
+	u32	prop_length;
+	u32	prop_offset;
+};
+
+static struct property *parse_cc_property(char *workarea)
+{
+	struct property *prop;
+	struct cc_workarea *ccwa;
+	char *name;
+	char *value;
+
+	prop = kzalloc(sizeof(*prop), GFP_KERNEL);
+	if (!prop)
+		return NULL;
+
+	ccwa = (struct cc_workarea *)workarea;
+	name = workarea + ccwa->name_offset;
+	prop->name = kzalloc(strlen(name) + 1, GFP_KERNEL);
+	if (!prop->name) {
+		kfree(prop);
+		return NULL;
+	}
+
+	strcpy(prop->name, name);
+
+	prop->length = ccwa->prop_length;
+	value = workarea + ccwa->prop_offset;
+	prop->value = kzalloc(prop->length, GFP_KERNEL);
+	if (!prop->value) {
+		kfree(prop->name);
+		kfree(prop);
+		return NULL;
+	}
+
+	memcpy(prop->value, value, prop->length);
+	return prop;
+}
+
+static void free_property(struct property *prop)
+{
+	kfree(prop->name);
+	kfree(prop->value);
+	kfree(prop);
+}
+
+static struct device_node *parse_cc_node(char *work_area)
+{
+	struct device_node *dn;
+	struct cc_workarea *ccwa;
+	char *name;
+
+	dn = kzalloc(sizeof(*dn), GFP_KERNEL);
+	if (!dn)
+		return NULL;
+
+	ccwa = (struct cc_workarea *)work_area;
+	name = work_area + ccwa->name_offset;
+	dn->full_name = kzalloc(strlen(name) + 1, GFP_KERNEL);
+	if (!dn->full_name) {
+		kfree(dn);
+		return NULL;
+	}
+
+	strcpy(dn->full_name, name);
+	return dn;
+}
+
+static void free_one_cc_node(struct device_node *dn)
+{
+	struct property *prop;
+
+	while (dn->properties) {
+		prop = dn->properties;
+		dn->properties = prop->next;
+		free_property(prop);
+	}
+
+	kfree(dn->full_name);
+	kfree(dn);
+}
+
+static void free_cc_nodes(struct device_node *dn)
+{
+	if (dn->child)
+		free_cc_nodes(dn->child);
+
+	if (dn->sibling)
+		free_cc_nodes(dn->sibling);
+
+	free_one_cc_node(dn);
+}
+
+#define NEXT_SIBLING	1
+#define NEXT_CHILD	2
+#define NEXT_PROPERTY	3
+#define PREV_PARENT	4
+#define MORE_MEMORY	5
+#define CALL_AGAIN	-2
+#define ERR_CFG_USE	-9003
+
+struct device_node *configure_connector(u32 drc_index)
+{
+	struct device_node *dn;
+	struct device_node *first_dn = NULL;
+	struct device_node *last_dn = NULL;
+	struct property *property;
+	struct property *last_property = NULL;
+	struct cc_workarea *ccwa;
+	int cc_token;
+	int rc;
+
+	cc_token = rtas_token("ibm,configure-connector");
+	if (cc_token == RTAS_UNKNOWN_SERVICE)
+		return NULL;
+
+	spin_lock(&workarea_lock);
+
+	ccwa = (struct cc_workarea *)&workarea[0];
+	ccwa->drc_index = drc_index;
+	ccwa->zero = 0;
+
+	rc = rtas_call(cc_token, 2, 1, NULL, workarea, NULL);
+	while (rc) {
+		switch (rc) {
+		case NEXT_SIBLING:
+			dn = parse_cc_node(workarea);
+			if (!dn)
+				goto cc_error;
+
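The work area layout used by parse_cc_property() (a fixed header whose name_offset and prop_offset fields point back into the same buffer) can be exercised in user space. The following is a sketch under assumptions, not the patch's code: it uses malloc/free in place of kzalloc/kfree, reads the header in host byte order, and adds bounds checks that the patch does not have. The names `prop_sketch`, `free_prop_sketch` and `parse_prop_sketch` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same field order as struct cc_workarea in the patch. */
struct cc_workarea {
	uint32_t drc_index;
	uint32_t zero;
	uint32_t name_offset;
	uint32_t prop_length;
	uint32_t prop_offset;
};

/* Hypothetical user-space stand-in for struct property. */
struct prop_sketch {
	char *name;
	uint32_t length;
	void *value;
};

static void free_prop_sketch(struct prop_sketch *p)
{
	if (!p)
		return;
	free(p->name);
	free(p->value);
	free(p);
}

/* Copy one property (name and value) out of a work area of the given size. */
static struct prop_sketch *parse_prop_sketch(const char *wa, size_t size)
{
	struct cc_workarea hdr;
	struct prop_sketch *p;
	const char *name;
	size_t name_len;

	assert(wa != NULL);
	if (size < sizeof(hdr))
		return NULL;
	memcpy(&hdr, wa, sizeof(hdr));	/* avoids unaligned access */

	/* Reject offsets and lengths that point outside the buffer. */
	if (hdr.name_offset >= size || hdr.prop_offset > size ||
	    hdr.prop_length > size - hdr.prop_offset)
		return NULL;

	name = wa + hdr.name_offset;
	if (!memchr(name, '\0', size - hdr.name_offset))
		return NULL;		/* name is not terminated in-buffer */
	name_len = strlen(name);

	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	p->name = malloc(name_len + 1);
	p->value = malloc(hdr.prop_length ? hdr.prop_length : 1);
	if (!p->name || !p->value) {
		free_prop_sketch(p);
		return NULL;
	}

	memcpy(p->name, name, name_len + 1);
	memcpy(p->value, wa + hdr.prop_offset, hdr.prop_length);
	p->length = hdr.prop_length;
	return p;
}
```

Whether firmware output needs this kind of validation in the kernel is a design question for the patch author; the checks are included here only so the example is safe to run on arbitrary buffers.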
Re: [PATCH 4/5 v3] kernel handling of memory DLPAR
This adds the capability to DLPAR add and remove memory from the kernel. The patch extends the powerpc handling of memory_add_physaddr_to_nid(), which is called from the sysfs memory 'probe' file, to first ensure that the memory has been added to the system. This is done by creating a platform-specific callout from the routine. The pseries implementation of this handles the DLPAR work to add the memory to the system and update the device tree.

The patch also creates a pseries-only 'release' sysfs file, /sys/devices/system/memory/release. This file handles the DLPAR release of memory back to firmware and the updating of the device tree.

Updated to add #ifdef CONFIG_MEMORY_HOTPLUG around the memory hotplug specific updates. This allows the file to be built without CONFIG_MEMORY_HOTPLUG defined.

Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
---

Index: powerpc/arch/powerpc/platforms/pseries/dlpar.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/dlpar.c	2009-10-08 11:08:42.0 -0500
+++ powerpc/arch/powerpc/platforms/pseries/dlpar.c	2009-10-13 13:08:22.0 -0500
@@ -16,6 +16,10 @@
 #include <linux/notifier.h>
 #include <linux/proc_fs.h>
 #include <linux/spinlock.h>
+#include <linux/memory_hotplug.h>
+#include <linux/sysdev.h>
+#include <linux/sysfs.h>
+
 
 #include <asm/prom.h>
 #include <asm/machdep.h>
@@ -404,11 +408,165 @@
 	return 0;
 }
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+
+static struct property *clone_property(struct property *old_prop)
+{
+	struct property *new_prop;
+
+	new_prop = kzalloc((sizeof *new_prop), GFP_KERNEL);
+	if (!new_prop)
+		return NULL;
+
+	new_prop->name = kzalloc(strlen(old_prop->name) + 1, GFP_KERNEL);
+	new_prop->value = kzalloc(old_prop->length + 1, GFP_KERNEL);
+	if (!new_prop->name || !new_prop->value) {
+		free_property(new_prop);
+		return NULL;
+	}
+
+	strcpy(new_prop->name, old_prop->name);
+	memcpy(new_prop->value, old_prop->value, old_prop->length);
+	new_prop->length = old_prop->length;
+
+	return new_prop;
+}
+
+int platform_probe_memory(u64 phys_addr)
+{
+	struct device_node *dn;
+	struct property *new_prop, *old_prop;
+	struct property *lmb_sz_prop;
+	struct of_drconf_cell *drmem;
+	u64 lmb_size;
+	int num_entries, i, rc;
+
+	if (!phys_addr)
+		return -EINVAL;
+
+	dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+	if (!dn)
+		return -EINVAL;
+
+	lmb_sz_prop = of_find_property(dn, "ibm,lmb-size", NULL);
+	lmb_size = *(u64 *)lmb_sz_prop->value;
+
+	old_prop = of_find_property(dn, "ibm,dynamic-memory", NULL);
+
+	num_entries = *(u32 *)old_prop->value;
+	drmem = (struct of_drconf_cell *)
+			((char *)old_prop->value + sizeof(u32));
+
+	for (i = 0; i < num_entries; i++) {
+		u64 lmb_end_addr = drmem[i].base_addr + lmb_size;
+		if (phys_addr >= drmem[i].base_addr &&
+		    phys_addr < lmb_end_addr)
+			break;
+	}
+
+	if (i >= num_entries) {
+		of_node_put(dn);
+		return -EINVAL;
+	}
+
+	if (drmem[i].flags & DRCONF_MEM_ASSIGNED) {
+		of_node_put(dn);
+		return 0;
+	}
+
+	rc = acquire_drc(drmem[i].drc_index);
+	if (rc) {
+		of_node_put(dn);
+		return -1;
+	}
+
+	new_prop = clone_property(old_prop);
+	drmem = (struct of_drconf_cell *)
+			((char *)new_prop->value + sizeof(u32));
+
+	drmem[i].flags |= DRCONF_MEM_ASSIGNED;
+	prom_update_property(dn, new_prop, old_prop);
+
+	rc = blocking_notifier_call_chain(&pSeries_reconfig_chain,
+					  PSERIES_DRCONF_MEM_ADD,
+					  &drmem[i].base_addr);
+	if (rc == NOTIFY_BAD) {
+		prom_update_property(dn, old_prop, new_prop);
+		release_drc(drmem[i].drc_index);
+	}
+
+	of_node_put(dn);
+	return rc == NOTIFY_BAD ? -1 : 0;
+}
+
+static ssize_t memory_release_store(struct class *class, const char *buf,
+				    size_t count)
+{
+	unsigned long drc_index;
+	struct device_node *dn;
+	struct property *new_prop, *old_prop;
+	struct of_drconf_cell *drmem;
+	int num_entries;
+	int i, rc;
+
+	rc = strict_strtoul(buf, 0, &drc_index);
+	if (rc)
+		return -EINVAL;
+
+	dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+	if (!dn)
+		return 0;
+
+	old_prop = of_find_property(dn, "ibm,dynamic-memory", NULL);
+	new_prop = clone_property(old_prop);
+
+	num_entries = *(u32 *)new_prop->value;
+	drmem = (struct of_drconf_cell *)
+
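The lookup loop in platform_probe_memory(), which finds the LMB whose address range contains a given physical address, can be sketched on its own. The struct below carries only the fields that the patch dereferences (base_addr, drc_index, flags) and is a hypothetical stand-in for struct of_drconf_cell, whose real layout has more members; the function name is also illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced, hypothetical stand-in for struct of_drconf_cell. */
struct drconf_cell_sketch {
	uint64_t base_addr;
	uint32_t drc_index;
	uint32_t flags;
};

/*
 * Return the index of the LMB whose half-open range
 * [base_addr, base_addr + lmb_size) contains phys_addr, or -1.
 */
static int find_lmb(const struct drconf_cell_sketch *cells, int num_entries,
		    uint64_t lmb_size, uint64_t phys_addr)
{
	int i;

	assert(lmb_size > 0);

	for (i = 0; i < num_entries; i++) {
		/* Subtracting first avoids overflow in base_addr + lmb_size. */
		if (phys_addr >= cells[i].base_addr &&
		    phys_addr - cells[i].base_addr < lmb_size)
			return i;
	}
	return -1;
}
```

The upper bound is exclusive, so an address equal to the end of one LMB belongs to the next one, which is why the loop in the patch uses `<` rather than `<=` for the end address.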
Re: [PATCH 5/5 v2] kernel handling of CPU DLPAR
This adds the capability to DLPAR add and remove CPUs from the kernel. This
creates two new files, /sys/devices/system/cpu/probe and
/sys/devices/system/cpu/release, to handle the DLPAR addition and removal of
CPUs respectively.

CPU DLPAR add is accomplished by writing the drc-index of the CPU to the
probe file, and removal is done by writing the device-tree path of the cpu
to the release file.

Updated to include #ifdef CONFIG_HOTPLUG_CPU around the cpu hotplug specific
bits so that it will build without CONFIG_HOTPLUG_CPU defined.

Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
---

Index: powerpc/arch/powerpc/platforms/pseries/dlpar.c
===
--- powerpc.orig/arch/powerpc/platforms/pseries/dlpar.c	2009-10-13 13:08:22.0 -0500
+++ powerpc/arch/powerpc/platforms/pseries/dlpar.c	2009-10-13 13:09:00.0 -0500
@@ -1,11 +1,11 @@
 /*
- * dlpar.c - support for dynamic reconfiguration (including PCI
- * Hotplug and Dynamic Logical Partitioning on RPA platforms).
+ * dlpar.c - support for dynamic reconfiguration (including PCI,
+ * Memory, and CPU Hotplug and Dynamic Logical Partitioning on
+ * PAPR platforms).
  *
  * Copyright (C) 2009 Nathan Fontenot
  * Copyright (C) 2009 IBM Corporation
  *
- *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version
  * 2 as published by the Free Software Foundation.
@@ -19,6 +19,7 @@
 #include <linux/memory_hotplug.h>
 #include <linux/sysdev.h>
 #include <linux/sysfs.h>
+#include <linux/cpu.h>
 
 
 #include <asm/prom.h>
@@ -408,6 +409,82 @@
 	return 0;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+static ssize_t cpu_probe_store(struct class *class, const char *buf,
+			       size_t count)
+{
+	struct device_node *dn;
+	unsigned long drc_index;
+	char *cpu_name;
+	int rc;
+
+	rc = strict_strtoul(buf, 0, &drc_index);
+	if (rc)
+		return -EINVAL;
+
+	rc = acquire_drc(drc_index);
+	if (rc)
+		return rc;
+
+	dn = configure_connector(drc_index);
+	if (!dn) {
+		release_drc(drc_index);
+		return rc;
+	}
+
+	/* fixup dn name */
+	cpu_name = kzalloc(strlen(dn->full_name) + strlen("/cpus/") + 1,
+			   GFP_KERNEL);
+	if (!cpu_name) {
+		free_cc_nodes(dn);
+		release_drc(drc_index);
+		return -ENOMEM;
+	}
+
+	sprintf(cpu_name, "/cpus/%s", dn->full_name);
+	kfree(dn->full_name);
+	dn->full_name = cpu_name;
+
+	rc = add_device_tree_nodes(dn);
+	if (rc)
+		release_drc(drc_index);
+
+	return rc ? rc : count;
+}
+
+static ssize_t cpu_release_store(struct class *class, const char *buf,
+				 size_t count)
+{
+	struct device_node *dn;
+	u32 *drc_index;
+	int rc;
+
+	dn = of_find_node_by_path(buf);
+	if (!dn)
+		return -EINVAL;
+
+	drc_index = (u32 *)of_get_property(dn, "ibm,my-drc-index", NULL);
+	if (!drc_index) {
+		of_node_put(dn);
+		return -EINVAL;
+	}
+
+	rc = release_drc(*drc_index);
+	if (rc) {
+		of_node_put(dn);
+		return rc;
+	}
+
+	rc = remove_device_tree_nodes(dn);
+	if (rc)
+		acquire_drc(*drc_index);
+
+	of_node_put(dn);
+	return rc ? rc : count;
+}
+
+#endif /* CONFIG_HOTPLUG_CPU */
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 
 static struct property *clone_property(struct property *old_prop)
@@ -553,6 +630,13 @@
 
 static struct class_attribute class_attr_mem_release =
 	__ATTR(release, S_IWUSR, NULL, memory_release_store);
+#endif /* CONFIG_MEMORY_HOTPLUG */
+
+#ifdef CONFIG_HOTPLUG_CPU
+static struct class_attribute class_attr_cpu_probe =
+	__ATTR(probe, S_IWUSR, NULL, cpu_probe_store);
+static struct class_attribute class_attr_cpu_release =
+	__ATTR(release, S_IWUSR, NULL, cpu_release_store);
 #endif
 
 static int pseries_dlpar_init(void)
@@ -567,6 +651,18 @@
 		       "release file\n");
 #endif
 
+#ifdef CONFIG_HOTPLUG_CPU
+	if (sysfs_create_file(&cpu_sysdev_class.kset.kobj,
+			      &class_attr_cpu_probe.attr))
+		printk(KERN_INFO "DLPAR: Could not create sysfs cpu "
+		       "probe file\n");
+
+	if (sysfs_create_file(&cpu_sysdev_class.kset.kobj,
+			      &class_attr_cpu_release.attr))
+		printk(KERN_INFO "DLPAR: Could not create sysfs cpu "
+		       "release file\n");
+#endif
+
 	return 0;
 }
 device_initcall(pseries_dlpar_init);
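For readers who want to see how user space would drive the two files described in the patch, here is a minimal sketch. It only assumes that the probe and release files accept a plain string write, as the description says; the helper name write_attr() and the drc-index value in the usage note are hypothetical and not taken from the patch.

```c
#include <stdio.h>

/*
 * Write one string to a sysfs-style attribute file.
 * Returns 0 on success and -1 on any failure. With stdio buffering, a
 * rejection by the kernel's store handler typically shows up when the
 * stream is flushed, so the fclose() result is checked too.
 */
static int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int rc = 0;

	if (!f)
		return -1;
	if (fputs(value, f) == EOF)
		rc = -1;
	if (fclose(f) == EOF)
		rc = -1;
	return rc;
}
```

Under those assumptions, an add would look like write_attr("/sys/devices/system/cpu/probe", "0x10000001") with a made-up drc-index, and a remove would pass the device-tree path of the cpu to /sys/devices/system/cpu/release.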
Re: [PATCH 0/8] gianfar: Add support for hibernation
From: Andy Fleming aflem...@freescale.com
Date: Tue, 13 Oct 2009 12:22:38 -0500

> No, it was fine (though made unnecessary by other patches). The BD has a
> union:
>
> struct {
>         u16 status; /* Status Fields */
>         u16 length; /* Buffer length */
> };
> u32 lstatus;
>
> so when you write lstatus, you need to use the BD_LFLAG() macro, but when
> you write status, you are just setting the status bits.

Indeed I missed that, thanks.
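The status/lstatus distinction above can be sketched in plain C. This is an illustration only: it assumes BD_LFLAG() shifts the 16-bit status flags into the upper half of the 32-bit word (which matches a big-endian overlay of the struct quoted above), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed definition: status flags occupy the upper 16 bits of lstatus. */
#define BD_LFLAG(flags) ((uint32_t)(flags) << 16)

/* Build the combined 32-bit word from separate status and length fields. */
static uint32_t bd_pack_lstatus(uint16_t status, uint16_t length)
{
	return BD_LFLAG(status) | length;
}

/* Recover the 16-bit status field from the combined word. */
static uint16_t bd_status(uint32_t lstatus)
{
	return (uint16_t)(lstatus >> 16);
}

/* Recover the 16-bit length field from the combined word. */
static uint16_t bd_length(uint32_t lstatus)
{
	return (uint16_t)(lstatus & 0xffffu);
}
```

Writing the 16-bit status field directly needs no shift, which is why only the lstatus path goes through the macro.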
Re: [PATCH 2/8] bitmap: Introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area
On Tue, 2009-10-13 at 18:10 +0900, Akinobu Mita wrote:
> My user space testing exposed off-by-one error find_next_zero_area
> in iommu-helper.

Why not merge those tests into the kernel as a configurable boot-time
self-test?

cheers
Re: [PATCH 5/5 v2] kernel handling of CPU DLPAR
On Tue, 2009-10-13 at 13:14 -0500, Nathan Fontenot wrote:
> This adds the capability to DLPAR add and remove CPUs from the kernel. This
> creates two new files /sys/devices/system/cpu/probe and
> /sys/devices/system/cpu/release to handle the DLPAR addition and removal of
> CPUs respectively.

How does this relate to the existing cpu hotplug mechanism? Or is this
making the cpu exist (possible), vs marking it as online?

Is some other platform going to want to do the same? ie. should the
probe/release part be in generic code?

> Index: powerpc/arch/powerpc/platforms/pseries/dlpar.c
> ===
> --- powerpc.orig/arch/powerpc/platforms/pseries/dlpar.c	2009-10-13 13:08:22.0 -0500
> +++ powerpc/arch/powerpc/platforms/pseries/dlpar.c	2009-10-13 13:09:00.0 -0500
> @@ -1,11 +1,11 @@
>  /*
> - * dlpar.c - support for dynamic reconfiguration (including PCI
> - * Hotplug and Dynamic Logical Partitioning on RPA platforms).
> + * dlpar.c - support for dynamic reconfiguration (including PCI,

We know it's dlpar.c :)

> + * Memory, and CPU Hotplug and Dynamic Logical Partitioning on
> + * PAPR platforms).
>   *
>   * Copyright (C) 2009 Nathan Fontenot
>   * Copyright (C) 2009 IBM Corporation
>   *
> - *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License version
>   * 2 as published by the Free Software Foundation.
> @@ -19,6 +19,7 @@
>  #include <linux/memory_hotplug.h>
>  #include <linux/sysdev.h>
>  #include <linux/sysfs.h>
> +#include <linux/cpu.h>
>
>
>  #include <asm/prom.h>
> @@ -408,6 +409,82 @@
>  	return 0;
>  }
>
> +#ifdef CONFIG_HOTPLUG_CPU
> +static ssize_t cpu_probe_store(struct class *class, const char *buf,
> +			       size_t count)
> +{
> +	struct device_node *dn;
> +	unsigned long drc_index;
> +	char *cpu_name;
> +	int rc;
> +
> +	rc = strict_strtoul(buf, 0, &drc_index);
> +	if (rc)
> +		return -EINVAL;
> +
> +	rc = acquire_drc(drc_index);
> +	if (rc)
> +		return rc;
> +
> +	dn = configure_connector(drc_index);
> +	if (!dn) {
> +		release_drc(drc_index);
> +		return rc;
> +	}
> +
> +	/* fixup dn name */
> +	cpu_name = kzalloc(strlen(dn->full_name) + strlen("/cpus/") + 1,
> +			   GFP_KERNEL);
> +	if (!cpu_name) {
> +		free_cc_nodes(dn);
> +		release_drc(drc_index);
> +		return -ENOMEM;
> +	}
> +
> +	sprintf(cpu_name, "/cpus/%s", dn->full_name);
> +	kfree(dn->full_name);
> +	dn->full_name = cpu_name;

What was all that? Firmware gives us a bogus full name? But the parent
is right?

> +
> +	rc = add_device_tree_nodes(dn);
> +	if (rc)
> +		release_drc(drc_index);
> +
> +	return rc ? rc : count;

You're sure rc is 0.

> +}
> +
> +static ssize_t cpu_release_store(struct class *class, const char *buf,
> +				 size_t count)
> +{
> +	struct device_node *dn;
> +	u32 *drc_index;
> +	int rc;
> +
> +	dn = of_find_node_by_path(buf);
> +	if (!dn)
> +		return -EINVAL;
> +
> +	drc_index = (u32 *)of_get_property(dn, "ibm,my-drc-index", NULL);

No cast required.

> +	if (!drc_index) {
> +		of_node_put(dn);
> +		return -EINVAL;
> +	}
> +
> +	rc = release_drc(*drc_index);
> +	if (rc) {
> +		of_node_put(dn);
> +		return rc;
> +	}
> +
> +	rc = remove_device_tree_nodes(dn);
> +	if (rc)
> +		acquire_drc(*drc_index);
> +
> +	of_node_put(dn);
> +	return rc ? rc : count;
> +}

cheers
Re: [PATCH 4/5 v3] kernel handling of memory DLPAR
On Tue, 2009-10-13 at 13:13 -0500, Nathan Fontenot wrote: This adds the capability to DLPAR add and remove memory from the kernel. The Hi Nathan, Sorry to only get around to reviewing version 3, time is a commodity in short supply :) Index: powerpc/arch/powerpc/platforms/pseries/dlpar.c === --- powerpc.orig/arch/powerpc/platforms/pseries/dlpar.c 2009-10-08 11:08:42.0 -0500 +++ powerpc/arch/powerpc/platforms/pseries/dlpar.c2009-10-13 13:08:22.0 -0500 @@ -16,6 +16,10 @@ #include linux/notifier.h #include linux/proc_fs.h #include linux/spinlock.h +#include linux/memory_hotplug.h +#include linux/sysdev.h +#include linux/sysfs.h + #include asm/prom.h #include asm/machdep.h @@ -404,11 +408,165 @@ return 0; } +#ifdef CONFIG_MEMORY_HOTPLUG + +static struct property *clone_property(struct property *old_prop) +{ + struct property *new_prop; + + new_prop = kzalloc((sizeof *new_prop), GFP_KERNEL); + if (!new_prop) + return NULL; + + new_prop-name = kzalloc(strlen(old_prop-name) + 1, GFP_KERNEL); kstrdup()? + new_prop-value = kzalloc(old_prop-length + 1, GFP_KERNEL); + if (!new_prop-name || !new_prop-value) { + free_property(new_prop); + return NULL; + } + + strcpy(new_prop-name, old_prop-name); + memcpy(new_prop-value, old_prop-value, old_prop-length); + new_prop-length = old_prop-length; + + return new_prop; +} + +int platform_probe_memory(u64 phys_addr) +{ + struct device_node *dn; + struct property *new_prop, *old_prop; + struct property *lmb_sz_prop; + struct of_drconf_cell *drmem; + u64 lmb_size; + int num_entries, i, rc; + + if (!phys_addr) + return -EINVAL; + + dn = of_find_node_by_path(/ibm,dynamic-reconfiguration-memory); + if (!dn) + return -EINVAL; + + lmb_sz_prop = of_find_property(dn, ibm,lmb-size, NULL); + lmb_size = *(u64 *)lmb_sz_prop-value; of_get_property() ? + + old_prop = of_find_property(dn, ibm,dynamic-memory, NULL); I know we should never fail to find these properties, but it would be nice to check just in case. 
+ + num_entries = *(u32 *)old_prop-value; + drmem = (struct of_drconf_cell *) + ((char *)old_prop-value + sizeof(u32)); You do this dance twice (see below), a struct might make it cleaner. + for (i = 0; i num_entries; i++) { + u64 lmb_end_addr = drmem[i].base_addr + lmb_size; + if (phys_addr = drmem[i].base_addr + phys_addr lmb_end_addr) + break; + } + + if (i = num_entries) { + of_node_put(dn); + return -EINVAL; + } + + if (drmem[i].flags DRCONF_MEM_ASSIGNED) { + of_node_put(dn); + return 0; This is the already added case? + } + + rc = acquire_drc(drmem[i].drc_index); + if (rc) { + of_node_put(dn); + return -1; -1 ? + } + + new_prop = clone_property(old_prop); + drmem = (struct of_drconf_cell *) + ((char *)new_prop-value + sizeof(u32)); + + drmem[i].flags |= DRCONF_MEM_ASSIGNED; + prom_update_property(dn, new_prop, old_prop); + + rc = blocking_notifier_call_chain(pSeries_reconfig_chain, + PSERIES_DRCONF_MEM_ADD, + drmem[i].base_addr); + if (rc == NOTIFY_BAD) { + prom_update_property(dn, old_prop, new_prop); + release_drc(drmem[i].drc_index); + } + + of_node_put(dn); + return rc == NOTIFY_BAD ? -1 : 0; -1 ? +} + +static ssize_t memory_release_store(struct class *class, const char *buf, + size_t count) +{ + unsigned long drc_index; + struct device_node *dn; + struct property *new_prop, *old_prop; + struct of_drconf_cell *drmem; + int num_entries; + int i, rc; + + rc = strict_strtoul(buf, 0, drc_index); + if (rc) + return -EINVAL; + + dn = of_find_node_by_path(/ibm,dynamic-reconfiguration-memory); + if (!dn) + return 0; 0 really? + + old_prop = of_find_property(dn, ibm,dynamic-memory, NULL); + new_prop = clone_property(old_prop); + + num_entries = *(u32 *)new_prop-value; + drmem = (struct of_drconf_cell *) + ((char *)new_prop-value + sizeof(u32)); + + for (i = 0; i num_entries; i++) { + if (drmem[i].drc_index == drc_index) + break; + } + + if (i = num_entries) { + free_property(new_prop); + of_node_put(dn); + return -EINVAL; + } Couldn't use old_prop up until here? 
They're identical
New percpu ppc64 perfs
Hi Tejun !

So I found (and fixed, though the patch isn't upstream yet) the problem
that was causing the new percpu to hang when accessing the top of our
vmalloc space.

However, I have some concerns about that choice of location for the
percpu data.

Basically, our MMU divides the address space into segments (of 256M or 1T
depending on your processor capabilities) and those segments are SW
loaded into a relatively small (64 entries) SLB buffer.

Thus, by moving the per-cpu to the end of the vmalloc space, you
essentially make it use a different segment from the rest of the vmalloc
space, which will overall degrade performance by increasing pressure on
the SLB.

It would be nicer if we could provide an arch function to provide a
preferred location for the per-cpu data.

I can easily cook up a patch but wanted to discuss that with you first.
Any reason why we would keep it within vmalloc space for example ? IE. I
could move VMALLOC_END to below the per-cpu reserved areas, or are they
subject to expansion past boot time ?

Also, how big can they be ? Ie, will the top of the first 256M segment be
good enough or will that risk blowing out of space ? In general, machines
with 256M segments won't have more than 64 or maybe 128 CPUs I believe.
Bigger machines will have CPUs that support 1T segments.

Cheers,
Ben.
Re: New percpu ppc64 perfs
Hello, Benjamin.

Benjamin Herrenschmidt wrote:
> So I found (and fixed, though the patch isn't upstream yet) the problem
> that was causing the new percpu to hang when accessing the top of our
> vmalloc space.
>
> However, I have some concerns about that choice of location for the
> percpu data.
>
> Basically, our MMU divides the address space into segments (of 256M or 1T
> depending on your processor capabilities) and those segments are SW
> loaded into a relatively small (64 entries) SLB buffer.
>
> Thus, by moving the per-cpu to the end of the vmalloc space, you
> essentially make it use a different segment from the rest of the vmalloc
> space, which will overall degrade performance by increasing pressure on
> the SLB.
>
> It would be nicer if we could provide an arch function to provide a
> preferred location for the per-cpu data.
>
> I can easily cook up a patch but wanted to discuss that with you first.
> Any reason why we would keep it within vmalloc space for example ? IE. I
> could move VMALLOC_END to below the per-cpu reserved areas, or are they
> subject to expansion past boot time ?
>
> Also, how big can they be ? Ie, will the top of the first 256M segment be
> good enough or will that risk blowing out of space ? In general, machines
> with 256M segments won't have more than 64 or maybe 128 CPUs I believe.
> Bigger machines will have CPUs that support 1T segments.

Hmm... I don't think 256M segment will be enough. Percpu area layout
will follow how numa memory is laid out. For example, if a machine has 4
nodes (each one with one cpu) and memory for each node is 1G in size and
1G apart, the first chunk will be embedded in the linear mapping area
(normal kernel addressable area) and each unit in the chunk will be apart
by between 1G and 2G. As the first chunk is embedded in the linear mapped
area, this shouldn't cause any extra overhead.

The vmalloc area is used when the first chunk is filled and another chunk
needs to be allocated. From the second chunk on, vmalloc area is used to
preserve the layout of the first chunk. ie. Each of them will span across
8G bytes (they will overlap tho, so even with many dynamic chunks vm usage
will only be slightly over 8G).

The reason why vmalloc area from the top is used is that I didn't want
this congruent allocation to compete with normal vmalloc allocations.
Depending on the numa layout, competition between linear allocation and
congruent allocation may create many unnecessary holes.

For 256M segment, I don't think much can be done but for 1T segment,
just limiting vmalloc area size to 1T should do the trick, no?

Thanks.

-- 
tejun
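The segment arithmetic behind this exchange can be checked with a few lines of C. This is only a sketch: the shifts follow from the 256M (2^28) and 1T (2^40) segment sizes mentioned in the thread, while the helper names and any concrete addresses used with them are hypothetical.

```c
#include <stdint.h>

#define SEG_SHIFT_256M 28	/* 2^28 bytes = 256M */
#define SEG_SHIFT_1T   40	/* 2^40 bytes = 1T */

/* Index of the segment that contains effective address ea. */
static uint64_t segment_of(uint64_t ea, unsigned int shift)
{
	return ea >> shift;
}

/* Non-zero when both addresses are covered by one segment (one SLB entry). */
static int same_segment(uint64_t a, uint64_t b, unsigned int shift)
{
	return segment_of(a, shift) == segment_of(b, shift);
}
```

Two addresses 8G apart never share a 256M segment, but they do share a 1T segment as long as the region they sit in does not cross a 1T boundary, which is the effect of capping the vmalloc area at 1T.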
Re: [PATCH] Ftrace : fix function_graph tracer OOPS
On Thu, 2009-10-08 at 20:21 +0530, Sachin Sant wrote:
> Switch to LOAD_REG_ADDR().
>
> Signed-off-by : Sachin Sant sach...@in.ibm.com
> ---
> diff -Naurp old/arch/powerpc/kernel/entry_64.S new/arch/powerpc/kernel/entry_64.S
> --- old/arch/powerpc/kernel/entry_64.S	2009-10-08 18:37:44.0 +0530
> +++ new/arch/powerpc/kernel/entry_64.S	2009-10-08 18:34:33.0 +0530
> @@ -1038,8 +1038,8 @@ _GLOBAL(mod_return_to_handler)
>  	 * We are in a module using the module's TOC.
>  	 * Switch to our TOC to run inside the core kernel.
>  	 */
> -	LOAD_REG_IMMEDIATE(r4,ftrace_return_to_handler)
> -	ld	r2, 8(r4)
> +	ld	r2, PACATOC(r13)
> +	LOAD_REG_ADDR(r4,ftrace_return_to_handler)

Actually, the loading of this register is not needed. The original used
the loading to get the r2.

I actually wrote a fix for this a month ago. I never sent it out because
I was distracted by other issues. I'll send out the two patches I had
now.

Could you test them?

Thanks!

-- Steve

>
>  	bl	.ftrace_return_to_handler
>  	nop
[PATCH -mmotm] Fix bitmap-introduce-bitmap_set-bitmap_clear-bitmap_find_next_zero_area.patch
Update PATCH 2/8 based on review comments by Andrew and bugfix exposed by
user space testing.

I didn't change argument of align_mask at this time because it turned out
that it needs more changes in iommu-helper users.

From: Akinobu Mita akinobu.m...@gmail.com
Subject: Fix bitmap-introduce-bitmap_set-bitmap_clear-bitmap_find_next_zero_area.patch

- Rewrite bitmap_set and bitmap_clear

  Instead of setting or clearing for each bit.

- Fix off-by-one error in bitmap_find_next_zero_area

  This bug was derived from find_next_zero_area in iommu-helper.

- Add kerneldoc for bitmap_find_next_zero_area

This patch is supposed to be folded into
bitmap-introduce-bitmap_set-bitmap_clear-bitmap_find_next_zero_area.patch

Signed-off-by: Akinobu Mita akinobu.m...@gmail.com
---
 lib/bitmap.c | 60 +
 1 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 2415da4..84292c9 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -271,28 +271,62 @@ int __bitmap_weight(const unsigned long *bitmap, int bits)
 }
 EXPORT_SYMBOL(__bitmap_weight);
 
-void bitmap_set(unsigned long *map, int i, int len)
-{
-	int end = i + len;
+#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) % BITS_PER_LONG))
 
-	while (i < end) {
-		__set_bit(i, map);
-		i++;
+void bitmap_set(unsigned long *map, int start, int nr)
+{
+	unsigned long *p = map + BIT_WORD(start);
+	const int size = start + nr;
+	int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
+	unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
+
+	while (nr - bits_to_set >= 0) {
+		*p |= mask_to_set;
+		nr -= bits_to_set;
+		bits_to_set = BITS_PER_LONG;
+		mask_to_set = ~0UL;
+		p++;
+	}
+	if (nr) {
+		mask_to_set &= BITMAP_LAST_WORD_MASK(size);
+		*p |= mask_to_set;
 	}
 }
 EXPORT_SYMBOL(bitmap_set);
 
 void bitmap_clear(unsigned long *map, int start, int nr)
 {
-	int end = start + nr;
-
-	while (start < end) {
-		__clear_bit(start, map);
-		start++;
+	unsigned long *p = map + BIT_WORD(start);
+	const int size = start + nr;
+	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
+	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+
+	while (nr - bits_to_clear >= 0) {
+		*p &= ~mask_to_clear;
+		nr -= bits_to_clear;
+		bits_to_clear = BITS_PER_LONG;
+		mask_to_clear = ~0UL;
+		p++;
+	}
+	if (nr) {
+		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+		*p &= ~mask_to_clear;
 	}
 }
 EXPORT_SYMBOL(bitmap_clear);
 
+/*
+ * bitmap_find_next_zero_area - find a contiguous aligned zero area
+ * @map: The address to base the search on
+ * @size: The bitmap size in bits
+ * @start: The bitnumber to start searching at
+ * @nr: The number of zeroed bits we're looking for
+ * @align_mask: Alignment mask for zero area
+ *
+ * The @align_mask should be one less than a power of 2; the effect is that
+ * the bit offset of all zero areas this function finds is multiples of that
+ * power of 2. A @align_mask of 0 means no alignment is required.
+ */
 unsigned long bitmap_find_next_zero_area(unsigned long *map,
 					 unsigned long size,
 					 unsigned long start,
@@ -304,10 +338,10 @@ again:
 	index = find_next_zero_bit(map, size, start);
 
 	/* Align allocation */
-	index = (index + align_mask) & ~align_mask;
+	index = __ALIGN_MASK(index, align_mask);
 
 	end = index + nr;
-	if (end >= size)
+	if (end > size)
 		return end;
 	i = find_next_bit(map, end, index);
 	if (i < end) {
-- 
1.5.4.3
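The off-by-one fixed by the patch above can be reproduced with a small user-space model. This is a sketch, not the kernel code: it stores one bit per array element, the helper next_bit_with_value() is a naive hypothetical stand-in for find_next_zero_bit() and find_next_bit(), and the buggy flag exists only to compare the old and new bounds checks.

```c
#include <stddef.h>

/* First position in [start, size) whose element equals value, else size. */
static size_t next_bit_with_value(const unsigned char *bits, size_t size,
				  size_t start, unsigned char value)
{
	size_t i;

	for (i = start; i < size; i++)
		if (bits[i] == value)
			return i;
	return size;
}

/*
 * Model of the zero-area search. A result below size is the start of a
 * free run of nr bits; a result at or above size means no such run exists.
 * buggy != 0 selects the old "end >= size" check.
 */
static size_t find_zero_area(const unsigned char *bits, size_t size,
			     size_t start, size_t nr, size_t align_mask,
			     int buggy)
{
	size_t index, end, i;

	for (;;) {
		index = next_bit_with_value(bits, size, start, 0);
		index = (index + align_mask) & ~align_mask;
		end = index + nr;
		if (buggy ? end >= size : end > size)
			return end;
		i = next_bit_with_value(bits, end, index, 1);
		if (i >= end)
			return index;
		start = i + 1;
	}
}
```

With an empty 8-bit map and nr = 8, the area [0, 8) fits exactly; the old check rejects it because end equals size, while the corrected check accepts it.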
Re: New percpu ppc64 perfs
On Wed, 2009-10-14 at 10:49 +0900, Tejun Heo wrote:
> For 256M segment, I don't think much can be done but for 1T segment,
> just limiting vmalloc area size to 1T should do the trick, no?

Right. I'll have a look at it.

Cheers,
Ben.
Re: [PATCH 2/8] bitmap: Introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area
On Wed, Oct 14, 2009 at 08:54:47AM +1100, Michael Ellerman wrote: On Tue, 2009-10-13 at 18:10 +0900, Akinobu Mita wrote: My user space testing exposed off-by-one error find_next_zero_area in iommu-helper. Why not merge those tests into the kernel as a configurable boot-time self-test? I send the test program that I used. Obviously it needs better diagnostic messages and cleanup to be added into kernel tests. #include stdio.h #include time.h #include stdlib.h #include string.h #if 1 /* Copy and paste from kernel source */ #define BITS_PER_BYTE 8 #define BITS_PER_LONG (sizeof(long) * BITS_PER_BYTE) #define BIT_WORD(nr)((nr) / BITS_PER_LONG) #define BITOP_WORD(nr) ((nr) / BITS_PER_LONG) #define BITMAP_LAST_WORD_MASK(nbits)\ ( \ ((nbits) % BITS_PER_LONG) ? \ (1UL((nbits) % BITS_PER_LONG))-1 : ~0UL \ ) #define BITMAP_FIRST_WORD_MASK(start) (~0UL ((start) % BITS_PER_LONG)) void bitmap_set(unsigned long *map, int start, int nr) { unsigned long *p = map + BIT_WORD(start); const int size = start + nr; int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG); unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start); while (nr - bits_to_set = 0) { *p |= mask_to_set; nr -= bits_to_set; bits_to_set = BITS_PER_LONG; mask_to_set = ~0UL; p++; } if (nr) { mask_to_set = BITMAP_LAST_WORD_MASK(size); *p |= mask_to_set; } } void bitmap_clear(unsigned long *map, int start, int nr) { unsigned long *p = map + BIT_WORD(start); const int size = start + nr; int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG); unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start); while (nr - bits_to_clear = 0) { *p = ~mask_to_clear; nr -= bits_to_clear; bits_to_clear = BITS_PER_LONG; mask_to_clear = ~0UL; p++; } if (nr) { mask_to_clear = BITMAP_LAST_WORD_MASK(size); *p = ~mask_to_clear; } } static unsigned long __ffs(unsigned long word) { int num = 0; if ((word 0x) == 0) { num += 16; word = 16; } if ((word 0xff) == 0) { num += 8; word = 8; } if ((word 0xf) == 0) { num += 4; word = 4; } if 
((word 0x3) == 0) { num += 2; word = 2; } if ((word 0x1) == 0) num += 1; return num; } unsigned long find_next_bit(const unsigned long *addr, unsigned long size, unsigned long offset) { const unsigned long *p = addr + BITOP_WORD(offset); unsigned long result = offset ~(BITS_PER_LONG-1); unsigned long tmp; if (offset = size) return size; size -= result; offset %= BITS_PER_LONG; if (offset) { tmp = *(p++); tmp = (~0UL offset); if (size BITS_PER_LONG) goto found_first; if (tmp) goto found_middle; size -= BITS_PER_LONG; result += BITS_PER_LONG; } while (size ~(BITS_PER_LONG-1)) { if ((tmp = *(p++))) goto found_middle; result += BITS_PER_LONG; size -= BITS_PER_LONG; } if (!size) return result; tmp = *p; found_first: tmp = (~0UL (BITS_PER_LONG - size)); if (tmp == 0UL) /* Are any bits set? */ return result + size; /* Nope. */ found_middle: return result + __ffs(tmp); } #define ffz(x) __ffs(~(x)) unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, unsigned long offset) { const unsigned long *p = addr + BITOP_WORD(offset); unsigned long result = offset ~(BITS_PER_LONG-1); unsigned long tmp; if (offset = size) return size; size -= result; offset %= BITS_PER_LONG; if (offset) { tmp = *(p++); tmp |= ~0UL (BITS_PER_LONG - offset); if (size BITS_PER_LONG) goto found_first; if (~tmp) goto found_middle; size -= BITS_PER_LONG; result += BITS_PER_LONG; } while (size ~(BITS_PER_LONG-1)) { if (~(tmp = *(p++))) goto
Re: [RFC PATCH 05/12] of: add common header for flattened device tree representation
On Fri, Oct 09, 2009 at 01:07:57AM -0600, Grant Likely wrote: On Fri, Oct 9, 2009 at 12:35 AM, David Gibson da...@gibson.dropbear.id.au wrote: On Tue, Oct 06, 2009 at 10:30:59PM -0600, Grant Likely wrote: Add a common header file for working with the flattened device tree data structure and merge the shared data tags used by Microblaze and PowerPC Signed-off-by: Grant Likely grant.lik...@secretlab.ca --- arch/microblaze/include/asm/prom.h | 12 +--- arch/powerpc/include/asm/prom.h | 12 +--- include/linux/of_fdt.h | 30 ++ 3 files changed, 32 insertions(+), 22 deletions(-) create mode 100644 include/linux/of_fdt.h diff --git a/arch/microblaze/include/asm/prom.h b/arch/microblaze/include/asm/prom.h index 64e8b3a..5f461f0 100644 --- a/arch/microblaze/include/asm/prom.h +++ b/arch/microblaze/include/asm/prom.h @@ -17,20 +17,10 @@ #ifndef _ASM_MICROBLAZE_PROM_H #define _ASM_MICROBLAZE_PROM_H #ifdef __KERNEL__ - -/* Definitions used by the flattened device tree */ -#define OF_DT_HEADER 0xd00dfeed /* marker */ -#define OF_DT_BEGIN_NODE 0x1 /* Start of node, full name */ -#define OF_DT_END_NODE 0x2 /* End node */ -#define OF_DT_PROP 0x3 /* Property: name off, size, content */ -#define OF_DT_NOP 0x4 /* nop */ -#define OF_DT_END 0x9 - -#define OF_DT_VERSION 0x10 So, if you're merging all these, I guess the question is do we also want to merge them with scripts/dtc/libfdt/fdt.h, and by extension with the upstream libfdt header file which defines the same things. I see your question and raise you another. Where should the merge file live for it to be included both by dtc and kernel code? Or should it just be cloned in the kernel tree? Yeah, a good question. As I see it there are two options. Number one is just make sure everything relevant that the kernel needs is in the libfdt version, then just have the kernel code reference it from its location in scripts/dtc. Other option is we clone the file in the kernel tree. 
Requires keeping in sync, in theory at least, but since that file has been pretty static (since it's only supposed to contain passive structures/constants related to the physical flat tree structure - no code or prototypes). -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
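As a small illustration of why these definitions are passive and easy to share, here is a sketch that checks a blob for the OF_DT_HEADER marker quoted in the patch. It assumes the flattened tree stores its header fields big-endian with the marker as the first 32-bit word; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define OF_DT_HEADER 0xd00dfeed	/* marker, value as quoted in the patch */

/* Read a 32-bit big-endian value regardless of host byte order. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Non-zero when the blob is long enough and starts with the marker. */
static int fdt_magic_ok(const unsigned char *blob, size_t len)
{
	return len >= 4 && read_be32(blob) == OF_DT_HEADER;
}
```

Nothing here depends on kernel internals, which is the property that makes either sharing the libfdt header or cloning it workable.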
Re: [PATCH] of/platform: Implement support for dev_pm_ops
On Tue, 2009-10-13 at 02:44 +0400, Anton Vorontsov wrote:
> I agree that there is some room for improvements in general (e.g.
> merging platform and of_platform devices/drivers), but it's not as
> easy as you would like to think. Let's make it in a separate step
> that doesn't stop real features from being implemented (e.g.
> hibernate).
>
> For the six functions that we can reuse I can prepare a cleanup
> patch that we can merge via -mm, or it can just sit and collect
> needed acks and can be merged via any tree. But please, no
> cross-tree dependencies for the crucial features.

I agree. I'll take the patch for now.

In the long run, I'm all for killing of_platform if we can find a proper
way to replace it with platform. IE. With dev_archdata, any device carries
the of device node, so of_platform doesn't really buy us much anymore.

We could even default by populating platform device resources with
standard-parsing of reg properties etc...

So for devices that don't actually need anything more, we may get away
with re-using platform devices as-is, all we would need is some kind of
conversion table or such to map OF match to platform dev names, or maybe a
secondary match table in the drivers themselves.

Anyway, that's an old discussion, something we still need to sort out...

Ben.
Re: [PATCH] i2c-powermac: Reject unsupported I2C transactions
On Wed, 2009-09-30 at 22:14 +0200, Jean Delvare wrote:
> The i2c-powermac driver doesn't support arbitrary multi-message I2C
> transactions, only SMBus ones. Make it clear by returning an error if
> a multi-message I2C transaction is attempted. This is better than only
> processing the first message, because most callers won't recover from
> the short transaction. Anyone wishing to issue multi-message
> transactions should use the SMBus API instead of the raw I2C API.
>
> Signed-off-by: Jean Delvare kh...@linux-fr.org

Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org

> Cc: Paul Mackerras pau...@samba.org
> ---
>  drivers/i2c/busses/i2c-powermac.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> --- linux-2.6.32-rc1.orig/drivers/i2c/busses/i2c-powermac.c	2009-06-10 05:05:27.0 +0200
> +++ linux-2.6.32-rc1/drivers/i2c/busses/i2c-powermac.c	2009-09-30 20:29:42.0 +0200
> @@ -146,6 +146,12 @@ static int i2c_powermac_master_xfer( str
>  	int read;
>  	int addrdir;
>
> +	if (num != 1) {
> +		dev_err(&adap->dev,
> +			"Multi-message I2C transactions not supported\n");
> +		return -EOPNOTSUPP;
> +	}
> +
>  	if (msgs->flags & I2C_M_TEN)
>  		return -EINVAL;
>  	read = (msgs->flags & I2C_M_RD) != 0;
[Fwd: [PATCH 1/2] i2c-powermac: Refactor i2c_powermac_smbus_xfer]
---BeginMessage--- I wanted to add some error logging to the i2c-powermac driver, but found that it was very difficult due to the way the i2c_powermac_smbus_xfer function is organized. Refactor the code in this function so that each low-level function is only called once. Signed-off-by: Jean Delvare kh...@linux-fr.org Cc: Benjamin Herrenschmidt b...@kernel.crashing.org Cc: Paul Mackerras pau...@samba.org --- This needs testing! Thanks. drivers/i2c/busses/i2c-powermac.c | 85 + 1 file changed, 41 insertions(+), 44 deletions(-) --- linux-2.6.32-rc3.orig/drivers/i2c/busses/i2c-powermac.c 2009-10-10 14:08:39.0 +0200 +++ linux-2.6.32-rc3/drivers/i2c/busses/i2c-powermac.c 2009-10-10 14:13:04.0 +0200 @@ -49,48 +49,38 @@ static s32 i2c_powermac_smbus_xfer( stru int rc = 0; int read = (read_write == I2C_SMBUS_READ); int addrdir = (addr 1) | read; + int mode, subsize, len; + u32 subaddr; + u8 *buf; u8 local[2]; - rc = pmac_i2c_open(bus, 0); - if (rc) - return rc; + if (size == I2C_SMBUS_QUICK || size == I2C_SMBUS_BYTE) { + mode = pmac_i2c_mode_std; + subsize = 0; + subaddr = 0; + } else { + mode = read ? pmac_i2c_mode_combined : pmac_i2c_mode_stdsub; + subsize = 1; + subaddr = command; + } switch (size) { case I2C_SMBUS_QUICK: - rc = pmac_i2c_setmode(bus, pmac_i2c_mode_std); - if (rc) - goto bail; - rc = pmac_i2c_xfer(bus, addrdir, 0, 0, NULL, 0); + buf = NULL; + len = 0; break; case I2C_SMBUS_BYTE: - rc = pmac_i2c_setmode(bus, pmac_i2c_mode_std); - if (rc) - goto bail; - rc = pmac_i2c_xfer(bus, addrdir, 0, 0, data-byte, 1); - break; case I2C_SMBUS_BYTE_DATA: - rc = pmac_i2c_setmode(bus, read ? - pmac_i2c_mode_combined : - pmac_i2c_mode_stdsub); - if (rc) - goto bail; - rc = pmac_i2c_xfer(bus, addrdir, 1, command, data-byte, 1); + buf = data-byte; + len = 1; break; case I2C_SMBUS_WORD_DATA: - rc = pmac_i2c_setmode(bus, read ? 
- pmac_i2c_mode_combined : - pmac_i2c_mode_stdsub); - if (rc) - goto bail; if (!read) { local[0] = data-word 0xff; local[1] = (data-word 8) 0xff; } - rc = pmac_i2c_xfer(bus, addrdir, 1, command, local, 2); - if (rc == 0 read) { - data-word = ((u16)local[1]) 8; - data-word |= local[0]; - } + buf = local; + len = 2; break; /* Note that these are broken vs. the expected smbus API where @@ -105,28 +95,35 @@ static s32 i2c_powermac_smbus_xfer(stru * a repeat start/addr phase (but not stop in between) */ case I2C_SMBUS_BLOCK_DATA: - rc = pmac_i2c_setmode(bus, read ? - pmac_i2c_mode_combined : - pmac_i2c_mode_stdsub); - if (rc) - goto bail; - rc = pmac_i2c_xfer(bus, addrdir, 1, command, data-block, - data-block[0] + 1); - + buf = data-block; + len = data-block[0] + 1; break; case I2C_SMBUS_I2C_BLOCK_DATA: - rc = pmac_i2c_setmode(bus, read ? - pmac_i2c_mode_combined : - pmac_i2c_mode_stdsub); - if (rc) - goto bail; - rc = pmac_i2c_xfer(bus, addrdir, 1, command, - data-block[1], data-block[0]); + buf = data-block[1]; + len = data-block[0]; break; default: - rc = -EINVAL; + return -EINVAL; + } + + rc = pmac_i2c_open(bus, 0); + if (rc) + return rc; + + rc = pmac_i2c_setmode(bus, mode); + if (rc) + goto bail; + + rc = pmac_i2c_xfer(bus, addrdir, subsize, subaddr, buf, len); + if (rc) + goto bail; + + if (size == I2C_SMBUS_WORD_DATA read) { + data-word = ((u16)local[1]) 8; +
[Fwd: [PATCH 2/2] i2c-powermac: Log errors]
---BeginMessage--- Log errors when they happen, otherwise we have no idea what went wrong. Signed-off-by: Jean Delvare kh...@linux-fr.org Cc: Benjamin Herrenschmidt b...@kernel.crashing.org Cc: Paul Mackerras pau...@samba.org --- drivers/i2c/busses/i2c-powermac.c | 28 +++- 1 file changed, 23 insertions(+), 5 deletions(-) --- linux-2.6.32-rc3.orig/drivers/i2c/busses/i2c-powermac.c 2009-10-10 14:13:04.0 +0200 +++ linux-2.6.32-rc3/drivers/i2c/busses/i2c-powermac.c 2009-10-10 14:13:12.0 +0200 @@ -108,16 +108,25 @@ static s32 i2c_powermac_smbus_xfer( stru } rc = pmac_i2c_open(bus, 0); - if (rc) + if (rc) { + dev_err(adap-dev, Failed to open I2C, err %d\n, rc); return rc; + } rc = pmac_i2c_setmode(bus, mode); - if (rc) + if (rc) { + dev_err(adap-dev, Failed to set I2C mode %d, err %d\n, + mode, rc); goto bail; + } rc = pmac_i2c_xfer(bus, addrdir, subsize, subaddr, buf, len); - if (rc) + if (rc) { + dev_err(adap-dev, + I2C transfer at 0x%02x failed, size %d, err %d\n, + addrdir 1, size, rc); goto bail; + } if (size == I2C_SMBUS_WORD_DATA read) { data-word = ((u16)local[1]) 8; @@ -157,12 +166,21 @@ static int i2c_powermac_master_xfer( str addrdir ^= 1; rc = pmac_i2c_open(bus, 0); - if (rc) + if (rc) { + dev_err(adap-dev, Failed to open I2C, err %d\n, rc); return rc; + } rc = pmac_i2c_setmode(bus, pmac_i2c_mode_std); - if (rc) + if (rc) { + dev_err(adap-dev, Failed to set I2C mode %d, err %d\n, + pmac_i2c_mode_std, rc); goto bail; + } rc = pmac_i2c_xfer(bus, addrdir, 0, 0, msgs-buf, msgs-len); + if (rc 0) + dev_err(adap-dev, I2C %s 0x%02x failed, err %d\n, + addrdir 1 ? read from : write to, addrdir 1, + rc); bail: pmac_i2c_close(bus); return rc 0 ? rc : 1; -- Jean Delvare ---End Message--- ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/6] powerpc: Make NR_IRQS a CONFIG option
The irq_desc array consumes quite a lot of space, and for systems that don't need or can't have 512 irqs it's just wasted space.

The first 16 are reserved for ISA, so the minimum of 32 is really 16 - and no one has asked for more than 512 so leave that as the maximum.

Signed-off-by: Michael Ellerman <mich...@ellerman.id.au>
---
 arch/powerpc/Kconfig           |   10 ++
 arch/powerpc/include/asm/irq.h |    4 ++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 10a0a54..2230e75 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -56,6 +56,16 @@ config IRQ_PER_CPU
 	bool
 	default y

+config NR_IRQS
+	int "Number of virtual interrupt numbers"
+	range 32 512
+	default 512
+	help
+	  This defines the number of virtual interrupt numbers the kernel
+	  can manage. Virtual interrupt numbers are what you see in
+	  /proc/interrupts. If you configure your system to have too few,
+	  drivers will fail to load or worse - handle with care.
+
 config STACKTRACE_SUPPORT
 	bool
 	default y
diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index bbcd1aa..b83fcc8 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -34,8 +34,8 @@ extern atomic_t ppc_n_lost_interrupts;
  */
 #define NO_IRQ_IGNORE		((unsigned int)-1)

-/* Total number of virq in the platform (make it a CONFIG_* option ? */
-#define NR_IRQS		512
+/* Total number of virq in the platform */
+#define NR_IRQS		CONFIG_NR_IRQS

 /* Number of irqs reserved for the legacy controller */
 #define NUM_ISA_INTERRUPTS	16
-- 
1.6.2.1
[PATCH 2/6] powerpc/pseries: Use irq_has_action() in eeh_disable_irq()
Rather than open-coding our own check, use irq_has_action() to check if an irq has an action - ie. is in use.

irq_has_action() doesn't take the descriptor lock, but it shouldn't matter - we're just using it as an indicator that the irq is in use. disable_irq_nosync() will also take the descriptor lock before doing anything.

Signed-off-by: Michael Ellerman <mich...@ellerman.id.au>
---
 arch/powerpc/platforms/pseries/eeh_driver.c |   18 +-
 1 files changed, 1 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_driver.c b/arch/powerpc/platforms/pseries/eeh_driver.c
index 0e8db67..ef8e454 100644
--- a/arch/powerpc/platforms/pseries/eeh_driver.c
+++ b/arch/powerpc/platforms/pseries/eeh_driver.c
@@ -63,22 +63,6 @@ static void print_device_node_tree(struct pci_dn *pdn, int dent)
 }
 #endif

-/**
- * irq_in_use - return true if this irq is being used
- */
-static int irq_in_use(unsigned int irq)
-{
-	int rc = 0;
-	unsigned long flags;
-	struct irq_desc *desc = irq_desc + irq;
-
-	spin_lock_irqsave(&desc->lock, flags);
-	if (desc->action)
-		rc = 1;
-	spin_unlock_irqrestore(&desc->lock, flags);
-	return rc;
-}
-
 /**
  * eeh_disable_irq - disable interrupt for the recovering device
  */
@@ -93,7 +77,7 @@ static void eeh_disable_irq(struct pci_dev *dev)
 	if (dev->msi_enabled || dev->msix_enabled)
 		return;

-	if (!irq_in_use(dev->irq))
+	if (!irq_has_action(dev->irq))
 		return;

 	PCI_DN(dn)->eeh_mode |= EEH_MODE_IRQ_DISABLED;
-- 
1.6.2.1
[PATCH 3/6] powerpc: Remove get_irq_desc()
get_irq_desc() is a powerpc-specific version of irq_to_desc(). That is reason enough to remove it, but it also doesn't know about sparse irq_desc support which irq_to_desc() does (when we enable it).

Signed-off-by: Michael Ellerman <mich...@ellerman.id.au>
---
 arch/powerpc/include/asm/irq.h                  |    2 -
 arch/powerpc/kernel/crash.c                     |    2 +-
 arch/powerpc/kernel/irq.c                       |   28 --
 arch/powerpc/platforms/512x/mpc5121_ads_cpld.c  |    2 +-
 arch/powerpc/platforms/52xx/media5200.c         |    2 +-
 arch/powerpc/platforms/82xx/pq2ads-pci-pic.c    |    2 +-
 arch/powerpc/platforms/85xx/socrates_fpga_pic.c |    2 +-
 arch/powerpc/platforms/86xx/gef_pic.c           |    2 +-
 arch/powerpc/platforms/cell/beat_interrupt.c    |    2 +-
 arch/powerpc/platforms/cell/spider-pic.c        |    4 +-
 arch/powerpc/platforms/iseries/irq.c            |    2 +-
 arch/powerpc/platforms/powermac/pic.c           |    8 +++---
 arch/powerpc/platforms/pseries/xics.c           |    8 +++---
 arch/powerpc/sysdev/cpm1.c                      |    2 +-
 arch/powerpc/sysdev/cpm2_pic.c                  |   10 +---
 arch/powerpc/sysdev/fsl_msi.c                   |    2 +-
 arch/powerpc/sysdev/i8259.c                     |    4 +-
 arch/powerpc/sysdev/ipic.c                      |    2 +-
 arch/powerpc/sysdev/mpc8xx_pic.c                |    2 +-
 arch/powerpc/sysdev/mpic.c                      |   18 +++---
 arch/powerpc/sysdev/mv64x60_pic.c               |    2 +-
 arch/powerpc/sysdev/qe_lib/qe_ic.c              |    4 +-
 arch/powerpc/sysdev/tsi108_pci.c                |    2 +-
 arch/powerpc/sysdev/uic.c                       |    6 ++--
 arch/powerpc/sysdev/xilinx_intc.c               |    2 +-
 25 files changed, 62 insertions(+), 60 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index b83fcc8..03dc28c 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -17,8 +17,6 @@
 #include <asm/atomic.h>


-#define get_irq_desc(irq) (&irq_desc[(irq)])
-
 /* Define a way to iterate across irqs. */
 #define for_each_irq(i) \
 	for ((i) = 0; (i) < NR_IRQS; ++(i))
diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index 0a8439a..6f4613d 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -373,7 +373,7 @@ void default_machine_crash_shutdown(struct pt_regs *regs)
 	hard_irq_disable();

 	for_each_irq(i) {
-		struct irq_desc *desc = irq_desc + i;
+		struct irq_desc *desc = irq_to_desc(i);

 		if (desc->status & IRQ_INPROGRESS)
 			desc->chip->eoi(i);
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index e5d1211..6563221 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -190,7 +190,7 @@ int show_interrupts(struct seq_file *p, void *v)
 	}

 	if (i < NR_IRQS) {
-		desc = get_irq_desc(i);
+		desc = irq_to_desc(i);
 		spin_lock_irqsave(&desc->lock, flags);
 		action = desc->action;
 		if (!action || !action->handler)
@@ -230,23 +230,25 @@ skip:
 #ifdef CONFIG_HOTPLUG_CPU
 void fixup_irqs(cpumask_t map)
 {
+	struct irq_desc *desc;
 	unsigned int irq;
 	static int warned;

 	for_each_irq(irq) {
 		cpumask_t mask;

-		if (irq_desc[irq].status & IRQ_PER_CPU)
+		desc = irq_to_desc(irq);
+		if (desc && desc->status & IRQ_PER_CPU)
 			continue;

-		cpumask_and(&mask, irq_desc[irq].affinity, &map);
+		cpumask_and(&mask, desc->affinity, &map);
 		if (any_online_cpu(mask) == NR_CPUS) {
 			printk("Breaking affinity for irq %i\n", irq);
 			mask = map;
 		}
-		if (irq_desc[irq].chip->set_affinity)
-			irq_desc[irq].chip->set_affinity(irq, &mask);
-		else if (irq_desc[irq].action && !(warned++))
+		if (desc->chip->set_affinity)
+			desc->chip->set_affinity(irq, &mask);
+		else if (desc->action && !(warned++))
 			printk("Cannot set affinity for irq %i\n", irq);
 	}

@@ -273,7 +275,7 @@ static inline void handle_one_irq(unsigned int irq)
 		return;
 	}

-	desc = irq_desc + irq;
+	desc = irq_to_desc(irq);
 	saved_sp_limit = current->thread.ksp_limit;

 	irqtp->task = curtp->task;
@@ -535,7 +537,7 @@ struct irq_host *irq_alloc_host(struct device_node *of_node,
 			smp_wmb();

 			/* Clear norequest flags */
-			get_irq_desc(i)->status &= ~IRQ_NOREQUEST;
+			irq_to_desc(i)->status &= ~IRQ_NOREQUEST;

 			/* Legacy flags are left to default at this point,
[PATCH 4/6] powerpc: Make virq_debug_show() cope with sparse irq_descs
Signed-off-by: Michael Ellerman <mich...@ellerman.id.au>
---
 arch/powerpc/kernel/irq.c |    5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 6563221..baa49eb 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -1065,8 +1065,11 @@ static int virq_debug_show(struct seq_file *m, void *private)
 	seq_printf(m, "%-5s %-7s %-15s %s\n", "virq", "hwirq",
 		      "chip name", "host name");

-	for (i = 1; i < NR_IRQS; i++) {
+	for (i = 1; i < nr_irqs; i++) {
 		desc = irq_to_desc(i);
+		if (!desc)
+			continue;
+
 		spin_lock_irqsave(&desc->lock, flags);

 		if (desc->action && desc->action->handler) {
-- 
1.6.2.1
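
For readers following the series, here is a small user-space sketch (not part of the patch) of why the loop above needs the NULL check once descriptors are sparse: a lookup can come back empty for an irq number that was never allocated. Everything named fake_* below, plus count_active_irqs() and FAKE_NR_IRQS, is a hypothetical stand-in invented for illustration and is not a kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define FAKE_NR_IRQS 8

struct fake_irq_desc {
	int has_handler;	/* stands in for desc->action && desc->action->handler */
};

/* Sparse table: a slot stays NULL until a descriptor is allocated for it. */
static struct fake_irq_desc *fake_desc_table[FAKE_NR_IRQS];

/* Models a sparse lookup: returns NULL for unallocated or out-of-range irqs. */
static struct fake_irq_desc *fake_irq_to_desc(unsigned int irq)
{
	if (irq >= FAKE_NR_IRQS)
		return NULL;
	return fake_desc_table[irq];
}

/* Walks irqs 1..nr-1, as the patched loop does, and skips the holes. */
static unsigned int count_active_irqs(unsigned int nr)
{
	unsigned int i, active = 0;

	for (i = 1; i < nr; i++) {
		struct fake_irq_desc *desc = fake_irq_to_desc(i);

		if (!desc)
			continue;	/* hole in the sparse table */
		if (desc->has_handler)
			active++;
	}
	return active;
}
```

Without the `if (!desc) continue;` line, the first hole would be dereferenced, which is the failure the patch is guarding against.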
[PATCH 5/6] powerpc: Rearrange and fix show_interrupts() for sparse irq_descs
Move the default case out of the if, ie. when we're just displaying an irq. And consolidate all the odd cases at the top, ie. printing the header and footer. And in the process cope with sparse irq_descs.

Signed-off-by: Michael Ellerman <mich...@ellerman.id.au>
---
 arch/powerpc/kernel/irq.c |   64 ++---
 1 files changed, 37 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index baa49eb..63e27d5 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -187,33 +187,7 @@ int show_interrupts(struct seq_file *p, void *v)
 		for_each_online_cpu(j)
 			seq_printf(p, "CPU%d ", j);
 		seq_putc(p, '\n');
-	}
-
-	if (i < NR_IRQS) {
-		desc = irq_to_desc(i);
-		spin_lock_irqsave(&desc->lock, flags);
-		action = desc->action;
-		if (!action || !action->handler)
-			goto skip;
-		seq_printf(p, "%3d: ", i);
-#ifdef CONFIG_SMP
-		for_each_online_cpu(j)
-			seq_printf(p, "%10u ", kstat_irqs_cpu(i, j));
-#else
-		seq_printf(p, "%10u ", kstat_irqs(i));
-#endif /* CONFIG_SMP */
-		if (desc->chip)
-			seq_printf(p, "%s ", desc->chip->typename);
-		else
-			seq_puts(p, "None ");
-		seq_printf(p, "%s", (desc->status & IRQ_LEVEL) ? "Level " : "Edge ");
-		seq_printf(p, "%s", action->name);
-		for (action = action->next; action; action = action->next)
-			seq_printf(p, ", %s", action->name);
-		seq_putc(p, '\n');
-skip:
-		spin_unlock_irqrestore(&desc->lock, flags);
-	} else if (i == NR_IRQS) {
+	} else if (i == nr_irqs) {
 #if defined(CONFIG_PPC32) && defined(CONFIG_TAU_INT)
 		if (tau_initialized){
 			seq_puts(p, "TAU: ");
@@ -223,7 +197,43 @@ skip:
 		}
 #endif /* CONFIG_PPC32 && CONFIG_TAU_INT*/
 		seq_printf(p, "BAD: %10u\n", ppc_spurious_interrupts);
+
+		return 0;
 	}
+
+	desc = irq_to_desc(i);
+	if (!desc)
+		return 0;
+
+	spin_lock_irqsave(&desc->lock, flags);
+
+	action = desc->action;
+	if (!action || !action->handler)
+		goto skip;
+
+	seq_printf(p, "%3d: ", i);
+#ifdef CONFIG_SMP
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", kstat_irqs_cpu(i, j));
+#else
+	seq_printf(p, "%10u ", kstat_irqs(i));
+#endif /* CONFIG_SMP */
+
+	if (desc->chip)
+		seq_printf(p, "%s ", desc->chip->typename);
+	else
+		seq_puts(p, "None ");
+
+	seq_printf(p, "%s", (desc->status & IRQ_LEVEL) ? "Level " : "Edge ");
+	seq_printf(p, "%s", action->name);
+
+	for (action = action->next; action; action = action->next)
+		seq_printf(p, ", %s", action->name);
+	seq_putc(p, '\n');
+
+skip:
+	spin_unlock_irqrestore(&desc->lock, flags);
+
 	return 0;
 }

-- 
1.6.2.1
[PATCH 6/6] powerpc: Enable sparse irq_descs on powerpc
Defining CONFIG_SPARSE_IRQ enables generic code that gets rid of the static irq_desc array, and replaces it with an array of pointers to irq_descs.

It also allows node local allocation of irq_descs, however we currently don't have the information available to do that, so we just allocate them all on node 0.

Signed-off-by: Michael Ellerman <mich...@ellerman.id.au>
---
 arch/powerpc/Kconfig            |   13
 arch/powerpc/include/asm/irq.h  |    3 ++
 arch/powerpc/kernel/irq.c       |   40 --
 arch/powerpc/kernel/ppc_ksyms.c |    1 -
 arch/powerpc/kernel/setup_64.c  |    5
 5 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2230e75..825d889 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -388,6 +388,19 @@ config IRQ_ALL_CPUS
 	  CPU. Generally saying Y is safe, although some problems have been
 	  reported with SMP Power Macintoshes with this option enabled.

+config SPARSE_IRQ
+	bool "Support sparse irq numbering"
+	default y
+	help
+	  This enables support for sparse irqs. This is useful for distro
+	  kernels that want to define a high CONFIG_NR_CPUS value but still
+	  want to have low kernel memory footprint on smaller machines.
+
+	  ( Sparse IRQs can also be beneficial on NUMA boxes, as they spread
+	    out the irq_desc[] array in a more NUMA-friendly way. )
+
+	  If you don't know what to do here, say Y.
+
 config NUMA
 	bool "NUMA support"
 	depends on PPC64
diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 03dc28c..c85a32f 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -38,6 +38,9 @@ extern atomic_t ppc_n_lost_interrupts;
 /* Number of irqs reserved for the legacy controller */
 #define NUM_ISA_INTERRUPTS	16

+/* Same thing, used by the generic IRQ code */
+#define NR_IRQS_LEGACY		NUM_ISA_INTERRUPTS
+
 /* This type is the placeholder for a hardware interrupt number. It has to
  * be big enough to enclose whatever representation is used by a given
  * platform.
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 63e27d5..eba5392 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -85,7 +85,10 @@ extern int tau_interrupts(int);
 #endif /* CONFIG_PPC32 */

 #ifdef CONFIG_PPC64
+
+#ifndef CONFIG_SPARSE_IRQ
 EXPORT_SYMBOL(irq_desc);
+#endif

 int distribute_irqs = 1;

@@ -613,8 +616,16 @@ void irq_set_virq_count(unsigned int count)
 static int irq_setup_virq(struct irq_host *host, unsigned int virq,
 			    irq_hw_number_t hwirq)
 {
+	struct irq_desc *desc;
+
+	desc = irq_to_desc_alloc_node(virq, 0);
+	if (!desc) {
+		pr_debug("irq: -> allocating desc failed\n");
+		goto error;
+	}
+
 	/* Clear IRQ_NOREQUEST flag */
-	irq_to_desc(virq)->status &= ~IRQ_NOREQUEST;
+	desc->status &= ~IRQ_NOREQUEST;

 	/* map it */
 	smp_wmb();
@@ -623,11 +634,14 @@ static int irq_setup_virq(struct irq_host *host, unsigned int virq,

 	if (host->ops->map(host, virq, hwirq)) {
 		pr_debug("irq: -> mapping failed, freeing\n");
-		irq_free_virt(virq, 1);
-		return -1;
+		goto error;
 	}

 	return 0;
+
+error:
+	irq_free_virt(virq, 1);
+	return -1;
 }

 unsigned int irq_create_direct_mapping(struct irq_host *host)
@@ -1008,12 +1022,24 @@ void irq_free_virt(unsigned int virq, unsigned int count)
 	spin_unlock_irqrestore(&irq_big_lock, flags);
 }

-void irq_early_init(void)
+int arch_early_irq_init(void)
 {
-	unsigned int i;
+	struct irq_desc *desc;
+	int i;

-	for (i = 0; i < NR_IRQS; i++)
-		irq_to_desc(i)->status |= IRQ_NOREQUEST;
+	for (i = 0; i < NR_IRQS; i++) {
+		desc = irq_to_desc(i);
+		if (desc)
+			desc->status |= IRQ_NOREQUEST;
+	}
+
+	return 0;
+}
+
+int arch_init_chip_data(struct irq_desc *desc, int node)
+{
+	desc->status |= IRQ_NOREQUEST;
+	return 0;
 }

 /* We need to create the radix trees late */
diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index c8b27bb..07115d6 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -162,7 +162,6 @@ EXPORT_SYMBOL(screen_info);

 #ifdef CONFIG_PPC32
 EXPORT_SYMBOL(timer_interrupt);
-EXPORT_SYMBOL(irq_desc);
 EXPORT_SYMBOL(tb_ticks_per_jiffy);
 EXPORT_SYMBOL(cacheable_memcpy);
 EXPORT_SYMBOL(cacheable_memzero);
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 797ea95..8e5ec92 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -357,11 +357,6 @@ void __init setup_system(void)
 	 */
 	initialize_cache_info();

-
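
As a rough illustration of the allocate-on-first-use pattern this patch moves irq_setup_virq() to, the following user-space sketch may help. It is only a model under stated assumptions: demo_desc_get_or_alloc() is a hypothetical stand-in for the role irq_to_desc_alloc_node() plays in the diff, DEMO_NOREQUEST stands in for IRQ_NOREQUEST, and calloc() replaces the kernel allocator. None of the demo_* names exist in the kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model; the demo_* names are not kernel APIs. */
#define DEMO_NR_IRQS	16
#define DEMO_NOREQUEST	0x1u	/* stands in for IRQ_NOREQUEST */

struct demo_desc {
	unsigned int status;
};

/* Array of pointers instead of a static array of descriptors. */
static struct demo_desc *demo_table[DEMO_NR_IRQS];

/* Return the descriptor for irq, allocating it on first use. */
static struct demo_desc *demo_desc_get_or_alloc(unsigned int irq)
{
	struct demo_desc *desc;

	if (irq >= DEMO_NR_IRQS)
		return NULL;
	if (demo_table[irq])
		return demo_table[irq];

	desc = calloc(1, sizeof(*desc));
	if (!desc)
		return NULL;

	/* A fresh descriptor starts out unrequestable, as
	 * arch_init_chip_data() in the patch marks the one it is given. */
	desc->status = DEMO_NOREQUEST;
	demo_table[irq] = desc;
	return desc;
}

/* Same shape as the patched irq_setup_virq(): 0 on success, -1 on error. */
static int demo_setup_virq(unsigned int virq)
{
	struct demo_desc *desc = demo_desc_get_or_alloc(virq);

	if (!desc)
		return -1;	/* allocation failed, nothing to map */

	desc->status &= ~DEMO_NOREQUEST;	/* clear the norequest flag */
	return 0;
}
```

The point of the pattern is that memory is only spent on irq numbers that are actually set up, at the cost of every later lookup having to tolerate a missing descriptor.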
Re: [PATCH, RFC] powerpc, pci: fix MODPOST warning
On Mon, 2009-10-05 at 09:06 +0200, Heiko Schocher wrote:
> Hello,
>
> Heiko Schocher wrote:
> > making a powerpc target with PCI support, shows the following warning:
> >
> > MODPOST vmlinux.o
> > WARNING: vmlinux.o(.text+0x10430): Section mismatch in reference from
> > the function pcibios_allocate_bus_resources() to the function
> > .init.text:reparent_resources()
> > The function pcibios_allocate_bus_resources() references
> > the function __init reparent_resources().
> > This is often because pcibios_allocate_bus_resources lacks a __init
> > annotation or the annotation of reparent_resources is wrong.
> >
> > This patch fixes this warning by removing the __init annotation before
> > reparent_resources.
>
> No comments? So, is this fix OK, or unusable?

Nah, just me missing it, but it's referenced on patchwork. I'll pick the patch up. We can probably make some of that __devinit instead of __init, but we can look at that later.

Cheers
Ben.