Re: 2.6.30-rc6: Problem with an SSD disk on Freescale PowerPC mpc8315e-rdb, works fine on x86
Adding the sata_fsl.c developers to the recipients:

On Mon, Jun 8, 2009 at 4:59 PM, Leon Woestenberg <leon.woestenb...@gmail.com> wrote:
Hello,

using 2.6.30-rc6, I get the following problems when I read from an SSD connected to the 3.0 Gb SATA controller of the MPC8315E SoC (rev 1.0). Below is the output from two dd read runs. The disk behaves fine on an x86 box.

What can I do to (help) pinpoint the problem?

Regards, Leon.

r...@mpc8315e-rdb:~# dd if=/dev/sda of=/dev/null bs=4k
ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:3e:1e:e0:01/00:00:00:00:00/e0 tag 0 dma 31744 in
         res 50/00:3e:e0:df:01/00:00:00:00:00/e0 Emask 0x1 (device error)
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: Signature Update detected @ 3528 msecs
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: configured for UDMA/133
sd 1:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
sd 1:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 00 01 df e0
sd 1:0:0:0: [sda] ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sda, sector 122910
__ratelimit: 52 callbacks suppressed
Buffer I/O error on device sda, logical block 122910
Buffer I/O error on device sda, logical block 122911
Buffer I/O error on device sda, logical block 122912
Buffer I/O error on device sda, logical block 122913
Buffer I/O error on device sda, logical block 122914
Buffer I/O error on device sda, logical block 122915
Buffer I/O error on device sda, logical block 122916
Buffer I/O error on device sda, logical block 122917
Buffer I/O error on device sda, logical block 122918
Buffer I/O error on device sda, logical block 122919
ata2: EH complete
dd: /dev/sda: Input/output error

r...@mpc8315e-rdb:~# dd if=/dev/sda of=/dev/null bs=4k
ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:32:9a:6e:00/00:00:00:00:00/e0 tag 0 dma 25600 in
         res 50/00:3e:5c:6e:00/00:00:00:00:00/e0 Emask 0x1 (device error)
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: Signature Update detected @ 3528 msecs
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: configured for UDMA/133
sd 1:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
sd 1:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 00 00 6e 5c
sd 1:0:0:0: [sda] ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sda, sector 28314
__ratelimit: 52 callbacks suppressed
Buffer I/O error on device sda, logical block 28314
Buffer I/O error on device sda, logical block 28315
Buffer I/O error on device sda, logical block 28316
Buffer I/O error on device sda, logical block 28317
Buffer I/O error on device sda, logical block 28318
Buffer I/O error on device sda, logical block 28319
Buffer I/O error on device sda, logical block 28320
Buffer I/O error on device sda, logical block 28321
Buffer I/O error on device sda, logical block 28322
Buffer I/O error on device sda, logical block 28323
ata2: EH complete
dd: /dev/sda: Input/output error

r...@mpc8315e-rdb:~# uname -a
Linux mpc8315e-rdb 2.6.30-rc6 #1 Mon Jun 8 15:54:00 CEST 2009 ppc unknown

r...@mpc8315e-rdb:~# hdparm -i /dev/sda

/dev/sda:

 Model=Solidata X SSD , FwRev=0955 , SerialNo=...
 Config={ HardSect NotMFM HdSw15uSec Fixed DTR10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=0kB, MaxMultSect=128, MultSect=?1?
 CurCHS=16383/16/63, CurSects=16514064, LBA=no
 IORDY=no, tPIO={min:240,w/IORDY:120}
 PIO modes:  pio0 pio3 pio4
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
 AdvancedPM=yes: disabled (255) WriteCache=disabled
 Drive conforms to: Unspecified:  ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

 * signifies the current active mode

-- 
Leon
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
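To help pin down what dd was doing when the error hit, the taskfile in each "ata2.00: cmd" line can be decoded. The following is a minimal userspace sketch, assuming libata's usual field order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and 512-byte sectors; the struct and function names are made up for illustration. Decoded this way, both failing commands are READ DMA (0xc8) requests that start exactly at the sector end_request reports (122910 and 28314) and are 62 and 50 sectors long, matching the "dma 31744" and "dma 25600" byte counts.

```c
#include <assert.h>
#include <stdio.h>

/* Taskfile fields in the order libata prints them (assumed layout):
 * CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEV */
struct ata_tf_dump {
    unsigned int command, feature, nsect, lbal, lbam, lbah;
    unsigned int hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
    unsigned int device;
};

/* Parse the "c8/00:3e:..." part of a "cmd" or "res" line.
 * Returns 0 on success, -1 if the string does not match the layout. */
static int parse_tf(const char *s, struct ata_tf_dump *tf)
{
    int n = sscanf(s, "%x/%x:%x:%x:%x:%x/%x:%x:%x:%x:%x/%x",
                   &tf->command, &tf->feature, &tf->nsect,
                   &tf->lbal, &tf->lbam, &tf->lbah,
                   &tf->hob_feature, &tf->hob_nsect, &tf->hob_lbal,
                   &tf->hob_lbam, &tf->hob_lbah, &tf->device);
    return n == 12 ? 0 : -1;
}

/* 28-bit LBA, as used by READ DMA (0xc8): bits 24-27 come from the
 * low nibble of the device register. */
static unsigned long tf_lba28(const struct ata_tf_dump *tf)
{
    return ((unsigned long)(tf->device & 0x0f) << 24) |
           ((unsigned long)tf->lbah << 16) |
           ((unsigned long)tf->lbam << 8) |
           (unsigned long)tf->lbal;
}

/* Transfer length in bytes; for 28-bit commands a sector count of 0
 * means 256 sectors. */
static unsigned long tf_bytes28(const struct ata_tf_dump *tf)
{
    unsigned long sectors = tf->nsect ? tf->nsect : 256;
    return sectors * 512;
}
```

Running the "res" taskfile of the first error (50/00:3e:e0:df:01/...) through tf_lba28() gives 0x1dfe0 = 122848, which is the same value carried at the end of the sense descriptor (00 01 df e0); the second run's res LBA 0x6e5c likewise matches its descriptor.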
Re: [PATCH v4] zone_reclaim is always 0 by default
On Mon, Jun 08, 2009 at 12:50:48PM +0100, Mel Gorman wrote:

Let me start by saying I agree completely with everything you wrote and still disagree with this patch, but was willing to compromise and work around this for our upcoming x86_64 machine by putting a value add into our packaging of adding a sysctl that turns reclaim back on.

...
Index: b/arch/powerpc/include/asm/topology.h
===
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -10,6 +10,12 @@ struct device_node;
 #include <asm/mmzone.h>
+/*
+ * Distance above which we begin to use zone reclaim
+ */
+#define RECLAIM_DISTANCE 20
+
+

Where is the ia-64-specific modifier to RECLAIM_DISTANCE?

It was already defined as 15 in arch/ia64/include/asm/topology.h

Thanks,
Robin
Re: [PATCH v4] zone_reclaim is always 0 by default
On Tue, Jun 09, 2009 at 04:55:07AM -0500, Robin Holt wrote:
On Mon, Jun 08, 2009 at 12:50:48PM +0100, Mel Gorman wrote:

Let me start by saying I agree completely with everything you wrote and still disagree with this patch, but was willing to compromise and work around this for our upcoming x86_64 machine by putting a value add into our packaging of adding a sysctl that turns reclaim back on.

To be honest, I'm more leaning towards a NACK than an ACK on this one. I don't support enough NUMA machines to feel strongly enough about it but unconditionally setting zone_reclaim_mode to 0 on x86-64 just because i7's might be there seems ill-advised to me and will have other consequences for existing more traditional x86-64 NUMA machines.

...
Index: b/arch/powerpc/include/asm/topology.h
===
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -10,6 +10,12 @@ struct device_node;
 #include <asm/mmzone.h>
+/*
+ * Distance above which we begin to use zone reclaim
+ */
+#define RECLAIM_DISTANCE 20
+
+

Where is the ia-64-specific modifier to RECLAIM_DISTANCE?

It was already defined as 15 in arch/ia64/include/asm/topology.h

/me slaps self

thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
[PATCH] mpc83xx/usb.c: fix usb mux setup for mpc834x
usb0 and usb1 mux settings in the sicrl register were swapped (twice!) in mpc834x_usb_cfg(), leading to various strange issues with fsl-ehci and full speed devices.

The USB port config on mpc834x is done using 2 muxes: Port 0 is always used for MPH port 0, and port 1 can either be used for MPH port 1 or DR (unless DR uses UTMI phy or OTG, then it uses both ports) - see 8349 RM figure 1-4.

mpc834x_usb_cfg() had this inverted for the DR, and it also had the bit positions of the usb0 / usb1 mux settings swapped. It would basically work if you specified port1 instead of port0 for the MPH controller (and happened to use ULPI phys), which is what all the 834x dts have done, even though that configuration is physically invalid.

Instead fix mpc834x_usb_cfg() and adjust the dts files to match reality.

Signed-off-by: Peter Korsgaard <jac...@sunsite.dk>
---
 arch/powerpc/boot/dts/asp834x-redboot.dts |    2 +-
 arch/powerpc/boot/dts/mpc8349emitx.dts    |    2 +-
 arch/powerpc/boot/dts/mpc834x_mds.dts     |    2 +-
 arch/powerpc/boot/dts/sbc8349.dts         |    2 +-
 arch/powerpc/platforms/83xx/mpc83xx.h     |    4 ++--
 arch/powerpc/platforms/83xx/usb.c         |   10 +-
 6 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/boot/dts/asp834x-redboot.dts b/arch/powerpc/boot/dts/asp834x-redboot.dts
index 7da84fd..261d10c 100644
--- a/arch/powerpc/boot/dts/asp834x-redboot.dts
+++ b/arch/powerpc/boot/dts/asp834x-redboot.dts
@@ -167,7 +167,7 @@
 			interrupt-parent = <&ipic>;
 			interrupts = <39 0x8>;
 			phy_type = "ulpi";
-			port1;
+			port0;
 		};
 		/* phy type (ULPI, UTMI, UTMI_WIDE, SERIAL) */
 		u...@23000 {
diff --git a/arch/powerpc/boot/dts/mpc8349emitx.dts b/arch/powerpc/boot/dts/mpc8349emitx.dts
index 1ae38f0..e540d44 100644
--- a/arch/powerpc/boot/dts/mpc8349emitx.dts
+++ b/arch/powerpc/boot/dts/mpc8349emitx.dts
@@ -156,7 +156,7 @@
 			interrupt-parent = <&ipic>;
 			interrupts = <39 0x8>;
 			phy_type = "ulpi";
-			port1;
+			port0;
 		};
 		u...@23000 {
diff --git a/arch/powerpc/boot/dts/mpc834x_mds.dts b/arch/powerpc/boot/dts/mpc834x_mds.dts
index d9f0a23..a667fe7 100644
--- a/arch/powerpc/boot/dts/mpc834x_mds.dts
+++ b/arch/powerpc/boot/dts/mpc834x_mds.dts
@@ -153,7 +153,7 @@
 			interrupt-parent = <&ipic>;
 			interrupts = <39 0x8>;
 			phy_type = "ulpi";
-			port1;
+			port0;
 		};
 		/* phy type (ULPI, UTMI, UTMI_WIDE, SERIAL) */
 		u...@23000 {
diff --git a/arch/powerpc/boot/dts/sbc8349.dts b/arch/powerpc/boot/dts/sbc8349.dts
index a36dbbc..c7e1c4b 100644
--- a/arch/powerpc/boot/dts/sbc8349.dts
+++ b/arch/powerpc/boot/dts/sbc8349.dts
@@ -144,7 +144,7 @@
 			interrupt-parent = <&ipic>;
 			interrupts = <39 0x8>;
 			phy_type = "ulpi";
-			port1;
+			port0;
 		};
 		/* phy type (ULPI, UTMI, UTMI_WIDE, SERIAL) */
 		u...@23000 {
diff --git a/arch/powerpc/platforms/83xx/mpc83xx.h b/arch/powerpc/platforms/83xx/mpc83xx.h
index 83cfe51..d1dc5b0 100644
--- a/arch/powerpc/platforms/83xx/mpc83xx.h
+++ b/arch/powerpc/platforms/83xx/mpc83xx.h
@@ -22,8 +22,8 @@
 /* system i/o configuration register low */
 #define MPC83XX_SICRL_OFFS		0x114
 #define MPC834X_SICRL_USB_MASK		0x6000
-#define MPC834X_SICRL_USB0		0x4000
-#define MPC834X_SICRL_USB1		0x2000
+#define MPC834X_SICRL_USB0		0x2000
+#define MPC834X_SICRL_USB1		0x4000
 #define MPC831X_SICRL_USB_MASK		0x0c00
 #define MPC831X_SICRL_USB_ULPI		0x0800
 #define MPC8315_SICRL_USB_MASK		0x00fc
diff --git a/arch/powerpc/platforms/83xx/usb.c b/arch/powerpc/platforms/83xx/usb.c
index 11e1fac..f53eba3 100644
--- a/arch/powerpc/platforms/83xx/usb.c
+++ b/arch/powerpc/platforms/83xx/usb.c
@@ -51,21 +51,21 @@ int mpc834x_usb_cfg(void)
 		     !strcmp(prop, "utmi_wide"))) {
 			sicrl |= MPC834X_SICRL_USB0 | MPC834X_SICRL_USB1;
 			sicrh |= MPC834X_SICRH_USB_UTMI;
-			port1_is_dr = 1;
+			port0_is_dr = 1;
 		} else if (prop && !strcmp(prop, "serial")) {
 			dr_mode = of_get_property(np, "dr_mode", NULL);
 			if (dr_mode && !strcmp(dr_mode, "otg")) {
 				sicrl |= MPC834X_SICRL_USB0 | MPC834X_SICRL_USB1;
-				port1_is_dr = 1;
+				port0_is_dr = 1;
 			} else {
-				sicrl |=
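The routing rules in the commit message fit in a few lines of C. The following is a hypothetical userspace model, not kernel code, and it does not touch SICRL; usb_mux(), mph_port_ok() and the port_owner enum are illustrative names. It encodes only what the message states: port 0 can only carry MPH port 0, port 1 carries either MPH port 1 or the DR, and a DR that needs both ports (UTMI phy or OTG, as in the utmi_wide and otg branches of the hunk) takes both.

```c
#include <assert.h>
#include <stdbool.h>

/* Which controller drives a physical port (hypothetical model). */
enum port_owner { PORT_UNUSED, PORT_MPH, PORT_DR };

struct usb_mux_cfg {
    enum port_owner port0;
    enum port_owner port1;
};

/*
 * dr_present:  the DR controller is enabled
 * dr_wide:     the DR needs both ports (UTMI phy or OTG mode)
 * mph_present: the MPH controller is enabled
 */
static struct usb_mux_cfg usb_mux(bool dr_present, bool dr_wide, bool mph_present)
{
    struct usb_mux_cfg cfg = { PORT_UNUSED, PORT_UNUSED };

    if (dr_present && dr_wide) {
        /* DR owns both ports; MPH has nowhere to go. */
        cfg.port0 = PORT_DR;
        cfg.port1 = PORT_DR;
        return cfg;
    }
    /* Port 0 can only ever carry MPH port 0. */
    if (mph_present)
        cfg.port0 = PORT_MPH;
    /* Port 1 carries the DR if present, otherwise MPH port 1. */
    if (dr_present)
        cfg.port1 = PORT_DR;
    else if (mph_present)
        cfg.port1 = PORT_MPH;
    return cfg;
}

/* Can the MPH controller actually use the given physical port? */
static bool mph_port_ok(int port, bool dr_present, bool dr_wide)
{
    struct usb_mux_cfg cfg = usb_mux(dr_present, dr_wide, true);
    return (port == 0 ? cfg.port0 : cfg.port1) == PORT_MPH;
}
```

In this model MPH on port 1 is rejected whenever the DR is present, which is the situation the commit message calls physically invalid and the reason the dts files move the MPH node from port1 to port0.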
Re: [PATCH v4] zone_reclaim is always 0 by default
On Tue, Jun 09, 2009 at 11:37:55AM +0100, Mel Gorman wrote:
On Tue, Jun 09, 2009 at 04:55:07AM -0500, Robin Holt wrote:
On Mon, Jun 08, 2009 at 12:50:48PM +0100, Mel Gorman wrote:

Let me start by saying I agree completely with everything you wrote and still disagree with this patch, but was willing to compromise and work around this for our upcoming x86_64 machine by putting a value add into our packaging of adding a sysctl that turns reclaim back on.

To be honest, I'm more leaning towards a NACK than an ACK on this one. I don't support enough NUMA machines to feel strongly enough about it but unconditionally setting zone_reclaim_mode to 0 on x86-64 just because i7's might be there seems ill-advised to me and will have other consequences for existing more traditional x86-64 NUMA machines.

I was sort-of planning on coming up with an x86_64 arch-specific function for setting zone_reclaim_mode, but didn't like the direction things were going. Something to the effect of...

--- 20090609.orig/mm/page_alloc.c	2009-06-09 06:51:34.0 -0500
+++ 20090609/mm/page_alloc.c	2009-06-09 06:55:00.160762069 -0500
@@ -2326,12 +2326,7 @@ static void build_zonelists(pg_data_t *p
 	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
 		int distance = node_distance(local_node, node);
-		/*
-		 * If another node is sufficiently far away then it is better
-		 * to reclaim pages in a zone before going off node.
-		 */
-		if (distance > RECLAIM_DISTANCE)
-			zone_reclaim_mode = 1;
+		zone_reclaim_mode = arch_zone_reclaim_mode(distance);
 		/*
 		 * We don't want to pressure a particular node.

And then letting each arch define an arch_zone_reclaim_mode(). If other values are needed in the determination, we would add parameters to reflect this.

For ia64, add

static inline int ia64_zone_reclaim_mode(int distance)
{
	if (distance > 15)
		return 1;
	return 0;
}
#define arch_zone_reclaim_mode(_d)	ia64_zone_reclaim_mode(_d)

Then, inside x86_64_zone_reclaim_mode(), I could make it something like

	if (distance > 40 || is_uv_system())
		return 1;

In the end, I didn't think this fight was worth fighting given how ugly this felt. Upon second thought, I am beginning to think it is not that bad, but I also don't think it is that good either.

Thanks,
Robin
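Robin's per-arch hook can be exercised outside the kernel. In this userspace sketch a function pointer stands in for the arch_zone_reclaim_mode() #define, and RECLAIM_DISTANCE uses the generic value of 20 from the quoted powerpc hunk; generic_zone_reclaim_mode(), ia64_zone_reclaim_mode() and compute_zone_reclaim_mode() are hypothetical names. One design choice differs from the snippet in the mail: the mode is only ever switched on, as the original `if (distance > RECLAIM_DISTANCE) zone_reclaim_mode = 1;` does, rather than assigned on every loop iteration, where the last node visited would decide the result.

```c
#include <assert.h>
#include <stddef.h>

#define RECLAIM_DISTANCE 20   /* generic value from the quoted hunk */

/* Generic policy: reclaim locally once any node is "far". */
static int generic_zone_reclaim_mode(int distance)
{
    return distance > RECLAIM_DISTANCE;
}

/* Hypothetical ia64 override after Robin's sketch: threshold 15. */
static int ia64_zone_reclaim_mode(int distance)
{
    return distance > 15;
}

/* Mimics the build_zonelists() loop over remote node distances.
 * The hook can only turn zone reclaim on, so the result does not
 * depend on the order in which nodes are visited. */
static int compute_zone_reclaim_mode(const int *distances, size_t n,
                                     int (*hook)(int))
{
    int mode = 0;
    for (size_t i = 0; i < n; i++)
        if (hook(distances[i]))
            mode = 1;
    return mode;
}
```

For example, with distances {21, 16, 10} the generic hook enables zone reclaim, while plain assignment inside the loop would have ended on the distance-10 node and left it off.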
Re: [PATCH v4] zone_reclaim is always 0 by default
Hi, sorry for the late response. My e-mail reading speed is very slow ;-)

First, could you please read the past thread? I think many topics of this mail were already discussed.

On Thu, Jun 04, 2009 at 07:23:15PM +0900, KOSAKI Motohiro wrote:
Current Linux policy is that zone_reclaim_mode is enabled by default if the machine has a large remote node distance. That's because, until recently, we could assume that a large distance meant a large server.

We don't make assumptions about the server being large, small or otherwise. The affinity tables reporting a distance of 20 or more is saying remote memory has twice the latency of local memory. This is true irrespective of workload and implies that going off-node has a real penalty regardless of workload.

No. Now we are talking about off-node allocation vs. unnecessary file cache dropping. IOW, off-node allocation vs. disk access.

So the trade-off depends not only on the off-node distance but also on the workload's I/O tendency and I/O speed. Fujitsu has a 64-core ia64 HPC box, and zone reclaim sometimes caused performance regressions even on that box.

So I don't think this problem is a small vs. large machine issue, nor an i7 issue. High-speed P2P interconnects and CPU-integrated memory controllers expose an old issue.

In general, workload-dependent configuration shouldn't go into the default settings. However, the current code has been in place for about two years, and (only) high-end POWER and IA64 HPC machines use this setting. Thus, x86 and almost all other architectures change the default setting, but only POWER and ia64 keep the current configuration for backward compatibility.

What about if it's x86-64-based NUMA but not i7-based? There, the NUMA distances might really mean something and that zone_reclaim behaviour is desirable.

Hmm.. I don't want to ignore AMD; I think it's a common characteristic of P2P and integrated-memory-controller machines. Also, I don't want to detect the CPU family or similar, because we would need to update such code every time Intel makes a new CPU.

Can we detect P2P interconnect machines? I'm not sure.

I think if we're going down the road of setting the default, it shouldn't be per-architecture defaults as such. Other choices for addressing this might be:

1. Make RECLAIM_DISTANCE a variable on x86. Set it to 20 by default, and 5 (or some other sensible figure) on i7.

2. There should be a per-arch modifier callback for the affinity distances. If the x86 code detects the CPU is an i7, it can reduce the reported latencies to be more in line with expected reality.

3. Do not use zone_reclaim() for file-backed data if more than 20% of memory overall is free. The difficulty is figuring out if the allocation is for file pages.

4. Change the zone_reclaim_mode default to mean "do your best to figure it out". Patch 1 would default large distances to 1 to see what happens. Then apply a heuristic when in figure-it-out mode and using reclaim_mode == 1:

   If we have locally reclaimed 2% of the node's memory in file pages within the last 5 seconds when >= 20% of total physical memory was free, then set the reclaim_mode to 0 on the assumption the node is mostly caching pages and shouldn't be reclaimed to avoid excessive IO.

Option 1 would appear to be the most straightforward but option 2 should be doable. Options 3 and 4 could turn into a rat's nest and I would consider those approaches a bit more drastic.

Hmhm. I think the key point of options 1 and 2 is a proper way of detecting the hardware. Options 3 and 4 are the more preferable ideas to me; I like a workload-adaptive heuristic. But you already pointed out it's hard, because the page allocator doesn't know the allocation purpose ;)

@@ -10,6 +10,12 @@ struct device_node;
 #include <asm/mmzone.h>
+/*
+ * Distance above which we begin to use zone reclaim
+ */
+#define RECLAIM_DISTANCE 20
+
+

Where is the ia-64-specific modifier to RECLAIM_DISTANCE?

arch/ia64/include/asm/topology.h has

/*
 * Distance above which we begin to use zone reclaim
 */
#define RECLAIM_DISTANCE 15

I don't think distance==15 is a proper machine-independent definition, but it is a long-lived definition ;)
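Mel's option 4 heuristic is concrete enough to write down as a pure function. This is a sketch only: struct reclaim_window and figure_it_out() are hypothetical names, the counters are assumed to have been sampled over the last 5 seconds by other code, and the 2% and 20% thresholds are taken straight from the mail. Integer arithmetic is used instead of floating point, as kernel code would.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters assumed to be sampled over the last 5 seconds elsewhere.
 * All names are hypothetical. */
struct reclaim_window {
    unsigned long node_pages;      /* pages managed by this node */
    unsigned long file_reclaimed;  /* file pages reclaimed locally in the window */
    unsigned long total_pages;     /* total physical pages in the system */
    unsigned long free_pages;      /* free pages system-wide during the window */
};

/* "Figure it out" mode: starting from reclaim_mode == 1, fall back to 0
 * when the node looks like it is mostly caching pages. */
static int figure_it_out(int reclaim_mode, const struct reclaim_window *w)
{
    /* reclaimed >= 2% of the node's memory in file pages */
    bool heavy_file_reclaim = w->file_reclaimed * 100 >= w->node_pages * 2;
    /* >= 20% of total physical memory was free */
    bool plenty_free = w->free_pages * 100 >= w->total_pages * 20;

    if (reclaim_mode == 1 && heavy_file_reclaim && plenty_free)
        return 0;
    return reclaim_mode;
}
```

Keeping the rule a function of sampled counters makes it easy to test, but it does not address the harder problem raised in the thread: knowing whether the reclaimed file pages actually mattered to the workload.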
[PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages
This patch implements toleration of dynamic memory operations and 16 GB gigantic pages. On module load the driver walks through available system memory, checks for available memory ranges and then registers the kernel internal memory region accordingly. The translation of address ranges is implemented via a 3-level busmap.

Signed-off-by: Hannes Hering <heri...@de.ibm.com>
---
This patch is built and tested against infiniband.git. Please apply for 2.6.31.

Regards
Hannes

Index: infiniband/drivers/infiniband/hw/ehca/ehca_main.c
===
--- infiniband.orig/drivers/infiniband/hw/ehca/ehca_main.c	2009-06-09 14:20:37.0 +0200
+++ infiniband/drivers/infiniband/hw/ehca/ehca_main.c	2009-06-09 14:20:47.0 +0200
@@ -52,7 +52,7 @@
 #include "ehca_tools.h"
 #include "hcp_if.h"
-#define HCAD_VERSION "0026"
+#define HCAD_VERSION "0027"
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch <rai...@de.ibm.com>");
@@ -506,6 +506,7 @@
 	shca->ib_device.detach_mcast = ehca_detach_mcast;
 	shca->ib_device.process_mad = ehca_process_mad;
 	shca->ib_device.mmap = ehca_mmap;
+	shca->ib_device.dma_ops = &ehca_dma_mapping_ops;
 	if (EHCA_BMASK_GET(HCA_CAP_SRQ, shca->hca_cap)) {
 		shca->ib_device.uverbs_cmd_mask |=
@@ -1028,17 +1029,23 @@
 		goto module_init1;
 	}
+	ret = ehca_create_busmap();
+	if (ret) {
+		ehca_gen_err("Cannot create busmap.");
+		goto module_init2;
+	}
+
 	ret = ibmebus_register_driver(&ehca_driver);
 	if (ret) {
 		ehca_gen_err("Cannot register eHCA device driver");
 		ret = -EINVAL;
-		goto module_init2;
+		goto module_init3;
 	}
 	ret = register_memory_notifier(&ehca_mem_nb);
 	if (ret) {
 		ehca_gen_err("Failed registering memory add/remove notifier");
-		goto module_init3;
+		goto module_init4;
 	}
 	if (ehca_poll_all_eqs != 1) {
@@ -1053,9 +1060,12 @@
 	return 0;
-module_init3:
+module_init4:
 	ibmebus_unregister_driver(&ehca_driver);
+module_init3:
+	ehca_destroy_busmap();
+
 module_init2:
 	ehca_destroy_slab_caches();
@@ -1073,6 +1083,8 @@
 	unregister_memory_notifier(&ehca_mem_nb);
+	ehca_destroy_busmap();
+
 	ehca_destroy_slab_caches();
 	ehca_destroy_comp_pool();
Index: infiniband/drivers/infiniband/hw/ehca/ehca_mrmw.c
===
--- infiniband.orig/drivers/infiniband/hw/ehca/ehca_mrmw.c	2009-06-09 14:20:37.0 +0200
+++ infiniband/drivers/infiniband/hw/ehca/ehca_mrmw.c	2009-06-09 14:20:47.0 +0200
@@ -53,6 +53,39 @@
 /* max number of rpages (per hcall register_rpages) */
 #define MAX_RPAGES 512
+/* DMEM toleration management */
+#define EHCA_SECTSHIFT         SECTION_SIZE_BITS
+#define EHCA_SECTSIZE          (1UL << EHCA_SECTSHIFT)
+#define EHCA_HUGEPAGESHIFT     34
+#define EHCA_HUGEPAGE_SIZE     (1UL << EHCA_HUGEPAGESHIFT)
+#define EHCA_HUGEPAGE_PFN_MASK ((EHCA_HUGEPAGE_SIZE - 1) >> PAGE_SHIFT)
+#define EHCA_INVAL_ADDR        0xULL
+#define EHCA_DIR_INDEX_SHIFT 13                   /* 8k Entries in 64k block */
+#define EHCA_TOP_INDEX_SHIFT (EHCA_DIR_INDEX_SHIFT * 2)
+#define EHCA_MAP_ENTRIES (1 << EHCA_DIR_INDEX_SHIFT)
+#define EHCA_TOP_MAP_SIZE (0x1)                /* currently fixed map size */
+#define EHCA_DIR_MAP_SIZE (0x1)
+#define EHCA_ENT_MAP_SIZE (0x1)
+#define EHCA_INDEX_MASK (EHCA_MAP_ENTRIES - 1)
+#define EHCA_REG_MR 0
+#define EHCA_REG_BUSMAP_MR (~0)
+
+static unsigned long ehca_mr_len;
+/*
+ * Memory map data structures
+ */
+struct ehca_dir_bmap {
+	u64 ent[EHCA_MAP_ENTRIES];
+};
+struct ehca_top_bmap {
+	struct ehca_dir_bmap *dir[EHCA_MAP_ENTRIES];
+};
+struct ehca_bmap {
+	struct ehca_top_bmap *top[EHCA_MAP_ENTRIES];
+};
+
+static struct ehca_bmap *ehca_bmap;
+
 static struct kmem_cache *mr_cache;
 static struct kmem_cache *mw_cache;
@@ -68,6 +101,8 @@
 #define EHCA_MR_PGSHIFT1M  20
 #define EHCA_MR_PGSHIFT16M 24
+static u64 ehca_map_vaddr(void *caddr);
+
 static u32 ehca_encode_hwpage_size(u32 pgsize)
 {
 	int log = ilog2(pgsize);
@@ -135,7 +170,8 @@
 		goto get_dma_mr_exit0;
 	}
-	ret = ehca_reg_maxmr(shca, e_maxmr, (u64 *)KERNELBASE,
+	ret = ehca_reg_maxmr(shca, e_maxmr,
+			     (void *)ehca_map_vaddr((void *)KERNELBASE),
 			     mr_access_flags, e_pd,
 			     &e_maxmr->ib.ib_mr.lkey,
 			     &e_maxmr->ib.ib_mr.rkey);
@@ -251,7 +287,7 @@
 	ret = ehca_reg_mr(shca, e_mr, iova_start, size, mr_access_flags, e_pd,
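The patch describes the address translation as a 3-level busmap (ehca_bmap, then top, then dir, then ent). Below is a standalone userspace sketch of that kind of table, sized like the patch's EHCA_DIR_INDEX_SHIFT and EHCA_MAP_ENTRIES constants (8k entries of 8 bytes, so each level is one 64 KiB block). It assumes the key is a memory-section number split into three 13-bit indices; what the driver actually stores in each entry and how it fills the map are not visible in the truncated diff, so bmap_set(), bmap_get(), bmap_free() and INVAL_ADDR are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DIR_INDEX_SHIFT 13                      /* 8k entries per level */
#define TOP_INDEX_SHIFT (DIR_INDEX_SHIFT * 2)
#define MAP_ENTRIES     (1UL << DIR_INDEX_SHIFT)
#define INDEX_MASK      (MAP_ENTRIES - 1)
#define INVAL_ADDR      UINT64_MAX              /* marker for "no translation" */

struct dir_bmap { uint64_t ent[MAP_ENTRIES]; };
struct top_bmap { struct dir_bmap *dir[MAP_ENTRIES]; };
struct bmap     { struct top_bmap *top[MAP_ENTRIES]; };

/* Extract one 13-bit index from a section number. */
static unsigned long bmap_index(unsigned long section, int shift)
{
    return (section >> shift) & INDEX_MASK;
}

/* Store val for a section, allocating the lower levels on demand. */
static int bmap_set(struct bmap *m, unsigned long section, uint64_t val)
{
    unsigned long t = bmap_index(section, TOP_INDEX_SHIFT);
    unsigned long d = bmap_index(section, DIR_INDEX_SHIFT);
    unsigned long e = bmap_index(section, 0);

    if (!m->top[t]) {
        m->top[t] = calloc(1, sizeof(*m->top[t]));
        if (!m->top[t])
            return -1;
    }
    if (!m->top[t]->dir[d]) {
        struct dir_bmap *dir = malloc(sizeof(*dir));
        if (!dir)
            return -1;
        for (unsigned long i = 0; i < MAP_ENTRIES; i++)
            dir->ent[i] = INVAL_ADDR;           /* unmapped until set */
        m->top[t]->dir[d] = dir;
    }
    m->top[t]->dir[d]->ent[e] = val;
    return 0;
}

/* Look up a section; a missing level means "not mapped". */
static uint64_t bmap_get(const struct bmap *m, unsigned long section)
{
    const struct top_bmap *top = m->top[bmap_index(section, TOP_INDEX_SHIFT)];
    if (!top)
        return INVAL_ADDR;
    const struct dir_bmap *dir = top->dir[bmap_index(section, DIR_INDEX_SHIFT)];
    if (!dir)
        return INVAL_ADDR;
    return dir->ent[bmap_index(section, 0)];
}

/* Free every allocated level, then the root. */
static void bmap_free(struct bmap *m)
{
    for (unsigned long t = 0; t < MAP_ENTRIES; t++) {
        if (!m->top[t])
            continue;
        for (unsigned long d = 0; d < MAP_ENTRIES; d++)
            free(m->top[t]->dir[d]);
        free(m->top[t]);
    }
    free(m);
}
```

Allocating the lower levels lazily keeps a sparse physical memory layout cheap: only the 64 KiB blocks covering sections that actually exist are ever allocated.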
Re: [Powerpc/SLQB] Next June 06 : BUG during scsi initialization
On Mon, Jun 08, 2009 at 05:42:14PM +0530, Sachin Sant wrote:
Pekka J Enberg wrote:
Hi Sachin,

__slab_alloc_page: nid=2, cache_node=c000de01ba00, cache_list=c000de01ba00
__slab_alloc_page: nid=2, cache_node=c000de01bd00, cache_list=c000de01bd00
__slab_alloc_page: nid=2, cache_node=c000de01ba00, cache_lisBUG: spinlock bad magic on CPU#1, modprobe/62
 lock: c08c4280, .magic: 7dcc61f0, .owner: || status == __GCONV_INCOMPLETE_INPUT || status == __GCONV_FULL_OUTPUT/724596736, .owner_cpu: 4095
Call Trace:
[c000c7da36d0] [c00116e0] .show_stack+0x6c/0x16c (unreliable)
[c000c7da3780] [c0365bcc] .spin_bug+0xb0/0xd4
[c000c7da3810] [c0365e94] ._raw_spin_lock+0x48/0x184
[c000c7da38b0] [c05de4f8] ._spin_lock+0x10/0x24
[c000c7da3920] [c0141240] .__slab_alloc_page+0x410/0x4b4
[c000c7da39e0] [c0142804] .kmem_cache_alloc+0x13c/0x21c
[c000c7da3aa0] [c01431dc] .kmem_cache_create+0x294/0x2a8
[c000c7da3b90] [d0ea1438] .scsi_init_queue+0x38/0x170 [scsi_mod]
[c000c7da3c20] [d0ea1334] .init_scsi+0x1c/0xe8 [scsi_mod]
[c000c7da3ca0] [c00092c0] .do_one_initcall+0x80/0x19c
[c000c7da3d90] [c00c09c8] .SyS_init_module+0xe0/0x244
[c000c7da3e30] [c0008534] syscall_exit+0x0/0x40

I can't really work it out. It seems to be the kmem_cache_cache which has a problem, but there have already been lots of caches created and even this same cache_node already used right beforehand with no problem. Unless a CPU or node comes up or something right at this point or the caller is scheduled onto a different CPU... Oopses seem to all have CPU#1, whereas boot CPU is probably #0 (these CPUs are node 0 and memory is only on node 1 and 2 where there are no CPUs if I read correctly).

I still can't see the reason for the failure, but can you try this patch please and show dmesg?

---
 mm/slqb.c |   34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/slqb.c
===
--- linux-2.6.orig/mm/slqb.c
+++ linux-2.6/mm/slqb.c
@@ -963,6 +963,7 @@ static struct slqb_page *allocate_slab(s
 	flags |= s->allocflags;
+	flags &= ~0x2000;
 	page = (struct slqb_page *)alloc_pages_node(node, flags, s->order);
 	if (!page)
 		return NULL;
@@ -1357,6 +1358,8 @@ static noinline void *__slab_alloc_page(
 	unsigned int colour;
 	void *object;
+	if (gfpflags & 0x2000)
+		printk("SLQB: __slab_alloc_page cpu=%d request node=%d\n", smp_processor_id(), node);
 	c = get_cpu_slab(s, smp_processor_id());
 	colour = c->colour_next;
 	c->colour_next += s->colour_off;
@@ -1374,6 +1377,8 @@ static noinline void *__slab_alloc_page(
 	if (unlikely(!page))
 		return page;
+	if (gfpflags & 0x2000)
+		printk("SLQB: __slab_alloc_page cpu=%d,nid=%d request node=%d page node=%d\n", smp_processor_id(), numa_node_id(), node, slqb_page_to_nid(page));
 	if (!NUMA_BUILD || likely(slqb_page_to_nid(page) == numa_node_id())) {
 		struct kmem_cache_cpu *c;
 		int cpu = smp_processor_id();
@@ -1382,6 +1387,7 @@ static noinline void *__slab_alloc_page(
 		l = &c->list;
 		page->list = l;
+		printk("SLQB: __slab_alloc_page spin_lock(%p)\n", &l->page_lock);
 		spin_lock(&l->page_lock);
 		l->nr_slabs++;
 		l->nr_partial++;
@@ -1398,6 +1404,8 @@ static noinline void *__slab_alloc_page(
 		l = &n->list;
 		page->list = l;
+		printk("SLQB: __slab_alloc_page spin_lock(%p)\n", &n->list_lock);
+		printk("SLQB: __slab_alloc_page spin_lock(%p)\n", &l->page_lock);
 		spin_lock(&n->list_lock);
 		spin_lock(&l->page_lock);
 		l->nr_slabs++;
@@ -1411,6 +1419,7 @@ static noinline void *__slab_alloc_page(
 #endif
 	}
 	VM_BUG_ON(!object);
+	printk("SLQB: __slab_alloc_page OK\n");
 	return object;
 }
@@ -1440,6 +1449,8 @@ static void *__remote_slab_alloc_node(st
 	struct kmem_cache_list *l;
 	void *object;
+	if (gfpflags & 0x2000)
+		printk("SLQB: __remote_slab_alloc_node cpu=%d request node=%d\n", smp_processor_id(), node);
 	n = s->node_slab[node];
 	if (unlikely(!n)) /* node has no memory */
 		return NULL;
@@ -1541,7 +1552,11 @@ static __always_inline void *slab_alloc(
 again:
 	local_irq_save(flags);
+	if (gfpflags & 0x2000)
+		printk("SLQB: slab_alloc cpu=%d,nid=%d request node=%d\n", smp_processor_id(), numa_node_id(), node);
 	object = __slab_alloc(s, gfpflags, node);
+	if (gfpflags & 0x2000)
+		printk("SLQB: slab_alloc cpu=%d return=%p\n", smp_processor_id(), object);
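The debug patch tests a spare bit (0x2000 in the mail) in the gfp flags before each printk, and clears it in allocate_slab() before the flags reach alloc_pages_node(); where the bit is set is not part of the quoted hunks, presumably in the caller being debugged. A minimal sketch of that tag-and-strip pattern, with TRACE_ALLOC_FLAG, alloc_is_traced() and strip_trace_flag() as hypothetical names:

```c
#include <assert.h>

/* Hypothetical spare bit used only to mark allocations worth tracing;
 * the debug patch uses 0x2000 for this. */
#define TRACE_ALLOC_FLAG 0x2000u

/* Upper layers check the marker before printing a trace line. */
static int alloc_is_traced(unsigned int gfpflags)
{
    return (gfpflags & TRACE_ALLOC_FLAG) != 0;
}

/* Before handing flags to the page allocator, strip the private marker
 * so it is never interpreted as a real allocation flag. */
static unsigned int strip_trace_flag(unsigned int flags)
{
    flags &= ~TRACE_ALLOC_FLAG;
    return flags;
}
```

Clearing has to use `&= ~bit`; a plain `= ~bit` would replace the caller's flags with nearly every bit set.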
Re: [PATCH v4] zone_reclaim is always 0 by default
On Tue, Jun 09, 2009 at 10:48:34PM +0900, KOSAKI Motohiro wrote:
Hi, sorry for the late response. My e-mail reading speed is very slow ;-)

First, could you please read the past thread? I think many topics of this mail were already discussed.

I think I caught them all but the horrible fact of the matter is that whether zone_reclaim_mode should be 1 or 0 on NUMA machines is "it depends". There are arguments for both and no clear winner.

On Thu, Jun 04, 2009 at 07:23:15PM +0900, KOSAKI Motohiro wrote:
Current Linux policy is that zone_reclaim_mode is enabled by default if the machine has a large remote node distance. That's because, until recently, we could assume that a large distance meant a large server.

We don't make assumptions about the server being large, small or otherwise. The affinity tables reporting a distance of 20 or more is saying remote memory has twice the latency of local memory. This is true irrespective of workload and implies that going off-node has a real penalty regardless of workload.

No. Now we are talking about off-node allocation vs. unnecessary file cache dropping. IOW, off-node allocation vs. disk access.

Even if we used GFP flags to identify the file pages, there is no guarantee that we are taking the correct action to keep relevant pages in memory.

So the trade-off depends not only on the off-node distance but also on the workload's I/O tendency and I/O speed. Fujitsu has a 64-core ia64 HPC box, and zone reclaim sometimes caused performance regressions even on that box.

I bet if it was 0, the off-node accesses would sometimes cause performance regressions as well :(

So I don't think this problem is a small vs. large machine issue, nor an i7 issue. High-speed P2P interconnects and CPU-integrated memory controllers expose an old issue.

In general, workload-dependent configuration shouldn't go into the default settings. However, the current code has been in place for about two years, and (only) high-end POWER and IA64 HPC machines use this setting. Thus, x86 and almost all other architectures change the default setting, but only POWER and ia64 keep the current configuration for backward compatibility.

What about if it's x86-64-based NUMA but not i7-based? There, the NUMA distances might really mean something and that zone_reclaim behaviour is desirable.

Hmm.. I don't want to ignore AMD; I think it's a common characteristic of P2P and integrated-memory-controller machines. Also, I don't want to detect the CPU family or similar, because we would need to update such code every time Intel makes a new CPU.

Can we detect P2P interconnect machines? I'm not sure.

I've no idea. It's not just i7 because some of the AMD chips will have integrated memory controllers as well. We were somewhat depending on the affinity information providing the necessary information.

I think if we're going down the road of setting the default, it shouldn't be per-architecture defaults as such. Other choices for addressing this might be:

1. Make RECLAIM_DISTANCE a variable on x86. Set it to 20 by default, and 5 (or some other sensible figure) on i7.

2. There should be a per-arch modifier callback for the affinity distances. If the x86 code detects the CPU is an i7, it can reduce the reported latencies to be more in line with expected reality.

3. Do not use zone_reclaim() for file-backed data if more than 20% of memory overall is free. The difficulty is figuring out if the allocation is for file pages.

4. Change the zone_reclaim_mode default to mean "do your best to figure it out". Patch 1 would default large distances to 1 to see what happens. Then apply a heuristic when in figure-it-out mode and using reclaim_mode == 1:

   If we have locally reclaimed 2% of the node's memory in file pages within the last 5 seconds when >= 20% of total physical memory was free, then set the reclaim_mode to 0 on the assumption the node is mostly caching pages and shouldn't be reclaimed to avoid excessive IO.

Option 1 would appear to be the most straightforward but option 2 should be doable. Options 3 and 4 could turn into a rat's nest and I would consider those approaches a bit more drastic.

Hmhm. I think the key point of options 1 and 2 is a proper way of detecting the hardware. Options 3 and 4 are the more preferable ideas to me; I like a workload-adaptive heuristic. But you already pointed out it's hard, because the page allocator doesn't know the allocation purpose ;)

Option 3 may be undoable. Even if the allocations are tagged as "this is a file-backed allocation", we have no way of detecting how important that is to the overall workload. Option 4 would be the preference. It's a heuristic that might let us down, but the administrator can override it and fix the reclaim_mode in the event we get it wrong.

@@ -10,6 +10,12 @@ struct device_node;
 #include <asm/mmzone.h>
+/*
+ * Distance above which we begin to use
Re: [BUILD FAILURE 01/04] Next June 04:PPC64 randconfig [drivers/staging/comedi/drivers.o]
On Tue, 2009-06-09 at 13:50 +1000, Benjamin Herrenschmidt wrote:
On Sun, 2009-06-07 at 20:06 +0530, Subrata Modak wrote:
On Sat, 2009-06-06 at 09:36 -0400, Frank Mori Hess wrote:
On Saturday 06 June 2009, Greg KH wrote:

Frank and Ian, any thoughts about the vmap call in the comedi_buf_alloc() call? Why is it using PAGE_KERNEL_NOCACHE, and what is the prealloc_buf buffer used for?

It is a circular buffer used to hold data streaming either to or from a board (for example when producing an analog output waveform). Reads and writes to the device files read/write to the circular buffer, plus a few drivers do dma directly to/from it. I personally don't have a problem with requiring drivers to have their own dma buffers and making them copy data between their private dma buffers and the main circular buffer. I guess the original design wanted to support zero-copy dma.

Great to hear that. How about a patch that solves my build problem on PPC64 (the problem seems to have existed for a long time)?

In any case, doing PAGE_KERNEL_NOCACHE for DMA memory is incorrect on many architectures. So at this stage, there's not much option but to #ifdef, I suspect, until this is fixed properly.

OK. But I am not sure whether Greg will agree to this. If so, is the following patch I sent earlier OK? http://lkml.org/lkml/2009/6/5/462

Regards--
Subrata

It does make sense to want to have some memory like that shared between user space and DMA, though I don't know what the right approach that works on all archs is at this stage. Worth asking the ALSA guys, I think they have similar issues :-) But doing double buffering might do the trick fine for now.

Cheers,
Ben.
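The double buffering Ben suggests, and Frank already described (drivers DMA into their own buffers and copy into the main circular buffer), comes down to a wrap-aware copy. The ring structure and ring_write() below are hypothetical and are not comedi's actual buffer API; allocating the private DMA buffer itself is left out of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal byte ring; a hypothetical stand-in, not comedi's buffer API. */
struct ring {
    unsigned char *data;
    size_t size;    /* capacity in bytes */
    size_t head;    /* offset of the next byte to write */
    size_t count;   /* bytes written but not yet consumed */
};

/* Copy a completed chunk from a driver-private DMA buffer into the ring,
 * splitting the copy where the ring wraps. Returns the number of bytes
 * copied, which is short if the ring does not have enough free space. */
static size_t ring_write(struct ring *r, const unsigned char *src, size_t len)
{
    size_t space = r->size - r->count;
    if (len > space)
        len = space;

    size_t first = r->size - r->head;   /* room before the wrap point */
    if (first > len)
        first = len;
    memcpy(r->data + r->head, src, first);
    memcpy(r->data, src + first, len - first);

    r->head = (r->head + len) % r->size;
    r->count += len;
    return len;
}
```

A short return value lets the caller count an overrun instead of overwriting data the reader has not consumed yet; the extra copy is the price paid for dropping zero-copy DMA into the shared buffer.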
Re: [BUILD FAILURE 02/04] Next June 04:PPC64 randconfig [drivers/usb/host/ohci-hcd.o]
On Fri, 2009-06-05 at 13:26 -0500, Subrata Modak wrote:
On Thu, 2009-06-04 at 10:07 -0400, Jon Smirl wrote:
On Thu, Jun 4, 2009 at 9:31 AM, Subrata Modak <subr...@linux.vnet.ibm.com> wrote:

  CC      drivers/usb/host/ohci-hcd.o
In file included from drivers/usb/host/ohci-hcd.c:1060:
drivers/usb/host/ohci-ppc-of.c:242:2: error: #error No endianess selected for ppc-of-ohci
make[3]: *** [drivers/usb/host/ohci-hcd.o] Error 1
make[2]: *** [drivers/usb/host] Error 2
make[1]: *** [drivers/usb] Error 2
make: *** [drivers] Error 2

I reported this earlier, and there were some discussions:
http://groups.google.co.kr/group/linux.kernel/browse_thread/thread/edff9d5572d3d225

Proposed patch by Arnd should fix this. It has not been merged.
http://lkml.org/lkml/2009/4/22/49

Correct, it fixes the issue. However, since a few changes might have gone into the Kconfig, the patch does not apply cleanly. Below is the patch, just a retake of the earlier one, but on the latest code.

David, can you please pick up the following patch?

David, is it you who will be merging this patch, or do I need to send it to somebody else?

Regards--
Subrata

Signed-off-by: Arnd Bergmann <a...@arndb.de>
Resent-by: Subrata Modak <subr...@linux.vnet.ibm.com>
---
--- linux-2.6.30-rc8/drivers/usb/host/Kconfig.orig	2009-06-05 10:31:30.0 -0500
+++ linux-2.6.30-rc8/drivers/usb/host/Kconfig	2009-06-05 10:37:53.0 -0500
@@ -181,26 +181,26 @@ config USB_OHCI_HCD_PPC_SOC
 	  Enables support for the USB controller on the MPC52xx or STB03xxx processor chip.
 	  If unsure, say Y.
-config USB_OHCI_HCD_PPC_OF
-	bool "OHCI support for PPC USB controller on OF platform bus"
-	depends on USB_OHCI_HCD && PPC_OF
-	default y
-	---help---
-	  Enables support for the USB controller PowerPC present on the
-	  OpenFirmware platform bus.
-
 config USB_OHCI_HCD_PPC_OF_BE
-	bool "Support big endian HC"
-	depends on USB_OHCI_HCD_PPC_OF
-	default y
+	bool "OHCI support for OF platform bus (big endian)"
+	depends on USB_OHCI_HCD && PPC_OF
 	select USB_OHCI_BIG_ENDIAN_DESC
 	select USB_OHCI_BIG_ENDIAN_MMIO
+	---help---
+	  Enables support for big-endian USB controllers present on the
+	  OpenFirmware platform bus.
 config USB_OHCI_HCD_PPC_OF_LE
-	bool "Support little endian HC"
-	depends on USB_OHCI_HCD_PPC_OF
-	default n
+	bool "OHCI support for OF platform bus (little endian)"
+	depends on USB_OHCI_HCD && PPC_OF
 	select USB_OHCI_LITTLE_ENDIAN
+	---help---
+	  Enables support for little-endian USB controllers present on the
+	  OpenFirmware platform bus.
+
+config USB_OHCI_HCD_PPC_OF
+	bool
+	default USB_OHCI_HCD_PPC_OF_BE || USB_OHCI_HCD_PPC_OF_LE
 config USB_OHCI_HCD_PCI
 	bool "OHCI support for PCI-bus USB controllers"
---
Regards--
Subrata
Re: [PATCH -next] powerpc/85xx: Add support for X-ES MPC85xx boards
On Mon, 2009-06-08 at 17:52 -0500, Kumar Gala wrote:
+static void xes_mpc85xx_configure_l1(void)
+{
[snip]
I'd prefer we move this into __setup_cpu_e500v1/__setup_cpu_e500v2 so it's done for all processors regardless of platform.

How does something like this look? Let me know and I can test and submit it separately.

- Nate

diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
index eb4b9ad..546804f 100644
--- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
+++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
@@ -17,6 +17,34 @@
 #include <asm/cputable.h>
 #include <asm/ppc_asm.h>
 
+_GLOBAL(__e500_icache_enable)
+	mfspr	r3, SPRN_L1CSR1
+	oris	r3, r3, l1csr1_...@h
+	ori	r3, r3, (L1CSR1_ICFI | L1CSR1_ICE)
+	mtspr	SPRN_L1CSR1, r3		/* Enable I-Cache */
+	isync
+	blr
+
+_GLOBAL(__e500_dcache_enable)
+	msync
+	isync
+	li	r3, 0
+	mtspr	SPRN_L1CSR0, r3		/* Disable */
+	msync
+	isync
+	li	r3, L1CSR0_DCFI
+	mtspr	SPRN_L1CSR0, r3		/* Invalidate */
+	msync
+	isync
+	mfspr	r3, SPRN_L1CSR0
+	oris	r3, r3, l1csr0_...@h
+	ori	r3, r3, (L1CSR0_DCFI | L1CSR0_DCE)
+	msync
+	isync
+	mtspr	SPRN_L1CSR0, r3		/* Enable */
+	isync
+	blr
+
 _GLOBAL(__setup_cpu_e200)
 	/* enable dedicated debug exception handling resources (Debug APU) */
 	mfspr	r3,SPRN_HID0
@@ -25,7 +53,16 @@ _GLOBAL(__setup_cpu_e200)
 	b	__setup_e200_ivors
 _GLOBAL(__setup_cpu_e500v1)
 _GLOBAL(__setup_cpu_e500v2)
-	b	__setup_e500_ivors
+	mflr	r4
+	bl	__e500_icache_enable
+	bl	__e500_dcache_enable
+	bl	__setup_e500_ivors
+	mtlr	r4
+	blr
 _GLOBAL(__setup_cpu_e500mc)
-	b	__setup_e500mc_ivors
-
+	mflr	r4
+	bl	__e500_icache_enable
+	bl	__e500_dcache_enable
+	bl	__setup_e500mc_ivors
+	mtlr	r4
+	blr
Re: [BUILD FAILURE 01/04] Next June 04:PPC64 randconfig [drivers/staging/comedi/drivers.o]
On Tue, Jun 9, 2009 at 20:34, Subrata Modak subr...@linux.vnet.ibm.com wrote:
On Tue, 2009-06-09 at 13:50 +1000, Benjamin Herrenschmidt wrote:
On Sun, 2009-06-07 at 20:06 +0530, Subrata Modak wrote:
On Sat, 2009-06-06 at 09:36 -0400, Frank Mori Hess wrote:
On Saturday 06 June 2009, Greg KH wrote:
Frank and Ian, any thoughts about the vmap call in the comedi_buf_alloc() call? Why is it using PAGE_KERNEL_NOCACHE, and what is the prealloc_buf buffer used for?

It is a circular buffer used to hold data streaming either to or from a board (for example when producing an analog output waveform). Reads and writes to the device files read/write to the circular buffer, plus a few drivers do dma directly to/from it. I personally don't have a problem with requiring drivers to have their own dma buffers and making them copy data between their private dma buffers and the main circular buffer. I guess the original design wanted to support zero-copy dma.

Great to hear that. How about a patch that solves my build problem on PPC64 (the problem seems to have existed for a long time)?

In any case, doing PAGE_KERNEL_NOCACHE for DMA memory is incorrect on many architectures. So at this stage, there's not much option but an ifdef, I suspect, until this is fixed properly.

Ok. But I am not sure whether Greg will agree to this. If so, is the following patch I sent earlier Ok? http://lkml.org/lkml/2009/6/5/462

Your patch helps powerpc only. Compilation is still broken on most other architectures.

It does make sense to want to have some memory like that shared between user space and DMA, though I don't know what the right approach that works on all archs is at this stage. Worth asking the ALSA guys, I think they have similar issues :-) But doing double buffering might do the trick fine for now.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a "hacker". But when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
Re: [PATCH -next] powerpc/85xx: Add support for X-ES MPC85xx boards
On Jun 9, 2009, at 1:53 PM, Nate Case wrote:
On Mon, 2009-06-08 at 17:52 -0500, Kumar Gala wrote:
+static void xes_mpc85xx_configure_l1(void)
+{
[snip]
I'd prefer we move this into __setup_cpu_e500v1/__setup_cpu_e500v2 so it's done for all processors regardless of platform.

How does something like this look? Let me know and I can test and submit it separately.

- Nate

diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
index eb4b9ad..546804f 100644
--- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
+++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
@@ -17,6 +17,34 @@
 #include <asm/cputable.h>
 #include <asm/ppc_asm.h>
 
+_GLOBAL(__e500_icache_enable)

I'd prefer we test to see if the cache is enabled and if it is, just return.

+	mfspr	r3, SPRN_L1CSR1
+	oris	r3, r3, l1csr1_...@h
+	ori	r3, r3, (L1CSR1_ICFI | L1CSR1_ICE)
+	mtspr	SPRN_L1CSR1, r3		/* Enable I-Cache */
+	isync
+	blr
+
+_GLOBAL(__e500_dcache_enable)

I'd prefer we test to see if the cache is enabled and if it is, just return.

+	msync
+	isync
+	li	r3, 0
+	mtspr	SPRN_L1CSR0, r3		/* Disable */
+	msync
+	isync
+	li	r3, L1CSR0_DCFI

Should probably flash reset the locks as well.

+	mtspr	SPRN_L1CSR0, r3		/* Invalidate */
+	msync
+	isync
+	mfspr	r3, SPRN_L1CSR0
+	oris	r3, r3, l1csr0_...@h
+	ori	r3, r3, (L1CSR0_DCFI | L1CSR0_DCE)
+	msync
+	isync
+	mtspr	SPRN_L1CSR0, r3		/* Enable */
+	isync
+	blr
+
 _GLOBAL(__setup_cpu_e20
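Kumar's two review points (skip the sequence when the cache is already on, and flash-clear the lock bits together with the invalidate) can be sketched as a userspace C model of the D-cache path. The bit values and the `L1CSR0_CLFC` name are assumed to match reg_booke.h and should be checked there; the garbled `l1csr0_...@h` operand from the patch is not reconstructed. Writes are logged instead of executed, the flash bits are modelled as self-clearing, and all `model_*` names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match arch/powerpc/include/asm/reg_booke.h; verify first. */
#define L1CSR0_CPE  0x00010000u  /* data cache parity enable */
#define L1CSR0_CLFC 0x00000100u  /* cache lock bits flash clear */
#define L1CSR0_DCFI 0x00000002u  /* data cache flash invalidate */
#define L1CSR0_DCE  0x00000001u  /* data cache enable */

/* Hypothetical SPR model: every mtspr is logged, and the flash invalidate
 * and lock-clear bits read back as 0 once the "hardware" has finished. */
struct l1csr0_model {
	uint32_t value;
	uint32_t writes[4];
	int nwrites;
};

static void model_mtspr(struct l1csr0_model *r, uint32_t val)
{
	r->writes[r->nwrites++] = val;
	r->value = val & ~(L1CSR0_DCFI | L1CSR0_CLFC);
}

/* D-cache enable with both review comments applied. The msync/isync
 * barriers of the real assembly have no userspace equivalent here. */
static void model_dcache_enable(struct l1csr0_model *r)
{
	if (r->value & L1CSR0_DCE)
		return;                                   /* already enabled */
	model_mtspr(r, 0);                            /* disable */
	model_mtspr(r, L1CSR0_DCFI | L1CSR0_CLFC);    /* invalidate, clear locks */
	model_mtspr(r, r->value | L1CSR0_CPE | L1CSR0_DCFI | L1CSR0_DCE); /* enable */
}
```

The early return also makes a second call (for example from platform code that still enables the cache itself) harmless, which is one reason to prefer it once the setup moves into the common __setup_cpu_e500* path.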
[PATCH] powerpc/mpc52xx/mtd: fix mtd-ram access for 16-bit Local Plus Bus
Hi all,

This patch adds support for RAM chips connected to the Local Plus Bus of an MPC5200B in 16-bit mode. As no single-byte write accesses are allowed by the bus in this mode, a byte write has to be split into a word read-modify-write sequence (mpc52xx_memcpy2lpb16, as a fix/extension for memcpy_toio; note that memcpy_fromio *does* work just fine). It has been tested in conjunction with Wolfram Sang's mtd-ram [1] and Sascha Hauer's jffs unaligned access [2] patches on 2.6.29.1, with a Renesas static RAM connected in 16-bit Large Flash mode.

[1] http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-June/072794.html
[2] http://article.gmane.org/gmane.linux.drivers.mtd/21521

Signed-off-by: Albrecht Dreß albrecht.dr...@arcor.de
Cc: Grant Likely grant.lik...@secretlab.ca
Cc: David Woodhouse dw...@infradead.org
Cc: linuxppc-...@ozlabs.org
---
diff -u linux-2.6.29.1.orig/arch/powerpc/platforms/52xx/mpc52xx_common.c linux-2.6.29.1/arch/powerpc/platforms/52xx/mpc52xx_common.c
--- linux-2.6.29.1.orig/arch/powerpc/platforms/52xx/mpc52xx_common.c	2009-04-02 22:55:27.0 +0200
+++ linux-2.6.29.1/arch/powerpc/platforms/52xx/mpc52xx_common.c	2009-06-09 21:16:22.0 +0200
@@ -225,3 +225,59 @@
 	while (1);
 }
+
+/**
+ * mpc52xx_memcpy2lpb16: copy data to the Local Plus Bus in 16-bit mode which
+ * doesn't allow byte accesses
+ */
+void
+mpc52xx_memcpy2lpb16(volatile void __iomem *dest, const void *src,
+		     unsigned long n)
+{
+	void *vdest = (void __force *) dest;
+
+	__asm__ __volatile__ ("sync" : : : "memory");
+
+	if (((unsigned long) vdest & 1) != 0) {
+		u8 buf[2];
+
+		*(u16 *)buf = *((volatile u16 *)(vdest - 1));
+		buf[1] = *((u8 *)src);
+		*((volatile u16 *)(vdest - 1)) = *(u16 *)buf;
+		src++;
+		vdest++;
+		n--;
+	}
+
+	/* looks weird, but helps the optimiser... */
+	if (n >= 4) {
+		unsigned long chunks = n >> 2;
+		volatile u32 * _dst = (volatile u32 *)(vdest - 4);
+		volatile u32 * _src = (volatile u32 *)(src - 4);
+
+		vdest += chunks << 2;
+		src += chunks << 2;
+		do {
+			*++_dst = *++_src;
+		} while (--chunks);
+		n &= 3;
+	}
+
+	if (n >= 2) {
+		*((volatile u16 *)vdest) = *((volatile u16 *)src);
+		src += 2;
+		vdest += 2;
+		n -= 2;
+	}
+
+	if (n > 0) {
+		u8 buf[2];
+
+		*(u16 *)buf = *((volatile u16 *)vdest);
+		buf[0] = *((u8 *)src);
+		*((volatile u16 *)vdest) = *(u16 *)buf;
+	}
+
+	__asm__ __volatile__ ("sync" : : : "memory");
+}
+EXPORT_SYMBOL(mpc52xx_memcpy2lpb16);
diff -u linux-2.6.29.1.orig/arch/powerpc/include/asm/mpc52xx.h linux-2.6.29.1/arch/powerpc/include/asm/mpc52xx.h
--- linux-2.6.29.1.orig/arch/powerpc/include/asm/mpc52xx.h	2009-04-02 22:55:27.0 +0200
+++ linux-2.6.29.1/arch/powerpc/include/asm/mpc52xx.h	2009-06-09 21:14:31.0 +0200
@@ -274,6 +274,8 @@
 extern void mpc52xx_map_common_devices(void);
 extern int mpc52xx_set_psc_clkdiv(int psc_id, int clkdiv);
 extern void mpc52xx_restart(char *cmd);
+extern void mpc52xx_memcpy2lpb16(volatile void __iomem *dest, const void *src,
+				 unsigned long n);
 
 /* mpc52xx_pic.c */
 extern void mpc52xx_init_irq(void);
diff -u linux-2.6.29.1.orig/include/linux/mtd/map.h linux-2.6.29.1/include/linux/mtd/map.h
--- linux-2.6.29.1.orig/include/linux/mtd/map.h	2009-04-02 22:55:27.0 +0200
+++ linux-2.6.29.1/include/linux/mtd/map.h	2009-06-08 14:28:05.0 +0200
@@ -13,6 +13,9 @@
 #include <asm/unaligned.h>
 #include <asm/system.h>
 #include <asm/io.h>
+#ifdef CONFIG_PPC_MPC52xx
+#include <asm/mpc52xx.h>
+#endif
 
 #ifdef CONFIG_MTD_MAP_BANK_WIDTH_1
 #define map_bankwidth(map) 1
@@ -417,6 +420,11 @@
 static inline void inline_map_copy_to(struct map_info *map, unsigned long to, const void *from, ssize_t len)
 {
+#ifdef CONFIG_PPC_MPC52xx
+	if (map->bankwidth == 2)
+		mpc52xx_memcpy2lpb16(map->virt + to, from, len);
+	else
+#endif
 	memcpy_toio(map->virt + to, from, len);
 }
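The core idea of mpc52xx_memcpy2lpb16 (a byte store becomes a 16-bit read, a byte modify, and a 16-bit write back) can be tried out in userspace against a simulated 16-bit-only bus. This is a simplified sketch, not the patch itself: the bus is a plain `uint16_t` array, the names `lpb16_write_byte()` and `lpb16_copy_to()` are hypothetical, the 32-bit chunk loop is omitted, and unlike the patch it also guards the odd-leading-byte path against `n == 0`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read-modify-write of one byte on a bus that only accepts 16-bit accesses.
 * The byte array view keeps this independent of host endianness. */
static void lpb16_write_byte(uint16_t *bus, size_t byte_off, uint8_t val)
{
	uint8_t word[2];

	memcpy(word, &bus[byte_off / 2], 2);   /* 16-bit read */
	word[byte_off & 1] = val;              /* modify the addressed byte */
	memcpy(&bus[byte_off / 2], word, 2);   /* 16-bit write back */
}

/* Copy n bytes to byte offset dest_off: an odd leading byte and an odd
 * trailing byte use read-modify-write, everything between is word stores. */
static void lpb16_copy_to(uint16_t *bus, size_t dest_off, const void *src, size_t n)
{
	const uint8_t *s = src;

	if (n && (dest_off & 1)) {
		lpb16_write_byte(bus, dest_off++, *s++);
		n--;
	}
	for (; n >= 2; n -= 2, dest_off += 2, s += 2) {
		uint16_t w;

		memcpy(&w, s, 2);                  /* source may be unaligned */
		bus[dest_off / 2] = w;             /* single 16-bit store */
	}
	if (n)
		lpb16_write_byte(bus, dest_off, *s);
}
```

Copying "hello" to offset 1 of a buffer filled with 0xAA leaves bytes 0, 6 and 7 untouched, which is exactly the property a plain memcpy_toio() cannot guarantee on this bus.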
Re: [PATCH v4] zone_reclaim is always 0 by default
On Tue, 9 Jun 2009 07:02:14 -0500 Robin Holt h...@sgi.com wrote:
On Tue, Jun 09, 2009 at 11:37:55AM +0100, Mel Gorman wrote:
On Tue, Jun 09, 2009 at 04:55:07AM -0500, Robin Holt wrote:
On Mon, Jun 08, 2009 at 12:50:48PM +0100, Mel Gorman wrote:

Let me start by saying I agree completely with everything you wrote and still disagree with this patch, but was willing to compromise and work around this for our upcoming x86_64 machine by putting a value add into our packaging of adding a sysctl that turns reclaim back on.

To be honest, I'm more leaning towards a NACK than an ACK on this one. I don't support enough NUMA machines to feel strongly enough about it, but unconditionally setting zone_reclaim_mode to 0 on x86-64 just because i7s might be there seems ill-advised to me and will have other consequences for existing, more traditional x86-64 NUMA machines.

I was sort-of planning on coming up with an x86_64 arch-specific function for setting zone_reclaim_mode, but didn't like the direction things were going. Something to the effect of...

--- 20090609.orig/mm/page_alloc.c	2009-06-09 06:51:34.0 -0500
+++ 20090609/mm/page_alloc.c	2009-06-09 06:55:00.160762069 -0500
@@ -2326,12 +2326,7 @@ static void build_zonelists(pg_data_t *p
 	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
 		int distance = node_distance(local_node, node);
 
-		/*
-		 * If another node is sufficiently far away then it is better
-		 * to reclaim pages in a zone before going off node.
-		 */
-		if (distance > RECLAIM_DISTANCE)
-			zone_reclaim_mode = 1;
+		zone_reclaim_mode = arch_zone_reclaim_mode(distance);
 
 		/*
 		 * We don't want to pressure a particular node.

And then letting each arch define an arch_zone_reclaim_mode(). If other values are needed in the determination, we would add parameters to reflect this. For ia64, add

static inline ia64_zone_reclaim_mode(int distance)
{
	if (distance > 15)
		return 1;
}

#define arch_zone_reclaim_mode(_d)	ia64_zone_reclaim_mode(_d)

Then, inside x86_64_zone_reclaim_mode(), I could make it something like

	if (distance > 40 || is_uv_system())
		return 1;

In the end, I didn't think this fight was worth fighting given how ugly this felt. Upon second thought, I am beginning to think it is not that bad, but I also don't think it is that good either.

We've done worse before now...

Is it not possible to work out at runtime whether zone reclaim mode is beneficial?

Given that zone_reclaim_mode is settable from initscripts, why all the fuss?

Is anyone testing RECLAIM_WRITE and RECLAIM_SWAP, btw?

The root cause of this problem: having something called "mode". Any time we put a mode in the kernel, we get in a mess trying to work out when to set it and to what.

I think I'll drop this patch for now.
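The per-arch hook Robin describes could be sketched as follows. Two details differ from his snippet and are deliberate: the generic fallback returns an explicit value for near nodes (the ia64 example has no return for that case), and the loop only ever switches the mode on, so a nearer node visited later cannot switch it back off. `RECLAIM_DISTANCE` is assumed to be the generic default of 20; `compute_zone_reclaim_mode()` is a hypothetical userspace model of the build_zonelists() loop, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

#define RECLAIM_DISTANCE 20   /* assumed generic default */

/* An arch header may define arch_zone_reclaim_mode() before this point;
 * otherwise the generic distance test is kept. */
#ifndef arch_zone_reclaim_mode
static inline bool arch_zone_reclaim_mode(int distance)
{
	return distance > RECLAIM_DISTANCE;
}
#endif

/* Model of the build_zonelists() loop over remote-node distances:
 * the mode is sticky once any node is far enough away. */
static int compute_zone_reclaim_mode(const int *distances, int n)
{
	int mode = 0;

	for (int i = 0; i < n; i++)
		if (arch_zone_reclaim_mode(distances[i]))
			mode = 1;
	return mode;
}
```

An x86_64 override in this style would simply be another `#define arch_zone_reclaim_mode(d) ...` seen before the `#ifndef`, which keeps the generic behaviour for every arch that does not care.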
Re: [BUILD FAILURE 01/04] Next June 04:PPC64 randconfig [drivers/staging/comedi/drivers.o]
In any case, doing PAGE_KERNEL_NOCACHE for DMA memory is incorrect on many architectures. So at this stage, there's not much option but an ifdef, I suspect, until this is fixed properly.

Ok. But I am not sure whether Greg will agree to this. If so, is the following patch I sent earlier Ok? http://lkml.org/lkml/2009/6/5/462

Not really. You probably want to use a constant (call it MY_DMA_MAP_PGPROT), and in a header, you have a bunch of ifdefs that set it to PAGE_KERNEL, PAGE_KERNEL_NOCACHE or PAGE_KERNEL_NC depending on what's needed. Today, you can pretty much assume that:

 - x86*, sparc*, ia64*, alpha, ... need PAGE_KERNEL
 - powerpc needs PAGE_KERNEL if !CONFIG_NOT_COHERENT_CACHE
 - powerpc needs PAGE_KERNEL_NC if CONFIG_NOT_COHERENT_CACHE
 - ARM and MIPS, I think, need PAGE_KERNEL_NOCACHE
 - ... others I don't know.

Cheers,
Ben.

Regards--
Subrata

It does make sense to want to have some memory like that shared between user space and DMA, though I don't know what the right approach that works on all archs is at this stage. Worth asking the ALSA guys, I think they have similar issues :-) But doing double buffering might do the trick fine for now.

Cheers,
Ben.
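Ben's suggestion (one constant, chosen per architecture in a header) might look roughly like the sketch below. Because PAGE_KERNEL and friends only exist inside the kernel, the sketch uses stand-in enum values so it compiles in userspace; in a real header the three `MODEL_*` names would be replaced by PAGE_KERNEL, PAGE_KERNEL_NC and PAGE_KERNEL_NOCACHE. The final fallback to PAGE_KERNEL for unlisted architectures is an assumption, since Ben explicitly leaves "others" open:

```c
#include <assert.h>

/* Userspace stand-ins for the kernel page protections. */
enum dma_pgprot_model {
	MODEL_PAGE_KERNEL,
	MODEL_PAGE_KERNEL_NC,
	MODEL_PAGE_KERNEL_NOCACHE,
};

/* Per-arch selection following the list above. */
#if defined(CONFIG_PPC) && defined(CONFIG_NOT_COHERENT_CACHE)
#define MY_DMA_MAP_PGPROT MODEL_PAGE_KERNEL_NC
#elif defined(CONFIG_PPC)
#define MY_DMA_MAP_PGPROT MODEL_PAGE_KERNEL
#elif defined(CONFIG_ARM) || defined(CONFIG_MIPS)
#define MY_DMA_MAP_PGPROT MODEL_PAGE_KERNEL_NOCACHE
#else
/* x86, sparc, ia64, alpha: cache-coherent DMA. Assumed default for the rest. */
#define MY_DMA_MAP_PGPROT MODEL_PAGE_KERNEL
#endif
```

With such a header, the vmap() call in comedi_buf_alloc() would pass MY_DMA_MAP_PGPROT instead of PAGE_KERNEL_NOCACHE, and the architecture knowledge stays in one place instead of being spread over per-driver ifdefs.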
Info threads hangs in Linux-2.6.29 with KGDBOE
Hi All,

ISSUE: "info threads" hangs in KGDBOE
Kernel: Linux-2.6.29
Bug found in architectures: PowerPC (ppc32), x86
---
While trying to run kernel Linux-2.6.29 on a PowerPC Xilinx target with KGDBOE enabled, issues arise when I run "info threads" after connecting to the target. The following is the error:

(gdb) target remote udp:10.161.2.35:6443
warning: The remote protocol may be unreliable over UDP.
Some events may be lost, rendering further debugging impossible.
Remote debugging using udp:10.161.2.35:6443
kgdb_breakpoint () at kernel/kgdb.c:1803
1803		arch_kgdb_breakpoint();
(gdb) info threads
[New Thread -2]
[New Thread 2]
[New Thread 3]
[New Thread 4]
[New Thread 5]
[New Thread 6]
[New Thread 59]
[New Thread 67]
[New Thread 101]
[New Thread 102]
[New Thread 103]
[New Thread 104]
[New Thread 105]
  14 Thread 105 (nfsiod)  __switch_to (prev=<value optimized out>, new=0xcf89c100) at arch/powerpc/kernel/process.c:411
  13 Thread 104 (aio/0)  __switch_to (prev=<value optimized out>, new=0xcf82f4e0) at arch/powerpc/kernel/process.c:411
  12 Thread 103 (kswapd0)  __switch_to (prev=<value optimized out>, new=0xcf82f4e0) at arch/powerpc/kernel/process.c:411
  11 Thread 102 (pdflush)  __switch_to (prev=<value optimized out>, new=0xcf82e880) at arch/powerpc/kernel/process.c:411
  10 Thread 101 (pdflush)  Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...

Finally the kernel dies after these error messages.

This issue is not present up to Linux-2.6.28.10; KGDBOE works fine on x86 and PowerPC there. The bug is seen on x86 (32-bit) and PowerPC from Linux-2.6.29 onwards.

Hopefully this is not a raw_smp_processor_id issue! The CPU ID returned on both architectures is 0.

Which patch in netpoll* or in any network device driver might have caused this issue?

One more thing to notice: on both x86 and PowerPC, the kernel dies exactly after replying with four threads/packets.

Thanks,
Srikanth Krishnakar
[PATCH] of_serial: Add UPF_FIXED_TYPE flag
This patch adds the UPF_FIXED_TYPE flag, which bypasses the 8250's autoconfig probe for the UART type. The UART type identified by of_serial's parse of the flat device tree will be used as defined.

Signed-off-by: Dave Mitchell dmitch...@amcc.com
---
 drivers/serial/of_serial.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/serial/of_serial.c b/drivers/serial/of_serial.c
index 14f8fa9..3f2027c 100644
--- a/drivers/serial/of_serial.c
+++ b/drivers/serial/of_serial.c
@@ -67,7 +67,7 @@ static int __devinit of_platform_serial_setup(struct of_device *ofdev,
 	port->type = type;
 	port->uartclk = *clk;
 	port->flags = UPF_SHARE_IRQ | UPF_BOOT_AUTOCONF | UPF_IOREMAP
-		| UPF_FIXED_PORT;
+		| UPF_FIXED_PORT | UPF_FIXED_TYPE;
 	port->dev = &ofdev->dev;
 	/* If current-speed was set, then try not to change it. */
 	if (spd)
-- 
1.6.3.2
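The effect described in the changelog (the device-tree type is kept instead of being re-probed) can be summarised with a small model. This is a simplification under stated assumptions: the flag values are made up for the model (the real UPF_* constants live in include/linux/serial_core.h), and `configured_uart_type()` is a hypothetical function standing in for the serial core's boot-time configuration step:

```c
#include <assert.h>

/* Hypothetical flag bits for this model only. */
#define MODEL_UPF_BOOT_AUTOCONF (1u << 0)
#define MODEL_UPF_FIXED_TYPE    (1u << 1)

/* With BOOT_AUTOCONF the core normally re-probes the UART type; with
 * FIXED_TYPE as well, the type already filled in by the platform code
 * (here, of_serial's device-tree parse) is used as defined. */
static int configured_uart_type(unsigned int flags, int dt_type, int probed_type)
{
	if ((flags & MODEL_UPF_BOOT_AUTOCONF) && !(flags & MODEL_UPF_FIXED_TYPE))
		return probed_type;   /* autoconfig result wins */
	return dt_type;               /* device-tree type kept */
}
```

This matters for UARTs whose probe misidentifies them: without the flag, whatever the autoconfig probe concludes overrides the more precise "compatible" information from the device tree.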
Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages
On Tue, 2009-06-09 at 15:59 +0200, Hannes Hering wrote:
This patch implements toleration of dynamic memory operations and 16 GB gigantic pages. On module load the driver walks through available system memory, checks for available memory ranges and then registers the kernel internal memory region accordingly. The translation of address ranges is implemented via a 3-level busmap.

Hi Hannes,

For those of us who haven't read the HEA spec lately, can you give us some more detail on that? :)

How does it interact with kexec/kdump?

+static int ehca_update_busmap(unsigned long pfn, unsigned long nr_pages)
+{
+	unsigned long i, start_section, end_section;
+	int top, dir, idx;
+
+	if (!nr_pages)
+		return 0;
+
+	if (!ehca_bmap) {
+		ehca_bmap = kmalloc(sizeof(struct ehca_bmap), GFP_KERNEL);
+		if (!ehca_bmap)
+			return -ENOMEM;
+		/* Set map block to 0xFF according to EHCA_INVAL_ADDR */
+		memset(ehca_bmap, 0xFF, EHCA_TOP_MAP_SIZE);
+	}
+
+	start_section = phys_to_abs(pfn * PAGE_SIZE) / EHCA_SECTSIZE;
+	end_section = phys_to_abs((pfn + nr_pages) * PAGE_SIZE) / EHCA_SECTSIZE;

phys_to_abs()? As below, or does it come from somewhere else?

arch/powerpc/include/asm/abs_addr.h:

static inline unsigned long phys_to_abs(unsigned long pa)
{
	unsigned long chunk;

	/* This is a no-op on non-iSeries */
	if (!firmware_has_feature(FW_FEATURE_ISERIES))
		return pa;

	chunk = addr_to_chunk(pa);

	if (chunk < mschunks_map.num_chunks)
		chunk = mschunks_map.mapping[chunk];

	return chunk_to_addr(chunk) + (pa & MSCHUNKS_OFFSET_MASK);
}

cheers
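The quoted ehca_update_busmap() computes a range of section numbers and declares `top`, `dir` and `idx`, which suggests each section number is split into three table indices. The sketch below shows that kind of split under an assumed geometry of 13 index bits per level; the shift widths and every `MODEL_*`/`model_*` name are hypothetical, and the driver's real EHCA_* constants may differ:

```c
#include <assert.h>

/* Assumed geometry, for illustration only: 13 index bits per level
 * (8192 entries per map block). */
#define MODEL_DIR_INDEX_SHIFT 13
#define MODEL_TOP_INDEX_SHIFT (MODEL_DIR_INDEX_SHIFT * 2)
#define MODEL_INDEX_MASK      ((1UL << MODEL_DIR_INDEX_SHIFT) - 1)

/* Extract one level's index from a memory-section number. */
static int model_calc_index(unsigned long section, unsigned int shift)
{
	return (int)((section >> shift) & MODEL_INDEX_MASK);
}

/* Split a section number into top/dir/idx for a 3-level busmap walk. */
static void model_split_section(unsigned long section, int *top, int *dir, int *idx)
{
	*top = model_calc_index(section, MODEL_TOP_INDEX_SHIFT);
	*dir = model_calc_index(section, MODEL_DIR_INDEX_SHIFT);
	*idx = model_calc_index(section, 0);
}
```

A sparse 3-level table like this only allocates the directory and entry blocks for sections that actually exist, which is presumably why it copes with memory hot-add and with large holes in the physical address map.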
Re: [BUILD FAILURE 02/04] Next June 04:PPC64 randconfig [drivers/usb/host/ohci-hcd.o]
On Friday 05 June 2009, Subrata Modak wrote:
Correct, it fixes the issue. However, since few changes might have gone to the Kconfig, the patch does not apply cleanly. Below is the patch, just a retake of the earlier one, but on the latest code.

And it got mangled a bit along the way. Plus, the original one goofed up Kconfig dependency displays ... both issues fixed in this version, against current mainline GIT.

If someone can verify all four PPC/OF/OHCI configs build on PPC64, I'm OK with it.

- Dave

========== CUT HERE
From: Arnd Bergmann a...@arndb.de
Subject: fix build failure for PPC64 randconfig [usb/ohci]

We could just make the USB_OHCI_HCD_PPC_OF option implicit and selected only if at least one of USB_OHCI_HCD_PPC_OF_BE and USB_OHCI_HCD_PPC_OF_LE is set.

[ dbrown...@users.sourceforge.net: fix patch manglation and dependencies ]

Signed-off-by: Arnd Bergmann a...@arndb.de
Resent-by: Subrata Modak subr...@linux.vnet.ibm.com
Signed-off-by: David Brownell dbrown...@users.sourceforge.net
---
 drivers/usb/host/Kconfig | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

--- a/drivers/usb/host/Kconfig
+++ b/drivers/usb/host/Kconfig
@@ -180,26 +180,27 @@ config USB_OHCI_HCD_PPC_SOC
 	  Enables support for the USB controller on the MPC52xx or
 	  STB03xxx processor chip.  If unsure, say Y.
 
-config USB_OHCI_HCD_PPC_OF
-	bool "OHCI support for PPC USB controller on OF platform bus"
-	depends on USB_OHCI_HCD && PPC_OF
-	default y
-	---help---
-	  Enables support for the USB controller PowerPC present on the
-	  OpenFirmware platform bus.
-
 config USB_OHCI_HCD_PPC_OF_BE
-	bool "Support big endian HC"
-	depends on USB_OHCI_HCD_PPC_OF
-	default y
+	bool "OHCI support for OF platform bus (big endian)"
+	depends on USB_OHCI_HCD && PPC_OF
 	select USB_OHCI_BIG_ENDIAN_DESC
 	select USB_OHCI_BIG_ENDIAN_MMIO
+	---help---
+	  Enables support for big-endian USB controllers present on the
+	  OpenFirmware platform bus.
 
 config USB_OHCI_HCD_PPC_OF_LE
-	bool "Support little endian HC"
-	depends on USB_OHCI_HCD_PPC_OF
-	default n
+	bool "OHCI support for OF platform bus (little endian)"
+	depends on USB_OHCI_HCD && PPC_OF
 	select USB_OHCI_LITTLE_ENDIAN
+	---help---
+	  Enables support for little-endian USB controllers present on the
+	  OpenFirmware platform bus.
+
+config USB_OHCI_HCD_PPC_OF
+	bool
+	depends on USB_OHCI_HCD && PPC_OF
+	default USB_OHCI_HCD_PPC_OF_BE || USB_OHCI_HCD_PPC_OF_LE
 
 config USB_OHCI_HCD_PCI
 	bool "OHCI support for PCI-bus USB controllers"
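The point of the reworked Kconfig is that USB_OHCI_HCD_PPC_OF can no longer be enabled on its own, so the "No endianess selected" #error from the build log becomes unreachable. A tiny C model of the four BE/LE combinations Dave asks about, assuming ohci-ppc-of.c is only compiled when USB_OHCI_HCD_PPC_OF is set (as the build log suggests, since it is included from ohci-hcd.c); the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* The two user-visible symbols left after the rework. */
struct ohci_of_cfg {
	bool be;   /* USB_OHCI_HCD_PPC_OF_BE */
	bool le;   /* USB_OHCI_HCD_PPC_OF_LE */
};

/* USB_OHCI_HCD_PPC_OF has no prompt any more: it is derived as BE || LE. */
static bool ppc_of_enabled(struct ohci_of_cfg c)
{
	return c.be || c.le;
}

/* True if ohci-ppc-of.c would be built with neither endianness selected,
 * i.e. the "No endianess selected" #error from the build log. */
static bool hits_endian_error(struct ohci_of_cfg c)
{
	return ppc_of_enabled(c) && !c.be && !c.le;
}
```

The model only covers the Kconfig logic; whether each of the four combinations actually compiles on PPC64 still needs the real build test Dave asks for.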
next branch update
Hi!

I've updated my next branch with the following patches. We're getting real close to the merge window now, so if something is missing, please holler ASAP.

Cheers,
Ben.

Becky Bruce (1):
      powerpc: Add support for swiotlb on 32-bit

Benjamin Herrenschmidt (8):
      powerpc/mm: Fix some SMP issues with MMU context handling
      powerpc/mm: Fix a AB-BA deadlock scenario with nohash MMU context lock
      powerpc: Set init_bootmem_done on NUMA platforms as well
      powerpc: Move VMX and VSX asm code to vector.S
      powerpc: Introduce CONFIG_PPC_BOOK3S
      powerpc: Split exception handling out of head_64.S
      powerpc: Separate PACA fields for server CPUs
      powerpc: Shield code specific to 64-bit server processors

Grant Likely (1):
      powerpc/virtex: refactor intc driver and add support for i8259 cascading

John Linn (1):
      fbdev: Add PLB support and cleanup DCR in xilinxfb driver.

Roderick Colenbrander (3):
      powerpc/virtex: Add support for Xilinx PCI host bridge
      powerpc/virtex: Add Xilinx ML510 reference design support
      powerpc/virtex: Add ml510 reference design device tree

Roland McGrath (1):
      powerpc: Add PTRACE_SINGLEBLOCK support

Stephen Rothwell (4):
      powerpc/pseries: Fix warnings when printing resource_size_t
      powerpc/xmon: Remove unused variable in xmon.c
      powerpc: Fix warning when printing a resource_size_t
      powerpc/spufs: Remove unused error path
Re: next branch update
On Tue, Jun 9, 2009 at 9:14 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote:
Hi! I've updated my next branch with the following patches. We're getting real close to the merge window now, so if something is missing, please holler ASAP.

Just these two, but I see you've got them marked under review:
http://patchwork.ozlabs.org/patch/28191/
http://patchwork.ozlabs.org/patch/27752/

g.

--
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.