linux-next: manual merge of the driver-core tree with the powerpc tree
Hi Greg, Today's linux-next merge of the driver-core tree got a conflict in drivers/block/ps3disk.c between commits 6dee2c87ebbe5d7ce8c4c163966a0bd9c02c75ef (block/ps3: remove driver_data direct access of struct device) and 03fa68c245cccbcb99035cbabaa13b408ba91ab5 (ps3: shorten ps3_system_bus_ [gs]et_driver_data to ps3_system_bus_[gs]et_drvdata) from the powerpc tree and commit db7afa200c4ef6823a2a40e4ea1dd747775be01a (block/ps3: remove driver_data direct access of struct device) from the driver-core tree. I fixed it up (I used the version from the powerpc tree). Greg, I think the driver-core patch is no longer relevant. -- Cheers, Stephen Rothwells...@canb.auug.org.au pgp7DK8NPJvkN.pgp Description: PGP signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: linux-next: manual merge of the driver-core tree with the powerpc tree
On Tue, Jun 16, 2009 at 03:53:17PM +1000, Stephen Rothwell wrote: Hi Greg, Today's linux-next merge of the driver-core tree got a conflict in drivers/block/ps3disk.c between commits 6dee2c87ebbe5d7ce8c4c163966a0bd9c02c75ef (block/ps3: remove driver_data direct access of struct device) and 03fa68c245cccbcb99035cbabaa13b408ba91ab5 (ps3: shorten ps3_system_bus_ [gs]et_driver_data to ps3_system_bus_[gs]et_drvdata) from the powerpc tree and commit db7afa200c4ef6823a2a40e4ea1dd747775be01a (block/ps3: remove driver_data direct access of struct device) from the driver-core tree. I fixed it up (I used the version from the powerpc tree). Greg, I think the driver-core patch is no longer relevant. I pushed out an update a number of hours ago (like 6+), so you should have gotten it with this update. When did you pull from my tree? I also just sent a merge request to Linus, so everything should be fixed up now. thanks, greg k-h ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: linux-next: manual merge of the driver-core tree with the powerpc tree
Hi Greg, On Mon, 15 Jun 2009 23:18:49 -0700 Greg KH g...@kroah.com wrote: I pushed out an update a number of hours ago (like 6+), so you should have gotten it with this update. When did you pull from my tree? I also just sent a merge request to Linus, so everything should be fixed up now. OK, thanks. I fetch the trees usually between 8:30am and 10:00am UTC +1000 (or +1100 in summer), so about 7 hours ago :-) -- Cheers, Stephen Rothwells...@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ pgpdq8U6asvT8.pgp Description: PGP signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: linux-next: Linus/powerpc tree build warning
On Tue, 2009-06-16 at 16:24 +1000, Stephen Rothwell wrote: Hi Ben, Today's linux-next build (powerpc ppc44x_defconfig) produced this warning: drivers/video/xilinxfb.c: In function 'xilinxfb_assign': drivers/video/xilinxfb.c:328: warning: cast to pointer from integer of different size Introduced by commit dac4ccfb64bcdd5b4c248ccc22903d67486573cd (fbdev: Add PLB support and cleanup DCR in xilinxfb driver). Known problem, a fix is currently being polished by Grant and John. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages
Hi Roland, thank you for taking a look at the code! On Fri, 12 Jun 2009 21:50:58 -0700 Roland Dreier rdre...@cisco.com wrote: OK, one major issue with this patch and a few minor nits. First, the major issue is that I don't see anything in the patch that changes the code in ehca_mem_notifier() in ehca_main.c: case MEM_GOING_ONLINE: case MEM_GOING_OFFLINE: /* only ok if no hca is attached to the lpar */ spin_lock_irqsave(shca_list_lock, flags); if (list_empty(shca_list)) { spin_unlock_irqrestore(shca_list_lock, flags); return NOTIFY_OK; } else { spin_unlock_irqrestore(shca_list_lock, flags); if (printk_timed_ratelimit(ehca_dmem_warn_time, 30 * 1000)) ehca_gen_err(DMEM operations are not allowed in conjunction with eHCA); return NOTIFY_BAD; } But your patch description says: This patch implements toleration of dynamic memory operations But it seems you're still going to hit the same NOTIFY_BAD case above after your patch. So something doesn't compute for me. Could you explain more? Yeah, the notifier code remains untouched as we still do not allow dynamic memory operations _while_ our module is loaded. The patch allows the driver to cope with DMEM operations that happened before the module was loaded, which might result in a non-contiguous memory layout. When the driver registers its global memory region in the system, the memory layout must be considered. We chose the term toleration instead of support to illustrate this. I'll put some more details into the changelog, incorporate the other comments and send out a second version of the patch. Thanks, Alex Second, a nit: +#define EHCA_REG_MR 0 +#define EHCA_REG_BUSMAP_MR (~0) and you pass these as the reg_busmap parm in: int ehca_reg_mr(struct ehca_shca *shca, struct ehca_mr *e_mr, u64 *iova_start, @@ -991,7 +1031,8 @@ struct ehca_pd *e_pd, struct ehca_mr_pginfo *pginfo, u32 *lkey, /*OUT*/ - u32 *rkey) /*OUT*/ + u32 *rkey, /*OUT*/ + int reg_busmap) and test it as: + if (reg_busmap) + ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo); + else + ret = ehca_reg_mr_rpages(shca, e_mr, pginfo); So the ~0 for true looks a bit odd. One option would be to make reg_busmap a bool, since that's how you're using it, but then you lose the nice self-documenting macro where you call things. So I think it would be cleaner to do something like enum ehca_reg_type { EHCA_REG_MR, EHCA_REG_BUSMAP_MR }; and make the int reg_busmap parameter into enum ehca_reg_type reg_type and have the code become + if (reg_type == EHCA_REG_BUSMAP_MR) + ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo); + else if (reg_type == EHCA_REG_MR) + ret = ehca_reg_mr_rpages(shca, e_mr, pginfo); + else + ret = -EINVAL or something like that. +struct ib_dma_mapping_ops ehca_dma_mapping_ops = { + .mapping_error = ehca_dma_mapping_error, + .map_single = ehca_dma_map_single, + .unmap_single = ehca_dma_unmap_single, + .map_page = ehca_dma_map_page, + .unmap_page = ehca_dma_unmap_page, + .map_sg = ehca_dma_map_sg, + .unmap_sg = ehca_dma_unmap_sg, + .dma_address = ehca_dma_address, + .dma_len = ehca_dma_len, + .sync_single_for_cpu = ehca_dma_sync_single_for_cpu, + .sync_single_for_device = ehca_dma_sync_single_for_device, + .alloc_coherent = ehca_dma_alloc_coherent, + .free_coherent = ehca_dma_free_coherent, +}; I always think structures like this are easier to read if you align the '=' signs. But no big deal. + ret = ehca_create_busmap(); + if (ret) { + ehca_gen_err(Cannot create busmap.); + goto module_init2; + } + ret = ibmebus_register_driver(ehca_driver); if (ret) { ehca_gen_err(Cannot register eHCA device driver); ret = -EINVAL; - goto module_init2; + goto module_init3; } ret = register_memory_notifier(ehca_mem_nb); if (ret) { ehca_gen_err(Failed registering memory add/remove notifier); - goto module_init3; + goto module_init4; Having to renumber unrelated things is when something changes is why I don't like this style of error path labels. But I think it's well and truly too late to fix that in ehca. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2.6.31 try 2] ehca: Tolerate dynamic memory operations and huge pages
From: Hannes Hering heri...@de.ibm.com This patch implements toleration of dynamic memory operations and 16 GB gigantic pages. Toleration means that the driver can cope with dynamic memory operations that happened before the driver was loaded. While using the ehca driver, dynamic memory operations are still prohibited. On module load the driver walks through available system memory, checks for available memory ranges and then registers the kernel internal memory region accordingly. The translation of address ranges is implemented via a 3-level busmap. Signed-off-by: Hannes Hering heri...@de.ibm.com --- This patch is built and tested against infiniband.git. Please apply for 2.6.31. drivers/infiniband/hw/ehca/ehca_main.c | 20 + drivers/infiniband/hw/ehca/ehca_mrmw.c | 508 - drivers/infiniband/hw/ehca/ehca_mrmw.h | 13 3 files changed, 523 insertions(+), 18 deletions(-) --- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_main.c +++ infiniband.git/drivers/infiniband/hw/ehca/ehca_main.c @@ -52,7 +52,7 @@ #include ehca_tools.h #include hcp_if.h -#define HCAD_VERSION 0026 +#define HCAD_VERSION 0027 MODULE_LICENSE(Dual BSD/GPL); MODULE_AUTHOR(Christoph Raisch rai...@de.ibm.com); @@ -506,6 +506,7 @@ static int ehca_init_device(struct ehca_ shca-ib_device.detach_mcast= ehca_detach_mcast; shca-ib_device.process_mad = ehca_process_mad; shca-ib_device.mmap= ehca_mmap; + shca-ib_device.dma_ops = ehca_dma_mapping_ops; if (EHCA_BMASK_GET(HCA_CAP_SRQ, shca-hca_cap)) { shca-ib_device.uverbs_cmd_mask |= @@ -1028,17 +1029,23 @@ static int __init ehca_module_init(void) goto module_init1; } + ret = ehca_create_busmap(); + if (ret) { + ehca_gen_err(Cannot create busmap.); + goto module_init2; + } + ret = ibmebus_register_driver(ehca_driver); if (ret) { ehca_gen_err(Cannot register eHCA device driver); ret = -EINVAL; - goto module_init2; + goto module_init3; } ret = register_memory_notifier(ehca_mem_nb); if (ret) { ehca_gen_err(Failed registering memory add/remove notifier); - goto module_init3; + goto module_init4; } if (ehca_poll_all_eqs != 1) { @@ -1053,9 +1060,12 @@ static int __init ehca_module_init(void) return 0; -module_init3: +module_init4: ibmebus_unregister_driver(ehca_driver); +module_init3: + ehca_destroy_busmap(); + module_init2: ehca_destroy_slab_caches(); @@ -1073,6 +1083,8 @@ static void __exit ehca_module_exit(void unregister_memory_notifier(ehca_mem_nb); + ehca_destroy_busmap(); + ehca_destroy_slab_caches(); ehca_destroy_comp_pool(); --- infiniband.git.orig/drivers/infiniband/hw/ehca/ehca_mrmw.c +++ infiniband.git/drivers/infiniband/hw/ehca/ehca_mrmw.c @@ -53,6 +53,38 @@ /* max number of rpages (per hcall register_rpages) */ #define MAX_RPAGES 512 +/* DMEM toleration management */ +#define EHCA_SECTSHIFTSECTION_SIZE_BITS +#define EHCA_SECTSIZE (1UL EHCA_SECTSHIFT) +#define EHCA_HUGEPAGESHIFT 34 +#define EHCA_HUGEPAGE_SIZE (1UL EHCA_HUGEPAGESHIFT) +#define EHCA_HUGEPAGE_PFN_MASK ((EHCA_HUGEPAGE_SIZE - 1) PAGE_SHIFT) +#define EHCA_INVAL_ADDR0xULL +#define EHCA_DIR_INDEX_SHIFT 13 /* 8k Entries in 64k block */ +#define EHCA_TOP_INDEX_SHIFT (EHCA_DIR_INDEX_SHIFT * 2) +#define EHCA_MAP_ENTRIES (1 EHCA_DIR_INDEX_SHIFT) +#define EHCA_TOP_MAP_SIZE (0x1) /* currently fixed map size */ +#define EHCA_DIR_MAP_SIZE (0x1) +#define EHCA_ENT_MAP_SIZE (0x1) +#define EHCA_INDEX_MASK (EHCA_MAP_ENTRIES - 1) + +static unsigned long ehca_mr_len; + +/* + * Memory map data structures + */ +struct ehca_dir_bmap { + u64 ent[EHCA_MAP_ENTRIES]; +}; +struct ehca_top_bmap { + struct ehca_dir_bmap *dir[EHCA_MAP_ENTRIES]; +}; +struct ehca_bmap { + struct ehca_top_bmap *top[EHCA_MAP_ENTRIES]; +}; + +static struct ehca_bmap *ehca_bmap; + static struct kmem_cache *mr_cache; static struct kmem_cache *mw_cache; @@ -68,6 +100,8 @@ enum ehca_mr_pgsize { #define EHCA_MR_PGSHIFT1M 20 #define EHCA_MR_PGSHIFT16M 24 +static u64 ehca_map_vaddr(void *caddr); + static u32 ehca_encode_hwpage_size(u32 pgsize) { int log = ilog2(pgsize); @@ -135,7 +169,8 @@ struct ib_mr *ehca_get_dma_mr(struct ib_ goto get_dma_mr_exit0; } - ret = ehca_reg_maxmr(shca, e_maxmr, (u64 *)KERNELBASE, + ret = ehca_reg_maxmr(shca, e_maxmr, +(void *)ehca_map_vaddr((void *)KERNELBASE), mr_access_flags, e_pd,
Re: [PATCH 2/2] uio: add an of_genirq driver
diff --git a/Documentation/powerpc/dts-bindings/uio-generic.txt b/Documentation/powerpc/dts-bindings/uio-generic.txt new file mode 100644 index 000..8ad9861 --- /dev/null +++ b/Documentation/powerpc/dts-bindings/uio-generic.txt @@ -0,0 +1,16 @@ +UIO for custom devices + +A device which will be mapped using the UIO subsystem. + +Properties: + - compatible : should contain the specific model used, followed by + generic-uio. + - reg : address range(s) of the device (up to MAX_UIO_MAPS) + - interrupts : interrupt of the device + +Example: + c64f...@0 { + compatible = ptx,c64fpga001, generic-uio; + reg = 0x0 0x1; + interrupts = 0 0 3; + }; Hmmm, I'm not happy about this. The device tree describes the hardware, not the way Linux uses the hardware. UIO definitely falls into the category of Linux implementation detail. Yes, I am aware of that. I just started with the mechanisms which are available today and hoped we could find some compatible-value which will suit all needs. This should be approached from the other way around. Either the generic-uio of_platform driver should contain an explicit list of devices to be handled by UIO, Well, that could lead to a quite huge match_table over time. or the OF infrastructure should be modified to allow things like force binding of_devices to of_drivers at runtime. That is an interesting idea. I could imagine something like a 'new_compatible entry in the sysfs-section of the driver similar to 'new_id' for PCI. After writing a new compatible-string into it, matching will triggered again with the new entry added. That could (should?) also be placed at the of-core-level. Or did you have something else in mind? Regards, Wolfram -- Pengutronix e.K. | Wolfram Sang| Industrial Linux Solutions | http://www.pengutronix.de/ | signature.asc Description: Digital signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 2/2] mtd/maps/mtd-ram: add an of-platform driver
On Sat, Jun 06, 2009 at 09:14:08AM +0100, David Woodhouse wrote: On Fri, 2009-06-05 at 14:05 +0200, Wolfram Sang wrote: Create an of-aware driver using the now exported generic functions from plat-ram.c. Also add the documentation for the binding. Partitions are not yet supported. Tested on a phyCORE-MPC5200B-IO. Do we have an ack for the device-tree bindings? Well, Grant acked so far and he surely has a voice in the device-tree-world :) Or do you want a specific ack for that? Regards, Wolfram -- Pengutronix e.K. | Wolfram Sang| Industrial Linux Solutions | http://www.pengutronix.de/ | signature.asc Description: Digital signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 2/2] mtd/maps/mtd-ram: add an of-platform driver
On Mon, Jun 15, 2009 at 07:43:20PM +0200, Albrecht Dreß wrote: Am 05.06.09 14:05 schrieb(en) Wolfram Sang: Create an of-aware driver using the now exported generic functions from plat-ram.c. Also add the documentation for the binding. Partitions are not yet supported. Tested on a phyCORE-MPC5200B-IO. Dumb question: what is the current status of this patch? I ask because I don't know. David wondered if 'of_physmap' could be used, so I wrote what I experienced during development. I can't tell if this helped the case. Grant wondered if we need a bankwidth. IMHO it is needed for now, but I don't know if this is a common agreement. Regards, Wolfram -- Pengutronix e.K. | Wolfram Sang| Industrial Linux Solutions | http://www.pengutronix.de/ | signature.asc Description: Digital signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Chipselect in SPI binding with mpc5200-psc-spi
Now my driver gets probed. This modalias error was due to wrong compatible attribute of my of node. But I still need to have the call to of_register_spi_devices(). The attached patch shows what I mean. rg kd Grant Likely wrote: On Mon, Jun 15, 2009 at 10:36 AM, Kári Davíðssonkari.davids...@marel.com wrote: Is this true? Grant Likely wrote: Yes, this is right. The psc_spi driver automatically registers all spi children that it finds in the device tree onto the SPI bus. Therefore registering an spi_driver() is the right thing to do. I am writing an SPI protocol driver and I find that my driver is never probed. I tried to add and call to of_register_spi_devices() in the drivers/spi/mpc52xx_psc_spi.c::mpc52xx_psc_spi_of_probe() function, without much effect besided that the DTS node is parsed but the driver probe is not called, actually it complains about a modalias for my node is missing. What do you see when you look in /sys/bus/spi/devices? You should see a directory for your device. What do you see in /sys/bus/spi/drivers? In here you should see your driver. If they are both there, then you just have a problem matching your driver name to the device name. g. Index: drivers/spi/mpc52xx_psc_spi.c === --- drivers/spi/mpc52xx_psc_spi.c (revision 548) +++ drivers/spi/mpc52xx_psc_spi.c (working copy) @@ -22,6 +22,7 @@ #include linux/delay.h #include linux/spi/spi.h #include linux/fsl_devices.h +#include linux/of_spi.h #include asm/mpc52xx.h #include asm/mpc52xx_psc.h @@ -371,7 +372,7 @@ /* bus_num is used only for the case dev-platform_data == NULL */ static int __init mpc52xx_psc_spi_do_probe(struct device *dev, u32 regaddr, - u32 size, unsigned int irq, s16 bus_num) + u32 size, unsigned int irq, s16 bus_num, struct spi_master ** pmaster) { struct fsl_spi_platform_data *pdata = dev-platform_data; struct mpc52xx_psc_spi *mps; @@ -439,6 +440,10 @@ if (ret 0) goto unreg_master; +dev_info(dev, Activated\n); + +*pmaster = master; + return ret; unreg_master: @@ -474,6 +479,8 @@ const u32 *regaddr_p; u64 regaddr64, size64; s16 id = -1; +int res; +struct spi_master * master = NULL;; regaddr_p = of_get_address(op-node, 0, size64, NULL); if (!regaddr_p) { @@ -495,8 +502,16 @@ id = *psc_nump + 1; } - return mpc52xx_psc_spi_do_probe(op-dev, (u32)regaddr64, (u32)size64, - irq_of_parse_and_map(op-node, 0), id); + res = mpc52xx_psc_spi_do_probe(op-dev, (u32)regaddr64, (u32)size64, + irq_of_parse_and_map(op-node, 0), id, master); + +if( master != NULL ) +{ +/* Add any subnodes on the SPI bus */ +of_register_spi_devices( master, op-node); +} + +return res; } static int __exit mpc52xx_psc_spi_of_remove(struct of_device *op) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/2] uio: add an of_genirq driver
On Tue, Jun 16, 2009 at 3:04 AM, Wolfram Sangw.s...@pengutronix.de wrote: diff --git a/Documentation/powerpc/dts-bindings/uio-generic.txt b/Documentation/powerpc/dts-bindings/uio-generic.txt new file mode 100644 index 000..8ad9861 --- /dev/null +++ b/Documentation/powerpc/dts-bindings/uio-generic.txt @@ -0,0 +1,16 @@ +UIO for custom devices + +A device which will be mapped using the UIO subsystem. + +Properties: + - compatible : should contain the specific model used, followed by + generic-uio. + - reg : address range(s) of the device (up to MAX_UIO_MAPS) + - interrupts : interrupt of the device + +Example: + c64f...@0 { + compatible = ptx,c64fpga001, generic-uio; + reg = 0x0 0x1; + interrupts = 0 0 3; + }; Hmmm, I'm not happy about this. The device tree describes the hardware, not the way Linux uses the hardware. UIO definitely falls into the category of Linux implementation detail. Yes, I am aware of that. I just started with the mechanisms which are available today and hoped we could find some compatible-value which will suit all needs. Trouble is a value that suits all needs today probably won't a year from now. :-) This should be approached from the other way around. Either the generic-uio of_platform driver should contain an explicit list of devices to be handled by UIO, Well, that could lead to a quite huge match_table over time. or the OF infrastructure should be modified to allow things like force binding of_devices to of_drivers at runtime. That is an interesting idea. I could imagine something like a 'new_compatible entry in the sysfs-section of the driver similar to 'new_id' for PCI. After writing a new compatible-string into it, matching will triggered again with the new entry added. That could (should?) also be placed at the of-core-level. Or did you have something else in mind? Yeah, that sounds appropriate. g. -- Grant Likely, B.Sc., P.Eng. Secret Lab Technologies Ltd. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 2/2] mtd/maps/mtd-ram: add an of-platform driver
On Tue, Jun 16, 2009 at 3:18 AM, Wolfram Sangw.s...@pengutronix.de wrote: On Mon, Jun 15, 2009 at 07:43:20PM +0200, Albrecht Dreß wrote: Am 05.06.09 14:05 schrieb(en) Wolfram Sang: Create an of-aware driver using the now exported generic functions from plat-ram.c. Also add the documentation for the binding. Partitions are not yet supported. Tested on a phyCORE-MPC5200B-IO. Dumb question: what is the current status of this patch? I ask because I don't know. David wondered if 'of_physmap' could be used, so I wrote what I experienced during development. I can't tell if this helped the case. Grant wondered if we need a bankwidth. IMHO it is needed for now, but I don't know if this is a common agreement. I'm not happy about the use case though. It probably shouldn't appear in this binding, or if it does it should be tagged as an optional property. It is only in the 5200 localplus case that bank-width is needed to figure out how to apply the workaround. g. -- Grant Likely, B.Sc., P.Eng. Secret Lab Technologies Ltd. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 2/2] mtd/maps/mtd-ram: add an of-platform driver
Grant wondered if we need a bankwidth. IMHO it is needed for now, but I don't know if this is a common agreement. I'm not happy about the use case though. It probably shouldn't appear in this binding, or if it does it should be tagged as an optional property. It is only in the 5200 localplus case that bank-width is needed to figure out how to apply the workaround. Maybe there is a misunderstanding here. I am not talking about Albrecht's case. What I replied to your concern is that bankwidth is used(!) in the underlying map-ram-driver in mapram_erase() at the moment. Whether this is really needed could be discussed perhaps, but is beyond the scope of this patch series IMHO. I'd think this can be addressed in a later series, if needed, although this could mean that the binding will change (bank-width becoming optional). Regards, Wolfram -- Pengutronix e.K. | Wolfram Sang| Industrial Linux Solutions | http://www.pengutronix.de/ | signature.asc Description: Digital signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] powerpc: 85xx: Add PHY fixup to socrates board code
On Apr 21, 2009, at 6:24 PM, Anatolij Gustschin wrote: Kumar Gala wrote: On Apr 21, 2009, at 4:54 PM, Kumar Gala wrote: On Apr 21, 2009, at 12:19 PM, Anatolij Gustschin wrote: If the firmware missed to initialize the PHY correctly, Linux may hang up on socrates while eth0/eth1 interface startup (caused by continuous unacknowledged PHY interrupt). This patch adds PHY fixup to socrates platform code to ensure the PHY is pre-initialized correctly. It is needed to be compatible with older firmware. Signed-off-by: Anatolij Gustschin ag...@denx.de --- Changes since first version: use macros instead of register numbers as suggested by Anton Kumar, could you please consider this patch for inclusion into 2.6.30? Thanks! Sorry. I dont think this is board specific and should at a minimum be done in m88e1011_config_init in drivers/net/phy/marvell.c. Not sure how 88E1011 differs from 88E, but I'm wondering if you really want to set config_init for m88e1011 to m88e_config_init - k I got confused by the #'s.. I think we should have a struct in marvell.c for m88e1121 which I'm guessing is the PHY you are using. yes, m88e1121 is correct. In 2.6.30-rc2 there is already a m88e1121 struct in marvell_drivers[] in drivers/net/phy/marvell.c. The reason I'm not doing the m88e1121 pre-init stuff in config_init is as follows: m88e1121 is a multi-PHY device with two PHY ports and each port could signal an interrupt. This PHY device can be pin-strapped to use shared interrupt pin for both PHY ports (as used on socrates board). PHY specific config_init will be called e.g. while eth0 startup in phy_attach() which is called from phy_connect() in drivers/net/phy/ phy_device.c. phy_connect() then calls phy_start_interrupts() which registers the interrupt handler and enables the interrupts for the PHY-port config_init is called for. Now interrupts can be cleared/acknowledged for this PHY-port (eth0 interface), but interrupts from the another PHY-port can not be acknowledged as eth1 was not brought up yet. Placing this fixup in drivers/net/phy/marvell.c as in config_init callback could be done, but this will add more overhead as the fixup routine have to do more work: acquire 'struct mii_bus' pointer and walk through all registered PHYs searching for the PHY which use the same interrupt, then getting the address of this PHY on the bus and disable and clear PHY irqs by writing/reading to/from this PHY, (but only in the case it was not already brought up and has interrupts enabled!) e.g.: struct mii_bus *bus = phydev-bus; int addr; for (addr = 0; addr PHY_MAX_ADDR; addr++) { struct phy_device *phy = bus-phy_map[addr]; if (addr != phydev-addr bus-irq[addr] == phydev-irq (phy-phy_id 0x0ff0) == 0x01410cb0 !(phy-interrupts PHY_INTERRUPT_ENABLED)) { int imask = phy_read(phy, MII_M1011_IMASK); if (imask) { phy_write(phy, 0x12, 0); /* disable */ phy_read(phy, 0x13); /* clear */ } } } All this to allow support for multiple m88e1121 devices. Otherwise, after registering first phy interrupt handler and enabling interrupt pending irq on other PHY port or other PHY device will lock up the board. The fixup in this patch will only be done while mdio bus scan before registering a PHY device. Did we ever resolve this? - k ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
PowerPC PCI DMA issues (prefetch/coherency?)
Hello All, We're developing on a Freescale MPC8272 and are having some nasty problems with PCI bus mastering and data corruption. We have some custom hardware that is bus mastering, reading data from the CPUs memory for it's own use. Most of the time, the data is correct, however occasionally we are seeing data that appears to be from somewhere else in memory (usually memory that has already been read by the PCI device). The problem looks like stale data on the PCI bridge prefetch buffers or a cache coherency problem, but we've been unable to come up with a solution to our problem. It is my understanding that it shouldn't be a cache coherency problem as the CPU cache should be snooped as the data is read from memory. Even if it were an issue, the pci_map_sg* functions should have sorted out any cache coherency issues before the DMA operation started. I've not been able to find anything on the Freescale data sheet that provides any way of flushing the prefetch cache on the PCI bridge. We've done a bit of experimenting, and found that turning off prefetch appears to solve (or possibly mask?) the problem (at the expensive of massive performance problems). I've also tried DMA'ing two adjacent userspace buffers in memory (from the same page), and see corruption on the second buffer. If I populate both buffers, then DMA them both, the data is fine. If I populate the first, DMA the first, then populate the second and DMA the second, corruption occurs at the start of the second buffer. If I add 8-32 bytes of padding between the buffers, the problem goes away. The PCI spec says that the PCI bridge is supposed to flush any data from it's prefetch buffers that are not read by the bus master, so technically, this isn't supposed to happen. I've tried making sure that buffers are cache line (and page) aligned, and are multiples of cache lines, but it's made no difference. PIO mode works fine, and I've checked the data with the CPU just before, and immediately after the DMA and the driver sees no data integrity issues. There are memory write barriers just before the DMA start, so all the registers should be correct before the DMA starts. For background info, the device doing the bus mastering is a Xilinx Virtex 5 FPGA. We've monitored the data as it comes off the PCI bus using ChipScope - so the firmware should not be manipulating the data in any way. We have some hardware/firmware/drivers that has a lot of common code that runs on an x86 platform (as opposed to powerpc), and that works without any issues whatsoever. Has anyone got any ideas what this might be? Does anyone of know issues with PCI bridges on the PowerPC platform? Is there extra things that need to be done from the driver when DMAing on PowerPC (I've looked at other drivers and there's nothing obvious). The chip errata doesn't have anything on it that looks like it could cause this. I'm really hoping this is something that we're doing wrong in the driver or the firmware, but we've been through both the firmware and drivers countless times and are unable to see anything wrong. Any thoughts/ideas would be much appreciated! Regards, Chris -- __ Chris Pringle Software Engineer Miranda Technologies Ltd. Hithercroft Road Wallingford Oxfordshire OX10 9DG UK Tel. +44 1491 820206 Fax. +44 1491 820001 www.miranda.com Miranda Technologies Limited Registered in England and Wales CN 02017053 Registered Office: James House, Mere Park, Dedmere Road, Marlow, Bucks, SL7 1FJ ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] RFC: powerpc: expose the multi-bit ops that underlie single-bit ops.
Thanks for taking the time to look at this Ben, comments inline. Benjamin Herrenschmidt wrote: On Tue, 2009-05-26 at 14:19 -0400, Geoff Thorpe wrote: NOT FOR COMMIT, THIS IS A REQUEST FOR FEEDBACK. The bitops.h functions that operate on a single bit in a bitfield are implemented by operating on the corresponding word location. In all cases the inner logic appears to be valid if the mask being applied has more than one bit set, so this patch exposes those inner operations. Indeed, set_bits() was already available, but it duplicated code from set_bit() (rather than making the latter a wrapper) - it was also missing the PPC405_ERR77() workaround and the volatile address qualifier present in other APIs. This corrects that, and exposes the other multi-bit equivalents. One advantage of these multi-bit forms is that they allow word-sized variables to essentially be their own spinlocks. Hi ! Sorry for the delay, that was on my have a look one of these days, low priority list for a bit too long :-) NP, optimal throughput often requires a compromise in latency :-) NB, the same factoring is possible in asm-generic/bitops/[non-]atomic.h. I would be happy to provide corresponding patches if this approach is deemed appropriate. Feedback would be most welcome. Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com --- arch/powerpc/include/asm/bitops.h | 111 +++-- 1 files changed, 69 insertions(+), 42 deletions(-) diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h index 897eade..72de28c 100644 --- a/arch/powerpc/include/asm/bitops.h +++ b/arch/powerpc/include/asm/bitops.h @@ -56,11 +56,10 @@ #define BITOP_WORD(nr) ((nr) / BITS_PER_LONG) #define BITOP_LE_SWIZZLE((BITS_PER_LONG-1) ~0x7) -static __inline__ void set_bit(int nr, volatile unsigned long *addr) +static __inline__ void set_bits(unsigned long mask, volatile unsigned long *_p) { unsigned long old; -unsigned long mask = BITOP_MASK(nr); -unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); +unsigned long *p = (unsigned long *)_p; __asm__ __volatile__( 1:PPC_LLARX %0,0,%3 # set_bit\n @@ -73,11 +72,16 @@ static __inline__ void set_bit(int nr, volatile unsigned long *addr) : cc ); } -static __inline__ void clear_bit(int nr, volatile unsigned long *addr) +static __inline__ void set_bit(int nr, volatile unsigned long *addr) +{ +set_bits(BITOP_MASK(nr), addr + BITOP_WORD(nr)); +} No objection with the above. +static __inline__ void clear_bits(unsigned long mask, +volatile unsigned long *_p) { unsigned long old; -unsigned long mask = BITOP_MASK(nr); -unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); +unsigned long *p = (unsigned long *)_p; __asm__ __volatile__( 1:PPC_LLARX %0,0,%3 # clear_bit\n @@ -90,11 +94,16 @@ static __inline__ void clear_bit(int nr, volatile unsigned long *addr) : cc ); } -static __inline__ void clear_bit_unlock(int nr, volatile unsigned long *addr) +static __inline__ void clear_bit(int nr, volatile unsigned long *addr) +{ +clear_bits(BITOP_MASK(nr), addr + BITOP_WORD(nr)); +} Looks good too. +static __inline__ void clear_bits_unlock(unsigned long mask, +volatile unsigned long *_p) { unsigned long old; -unsigned long mask = BITOP_MASK(nr); -unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); +unsigned long *p = (unsigned long *)_p; __asm__ __volatile__( LWSYNC_ON_SMP @@ -108,11 +117,16 @@ static __inline__ void clear_bit_unlock(int nr, volatile unsigned long *addr) : cc, memory); } -static __inline__ void change_bit(int nr, volatile unsigned long *addr) +static __inline__ void clear_bit_unlock(int nr, volatile unsigned long *addr) +{ +clear_bits_unlock(BITOP_MASK(nr), addr + BITOP_WORD(nr)); +} I'm not sure it's useful to provide a multi-bit variant of the lock and unlock primitives. Do other archs do ? For clear_bit_unlock(), no they don't appear to. There is a fallback in include/asm-generic though, and I notice that it's used in a few places, eg. drivers/rtc/rtc-dev. OTOH some other archs appear to provide their own test_and_set_bit_lock(), and there's a fallback in includes/asm-generic for that too. Do you see a reason to isolate either of these and not factor out the inner word-based logic? +static __inline__ void change_bits(unsigned long mask, +volatile unsigned long *_p) { unsigned long old; -unsigned long mask = BITOP_MASK(nr); -unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); +unsigned long *p = (unsigned long *)_p; __asm__ __volatile__( 1:PPC_LLARX %0,0,%3 # change_bit\n @@ -125,12 +139,16 @@ static
[PATCH] powerpc: Update Warp defconfig
I forgot to include the defconfig in the last set of patches. So you don't get to use the shiny new LEDS driver on the warp unless you turn it on. Enabling hotplug is also very important since we have moved to udev on the warp. Like most defconfig patches, most of the changes are just the normal kernel changes, not warp specific. If it is too early, and we want to hold off until say rc4 with defconfigs I have no problem with that. I have tried to outline all the changes we specifically made. If anybody sees any bad choices with options, let me know! Cheers, Sean * Enable GPIO LEDS * Enable LED triggers * Move to slub * Enable hotplug * Enable timestamps on printks * Enable UBIFS Signed-off-by: Sean MacLennan smaclen...@pikatech.com --- diff --git a/arch/powerpc/configs/44x/warp_defconfig b/arch/powerpc/configs/44x/warp_defconfig index 3b77f09..787635f 100644 --- a/arch/powerpc/configs/44x/warp_defconfig +++ b/arch/powerpc/configs/44x/warp_defconfig @@ -1,7 +1,7 @@ # # Automatically generated make config: don't edit -# Linux kernel version: 2.6.29-rc2 -# Fri Jan 23 07:57:16 2009 +# Linux kernel version: 2.6.30 +# Tue Jun 9 23:35:36 2009 # # CONFIG_PPC64 is not set @@ -41,6 +41,7 @@ CONFIG_ARCH_HAS_ILOG2_U32=y CONFIG_GENERIC_HWEIGHT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_GENERIC_FIND_NEXT_BIT=y +CONFIG_GENERIC_GPIO=y # CONFIG_ARCH_NO_VIRT_TO_BUS is not set CONFIG_PPC=y CONFIG_EARLY_PRINTK=y @@ -53,10 +54,12 @@ CONFIG_PPC_UDBG_16550=y # CONFIG_GENERIC_TBSYNC is not set CONFIG_AUDIT_ARCH=y CONFIG_GENERIC_BUG=y +CONFIG_DTC=y # CONFIG_DEFAULT_UIMAGE is not set CONFIG_PPC_DCR_NATIVE=y # CONFIG_PPC_DCR_MMIO is not set CONFIG_PPC_DCR=y +CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config # @@ -74,7 +77,17 @@ CONFIG_SYSVIPC_SYSCTL=y # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # CONFIG_AUDIT is not set -# CONFIG_IKCONFIG is not set + +# +# RCU Subsystem +# +CONFIG_CLASSIC_RCU=y +# CONFIG_TREE_RCU is not set +# CONFIG_PREEMPT_RCU is not set +# CONFIG_TREE_RCU_TRACE is not set +# CONFIG_PREEMPT_RCU_TRACE is not set +CONFIG_IKCONFIG=y +CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=14 CONFIG_GROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y @@ -82,27 +95,29 @@ CONFIG_FAIR_GROUP_SCHED=y CONFIG_USER_SCHED=y # CONFIG_CGROUP_SCHED is not set # CONFIG_CGROUPS is not set -CONFIG_SYSFS_DEPRECATED=y -CONFIG_SYSFS_DEPRECATED_V2=y +# CONFIG_SYSFS_DEPRECATED_V2 is not set # CONFIG_RELAY is not set # CONFIG_NAMESPACES is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE= +CONFIG_RD_GZIP=y +# CONFIG_RD_BZIP2 is not set +# CONFIG_RD_LZMA is not set # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y +CONFIG_ANON_INODES=y CONFIG_EMBEDDED=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set # CONFIG_KALLSYMS_EXTRA_PASS is not set -# CONFIG_HOTPLUG is not set +# CONFIG_STRIP_ASM_SYMS is not set +CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y -CONFIG_COMPAT_BRK=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y -CONFIG_ANON_INODES=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y @@ -110,10 +125,13 @@ CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_AIO=y CONFIG_VM_EVENT_COUNTERS=y -CONFIG_SLAB=y -# CONFIG_SLUB is not set +CONFIG_SLUB_DEBUG=y +CONFIG_COMPAT_BRK=y +# CONFIG_SLAB is not set +CONFIG_SLUB=y # CONFIG_SLOB is not set # CONFIG_PROFILING is not set +# CONFIG_MARKERS is not set CONFIG_HAVE_OPROFILE=y # CONFIG_KPROBES is not set CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y @@ -121,6 +139,7 @@ CONFIG_HAVE_IOREMAP_PROT=y CONFIG_HAVE_KPROBES=y CONFIG_HAVE_KRETPROBES=y CONFIG_HAVE_ARCH_TRACEHOOK=y +# CONFIG_SLOW_WORK is not set # CONFIG_HAVE_GENERIC_DMA_COHERENT is not set CONFIG_SLABINFO=y CONFIG_RT_MUTEXES=y @@ -133,7 +152,6 @@ CONFIG_MODULE_UNLOAD=y # CONFIG_MODULE_SRCVERSION_ALL is not set CONFIG_BLOCK=y # CONFIG_LBD is not set -# CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_BLK_DEV_BSG is not set # CONFIG_BLK_DEV_INTEGRITY is not set @@ -149,11 +167,6 @@ CONFIG_DEFAULT_AS=y # CONFIG_DEFAULT_CFQ is not set # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED=anticipatory -CONFIG_CLASSIC_RCU=y -# CONFIG_TREE_RCU is not set -# CONFIG_PREEMPT_RCU is not set -# CONFIG_TREE_RCU_TRACE is not set -# CONFIG_PREEMPT_RCU_TRACE is not set # CONFIG_FREEZER is not set # @@ -173,10 +186,11 @@ CONFIG_WARP=y # CONFIG_ARCHES is not set # CONFIG_CANYONLANDS is not set # CONFIG_GLACIER is not set +# CONFIG_REDWOOD is not set # CONFIG_YOSEMITE is not set # CONFIG_XILINX_VIRTEX440_GENERIC_BOARD is not set # CONFIG_PPC44x_SIMPLE is not set -# CONFIG_PPC4xx_GPIO is not set +CONFIG_PPC4xx_GPIO=y CONFIG_440EP=y CONFIG_IBM440EP_ERR42=y # CONFIG_IPIC is not set @@ -235,9 +249,13 @@ CONFIG_ZONE_DMA_FLAG=1 CONFIG_BOUNCE=y CONFIG_VIRT_TO_BUS=y CONFIG_UNEVICTABLE_LRU=y +CONFIG_HAVE_MLOCK=y +CONFIG_HAVE_MLOCKED_PAGE_BIT=y +CONFIG_STDBINUTILS=y CONFIG_PPC_4K_PAGES=y #
Re: PowerPC PCI DMA issues (prefetch/coherency?)
Hi Chris, The last couple of months I had the 'pleasure' to work with a Xilinx ML510 ATX board which contains a Virtex-5 FXT. In my case I had to set up the plbv46 pci soft-core to function properly with PCI add-on boards and onboard pci devices. I have had a lot of issues including ones with DMA but in the end it worked properly. I'm not sure how in your case the Virtex-5 is wired to the PCI bus but in my case it was connected to the crossbar switch of the PowerPC 440 which is inside the Virtex-5 FXT. Not all Xilinx reference designs were interfaced properly to this (sometimes the MPLB-SPLB connection from the soft-core to the crossbar wasn't connected) and even then the PCI soft-core wasn't configured properly by Xilinx and this MIGHT be similar to your issue. In the explanation of the issue I will use the terminology used by Xilinx. The CPU side is the 'IPIF' side and the and the PCI side is the 'PCI' side. Both domains have their own address domains and hence translation from IPIFtoPCI and PCItoIPIF is needed. From what I understood is that the 'incoming window' (ipif-pci) and outgoing window (pciipif) should use different address ranges else the bridge gets confused. I don't have access to my design right now but by head Xilinx doesn't set C_IPIFBAR2PCIBAR_0 properly in a lot of cases and just sets it to 0. Which means that cpu - ipif (lets say the soft core is accessible from 0xa000 from the cpu) 0xa0001234 translates to 0x1234 on the PCI side but PCI cards expect that when they write low memory addresses that they write to system memory. What should be done is to set C_IPIFBAR2PCIBAR_0 to 0xa000 in this case, so that the incoming window is 0xa000-0x and the outgoing window 0x-0x9fff or just set it half way at 0x8000. I'm not saying this is your issue but it could be it. Regards, Roderick Colenbrander On Tue, Jun 16, 2009 at 4:08 PM, Chris Pringlechris.prin...@oxtel.com wrote: Hello All We're developing on a Freescale MPC8272 and are having some nasty problems with PCI bus mastering and data corruption. We have some custom hardware that is bus mastering, reading data from the CPUs memory for it's own use. Most of the time, the data is correct, however occasionally we are seeing data that appears to be from somewhere else in memory (usually memory that has already been read by the PCI device). The problem looks like stale data on the PCI bridge prefetch buffers or a cache coherency problem, but we've been unable to come up with a solution to our problem. It is my understanding that it shouldn't be a cache coherency problem as the CPU cache should be snooped as the data is read from memory. Even if it were an issue, the pci_map_sg* functions should have sorted out any cache coherency issues before the DMA operation started. I've not been able to find anything on the Freescale data sheet that provides any way of flushing the prefetch cache on the PCI bridge. We've done a bit of experimenting, and found that turning off prefetch appears to solve (or possibly mask?) the problem (at the expensive of massive performance problems). I've also tried DMA'ing two adjacent userspace buffers in memory (from the same page), and see corruption on the second buffer. If I populate both buffers, then DMA them both, the data is fine. If I populate the first, DMA the first, then populate the second and DMA the second, corruption occurs at the start of the second buffer. If I add 8-32 bytes of padding between the buffers, the problem goes away. The PCI spec says that the PCI bridge is supposed to flush any data from it's prefetch buffers that are not read by the bus master, so technically, this isn't supposed to happen. I've tried making sure that buffers are cache line (and page) aligned, and are multiples of cache lines, but it's made no difference. PIO mode works fine, and I've checked the data with the CPU just before, and immediately after the DMA and the driver sees no data integrity issues. There are memory write barriers just before the DMA start, so all the registers should be correct before the DMA starts. For background info, the device doing the bus mastering is a Xilinx Virtex 5 FPGA. We've monitored the data as it comes off the PCI bus using ChipScope - so the firmware should not be manipulating the data in any way. We have some hardware/firmware/drivers that has a lot of common code that runs on an x86 platform (as opposed to powerpc), and that works without any issues whatsoever. Has anyone got any ideas what this might be? Does anyone of know issues with PCI bridges on the PowerPC platform? Is there extra things that need to be done from the driver when DMAing on PowerPC (I've looked at other drivers and there's nothing obvious). The chip errata doesn't have anything on it that looks like it could cause this. I'm really hoping this is something that we're doing wrong in the driver or the firmware, but
Re: [ewg] Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages
Yeah, the notifier code remains untouched as we still do not allow dynamic memory operations _while_ our module is loaded. The patch allows the driver to cope with DMEM operations that happened before the module was loaded, which might result in a non-contiguous memory layout. When the driver registers its global memory region in the system, the memory layout must be considered. We chose the term toleration instead of support to illustrate this. I see. So things just silently broke in some cases when the driver was loaded after operations you didn't tolerate? Anyway, thanks for the explanation. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
MPC83xx watchdog reset board dead lock
Hello, this is a hardware, even board issue, but I hope to find the right target audience here. In our MPC83xx design I would like to prevent dead lock in case where a field upgrade is performed, i.e. NOR Flash is erased or written, and the MPC83xx built-in hardware watchdog triggers. In u-boot the scenario can be easily reproduced by running this command (WARNING, erases some sectors!) on an MPC8313E-RDB: erase_wdg=mw.l 0xe204 0x1007 1;mw.w 0xe20e 0x556c 1;mw.w 0xe20e 0xaa39 1;erase 1:10-30 This sets up the watchdog to reset soonish, then starts erasing NOR sectors. Watchdog triggers and resets - Dead lock. Most MPC8xxx board designs I have seen suffer from this possible dead lock: - NOR Flash is put in erase mode or write mode - Hardware watchdog triggers - HRESET# is asserted by the processor, during which the configuration words are read from NOR Flash. Either HRESET# is not attached to NOR, NOR stays in erase/write mode and invalid words will be read - dead lock or either: HRESET# is attached to NOR reset, NOR is reset, but stays in reset as HRESET# stays asserted. We have been looking at several solutions hardware wise that reset the NOR flash on HRESET# going low, but the processors are stubborn, read the config words only once, than dead lock. I wonder if there are known-working designs for this. Regards, -- Leon ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: MPC83xx watchdog reset board dead lock
Hi Leon, Most MPC8xxx board designs I have seen suffer from this possible dead lock: - NOR Flash is put in erase mode or write mode - Hardware watchdog triggers - HRESET# is asserted by the processor, during which the configuration words are read from NOR Flash. Either HRESET# is not attached to NOR, NOR stays in erase/write mode and invalid words will be read - dead lock or either: HRESET# is attached to NOR reset, NOR is reset, but stays in reset as HRESET# stays asserted. We have been looking at several solutions hardware wise that reset the NOR flash on HRESET# going low, but the processors are stubborn, read the config words only once, than dead lock. I wonder if there are known-working designs for this. What do you do in the case of blank flash on a board? I'm not sure that this will work for you, but it might ... on my boards, the PowerPC connects to an FPGA on the local bus, and the flash is connected to that FPGA (it made routing easier to go through the FPGA). When the board powers up, a small FSM in the FPGA reads from the first word in Flash, and checks it is not blank. If it is blank, the configuration pins are asserted such that the processor uses a hard-coded reset configuration word, and the processor reset is deasserted. If the flash is not blank, the processor is allowed to boot from Flash. An alternative method for firmware updates would be to use the boot from high-mem versus low-mem flag in the RCWs. On a number of the Freescale EVMs, you can have two images in the Flash; one at low mem, and one at high mem. Toggling a dip switch selects the value of the BMS bit in the RCWs, and lets you boot from one image or the other. This is a good trick that avoids having to have a debugger to recover back to a sane image - note that a CPLD delivers the RCWs, not the flash. So in either of these examples, the RCWs source is sometimes not the Flash image ... which requires that feature designed into the original hardware. Those two ideas should get you thinking :) Cheers, Dave ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3] powerpc: add ioremap_early() for mapping IO regions before MMU_init()
On Mon, Jun 15, 2009 at 12:57 AM, Benjamin Herrenschmidtb...@kernel.crashing.org wrote: On Wed, 2009-05-27 at 12:55 -0600, Grant Likely wrote: From: Grant Likely grant.lik...@secretlab.ca ioremap_early() is useful for things like mapping SoC internally registers and early debug output because it allows mappings to devices to be setup early in the boot process where they are needed. It also give a performance boost since BAT mapped registers don't get flushed out of the TLB. Without ioremap_early(), early mappings are set up in an ad-hoc manner and they get lost when the MMU is set up. Drivers then have to perform hacky fixups to transition over to new mappings. Signed-off-by: Grant Likely grant.lik...@secretlab.ca --- My 40x config gives me: /home/benh/linux-powerpc-test/drivers/video/xilinxfb.c:409: warning: ‘dcr_host.base’ may be used uninitialized in this function (warning, I think, was already there, so the patch is going into -next but we may want another one, provided we find a way to shut the idiot up without horrible hacks since that's just gcc being stupid I believe). I'll have the final fix out to you today. g. -- Grant Likely, B.Sc., P.Eng. Secret Lab Technologies Ltd. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: PowerPC PCI DMA issues (prefetch/coherency?)
Chris Pringle wrote: The kernel version is 2.6.26. Firmware is custom on a custom board. Cache coherency on PCI DMA requires that the memory be mapped with the M attribute on this chip, but that should be happening based on detection of the core. I'm not sure where to look to verify this? Check asm/cputable.h for CPU_FTR_NEED_COHERENT. Make sure that CONFIG_8260 is one of the #ifdefs that turns that on. It looks like that was in place by 2.6.26 in arch/powerpc. I'm not sure what to look for in arch/ppc. Also make sure that you park the bus on PCI and raise its arbitration priority, as done at the end of fixup_pci in arch/powerpc/boot/cuboot-pq2.c. Since this is a reasonably recent kernel, Not really, there was a fair amount of 82xx work in the mid-2.6.20s. The addition of CPU_FTR_NEED_COHERENT to 82xx was somewhere in that time. Can you try 2.6.30? I'd guess that both of these things are correct. I've had a quick look in that file and there is code in there raising arbitartion priority and parking the bus. Just because the code is there doesn't mean you're using it -- are you using cuImage? Are you using arch/ppc or arch/powerpc? Typically this would be done by firmware; it's only in cuboot because u-boot wasn't doing it. BTW, you may want to post to linuxppc-dev@lists.ozlabs.org for powerpc-specific issues, especially this kind of hardware issue. I've just posted there :-) Doh, sorry. :-) Interestingly, I've just turned off cache snooping and the problem has got much worse. This has surprised me as I thought that part of the job done by pci_map_sg was to flush the CPU cache It only flushes the cache on hardware that doesn't do coherent DMA. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: PowerPC PCI DMA issues (prefetch/coherency?)
Scott Wood wrote: Check asm/cputable.h for CPU_FTR_NEED_COHERENT. Make sure that CONFIG_8260 is one of the #ifdefs that turns that on. It looks like that was in place by 2.6.26 in arch/powerpc. I'm not sure what to look for in arch/ppc. I've just checked that and it's definitely switched on in CPU_FTR_COMMON (CONFIG_8260 is also being used). Also make sure that you park the bus on PCI and raise its arbitration priority, as done at the end of fixup_pci in arch/powerpc/boot/cuboot-pq2.c. Since this is a reasonably recent kernel, Not really, there was a fair amount of 82xx work in the mid-2.6.20s. The addition of CPU_FTR_NEED_COHERENT to 82xx was somewhere in that time. Can you try 2.6.30? I'll give it a try, but that won't be a quick thing to do - will hopefully manage to get that done tomorrow if it patches without too many issues. I should point out that we've got the low latency patches on this kernel too; I guess it'd be worth trying it without them before I move kernels. I'd guess that both of these things are correct. I've had a quick look in that file and there is code in there raising arbitartion priority and parking the bus. Just because the code is there doesn't mean you're using it -- are you using cuImage? Are you using arch/ppc or arch/powerpc? Typically this would be done by firmware; it's only in cuboot because u-boot wasn't doing it. Just checked this is being called and it is. We're using arch/powerpc. Interestingly, I've just turned off cache snooping and the problem has got much worse. This has surprised me as I thought that part of the job done by pci_map_sg was to flush the CPU cache It only flushes the cache on hardware that doesn't do coherent DMA. Ah right - that would explain what we're seeing then... Doh. Thought I might have been onto something then. Is there any way to force a cache flush? That'd at least prove it was a caching issue if it resolved the problem. Thanks, Chris -- __ Chris Pringle Software Engineer Miranda Technologies Ltd. Hithercroft Road Wallingford Oxfordshire OX10 9DG UK Tel. +44 1491 820206 Fax. +44 1491 820001 www.miranda.com Miranda Technologies Limited Registered in England and Wales CN 02017053 Registered Office: James House, Mere Park, Dedmere Road, Marlow, Bucks, SL7 1FJ ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: MPC83xx watchdog reset board dead lock
Hello, On Tue, Jun 16, 2009 at 6:30 PM, David Hawkinsd...@ovro.caltech.edu wrote: Most MPC8xxx board designs I have seen suffer from this possible dead lock: - NOR Flash is put in erase mode or write mode - Hardware watchdog triggers - HRESET# is asserted by the processor, during which the configuration words are read from NOR Flash. Either HRESET# is not attached to NOR, NOR stays in erase/write mode and invalid words will be read - dead lock or either: HRESET# is attached to NOR reset, NOR is reset, but stays in reset as HRESET# stays asserted. We have been looking at several solutions hardware wise that reset the NOR flash on HRESET# going low, but the processors are stubborn, read the config words only once, than dead lock. I wonder if there are known-working designs for this. What do you do in the case of blank flash on a board? The problem is not with blank flash or firmware upgrades, we know how to handle that. Your solution is (a solution) to a different problem. The problem lies in the fact that board dead lock can occur if the watchdog triggers, for all reference designs I have seen. Thanks for thinking along. I would like to solve the original problem though. BTW, we use CPLD/FPGAs on most of our boards, this one we do not for cost reasons. Regards, -- Leon ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: PowerPC PCI DMA issues (prefetch/coherency?)
Chris Pringle wrote: Ah right - that would explain what we're seeing then... Doh. Thought I might have been onto something then. Is there any way to force a cache flush? That'd at least prove it was a caching issue if it resolved the problem. You could enable CONFIG_NOT_COHERENT_CACHE. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 2/2] mtd/maps/mtd-ram: add an of-platform driver
Am 16.06.09 14:53 schrieb(en) Grant Likely: I'm not happy about the use case though. It probably shouldn't appear in this binding, or if it does it should be tagged as an optional property. I agree with you that the naming is really misleading - other devices which are not mtd's may suffer from the same problem of the lpb (in my case, I have an extra memory-mapped Ethernet chip which I didn't try to access yet...). As it is actually a chip select property, what about defining an optional property like cs-width = (8|16|32) which defaults to 8 and may be added to each 5200 lpb child? It is only in the 5200 localplus case that bank-width is needed to figure out how to apply the workaround. Just out of curiosity: what about the localbus of other Freescale chips (82xx? 83xx? Maybe others?)? Thanks, Albrecht. pgpLbC13BupnH.pgp Description: PGP signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: PowerPC PCI DMA issues (prefetch/coherency?)
Arnd Bergmann wrote: On Tuesday 16 June 2009, Scott Wood wrote: Chris Pringle wrote: Ah right - that would explain what we're seeing then... Doh. Thought I might have been onto something then. Is there any way to force a cache flush? That'd at least prove it was a caching issue if it resolved the problem. You could enable CONFIG_NOT_COHERENT_CACHE. If the whole system is noncoherent, that is the right solution. I meant it more as a test than a permanent solution... If the device is the only one, you can also use dma_alloc_noncoherent() and flush explicitly with dma_cache_sync(). I don't see how that would help -- aren't those also controlled by CONFIG_NOT_COHERENT_CACHE? -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: PowerPC PCI DMA issues (prefetch/coherency?)
On Tuesday 16 June 2009, Scott Wood wrote: If the device is the only one, you can also use dma_alloc_noncoherent() and flush explicitly with dma_cache_sync(). I don't see how that would help -- aren't those also controlled by CONFIG_NOT_COHERENT_CACHE? Ah, yes you are right. PowerPC implements dma_alloc_noncoherent as dma_alloc_coherent, so dma_cache_sync() is actually a NOP (or should be). Actually there seems to be a bug in here: Since dma_alloc_noncoherent gives you a coherent mapping (or NULL) on noncoherent machines, dma_cache_sync() is redundant and should not actually flush the cache, or we should change dma_alloc_noncoherent to do a simple alloc_pages on CONFIG_NON_COHERENT_CACHE and leave dma_cache_sync() as it is. Arnd ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH] fsldma: Add DMA_SLAVE support
Hi Ira, Ira Snyder wrote: Use the DMA_SLAVE capability of the DMAEngine API to copy/from a scatterlist into an arbitrary list of hardware address/length pairs. This allows a single DMA transaction to copy data from several different devices into a scatterlist at the same time. This also adds support to enable some controller-specific features such as external start and external pause of a DMA transaction. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- This is a request for comments on this patch. I hunch it is not quite ready for inclusion, though it is certainly ready for review. Correct functioning of this patch depends on the patches submitted earlier. As suggested by Dan Williams, I implemented DMA_SLAVE support for the fsldma controller to allow me to use the hardware to transfer to/from a scatterlist to a list of hardware address/length pairs. I implemented support for the extra features available in the DMA controller, such as external pause and external start. I have not tested the features yet. I am willing to drop the support if everything else looks good. I have implemented helper functions for creating the list of hardware address/length pairs as static inline functions in the linux/fsldma.h header. Should I incorporate these into the driver itself and use EXPORT_SYMBOL()? I've never done this before :) Using EXPORT_SYMBOL would defeat the purpose of conforming to the dmaengine api which should allow other subsystems to generically discover an fsldma resource. diff --git a/include/linux/fsldma.h b/include/linux/fsldma.h new file mode 100644 index 000..a42dcdd --- /dev/null +++ b/include/linux/fsldma.h @@ -0,0 +1,105 @@ +/* + * Freescale MPC83XX / MPC85XX DMA Controller + * + * Copyright (c) 2009 Ira W. Snyder i...@ovro.caltech.edu + * + * This file is licensed under the terms of the GNU General Public License + * version 2. This program is licensed as is without any warranty of any + * kind, whether express or implied. + */ + +#ifndef __LINUX_FSLDMA_H__ +#define __LINUX_FSLDMA_H__ + +#include linux/dmaengine.h + +/* + * physical hardware address / length pair for use with the + * DMAEngine DMA_SLAVE API + */ +struct fsl_dma_hw_addr { + struct list_head entry; + + dma_addr_t address; + size_t length; +}; Can you explain a bit more why you need the new dma address list, would a struct scatterlist suffice? In general it is difficult to merge new functionality without an in-tree user. Can you share the client of this new api? I suspect you can get away without needing these new helper routines. Have a look at how Haavard implemented his DMA_SLAVE client in drivers/mmc/host/atmel-mci.c. Haavard ended up needing to add some public structure definitions to include/linux, but my preference is to keep this in an architecture/platform specific header file location if possible. Regards, Dan ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: MPC83xx watchdog reset board dead lock
Hi Leon, Most MPC8xxx board designs I have seen suffer from this possible dead lock: - NOR Flash is put in erase mode or write mode - Hardware watchdog triggers - HRESET# is asserted by the processor, during which the configuration words are read from NOR Flash. Either HRESET# is not attached to NOR, NOR stays in erase/write mode and invalid words will be read - dead lock or either: HRESET# is attached to NOR reset, NOR is reset, but stays in reset as HRESET# stays asserted. We have been looking at several solutions hardware wise that reset the NOR flash on HRESET# going low, but the processors are stubborn, read the config words only once, than dead lock. I wonder if there are known-working designs for this. What do you do in the case of blank flash on a board? The problem is not with blank flash or firmware upgrades, we know how to handle that. Your solution is (a solution) to a different problem. The problem lies in the fact that board dead lock can occur if the watchdog triggers, for all reference designs I have seen. Thanks for thinking along. I would like to solve the original problem though. BTW, we use CPLD/FPGAs on most of our boards, this one we do not for cost reasons. So we're talking about the sequence where HRESET# asserts as in say the logic analyzer trace on p34: http://www.ovro.caltech.edu/~dwh/carma_board/powerpc_mpc8349e.pdf the LALE pulse where the processor reads the RCWs occurs very soon after the falling edge of HRESET#. So the Flash needs to be reset to ensure that it is in read-array mode, so that the processor doesn't choke. Since HRESET# is still low, thats no good. A pulse generator that is based on HRESET# might work, but the pulse would have to be long enough to meet any reset requirement of the flash, yet short enough so that the read of the first RCW would be valid. Since the local bus is running really slow at this point, I think that could be done ok. How about a set-reset flip flop that is set on the falling edge of HRESET# and cleared on the rising edge of LALE. That'll produce a decent reset pulse to the flash, and then there is plenty of time for the first access to produce valid data on the bus. Cheers, Dave ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 2/2] mtd/maps/mtd-ram: add an of-platform driver
Okay, fair enough. I wasn't paying very close attention when I replied. It still seems awkward to me, but not enough to object (ie. It's not dangerous). g. On Jun 16, 2009 7:20 AM, Wolfram Sang w.s...@pengutronix.de wrote: Grant wondered if we need a bankwidth. IMHO it is needed for now, but I don't know if this i... Maybe there is a misunderstanding here. I am not talking about Albrecht's case. What I replied to your concern is that bankwidth is used(!) in the underlying map-ram-driver in mapram_erase() at the moment. Whether this is really needed could be discussed perhaps, but is beyond the scope of this patch series IMHO. I'd think this can be addressed in a later series, if needed, although this could mean that the binding will change (bank-width becoming optional). Regards, Wolfram -- Pengutronix e.K. | Wolfram Sang ... -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAko3nAUACgkQD27XaX1/VRtTkACfW0aUMJHrU3m4DCel0pm5fA6J WaQAnjGo5fn6JvMHt3Ke/xFTGB1uYT6p =V9t5 -END PGP SIGNATURE- ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
2.6.30 kernel panic with any network driver
I just merged in 2.6.30 into our tree and I'm seeing this only when I have a network driver built into the kernel, and it doesn't seem to matter which network driver. Maybe I missed something I need to update in our network drivers to match the kernel. Thanks for any help, John console [ttyS0] enabled brd: module loaded loop: module loaded Unable to handle kernel paging request for data at address 0x75686369 Faulting instruction address: 0xc01c3e34 Oops: Kernel access of bad area, sig: 11 [#1] PREEMPT Xilinx Virtex440 Modules linked in: NIP: c01c3e34 LR: c01c2ab4 CTR: REGS: cf831dc0 TRAP: 0300 Not tainted (2.6.30) MSR: 00029000 EE,ME,CE CR: 84284324 XER: 4020 DEAR: 75686369, ESR: TASK = cf82f8a0[1] 'swapper' THREAD: cf83 GPR00: c01c2ab4 cf831e70 cf82f8a0 cf87b3a8 c034e5a4 c01c2a80 0001 GPR08: cfbe69d8 c034dd98 0f2b cf821778 84284342 fd9781f4 f6f6dff7 6ade GPR16: d7f40173 bbd35ff7 3ef7fbff fff5d3f7 0059dac0 006f GPR24: 0001 c034dd98 cf0416e0 cf87b3a0 cf87b3dc c034e5a4 75686369 NIP [c01c3e34] platform_match+0x20/0xa8 LR [c01c2ab4] __driver_attach+0x34/0xa8 Call Trace: [cf831e70] [c027a394] klist_next+0x10c/0x120 (unreliable) [cf831e90] [c01c2ab4] __driver_attach+0x34/0xa8 [cf831eb0] [c01c123c] bus_for_each_dev+0x5c/0x98 [cf831ee0] [c01c243c] driver_attach+0x24/0x34 [cf831ef0] [c01c2050] bus_add_driver+0x18c/0x264 [cf831f10] [c01c2fa8] driver_register+0x6c/0x170 [cf831f30] [c031f8cc] xemaclite_init+0x1c/0x4c [cf831f50] [c00011ac] do_one_initcall+0x34/0x1a0 [cf831fd0] [c0307848] kernel_init+0xa0/0x104 [cf831ff0] [c000e188] kernel_thread+0x4c/0x68 Instruction dump: This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH] fsldma: Add DMA_SLAVE support
On Tue, Jun 16, 2009 at 12:01:40PM -0700, Dan Williams wrote: Hi Ira, Ira Snyder wrote: Use the DMA_SLAVE capability of the DMAEngine API to copy/from a scatterlist into an arbitrary list of hardware address/length pairs. This allows a single DMA transaction to copy data from several different devices into a scatterlist at the same time. This also adds support to enable some controller-specific features such as external start and external pause of a DMA transaction. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- This is a request for comments on this patch. I hunch it is not quite ready for inclusion, though it is certainly ready for review. Correct functioning of this patch depends on the patches submitted earlier. As suggested by Dan Williams, I implemented DMA_SLAVE support for the fsldma controller to allow me to use the hardware to transfer to/from a scatterlist to a list of hardware address/length pairs. I implemented support for the extra features available in the DMA controller, such as external pause and external start. I have not tested the features yet. I am willing to drop the support if everything else looks good. I have implemented helper functions for creating the list of hardware address/length pairs as static inline functions in the linux/fsldma.h header. Should I incorporate these into the driver itself and use EXPORT_SYMBOL()? I've never done this before :) Using EXPORT_SYMBOL would defeat the purpose of conforming to the dmaengine api which should allow other subsystems to generically discover an fsldma resource. Any driver would still use dma_request_channel(), etc. to get access to a DMA channel. AFAICT, DMA_SLAVE is intended for doing something completely hardware-specific with the DMA controller. diff --git a/include/linux/fsldma.h b/include/linux/fsldma.h new file mode 100644 index 000..a42dcdd --- /dev/null +++ b/include/linux/fsldma.h @@ -0,0 +1,105 @@ +/* + * Freescale MPC83XX / MPC85XX DMA Controller + * + * Copyright (c) 2009 Ira W. Snyder i...@ovro.caltech.edu + * + * This file is licensed under the terms of the GNU General Public License + * version 2. This program is licensed as is without any warranty of any + * kind, whether express or implied. + */ + +#ifndef __LINUX_FSLDMA_H__ +#define __LINUX_FSLDMA_H__ + +#include linux/dmaengine.h + +/* + * physical hardware address / length pair for use with the + * DMAEngine DMA_SLAVE API + */ +struct fsl_dma_hw_addr { + struct list_head entry; + + dma_addr_t address; + size_t length; +}; Can you explain a bit more why you need the new dma address list, would a struct scatterlist suffice? I don't believe so. A scatterlist only holds page/length pairs. How would you pass an arbitrary dma_addr_t/length pair in a scatterlist. I /could/ abuse sg_dma_address() and do something like the following, but I think you'd be even less inclined to take the patch: struct scatterlist sg[10]; sg_dma_address(sg) = addr1; sg_dma_len(sg) = len1; sg++; sg_dma_address(sg) = addr2; sg_dma_len(sg) = len2; /* and so on */ This would mean that there is a scatterlist with the struct page pointers set to NULL, which has not had dma_map_sg() run on it. Seems like abuse to me. What I've implemented is this: (sorry about the poor drawing) scatterlist fsl_dma_hw_addr +++---+ | DATA | | DEST1 | | DATA | + +---+ | DATA | | | DATA | | +---+ | DATA | +--- | DEST2 | | DATA |+---+ ++ . . . Of course, the reverse works as well. You can copy from a list of hardware address/length pairs to a scatterlist. So, using my implementation of the DMA_SLAVE feature, you can take a big chunk of data (which is organized into a scatterlist) and DMA it directly to a set of hardware addresses, all in a single, unbroken transaction. I've got an FPGA programmer which needs a ~12MB image dumped to a FIFO at 0xf0003000 in 4K chunks (all writes must be in the 0xf0003000 to 0xf0004000 range). The programmer is actually in control of the DMA controller at that time. Internally, the FPGA programmer does some toggling of pins, etc. which is needed to actually push the image into the FPGA's themselves. In general it is difficult to merge new functionality without an in-tree user. Can you share the client of this new api? I've inlined the driver for the FPGA programmer below. I don't think it is appropriate to push into mainline, since it will only work for our board, and nothing else. It is pretty simple, but I'm totally open to suggestions for changes. I used a char device to fill in a scatterlist, then set up the DMA to 0xf0003000 in 4K chunks. I've got another driver that uses the interface, but this one is a bit simpler. I can post the other one if you'd like, as well. I
[PATCH RFC 1/2] Makefile: Never use -fno-omit-frame-pointer
According to Segher Boessenkool and GCC manual, -fomit-frame-pointer is only the default when optimising on archs/ABIs where it doesn't hinder debugging and -pg. So, we do not get it by default on x86, not at any optimisation level. On the other hand, *using* -fno-omit-frame-pointer causes gcc to produce buggy code on PowerPC targets. If Segher and GCC manual are right, this patch should be a no-op for all arches except PowerPC, where the patch fixes gcc issues. Signed-off-by: Anton Vorontsov avoront...@ru.mvista.com --- See this thread for more discussion: http://osdir.com/ml/linux-kernel/2009-05/msg01754.html p.s. Obviously, I didn't test this patch on anything else but PPC32. ;-) Segher, do you know if all GCC versions that we support for building Linux are behaving the way that GCC manual describe? Thanks, Makefile |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Makefile b/Makefile index ea63667..70ad1ff 100644 --- a/Makefile +++ b/Makefile @@ -535,7 +535,7 @@ KBUILD_CFLAGS += $(call cc-option, -fno-stack-protector) endif ifdef CONFIG_FRAME_POINTER -KBUILD_CFLAGS += -fno-omit-frame-pointer -fno-optimize-sibling-calls +KBUILD_CFLAGS += -fno-optimize-sibling-calls else KBUILD_CFLAGS += -fomit-frame-pointer endif -- 1.6.3.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH RFC 2/2] powerpc: Remove -fno-omit-frame-pointer workarounds
The workarounds aren't needed any longer since the top level Makefile doesn't pass -fno-omit-frame-pointer cflag for PowerPC builds. Signed-off-by: Anton Vorontsov avoront...@ru.mvista.com --- arch/powerpc/Makefile|5 - arch/powerpc/kernel/Makefile | 12 ++-- arch/powerpc/platforms/powermac/Makefile |2 +- lib/Kconfig.debug|6 +++--- 4 files changed, 10 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index bc35f4e..dd3c63b 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -120,11 +120,6 @@ ifeq ($(CONFIG_6xx),y) KBUILD_CFLAGS += -mcpu=powerpc endif -# Work around a gcc code-gen bug with -fno-omit-frame-pointer. -ifeq ($(CONFIG_FUNCTION_TRACER),y) -KBUILD_CFLAGS += -mno-sched-epilog -endif - cpu-as-$(CONFIG_4xx) += -Wa,-m405 cpu-as-$(CONFIG_6xx) += -Wa,-maltivec cpu-as-$(CONFIG_POWER4)+= -Wa,-maltivec diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 612b0c4..19ebe1b 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -14,14 +14,14 @@ endif ifdef CONFIG_FUNCTION_TRACER # Do not trace early boot code -CFLAGS_REMOVE_cputable.o = -pg -mno-sched-epilog -CFLAGS_REMOVE_prom_init.o = -pg -mno-sched-epilog -CFLAGS_REMOVE_btext.o = -pg -mno-sched-epilog -CFLAGS_REMOVE_prom.o = -pg -mno-sched-epilog +CFLAGS_REMOVE_cputable.o = -pg +CFLAGS_REMOVE_prom_init.o = -pg +CFLAGS_REMOVE_btext.o = -pg +CFLAGS_REMOVE_prom.o = -pg # do not trace tracer code -CFLAGS_REMOVE_ftrace.o = -pg -mno-sched-epilog +CFLAGS_REMOVE_ftrace.o = -pg # timers used by tracing -CFLAGS_REMOVE_time.o = -pg -mno-sched-epilog +CFLAGS_REMOVE_time.o = -pg endif obj-y := cputable.o ptrace.o syscalls.o \ diff --git a/arch/powerpc/platforms/powermac/Makefile b/arch/powerpc/platforms/powermac/Makefile index 50f1693..0eb8781 100644 --- a/arch/powerpc/platforms/powermac/Makefile +++ b/arch/powerpc/platforms/powermac/Makefile @@ -2,7 +2,7 @@ CFLAGS_bootx_init.o += -fPIC ifdef CONFIG_FUNCTION_TRACER # Do not trace early boot code -CFLAGS_REMOVE_bootx_init.o = -pg -mno-sched-epilog +CFLAGS_REMOVE_bootx_init.o = -pg endif obj-y += pic.o setup.o time.o feature.o pci.o \ diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 39f113a..1479e54 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -492,7 +492,7 @@ config LOCKDEP bool depends on DEBUG_KERNEL TRACE_IRQFLAGS_SUPPORT STACKTRACE_SUPPORT LOCKDEP_SUPPORT select STACKTRACE - select FRAME_POINTER if !MIPS !PPC !ARM_UNWIND !S390 + select FRAME_POINTER if !MIPS !ARM_UNWIND !S390 select KALLSYMS select KALLSYMS_ALL @@ -861,13 +861,13 @@ config FAULT_INJECTION_STACKTRACE_FILTER depends on FAULT_INJECTION_DEBUG_FS STACKTRACE_SUPPORT depends on !X86_64 select STACKTRACE - select FRAME_POINTER if !PPC !S390 + select FRAME_POINTER if !S390 help Provide stacktrace filter for fault-injection capabilities config LATENCYTOP bool Latency measuring infrastructure - select FRAME_POINTER if !MIPS !PPC !S390 + select FRAME_POINTER if !MIPS !S390 select KALLSYMS select KALLSYMS_ALL select STACKTRACE -- 1.6.3.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: 2.6.30 kernel panic with any network driver
John Linn wrote: console [ttyS0] enabled brd: module loaded loop: module loaded Unable to handle kernel paging request for data at address 0x75686369 That address spells uhci, FWIW. Maybe a bad pointer cast somewhere in the USB platform data? -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] RFC: powerpc: expose the multi-bit ops that underlie single-bit ops.
On Tue, 2009-06-16 at 10:28 -0400, Geoff Thorpe wrote: Hi ! Sorry for the delay, that was on my have a look one of these days, low priority list for a bit too long :-) NP, optimal throughput often requires a compromise in latency :-) Hehehe, so true :-) I'm not sure it's useful to provide a multi-bit variant of the lock and unlock primitives. Do other archs do ? For clear_bit_unlock(), no they don't appear to. There is a fallback in include/asm-generic though, and I notice that it's used in a few places, eg. drivers/rtc/rtc-dev. OTOH some other archs appear to provide their own test_and_set_bit_lock(), and there's a fallback in includes/asm-generic for that too. Do you see a reason to isolate either of these and not factor out the inner word-based logic? Don't bother, especially if we are using a macro to generate them, we may as well make them all look the same. .../... Yup, believe it or not, I saw this coming but didn't have the guts to try proposing something like it up-front (in particular, I was wary of botching some subtleties in the assembly). Fair enough :-) Maybe we can shrink that file significantly (and avoid the risk for typos etc...) by generating them all from a macro. Something like (typed directly into the mailer :-) #define DEFINE_BITOP(op, prefix, postfix) \ asm volatile (\ prefix\ 1:PPC_LLARX %0,0,%3\n \ __stringify(op) %1,%0,%2\n \ PPC405_ERR77(0,%3)\ PPC_STLCX %1,0,%3\n \ bne- 1b\n \ postfix \ : =r (old), =r (t) : r (mask), r (p) : cc, memory) and so: static inline void set_bits(unsigned long mask, volatile unsigned long *addr) { unsigned long old, t; DEFINE_BITOP(or, , ); } static inline void test_and_set_bits(unsigned long mask, volatile unsigned long *addr) { unsigned long old, t; DEFINE_BITOP(or, LWSYNC_ON_SMP, ISYNC_ON_SMP); return (old mask) != 0; } etc... Sounds good, I'll try working this up and I'll send a new patch shortly. You can also go totally mad and generate the whole function (both -s and non -s variants) from one macro but I wouldn't go that far :-) So can I assume implicitly that changing the set_bits() function to add the 'volatile' qualifier to the prototype (and the missing PPC405_ERR77() workaround) was OK? The PPC405_ERR77 workaround is definitely needed. The volatile, well, I suspect it's useless, but it will remove warnings when callers call these on something that is declared as volatile in the first place. Do x86 use volatile there ? If not, then don't do it on powerpc neither, it could well be an historical remain. It's not functionally useful, the memory clobber in the asm takes care of telling the compiler not to mess around I believe. Also - any opinion on whether the same re-factoring of the asm-generic versions should be undertaken? I'm not looking to bite off more than I can chew, but I don't know if it's frowned upon to make powerpc-only extensions to the API. And if you think an asm-generic patch makes sense, could that be taken via linuxppc-dev too or does it need to go elsewhere? I'm not people care -that- much :-) You can always try and post it to lkml (maybe linux-arch too) and see what comes back... but let's finish the powerpc variant first :-) Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] lib: Provide generic atomic64_t implementation
On Sun, Jun 14, 2009 at 04:04:36PM +0300, Avi Kivity wrote: Paul Mackerras wrote: Avi Kivity writes: An alternative implementation using 64-bit cmpxchg will recover most of the costs of hashed spinlocks. I assume most serious 32-bit architectures have them? Have a 64-bit cmpxchg, you mean? x86 is the only one I know of, and it already has an atomic64_t implementation using cmpxchg8b (or whatever it's called). Yes (and it is cmpxchg8b). I'm surprised powerpc doesn't have DCAS support. Well, s390 and m68k have the equivalent (although I don't think Linux suppiorts SMP m68k, although some dual 68040/68060 boards have existed). But 32 bit PPC will never have it. It just does not fit in the architecture since integer loads and stores are limited to 32 bit (or split into 32 bit chunks). Besides that there is no instruction that performs a read-modify-write of memory. This would make the LSU much more complex for a corner case. Hey, Intel also botched the first implementation of cmpxchg8b on the Pentium: the (in)famous f00f bug is actually lock cmpxchg8b with a register operand. Now for these counters, other solutions could be considered, like using the most significant bit as a lock and having only 63 usable bits (when counting ns, this overflows at 292 years). My thinking is that the 32-bit non-x86 architectures will be mostly UP, so the overhead is just an interrupt enable/restore. Those that are SMP I would expect to be small SMP -- mostly just 2 cpus and maybe a few 4-way systems. The new Nehalems provide 8 logical threads in a single socket. All those threads share a cache, and they have cmpxchg8b anyway, so this won't matter. The problem is not Nehalem (who wants to run 32 bit kernels on a Nehalem anyway) or x86. The problem is that the assumption that the largest PPC32 SMP are 4 way may be outdated: http://www.freescale.com/webapp/sps/site/prod_summary.jsp?fastpreview=1code=P4080 and some products including that processor have been announced (I don't know whether they are shipping or not) and (apparently) run Linux. Gabriel ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/2] [PATCH 2/2 v2] qe: add polling timeout to qe_issue_cmd()
On May 26, 2009, at 10:21 AM, Timur Tabi wrote: The qe_issue_cmd() function (Freescale PowerPC QUICC Engine library) polls on a register until a status bit changes, but does not include a timeout to handle the situation if the bit never changes. Change the code to use the new spin_event_timeout() macro, which simplifies polling on a register without a timeout. Signed-off-by: Timur Tabi ti...@freescale.com --- This patch depends on my previous patch, powerpc: introduce macro spin_event_timeout(). arch/powerpc/sysdev/qe_lib/qe.c |9 ++--- 1 files changed, 6 insertions(+), 3 deletions(-) applied to next - k ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: PowerPC PCI DMA issues (prefetch/coherency?)
On Tue, 2009-06-16 at 20:02 +0200, Arnd Bergmann wrote: On Tuesday 16 June 2009, Scott Wood wrote: If the device is the only one, you can also use dma_alloc_noncoherent() and flush explicitly with dma_cache_sync(). I don't see how that would help -- aren't those also controlled by CONFIG_NOT_COHERENT_CACHE? Ah, yes you are right. PowerPC implements dma_alloc_noncoherent as dma_alloc_coherent, so dma_cache_sync() is actually a NOP (or should be). But we still need to sync the result of dma_map_* when used multiple times for a single mapping. Cheers, Ben. Actually there seems to be a bug in here: Since dma_alloc_noncoherent gives you a coherent mapping (or NULL) on noncoherent machines, dma_cache_sync() is redundant and should not actually flush the cache, or we should change dma_alloc_noncoherent to do a simple alloc_pages on CONFIG_NON_COHERENT_CACHE and leave dma_cache_sync() as it is. Arnd ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: PowerPC PCI DMA issues (prefetch/coherency?)
On Wed, 17 Jun 2009 10:18:45 +1000 Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Tue, 2009-06-16 at 20:02 +0200, Arnd Bergmann wrote: On Tuesday 16 June 2009, Scott Wood wrote: If the device is the only one, you can also use dma_alloc_noncoherent() and flush explicitly with dma_cache_sync(). I don't see how that would help -- aren't those also controlled by CONFIG_NOT_COHERENT_CACHE? Ah, yes you are right. PowerPC implements dma_alloc_noncoherent as dma_alloc_coherent, so dma_cache_sync() is actually a NOP (or should be). But we still need to sync the result of dma_map_* when used multiple times for a single mapping. We have dma_sync_{single|sg}_for_{cpu|device} API for the above purpose. dma_cache_sync is supposed to be used only with the buffers that dma_alloc_noncoherent() returns. On architecutures that maps dma_alloc_noncoherent to dma_alloc_coherent, dma_cache_sync() is supposed to be NOP. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: PowerPC PCI DMA issues (prefetch/coherency?)
Hello all, On Wed, Jun 17, 2009 at 2:37 AM, FUJITA Tomonorifujita.tomon...@lab.ntt.co.jp wrote: On Wed, 17 Jun 2009 10:18:45 +1000 Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Tue, 2009-06-16 at 20:02 +0200, Arnd Bergmann wrote: On Tuesday 16 June 2009, Scott Wood wrote: If the device is the only one, you can also use dma_alloc_noncoherent() and flush explicitly with dma_cache_sync(). I don't see how that would help -- aren't those also controlled by CONFIG_NOT_COHERENT_CACHE? Ah, yes you are right. PowerPC implements dma_alloc_noncoherent as dma_alloc_coherent, so dma_cache_sync() is actually a NOP (or should be). But we still need to sync the result of dma_map_* when used multiple times for a single mapping. We have dma_sync_{single|sg}_for_{cpu|device} API for the above purpose. dma_cache_sync is supposed to be used only with the buffers that dma_alloc_noncoherent() returns. On architecutures that maps dma_alloc_noncoherent to dma_alloc_coherent, dma_cache_sync() is supposed to be NOP. This discussion raised some doubt with me about my use case: I my case (note I am not the poster) I am using (what LDD3 calls) streaming mappings: I use pci_map_sg(), have the device perform either DMA master reads or writes to the bus address using PCIe. After that, I use pci_unmap_sg(). My assumption is that pci_unmap_sg() either makes the cache coherent or invalidated and thus I do not need to take further actions. This is on a MPC83xx or 85xx system. Is this assumption correct? Regards, Leon/ -- Leon ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] RFC: powerpc: expose the multi-bit ops that underlie single-bit ops.
On Wed, 17 Jun 2009 07:33:46 +1000 Benjamin Herrenschmidt b...@kernel.crashing.org wrote: You can also go totally mad and generate the whole function (both -s and non -s variants) from one macro but I wouldn't go that far :-) Please don't (unless you keep the function names intact), it makes finding the function with grep almost impossible ... Also - any opinion on whether the same re-factoring of the asm-generic versions should be undertaken? I'm not looking to bite off more than I can chew, but I don't know if it's frowned upon to make powerpc-only extensions to the API. And if you think an asm-generic patch makes sense, could that be taken via linuxppc-dev too or does it need to go elsewhere? I'm not people care -that- much :-) You can always try and post it to lkml (maybe linux-arch too) and see what comes back... but let's finish the powerpc variant first :-) Also, we have an asm-generic maintainer now ... -- Cheers, Stephen Rothwells...@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ pgpNr9zFd9saM.pgp Description: PGP signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: PowerPC PCI DMA issues (prefetch/coherency?)
On Wed, 2009-06-17 at 09:37 +0900, FUJITA Tomonori wrote: dma_cache_sync is supposed to be used only with the buffers that dma_alloc_noncoherent() returns. On architecutures that maps dma_alloc_noncoherent to dma_alloc_coherent, dma_cache_sync() is supposed to be NOP. Or at least a sync() on powerpc but yeah, I see. We should probably do that. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Warning in block code
Hoy ! I see that: block/blk-settings.c: In function ‘blk_set_default_limits’: block/blk-settings.c:115: warning: large integer implicitly truncated to unsigned type Comes from lim-bounce_pfn = BLK_BOUNCE_ANY; With BLK_BOUNCE_ANY being a include/linux/blkdev.h:#define BLK_BOUNCE_ANY (-1ULL) And struct queue_limit.bounce_pfn is: unsigned long bounce_pfn; (The warning is fishy as both quantities are unsigned, but there -is- truncation happening here). Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/2] powerpc/of: Fix usage of dev_set_name() in of_device_alloc()
dev_set_name() takes a format string, so use it properly and avoid a warning with recent gcc's Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- arch/powerpc/kernel/of_device.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- linux-work.orig/arch/powerpc/kernel/of_device.c 2009-06-17 11:34:54.0 +1000 +++ linux-work/arch/powerpc/kernel/of_device.c 2009-06-17 11:35:04.0 +1000 @@ -76,7 +76,7 @@ struct of_device *of_device_alloc(struct dev-dev.archdata.of_node = np; if (bus_id) - dev_set_name(dev-dev, bus_id); + dev_set_name(dev-dev, %s, bus_id); else of_device_make_bus_id(dev); ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/2] powerpc/440: Fix warning early debug code
The function udbg_44x_as1_flush() has the wrong prototype causing a warning when enabling 440 early debug. Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- arch/powerpc/kernel/udbg_16550.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- linux-work.orig/arch/powerpc/kernel/udbg_16550.c2009-06-17 11:52:05.0 +1000 +++ linux-work/arch/powerpc/kernel/udbg_16550.c 2009-06-17 11:52:08.0 +1000 @@ -219,7 +219,7 @@ void udbg_init_pas_realmode(void) #ifdef CONFIG_PPC_EARLY_DEBUG_44x #include platforms/44x/44x.h -static int udbg_44x_as1_flush(void) +static void udbg_44x_as1_flush(void) { if (udbg_comport) { while ((as1_readb(udbg_comport-lsr) LSR_THRE) == 0) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Update Warp defconfig
On Tue, 16 Jun 2009 11:24:35 -0400 Sean MacLennan smaclen...@pikatech.com wrote: I forgot to include the defconfig in the last set of patches. So you don't get to use the shiny new LEDS driver on the warp unless you turn it on. Enabling hotplug is also very important since we have moved to udev on the warp. Like most defconfig patches, most of the changes are just the normal kernel changes, not warp specific. If it is too early, and we want to hold off until say rc4 with defconfigs I have no problem with that. I have tried to outline all the changes we specifically made. If anybody sees any bad choices with options, let me know! Cheers, Sean * Enable GPIO LEDS * Enable LED triggers * Move to slub * Enable hotplug * Enable timestamps on printks * Enable UBIFS Oops, missed an important option in the list: * Enable in kernel config If you use Pika's development environment (PADS), you need this enabled. We use the config to enable certain features, mainly udev. Without this config, we will assume you have an old Pika kernel without new features. Cheers, Sean ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/3] powerpc/rtas: Turn rtas lock into a raw spinlock
RTAS currently uses a normal spinlock. However it can be called from contexts where this is not necessarily a good idea. For example, it can be called while syncing timebases, with the core timebase being frozen. Unfortunately, that will deadlock in case of lock contention when spinlock debugging is enabled as the spin lock debugging code will try to use __delay() which ... relies on the timebase being enabled. Also RTAS can be used in some low level IRQ handling code path so it may as well be a raw spinlock for -rt sake. Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- arch/powerpc/include/asm/rtas.h |2 +- arch/powerpc/kernel/rtas.c | 38 +- 2 files changed, 30 insertions(+), 10 deletions(-) --- linux-work.orig/arch/powerpc/include/asm/rtas.h 2009-06-10 16:09:36.0 +1000 +++ linux-work/arch/powerpc/include/asm/rtas.h 2009-06-10 16:40:50.0 +1000 @@ -58,7 +58,7 @@ struct rtas_t { unsigned long entry;/* physical address pointer */ unsigned long base; /* physical address pointer */ unsigned long size; - spinlock_t lock; + raw_spinlock_t lock; struct rtas_args args; struct device_node *dev;/* virtual address pointer */ }; Index: linux-work/arch/powerpc/kernel/rtas.c === --- linux-work.orig/arch/powerpc/kernel/rtas.c 2009-06-10 16:09:36.0 +1000 +++ linux-work/arch/powerpc/kernel/rtas.c 2009-06-10 16:40:50.0 +1000 @@ -40,7 +40,7 @@ #include asm/atomic.h struct rtas_t rtas = { - .lock = SPIN_LOCK_UNLOCKED + .lock = __RAW_SPIN_LOCK_UNLOCKED }; EXPORT_SYMBOL(rtas); @@ -67,6 +67,28 @@ unsigned long rtas_rmo_buf; void (*rtas_flash_term_hook)(int); EXPORT_SYMBOL(rtas_flash_term_hook); +/* RTAS use home made raw locking instead of spin_lock_irqsave + * because those can be called from within really nasty contexts + * such as having the timebase stopped which would lockup with + * normal locks and spinlock debugging enabled + */ +static unsigned long lock_rtas(void) +{ + unsigned long flags; + + local_irq_save(flags); + preempt_disable(); + __raw_spin_lock_flags(rtas.lock, flags); + return flags; +} + +static void unlock_rtas(unsigned long flags) +{ + __raw_spin_unlock(rtas.lock); + local_irq_restore(flags); + preempt_enable(); +} + /* * call_rtas_display_status and call_rtas_display_status_delay * are designed only for very early low-level debugging, which @@ -79,7 +101,7 @@ static void call_rtas_display_status(cha if (!rtas.base) return; - spin_lock_irqsave(rtas.lock, s); + s = lock_rtas(); args-token = 10; args-nargs = 1; @@ -89,7 +111,7 @@ static void call_rtas_display_status(cha enter_rtas(__pa(args)); - spin_unlock_irqrestore(rtas.lock, s); + unlock_rtas(s); } static void call_rtas_display_status_delay(char c) @@ -411,8 +433,7 @@ int rtas_call(int token, int nargs, int if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE) return -1; - /* Gotta do something different here, use global lock for now... */ - spin_lock_irqsave(rtas.lock, s); + s = lock_rtas(); rtas_args = rtas.args; rtas_args-token = token; @@ -439,8 +460,7 @@ int rtas_call(int token, int nargs, int outputs[i] = rtas_args-rets[i+1]; ret = (nret 0)? rtas_args-rets[0]: 0; - /* Gotta do something different here, use global lock for now... */ - spin_unlock_irqrestore(rtas.lock, s); + unlock_rtas(s); if (buff_copy) { log_error(buff_copy, ERR_TYPE_RTAS_LOG, 0); @@ -837,7 +857,7 @@ asmlinkage int ppc_rtas(struct rtas_args buff_copy = get_errorlog_buffer(); - spin_lock_irqsave(rtas.lock, flags); + flags = lock_rtas(); rtas.args = args; enter_rtas(__pa(rtas.args)); @@ -848,7 +868,7 @@ asmlinkage int ppc_rtas(struct rtas_args if (args.rets[0] == -1) errbuf = __fetch_rtas_last_error(buff_copy); - spin_unlock_irqrestore(rtas.lock, flags); + unlock_rtas(flags); if (buff_copy) { if (errbuf) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/3] powerpc: Use one common impl. of RTAS timebase sync and use raw spinlock
Several platforms use their own copy of what is essentially the same code, using RTAS to synchronize the timebases when bringing up new CPUs. This moves it all into a single common implementation and additionally turns the spinlock into a raw spinlock since the former can rely on the timebase not being frozen when spinlock debugging is enabled, and finally masks interrupts while the timebase is disabled. Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- arch/powerpc/include/asm/rtas.h |3 +++ arch/powerpc/kernel/rtas.c | 31 +++ arch/powerpc/platforms/cell/smp.c| 30 ++ arch/powerpc/platforms/chrp/smp.c| 30 ++ arch/powerpc/platforms/pseries/smp.c | 30 ++ 5 files changed, 40 insertions(+), 84 deletions(-) --- linux-work.orig/arch/powerpc/platforms/cell/smp.c 2009-06-17 11:56:31.0 +1000 +++ linux-work/arch/powerpc/platforms/cell/smp.c2009-06-17 11:56:42.0 +1000 @@ -36,7 +36,6 @@ #include asm/prom.h #include asm/smp.h #include asm/paca.h -#include asm/time.h #include asm/machdep.h #include asm/cputable.h #include asm/firmware.h @@ -140,31 +139,6 @@ static void __devinit smp_cell_setup_cpu mtspr(SPRN_DABRX, DABRX_KERNEL | DABRX_USER); } -static DEFINE_SPINLOCK(timebase_lock); -static unsigned long timebase = 0; - -static void __devinit cell_give_timebase(void) -{ - spin_lock(timebase_lock); - rtas_call(rtas_token(freeze-time-base), 0, 1, NULL); - timebase = get_tb(); - spin_unlock(timebase_lock); - - while (timebase) - barrier(); - rtas_call(rtas_token(thaw-time-base), 0, 1, NULL); -} - -static void __devinit cell_take_timebase(void) -{ - while (!timebase) - barrier(); - spin_lock(timebase_lock); - set_tb(timebase 32, timebase 0x); - timebase = 0; - spin_unlock(timebase_lock); -} - static void __devinit smp_cell_kick_cpu(int nr) { BUG_ON(nr 0 || nr = NR_CPUS); @@ -224,8 +198,8 @@ void __init smp_init_cell(void) /* Non-lpar has additional take/give timebase */ if (rtas_token(freeze-time-base) != RTAS_UNKNOWN_SERVICE) { - smp_ops-give_timebase = cell_give_timebase; - smp_ops-take_timebase = cell_take_timebase; + smp_ops-give_timebase = rtas_give_timebase; + smp_ops-take_timebase = rtas_take_timebase; } DBG( - smp_init_cell()\n); Index: linux-work/arch/powerpc/platforms/chrp/smp.c === --- linux-work.orig/arch/powerpc/platforms/chrp/smp.c 2009-06-17 11:56:31.0 +1000 +++ linux-work/arch/powerpc/platforms/chrp/smp.c2009-06-17 11:56:42.0 +1000 @@ -26,7 +26,6 @@ #include asm/io.h #include asm/prom.h #include asm/smp.h -#include asm/time.h #include asm/machdep.h #include asm/mpic.h #include asm/rtas.h @@ -45,37 +44,12 @@ static void __devinit smp_chrp_setup_cpu static DEFINE_SPINLOCK(timebase_lock); static unsigned int timebase_upper = 0, timebase_lower = 0; -void __devinit smp_chrp_give_timebase(void) -{ - spin_lock(timebase_lock); - rtas_call(rtas_token(freeze-time-base), 0, 1, NULL); - timebase_upper = get_tbu(); - timebase_lower = get_tbl(); - spin_unlock(timebase_lock); - - while (timebase_upper || timebase_lower) - barrier(); - rtas_call(rtas_token(thaw-time-base), 0, 1, NULL); -} - -void __devinit smp_chrp_take_timebase(void) -{ - while (!(timebase_upper || timebase_lower)) - barrier(); - spin_lock(timebase_lock); - set_tb(timebase_upper, timebase_lower); - timebase_upper = 0; - timebase_lower = 0; - spin_unlock(timebase_lock); - printk(CPU %i taken timebase\n, smp_processor_id()); -} - /* CHRP with openpic */ struct smp_ops_t chrp_smp_ops = { .message_pass = smp_mpic_message_pass, .probe = smp_mpic_probe, .kick_cpu = smp_chrp_kick_cpu, .setup_cpu = smp_chrp_setup_cpu, - .give_timebase = smp_chrp_give_timebase, - .take_timebase = smp_chrp_take_timebase, + .give_timebase = rtas_give_timebase, + .take_timebase = rtas_take_timebase, }; Index: linux-work/arch/powerpc/platforms/pseries/smp.c === --- linux-work.orig/arch/powerpc/platforms/pseries/smp.c2009-06-17 11:56:31.0 +1000 +++ linux-work/arch/powerpc/platforms/pseries/smp.c 2009-06-17 11:56:42.0 +1000 @@ -35,7 +35,6 @@ #include asm/prom.h #include asm/smp.h #include asm/paca.h -#include asm/time.h #include asm/machdep.h #include asm/cputable.h #include asm/firmware.h @@ -118,31 +117,6 @@ static void __devinit smp_xics_setup_cpu } #endif /* CONFIG_XICS */ -static
[PATCH 3/3] powerpc/pasemi: Use raw spinlock in SMP TB sync
spin_lock() can hang if called while the timebase is frozen, so use a raw lock instead, also disable interrupts while at it. Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- arch/powerpc/platforms/pasemi/setup.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) --- linux-work.orig/arch/powerpc/platforms/pasemi/setup.c 2009-06-17 12:13:33.0 +1000 +++ linux-work/arch/powerpc/platforms/pasemi/setup.c2009-06-17 12:15:01.0 +1000 @@ -71,20 +71,25 @@ static void pas_restart(char *cmd) } #ifdef CONFIG_SMP -static DEFINE_SPINLOCK(timebase_lock); +static raw_spinlock_t timebase_lock; static unsigned long timebase; static void __devinit pas_give_timebase(void) { - spin_lock(timebase_lock); + unsigned long flags; + + local_irq_save(flags); + hard_irq_disable(); + __raw_spin_lock(timebase_lock); mtspr(SPRN_TBCTL, TBCTL_FREEZE); isync(); timebase = get_tb(); - spin_unlock(timebase_lock); + __raw_spin_unlock(timebase_lock); while (timebase) barrier(); mtspr(SPRN_TBCTL, TBCTL_RESTART); + local_irq_restore(flags); } static void __devinit pas_take_timebase(void) @@ -92,10 +97,10 @@ static void __devinit pas_take_timebase( while (!timebase) smp_rmb(); - spin_lock(timebase_lock); + __raw_spin_lock(timebase_lock); set_tb(timebase 32, timebase 0x); timebase = 0; - spin_unlock(timebase_lock); + __raw_spin_unlock(timebase_lock); } struct smp_ops_t pas_smp_ops = { ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
mm: Move pgtable_cache_init() earlier
Some architectures need to initialize SLAB caches to be able to allocate page tables. They do that from pgtable_cache_init() so the later should be called earlier now, best is before vmalloc_init(). Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- Note: Only powerpc, sparc and xtensa use this and only to call kmem_cache_create() so with a bit of luck it should just work... Index: linux-work/init/main.c === --- linux-work.orig/init/main.c 2009-06-17 13:41:33.0 +1000 +++ linux-work/init/main.c 2009-06-17 13:41:45.0 +1000 @@ -546,6 +546,7 @@ static void __init mm_init(void) page_cgroup_init_flatmem(); mem_init(); kmem_cache_init(); + pgtable_cache_init(); vmalloc_init(); } @@ -684,7 +685,6 @@ asmlinkage void __init start_kernel(void late_time_init(); calibrate_delay(); pidmap_init(); - pgtable_cache_init(); anon_vma_init(); #ifdef CONFIG_X86 if (efi_enabled) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: mm: Move pgtable_cache_init() earlier
Hi Ben, On Wed, Jun 17, 2009 at 6:48 AM, Benjamin Herrenschmidtb...@kernel.crashing.org wrote: Some architectures need to initialize SLAB caches to be able to allocate page tables. They do that from pgtable_cache_init() so the later should be called earlier now, best is before vmalloc_init(). Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org Looks good to me! Acked-by: Pekka Enberg penb...@cs.helsinki.fi ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Warning in block code
On Wed, Jun 17 2009, Benjamin Herrenschmidt wrote: Hoy ! I see that: block/blk-settings.c: In function ???blk_set_default_limits???: block/blk-settings.c:115: warning: large integer implicitly truncated to unsigned type Comes from lim-bounce_pfn = BLK_BOUNCE_ANY; With BLK_BOUNCE_ANY being a include/linux/blkdev.h:#define BLK_BOUNCE_ANY (-1ULL) And struct queue_limit.bounce_pfn is: unsigned long bounce_pfn; (The warning is fishy as both quantities are unsigned, but there -is- truncation happening here). Should be safe to just make it -1UL now. -- Jens Axboe ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
linux-next: kvm/powerpc tree build failure
Hi Avi, Ben, Today's linux-next build (powerpc ppc44x_defconfig) failed like this: cc1: warnings being treated as errors arch/powerpc/kvm/../../../virt/kvm/kvm_main.c: In function '__kvm_set_memory_region': arch/powerpc/kvm/../../../virt/kvm/kvm_main.c:1178: error: integer overflow in expression Probably caused by commit ac04527f7947020c5890090b2ac87af4e98d977e (KVM: Disable large pages on misaligned memory slots). The build fails because arch/powerpc is now being built (mostly) with -Werror. arch/powerpc/include/asm/kvm_host.h:#define KVM_PAGES_PER_HPAGE (131) this needs to be (1UL 31) or ((unsigned int)1 31). I applied the following patch for today. From: Stephen Rothwell s...@canb.auug.org.au Date: Wed, 17 Jun 2009 14:57:29 +1000 Subject: [PATCH] kvm/powerpc: make 32 bit constant unsigned long KVM_PAGES_PER_HPAGE needs to be unsigned long since its value is 2^31. Eliminates this compiler warning: arch/powerpc/kvm/../../../virt/kvm/kvm_main.c:1178: error: integer overflow in expression Signed-off-by: Stephen Rothwell s...@canb.auug.org.au --- arch/powerpc/include/asm/kvm_host.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 3625424..d4caa61 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -34,7 +34,7 @@ #define KVM_COALESCED_MMIO_PAGE_OFFSET 1 /* We don't currently support large pages. */ -#define KVM_PAGES_PER_HPAGE (131) +#define KVM_PAGES_PER_HPAGE (1UL 31) struct kvm; struct kvm_run; -- 1.6.3.1 -- Cheers, Stephen Rothwells...@canb.auug.org.au ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 2/6] Introduce PPC64 specific Hardware Breakpoint interfaces
On Wed, Jun 10, 2009 at 02:38:06PM +0530, K.Prasad wrote: Introduce PPC64 implementation for the generic hardware breakpoint interfaces defined in kernel/hw_breakpoint.c. Enable the HAVE_HW_BREAKPOINT flag and the Makefile. [snip] +void arch_update_kernel_hw_breakpoint(void *unused) +{ + struct hw_breakpoint *bp; + + /* Check if there is nothing to update */ + if (hbp_kernel_pos == HBP_NUM) + return; + + per_cpu(this_hbp_kernel[hbp_kernel_pos], get_cpu()) = bp = + hbp_kernel[hbp_kernel_pos]; + if (bp == NULL) + kdabr = 0; + else + kdabr = (bp-info.address ~HW_BREAKPOINT_ALIGN) | + bp-info.type | DABR_TRANSLATION; + set_dabr(kdabr); + put_cpu_no_resched(); +} + +/* + * Install the thread breakpoints in their debug registers. + */ +void arch_install_thread_hw_breakpoint(struct task_struct *tsk) +{ + set_dabr(tsk-thread.dabr); +} + +/* + * Clear the DABR which contains the thread-specific breakpoint address + */ +void arch_uninstall_thread_hw_breakpoint() +{ + set_dabr(0); +} + +/* + * Store a breakpoint's encoded address, length, and type. + */ +int arch_store_info(struct hw_breakpoint *bp, struct task_struct *tsk) +{ + /* Symbol names from user-space are rejected */ + if (tsk) { + if (bp-info.name) + return -EINVAL; + else + return 0; + } + /* + * User-space requests will always have the address field populated + * For kernel-addresses, either the address or symbol name can be + * specified. + */ + if (bp-info.name) + bp-info.address = (unsigned long) + kallsyms_lookup_name(bp-info.name); + if (bp-info.address) + if (kallsyms_lookup_size_offset(bp-info.address, + (bp-info.symbolsize), NULL)) + return 0; + return -EINVAL; +} + +/* + * Validate the arch-specific HW Breakpoint register settings + */ +int arch_validate_hwbkpt_settings(struct hw_breakpoint *bp, + struct task_struct *tsk) +{ + int is_kernel, ret = -EINVAL; + + if (!bp) + return ret; + + switch (bp-info.type) { + case HW_BREAKPOINT_READ: + case HW_BREAKPOINT_WRITE: + case HW_BREAKPOINT_RW: + break; + default: + return ret; + } + + if (bp-triggered) + ret = arch_store_info(bp, tsk); Under what circumstances would triggered be NULL? It's not clear to me that you wouldn't still need arch_store_info() in this case. + + is_kernel = is_kernel_addr(bp-info.address); + if ((tsk is_kernel) || (!tsk !is_kernel)) + return -EINVAL; + + return ret; +} + +void arch_update_user_hw_breakpoint(int pos, struct task_struct *tsk) +{ + struct thread_struct *thread = (tsk-thread); + struct hw_breakpoint *bp = thread-hbp[0]; + + if (bp) + thread-dabr = (bp-info.address ~HW_BREAKPOINT_ALIGN) | + bp-info.type | DABR_TRANSLATION; + else + thread-dabr = 0; +} + +void arch_flush_thread_hw_breakpoint(struct task_struct *tsk) +{ + struct thread_struct *thread = (tsk-thread); + + thread-dabr = 0; +} + +/* + * Handle debug exception notifications. + */ +int __kprobes hw_breakpoint_handler(struct die_args *args) +{ + int rc = NOTIFY_STOP; + struct hw_breakpoint *bp; + struct pt_regs *regs = args-regs; + unsigned long dar = regs-dar; + int cpu, is_one_shot, stepped = 1; + + /* Disable breakpoints during exception handling */ + set_dabr(0); + + cpu = get_cpu(); + /* Determine whether kernel- or user-space address is the trigger */ + bp = (hbp_kernel_pos == HBP_NUM) ? current-thread.hbp[0] : + per_cpu(this_hbp_kernel[0], cpu); + /* + * bp can be NULL due to lazy debug register switching + * or due to the delay between updates of hbp_kernel_pos + * and this_hbp_kernel. + */ + if (!bp) + goto out; + + is_one_shot = (bp-triggered == ptrace_triggered) ? 1 : 0; + per_cpu(dabr_data, cpu) = (hbp_kernel_pos == HBP_NUM) ? + current-thread.dabr : kdabr; + + /* Verify if dar lies within the address range occupied by the symbol + * being watched. Since we cannot get the symbol size for + * user-space requests we skip this check in that case + */ + if ((hbp_kernel_pos == 0) + !((bp-info.address = dar) + (dar = (bp-info.address + bp-info.symbolsize + /* + * This exception is triggered not because of a memory
Re: [Patch 5/6] Modify Data storage exception code to recognise DABR match first
On Wed, Jun 10, 2009 at 02:38:24PM +0530, K.Prasad wrote: Modify Data storage exception code to first lookout for a DABR match before recognising a kprobe or xmon exception. Signed-off-by: K.Prasad pra...@linux.vnet.ibm.com --- arch/powerpc/mm/fault.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) Index: linux-2.6-tip.hbkpt/arch/powerpc/mm/fault.c === --- linux-2.6-tip.hbkpt.orig/arch/powerpc/mm/fault.c +++ linux-2.6-tip.hbkpt/arch/powerpc/mm/fault.c @@ -136,6 +136,12 @@ int __kprobes do_page_fault(struct pt_re error_code = 0x4820; else is_write = error_code DSISR_ISSTORE; + + if (error_code DSISR_DABRMATCH) { + /* DABR match */ + do_dabr(regs, address, error_code); + return 0; + } Again, given the amount of work you're doing on all the breakpoint paths, I really think you should be rewriting do_dabr(), not just hooking in around it. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 4/6] Modify process and processor handling code to recognise hardware debug registers
On Wed, Jun 10, 2009 at 02:38:18PM +0530, K.Prasad wrote: Modify process handling code to recognise hardware debug registers during copy and flush operations. Introduce a new TIF_DEBUG task flag to indicate a process's use of debug register. Load the debug register values into a new CPU during initialisation. Signed-off-by: K.Prasad pra...@linux.vnet.ibm.com --- arch/powerpc/kernel/process.c | 15 +++ arch/powerpc/kernel/smp.c |2 ++ 2 files changed, 17 insertions(+) Index: linux-2.6-tip.hbkpt/arch/powerpc/kernel/process.c === --- linux-2.6-tip.hbkpt.orig/arch/powerpc/kernel/process.c +++ linux-2.6-tip.hbkpt/arch/powerpc/kernel/process.c @@ -50,6 +50,7 @@ #include asm/syscalls.h #ifdef CONFIG_PPC64 #include asm/firmware.h +#include asm/hw_breakpoint.h #endif #include linux/kprobes.h #include linux/kdebug.h @@ -254,8 +255,10 @@ void do_dabr(struct pt_regs *regs, unsig 11, SIGSEGV) == NOTIFY_STOP) return; +#ifndef CONFIG_PPC64 if (debugger_dabr_match(regs)) return; +#endif Won't this disable the check for breakpoints set by xmon - but I don't see anything in this patch series to convert xmon to use the new breakpoint interface instead. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 2/6] Introduce PPC64 specific Hardware Breakpoint interfaces
On Mon, Jun 15, 2009 at 12:48:28PM +0530, K.Prasad wrote: On Mon, Jun 15, 2009 at 04:40:45PM +1000, David Gibson wrote: On Wed, Jun 10, 2009 at 12:13:49PM +0530, K.Prasad wrote: On Fri, Jun 05, 2009 at 03:11:58PM +1000, David Gibson wrote: On Wed, Jun 03, 2009 at 10:05:11PM +0530, K.Prasad wrote: + else { + /* + * This exception is triggered not because of a memory access on + * the monitored variable but in the double-word address range + * in which it is contained. We will consume this exception, + * considering it as 'noise'. + */ + rc = NOTIFY_STOP; + goto out; + } + is_one_shot = (bp-triggered == ptrace_triggered) ? 1 : 0; Ouch, explicitly special-casing ptrace_triggered is pretty nasty. Since the bp_info is already arch specific, maybe it should include a flag to indicate whether the breakpoint is one-shot or not. The reason to check for ptrace_triggered is to contain the one-shot behaviour only to ptrace (thus retaining the semantics) and not to extend them to all user-space requests through register_user_hw_breakpoint(). Right, but couldn't you implement that withing ptrace_triggered itself, without a special test here, by having it cancel the breakpoint. A special check (either using the callback routine as above, or using a special flag) will be required in hw_breakpoint_handler() to enable early return (without single-stepping). I'm not sure if I got your suggestion right, and let me know if you think so. Well.. you could also recheck after calling triggered whether the breakpoint is still in existence. So cancelling the breakpoint in triggered would. Or you could change the signature for the triggered function, so it returns whether to leave the breakpoint active or not. That would have some advantages you could easily implement either one-shot, continuous or N-shot, or keep firing until some specific event behaviour from within the triggered function. But all that's a bit moot, because of the fact that you try to make the TRAP timing consistent (before or after execution of triggering instruction) but you don't do so for a general triggered hook. A one-shot behaviour for all user-space requests would create more work for the user-space programs (such as re-registration) and will leave open a small window of opportunity for debug register grabbing by kernel-space requests. So, in effect a request through register_user_hw_breakpoint() interface will behave as under: - Single-step over the causative instruction that triggered the breakpoint exception handler. - Deliver the SIGTRAP signal to user-space after executing the causative instruction. This behaviour is in consonance with that of kernel-space requests and those on x86 processors, and helps define a consistent behaviour across architectures for user-space. Let me know what you think on the same. I certainly see the value in consistent semantics across archs. However, I can also see uses for the powerpc trap-before-execute behaviour. That's why I'm suggesting it might be worth having an arch-specific flag. [snip] So, you suggest that the 'one-shot' behaviour should be driven by user-request and not just confined to ptrace? (The default behaviour for all breakpoints-minus-ptrace will remain 'continuous' though). Not one-shot behaviour as such, but whether the trigger fires before or after execution of the triggering instruction. The complication is that combining fire-before semantics with continuous firing becomes tricky. It can be implemented through an additional flag in 'struct arch_hw_breakpoint'. I can send a new version 7 of the patchset with this change (with the hope that the version 6 of the patchset looks fine in its present form!). Meanwhile, we'd like to know what uses you see in addition to the present one if the one-shot behaviour is made user-defined. Are those uses beyond what can be achieved through the present ptrace interface? +int __kprobes single_step_dabr_instruction(struct die_args *args) +{ + struct pt_regs *regs = args-regs; + int cpu = get_cpu(); + int ret = NOTIFY_DONE; + siginfo_t info; + unsigned long this_dabr_data = per_cpu(dabr_data, cpu); + + /* + * Check if we are single-stepping as a result of a + * previous HW Breakpoint exception + */ + if (this_dabr_data == 0) + goto out; + + regs-msr = ~MSR_SE; + /* Deliver signal to user-space */ + if (this_dabr_data TASK_SIZE) { + info.si_signo = SIGTRAP; + info.si_errno = 0; + info.si_code = TRAP_HWBKPT; +