Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
             size_t size = 0;
             FILE *file;
             sprintf(buf, "/proc/irq/%i/smp_affinity", number);
-            file = fopen(buf, "r");
+            file = fopen(buf, "r+");
             if (!file)
                 continue;
             if (getline(&line, &size, file)==0) {
@@ -89,7 +89,14 @@
                 continue;
             }
             cpumask_parse_user(line, strlen(line), irq->mask);
-            fclose(file);
+            /*
+             * Check that we can write the affinity, if
+             * not take it out of the list.
+             */
+            if (fputs(line, file) == EOF)
+                can_set = 0;

This is maybe a nit, but writing to the affinity file can fail for a few different reasons, some of them permanent, some transient. For instance, if we're in a memory constrained condition temporarily irq_affinity_proc_write might return -ENOMEM.

Yeah true, usually followed shortly by your kernel going so far into swap you never get it back, or OOMing, but I guess it's possible.

Might it be better to modify this code so that, instead of using fputs to merge the various errors into an EOF, we use some other write method that lets us better determine the error and selectively ban the interrupt only for those errors which we consider permanent?

Yep. It seems fputs() gives you no way to get the actual error from write(), so it looks like we'll need to switch to open/write, but that's probably not so terrible.

fclose inherits the error from fputs and it sets errno correctly. Below uses this to catch only EIO errors and mark them for the banned list.
Mikey

irqbalance, powerpc: add IRQs without settable SMP affinity to banned list

On pseries powerpc, IPIs are registered with an IRQ number so /proc/interrupts looks like this on a 2 core/2 thread machine:

           CPU0       CPU1       CPU2       CPU3
 16:316428232905141138794     983121   XICS  Level  IPI
 18:    2605674          0     304994          0   XICS  Level  lan0
 30:     400057          0     169209          0   XICS  Level  ibmvscsi
LOC:     133734      77250     106425      91951   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
CNT:          0          0          0          0   Performance monitoring interrupts
MCE:          0          0          0          0   Machine check exceptions

Unfortunately this means irqbalance attempts to set the affinity of IPIs, which is not possible. So in the above case, when irqbalance is in performance mode due to heavy IPI, lan0 and ibmvscsi activity, it sometimes attempts to put the IPIs on one core (CPU01) and lan0 and ibmvscsi on the other core (CPU23). This is suboptimal as we want lan0 and ibmvscsi to be on separate cores and IPIs to be ignored.

When irqbalance attempts to write to the IPI smp_affinity (i.e. /proc/irq/16/smp_affinity in the above example) the write fails with EIO, but irqbalance currently ignores this.

This patch catches these write failures and in this case adds that IRQ number to the banned IRQ list. This will catch the above IPI case and any other IRQ where the SMP affinity can't be set.

Tested on POWER6, POWER7 and x86.
Signed-off-by: Michael Neuling <mi...@neuling.org>

Index: irqbalance/irqlist.c
===
--- irqbalance.orig/irqlist.c
+++ irqbalance/irqlist.c
@@ -28,6 +28,7 @@
 #include <unistd.h>
 #include <sys/types.h>
 #include <dirent.h>
+#include <errno.h>
 
 #include "types.h"
 #include "irqbalance.h"
@@ -67,7 +68,7 @@
     DIR *dir;
     struct dirent *entry;
     char *c, *c2;
-    int nr , count = 0;
+    int nr , count = 0, can_set = 1;
     char buf[PATH_MAX];
     sprintf(buf, "/proc/irq/%i", number);
     dir = opendir(buf);
@@ -80,7 +81,7 @@
             size_t size = 0;
             FILE *file;
             sprintf(buf, "/proc/irq/%i/smp_affinity", number);
-            file = fopen(buf, "r");
+            file = fopen(buf, "r+");
             if (!file)
                 continue;
             if (getline(&line, &size, file)==0) {
@@ -89,7 +90,13 @@
                 continue;
             }
             cpumask_parse_user(line, strlen(line), irq->mask);
-            fclose(file);
+            /*
+             * Check that we can write the affinity, if
+             * not take it out of the list.
+             */
+            fputs(line, file);
+            if (fclose(file) && errno == EIO)
+                can_set = 0;
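
For comparison, the open/write alternative raised earlier in the thread could look roughly like the sketch below. It is only an illustration under stated assumptions: write_error_is_permanent() and affinity_is_settable() are hypothetical helper names, not irqbalance functions, and treating EIO alone as permanent simply mirrors the choice made in the patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy: only EIO counts as a permanent failure (as in the
 * patch); ENOMEM, EINTR and similar errors are treated as transient. */
static int write_error_is_permanent(int err)
{
	return err == EIO;
}

/*
 * Read the current contents of `path` and write the same bytes back.
 * Returns 1 if the write worked or failed only transiently, 0 if it
 * failed with an error considered permanent, and -1 if the file could
 * not be opened, read or rewound at all.
 */
static int affinity_is_settable(const char *path)
{
	char buf[256];
	ssize_t len, ret;
	int fd, err = 0;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	len = read(fd, buf, sizeof(buf));
	if (len <= 0 || lseek(fd, 0, SEEK_SET) < 0) {
		close(fd);
		return -1;
	}

	ret = write(fd, buf, (size_t)len);
	if (ret < 0)
		err = errno;	/* write() reports the real error directly */
	close(fd);

	return !(ret < 0 && write_error_is_permanent(err));
}
```

A caller would ban the IRQ only when affinity_is_settable() returns 0, and leave the IRQ alone on 1 or -1 so that a transient failure does not remove it from balancing for good.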
Re: [PATCH 1/2] PPC4xx: Generelizing drivers/dma/ppc4xx/adma.c
On Friday 24 September 2010 00:39:47 Tirumala Marri wrote:

Will both versions of this driver exist in the same kernel build? For example the iop-adma driver supports iop13xx and iop3xx, but we select the architecture at build time? Or, as I assume in this case, will the two (maybe more?) ppc4xx adma drivers all be built in the same image, more like ioatdma?

[Marri] We select the architecture at build time.

It would be really preferable to support all those platforms in a single Linux image. If technically possible, please try to move in this direction.

Thanks.

Cheers,
Stefan
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree
On Thu, Sep 16, 2010 at 09:05:03AM +0200, christophe leroy wrote:

This patch applies to 2.6.34.7 and 2.6.35.4
It fixes an issue during the probe for CPM1 with definition of parameter ram from DTS

Signed-off-by: christophe leroy <christophe.le...@c-s.fr>

I'm sorry, I don't understand the fix from the given description. What is the problem, and why is cpm_muram_alloc_fixed() the wrong thing to call on CPM1? Does CPM2 still need it?

g.

diff -urN b/drivers/spi/spi_mpc8xxx.c c/drivers/spi/spi_mpc8xxx.c
--- b/drivers/spi/spi_mpc8xxx.c 2010-09-08 16:43:50.0 +0200
+++ c/drivers/spi/spi_mpc8xxx.c 2010-09-08 16:44:03.0 +0200
@@ -822,7 +822,7 @@
     if (!iprop || size != sizeof(*iprop) * 4)
         return -ENOMEM;
 
-    spi_base_ofs = cpm_muram_alloc_fixed(iprop[2], 2);
+    spi_base_ofs = iprop[2];
     if (IS_ERR_VALUE(spi_base_ofs))
         return -ENOMEM;
 
@@ -844,7 +844,6 @@
         return spi_base_ofs;
     }
 
-    cpm_muram_free(spi_base_ofs);
     return pram_ofs;
 }
Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree
Hello,

The issue is that cpm_muram_alloc_fixed() allocates memory from the general purpose muram area (from 0x0 to 0x1bff). Here we need to return a pointer to the parameter RAM, which is located somewhere starting at 0x1c00. What is required here is not a dynamic allocation, only a pointer to the correct location in the parameter RAM.

For the CPM2, I don't know. I'm working with an MPC866.

Attached is a previous discussion on the subject where I explain the issue in a bit more detail.

Regards
C. Leroy

On 24/09/2010 09:10, Grant Likely wrote:

On Thu, Sep 16, 2010 at 09:05:03AM +0200, christophe leroy wrote:

This patch applies to 2.6.34.7 and 2.6.35.4
It fixes an issue during the probe for CPM1 with definition of parameter ram from DTS

Signed-off-by: christophe leroy <christophe.le...@c-s.fr>

I'm sorry, I don't understand the fix from the given description. What is the problem, and why is cpm_muram_alloc_fixed() the wrong thing to call on CPM1? Does CPM2 still need it?

g.

diff -urN b/drivers/spi/spi_mpc8xxx.c c/drivers/spi/spi_mpc8xxx.c
--- b/drivers/spi/spi_mpc8xxx.c 2010-09-08 16:43:50.0 +0200
+++ c/drivers/spi/spi_mpc8xxx.c 2010-09-08 16:44:03.0 +0200
@@ -822,7 +822,7 @@
     if (!iprop || size != sizeof(*iprop) * 4)
         return -ENOMEM;
 
-    spi_base_ofs = cpm_muram_alloc_fixed(iprop[2], 2);
+    spi_base_ofs = iprop[2];
     if (IS_ERR_VALUE(spi_base_ofs))
         return -ENOMEM;
 
@@ -844,7 +844,6 @@
         return spi_base_ofs;
     }
 
-    cpm_muram_free(spi_base_ofs);
     return pram_ofs;
 }

---BeginMessage---
On Tue, 7 Sep 2010 11:17:17 +0200 LEROY Christophe <christophe.le...@c-s.fr> wrote:

Dear Kumar,

I have a small issue in the init of spi_mpc8xxx.c with MPC866 (CPM1).

Unlike cpm_uart, which maps the parameter ram directly using of_iomap(np,1), spi_mpc8xxx.c uses cpm_muram_alloc_fixed().

This has two impacts in the .dts file:
* The driver must be declared with pram at 1d80 instead of 3d80, whereas it is not a child of mu...@2000 but a child of c...@9c0
* mu...@2000/d...@0 must be declared with reg = <0x0 0x2000> whereas it should be reg = <0x0 0x1c00> to prevent cpm_muram_alloc() from allocating space from parameter ram.

Maybe I misunderstood something?

Don't make the device tree lie, fix the driver instead.

The allocator should not be given any chunks of muram that are dedicated to a fixed purpose -- it might hand it out to something else before you reserve it.

I don't think that cpm_muram_alloc_fixed() has any legitimate use at all.

-Scott
---End Message---
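
The distinction drawn in this thread, between muram that the allocator may hand out and parameter RAM that has a fixed purpose, can be illustrated with a small sketch. The boundaries (0x0 to 0x1bff for general purpose muram, parameter RAM from 0x1c00) are the MPC866 values quoted above; the names muram_classify() and offset_may_be_allocated() are hypothetical and do not exist in the kernel.

```c
#include <stdint.h>

/* MPC866 (CPM1) layout as described in the thread; other parts differ. */
#define MURAM_DATA_END	0x1bffu	/* last byte of general purpose muram */
#define PRAM_START	0x1c00u	/* first byte of parameter RAM */

enum muram_region {
	MURAM_GENERAL,	/* may be managed by the muram allocator */
	MURAM_PARAM	/* fixed purpose, must not go through the allocator */
};

/* Classify a muram offset according to the layout above. */
static enum muram_region muram_classify(uint32_t offset)
{
	return offset >= PRAM_START ? MURAM_PARAM : MURAM_GENERAL;
}

/* Only general purpose muram is a candidate for dynamic allocation. */
static int offset_may_be_allocated(uint32_t offset)
{
	return muram_classify(offset) == MURAM_GENERAL;
}
```

With this layout the SPI parameter RAM offset 0x1d80 mentioned above falls in the fixed-purpose region, which is why the patch uses the offset directly instead of asking the allocator for it.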
Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
On Thu, Sep 23, 2010 at 12:48:51PM -0700, john stultz wrote:

On Thu, 2010-09-23 at 19:31 +0200, Richard Cochran wrote:

A new syscall is introduced that allows tuning of a POSIX clock. The syscall is implemented for four architectures: arm, blackfin, powerpc, and x86.

The new syscall, clock_adjtime, takes two parameters, the clock ID, and a pointer to a struct timex. The semantics of the timex struct have been expanded by one additional mode flag, which allows an absolute offset correction. When specified, the clock offset is immediately corrected by adding the given time value to the current time value.

So I'd still split this patch up a little bit more.

1) Patch that implements the ADJ_SETOFFSET (*and its implementation*) in do_adjtimex.
2) Patch that adds the new syscall and clock_id multiplexing.
3) Patches that wire it up to the rest of the architectures (there's still a bunch missing here).

I was not sure what the policy is about adding syscalls. Is it the syscall author's responsibility to add it into every arch?

The last time (see a2e2725541fad7) the commit only added half of some archs, and ignored others. In my patch, the syscall *really* works on the archs that are present in the patch.

(Actually, I did not test blackfin, since I don't have one, but I included it since I know they have a PTP hardware clock.)

+static inline int common_clock_adj(const clockid_t which_clock, struct timex *t)
+{
+    if (CLOCK_REALTIME == which_clock)
+        return do_adjtimex(t);
+    else
+        return -EOPNOTSUPP;
+}

Would it make sense to point to the do_adjtimex() in the k_clock definition for CLOCK_REALTIME rather than conditionalizing it here?

But what about CLOCK_MONOTONIC_RAW, for example? Does it make sense to allow it to be adjusted?

Thanks,
Richard
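
The absolute offset correction described in the patch summary ("adding the given time value to the current time value") amounts to a normalised addition of two time values. The sketch below shows only that arithmetic. It assumes a seconds plus nanoseconds representation with the nanosecond field kept in [0, 1e9); struct ts and ts_add_offset() are hypothetical names for illustration, not the kernel's types or the patch's code.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Simplified stand-in for a timespec; nsec is kept in [0, NSEC_PER_SEC). */
struct ts {
	int64_t sec;
	long nsec;
};

/*
 * Add an offset to a time value and renormalise the result. A negative
 * offset uses the same normalisation, so -0.25 s is written as
 * sec = -1, nsec = 750000000.
 */
static struct ts ts_add_offset(struct ts now, struct ts offset)
{
	struct ts out;

	out.sec = now.sec + offset.sec;
	out.nsec = now.nsec + offset.nsec;
	if (out.nsec >= NSEC_PER_SEC) {
		out.nsec -= NSEC_PER_SEC;
		out.sec += 1;
	}
	return out;
}
```

Because both inputs keep nsec below one second, a single carry step is enough to renormalise the sum.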
Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
On Fri, Sep 24, 2010 at 08:03:43AM +1000, Benjamin Herrenschmidt wrote:

On Thu, 2010-09-23 at 19:31 +0200, Richard Cochran wrote:

A new syscall is introduced that allows tuning of a POSIX clock. The syscall is implemented for four architectures: arm, blackfin, powerpc, and x86.

The new syscall, clock_adjtime, takes two parameters, the clock ID, and a pointer to a struct timex. The semantics of the timex struct have been expanded by one additional mode flag, which allows an absolute offset correction. When specified, the clock offset is immediately corrected by adding the given time value to the current time value.

Any reason why you CC'ed device-tree discuss? This list is getting way too much unrelated stuff, which I find annoying, it would be nice if we were all a bit more careful here with our CC lists.

Sorry, I only added device-tree because someone asked me to do so.

http://marc.info/?l=linux-netdev&m=127273157912358

I'll leave it off next time.

Thanks,
Richard
Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree
Hello,

On Fri, Sep 24, 2010 at 09:20:27AM +0200, LEROY Christophe wrote:

The issue is that cpm_muram_alloc_fixed() allocates memory from the general purpose muram area (from 0x0 to 0x1bff). Here we need to return a pointer to the parameter RAM, which is located somewhere starting at 0x1c00. It is not a dynamic allocation that is required here but only to point on the correct location in the parameter RAM.

For the CPM2, I don't know. I'm working with a MPC866.

Attached is a previous discussion on the subject where I explain a bit more in details the issue.

The patch looks OK, I think. Doesn't explain why that worked on MPC8272 (CPM2) and MPC8560 (also CPM2) machines though.

But here's my guess (I no longer have these boards to test it):

On 8272 I used this node:

+    s...@4c0 {
+        #address-cells = <1>;
+        #size-cells = <0>;
+        compatible = "fsl,cpm2-spi", "fsl,spi";
+        reg = <0x11a80 0x40 0x89fc 0x2>;

On that SOC there are two muram data regions 0x0..0x2000 and 0x9000..0x9100. Note that we actually don't want data regions, and the only reason why that worked is that sysdev/cpm_common.c maps muram(0)..muram(max).

Thanks,

--
Anton Vorontsov
email: cbouatmai...@gmail.com
irc://irc.freenode.net/bd2
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, Sep 23, 2010 at 12:53:20PM -0500, Christoph Lameter wrote:

On Thu, 23 Sep 2010, Richard Cochran wrote:

3.3 Synchronizing the Linux System Time

One could offer a PHC as a combined clock source and clock event device. The advantage of this approach would be that it obviates the need for synchronization when the PHC is selected as the system timer. However, some PHCs, namely the PHY based clocks, cannot be used in this way.

Why not? Do PHY based clocks not at least provide a counter that increments in synchronized intervals throughout the network?

The counter in the PHY is accessed via the MDIO bus. One 16 bit read takes anywhere from 25 to 40 microseconds. Reading the 64 bit time value requires four reads, so we're talking about 100 to 160 microseconds, just for a single time reading.

In addition to that, reading the MDIO bus can sleep.

So we can't (in general) offer PHCs as clock sources.

Instead, the patch set provides a way to offer a Pulse Per Second (PPS) event from the PHC to the Linux PPS subsystem. A user space application can read the PPS events and tune the system clock, just like when using other external time sources like radio clocks or GPS.

User space is subject to various latencies created by the OS etc. I would think that in order to have fine grained (read microsecond) accuracy we would have to run the portions that are relevant to obtaining the desired accuracy in the kernel.

The time-critical operations are all performed in hardware (packet timestamp), or in kernel space (input PPS timestamp). User space only runs the servo (using hardware or kernel timestamps as input) and performs the clock correction. With a sample rate of 1 PPS, the small user space induced delay (a few dozen microseconds) between sample time and clock correction is not an issue.

Thanks,
Richard
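
The MDIO figures above follow from simple arithmetic, shown here as a small sketch. The helper names reads_needed() and reading_time_us() are made up for illustration; the inputs (16 bit reads, a 64 bit time value, 25 to 40 microseconds per read) are the ones quoted in the message.

```c
/* Number of bus accesses needed to fetch value_bits when each access
 * returns read_bits (rounded up). */
static unsigned reads_needed(unsigned value_bits, unsigned read_bits)
{
	return (value_bits + read_bits - 1) / read_bits;
}

/* Total access time in microseconds for one complete time reading. */
static unsigned reading_time_us(unsigned value_bits, unsigned read_bits,
				unsigned us_per_read)
{
	return reads_needed(value_bits, read_bits) * us_per_read;
}
```

For a 64 bit value read 16 bits at a time this gives four reads, so 100 microseconds at 25 microseconds per read and 160 microseconds at 40, matching the range given above.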
Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
On Thu, Sep 23, 2010 at 02:17:36PM -0500, Christoph Lameter wrote:

On Thu, 23 Sep 2010, Richard Cochran wrote:

+  These properties set the operational parameters for the PTP
+  clock. You must choose these carefully for the clock to work right.
+  Here is how to figure good values:
+
+  TimerOsc     = system clock               MHz
+  tclk_period  = desired clock period       nanoseconds
+  NominalFreq  = 1000 / tclk_period         MHz
+  FreqDivRatio = TimerOsc / NominalFreq     (must be greater than 1.0)
+  tmr_add      = ceil(2^32 / FreqDivRatio)
+  OutputClock  = NominalFreq / tmr_prsc     MHz
+  PulseWidth   = 1 / OutputClock            microseconds
+  FiperFreq1   = desired frequency in Hz
+  FiperDiv1    = 100 * OutputClock / FiperFreq1
+  tmr_fiper1   = tmr_prsc * tclk_period * FiperDiv1 - tclk_period
+  max_adj      = 10 * (FreqDivRatio - 1.0) - 1

Great stuff for clock synchronization...

+  The calculation for tmr_fiper2 is the same as for tmr_fiper1. The
+  driver expects that tmr_fiper1 will be correctly set to produce a 1
+  Pulse Per Second (PPS) signal, since this will be offered to the PPS
+  subsystem to synchronize the Linux clock.

Argh. And conceptually completely screwed up. Why go through the PPS subsystem if you can directly tune the system clock based on a number of the cool periodic clock features that you have above? See how the other clocks do that easily? Look into drivers/clocksource. Add it there.

Please do not introduce useless additional layers for clock sync. Load these ptp clocks like the other regular clock modules and make them sync system time like any other clock.

Really guys: I want a PTP solution! Now! And not some idiotic additional kernel layers that just pass bits around because it's so much fun and screws up clock accuracy due to the latency noise introduced while having so much fun with the bits.

(Sorry if this message comes twice. Mutt/Gmail flaked out again.)

I think you misunderstood this particular patch. The device tree parameters are really just internal driver stuff.

When you use the eTSEC, you must make some design choices at the same time as you plan your board. The proper values for some of the eTSEC registers are based on these design choices. Since the Freescale documentation is a bit thin on this, I added a few notes to help my fellow board designers.

Because these values are closely related to the board itself, I think that it is nicer to configure them via the device tree than using either CONFIG_ variables or platform data.

Richard
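
As a worked illustration of the first few relations quoted from the patch (NominalFreq, FreqDivRatio and tmr_add), the sketch below evaluates tmr_add in integer arithmetic. The function name etsec_tmr_add() is hypothetical and is not part of the driver; the example clock values used afterwards are assumptions chosen only to make the numbers easy to check.

```c
#include <stdint.h>

/*
 * tmr_add = ceil(2^32 / FreqDivRatio), where
 *   NominalFreq  = 1000 / tclk_period        (MHz, tclk_period in ns)
 *   FreqDivRatio = TimerOsc / NominalFreq
 * Substituting gives 2^32 * 1000 / (TimerOsc * tclk_period), which is
 * evaluated with integers to avoid floating point rounding.
 * Returns 0 when FreqDivRatio would not be greater than 1.0.
 */
static uint64_t etsec_tmr_add(uint64_t timer_osc_mhz, uint64_t tclk_period_ns)
{
	uint64_t num = (UINT64_C(1) << 32) * 1000;
	uint64_t den = timer_osc_mhz * tclk_period_ns;

	if (den <= 1000)	/* FreqDivRatio <= 1.0 is not allowed */
		return 0;
	return (num + den - 1) / den;	/* integer ceiling */
}
```

For an assumed 200 MHz TimerOsc and a 10 ns tclk_period, NominalFreq is 100 MHz and FreqDivRatio is 2.0, so tmr_add comes out as 2^31 = 2147483648.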
Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop
On Thu, 23 Sep 2010 15:05:59 -0700, Randy Dunlap wrote:

On Thu, 23 Sep 2010 22:16:32 +0200 Mikael Pettersson wrote:

Randy Dunlap writes:

No kconfig warnings?

Not that I recall. I can check tomorrow if necessary.

No kconfig warnings. I checked with your .config file.

Please post your full .config file.

Just a matter of module i2c-core calls of_ functions and module of_i2c calls i2c_ functions. Hmph. Something for Grant, Jean, and Ben to work out.

As far as I can see this is caused by this commit from Grant:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5

Mikael, can you please try reverting this patch and see if it solves your problem?

--
Jean Delvare
Re: ppc44x - how do i optimize driver for tlb hits
On Fri, Sep 24, 2010 at 02:43:52PM +1000, Benjamin Herrenschmidt wrote:

The DMA is what I use in the real world case to get data into and out of these buffers. However, I can disable the DMA completely and do only the kmalloc. In this case I still see the same poor performance. My prefetching is part of my algo using the dcbt instructions. I know the instructions are effective b/c without them the algo is much less performant. So yes, my prefetches are explicit.

Could be some effect of the cache structure, L2 cache, cache geometry (number of ways etc...). You might be able to alleviate that by changing the stride of your prefetch.

Unfortunately, I'm not familiar enough with the 440 micro architecture and its caches to be able to help you much here.

Also, doesn't kmalloc have a limit to the size of the request it will let you allocate? I know in the distant past you could allocate 128K with kmalloc, and 2M with an explicit call to get_free_pages. Anything larger than that had to use vmalloc. The limit might indeed be higher now, but a 4MB kmalloc buffer sounds very large, given that it would be contiguous pages. Two of them even less so.

Ok, I will give that a try ... in addition, is there an easy way to use any sort of gprof like tool to see the system performance? What about looking at the 44x performance counters in some meaningful way? All the experiments point to the fetching being slower in the full program as opposed to the algo in a testbench, so I want to determine what it is that could cause that.

Does it have any useful performance counters? I didn't think it did but I may be mistaken.

No, it doesn't.

josh
Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
On Fri, Sep 24, 2010 at 04:56:34PM +1000, Michael Neuling wrote:

             size_t size = 0;
             FILE *file;
             sprintf(buf, "/proc/irq/%i/smp_affinity", number);
-            file = fopen(buf, "r");
+            file = fopen(buf, "r+");
             if (!file)
                 continue;
             if (getline(&line, &size, file)==0) {
@@ -89,7 +89,14 @@
                 continue;
             }
             cpumask_parse_user(line, strlen(line), irq->mask);
-            fclose(file);
+            /*
+             * Check that we can write the affinity, if
+             * not take it out of the list.
+             */
+            if (fputs(line, file) == EOF)
+                can_set = 0;

This is maybe a nit, but writing to the affinity file can fail for a few different reasons, some of them permanent, some transient. For instance, if we're in a memory constrained condition temporarily irq_affinity_proc_write might return -ENOMEM.

Yeah true, usually followed shortly by your kernel going so far into swap you never get it back, or OOMing, but I guess it's possible.

Might it be better to modify this code so that, instead of using fputs to merge the various errors into an EOF, we use some other write method that lets us better determine the error and selectively ban the interrupt only for those errors which we consider permanent?

Yep. It seems fputs() gives you no way to get the actual error from write(), so it looks like we'll need to switch to open/write, but that's probably not so terrible.

fclose inherits the error from fputs and it sets errno correctly. Below uses this to catch only EIO errors and mark them for the banned list.
Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop
Jean Delvare writes:

On Thu, 23 Sep 2010 15:05:59 -0700, Randy Dunlap wrote:

On Thu, 23 Sep 2010 22:16:32 +0200 Mikael Pettersson wrote:

Randy Dunlap writes:

No kconfig warnings?

Not that I recall. I can check tomorrow if necessary.

No kconfig warnings. I checked with your .config file.

Please post your full .config file.

Just a matter of module i2c-core calls of_ functions and module of_i2c calls i2c_ functions. Hmph. Something for Grant, Jean, and Ben to work out.

As far as I can see this is caused by this commit from Grant:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5

Mikael, can you please try reverting this patch and see if it solves your problem?

Yes, reverting the above commit from 2.6.36-rc5 eliminated the warnings, and I was able to insmod the i2c-{core,dev,powermac}.ko modules.

/Mikael
RE: MPC8641D PEX: programming OWBAR in Endpoint mode?
On Fri, 2010-09-24 at 07:09 +0200, Chen, Tiejun wrote:

Right but this should be done for RC mode, not for EP mode we're discussing.

Tiejun

According to the Freescale documentation, outbound is just as valid for endpoint as for root complex - indeed, to generate MSIs from software REQUIRES programming an outbound ATMU to access the host's APIC.

Moreover, ANY PCI endpoint SHOULD be able to do bus master access, and that is done by the outbound ATMUs.
Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
However, if the clock selected by the BMC is switched off, loses its network connection..., the second best clock is selected by the BMC and becomes master. This clock may be less accurate and thus our slave clock has to switch from one notion of time to another. Is that the conflict you mentioned?

No, you get situations where you have policy reasons for trusting particular clocks for particular things. So you may have a PTP or NTP clock providing basic system time but also have other PTP clocks that are actually being used for synchronization work.

With NTP it's not so far been a big issue - NTP isn't used for industrial high precision control, and the cases where we end up with multiple NTP clocks are on virtualised systems where it is isolated.

With high precision clocks you sometimes want to honour a specific PTP time source and use it rather than try and merge it with your other time sources (which may differ from the equipment elsewhere). What matters is things like all the parts of a several mile long conveyor belt of hot steel slab stopping at the same moment [1].

In lots of control applications you've got assorted different time planes which wish to talk their own time and you have to accept it, so we need to support that kind of use.

I agree entirely the normal boring 'I installed my distro and..' case for PTP or for NTP is merging all the sources, running the algorithm and using the system time for it. Likewise almost all normal application code will be watching system time.

Alan

[1] Which was my first encounter with writing Vax/VMS assembly language
Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop
Hi Mikael,

On Fri, 24 Sep 2010 12:50:01 +0200, Mikael Pettersson wrote:

Jean Delvare writes:

As far as I can see this is caused by this commit from Grant:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5

Mikael, can you please try reverting this patch and see if it solves your problem?

Yes, reverting the above commit from 2.6.36-rc5 eliminated the warnings, and I was able to insmod the i2c-{core,dev,powermac}.ko modules.

Thanks for testing and reporting.

Grant, unless you come up with a fix very quickly, I'll have to revert 959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5 for 2.6.36.

--
Jean Delvare
Re: ppc44x - how do i optimize driver for tlb hits
On Fri, Sep 24, 2010 at 06:30:34AM -0400, Josh Boyer wrote:

On Fri, Sep 24, 2010 at 02:43:52PM +1000, Benjamin Herrenschmidt wrote:

The DMA is what I use in the real world case to get data into and out of these buffers. However, I can disable the DMA completely and do only the kmalloc. In this case I still see the same poor performance. My prefetching is part of my algo using the dcbt instructions. I know the instructions are effective b/c without them the algo is much less performant. So yes, my prefetches are explicit.

Could be some effect of the cache structure, L2 cache, cache geometry (number of ways etc...). You might be able to alleviate that by changing the stride of your prefetch.

My original theory was that it was having lots of cache misses. But since the algorithm works standalone fast and uses large enough buffers (4MB), much of the cache is flushed and replaced with my data. The cache is 32K, 8 way, 32b/line. I've crafted the algorithm to use those parameters.

Unfortunately, I'm not familiar enough with the 440 micro architecture and its caches to be able to help you much here.

Also, doesn't kmalloc have a limit to the size of the request it will let you allocate? I know in the distant past you could allocate 128K with kmalloc, and 2M with an explicit call to get_free_pages. Anything larger than that had to use vmalloc. The limit might indeed be higher now, but a 4MB kmalloc buffer sounds very large, given that it would be contiguous pages. Two of them even less so.

I thought so too, but at least in the current implementation we found empirically that we could kmalloc up to but no more than 4MB. We have also tried an approach in user memory and then using get_user_pages and building a scatter-gather. We found that the compare code doesn't perform any better.

I suppose another option is to use the kernel profiling option I always see but have never used. Is that a viable option to figure out what is happening here?
ayman
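
The cache parameters quoted above (32K, 8 way, 32b/line) determine which addresses compete for the same set, which is what the prefetch stride suggestion is about. The sketch below works that out under two labeled assumptions: that "32b/line" means 32 bytes per line, and that the set index is taken from the address by plain modulo indexing. The names struct cache_geom, cache_sets() and cache_set_index() are made up for illustration and say nothing definitive about the 440 core.

```c
#include <stdint.h>

/* Cache geometry; all sizes in bytes. */
struct cache_geom {
	unsigned size_bytes;
	unsigned ways;
	unsigned line_bytes;
};

/* Number of sets = total size / (ways * line size). */
static unsigned cache_sets(struct cache_geom g)
{
	return g.size_bytes / (g.ways * g.line_bytes);
}

/* Set index under the assumed modulo indexing scheme. */
static unsigned cache_set_index(struct cache_geom g, uintptr_t addr)
{
	return (unsigned)((addr / g.line_bytes) % cache_sets(g));
}
```

With these numbers there are 128 sets, so addresses that are a multiple of 4096 bytes apart land in the same set, and more than eight such lines in flight at once would evict each other.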
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, Sep 23, 2010 at 09:36:54PM +0100, Alan Cox wrote:

Drop the clockid_t and swap it for a file handle like a proper Unix or Linux interface. The rest is much the same

    fd = open /sys/class/timesource/[whatever]
    various queries you may want to do to check the name etc
    fclock_adjtime(fd, ...)

Okay, but let's extend the story:

    clock_gettime(fd, ...);
    clock_settime(fd, ...);
    timer_create(fd, ...);

Can you agree to that as well?

(We would need to ensure that 'fd' avoids the range 0 to MAX_CLOCKS).

Richard
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Fri, 24 Sep 2010 15:14:07 +0200 Richard Cochran <richardcoch...@gmail.com> wrote:

On Thu, Sep 23, 2010 at 09:36:54PM +0100, Alan Cox wrote:

Drop the clockid_t and swap it for a file handle like a proper Unix or Linux interface. The rest is much the same

    fd = open /sys/class/timesource/[whatever]
    various queries you may want to do to check the name etc
    fclock_adjtime(fd, ...)

Okay, but let's extend the story:

    clock_gettime(fd, ...);
    clock_settime(fd, ...);
    timer_create(fd, ...);

Can you agree to that as well?

(We would need to ensure that 'fd' avoids the range 0 to MAX_CLOCKS).

You can't avoid that range as you might like, because the behaviour of file handle numbering is defined by the standards. Hence the f* versions of the calls (and of lots of other stuff).

Whether you add new syscalls or do the fd passing using flags and hide the ugly bits in glibc is another question.
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
You can't avoid that range as you might like, because the behaviour of file handle numbering is defined by the standards. Hence the f* versions of the calls (and of lots of other stuff).

Whether you add new syscalls or do the fd passing using flags and hide the ugly bits in glibc is another question.

To add an example of what I mean, you might end up defining CLOCK_FD to indicate to use the fd in the struct, but given syscalls are trivial codewise and would end up as

    fclock_foo(int fd, blah)
    {
        clock = fd_to_clock(fd);
        if (error)
            return error
        clock_do_foo(clock, blah);
        clock_put(clock);
    }

and

    clock_foo(int posixid, blah)
    {
        clock = posix_to_clock(posixid)
        ... rest same
    }

as wrappers it seems hardly worth adding ugly hacks
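
The two wrappers outlined above can be fleshed out into a small user space sketch to show the shape of the idea: two entry points that differ only in how the clock object is looked up. Everything beyond the names used in the message (fd_to_clock, posix_to_clock, clock_do_foo, clock_put) is an assumption made for illustration, including struct demo_clock, the lookup tables, the table sizes and the choice of error codes.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_CLOCKS 4	/* hypothetical number of static POSIX clock ids */
#define MAX_FDS    16	/* hypothetical size of the fd lookup table */

/* Hypothetical clock object; refs counts outstanding lookups. */
struct demo_clock {
	int refs;
	int last_arg;
};

static struct demo_clock posix_clocks[MAX_CLOCKS];
static struct demo_clock *fd_clocks[MAX_FDS];	/* NULL: fd is not a clock */

static struct demo_clock *posix_to_clock(int id)
{
	if (id < 0 || id >= MAX_CLOCKS)
		return NULL;
	posix_clocks[id].refs++;
	return &posix_clocks[id];
}

static struct demo_clock *fd_to_clock(int fd)
{
	if (fd < 0 || fd >= MAX_FDS || !fd_clocks[fd])
		return NULL;
	fd_clocks[fd]->refs++;
	return fd_clocks[fd];
}

static void clock_put(struct demo_clock *clk)
{
	clk->refs--;
}

/* The shared operation; here it only records its argument. */
static int clock_do_foo(struct demo_clock *clk, int arg)
{
	clk->last_arg = arg;
	return 0;
}

/* Entry point taking a POSIX clock id. */
int clock_foo(int posixid, int arg)
{
	struct demo_clock *clk = posix_to_clock(posixid);
	int ret;

	if (!clk)
		return -EINVAL;
	ret = clock_do_foo(clk, arg);
	clock_put(clk);
	return ret;
}

/* Entry point taking a file descriptor; same body after the lookup. */
int fclock_foo(int fd, int arg)
{
	struct demo_clock *clk = fd_to_clock(fd);
	int ret;

	if (!clk)
		return -EBADF;
	ret = clock_do_foo(clk, arg);
	clock_put(clk);
	return ret;
}
```

Keeping the lookups separate means fd 1 and POSIX clock id 1 never have to share a number space, which is the point made above about file handle numbering.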
Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop
Jean Delvare kh...@linux-fr.org wrote:
> Hi Mikael,
>
> On Fri, 24 Sep 2010 12:50:01 +0200, Mikael Pettersson wrote:
>> Jean Delvare writes:
>>> As far as I can see this is caused by this commit from Grant:
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5
>>>
>>> Mikael, can you please try reverting this patch and see if it solves
>>> your problem?
>>
>> Yes, reverting the above commit from 2.6.36-rc5 eliminated the warnings,
>> and I was able to insmod the i2c-{core,dev,powermac}.ko modules.
>
> Thanks for testing and reporting.
>
> Grant, unless you come up with a fix very quickly, I'll have to revert
> 959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5 for 2.6.36.

I'll get a fix out today.

g.

> --
> Jean Delvare
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, Sep 23, 2010 at 12:38:53PM -0700, john stultz wrote:
> On Thu, 2010-09-23 at 19:30 +0200, Richard Cochran wrote:
>>     /sys/class/timesource/name/id
>>     /sys/class/ptp/ptp_clock_X/id
>
> So yea, I'm not a fan of the timesource sysfs interface. One, I think the name is poor (posix_clocks or something a little more specific would be an improvement), and second, I don't like the dictionary interface, where one looks up the clock by name. Instead, I think having the id hanging off the class driver is much better, as it allows mapping the actual hardware to the id more clearly.
>
> So I'd drop the timesource listing. And maybe change id to clock_id so it's a little more clear what the id is for.

Okay, I will drop /sys/class/timesource (hope Alan Cox agrees :)

I threw it out there mostly for the sake of discussion. I imagined that there could be other properties in that directory, like time scale (TAI, UTC, etc). But it seems like we don't really need anything in that direction.

>> 3.3 Synchronizing the Linux System Time
>>
>> One could offer a PHC as a combined clock source and clock event device. The advantage of this approach would be that it obviates the need for synchronization when the PHC is selected as the system timer. However, some PHCs, namely the PHY based clocks, cannot be used in this way.
>
> Again, I'd scratch this.

Okay, I only wanted to preempt the question which people are asking all the time: why can't it work with the system clock transparently?

>> Instead, the patch set provides a way to offer a Pulse Per Second (PPS) event from the PHC to the Linux PPS subsystem. A user space application can read the PPS events and tune the system clock, just like when using other external time sources like radio clocks or GPS.
>
> Forgive me for a bit of a tangent here: So while I think this PPS method is a neat idea, I'm a little curious how much of a difference the PPS method for syncing the clock would be over just a simple reading of the two clocks and correcting the offset.
>
> It seems much of it depends on the read latency of the PTP hardware vs the interrupt latency. Also the PTP clock granularity would affect the read accuracy (like on the RTC, you don't really know how close to the second boundary you are).
>
> Have you done any such measurements between the two methods?

I have not yet tested how well the PPS method works, but I expect at least as good results as when using a GPS.

> I just wonder if it would actually be something noticeable, and if it's not, how much lighter this patch-set would be without the PPS connection.

As you say, the problem with just reading two clocks at nearly the same time is that you have two uncertain operations. If you use a PPS, then there is only one clock to read, and that clock is the system clock, which hopefully is not too slow to read!

In addition, PHY reads can sleep, and that surely won't work. Even with MAC PHCs, reading outside of interrupt context makes you vulnerable to other interrupts.

> Again, this isn't super critical, just trying to make sure we don't end up adding a bunch of code that doesn't end up being used.

The PPS hooks are really only just a few lines of code.

The great advantage of a PPS approach over an ad-hoc "read two clocks and compare" is that, with a steady, known sample rate, you can analyze and predict your control loop behavior. There is lots of literature available on how to do it. IMHO, that is the big weakness of the timecompare.c stuff used in the current IGB driver.

> Also PPS interrupts are awfully frequent, so systems concerned with power-saving and deep idles probably would like something that could be done at a more coarse interval.

We could always make the pulse rate programmable, for power-saving applications.

>> 4.1 Supported Hardware Clocks
>> =============================
>>
>>   + Standard Linux system timer
>>     This driver exports the standard Linux timer as a PTP clock. Although this duplicates CLOCK_REALTIME, the code serves as a simple example for driver development and lets people without special hardware try the new API.
>
> Still not a fan of this one, figure the app should handle the special case where there are no PTP clocks and just use CLOCK_REALTIME rather than funneling CLOCK_REALTIME through the PTP interface.

It is really just an example, and for people who want to test drive the API. It can surely be removed before the final version...

Thanks for your comments,

Richard
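The "two uncertain operations" point in the exchange above can be made concrete with a small sketch. One way to compare two clocks without a PPS is to bracket the device read between two system-clock reads and assume the device was sampled at the midpoint; the width of the bracket then bounds the error. This is not what the patch set does (it uses PPS events). All names below (`sample_offset`, `struct offset_sample`, `struct fake_clock`, `fake_read`) are hypothetical, and the clocks are deterministic fakes so the arithmetic can be followed by hand.

```c
#include <stdint.h>

/* Callback that returns a clock reading in nanoseconds. */
typedef int64_t (*read_ns_fn)(void *ctx);

struct offset_sample {
    int64_t offset_ns;      /* estimated (device - system) offset */
    int64_t uncertainty_ns; /* half the bracket width, rounded down */
};

/*
 * Read system, device, system. The device read is assumed to have
 * happened at the midpoint of the two system reads; anything that
 * delays one side of the bracket (bus latency, an interrupt) shows
 * up directly as error in the estimate.
 */
struct offset_sample sample_offset(read_ns_fn read_sys, void *sys_ctx,
                                   read_ns_fn read_dev, void *dev_ctx)
{
    struct offset_sample s;
    int64_t t1 = read_sys(sys_ctx);
    int64_t td = read_dev(dev_ctx);
    int64_t t2 = read_sys(sys_ctx);

    s.offset_ns = td - (t1 + (t2 - t1) / 2);
    s.uncertainty_ns = (t2 - t1) / 2;
    return s;
}

/* Deterministic fake clock: every read advances it by a fixed cost. */
struct fake_clock {
    int64_t now_ns;
    int64_t read_cost_ns;
};

int64_t fake_read(void *ctx)
{
    struct fake_clock *c = ctx;

    c->now_ns += c->read_cost_ns;
    return c->now_ns;
}
```

With a PPS edge there is no bracket at all: the only read is of the system clock, taken in the interrupt handler, which is the advantage Richard describes.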
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
>> Instead, I think having the id hanging off the class driver is much
>> better, as it allows mapping the actual hardware to the id more
>> clearly. So I'd drop the timesource listing. And maybe change id to
>> clock_id so its a little more clear what the id is for.
>
> Okay, I will drop /sys/class/timesource (hope Alan Cox agrees :)

It makes sense to hang anything off the physical id

> I threw it out there mostly for the sake of discussion. I imagined
> that there could be other properties in that directory, like time
> scale (TAI, UTC, etc). But it seems like we don't really need anything
> in that direction.

They can still hang off the physical device. That's really a detail

>> Also PPS interrupts are awfully frequent, so systems concerned with
>> power-saving and deep idles probably would like something that could
>> be done at a more coarse interval.
>
> We could always make the pulse rate programmable, for power-saving
> applications.

I would expect the kernel drivers to be responsible for

- Turning off when they can
- Picking rates that are power optimal for the requirement

The latter is a bit interesting as I don't see anything in any of the timer APIs to express accuracy (a problem we have in kernel too). Historically it simply hasn't mattered.
Re: [PATCH 0/8] De-couple sysfs memory directories from memory sections
On 09/23/2010 01:40 PM, Balbir Singh wrote:
> * Nathan Fontenot nf...@austin.ibm.com [2010-09-22 09:15:43]:
>
>> This set of patches decouples the concept that a single memory section corresponds to a single directory in /sys/devices/system/memory/. On systems with large amounts of memory (1+ TB) there are performance issues related to creating the large number of sysfs directories. For a powerpc machine with 1 TB of memory we are creating 63,000+ directories. This is resulting in boot times of around 45-50 minutes for systems with 1 TB of memory and 8 hours for systems with 2 TB of memory. With this patch set applied I am now seeing boot times of 5 minutes or less.
>>
>> The root of this issue is in sysfs directory creation. Every time a directory is created a string compare is done against all sibling directories to ensure we do not create duplicates. The list of directory nodes in sysfs is kept as an unsorted list, which makes each creation more expensive as the number of directories grows (every new directory is compared against all existing siblings).
>>
>> The solution in this patch set is to allow a single directory in sysfs to span multiple memory sections. This is controlled by an optional architecturally defined function memory_block_size_bytes(). The default definition of this routine returns a memory block size equal to the memory section size. This keeps the layout of sysfs memory directories, as seen from userspace, the same as it is today.
>>
>> For architectures that define their own version of this routine, as is done for powerpc in this patchset, the view in userspace would change such that each memoryXXX directory would span multiple memory sections. The number of sections spanned would depend on the value reported by memory_block_size_bytes.
>>
>> In both cases a new file 'end_phys_index' is created in each memoryXXX directory. This file will contain the physical id of the last memory section covered by the sysfs directory. For the default case, the value in 'end_phys_index' will be the same as in the existing 'phys_index' file.
>
> What does this mean for memory hotplug or hotunplug?

Memory hotplug will function on a memory block size basis. For architectures that do not define their own memory_block_size_bytes() routine, they will get the default size and everything will work the same as it does today. For architectures that define their own memory_block_size_bytes() routine and have multiple memory sections per memory block, hotplug operations will add or remove all of the memory sections in the memory block.

-Nathan
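As a rough illustration of the layout described in this message, the sketch below works out which sections a block covers and how many sysfs directories result. It assumes sections are numbered contiguously and that a block is a whole number of sections. Only `memory_block_size_bytes()`, `phys_index` and `end_phys_index` come from the thread; `SECTION_SIZE_BYTES`, its 16 MB value and the helper names are assumptions made for the example.

```c
#include <stdint.h>

/* Assumed section size for the example: 16 MB. */
#define SECTION_SIZE_BYTES (16ULL << 20)

struct memory_block_range {
    uint64_t phys_index;     /* id of the first section in the block */
    uint64_t end_phys_index; /* id of the last section in the block */
};

/* Number of sections covered by one block of the given size. */
uint64_t sections_per_block(uint64_t block_size_bytes)
{
    return block_size_bytes / SECTION_SIZE_BYTES;
}

/* Section range covered by the block_nr-th block (0-based). */
struct memory_block_range block_range(uint64_t block_nr,
                                      uint64_t block_size_bytes)
{
    uint64_t n = sections_per_block(block_size_bytes);
    struct memory_block_range r;

    r.phys_index = block_nr * n;
    r.end_phys_index = r.phys_index + n - 1;
    return r;
}

/* One sysfs directory per block. */
uint64_t nr_sysfs_dirs(uint64_t total_bytes, uint64_t block_size_bytes)
{
    return total_bytes / block_size_bytes;
}
```

In the default case the block size equals the section size, so `phys_index` and `end_phys_index` are equal. With the assumed 16 MB sections, 1 TB gives 65,536 single-section directories, while a 256 MB block size would give 4,096.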
Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM pathfor PTE_64BIT
On Fri, 24 Sep 2010 07:04:28 +0200 Chen, Tiejun tiejun.c...@windriver.com wrote:
>> -----Original Message-----
>> From: linuxppc-dev-bounces+tiejun.chen=windriver@lists.ozlabs.org [mailto:linuxppc-dev-bounces+tiejun.chen=windriver@lists.ozlabs.org] On Behalf Of Benjamin Herrenschmidt
>> Sent: Friday, September 24, 2010 5:59 AM
>> To: Scott Wood
>> Cc: Gortmaker, Paul; linuxppc-dev@lists.ozlabs.org
>> Subject: Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM pathfor PTE_64BIT
>>
>> On Thu, 2010-09-23 at 15:33 -0500, Scott Wood wrote:
>>> I don't see a generic accessor that can test PTE flags for user access -- in the absence of one, I guess we need an ifdef here. Or at least put in a comment so anyone who adds a userspace use knows they need to fix it.
>>
>> We could make up one in powerpc arch at least
>>
>> #define pte_user(val) ((val & _PAGE_USER) == _PAGE_USER)
>
> Looks good.
>
> Ben and Scott,
>
> But for the patched issue we're discussing we have to do #ifdef that as my original modification. Right? Or do you have other suggestion? Then I can improve that as v2.

Ben's version should work without any ifdef, since it makes sure all bits of _PAGE_USER are set.

-Scott
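Scott's remark that the macro "makes sure all bits of _PAGE_USER are set" is the whole fix, and it is easy to check in isolation. The sketch below contrasts the "any bit" test with the "all bits" test when `_PAGE_USER` is made of two bits, one of which is shared with kernel mappings. The numeric bit values and the name `any_user_bit` are made up; only the relationship between the flags is taken from the thread.

```c
/* Hypothetical bit values; only the relationships between the flags matter. */
#define _PAGE_BAP_SR 0x01u /* supervisor read: also set on kernel mappings */
#define _PAGE_BAP_UR 0x02u /* user read */
#define _PAGE_USER   (_PAGE_BAP_UR | _PAGE_BAP_SR)

/* "Any bit" test: true as soon as one bit of _PAGE_USER is present. */
int any_user_bit(unsigned int flags)
{
    return (flags & _PAGE_USER) != 0;
}

/* "All bits" test, the shape proposed in the thread. */
int pte_user(unsigned int flags)
{
    return (flags & _PAGE_USER) == _PAGE_USER;
}
```

A kernel mapping that carries only `_PAGE_BAP_SR` passes `any_user_bit()` but fails `pte_user()`, so the user permission bits are no longer applied to it.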
Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree
On Fri, 24 Sep 2010 01:10:06 -0600 Grant Likely grant.lik...@secretlab.ca wrote:
> On Thu, Sep 16, 2010 at 09:05:03AM +0200, christophe leroy wrote:
>> This patch applies to 2.6.34.7 and 2.6.35.4
>> It fixes an issue during the probe for CPM1 with definition of parameter ram from DTS
>>
>> Signed-off-by: christophe leroy christophe.le...@c-s.fr
>
> I'm sorry, I don't understand the fix from the given description. What is the problem, and why is cpm_muram_alloc_fixed() the wrong thing to call on CPM1? Does CPM2 still need it?

I don't see how cpm_muram_alloc_fixed() can be used safely at all. If you need a fixed address, it shouldn't be part of the general allocation pool, or something else might get it first.

-Scott
Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree
On Fri, 24 Sep 2010 11:57:40 +0400 Anton Vorontsov cbouatmai...@gmail.com wrote:
> Doesn't explain why that worked on MPC8272 (CPM2) and MPC8560 (also CPM2) machines though. But here's my guess (I no longer have these boards to test it):
>
> On 8272 I used this node:
>
> +	s...@4c0 {
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +		compatible = "fsl,cpm2-spi", "fsl,spi";
> +		reg = <0x11a80 0x40 0x89fc 0x2>;
>
> On that SOC there are two muram data regions 0x0..0x2000 and 0x9000..0x9100. Note that we actually don't want data regions, and the only reason why that worked is that sysdev/cpm_common.c maps muram(0)..muram(max).

Wouldn't it still fail the rh_alloc_fixed call?

-Scott
Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT
[Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT] On 24/09/2010 (Fri 07:59) Benjamin Herrenschmidt wrote:

> On Thu, 2010-09-23 at 15:33 -0500, Scott Wood wrote:
>> I don't see a generic accessor that can test PTE flags for user access -- in the absence of one, I guess we need an ifdef here. Or at least put in a comment so anyone who adds a userspace use knows they need to fix it.
>
> We could make up one in powerpc arch at least
>
> #define pte_user(val) ((val & _PAGE_USER) == _PAGE_USER)
>
> would do

I've put the above into pte-common.h, restored the deleted code block which now uses pte_user() and I've updated the commit header to match. Passes sanity boot test on an sbc8548 both with and without PTE_64BIT.

Thanks for the feedback.
Paul.

From d48ebb58b8214f9faec775a5e06902f638f165cf Mon Sep 17 00:00:00 2001
From: Tiejun Chen tiejun.c...@windriver.com
Date: Tue, 21 Sep 2010 19:31:31 +0800
Subject: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT

There exists a four line chunk of code, which when configured for 64 bit address space, can incorrectly set certain page flags during the TLB creation. It turns out that this is code which isn't used, but might still serve a purpose. Since it isn't obvious why it exists or why it causes problems, the below description covers both in detail.

For powerpc bootstrap, the physical memory (at most 768M), is mapped into the kernel space via the following path:

    MMU_init()
      |
      + adjust_total_lowmem()
          |
          + map_mem_in_cams()
              |
              + settlbcam(i, virt, phys, cam_sz, PAGE_KERNEL_X, 0);

On settlbcam(), the kernel will create TLB entries according to the flag, PAGE_KERNEL_X.

    settlbcam()
    {
        ...
        TLBCAM[index].MAS1 = MAS1_VALID
                | MAS1_IPROT | MAS1_TSIZE(tsize) | MAS1_TID(pid);
                  ^
            These entries cannot be invalidated by the kernel
            since MAS1_IPROT is set on TLB property.
        ...
        if (flags & _PAGE_USER) {
            TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
            TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
        }

For classic BookE (flags & _PAGE_USER) is 'zero' so it's fine. But on boards like the Freescale P4080, we want to support 36-bit physical address on it. So the following options may be set:

    CONFIG_FSL_BOOKE=y
    CONFIG_PTE_64BIT=y
    CONFIG_PHYS_64BIT=y

As a result, boards like the P4080 will introduce PTE format as Book3E. As per the file: arch/powerpc/include/asm/pgtable-ppc32.h

    * #elif defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT)
    * #include <asm/pte-book3e.h>

So PAGE_KERNEL_X is __pgprot(_PAGE_BASE | _PAGE_KERNEL_RWX) and the book3E version of _PAGE_KERNEL_RWX is defined with:

    (_PAGE_BAP_SW | _PAGE_BAP_SR | _PAGE_DIRTY | _PAGE_BAP_SX)

Note the _PAGE_BAP_SR, which is also defined in the book3E _PAGE_USER:

    #define _PAGE_USER (_PAGE_BAP_UR | _PAGE_BAP_SR) /* Can be read */

So the possibility exists to wrongly assign the user MAS3_URWX bits to kernel (PAGE_KERNEL_X) address space via the following code fragment:

    if (flags & _PAGE_USER) {
        TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
        TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
    }

Here is a dump of the TLB info from Simics with the above code present:

    L2 TLB1
                              GT                 SSS UUU V I
    Row Logical   Physical    SS TLPID TID WIMGE XWR XWR F P V
    --- --------- ----------- -- ----- --- ----- --- --- - - -
    0   c000-cfff 0-00fff     00 0     0   M     XWR XWR 0 1 1
    1   d000-dfff 01000-01fff 00 0     0   M     XWR XWR 0 1 1
    2   e000-efff 02000-02fff 00 0     0   M     XWR XWR 0 1 1

Actually this conditional code was used for two legacy functions:

  1: support KGDB to set break point.
     KGDB already dropped this; now uses its core write to set break point.

  2: io_block_mapping() to create TLB in segmentation size (not PAGE_SIZE) for device IO space.
     This use case is also removed from the latest PowerPC kernel.

However, there may still be a use case for it in the future, like large user pages, so we can't remove it entirely. As an alternative, we match on all bits of _PAGE_USER instead of just any bits, so the case where just _PAGE_BAP_SR is set can't sneak through.

With this done, the TLB appears without U having XWR as below: --- L2 TLB1 GT SSS UUU V I Row Logical PhysicalSS TLPID TID WIMGE XWR XWR F P V - - --- -- - - - --- --- - - - 0 c000-cfff 0-00fff 00 0 0 M XWR 0 1 1 1 d000-dfff
Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
On Fri, 2010-09-24 at 09:29 +0200, Richard Cochran wrote:
> On Thu, Sep 23, 2010 at 12:48:51PM -0700, john stultz wrote:
>> So I'd still split this patch up a little bit more.
>>
>> 1) Patch that implements the ADJ_SETOFFSET (*and its implementation*) in do_adjtimex.
>>
>> 2) Patch that adds the new syscall and clock_id multiplexing.
>>
>> 3) Patches that wire it up to the rest of the architectures (there's still a bunch missing here).
>
> I was not sure what the policy is about adding syscalls. Is it the syscall author's responsibility to add it into every arch?
>
> The last time (see a2e2725541fad7) the commit only added half of some archs, and ignored others. In my patch, the syscall *really* works on the archs that are present in the patch.
>
> (Actually, I did not test blackfin, since I don't have one, but I included it since I know they have a PTP hardware clock.)

I'm not sure about policy, but I think for completeness' sake you should make sure every arch supports a new syscall. You're not expected to be able to test every one, but getting the basic support patch sent to maintainers should be done.

>>> +static inline int common_clock_adj(const clockid_t which_clock, struct timex *t)
>>> +{
>>> +	if (CLOCK_REALTIME == which_clock)
>>> +		return do_adjtimex(t);
>>> +	else
>>> +		return -EOPNOTSUPP;
>>> +}
>>
>> Would it make sense to point to the do_adjtimex() in the k_clock definition for CLOCK_REALTIME rather than conditionalizing it here?
>
> But what about CLOCK_MONOTONIC_RAW, for example?

-EOPNOTSUPP

> Does it make sense to allow it to be adjusted?

No. I think only CLOCK_REALTIME would make sense of the existing clocks. I'm just suggesting you conditionalize it from the function pointer, rather than in the common function.

thanks
-john
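John's suggestion, dispatching through a per-clock function pointer instead of comparing the clock id in a common function, can be sketched in user space as follows. Everything here is a stand-in: `struct k_clock_stub`, `struct timex_stub`, `do_adjtimex_stub`, `clock_adjtime_stub` and the `*_ID` constants are hypothetical names chosen so they cannot be confused with the kernel's real `struct k_clock`, `struct timex` or `do_adjtimex()`.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for struct timex; only one field is needed for the demo. */
struct timex_stub {
    long offset;
};

/* Stand-in for a k_clock entry: a NULL hook means "not adjustable". */
struct k_clock_stub {
    int (*clock_adj)(struct timex_stub *t);
};

enum {
    CLOCK_REALTIME_ID = 0,
    CLOCK_MONOTONIC_RAW_ID = 1,
    NR_CLOCKS = 2
};

static long adjusted_offset; /* records what the adjust hook received */

static int do_adjtimex_stub(struct timex_stub *t)
{
    adjusted_offset = t->offset;
    return 0;
}

/* Only the realtime clock gets an adjust hook. */
static const struct k_clock_stub clocks[NR_CLOCKS] = {
    [CLOCK_REALTIME_ID]      = { .clock_adj = do_adjtimex_stub },
    [CLOCK_MONOTONIC_RAW_ID] = { .clock_adj = NULL },
};

/* The common path never names a particular clock id. */
int clock_adjtime_stub(int which_clock, struct timex_stub *t)
{
    if (which_clock < 0 || which_clock >= NR_CLOCKS)
        return -EINVAL;
    if (!clocks[which_clock].clock_adj)
        return -EOPNOTSUPP;
    return clocks[which_clock].clock_adj(t);
}
```

Adding a new adjustable clock then means filling in one table entry rather than growing an if/else chain.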
[PATCH 0/2] powerpc/47x TLB optimization patches
These two patches reduce how often the TLB caches are flushed in hardware. This covers both the normal TLB cache and the shadow TLB cache, which separates the TLBs for data and instruction access (dTLB and iTLB).

Dave Kleikamp (2):
  476: Set CCR2[DSTI] to prevent isync from flushing shadow TLB
  ppc: lazy flush_tlb_mm for nohash architectures

 arch/powerpc/include/asm/reg_booke.h  |    4 +
 arch/powerpc/kernel/head_44x.S        |   25 ++
 arch/powerpc/mm/mmu_context_nohash.c  |  154 ++---
 arch/powerpc/mm/mmu_decl.h            |    8 ++
 arch/powerpc/mm/tlb_nohash.c          |   28 +-
 arch/powerpc/mm/tlb_nohash_low.S      |   14 +++-
 arch/powerpc/platforms/44x/Kconfig    |    7 ++
 arch/powerpc/platforms/44x/misc_44x.S |   26 ++
 8 files changed, 249 insertions(+), 17 deletions(-)

--
1.7.2.2
[PATCH 1/2] 476: Set CCR2[DSTI] to prevent isync from flushing shadow TLB
When the DSTI (Disable Shadow TLB Invalidate) bit is set in the CCR2 register, the isync command does not flush the shadow TLB (iTLB dTLB). However, since the shadow TLB does not contain context information, we want the shadow TLB flushed in situations where we are switching context. In those situations, we explicitly clear the DSTI bit before performing isync, and set it again afterward. We also need to do the same when we perform isync after explicitly flushing the TLB. Th setting of the DSTI bit is dependent on CONFIG_PPC_47x_DISABLE_SHADOW_TLB_INVALIDATE. When we are confident that the feature works as expected, the option can probably be removed. Signed-off-by: Dave Kleikamp sha...@linux.vnet.ibm.com --- arch/powerpc/include/asm/reg_booke.h |4 arch/powerpc/kernel/head_44x.S| 25 + arch/powerpc/mm/tlb_nohash_low.S | 14 +- arch/powerpc/platforms/44x/Kconfig|7 +++ arch/powerpc/platforms/44x/misc_44x.S | 26 ++ 5 files changed, 75 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h index 667a498..a7ecbfe 100644 --- a/arch/powerpc/include/asm/reg_booke.h +++ b/arch/powerpc/include/asm/reg_booke.h @@ -120,6 +120,7 @@ #define SPRN_TLB3CFG 0x2B3 /* TLB 3 Config Register */ #define SPRN_EPR 0x2BE /* External Proxy Register */ #define SPRN_CCR1 0x378 /* Core Configuration Register 1 */ +#define SPRN_CCR2_476 0x379 /* Core Configuration Register 2 (476)*/ #define SPRN_ZPR 0x3B0 /* Zone Protection Register (40x) */ #define SPRN_MAS7 0x3B0 /* MMU Assist Register 7 */ #define SPRN_MMUCR 0x3B2 /* MMU Control Register */ @@ -188,6 +189,9 @@ #defineCCR1_DPC0x0100 /* Disable L1 I-Cache/D-Cache parity checking */ #defineCCR1_TCS0x0080 /* Timer Clock Select */ +/* Bit definitions for CCR2. */ +#define CCR2_476_DSTI 0x0800 /* Disable Shadow TLB Invalidate */ + /* Bit definitions for the MCSR. 
*/ #define MCSR_MCS 0x8000 /* Machine Check Summary */ #define MCSR_IB0x4000 /* Instruction PLB Error */ diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S index 562305b..0c1b118 100644 --- a/arch/powerpc/kernel/head_44x.S +++ b/arch/powerpc/kernel/head_44x.S @@ -703,8 +703,23 @@ _GLOBAL(set_context) stw r4, 0x4(r5) #endif mtspr SPRN_PID,r3 +BEGIN_MMU_FTR_SECTION + b 1f +END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_47x) isync /* Force context change */ blr +1: +#ifdef CONFIG_PPC_47x + mfspr r10,SPRN_CCR2_476 + rlwinm r11,r10,0,~CCR2_476_DSTI + mtspr SPRN_CCR2_476,r11 + isync /* Force context change */ + mtspr SPRN_CCR2_476,r10 +#else /* CONFIG_PPC_47x */ +2: trap + EMIT_BUG_ENTRY 2b,__FILE__,__LINE__,0; +#endif /* CONFIG_PPC_47x */ + blr /* * Init CPU state. This is called at boot time or for secondary CPUs @@ -861,6 +876,16 @@ skpinv:addir4,r4,1 /* Increment */ isync #endif /* CONFIG_PPC_EARLY_DEBUG_44x */ + mfspr r3,SPRN_CCR2_476 +#ifdef CONFIG_PPC_47x_DISABLE_SHADOW_TLB_INVALIDATE + /* With CCR2(DSTI) set, isync does not invalidate the shadow TLB */ + orisr3,r3,ccr2_476_d...@h +#else + rlwinm r3,r3,0,~CCR2_476_DSTI +#endif + mtspr SPRN_CCR2_476,r3 + isync + /* Establish the interrupt vector offsets */ SET_IVOR(0, CriticalInput); SET_IVOR(1, MachineCheck); diff --git a/arch/powerpc/mm/tlb_nohash_low.S b/arch/powerpc/mm/tlb_nohash_low.S index b9d9fed..f28fb52 100644 --- a/arch/powerpc/mm/tlb_nohash_low.S +++ b/arch/powerpc/mm/tlb_nohash_low.S @@ -112,7 +112,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_47x) clrrwi r4,r3,12/* get an EPN for the hashing with V = 0 */ ori r4,r4,PPC47x_TLBE_SIZE tlbwe r4,r7,0 /* write it */ + mfspr r8,SPRN_CCR2_476 + rlwinm r9,r8,0,~CCR2_476_DSTI + mtspr SPRN_CCR2_476,r9 isync + mtspr SPRN_CCR2_476,r8 wrtee r10 blr #else /* CONFIG_PPC_47x */ @@ -180,7 +184,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_47x) lwz r8,0(r10) /* Load boltmap entry */ addir10,r10,4 /* Next word */ b 1b /* Then loop */ -1: isync /* Sync 
shadows */ +1: mfspr r9,SPRN_CCR2_476 + rlwinm r10,r9,0,~CCR2_476_DSTI + mtspr SPRN_CCR2_476,r10 + isync /* Sync shadows */ + mtspr SPRN_CCR2_476,r9 wrtee r11 #else /* CONFIG_PPC_47x */ 1: trap @@ -203,7 +211,11 @@ _GLOBAL(_tlbivax_bcast) isync /* tlbivax 0,r3 - use .long to avoid binutils deps */ .long 0x7c000624 | (r3 11) + mfspr
[PATCH 2/2] ppc: lazy flush_tlb_mm for nohash architectures
On PPC_MMU_NOHASH processors that support a large number of contexts, implement a lazy flush_tlb_mm() that switches to a free context, marking the old one stale. The tlb is only flushed when no free contexts are available. The lazy tlb flushing is controlled by the global variable tlb_lazy_flush which is set during init, dependent upon MMU_FTR_TYPE_47x. Signed-off-by: Dave Kleikamp sha...@linux.vnet.ibm.com --- arch/powerpc/mm/mmu_context_nohash.c | 154 +++--- arch/powerpc/mm/mmu_decl.h |8 ++ arch/powerpc/mm/tlb_nohash.c | 28 +- 3 files changed, 174 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c index ddfd7ad..87c7dc2 100644 --- a/arch/powerpc/mm/mmu_context_nohash.c +++ b/arch/powerpc/mm/mmu_context_nohash.c @@ -17,10 +17,6 @@ * TODO: * * - The global context lock will not scale very well - * - The maps should be dynamically allocated to allow for processors - * that support more PID bits at runtime - * - Implement flush_tlb_mm() by making the context stale and picking - * a new one * - More aggressively clear stale map bits and maybe find some way to * also clear mm-cpu_vm_mask bits when processes are migrated */ @@ -52,6 +48,8 @@ #include asm/mmu_context.h #include asm/tlbflush.h +#include mmu_decl.h + static unsigned int first_context, last_context; static unsigned int next_context, nr_free_contexts; static unsigned long *context_map; @@ -59,9 +57,31 @@ static unsigned long *stale_map[NR_CPUS]; static struct mm_struct **context_mm; static DEFINE_RAW_SPINLOCK(context_lock); +int tlb_lazy_flush; +static int tlb_needs_flush[NR_CPUS]; +static unsigned long *context_available_map; +static unsigned int nr_stale_contexts; + #define CTX_MAP_SIZE \ (sizeof(unsigned long) * (last_context / BITS_PER_LONG + 1)) +/* + * if another cpu recycled the stale contexts, we need to flush + * the local TLB, so that we may re-use those contexts + */ +void flush_recycled_contexts(int cpu) +{ + int i; + + if 
(tlb_needs_flush[cpu]) { + pr_hard([%d] flushing tlb\n, cpu); + _tlbil_all(); + for (i = cpu_first_thread_in_core(cpu); +i = cpu_last_thread_in_core(cpu); i++) { + tlb_needs_flush[i] = 0; + } + } +} /* Steal a context from a task that has one at the moment. * @@ -147,7 +167,7 @@ static unsigned int steal_context_up(unsigned int id) pr_hardcont( | steal %d from 0x%p, id, mm); /* Flush the TLB for that context */ - local_flush_tlb_mm(mm); + __local_flush_tlb_mm(mm); /* Mark this mm has having no context anymore */ mm-context.id = MMU_NO_CONTEXT; @@ -161,13 +181,19 @@ static unsigned int steal_context_up(unsigned int id) #ifdef DEBUG_MAP_CONSISTENCY static void context_check_map(void) { - unsigned int id, nrf, nact; + unsigned int id, nrf, nact, nstale; - nrf = nact = 0; + nrf = nact = nstale = 0; for (id = first_context; id = last_context; id++) { int used = test_bit(id, context_map); - if (!used) - nrf++; + int allocated = tlb_lazy_flush + test_bit(id, context_available_map); + if (!used) { + if (allocated) + nstale++; + else + nrf++; + } if (used != (context_mm[id] != NULL)) pr_err(MMU: Context %d is %s and MM is %p !\n, id, used ? used : free, context_mm[id]); @@ -179,6 +205,11 @@ static void context_check_map(void) nr_free_contexts, nrf); nr_free_contexts = nrf; } + if (nstale != nr_stale_contexts) { + pr_err(MMU: Stale context count out of sync ! (%d vs %d)\n, + nr_stale_contexts, nstale); + nr_stale_contexts = nstale; + } if (nact num_online_cpus()) pr_err(MMU: More active contexts than CPUs ! (%d vs %d)\n, nact, num_online_cpus()); @@ -189,6 +220,38 @@ static void context_check_map(void) static void context_check_map(void) { } #endif +/* + * On architectures that support a large number of contexts, the tlb + * can be flushed lazily by picking a new context and making the stale + * context unusable until a lazy tlb flush has been issued. 
+ * + * context_available_map keeps track of both active and stale contexts, + * while context_map continues to track only active contexts. When the + * lazy tlb flush is triggered, context_map is copied to + * context_available_map, making the once-stale contexts available again + */ +static void recycle_stale_contexts(void) +{ + if
RE: [PATCH 1/2] PPC4xx: Generelizing drivers/dma/ppc4xx/adma.c
> It would be really preferable to support all those platforms in a
> single Linux image. If technically possible, please try to move this
> direction.

It is doable for a couple of SoCs. Other SoC DMA engines are quite a bit different. Let me take small steps first and slowly achieve some run-time differentiation.

Thanks,
Marri
[PATCH RFCv1 0/2] dma: add support for sg-to-sg transfers
This series adds support for scatterlist to scatterlist transfers to the generic DMAEngine API. I have unconditionally enabled it when the fsldma driver is used to make testing easier. This feature should probably be selected by individual drivers.

This series is intended to lay the groundwork for further changes to the series titled "CARMA Board Support". That series will be updated when I have time and hardware to test with.

This series has not been runtime tested yet. I am posting it only to gain comments before I spend the effort to update the driver that depends on this.

To help reviewers, I'd like to comment on the architecture of dma_async_memcpy_sg_to_sg(). It explicitly avoids using descriptor chaining due to the way that feature interacts with the fsldma controller's external start feature. To use the external start feature properly, the in-memory descriptor chain must not be fragmented into multiple smaller chains. This is what is achieved by submitting all descriptors without using chaining.

Ira W. Snyder (2):
  dmaengine: add support for scatterlist to scatterlist transfers
  fsldma: use generic support for scatterlist to scatterlist transfers

 arch/powerpc/include/asm/fsldma.h |  115 ++--
 drivers/dma/Kconfig               |    4 +
 drivers/dma/dmaengine.c           |  119
 drivers/dma/fsldma.c              |  219 +++--
 include/linux/dmaengine.h         |   10 ++
 5 files changed, 181 insertions(+), 286 deletions(-)
[PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
This adds support for scatterlist to scatterlist DMA transfers. As requested by Dan, this is hidden behind an ifdef so that it can be selected by the drivers that need it. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/Kconfig |4 ++ drivers/dma/dmaengine.c | 119 + include/linux/dmaengine.h | 10 3 files changed, 133 insertions(+), 0 deletions(-) diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 9520cf0..f688669 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -89,10 +89,14 @@ config AT_HDMAC Support the Atmel AHB DMA controller. This can be integrated in chips such as the Atmel AT91SAM9RL. +config DMAENGINE_SG_TO_SG + bool + config FSL_DMA tristate Freescale Elo and Elo Plus DMA support depends on FSL_SOC select DMA_ENGINE + select DMAENGINE_SG_TO_SG ---help--- Enable support for the Freescale Elo and Elo Plus DMA controllers. The Elo is the DMA controller on some 82xx and 83xx parts, and the diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index 9d31d5e..57ec1e5 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -972,10 +972,129 @@ dma_async_memcpy_pg_to_pg(struct dma_chan *chan, struct page *dest_pg, } EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg); +#ifdef CONFIG_DMAENGINE_SG_TO_SG +dma_cookie_t +dma_async_memcpy_sg_to_sg(struct dma_chan *chan, + struct scatterlist *dst_sg, unsigned int dst_nents, + struct scatterlist *src_sg, unsigned int src_nents, + dma_async_tx_callback cb, void *cb_param) +{ + struct dma_device *dev = chan-device; + struct dma_async_tx_descriptor *tx; + dma_cookie_t cookie = -ENOMEM; + size_t dst_avail, src_avail; + struct list_head tx_list; + size_t transferred = 0; + dma_addr_t dst, src; + size_t len; + + if (dst_nents == 0 || src_nents == 0) + return -EINVAL; + + if (dst_sg == NULL || src_sg == NULL) + return -EINVAL; + + /* get prepared for the loop */ + dst_avail = sg_dma_len(dst_sg); + src_avail = sg_dma_len(src_sg); + + INIT_LIST_HEAD(tx_list); + + /* run until we 
are out of descriptors */ + while (true) { + + /* create the largest transaction possible */ + len = min_t(size_t, src_avail, dst_avail); + if (len == 0) + goto fetch; + + dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) - dst_avail; + src = sg_dma_address(src_sg) + sg_dma_len(src_sg) - src_avail; + + /* setup the transaction */ + tx = dev-device_prep_dma_memcpy(chan, dst, src, len, 0); + if (!tx) { + dev_err(dev-dev, failed to alloc desc for memcpy\n); + return -ENOMEM; + } + + /* keep track of the tx for later */ + list_add_tail(tx-entry, tx_list); + + /* update metadata */ + transferred += len; + dst_avail -= len; + src_avail -= len; + +fetch: + /* fetch the next dst scatterlist entry */ + if (dst_avail == 0) { + + /* no more entries: we're done */ + if (dst_nents == 0) + break; + + /* fetch the next entry: if there are no more: done */ + dst_sg = sg_next(dst_sg); + if (dst_sg == NULL) + break; + + dst_nents--; + dst_avail = sg_dma_len(dst_sg); + } + + /* fetch the next src scatterlist entry */ + if (src_avail == 0) { + + /* no more entries: we're done */ + if (src_nents == 0) + break; + + /* fetch the next entry: if there are no more: done */ + src_sg = sg_next(src_sg); + if (src_sg == NULL) + break; + + src_nents--; + src_avail = sg_dma_len(src_sg); + } + } + + /* loop through the list of descriptors and submit them */ + list_for_each_entry(tx, tx_list, entry) { + + /* this is the last descriptor: add the callback */ + if (list_is_last(tx-entry, tx_list)) { + tx-callback = cb; + tx-callback_param = cb_param; + } + + /* submit the transaction */ + cookie = tx-tx_submit(tx); + if (dma_submit_error(cookie)) { + dev_err(dev-dev, failed to submit desc\n);
[PATCH RFCv1 2/2] fsldma: use generic support for scatterlist to scatterlist transfers
The fsldma driver uses the DMA_SLAVE API to handle scatterlist to scatterlist DMA transfers. For quite a while now, it has been possible to mimic the operation by using the device_prep_dma_memcpy() routine intelligently. Now that the DMAEngine API has grown generic support for scatterlist to scatterlist transfers, this operation is no longer needed. The generic support is used for scatterlist to scatterlist transfers. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- arch/powerpc/include/asm/fsldma.h | 115 ++-- drivers/dma/fsldma.c | 219 +++-- 2 files changed, 48 insertions(+), 286 deletions(-) diff --git a/arch/powerpc/include/asm/fsldma.h b/arch/powerpc/include/asm/fsldma.h index debc5ed..dc0bd27 100644 --- a/arch/powerpc/include/asm/fsldma.h +++ b/arch/powerpc/include/asm/fsldma.h @@ -1,7 +1,7 @@ /* * Freescale MPC83XX / MPC85XX DMA Controller * - * Copyright (c) 2009 Ira W. Snyder i...@ovro.caltech.edu + * Copyright (c) 2009-2010 Ira W. Snyder i...@ovro.caltech.edu * * This file is licensed under the terms of the GNU General Public License * version 2. This program is licensed as is without any warranty of any @@ -11,127 +11,32 @@ #ifndef __ARCH_POWERPC_ASM_FSLDMA_H__ #define __ARCH_POWERPC_ASM_FSLDMA_H__ -#include linux/slab.h #include linux/dmaengine.h /* - * Definitions for the Freescale DMA controller's DMA_SLAVE implemention + * The Freescale DMA controller has several features that are not accomodated + * in the Linux DMAEngine API. Therefore, the generic structure is expanded + * to allow drivers to use these features. * - * The Freescale DMA_SLAVE implementation was designed to handle many-to-many - * transfers. An example usage would be an accelerated copy between two - * scatterlists. Another example use would be an accelerated copy from - * multiple non-contiguous device buffers into a single scatterlist. 
+ * This structure should be passed into the DMAEngine routine device_control() + * as in this example: * - * A DMA_SLAVE transaction is defined by a struct fsl_dma_slave. This - * structure contains a list of hardware addresses that should be copied - * to/from the scatterlist passed into device_prep_slave_sg(). The structure - * also has some fields to enable hardware-specific features. + * chan-device-device_control(chan, DMA_SLAVE_CONFIG, (unsigned long)cfg); */ /** - * struct fsl_dma_hw_addr - * @entry: linked list entry - * @address: the hardware address - * @length: length to transfer - * - * Holds a single physical hardware address / length pair for use - * with the DMAEngine DMA_SLAVE API. - */ -struct fsl_dma_hw_addr { - struct list_head entry; - - dma_addr_t address; - size_t length; -}; - -/** * struct fsl_dma_slave - * @addresses: a linked list of struct fsl_dma_hw_addr structures + * @config: the standard Linux DMAEngine API DMA_SLAVE configuration * @request_count: value for DMA request count - * @src_loop_size: setup and enable constant source-address DMA transfers - * @dst_loop_size: setup and enable constant destination address DMA transfers * @external_start: enable externally started DMA transfers * @external_pause: enable externally paused DMA transfers - * - * Holds a list of address / length pairs for use with the DMAEngine - * DMA_SLAVE API implementation for the Freescale DMA controller. 
*/ -struct fsl_dma_slave { +struct fsldma_slave_config { + struct dma_slave_config config; - /* List of hardware address/length pairs */ - struct list_head addresses; - - /* Support for extra controller features */ unsigned int request_count; - unsigned int src_loop_size; - unsigned int dst_loop_size; bool external_start; bool external_pause; }; -/** - * fsl_dma_slave_append - add an address/length pair to a struct fsl_dma_slave - * @slave: the struct fsl_dma_slave to add to - * @address: the hardware address to add - * @length: the length of bytes to transfer from @address - * - * Add a hardware address/length pair to a struct fsl_dma_slave. Returns 0 on - * success, -ERRNO otherwise. - */ -static inline int fsl_dma_slave_append(struct fsl_dma_slave *slave, - dma_addr_t address, size_t length) -{ - struct fsl_dma_hw_addr *addr; - - addr = kzalloc(sizeof(*addr), GFP_ATOMIC); - if (!addr) - return -ENOMEM; - - INIT_LIST_HEAD(addr-entry); - addr-address = address; - addr-length = length; - - list_add_tail(addr-entry, slave-addresses); - return 0; -} - -/** - * fsl_dma_slave_free - free a struct fsl_dma_slave - * @slave: the struct fsl_dma_slave to free - * - * Free a struct fsl_dma_slave and all associated address/length pairs - */ -static inline void fsl_dma_slave_free(struct fsl_dma_slave *slave) -{ - struct fsl_dma_hw_addr *addr, *tmp; - - if (slave) { -
Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
On Fri, Sep 24, 2010 at 12:46 PM, Ira W. Snyder i...@ovro.caltech.edu wrote: This adds support for scatterlist to scatterlist DMA transfers. As requested by Dan, this is hidden behind an ifdef so that it can be selected by the drivers that need it. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/Kconfig | 4 ++ drivers/dma/dmaengine.c | 119 + include/linux/dmaengine.h | 10 3 files changed, 133 insertions(+), 0 deletions(-) diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 9520cf0..f688669 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -89,10 +89,14 @@ config AT_HDMAC Support the Atmel AHB DMA controller. This can be integrated in chips such as the Atmel AT91SAM9RL. +config DMAENGINE_SG_TO_SG + bool + config FSL_DMA tristate Freescale Elo and Elo Plus DMA support depends on FSL_SOC select DMA_ENGINE + select DMAENGINE_SG_TO_SG ---help--- Enable support for the Freescale Elo and Elo Plus DMA controllers. The Elo is the DMA controller on some 82xx and 83xx parts, and the diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index 9d31d5e..57ec1e5 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -972,10 +972,129 @@ dma_async_memcpy_pg_to_pg(struct dma_chan *chan, struct page *dest_pg, } EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg); +#ifdef CONFIG_DMAENGINE_SG_TO_SG +dma_cookie_t +dma_async_memcpy_sg_to_sg(struct dma_chan *chan, + struct scatterlist *dst_sg, unsigned int dst_nents, + struct scatterlist *src_sg, unsigned int src_nents, + dma_async_tx_callback cb, void *cb_param) +{ + struct dma_device *dev = chan-device; + struct dma_async_tx_descriptor *tx; + dma_cookie_t cookie = -ENOMEM; + size_t dst_avail, src_avail; + struct list_head tx_list; + size_t transferred = 0; + dma_addr_t dst, src; + size_t len; + + if (dst_nents == 0 || src_nents == 0) + return -EINVAL; + + if (dst_sg == NULL || src_sg == NULL) + return -EINVAL; + + /* get prepared for the loop */ + dst_avail = sg_dma_len(dst_sg); + 
src_avail = sg_dma_len(src_sg); + + INIT_LIST_HEAD(tx_list); + + /* run until we are out of descriptors */ + while (true) { + + /* create the largest transaction possible */ + len = min_t(size_t, src_avail, dst_avail); + if (len == 0) + goto fetch; + + dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) - dst_avail; + src = sg_dma_address(src_sg) + sg_dma_len(src_sg) - src_avail; + + /* setup the transaction */ + tx = dev-device_prep_dma_memcpy(chan, dst, src, len, 0); + if (!tx) { + dev_err(dev-dev, failed to alloc desc for memcpy\n); + return -ENOMEM; I don't think any dma channels gracefully handle descriptors that were prepped but not submitted. You would probably need to submit the backlog, poll for completion, and then return the error. Alternatively, the expectation is that descriptor allocations are transient, i.e. once previously submitted transactions are completed the descriptors will return to the available pool. So you could do what async_tx routines do and just poll for a descriptor. + } + + /* keep track of the tx for later */ + list_add_tail(tx-entry, tx_list); + + /* update metadata */ + transferred += len; + dst_avail -= len; + src_avail -= len; + +fetch: + /* fetch the next dst scatterlist entry */ + if (dst_avail == 0) { + + /* no more entries: we're done */ + if (dst_nents == 0) + break; + + /* fetch the next entry: if there are no more: done */ + dst_sg = sg_next(dst_sg); + if (dst_sg == NULL) + break; + + dst_nents--; + dst_avail = sg_dma_len(dst_sg); + } + + /* fetch the next src scatterlist entry */ + if (src_avail == 0) { + + /* no more entries: we're done */ + if (src_nents == 0) + break; + + /* fetch the next entry: if there are no more: done */ + src_sg = sg_next(src_sg); + if (src_sg == NULL) + break; + + src_nents--;
Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
On Fri, Sep 24, 2010 at 01:40:56PM -0700, Dan Williams wrote: On Fri, Sep 24, 2010 at 12:46 PM, Ira W. Snyder i...@ovro.caltech.edu wrote: This adds support for scatterlist to scatterlist DMA transfers. As requested by Dan, this is hidden behind an ifdef so that it can be selected by the drivers that need it. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/Kconfig | 4 ++ drivers/dma/dmaengine.c | 119 + include/linux/dmaengine.h | 10 3 files changed, 133 insertions(+), 0 deletions(-) diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 9520cf0..f688669 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -89,10 +89,14 @@ config AT_HDMAC Support the Atmel AHB DMA controller. This can be integrated in chips such as the Atmel AT91SAM9RL. +config DMAENGINE_SG_TO_SG + bool + config FSL_DMA tristate Freescale Elo and Elo Plus DMA support depends on FSL_SOC select DMA_ENGINE + select DMAENGINE_SG_TO_SG ---help--- Enable support for the Freescale Elo and Elo Plus DMA controllers. 
The Elo is the DMA controller on some 82xx and 83xx parts, and the diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index 9d31d5e..57ec1e5 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -972,10 +972,129 @@ dma_async_memcpy_pg_to_pg(struct dma_chan *chan, struct page *dest_pg, } EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg); +#ifdef CONFIG_DMAENGINE_SG_TO_SG +dma_cookie_t +dma_async_memcpy_sg_to_sg(struct dma_chan *chan, + struct scatterlist *dst_sg, unsigned int dst_nents, + struct scatterlist *src_sg, unsigned int src_nents, + dma_async_tx_callback cb, void *cb_param) +{ + struct dma_device *dev = chan-device; + struct dma_async_tx_descriptor *tx; + dma_cookie_t cookie = -ENOMEM; + size_t dst_avail, src_avail; + struct list_head tx_list; + size_t transferred = 0; + dma_addr_t dst, src; + size_t len; + + if (dst_nents == 0 || src_nents == 0) + return -EINVAL; + + if (dst_sg == NULL || src_sg == NULL) + return -EINVAL; + + /* get prepared for the loop */ + dst_avail = sg_dma_len(dst_sg); + src_avail = sg_dma_len(src_sg); + + INIT_LIST_HEAD(tx_list); + + /* run until we are out of descriptors */ + while (true) { + + /* create the largest transaction possible */ + len = min_t(size_t, src_avail, dst_avail); + if (len == 0) + goto fetch; + + dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) - dst_avail; + src = sg_dma_address(src_sg) + sg_dma_len(src_sg) - src_avail; + + /* setup the transaction */ + tx = dev-device_prep_dma_memcpy(chan, dst, src, len, 0); + if (!tx) { + dev_err(dev-dev, failed to alloc desc for memcpy\n); + return -ENOMEM; I don't think any dma channels gracefully handle descriptors that were prepped but not submitted. You would probably need to submit the backlog, poll for completion, and then return the error. Alternatively, the expectation is that descriptor allocations are transient, i.e. once previously submitted transactions are completed the descriptors will return to the available pool. 
So you could do what async_tx routines do and just poll for a descriptor. Can you give me an example? Even some pseudocode would help. The other DMAEngine functions (dma_async_memcpy_*()) don't do anything with the descriptor if submit fails. Take for example dma_async_memcpy_buf_to_buf(). If tx-tx_submit(tx); fails, any code using it has no way to return the descriptor to the free pool. Does tx_submit() implicitly return descriptors to the free pool if it fails? + } + + /* keep track of the tx for later */ + list_add_tail(tx-entry, tx_list); + + /* update metadata */ + transferred += len; + dst_avail -= len; + src_avail -= len; + +fetch: + /* fetch the next dst scatterlist entry */ + if (dst_avail == 0) { + + /* no more entries: we're done */ + if (dst_nents == 0) + break; + + /* fetch the next entry: if there are no more: done */ + dst_sg = sg_next(dst_sg); + if (dst_sg == NULL) + break; + +
[PATCH 1/1] Add config option for batched hcalls
Add a config option for the (batched) MULTITCE and BULK_REMOVE h-calls.

By default, these options are on and are beneficial for performance and
throughput reasons. If disabled, the code will fall back to using less
optimal TCE and REMOVE hcalls. The ability to easily disable these
options is useful for some of the PREEMPT_RT related investigation and
work occurring on Power.

Signed-off-by: Will Schmidt will_schm...@vnet.ibm.com
cc: Anton Blanchard an...@samba.org
cc: Benjamin Herrenschmidt b...@kernel.crashing.org
---
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index f0e6f28..0b5e6a9 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -81,3 +81,23 @@ config DTL
 	  which are accessible through a debugfs file.
 
 	  Say N if you are unsure.
+
+config BULK_REMOVE
+	bool "Enable BULK_REMOVE"
+	depends on PPC_PSERIES
+	default y
+	help
+	  Enable the BULK_REMOVE option for the hash page code.
+	  This relies on a hcall-bulk firmware feature, and
+	  should be enabled for performance throughput.
+
+config MULTITCE
+	bool "Enable MultiTCE"
+	depends on PPC_PSERIES
+	default y
+	help
+	  Enable the Multi-TCE code, allowing a single hcall to
+	  update multiple TCE entries at one time. This relies
+	  on a hcall-multi-tce firmware feature, and should be
+	  enabled for performance throughput.
+
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index 0a4d8c..4327064 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -51,9 +51,13 @@ firmware_features_table[FIRMWARE_MAX_FEATURES] = {
 	{FW_FEATURE_VIO,		"hcall-vio"},
 	{FW_FEATURE_RDMA,		"hcall-rdma"},
 	{FW_FEATURE_LLAN,		"hcall-lLAN"},
+#if defined(CONFIG_BULK_REMOVE)
 	{FW_FEATURE_BULK_REMOVE,	"hcall-bulk"},
+#endif
 	{FW_FEATURE_XDABR,		"hcall-xdabr"},
+#if defined(CONFIG_MULTITCE)
 	{FW_FEATURE_MULTITCE,		"hcall-multi-tce"},
+#endif
 	{FW_FEATURE_SPLPAR,		"hcall-splpar"},
 };
 
 /* Build up the firmware features bitmask using the contents of

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
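For readers following along, here is a rough userspace sketch of why dropping a table entry at compile time is enough to make the code fall back: a feature whose string is not in the table can never end up in the feature mask, whatever the firmware advertises. The feature bit values, the fw_features_from() helper and the locally defined CONFIG_ symbols are all made up for illustration; this is not the pseries firmware code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits; the names echo the patch, the values are made up. */
#define FW_FEATURE_VIO          (1UL << 0)
#define FW_FEATURE_BULK_REMOVE  (1UL << 1)
#define FW_FEATURE_MULTITCE     (1UL << 2)

/* Stand-ins for the Kconfig symbols; MULTITCE is deliberately left undefined. */
#define CONFIG_BULK_REMOVE 1
/* #define CONFIG_MULTITCE 1 */

struct fw_feature {
	unsigned long bit;
	const char *name;
};

/* Entries for disabled options are simply not compiled into the table. */
static const struct fw_feature fw_table[] = {
	{ FW_FEATURE_VIO, "hcall-vio" },
#if defined(CONFIG_BULK_REMOVE)
	{ FW_FEATURE_BULK_REMOVE, "hcall-bulk" },
#endif
#if defined(CONFIG_MULTITCE)
	{ FW_FEATURE_MULTITCE, "hcall-multi-tce" },
#endif
};

/* Build a feature mask from the strings the firmware advertises. */
static unsigned long fw_features_from(const char *const *adv, size_t n)
{
	unsigned long mask = 0;
	size_t i, j;

	for (i = 0; i < n; i++)
		for (j = 0; j < sizeof(fw_table) / sizeof(fw_table[0]); j++)
			if (strcmp(adv[i], fw_table[j].name) == 0)
				mask |= fw_table[j].bit;
	return mask;
}
```

With MULTITCE left undefined as above, a firmware that advertises "hcall-multi-tce" still yields a mask without FW_FEATURE_MULTITCE, so callers testing that bit would take the non-batched path.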
Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
On Fri, 2010-09-24 at 14:24 -0700, Ira W. Snyder wrote:
> On Fri, Sep 24, 2010 at 01:40:56PM -0700, Dan Williams wrote:
> > I don't think any dma channels gracefully handle descriptors that were
> > prepped but not submitted. You would probably need to submit the
> > backlog, poll for completion, and then return the error.
> > Alternatively, the expectation is that descriptor allocations are
> > transient, i.e. once previously submitted transactions are completed
> > the descriptors will return to the available pool. So you could do
> > what async_tx routines do and just poll for a descriptor.
>
> Can you give me an example? Even some pseudocode would help.

Here is one from do_async_gen_syndrome() in crypto/async_tx/async_pq.c:

	/* Since we have clobbered the src_list we are committed
	 * to doing this asynchronously. Drivers force forward
	 * progress in case they can not provide a descriptor
	 */
	for (;;) {
		tx = dma->device_prep_dma_pq(chan, dma_dest,
					     &dma_src[src_off],
					     pq_src_cnt,
					     &coefs[src_off], len,
					     dma_flags);
		if (likely(tx))
			break;
		async_tx_quiesce(&submit->depend_tx);
		dma_async_issue_pending(chan);
	}

> The other DMAEngine functions (dma_async_memcpy_*()) don't do anything
> with the descriptor if submit fails. Take for example
> dma_async_memcpy_buf_to_buf(). If tx->tx_submit(tx); fails, any code
> using it has no way to return the descriptor to the free pool.
>
> Does tx_submit() implicitly return descriptors to the free pool if it
> fails?

No, submit() failures are a hold over from when the ioatdma driver used
to perform additional descriptor allocation at ->submit() time. After
prep() the expectation is that the engine is just waiting to be told
"go" and can't fail. The only reason ->submit() retains a return code
is to support the "cookie" based method for polling for operation
completion. A dma driver should handle all descriptor submission
failure scenarios at prep time.

> Ok, I thought the list was clearer, but this is equally easy. How about
> the following change that does away with the list completely. Then
> things should work on ioatdma as well.
>
> From d59569ff48a89ef5411af3cf2995af7b742c5cd3 Mon Sep 17 00:00:00 2001
> From: Ira W. Snyder i...@ovro.caltech.edu
> Date: Fri, 24 Sep 2010 14:18:09 -0700
> Subject: [PATCH] dma: improve scatterlist to scatterlist transfer
>
> This is an improved algorithm to improve support on the Intel I/OAT
> driver.
>
> Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
> ---
>  drivers/dma/dmaengine.c   | 52 +---
>  include/linux/dmaengine.h |  3 --
>  2 files changed, 25 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> index 57ec1e5..cde775c 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -983,10 +983,13 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
>  	struct dma_async_tx_descriptor *tx;
>  	dma_cookie_t cookie = -ENOMEM;
>  	size_t dst_avail, src_avail;
> -	struct list_head tx_list;
> +	struct scatterlist *sg;
>  	size_t transferred = 0;
> +	size_t dst_total = 0;
> +	size_t src_total = 0;
>  	dma_addr_t dst, src;
>  	size_t len;
> +	int i;
>
>  	if (dst_nents == 0 || src_nents == 0)
>  		return -EINVAL;
> @@ -994,12 +997,17 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
>  	if (dst_sg == NULL || src_sg == NULL)
>  		return -EINVAL;
>
> +	/* get the total count of bytes in each scatterlist */
> +	for_each_sg(dst_sg, sg, dst_nents, i)
> +		dst_total += sg_dma_len(sg);
> +
> +	for_each_sg(src_sg, sg, src_nents, i)
> +		src_total += sg_dma_len(sg);
> +

What about overrun or underrun do we not care if src_total != dst_total?

Otherwise looks ok.
Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
On Fri, Sep 24, 2010 at 02:53:14PM -0700, Dan Williams wrote:
> On Fri, 2010-09-24 at 14:24 -0700, Ira W. Snyder wrote:
> > On Fri, Sep 24, 2010 at 01:40:56PM -0700, Dan Williams wrote:
> > > I don't think any dma channels gracefully handle descriptors that
> > > were prepped but not submitted. You would probably need to submit
> > > the backlog, poll for completion, and then return the error.
> > > Alternatively, the expectation is that descriptor allocations are
> > > transient, i.e. once previously submitted transactions are
> > > completed the descriptors will return to the available pool. So
> > > you could do what async_tx routines do and just poll for a
> > > descriptor.
> >
> > Can you give me an example? Even some pseudocode would help.
>
> Here is one from do_async_gen_syndrome() in crypto/async_tx/async_pq.c:
>
> 	/* Since we have clobbered the src_list we are committed
> 	 * to doing this asynchronously. Drivers force forward
> 	 * progress in case they can not provide a descriptor
> 	 */
> 	for (;;) {
> 		tx = dma->device_prep_dma_pq(chan, dma_dest,
> 					     &dma_src[src_off],
> 					     pq_src_cnt,
> 					     &coefs[src_off], len,
> 					     dma_flags);
> 		if (likely(tx))
> 			break;
> 		async_tx_quiesce(&submit->depend_tx);
> 		dma_async_issue_pending(chan);
> 	}
>
> > The other DMAEngine functions (dma_async_memcpy_*()) don't do
> > anything with the descriptor if submit fails. Take for example
> > dma_async_memcpy_buf_to_buf(). If tx->tx_submit(tx); fails, any code
> > using it has no way to return the descriptor to the free pool.
> >
> > Does tx_submit() implicitly return descriptors to the free pool if it
> > fails?
>
> No, submit() failures are a hold over from when the ioatdma driver used
> to perform additional descriptor allocation at ->submit() time. After
> prep() the expectation is that the engine is just waiting to be told
> "go" and can't fail. The only reason ->submit() retains a return code
> is to support the "cookie" based method for polling for operation
> completion. A dma driver should handle all descriptor submission
> failure scenarios at prep time.

Ok, that's more like what I expected. So we still need the try forever
code similar to the above. I can add that for the next version.

> > Ok, I thought the list was clearer, but this is equally easy. How
> > about the following change that does away with the list completely.
> > Then things should work on ioatdma as well.
> >
> > From d59569ff48a89ef5411af3cf2995af7b742c5cd3 Mon Sep 17 00:00:00 2001
> > From: Ira W. Snyder i...@ovro.caltech.edu
> > Date: Fri, 24 Sep 2010 14:18:09 -0700
> > Subject: [PATCH] dma: improve scatterlist to scatterlist transfer
> >
> > This is an improved algorithm to improve support on the Intel I/OAT
> > driver.
> >
> > Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
> > ---
> >  drivers/dma/dmaengine.c   | 52 +---
> >  include/linux/dmaengine.h |  3 --
> >  2 files changed, 25 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> > index 57ec1e5..cde775c 100644
> > --- a/drivers/dma/dmaengine.c
> > +++ b/drivers/dma/dmaengine.c
> > @@ -983,10 +983,13 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
> >  	struct dma_async_tx_descriptor *tx;
> >  	dma_cookie_t cookie = -ENOMEM;
> >  	size_t dst_avail, src_avail;
> > -	struct list_head tx_list;
> > +	struct scatterlist *sg;
> >  	size_t transferred = 0;
> > +	size_t dst_total = 0;
> > +	size_t src_total = 0;
> >  	dma_addr_t dst, src;
> >  	size_t len;
> > +	int i;
> >
> >  	if (dst_nents == 0 || src_nents == 0)
> >  		return -EINVAL;
> > @@ -994,12 +997,17 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
> >  	if (dst_sg == NULL || src_sg == NULL)
> >  		return -EINVAL;
> >
> > +	/* get the total count of bytes in each scatterlist */
> > +	for_each_sg(dst_sg, sg, dst_nents, i)
> > +		dst_total += sg_dma_len(sg);
> > +
> > +	for_each_sg(src_sg, sg, src_nents, i)
> > +		src_total += sg_dma_len(sg);
> > +
>
> What about overrun or underrun do we not care if src_total != dst_total?
>
> Otherwise looks ok.

I don't know if we should care about that. The algorithm handles that
case just fine. It copies the maximum amount it can, which is exactly
min(src_total, dst_total). Whichever scatterlist runs out of entries
first is the shortest.

As a real world example, my driver verifies that both scatterlists have
exactly the right number of bytes available before trying to program the
hardware.

Ira
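To make the min(src_total, dst_total) claim concrete, here is a small userspace model of the walk. It is only a sketch: struct seg and walk_segments() are invented stand-ins for a scatterlist and for the loop in the patch, lengths are plain size_t values, and no DMA descriptors are involved.

```c
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry: only a length is modelled. */
struct seg {
	size_t len;
};

/*
 * Walk two segment arrays in lockstep, counting one "copy" per chunk of
 * min(src_avail, dst_avail) bytes. Returns the total bytes covered and
 * stores the number of chunks in *chunks.
 */
static size_t walk_segments(const struct seg *dst, size_t dst_n,
			    const struct seg *src, size_t src_n,
			    size_t *chunks)
{
	size_t di = 0, si = 0;
	size_t dst_avail, src_avail;
	size_t transferred = 0;

	*chunks = 0;
	if (dst_n == 0 || src_n == 0)
		return 0;

	dst_avail = dst[0].len;
	src_avail = src[0].len;

	for (;;) {
		size_t len = src_avail < dst_avail ? src_avail : dst_avail;

		if (len > 0) {
			/* a real driver would prepare one memcpy descriptor here */
			(*chunks)++;
			transferred += len;
			dst_avail -= len;
			src_avail -= len;
		}

		/* advance whichever side is exhausted; stop when a list runs out */
		if (dst_avail == 0) {
			if (++di == dst_n)
				break;
			dst_avail = dst[di].len;
		}
		if (src_avail == 0) {
			if (++si == src_n)
				break;
			src_avail = src[si].len;
		}
	}
	return transferred;
}
```

With destination segments of 100 and 50 bytes and source segments of 30, 30 and 200 bytes, the walk produces four chunks (30, 30, 40, 50) and stops at 150 bytes, the smaller of the two totals. Every pass through the loop advances at least one index, so zero-length segments cannot stall it.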
Re: ppc44x - how do i optimize driver for tlb hits
On Fri, 2010-09-24 at 08:08 -0500, Ayman El-Khashab wrote:
> I suppose another option is to use the kernel profiling option I
> always see but have never used. Is that a viable option to figure out
> what is happening here?

With perf and stochastic sampling? If you sample fast enough... but
you'll mostly point to your routine I suppose... though it might tell
you statistically where in your code, which -might- help.

Cheers,
Ben.
Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
> > This list is getting way too much unrelated stuff, which I find
> > annoying, it would be nice if we were all a bit more careful here
> > with our CC lists.
>
> Sorry, I only added device-tree because some one asked me to do so.
>
> http://marc.info/?l=linux-netdev&m=127273157912358
>
> I'll leave it off next time.

That's all right. I'd rather you just post the bindings there than the
whole patch least but no big deal. I was just fixing my email filters
and noticed a lot of seemingly unrelated stuff landing there :-)

Cheers,
Ben.
[PATCH (Option 1)] of/i2c: fix module load order issue caused by of_i2c.c
Commit 959e85f7, i2c: add OF-style registration and binding caused a module dependency loop where of_i2c.c calls functions in i2c-core, and i2c-core calls of_i2c_register_devices() in of_i2c. This means that when i2c support is built as a module when CONFIG_OF is set, then neither i2c_core nor of_i2c are able to be loaded. This patch fixes the problem by moving the of_i2c_register_devices() function into the body of i2c_core and renaming it to i2c_scan_of_devices (of_i2c_register_devices is analogous to the existing i2c_scan_static_board_info function and so should be named similarly). This function isn't called by any code outside of i2c_core, and it must always be present when CONFIG_OF is selected, so it makes sense to locate it there. When CONFIG_OF is not selected, of_i2c_register_devices() becomes a no-op. Signed-off-by: Grant Likely grant.lik...@secretlab.ca --- drivers/i2c/i2c-core.c | 61 ++-- drivers/of/of_i2c.c| 57 - include/linux/of_i2c.h |7 -- 3 files changed, 59 insertions(+), 66 deletions(-) diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c index 6649176..64a261b 100644 --- a/drivers/i2c/i2c-core.c +++ b/drivers/i2c/i2c-core.c @@ -32,8 +32,8 @@ #include linux/init.h #include linux/idr.h #include linux/mutex.h -#include linux/of_i2c.h #include linux/of_device.h +#include linux/of_irq.h #include linux/completion.h #include linux/hardirq.h #include linux/irqflags.h @@ -818,6 +818,63 @@ static void i2c_scan_static_board_info(struct i2c_adapter *adapter) up_read(__i2c_board_lock); } +#ifdef CONFIG_OF +void i2c_scan_of_devices(struct i2c_adapter *adap) +{ + void *result; + struct device_node *node; + + /* Only register child devices if the adapter has a node pointer set */ + if (!adap-dev.of_node) + return; + + for_each_child_of_node(adap-dev.of_node, node) { + struct i2c_board_info info = {}; + struct dev_archdata dev_ad = {}; + const __be32 *addr; + int len; + + dev_dbg(adap-dev, of_i2c: register %s\n, node-full_name); + if 
(of_modalias_node(node, info.type, sizeof(info.type)) 0) { + dev_err(adap-dev, of_i2c: modalias failure on %s\n, + node-full_name); + continue; + } + + addr = of_get_property(node, reg, len); + if (!addr || (len sizeof(int))) { + dev_err(adap-dev, of_i2c: invalid reg on %s\n, + node-full_name); + continue; + } + + info.addr = be32_to_cpup(addr); + if (info.addr (1 10) - 1) { + dev_err(adap-dev, of_i2c: invalid addr=%x on %s\n, + info.addr, node-full_name); + continue; + } + + info.irq = irq_of_parse_and_map(node, 0); + info.of_node = of_node_get(node); + info.archdata = dev_ad; + + request_module(%s, info.type); + + result = i2c_new_device(adap, info); + if (result == NULL) { + dev_err(adap-dev, of_i2c: Failure registering %s\n, + node-full_name); + of_node_put(node); + irq_dispose_mapping(info.irq); + continue; + } + } +} +#else +static inline void i2c_scan_of_devices(struct i2c_adapter *adap) { } +#endif + static int i2c_do_add_adapter(struct i2c_driver *driver, struct i2c_adapter *adap) { @@ -877,7 +934,7 @@ static int i2c_register_adapter(struct i2c_adapter *adap) i2c_scan_static_board_info(adap); /* Register devices from the device tree */ - of_i2c_register_devices(adap); + i2c_scan_of_devices(adap); /* Notify drivers */ mutex_lock(core_lock); diff --git a/drivers/of/of_i2c.c b/drivers/of/of_i2c.c index 0a694de..e0c3841 100644 --- a/drivers/of/of_i2c.c +++ b/drivers/of/of_i2c.c @@ -17,63 +17,6 @@ #include linux/of_irq.h #include linux/module.h -void of_i2c_register_devices(struct i2c_adapter *adap) -{ - void *result; - struct device_node *node; - - /* Only register child devices if the adapter has a node pointer set */ - if (!adap-dev.of_node) - return; - - dev_dbg(adap-dev, of_i2c: walking child nodes\n); - - for_each_child_of_node(adap-dev.of_node, node) { - struct i2c_board_info info = {}; - struct dev_archdata dev_ad = {}; - const __be32 *addr; - int len; - - dev_dbg(adap-dev, of_i2c: register %s\n, node-full_name); - - if
Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop
On Fri, Sep 24, 2010 at 7:48 AM, Grant Likely grant.lik...@secretlab.ca wrote:
> Jean Delvare kh...@linux-fr.org wrote:
> > Hi Mikael,
> >
> > On Fri, 24 Sep 2010 12:50:01 +0200, Mikael Pettersson wrote:
> > > Jean Delvare writes:
> > > > As far as I can see this is caused by this commit from Grant:
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5
> > > >
> > > > Mikael, can you please try reverting this patch and see if it
> > > > solves your problem?
> > >
> > > Yes, reverting the above commit from 2.6.36-rc5 eliminated the
> > > warnings, and I was able to insmod the
> > > i2c-{core,dev,powermac}.ko modules.
> >
> > Thanks for testing and reporting.
> >
> > Grant, unless you come up with a fix very quickly, I'll have to
> > revert 959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5 for 2.6.36.
>
> I'll get a fix out today.

I've got two different fixes that I'm about to send you. You can
choose the fix that you prefer.

The first option moves the offending function into i2c-core.c. The
function parses the device tree data and creates an i2c_device for each
i2c device node that it finds. This is analogous to
i2c_scan_static_board_info().

The second option reverts most of the 959e85f7 commit, but retains the
line that allows of-style matching so that all i2c_devices on powerpc
machines will still bind correctly.

My preferred solution is the first option because the tested code path
does not change. The offending function is simply moved verbatim. The
second option is a smaller patch, but I can only test one of the
affected drivers. However, I'll let you make the decision. Both have
been build tested on PowerPC and ARM, and run tested on a PowerPC
MPC5200 board.

Patches to follow in a few minutes.

g.
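As an aside for readers, the shape of the first option can be modelled in a few lines of plain C: the scan routine sits in the same translation unit as its only caller, and a no-op inline stub takes its place when the config symbol is off, so the call site needs no #ifdef and no second module has to load first. The struct adapter, scan_of_devices() and register_adapter() names below are invented for this sketch and are far simpler than the real i2c-core code.

```c
#include <stddef.h>

/* Stand-in for the Kconfig symbol; remove this line to build the no-op variant. */
#define CONFIG_OF 1

/* Hypothetical, much-reduced adapter: a child count and a registration tally. */
struct adapter {
	int of_children;   /* number of device tree child nodes, 0 if none */
	int registered;    /* number of devices registered so far */
};

#ifdef CONFIG_OF
/* Lives next to its only caller, so no other module must be loaded first. */
static void scan_of_devices(struct adapter *adap)
{
	int i;

	for (i = 0; i < adap->of_children; i++)
		adap->registered++;
}
#else
/* Without device tree support the scan compiles away to nothing. */
static inline void scan_of_devices(struct adapter *adap) { (void)adap; }
#endif

static int register_adapter(struct adapter *adap)
{
	adap->registered = 0;
	scan_of_devices(adap);   /* no #ifdef needed at the call site */
	return 0;
}
```

The second option avoids the loop the other way round: the core no longer calls into the OF helper at all, and each bus driver makes the call itself after adding its adapter.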
[PATCH (Option 2)] of/i2c: fix module load order issue caused by of_i2c.c
Commit 959e85f7, "i2c: add OF-style registration and binding" caused a
module dependency loop where of_i2c.c calls functions in i2c-core, and
i2c-core calls of_i2c_register_devices() in of_i2c. This means that
when i2c support is built as a module when CONFIG_OF is set, then
neither i2c_core nor of_i2c are able to be loaded.

This patch fixes the problem by moving the of_i2c_register_devices()
calls back into the device drivers. Device drivers already
specifically request the core code to parse the device tree for
devices anyway by setting the of_node pointer, so it isn't a big
deal to also call the registration function. The drivers just become
slightly more verbose.

Signed-off-by: Grant Likely grant.lik...@secretlab.ca
---
 drivers/i2c/busses/i2c-cpm.c     |5 +
 drivers/i2c/busses/i2c-ibm_iic.c |3 +++
 drivers/i2c/busses/i2c-mpc.c     |1 +
 drivers/i2c/i2c-core.c           |4 
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c
index f7bd261..f2de3be 100644
--- a/drivers/i2c/busses/i2c-cpm.c
+++ b/drivers/i2c/busses/i2c-cpm.c
@@ -677,6 +677,11 @@ static int __devinit cpm_i2c_probe(struct platform_device *ofdev,
 	dev_dbg(&ofdev->dev, "hw routines for %s registered.\n",
 		cpm->adap.name);
 
+	/*
+	 * register OF I2C devices
+	 */
+	of_i2c_register_devices(&cpm->adap);
+
 	return 0;
 out_shut:
 	cpm_i2c_shutdown(cpm);
diff --git a/drivers/i2c/busses/i2c-ibm_iic.c b/drivers/i2c/busses/i2c-ibm_iic.c
index 43ca32f..89eedf4 100644
--- a/drivers/i2c/busses/i2c-ibm_iic.c
+++ b/drivers/i2c/busses/i2c-ibm_iic.c
@@ -761,6 +761,9 @@ static int __devinit iic_probe(struct platform_device *ofdev,
 	dev_info(&ofdev->dev, "using %s mode\n",
 		 dev->fast_mode ? "fast (400 kHz)" : "standard (100 kHz)");
 
+	/* Now register all the child nodes */
+	of_i2c_register_devices(adap);
+
 	return 0;
 
 error_cleanup:
diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index a1c419a..b74e6dc 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -632,6 +632,7 @@ static int __devinit fsl_i2c_probe(struct platform_device *op,
 		dev_err(i2c->dev, "failed to add adapter\n");
 		goto fail_add;
 	}
+	of_i2c_register_devices(&i2c->adap);
 
 	return result;
 
diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 6649176..a9589f5 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -32,7 +32,6 @@
 #include <linux/init.h>
 #include <linux/idr.h>
 #include <linux/mutex.h>
-#include <linux/of_i2c.h>
 #include <linux/of_device.h>
 #include <linux/completion.h>
 #include <linux/hardirq.h>
@@ -876,9 +875,6 @@ static int i2c_register_adapter(struct i2c_adapter *adap)
 	if (adap->nr < __i2c_first_dynamic_bus_num)
 		i2c_scan_static_board_info(adap);
 
-	/* Register devices from the device tree */
-	of_i2c_register_devices(adap);
-
 	/* Notify drivers */
 	mutex_lock(&core_lock);
 	bus_for_each_drv(&i2c_bus_type, NULL, adap, __process_new_adapter);
Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
On Fri, 2010-09-24 at 15:04 -0700, Ira W. Snyder wrote:
On Fri, Sep 24, 2010 at 02:53:14PM -0700, Dan Williams wrote:

What about overrun or underrun? Do we not care if src_total != dst_total? Otherwise looks ok.

I don't know if we should care about that. The algorithm handles that case just fine. It copies the maximum amount it can, which is exactly min(src_total, dst_total). Whichever scatterlist runs out of entries first is the shortest. As a real world example, my driver verifies that both scatterlists have exactly the right number of bytes available before trying to program the hardware.

Ok, just handle the prep failure and I think we are good to go.

--
Dan
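The min(src_total, dst_total) behaviour described above can be checked in userspace. The following is only a minimal sketch, not the kernel code: struct seg and plan_copy() are hypothetical names invented for illustration, and each "scatterlist entry" is reduced to a byte length. It walks both lists in lock step, emits the largest chunk that fits in both current entries, and stops as soon as either list runs out.

```c
#include <stddef.h>

/* Hypothetical stand-in for one scatterlist entry: only its length in bytes. */
struct seg {
	size_t len;
};

/*
 * Walk two segment lists in lock step. Each step emits the largest chunk
 * that fits in both current segments. Returns the total bytes covered.
 * chunk_lens (may be NULL) receives up to max_chunks chunk sizes and
 * *nchunks (may be NULL) receives the number of chunks produced.
 */
static size_t plan_copy(const struct seg *dst, size_t dst_n,
			const struct seg *src, size_t src_n,
			size_t *chunk_lens, size_t max_chunks,
			size_t *nchunks)
{
	size_t di = 0, si = 0;
	size_t dst_avail, src_avail;
	size_t total = 0, count = 0;

	if (dst_n == 0 || src_n == 0) {
		if (nchunks)
			*nchunks = 0;
		return 0;
	}

	dst_avail = dst[0].len;
	src_avail = src[0].len;

	for (;;) {
		size_t len = dst_avail < src_avail ? dst_avail : src_avail;

		if (len > 0) {
			if (chunk_lens && count < max_chunks)
				chunk_lens[count] = len;
			count++;
			total += len;
			dst_avail -= len;
			src_avail -= len;
		}

		/* Advance whichever side is exhausted; stop when one runs out. */
		if (dst_avail == 0) {
			if (++di == dst_n)
				break;
			dst_avail = dst[di].len;
		}
		if (src_avail == 0) {
			if (++si == src_n)
				break;
			src_avail = src[si].len;
		}
	}

	if (nchunks)
		*nchunks = count;
	return total;
}
```

With a destination of {100, 50} bytes and a source of {30, 30, 200} bytes, this produces chunks of 30, 30, 40 and 50 bytes, 150 bytes in total, which is the smaller of the two list totals.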
Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
On Fri, Sep 24, 2010 at 03:04:19PM -0700, Ira W. Snyder wrote:
On Fri, Sep 24, 2010 at 02:53:14PM -0700, Dan Williams wrote:
On Fri, 2010-09-24 at 14:24 -0700, Ira W. Snyder wrote:
On Fri, Sep 24, 2010 at 01:40:56PM -0700, Dan Williams wrote:

I don't think any dma channels gracefully handle descriptors that were prepped but not submitted. You would probably need to submit the backlog, poll for completion, and then return the error. Alternatively, the expectation is that descriptor allocations are transient, i.e. once previously submitted transactions are completed the descriptors will return to the available pool. So you could do what async_tx routines do and just poll for a descriptor.

Can you give me an example? Even some pseudocode would help.

Here is one from do_async_gen_syndrome() in crypto/async_tx/async_pq.c:

	/* Since we have clobbered the src_list we are committed
	 * to doing this asynchronously.  Drivers force forward
	 * progress in case they can not provide a descriptor
	 */
	for (;;) {
		tx = dma->device_prep_dma_pq(chan, dma_dest,
					     &dma_src[src_off],
					     pq_src_cnt,
					     &coefs[src_off], len,
					     dma_flags);
		if (likely(tx))
			break;
		async_tx_quiesce(&submit->depend_tx);
		dma_async_issue_pending(chan);
	}

The other DMAEngine functions (dma_async_memcpy_*()) don't do anything with the descriptor if submit fails. Take for example dma_async_memcpy_buf_to_buf(). If tx->tx_submit(tx) fails, any code using it has no way to return the descriptor to the free pool. Does tx_submit() implicitly return descriptors to the free pool if it fails?

No, submit() failures are a holdover from when the ioatdma driver used to perform additional descriptor allocation at ->submit() time. After prep() the expectation is that the engine is just waiting to be told "go" and can't fail. The only reason ->submit() retains a return code is to support the "cookie" based method for polling for operation completion. A dma driver should handle all descriptor submission failure scenarios at prep time.
Ok, that's more like what I expected. So we still need the "try forever" code similar to the above. I can add that for the next version.

When coding this change, I've noticed one problem that would break my driver. I cannot issue dma_async_issue_pending() on the channel while creating the descriptors, since this will start transferring the previously submitted DMA descriptors. This breaks the external hardware control requirement.

Imagine this scenario:
1) device is not yet set up for external control (nothing is pulsing the pins)
2) dma_async_memcpy_sg_to_sg()
   - this hits an allocation failure, which calls dma_async_issue_pending()
   - this causes the DMA engine to start transferring to a device which is not ready yet
   - memory pressure stops, and allocation succeeds again
   - some descriptors have been transferred, but not the ones since the alloc failure
   - now the first half of the descriptors (pre alloc failure) have been transferred
   - the second half of the descriptors (post alloc failure) are still pending
   - the dma_async_memcpy_sg_to_sg() returns success: all tx_submit() succeeded
3) device_control() - set up external control mode
4) dma_async_issue_pending() - start the externally controlled transfer
5) tell the external agent to start controlling the DMA transaction
   - now there isn't enough data left, and the external agent fails to program the FPGAs

I don't mind adding it to the code, since I have enough memory that I don't ever see allocation failures. It is an embedded system, and we've been careful not to overcommit memory. I think for all other users, it would be the appropriate thing to do. Most people don't care if the scatterlist is copied in two chunks with a time gap in the middle.

An alternative implementation would be to implement device_prep_sg_to_sg() that returned a struct dma_async_tx_descriptor, which could then be used as normal by higher layers. This would allow the driver to allocate / cleanup all descriptors in one shot.
This would be completely robust to this error situation.

Is there one solution you'd prefer over the other? They're both similar in the amount of code, though duplication would probably be increased in the device_prep_sg_to_sg() case, if any other driver implements it.

Thanks,
Ira
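To make the "allocate / cleanup all descriptors in one shot" idea concrete, here is a small userspace sketch. It is not DMAEngine code: struct desc_pool, pool_alloc(), pool_free(), pool_used() and prep_all_or_nothing() are hypothetical names invented for illustration, and the fixed-size pool stands in for whatever allocator a real driver uses. The point it shows is the rollback: if any allocation fails, every descriptor taken so far goes back to the pool, so nothing is left half-queued for the hardware to start on early.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4

/* Hypothetical fixed-size descriptor pool (illustration only). */
struct desc_pool {
	bool in_use[POOL_SIZE];
};

/* Take one free descriptor; returns its index, or -1 if the pool is empty. */
static int pool_alloc(struct desc_pool *p)
{
	for (int i = 0; i < POOL_SIZE; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Return a descriptor to the pool. */
static void pool_free(struct desc_pool *p, int idx)
{
	p->in_use[idx] = false;
}

/* Count descriptors currently handed out. */
static size_t pool_used(const struct desc_pool *p)
{
	size_t used = 0;

	for (int i = 0; i < POOL_SIZE; i++)
		used += p->in_use[i] ? 1 : 0;
	return used;
}

/*
 * Reserve n descriptors or none. On failure, everything taken so far is
 * released again before returning false.
 */
static bool prep_all_or_nothing(struct desc_pool *p, int *out, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		out[i] = pool_alloc(p);
		if (out[i] < 0) {
			/* roll back the descriptors allocated before the failure */
			while (i-- > 0)
				pool_free(p, out[i]);
			return false;
		}
	}
	return true;
}
```

In this model a failed request leaves the pool exactly as it was before the call, which is the property that avoids the partially transferred chain in the scenario above.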
[PATCH RFCv2 0/2] dma: add support for sg-to-sg transfers
This series adds support for scatterlist to scatterlist transfers to the generic DMAEngine API. I have hidden it behind a configuration option to allow specific drivers that need this functionality to enable it.

This series is intended to lay the groundwork for further changes to the series titled "CARMA Board Support". That series will be updated when I have time and hardware to test with.

This series has not been runtime tested yet. I am posting it only to gain comments before I spend the effort to update the driver that depends on this.

To help reviewers, I'd like to comment on the architecture of dma_async_memcpy_sg_to_sg(). It explicitly avoids using descriptor chaining due to the way that feature interacts with the fsldma controller's external start feature. To use the external start feature properly, the in-memory descriptor chain must not be fragmented into multiple smaller chains. This is what is achieved by submitting all descriptors without using chaining.

An alternative implementation would create a device_prep_sg_to_sg() function, and use that to allocate all descriptors in one shot. That implementation would be safer against allocation failures than this one.

I would recommend against committing this until I've tested it on real hardware.

Ira W. Snyder (2):
  dmaengine: add support for scatterlist to scatterlist transfers
  fsldma: use generic support for scatterlist to scatterlist transfers

 arch/powerpc/include/asm/fsldma.h |  115 ++--
 drivers/dma/Kconfig               |    4 +
 drivers/dma/dmaengine.c           |  119
 drivers/dma/fsldma.c              |  219 +++--
 include/linux/dmaengine.h         |   10 ++
 5 files changed, 181 insertions(+), 286 deletions(-)
[PATCH RFCv2 1/2] dmaengine: add support for scatterlist to scatterlist transfers
This adds support for scatterlist to scatterlist DMA transfers. This is
currently hidden behind a configuration option, which will allow drivers
which need this functionality to select it individually.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/Kconfig       |    3 +
 drivers/dma/dmaengine.c   |  125 +
 include/linux/dmaengine.h |    6 ++
 3 files changed, 134 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 9520cf0..82d2244 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -89,6 +89,9 @@ config AT_HDMAC
 	  Support the Atmel AHB DMA controller. This can be integrated in
 	  chips such as the Atmel AT91SAM9RL.
 
+config DMAENGINE_SG_TO_SG
+	bool
+
 config FSL_DMA
 	tristate "Freescale Elo and Elo Plus DMA support"
 	depends on FSL_SOC
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 9d31d5e..9238b86 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -972,6 +972,131 @@ dma_async_memcpy_pg_to_pg(struct dma_chan *chan, struct page *dest_pg,
 }
 EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
 
+#ifdef CONFIG_DMAENGINE_SG_TO_SG
+dma_cookie_t
+dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
+			  struct scatterlist *dst_sg, unsigned int dst_nents,
+			  struct scatterlist *src_sg, unsigned int src_nents,
+			  dma_async_tx_callback cb, void *cb_param)
+{
+	struct dma_device *dev = chan->device;
+	struct dma_async_tx_descriptor *tx;
+	dma_cookie_t cookie = -ENOMEM;
+	size_t dst_avail, src_avail;
+	struct scatterlist *sg;
+	size_t transferred = 0;
+	size_t dst_total = 0;
+	size_t src_total = 0;
+	dma_addr_t dst, src;
+	size_t len;
+	int i;
+
+	if (dst_nents == 0 || src_nents == 0)
+		return -EINVAL;
+
+	if (dst_sg == NULL || src_sg == NULL)
+		return -EINVAL;
+
+	/* get the total count of bytes in each scatterlist */
+	for_each_sg(dst_sg, sg, dst_nents, i)
+		dst_total += sg_dma_len(sg);
+
+	for_each_sg(src_sg, sg, src_nents, i)
+		src_total += sg_dma_len(sg);
+
+	/* get prepared for the loop */
+	dst_avail = sg_dma_len(dst_sg);
+	src_avail = sg_dma_len(src_sg);
+
+	/* run until we are out of descriptors */
+	while (true) {
+
+		/* create the largest transaction possible */
+		len = min_t(size_t, src_avail, dst_avail);
+		if (len == 0)
+			goto fetch;
+
+		dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) - dst_avail;
+		src = sg_dma_address(src_sg) + sg_dma_len(src_sg) - src_avail;
+
+		/*
+		 * get a descriptor
+		 *
+		 * we must poll for a descriptor here since the DMAEngine API
+		 * does not provide a way for external users to free previously
+		 * allocated descriptors
+		 */
+		for (;;) {
+			tx = dev->device_prep_dma_memcpy(chan, dst, src, len, 0);
+			if (likely(tx))
+				break;
+
+			dma_async_issue_pending(chan);
+		}
+
+		/* update metadata */
+		transferred += len;
+		dst_avail -= len;
+		src_avail -= len;
+
+		/* if this is the last transfer, setup the callback */
+		if (dst_total == transferred || src_total == transferred) {
+			tx->callback = cb;
+			tx->callback_param = cb_param;
+		}
+
+		/* submit the transaction */
+		cookie = tx->tx_submit(tx);
+		if (dma_submit_error(cookie)) {
+			dev_err(dev->dev, "failed to submit desc\n");
+			return cookie;
+		}
+
+fetch:
+		/* fetch the next dst scatterlist entry */
+		if (dst_avail == 0) {
+
+			/* no more entries: we're done */
+			if (dst_nents == 0)
+				break;
+
+			/* fetch the next entry: if there are no more: done */
+			dst_sg = sg_next(dst_sg);
+			if (dst_sg == NULL)
+				break;
+
+			dst_nents--;
+			dst_avail = sg_dma_len(dst_sg);
+		}
+
+		/* fetch the next src scatterlist entry */
+		if (src_avail == 0) {
+
+			/* no more entries: we're done */
+			if (src_nents == 0)
+				break;
+
+			/* fetch the next entry: if there are no more: done */
+			src_sg =
[PATCH RFCv2 2/2] fsldma: use generic support for scatterlist to scatterlist transfers
The fsldma driver uses the DMA_SLAVE API to handle scatterlist to
scatterlist DMA transfers. For quite a while now, it has been possible
to mimic the operation by using the device_prep_dma_memcpy() routine
intelligently.

Now that the DMAEngine API has grown generic support for scatterlist to
scatterlist transfers, this operation is no longer needed. The generic
support is used for scatterlist to scatterlist transfers.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 arch/powerpc/include/asm/fsldma.h |  115 ++--
 drivers/dma/fsldma.c              |  219 +++--
 2 files changed, 48 insertions(+), 286 deletions(-)

diff --git a/arch/powerpc/include/asm/fsldma.h b/arch/powerpc/include/asm/fsldma.h
index debc5ed..dc0bd27 100644
--- a/arch/powerpc/include/asm/fsldma.h
+++ b/arch/powerpc/include/asm/fsldma.h
@@ -1,7 +1,7 @@
 /*
  * Freescale MPC83XX / MPC85XX DMA Controller
  *
- * Copyright (c) 2009 Ira W. Snyder i...@ovro.caltech.edu
+ * Copyright (c) 2009-2010 Ira W. Snyder i...@ovro.caltech.edu
  *
  * This file is licensed under the terms of the GNU General Public License
  * version 2. This program is licensed "as is" without any warranty of any
@@ -11,127 +11,32 @@
 #ifndef __ARCH_POWERPC_ASM_FSLDMA_H__
 #define __ARCH_POWERPC_ASM_FSLDMA_H__
 
-#include <linux/slab.h>
 #include <linux/dmaengine.h>
 
 /*
- * Definitions for the Freescale DMA controller's DMA_SLAVE implemention
+ * The Freescale DMA controller has several features that are not accomodated
+ * in the Linux DMAEngine API. Therefore, the generic structure is expanded
+ * to allow drivers to use these features.
  *
- * The Freescale DMA_SLAVE implementation was designed to handle many-to-many
- * transfers. An example usage would be an accelerated copy between two
- * scatterlists. Another example use would be an accelerated copy from
- * multiple non-contiguous device buffers into a single scatterlist.
+ * This structure should be passed into the DMAEngine routine device_control()
+ * as in this example:
  *
- * A DMA_SLAVE transaction is defined by a struct fsl_dma_slave. This
- * structure contains a list of hardware addresses that should be copied
- * to/from the scatterlist passed into device_prep_slave_sg(). The structure
- * also has some fields to enable hardware-specific features.
+ * chan->device->device_control(chan, DMA_SLAVE_CONFIG, (unsigned long)cfg);
  */
 
 /**
- * struct fsl_dma_hw_addr
- * @entry: linked list entry
- * @address: the hardware address
- * @length: length to transfer
- *
- * Holds a single physical hardware address / length pair for use
- * with the DMAEngine DMA_SLAVE API.
- */
-struct fsl_dma_hw_addr {
-	struct list_head entry;
-
-	dma_addr_t address;
-	size_t length;
-};
-
-/**
  * struct fsl_dma_slave
- * @addresses: a linked list of struct fsl_dma_hw_addr structures
+ * @config: the standard Linux DMAEngine API DMA_SLAVE configuration
  * @request_count: value for DMA request count
- * @src_loop_size: setup and enable constant source-address DMA transfers
- * @dst_loop_size: setup and enable constant destination address DMA transfers
  * @external_start: enable externally started DMA transfers
  * @external_pause: enable externally paused DMA transfers
- *
- * Holds a list of address / length pairs for use with the DMAEngine
- * DMA_SLAVE API implementation for the Freescale DMA controller.
  */
-struct fsl_dma_slave {
+struct fsldma_slave_config {
+	struct dma_slave_config config;
 
-	/* List of hardware address/length pairs */
-	struct list_head addresses;
-
-	/* Support for extra controller features */
 	unsigned int request_count;
-	unsigned int src_loop_size;
-	unsigned int dst_loop_size;
 	bool external_start;
 	bool external_pause;
 };
 
-/**
- * fsl_dma_slave_append - add an address/length pair to a struct fsl_dma_slave
- * @slave: the struct fsl_dma_slave to add to
- * @address: the hardware address to add
- * @length: the length of bytes to transfer from @address
- *
- * Add a hardware address/length pair to a struct fsl_dma_slave. Returns 0 on
- * success, -ERRNO otherwise.
- */
-static inline int fsl_dma_slave_append(struct fsl_dma_slave *slave,
-				       dma_addr_t address, size_t length)
-{
-	struct fsl_dma_hw_addr *addr;
-
-	addr = kzalloc(sizeof(*addr), GFP_ATOMIC);
-	if (!addr)
-		return -ENOMEM;
-
-	INIT_LIST_HEAD(&addr->entry);
-	addr->address = address;
-	addr->length = length;
-
-	list_add_tail(&addr->entry, &slave->addresses);
-	return 0;
-}
-
-/**
- * fsl_dma_slave_free - free a struct fsl_dma_slave
- * @slave: the struct fsl_dma_slave to free
- *
- * Free a struct fsl_dma_slave and all associated address/length pairs
- */
-static inline void fsl_dma_slave_free(struct fsl_dma_slave *slave)
-{
-	struct fsl_dma_hw_addr *addr, *tmp;
-
-	if (slave) {
-