Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list

2010-09-24 Thread Michael Neuling

 size_t size = 0;
 FILE *file;
 sprintf(buf, "/proc/irq/%i/smp_affinity", number);
   - file = fopen(buf, "r");
   + file = fopen(buf, "r+");
 if (!file)
 continue;
 if (getline(&line, &size, file)==0) {
   @@ -89,7 +89,14 @@
 continue;
 }
 cpumask_parse_user(line, strlen(line), irq->mask);
   - fclose(file);
   + /*
   +  * Check that we can write the affinity, if
   +  * not take it out of the list.
   +  */
   + if (fputs(line, file) == EOF)
   + can_set = 0;
 
 This is maybe a nit, but writing to the affinity file can fail for a few
 different reasons, some of them permanent, some transient.  For instance, if
 we're in a memory constrained condition temporarily irq_affinity_proc_write
 might return -ENOMEM.
 
 Yeah true, usually followed shortly by your kernel going so far into
 swap you never get it back, or OOMing, but I guess it's possible.
 
 Might it be better to modify this code so that, instead
 of using fputs to merge the various errors into an EOF, we use some other write
 method that lets us better determine the error and selectively ban the interrupt
 only for those errors which we consider permanent?
 
 Yep. It seems fputs() gives you no way to get the actual error from
 write(), so it looks like we'll need to switch to open/write, but that's
 probably not so terrible.

fclose inherits the error from fputs and it sets errno correctly.  Below
uses this to catch only EIO errors and mark them for the banned list.
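
For anyone following along, here is a stand-alone user-space sketch of that
check, separate from the patch below; the function name and buffer sizes are
made up for illustration:

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Return 0 if the IRQ's affinity cannot be written (e.g. an IPI on pseries). */
static int affinity_is_settable(int irq)
{
	char path[64];
	char *line = NULL;
	size_t size = 0;
	FILE *file;
	int settable = 1;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	file = fopen(path, "r+");
	if (!file)
		return 0;

	if (getline(&line, &size, file) <= 0) {
		fclose(file);
		free(line);
		return 0;
	}

	/* Write back the mask we just read; the affinity itself is unchanged. */
	errno = 0;
	fputs(line, file);		/* buffered; error may be deferred */
	if (fclose(file) && errno == EIO)
		settable = 0;		/* kernel refused the write */

	free(line);
	return settable;
}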

Mikey

irqbalance, powerpc: add IRQs without settable SMP affinity to banned list

On pseries powerpc, IPIs are registered with an IRQ number so
/proc/interrupts looks like this on a 2 core/2 thread machine:

   CPU0   CPU1   CPU2   CPU3
 16:316428232905141138794 983121   XICS Level   IPI
 18:2605674  0 304994  0   XICS Level   lan0
 30: 400057  0 169209  0   XICS Level   ibmvscsi
LOC: 133734  77250 106425  91951   Local timer interrupts
SPU:  0  0  0  0   Spurious interrupts
CNT:  0  0  0  0   Performance monitoring interrupts
MCE:  0  0  0  0   Machine check exceptions

Unfortunately this means irqbalance attempts to set the affinity of IPIs
which is not possible.  So in the above case, when irqbalance is in
performance mode due to heavy IPI, lan0 and ibmvscsi activity, it
sometimes attempts to put the IPIs on one core (CPU01) and lan0 and
ibmvscsi on the other core (CPU23).  This is suboptimal as we want lan0
and ibmvscsi to be on separate cores and IPIs to be ignored.

When irqbalance attempts to write to the IPI smp_affinity (i.e.
/proc/irq/16/smp_affinity in the above example) it fails with EIO, but
irqbalance currently ignores this.

This patch catches these write failures and, in this case, adds that IRQ
number to the banned IRQ list.  This will catch the above IPI case and
any other IRQ where the SMP affinity can't be set.

Tested on POWER6, POWER7 and x86.

Signed-off-by: Michael Neuling mi...@neuling.org
 
Index: irqbalance/irqlist.c
===
--- irqbalance.orig/irqlist.c
+++ irqbalance/irqlist.c
@@ -28,6 +28,7 @@
 #include <unistd.h>
 #include <sys/types.h>
 #include <dirent.h>
+#include <errno.h>
 
 #include "types.h"
 #include "irqbalance.h"
@@ -67,7 +68,7 @@
DIR *dir;
struct dirent *entry;
char *c, *c2;
-   int nr , count = 0;
+   int nr , count = 0, can_set = 1;
char buf[PATH_MAX];
sprintf(buf, "/proc/irq/%i", number);
dir = opendir(buf);
@@ -80,7 +81,7 @@
size_t size = 0;
FILE *file;
sprintf(buf, "/proc/irq/%i/smp_affinity", number);
-   file = fopen(buf, "r");
+   file = fopen(buf, "r+");
if (!file)
continue;
if (getline(&line, &size, file)==0) {
@@ -89,7 +90,13 @@
continue;
}
cpumask_parse_user(line, strlen(line), irq->mask);
-   fclose(file);
+   /*
+* Check that we can write the affinity, if
+* not take it out of the list.
+*/
+   fputs(line, file);
+   if (fclose(file) && errno == EIO)
+   can_set = 0;
   

Re: [PATCH 1/2] PPC4xx: Generelizing drivers/dma/ppc4xx/adma.c

2010-09-24 Thread Stefan Roese
On Friday 24 September 2010 00:39:47 Tirumala Marri wrote:
  Will both versions of this driver exist in the same kernel build?  For
  example the iop-adma driver supports iop13xx and iop3xx, but we select
  the architecture at build time?  Or, as I assume in this case, will
  the
  two (maybe more?) ppc4xx adma drivers all be built in the same image,
  more like ioatdma?
 
 [Marri] We select the architecture at build time.

It would be really preferable to support all those platforms in a single Linux 
image. If technically possible, please try to move in this direction.

Thanks.

Cheers,
Stefan
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree

2010-09-24 Thread Grant Likely
On Thu, Sep 16, 2010 at 09:05:03AM +0200, christophe leroy wrote:
 This patch applies to 2.6.34.7 and 2.6.35.4
 It fixes an issue during the probe for CPM1 with definition of parameter ram 
 from DTS
 
 Signed-off-by: christophe leroy christophe.le...@c-s.fr

I'm sorry, I don't understand the fix from the given description.
What is the problem, and why is cpm_muram_alloc_fixed() the wrong
thing to call on CPM1?  Does CPM2 still need it?

g.

 
 diff -urN b/drivers/spi/spi_mpc8xxx.c c/drivers/spi/spi_mpc8xxx.c
 --- b/drivers/spi/spi_mpc8xxx.c   2010-09-08 16:43:50.0 +0200
 +++ c/drivers/spi/spi_mpc8xxx.c   2010-09-08 16:44:03.0 +0200
 @@ -822,7 +822,7 @@
   if (!iprop || size != sizeof(*iprop) * 4)
   return -ENOMEM;
  
 - spi_base_ofs = cpm_muram_alloc_fixed(iprop[2], 2);
 + spi_base_ofs = iprop[2];
   if (IS_ERR_VALUE(spi_base_ofs))
   return -ENOMEM;
  
 @@ -844,7 +844,6 @@
   return spi_base_ofs;
   }
  
 - cpm_muram_free(spi_base_ofs);
   return pram_ofs;
  }
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree

2010-09-24 Thread LEROY Christophe

 Hello,

The issue is that cpm_muram_alloc_fixed() allocates memory from the 
general purpose muram area (from 0x0 to 0x1bff).
Here we need to return a pointer to the parameter RAM, which is located 
somewhere starting at 0x1c00. It is not a dynamic allocation that is
required here, but simply a pointer to the correct location in the parameter
RAM.


For the CPM2, I don't know. I'm working with a MPC866.

Attached is a previous discussion on the subject where I explain the
issue in a bit more detail.


Regards
C. Leroy

On 24/09/2010 09:10, Grant Likely wrote:

On Thu, Sep 16, 2010 at 09:05:03AM +0200, christophe leroy wrote:

This patch applies to 2.6.34.7 and 2.6.35.4
It fixes an issue during the probe for CPM1 with definition of parameter ram 
from DTS

Signed-off-by: christophe leroy christophe.le...@c-s.fr

I'm sorry, I don't understand the fix from the given description.
What is the problem, and why is cpm_muram_alloc_fixed() the wrong
thing to call on CPM1?  Does CPM2 still need it?

g.


diff -urN b/drivers/spi/spi_mpc8xxx.c c/drivers/spi/spi_mpc8xxx.c
--- b/drivers/spi/spi_mpc8xxx.c 2010-09-08 16:43:50.0 +0200
+++ c/drivers/spi/spi_mpc8xxx.c 2010-09-08 16:44:03.0 +0200
@@ -822,7 +822,7 @@
if (!iprop || size != sizeof(*iprop) * 4)
return -ENOMEM;

-   spi_base_ofs = cpm_muram_alloc_fixed(iprop[2], 2);
+   spi_base_ofs = iprop[2];
if (IS_ERR_VALUE(spi_base_ofs))
return -ENOMEM;

@@ -844,7 +844,6 @@
return spi_base_ofs;
}

-   cpm_muram_free(spi_base_ofs);
return pram_ofs;
  }
---BeginMessage---
On Tue, 7 Sep 2010 11:17:17 +0200
LEROY Christophe christophe.le...@c-s.fr wrote:

 
   Dear Kumar,
 
 I have a small issue in the init of spi_mpc8xxx.c with MPC866 (CPM1)
 
 Unlike cpm_uart that maps the parameter ram directly using 
 of_iomap(np,1), spi_mpc8xxx.c uses cpm_muram_alloc_fixed().
 
 This has two impacts in the .dts file:
 * The driver must be declared with pram at 1d80 instead of 3d80 whereas 
 it is not a child of mu...@2000 but a child of c...@9c0
 * mu...@2000/d...@0 must be declared with reg = <0x0 0x2000> whereas 
 it should be reg = <0x0 0x1c00> to avoid cpm_muram_alloc() allocating 
 space from the parameter ram.
 
 Maybe I misunderstood something ?

Don't make the device tree lie, fix the driver instead.

The allocator should not be given any chunks of muram that are
dedicated to a fixed purpose -- it might hand it out to something else
before you reserve it.  I don't think that cpm_muram_alloc_fixed() has
any legitimate use at all.

-Scott

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

---End Message---
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.

2010-09-24 Thread Richard Cochran
On Thu, Sep 23, 2010 at 12:48:51PM -0700, john stultz wrote:
 On Thu, 2010-09-23 at 19:31 +0200, Richard Cochran wrote:
  A new syscall is introduced that allows tuning of a POSIX clock. The
  syscall is implemented for four architectures: arm, blackfin, powerpc,
  and x86.
  
  The new syscall, clock_adjtime, takes two parameters, the clock ID,
  and a pointer to a struct timex. The semantics of the timex struct
  have been expanded by one additional mode flag, which allows an
  absolute offset correction. When specified, the clock offset is
  immediately corrected by adding the given time value to the current
  time value.
 
 
 So I'd still split this patch up a little bit more.
 
 1) Patch that implements the ADJ_SETOFFSET  (*and its implementation*)
 in do_adjtimex.
 
 2) Patch that adds the new syscall and clock_id multiplexing.
 
 3) Patches that wire it up to the rest of the architectures (there's
 still a bunch missing here).

I was not sure what the policy is about adding syscalls. Is it the
syscall author's responsibility to add it into every arch?

The last time (see a2e2725541fad7) the commit only added half of some
archs, and ignored others. In my patch, the syscall *really* works on
the archs that are present in the patch.

(Actually, I did not test blackfin, since I don't have one, but I
included it since I know they have a PTP hardware clock.)

  +static inline int common_clock_adj(const clockid_t which_clock, struct 
  timex *t)
  +{
  +   if (CLOCK_REALTIME == which_clock)
  +   return do_adjtimex(t);
  +   else
  +   return -EOPNOTSUPP;
  +}
 
 
 Would it make sense to point to the do_adjtimex() in the k_clock
 definition for CLOCK_REALTIME rather then conditionalizing it here?

But what about CLOCK_MONOTONIC_RAW, for example?

Does it make sense to allow it to be adjusted?

Thanks,
Richard
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.

2010-09-24 Thread Richard Cochran
On Fri, Sep 24, 2010 at 08:03:43AM +1000, Benjamin Herrenschmidt wrote:
 On Thu, 2010-09-23 at 19:31 +0200, Richard Cochran wrote:
  A new syscall is introduced that allows tuning of a POSIX clock. The
  syscall is implemented for four architectures: arm, blackfin, powerpc,
  and x86.
  
  The new syscall, clock_adjtime, takes two parameters, the clock ID,
  and a pointer to a struct timex. The semantics of the timex struct
  have been expanded by one additional mode flag, which allows an
  absolute offset correction. When specified, the clock offset is
  immediately corrected by adding the given time value to the current
  time value.
 
 Any reason why you CC'ed device-tree discuss ?
 
 This list is getting way too much unrelated stuff, which I find
 annoying, it would be nice if we were all a bit more careful here with
 our CC lists.

Sorry, I only added device-tree because someone asked me to do so.

http://marc.info/?l=linux-netdevm=127273157912358

I'll leave it off next time.

Thanks,
Richard
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree

2010-09-24 Thread Anton Vorontsov
Hello,

On Fri, Sep 24, 2010 at 09:20:27AM +0200, LEROY Christophe wrote:
 The issue is that cpm_muram_alloc_fixed() allocates memory from the
 general purpose muram area (from 0x0 to 0x1bff).
 Here we need to return a pointer to the parameter RAM, which is
 located somewhere starting at 0x1c00. It is not a dynamic allocation
 that is required here, but simply a pointer to the correct location in
 the parameter RAM.
 
 For the CPM2, I don't know. I'm working with a MPC866.
 
 Attached is a previous discussion on the subject where I explain the
 issue in a bit more detail.

The patch looks OK, I think.

Doesn't explain why that worked on MPC8272 (CPM2) and MPC8560
(also CPM2) machines though. But here's my guess (I no longer
have these boards to test it):

On 8272 I used this node:

+   s...@4c0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "fsl,cpm2-spi", "fsl,spi";
+   reg = <0x11a80 0x40 0x89fc 0x2>;

On that SOC there are two muram data regions 0x0..0x2000 and
0x9000..0x9100. Note that we actually don't want data regions,
and the only reason why that worked is that sysdev/cpm_common.c
maps muram(0)..muram(max).

Thanks,

-- 
Anton Vorontsov
email: cbouatmai...@gmail.com
irc://irc.freenode.net/bd2
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-24 Thread Richard Cochran
On Thu, Sep 23, 2010 at 12:53:20PM -0500, Christoph Lameter wrote:
 On Thu, 23 Sep 2010, Richard Cochran wrote:
  3.3 Synchronizing the Linux System Time
  
 
 One could offer a PHC as a combined clock source and clock event
 device. The advantage of this approach would be that it obviates
 the need for synchronization when the PHC is selected as the system
 timer. However, some PHCs, namely the PHY based clocks, cannot be
 used in this way.
 
 Why not? Do PHY based clocks not at least provide a counter that increments
 in synchronized intervals throughout the network?

The counter in the PHY is accessed via the MDIO bus. One 16 bit read
takes anywhere from 25 to 40 microseconds. Reading the 64 bit time
value requires four reads, so we're talking about 100 to 160
microseconds, just for a single time reading.

In addition to that, reading the MDIO bus can sleep.  So we can't (in
general) offer PHCs as clock sources.
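
To make the arithmetic concrete, here is a rough C sketch of assembling a
64-bit PHY timestamp from four 16-bit reads; mdio_read16() is a made-up
stand-in for the real bus accessor:

#include <stdint.h>

/*
 * Stand-in for one 16-bit MDIO register read.  On real hardware each such
 * read takes roughly 25 to 40 microseconds and may sleep, so it cannot sit
 * on a clocksource read path.
 */
static uint16_t mdio_read16(int phy_addr, int reg)
{
	(void)phy_addr;
	(void)reg;
	return 0;	/* placeholder value */
}

/* Four reads for one 64-bit timestamp: roughly 100-160 us per clock read. */
static uint64_t phy_clock_read(int phy_addr, int base_reg)
{
	uint64_t t = 0;

	for (int i = 0; i < 4; i++)
		t |= (uint64_t)mdio_read16(phy_addr, base_reg + i) << (16 * i);

	return t;
}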

 Instead, the patch set provides a way to offer a Pulse Per Second
 (PPS) event from the PHC to the Linux PPS subsystem. A user space
 application can read the PPS events and tune the system clock, just
 like when using other external time sources like radio clocks or
 GPS.
 
 User space is subject to various latencies created by the OS etc. I would
 think that in order to have fine grained (read: microsecond) accuracy we would
 have to run the portions that are relevant to obtaining the desired
 accuracy in the kernel.

The time-critical operations are all performed in hardware (packet
timestamp), or in kernel space (input PPS timestamp). User space only
runs the servo (using hardware or kernel timestamps as input) and
performs the clock correction. With a sample rate of 1 PPS, the small
user space induced delay (a few dozen microseconds) between sample
time and clock correction is not an issue.
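
As a rough illustration of that division of labour, a toy user-space servo
driven by the 1 PPS samples could look like the sketch below; the
proportional gain and the use of ADJ_FREQUENCY are illustrative choices, not
something taken from the patch set or from any existing PTP daemon:

#include <string.h>
#include <sys/timex.h>

/*
 * Toy proportional servo, called once per 1 PPS sample with the measured
 * offset of the system clock in nanoseconds.  The gain here (1 ppm of
 * correction per microsecond of offset, i.e. remove the offset over ~1 s)
 * is only an example.
 */
static void servo_update(long long offset_ns)
{
	struct timex tx;
	long ppm = -(long)(offset_ns / 1000);	/* crude proportional term */

	memset(&tx, 0, sizeof(tx));
	tx.modes = ADJ_FREQUENCY;
	tx.freq = ppm << 16;		/* timex.freq is ppm scaled by 2^16 */
	adjtimex(&tx);
}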

Thanks,
Richard
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.

2010-09-24 Thread Richard Cochran
On Thu, Sep 23, 2010 at 02:17:36PM -0500, Christoph Lameter wrote:
 On Thu, 23 Sep 2010, Richard Cochran wrote:
  +  These properties set the operational parameters for the PTP
  +  clock. You must choose these carefully for the clock to work right.
  +  Here is how to figure good values:
  +
  +  TimerOsc = system clock   MHz
  +  tclk_period  = desired clock period   nanoseconds
  +  NominalFreq  = 1000 / tclk_period MHz
  FreqDivRatio = TimerOsc / NominalFreq (must be greater than 1.0)
  +  tmr_add  = ceil(2^32 / FreqDivRatio)
  +  OutputClock  = NominalFreq / tmr_prsc MHz
  +  PulseWidth   = 1 / OutputClockmicroseconds
  +  FiperFreq1   = desired frequency in Hz
  +  FiperDiv1= 100 * OutputClock / FiperFreq1
  +  tmr_fiper1   = tmr_prsc * tclk_period * FiperDiv1 - tclk_period
  +  max_adj  = 10 * (FreqDivRatio - 1.0) - 1
 
 Great stuff for clock synchronization...
 
  +  The calculation for tmr_fiper2 is the same as for tmr_fiper1. The
  +  driver expects that tmr_fiper1 will be correctly set to produce a 1
  +  Pulse Per Second (PPS) signal, since this will be offered to the PPS
  +  subsystem to synchronize the Linux clock.
 
 Argh. And conceptually completely screwed up. Why go through the PPS
 subsystem if you can directly tune the system clock based on a number of
 the cool periodic clock features that you have above? See how the other
 clocks do that easily? Look into drivers/clocksource. Add it there.
 
 Please do not introduce useless additional layers for clock sync. Load
 these ptp clocks like the other regular clock modules and make them sync
 system time like any other clock.
 
 Really guys: I want a PTP solution! Now! And not some idiotic additional
  kernel layers that just pass bits around because it's so much fun and
  screws up clock accuracy due to the latency noise introduced while
 having so much fun with the bits.

(Sorry if this message comes twice. Mutt/Gmail flaked out again.)

I think you misunderstood this particular patch. The device tree
parameters are really just internal driver stuff. When you use the
eTSEC, you must make some design choices at the same time as you plan
your board. The proper values for some of the eTSEC registers are
based on these design choices. Since the Freescale documentation is a
bit thin on this, I added a few notes to help my fellow board
designers.

Because these values are closely related to the board itself, I think
that it is nicer to configure them via the device tree than using
either CONFIG_ variables or platform data.
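
For what it's worth, here is a small stand-alone calculation that exercises
the quoted formulas with made-up board values (400 MHz reference clock,
10 ns tick, prescaler of 2); none of these numbers come from the patch:

#include <math.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Made-up board parameters, purely to exercise the formulas above. */
	double timer_osc   = 400.0;	/* MHz, reference clock feeding the timer */
	double tclk_period = 10.0;	/* ns, desired timer tick */
	double tmr_prsc    = 2.0;	/* output clock prescaler */

	double nominal_freq   = 1000.0 / tclk_period;		/* 100 MHz */
	double freq_div_ratio = timer_osc / nominal_freq;	/* 4.0, > 1.0 as required */
	uint32_t tmr_add      = (uint32_t)ceil(4294967296.0 / freq_div_ratio);
	double output_clock   = nominal_freq / tmr_prsc;	/* 50 MHz */

	printf("tmr_add     = 0x%08x\n", tmr_add);	/* 0x40000000 here */
	printf("OutputClock = %.1f MHz\n", output_clock);
	return 0;
}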

Richard
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop

2010-09-24 Thread Jean Delvare
On Thu, 23 Sep 2010 15:05:59 -0700, Randy Dunlap wrote:
 On Thu, 23 Sep 2010 22:16:32 +0200 Mikael Pettersson wrote:
  Randy Dunlap writes:
No kconfig warnings?
  
  Not that I recall.  I can check tomorrow if necessary.
 
 No kconfig warnings.  I checked with your .config file.
 
Please post your full .config file.
 
 Just a matter of module i2c-core calls of_ functions and module of_i2c calls
 i2c_ functions.  Hmph.  Something for Grant, Jean, and Ben to work out.

As far as I can see this is caused by this commit from Grant:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5

Mikael, can you please try reverting this patch and see if it solves
your problem?

-- 
Jean Delvare
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: ppc44x - how do i optimize driver for tlb hits

2010-09-24 Thread Josh Boyer
On Fri, Sep 24, 2010 at 02:43:52PM +1000, Benjamin Herrenschmidt wrote:
 The DMA is what I use in the real world case to get data into and out 
 of these buffers.  However, I can disable the DMA completely and do only
 the kmalloc.  In this case I still see the same poor performance.  My
 prefetching is part of my algo using the dcbt instructions.  I know the
 instructions are effective b/c without them the algo is much less 
 performant.  So yes, my prefetches are explicit.

Could be some effect of the cache structure, L2 cache, cache geometry
(number of ways etc...). You might be able to alleviate that by changing
the stride of your prefetch.

Unfortunately, I'm not familiar enough with the 440 micro architecture
and its caches to be able to help you much here.

Also, doesn't kmalloc have a limit to the size of the request it will
let you allocate?  I know in the distant past you could allocate 128K
with kmalloc, and 2M with an explicit call to get_free_pages.  Anything
larger than that had to use vmalloc.  The limit might indeed be higher
now, but a 4MB kmalloc buffer sounds very large, given that it would be
contiguous pages.  Two of them even less so.
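
For reference, a hedged kernel-side sketch of the two allocation strategies
being compared; the exact size limits depend on kernel version and page size:

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

#define BUF_SIZE (4 * 1024 * 1024)

/*
 * Physically contiguous: 4 MB is order 10 with 4 KB pages, right at the edge
 * of what the page allocator will normally hand out, and harder still once
 * memory is fragmented.  Freed with free_pages().
 */
static void *alloc_contig_buf(void)
{
	return (void *)__get_free_pages(GFP_KERNEL, get_order(BUF_SIZE));
}

/*
 * Only virtually contiguous: fine for CPU access, but a DMA engine would
 * then need a scatter-gather list built from the individual pages.
 * Freed with vfree().
 */
static void *alloc_virt_buf(void)
{
	return vmalloc(BUF_SIZE);
}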

 Ok, I will give that a try ... in addition, is there an easy way to use
 any sort of gprof like tool to see the system performance?  What about
 looking at the 44x performance counters in some meaningful way?  All
 the experiments point to the fetching being slower in the full program
 as opposed to the algo in a testbench, so I want to determine what it is
 that could cause that.

Does it have any useful performance counters ? I didn't think it did but
I may be mistaken.

No, it doesn't.

josh
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list

2010-09-24 Thread Neil Horman
On Fri, Sep 24, 2010 at 04:56:34PM +1000, Michael Neuling wrote:
 
size_t size = 0;
FILE *file;
sprintf(buf, "/proc/irq/%i/smp_affinity", number);
-   file = fopen(buf, "r");
+   file = fopen(buf, "r+");
if (!file)
continue;
if (getline(&line, &size, file)==0) {
@@ -89,7 +89,14 @@
continue;
}
cpumask_parse_user(line, strlen(line), irq->mask);
-   fclose(file);
+   /*
+* Check that we can write the affinity, if
+* not take it out of the list.
+*/
+   if (fputs(line, file) == EOF)
+   can_set = 0;
  
   This is maybe a nit, but writing to the affinity file can fail for a few
   different reasons, some of them permanent, some transient.  For instance, if
   we're in a memory constrained condition temporarily irq_affinity_proc_write
   might return -ENOMEM.
  
  Yeah true, usually followed shortly by your kernel going so far into
  swap you never get it back, or OOMing, but I guess it's possible.
  
   Might it be better to modify this code so that, instead
   of using fputs to merge the various errors into an EOF, we use some other write
   method that lets us better determine the error and selectively ban the interrupt
   only for those errors which we consider permanent?
  
   Yep. It seems fputs() gives you no way to get the actual error from
   write(), so it looks like we'll need to switch to open/write, but that's
  probably not so terrible.
 
 fclose inherits the error from fputs and it sets errno correctly.  Below
 uses this to catch only EIO errors and mark them for the banned list.
 
 Mikey
 
 irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
 
 On pseries powerpc, IPIs are registered with an IRQ number so
 /proc/interrupts looks like this on a 2 core/2 thread machine:
 
CPU0   CPU1   CPU2   CPU3
  16:316428232905141138794 983121   XICS Level   IPI
  18:2605674  0 304994  0   XICS Level   lan0
  30: 400057  0 169209  0   XICS Level   ibmvscsi
 LOC: 133734  77250 106425  91951   Local timer interrupts
 SPU:  0  0  0  0   Spurious interrupts
 CNT:  0  0  0  0   Performance monitoring interrupts
 MCE:  0  0  0  0   Machine check exceptions
 
 Unfortunately this means irqbalance attempts to set the affinity of IPIs
 which is not possible.  So in the above case, when irqbalance is in
 performance mode due to heavy IPI, lan0 and ibmvscsi activity, it
 sometimes attempts to put the IPIs on one core (CPU01) and lan0 and
 ibmvscsi on the other core (CPU23).  This is suboptimal as we want lan0
 and ibmvscsi to be on separate cores and IPIs to be ignored.
 
 When irqbalance attempts to write to the IPI smp_affinity (i.e.
 /proc/irq/16/smp_affinity in the above example) it fails with EIO, but
 irqbalance currently ignores this.
 
 This patch catches these write failures and, in this case, adds that IRQ
 number to the banned IRQ list.  This will catch the above IPI case and
 any other IRQ where the SMP affinity can't be set.
 
 Tested on POWER6, POWER7 and x86.
 
 Signed-off-by: Michael Neuling mi...@neuling.org
  
 Index: irqbalance/irqlist.c
 ===
 --- irqbalance.orig/irqlist.c
 +++ irqbalance/irqlist.c
 @@ -28,6 +28,7 @@
   #include <unistd.h>
   #include <sys/types.h>
   #include <dirent.h>
  +#include <errno.h>
   
   #include "types.h"
   #include "irqbalance.h"
 @@ -67,7 +68,7 @@
   DIR *dir;
   struct dirent *entry;
   char *c, *c2;
 - int nr , count = 0;
 + int nr , count = 0, can_set = 1;
   char buf[PATH_MAX];
   sprintf(buf, "/proc/irq/%i", number);
   dir = opendir(buf);
 @@ -80,7 +81,7 @@
   size_t size = 0;
   FILE *file;
   sprintf(buf, "/proc/irq/%i/smp_affinity", number);
  - file = fopen(buf, "r");
  + file = fopen(buf, "r+");
   if (!file)
   continue;
   if (getline(&line, &size, file)==0) {
 @@ -89,7 +90,13 @@
   continue;
   }
   cpumask_parse_user(line, strlen(line), irq->mask);
 - fclose(file);
 + /*
 +  * Check that we can write the 

Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop

2010-09-24 Thread Mikael Pettersson
Jean Delvare writes:
  On Thu, 23 Sep 2010 15:05:59 -0700, Randy Dunlap wrote:
   On Thu, 23 Sep 2010 22:16:32 +0200 Mikael Pettersson wrote:
Randy Dunlap writes:
  No kconfig warnings?

Not that I recall.  I can check tomorrow if necessary.
   
   No kconfig warnings.  I checked with your .config file.
   
  Please post your full .config file.
   
   Just a matter of module i2c-core calls of_ functions and module of_i2c 
   calls
   i2c_ functions.  Hmph.  Something for Grant, Jean, and Ben to work out.
  
  As far as I can see this is caused by this commit from Grant:
  
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5
  
  Mikael, can you please try reverting this patch and see if it solves
  your problem?

Yes, reverting the above commit from 2.6.36-rc5 eliminated the warnings,
and I was able to insmod the i2c-{core,dev,powermac}.ko modules.

/Mikael
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: MPC8641D PEX: programming OWBAR in Endpoint mode?

2010-09-24 Thread David Hagood
On Fri, 2010-09-24 at 07:09 +0200, Chen, Tiejun wrote:

 
 Right but this should be done for RC mode, not for EP mode we're
 discussing.
 
 Tiejun

According to the Freescale documentation, outbound is just as valid for
endpoint as for root complex - indeed, generating MSIs from software
REQUIRES programming an outbound ATMU to access the host's APIC.

Moreover, ANY PCI endpoint SHOULD be able to do bus master access, and
that is done by the outbound ATMUs.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.

2010-09-24 Thread Alan Cox
 However, if the clock selected by the BMC is switched off, loses its 
 network connection..., the second best clock is selected by the BMC and 
 becomes master. This clock may be less accurate and thus our slave clock 
 has to switch from one notion of time to another. Is that the conflict 
 you mentioned?

No you get situations where you have policy reasons for trusting
particular clocks for particular things.

So you may have a PTP or NTP clock providing basic system time but also
have other PTP clocks that are actually being used for synchronization
work.

With NTP it's not so far been a big issue - NTP isn't used for industrial
high precision control, and in the cases where we end up with multiple NTP
clocks it's on virtualised systems where it is isolated.

With high precision clocks you sometimes want to honour a specific PTP
time source and use it rather than try and merge it with your other time
sources (which may differ from the equipment elsewhere). What matters is
things like all the parts of a several mile long conveyor belt of hot
steel slab stopping at the same moment [1].

In lots of control applications you've got assorted different time planes
which wish to talk their own time and you have to accept it, so we need
to support that kind of use.

I agree entirely the normal boring 'I installed my distro and..' case for
PTP or for NTP is merging all the sources, running the algorithm and using
the system time for it. Likewise almost all normal application code
will be watching system time.

Alan
[1] Which was my first encounter with writing Vax/VMS assembly language
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop

2010-09-24 Thread Jean Delvare
Hi Mikael,

On Fri, 24 Sep 2010 12:50:01 +0200, Mikael Pettersson wrote:
 Jean Delvare writes:
   As far as I can see this is caused by this commit from Grant:
   
   
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5
   
   Mikael, can you please try reverting this patch and see if it solves
   your problem?
 
 Yes, reverting the above commit from 2.6.36-rc5 eliminated the warnings,
 and I was able to insmod the i2c-{core,dev,powermac}.ko modules.

Thanks for testing and reporting. Grant, unless you come up with a fix
very quickly, I'll have to revert
959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5 for 2.6.36.

-- 
Jean Delvare
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: ppc44x - how do i optimize driver for tlb hits

2010-09-24 Thread Ayman El-Khashab
On Fri, Sep 24, 2010 at 06:30:34AM -0400, Josh Boyer wrote:
 On Fri, Sep 24, 2010 at 02:43:52PM +1000, Benjamin Herrenschmidt wrote:
  The DMA is what I use in the real world case to get data into and out 
  of these buffers.  However, I can disable the DMA completely and do only
  the kmalloc.  In this case I still see the same poor performance.  My
  prefetching is part of my algo using the dcbt instructions.  I know the
  instructions are effective b/c without them the algo is much less 
  performant.  So yes, my prefetches are explicit.
 
 Could be some effect of the cache structure, L2 cache, cache geometry
 (number of ways etc...). You might be able to alleviate that by changing
 the stride of your prefetch.

My original theory was that it was having lots of cache misses.  But since
the algorithm works standalone fast and uses large enough buffers (4MB),
much of the cache is flushed and replaced with my data.  The cache is 32K,
8 way, 32b/line.  I've crafted the algorithm to use those parameters.
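
For what it's worth, here is a minimal sketch of the kind of cache-line-stride
prefetching being described; the line size is the figure quoted above, and the
look-ahead distance is just a guess to tune:

#define CACHE_LINE	32	/* bytes per line on this core, per the figures above */
#define LOOKAHEAD	8	/* lines of look-ahead; purely a tuning guess */

/* dcbt is only a hint, so touching past the end of the buffer is harmless. */
static inline void prefetch_line(const void *p)
{
	__asm__ __volatile__("dcbt 0,%0" : : "r"(p));
}

static long sum_with_prefetch(const unsigned char *buf, long len)
{
	long i, sum = 0;

	for (i = 0; i < len; i++) {
		if ((i & (CACHE_LINE - 1)) == 0)
			prefetch_line(buf + i + LOOKAHEAD * CACHE_LINE);
		sum += buf[i];
	}
	return sum;
}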

 
 Unfortunately, I'm not familiar enough with the 440 micro architecture
 and its caches to be able to help you much here.
 
 Also, doesn't kmalloc have a limit to the size of the request it will
 let you allocate?  I know in the distant past you could allocate 128K
 with kmalloc, and 2M with an explicit call to get_free_pages.  Anything
 larger than that had to use vmalloc.  The limit might indeed be higher
 now, but a 4MB kmalloc buffer sounds very large, given that it would be
 contiguous pages.  Two of them even less so.

I thought so too, but at least in the current implementation we found
empirically that we could kmalloc up to but no more than 4MB.  We have 
also tried an approach in user memory and then using get_user_pages
and building a scatter-gather.  We found that the compare code doesn't 
perform any better. 

I suppose another option is to use the kernel profiling option I 
always see but have never used.  Is that a viable option to figure out
what is happening here?  

ayman
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-24 Thread Richard Cochran
On Thu, Sep 23, 2010 at 09:36:54PM +0100, Alan Cox wrote:
 Drop the clockid_t and swap it for a file handle like a proper Unix or
 Linux interface. The rest is much the same
 
   fd = open /sys/class/timesource/[whatever]
 
   various queries you may want to do to check the name etc
 
   fclock_adjtime(fd, ...)

Okay, but lets extend the story:

clock_gettime(fd, ...);

clock_settime(fd, ...);

timer_create(fd, ...);

Can you agree to that as well?

(We would need to ensure that 'fd' avoids the range 0 to MAX_CLOCKS).

Richard
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-24 Thread Alan Cox
On Fri, 24 Sep 2010 15:14:07 +0200
Richard Cochran richardcoch...@gmail.com wrote:

 On Thu, Sep 23, 2010 at 09:36:54PM +0100, Alan Cox wrote:
  Drop the clockid_t and swap it for a file handle like a proper Unix or
  Linux interface. The rest is much the same
  
  fd = open /sys/class/timesource/[whatever]
  
  various queries you may want to do to check the name etc
  
  fclock_adjtime(fd, ...)
 
 Okay, but lets extend the story:
 
   clock_gettime(fd, ...);
 
   clock_settime(fd, ...);
 
   timer_create(fd, ...);
 
 Can you agree to that as well?
 
 (We would need to ensure that 'fd' avoids the range 0 to MAX_CLOCKS).

You can't do that kind of avoidance as you might like, because the behaviour
of file handle numbering is defined by the standards. Hence the f*
versions of the calls (and of lots of other stuff)

Whether you add new syscalls or do the fd passing using flags and hide
the ugly bits in glibc is another question.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-24 Thread Alan Cox
 You can't do that kind of avoidance as you might like, because the behaviour
 of file handle numbering is defined by the standards. Hence the f*
 versions of the calls (and of lots of other stuff)
 
 Whether you add new syscalls or do the fd passing using flags and hide
 the ugly bits in glibc is another question.

To add an example of what I mean you might end up defining CLOCK_FD to
indicate to use the fd in the struct, but given syscalls are trivial
codewise and would end up as

fclock_foo(int fd, blah)
{
clock = fd_to_clock(fd);
if (error)
return error
clock_do_foo(clock, blah);
clock_put(clock);
}

and

clock_foo(int posixid, blah)
{
clock = posix_to_clock(posixid)
...
rest same
}

as wrappers it seems hardly worth adding ugly hacks
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop

2010-09-24 Thread Grant Likely


Jean Delvare kh...@linux-fr.org wrote:

Hi Mikael,

On Fri, 24 Sep 2010 12:50:01 +0200, Mikael Pettersson wrote:
 Jean Delvare writes:
   As far as I can see this is caused by this commit from Grant:
   
   
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5
   
   Mikael, can you please try reverting this patch and see if it solves
   your problem?
 
 Yes, reverting the above commit from 2.6.36-rc5 eliminated the warnings,
 and I was able to insmod the i2c-{core,dev,powermac}.ko modules.

Thanks for testing and reporting. Grant, unless you come up with a fix
very quickly, I'll have to revert
959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5 for 2.6.36.

I'll get a fix out today.
g.


-- 
Jean Delvare

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-24 Thread Richard Cochran
On Thu, Sep 23, 2010 at 12:38:53PM -0700, john stultz wrote:
 On Thu, 2010-09-23 at 19:30 +0200, Richard Cochran wrote:
  /sys/class/timesource/name/id
  /sys/class/ptp/ptp_clock_X/id
  
 So yea, I'm not a fan of the timesource sysfs interface. One, I think
 the name is poor (posix_clocks or something a little more specific would
 be an improvement), and second, I don't like the dictionary interface,
 where one looks up the clock by name.
 
 Instead, I think having the id hanging off the class driver is much
 better, as it allows mapping the actual hardware to the id more clearly.
 
 So I'd drop the timesource listing. And maybe change id to
 clock_id so its a little more clear what the id is for.

Okay, I will drop /sys/class/timesource (hope Alan Cox agrees :)

I threw it out there mostly for the sake of discussion. I imagined
that there could be other properties in that directory, like time
scale (TAI, UTC, etc). But it seems like we don't really need anything
in that direction.

  3.3 Synchronizing the Linux System Time 
  
  
 One could offer a PHC as a combined clock source and clock event
 device. The advantage of this approach would be that it obviates
 the need for synchronization when the PHC is selected as the system
 timer. However, some PHCs, namely the PHY based clocks, cannot be
 used in this way.
 
 Again, I'd scratch this. 

Okay, I only wanted to preempt the question which people are asking
all the time: why can't it work with the system clock transparently?

 Instead, the patch set provides a way to offer a Pulse Per Second
 (PPS) event from the PHC to the Linux PPS subsystem. A user space
 application can read the PPS events and tune the system clock, just
 like when using other external time sources like radio clocks or
 GPS.
 
 Forgive me for a bit of a tangent here:
   So while I think this PPS method is a neat idea, I'm a little curious
 how much of a difference the PPS method for syncing the clock would be
 over just a simple reading of the two clocks and correcting the offset.
 
 It seems much of it depends on the read latency of the PTP hardware vs
 the interrupt latency. Also the PTP clock granularity would effect the
 read accuracy (like on the RTC, you don't really know how close to the
 second boundary you are).
 
 Have you done any such measurements between the two methods?

I have not yet tested how well the PPS method works, but I expect at
least as good results as when using a GPS.

 I just
 wonder if it would actually be something noticeable, and if it's not, how
 much lighter this patch-set would be without the PPS connection.

As you say, the problem with just reading two clocks at nearly the
same time is that you have two uncertain operations. If you use a PPS,
then there is only one clock to read, and that clock is the system
clock, which hopefully is not too slow to read!

In addition, PHY reads can sleep, and that surely won't work. Even with
MAC PHCs, reading outside of interrupt context makes you vulnerable to
other interrupts.

 Again, this isn't super critical, just trying to make sure we don't end
 up adding a bunch of code that doesn't end up being used.

The PPS hooks are really only just a few lines of code.

The great advantage of a PPS approach over an ad-hoc "read two clocks
and compare" is that, with a steady, known sample rate, you can
analyze and predict your control loop behavior. There is lots of
literature available on how to do it. IMHO, that is the big weakness
of the timecompare.c stuff used in the current IGB driver.

 Also PPS
 interrupts are awfully frequent, so systems concerned with power-saving
 and deep idles probably would like something that could be done at a
 more coarse interval.

We could always make the pulse rate programmable, for power-saving
applications.

  4.1 Supported Hardware Clocks 
  ==
  
 + Standard Linux system timer
   This driver exports the standard Linux timer as a PTP clock.
   Although this duplicates CLOCK_REALTIME, the code serves as a
   simple example for driver development and lets people without
   special hardware try the new API.
 
 Still not a fan of this one, figure the app should handle the special
 case where there are no PTP clocks and just use CLOCK_REALTIME rather
 then funneling CLOCK_REALTIME through the PTP interface.

It is really just an example, for people who want to test-drive
the API. It can surely be removed before the final version...

Thanks for your comments,

Richard
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-24 Thread Alan Cox
  Instead, I think having the id hanging off the class driver is much
  better, as it allows mapping the actual hardware to the id more clearly.
  
  So I'd drop the timesource listing. And maybe change id to
  clock_id so its a little more clear what the id is for.
 
 Okay, I will drop /sys/class/timesource (hope Alan Cox agrees :)

It makes sense to hang anything off the physical id

 I threw it out there mostly for the sake of discussion. I imagined
 that there could be other properties in that directory, like time
 scale (TAI, UTC, etc). But it seems like we don't really need anything
 in that direction.

They can still hang off the physical device. That's really a detail

  interrupts are awfully frequent, so systems concerned with power-saving
  and deep idles probably would like something that could be done at a
  more coarse interval.
 
 We could always make the pulse rate programmable, for power-saving
 applications.

I would expect the kernel drivers to be responsible for
- Turning off when they can
- Picking rates that are power optimal for the requirement

The latter is a bit interesting as I don't see anything in any of the
timer APIs to express accuracy (a problem we have in kernel too).
Historically it simply hasn't mattered.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/8] De-couple sysfs memory directories from memory sections

2010-09-24 Thread Nathan Fontenot
On 09/23/2010 01:40 PM, Balbir Singh wrote:
 * Nathan Fontenot nf...@austin.ibm.com [2010-09-22 09:15:43]:
 
 This set of patches decouples the concept that a single memory
 section corresponds to a single directory in 
 /sys/devices/system/memory/.  On systems
 with large amounts of memory (1+ TB) there are performance issues
 related to creating the large number of sysfs directories.  For
 a powerpc machine with 1 TB of memory we are creating 63,000+
 directories.  This is resulting in boot times of around 45-50
 minutes for systems with 1 TB of memory and 8 hours for systems
 with 2 TB of memory.  With this patch set applied I am now seeing
 boot times of 5 minutes or less.

 The root of this issue is in sysfs directory creation. Every time
 a directory is created a string compare is done against all sibling
 directories to ensure we do not create duplicates.  The list of
 directory nodes in sysfs is kept as an unsorted list which results
 in this being an exponentially longer operation as the number of
 directories are created.

 The solution provided by this patch set is to allow a single
 directory in sysfs to span multiple memory sections.  This is
 controlled by an optional architecturally defined function
 memory_block_size_bytes().  The default definition of this
 routine returns a memory block size equal to the memory section
 size. This maintains the current layout of sysfs memory
 directories as it appears to userspace to remain the same as it
 is today.

 For architectures that define their own version of this routine,
 as is done for powerpc in this patchset, the view in userspace
 would change such that each memoryXXX directory would span
 multiple memory sections.  The number of sections spanned would
 depend on the value reported by memory_block_size_bytes.

 In both cases a new file 'end_phys_index' is created in each
 memoryXXX directory.  This file will contain the physical id
 of the last memory section covered by the sysfs directory.  For
 the default case, the value in 'end_phys_index' will be the same
 as in the existing 'phys_index' file.

 
 What does this mean for memory hotplug or hotunplug? 
 

Memory hotplug will function on a memory block size basis.  For
architectures that do not define their own memory_block_size_bytes()
routine, they will get the default size and everything will work
the same as it does today.

For architectures that define their own memory_block_size_bytes()
routine and have multiple memory sections per memory block, hotplug
operations will add or remove all of the memory sections in the
memory block.
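
As an illustration of that hook, an arch override might look like the sketch
below; the prototype and the hard-coded value are assumptions for the example,
not what the powerpc patch actually does (it derives the block size from
firmware properties):

#include <linux/types.h>

/*
 * Illustrative arch override of memory_block_size_bytes().  The default
 * implementation returns the section size, preserving today's sysfs layout;
 * an arch can return a larger multiple to group sections into one directory.
 */
u32 memory_block_size_bytes(void)
{
	/* e.g. group sixteen 16 MB sections into one 256 MB sysfs directory */
	return 256 * 1024 * 1024;
}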

-Nathan
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM pathfor PTE_64BIT

2010-09-24 Thread Scott Wood
On Fri, 24 Sep 2010 07:04:28 +0200
Chen, Tiejun tiejun.c...@windriver.com wrote:

  -Original Message-
  From: linuxppc-dev-bounces+tiejun.chen=windriver@lists.ozlabs.org
  [mailto:linuxppc-dev-bounces+tiejun.chen=windriver@lists.ozlabs.org]
  On Behalf Of Benjamin Herrenschmidt
  Sent: Friday, September 24, 2010 5:59 AM
  To: Scott Wood
  Cc: Gortmaker, Paul; linuxppc-dev@lists.ozlabs.org
  Subject: Re: [PATCH] powerpc: Fix invalid page flags in
  create TLB CAM path for PTE_64BIT
  
  On Thu, 2010-09-23 at 15:33 -0500, Scott Wood wrote:
   I don't see a generic accessor that can test PTE flags for user access
   -- in the absence of one, I guess we need an ifdef here.  Or at least
   put in a comment so anyone who adds a userspace use knows they need to
   fix it.
  
  We could make up one in powerpc arch at least
  
   #define pte_user(val) ((val & _PAGE_USER) == _PAGE_USER)
  
 
 Looks good. 
 
 Ben and Scott,
 
 But for the issue we're discussing, do we still have to use an #ifdef as in
 my original modification? Or do you have another suggestion? Then I
 can improve that in v2.

Ben's version should work without any ifdef, since it makes sure
all bits of _PAGE_USER are set.
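
A tiny stand-alone illustration of why the exact-match form matters on
Book3E, where kernel mappings carry _PAGE_BAP_SR; the bit values follow the
pte-book3e.h definitions discussed here, but the program is only a sketch:

#include <stdio.h>

/* Book3E-style permission bits, as discussed in the thread (sketch only). */
#define _PAGE_BAP_SR	0x000004
#define _PAGE_BAP_UR	0x000008
#define _PAGE_USER	(_PAGE_BAP_UR | _PAGE_BAP_SR)

/* Ben's suggested accessor: require *all* of the user bits. */
#define pte_user(val)	(((val) & _PAGE_USER) == _PAGE_USER)

int main(void)
{
	unsigned long kernel_flags = _PAGE_BAP_SR;	/* what PAGE_KERNEL_X carries */

	/* The any-bit test wrongly reports "user"; the exact match does not. */
	printf("any-bit test: %d, exact match: %d\n",
	       !!(kernel_flags & _PAGE_USER), pte_user(kernel_flags));
	return 0;
}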

-Scott

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree

2010-09-24 Thread Scott Wood
On Fri, 24 Sep 2010 01:10:06 -0600
Grant Likely grant.lik...@secretlab.ca wrote:

 On Thu, Sep 16, 2010 at 09:05:03AM +0200, christophe leroy wrote:
  This patch applies to 2.6.34.7 and 2.6.35.4
  It fixes an issue during the probe for CPM1 with definition of parameter 
  ram from DTS
  
  Signed-off-by: christophe leroy christophe.le...@c-s.fr
 
 I'm sorry, I don't understand the fix from the given description.
 What is the problem, and why is cpm_muram_alloc_fixed() the wrong
 thing to call on CPM1?  Does CPM2 still need it?

I don't see how cpm_muram_alloc_fixed() can be used safely at all.  If
you need a fixed address, it shouldn't be part of the general
allocation pool, or something else might get it first.

-Scott

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree

2010-09-24 Thread Scott Wood
On Fri, 24 Sep 2010 11:57:40 +0400
Anton Vorontsov cbouatmai...@gmail.com wrote:

 Doesn't explain why that worked on MPC8272 (CPM2) and MPC8560
 (also CPM2) machines though. But here's my guess (I no longer
 have these boards to test it):
 
 On 8272 I used this node:
 
 +   s...@4c0 {
  +   #address-cells = <1>;
  +   #size-cells = <0>;
  +   compatible = "fsl,cpm2-spi", "fsl,spi";
  +   reg = <0x11a80 0x40 0x89fc 0x2>;
 
 On that SOC there are two muram data regions 0x0..0x2000 and
 0x9000..0x9100. Note that we actually don't want data regions,
 and the only reason why that worked is that sysdev/cpm_common.c
 maps muram(0)..muram(max).

Wouldn't it still fail the rh_alloc_fixed call?

-Scott

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT

2010-09-24 Thread Paul Gortmaker
[Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for 
PTE_64BIT] On 24/09/2010 (Fri 07:59) Benjamin Herrenschmidt wrote:

 On Thu, 2010-09-23 at 15:33 -0500, Scott Wood wrote:
  I don't see a generic accessor that can test PTE flags for user
  access -- in the absence of one, I guess we need an ifdef here.  Or at
  least put in a comment so anyone who adds a userspace use knows they
  need to fix it. 
 
 We could make up one in powerpc arch at least
 
 #define pte_user(val) ((val & _PAGE_USER) == _PAGE_USER)
 
 would do

I've put the above into pte-common.h, restored the deleted code block
which now uses pte_user() and I've updated the commit header to match.
Passes sanity boot test on an sbc8548 both with and without PTE_64BIT.

Thanks for the feedback.
Paul.


From d48ebb58b8214f9faec775a5e06902f638f165cf Mon Sep 17 00:00:00 2001
From: Tiejun Chen tiejun.c...@windriver.com
Date: Tue, 21 Sep 2010 19:31:31 +0800
Subject: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for 
PTE_64BIT

There exists a four line chunk of code, which when configured for
64 bit address space, can incorrectly set certain page flags during
the TLB creation.  It turns out that this is code which isn't used,
but might still serve a purpose.  Since it isn't obvious why it exists
or why it causes problems, the below description covers both in detail.

For powerpc bootstrap, the physical memory (at most 768M), is mapped
into the kernel space via the following path:

MMU_init()
|
+ adjust_total_lowmem()
|
+ map_mem_in_cams()
|
+ settlbcam(i, virt, phys, cam_sz, PAGE_KERNEL_X, 0);

On settlbcam(), the kernel will create TLB entries according to the flag,
PAGE_KERNEL_X.

settlbcam()
{
...
TLBCAM[index].MAS1 = MAS1_VALID
| MAS1_IPROT | MAS1_TSIZE(tsize) | MAS1_TID(pid);
^
These entries cannot be invalidated by the
kernel since MAS1_IPROT is set on TLB property.
...
if (flags & _PAGE_USER) {
   TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
   TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
}

For classic BookE (flags & _PAGE_USER) is 'zero' so it's fine.
But on boards like the Freescale P4080, we want to support 36-bit
physical address on it. So the following options may be set:

CONFIG_FSL_BOOKE=y
CONFIG_PTE_64BIT=y
CONFIG_PHYS_64BIT=y

As a result, boards like the P4080 will use the Book3E PTE format,
as per the file arch/powerpc/include/asm/pgtable-ppc32.h:

  * #elif defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT)
  * #include <asm/pte-book3e.h>

So PAGE_KERNEL_X is __pgprot(_PAGE_BASE | _PAGE_KERNEL_RWX) and the
book3E version of _PAGE_KERNEL_RWX is defined with:

  (_PAGE_BAP_SW | _PAGE_BAP_SR | _PAGE_DIRTY | _PAGE_BAP_SX)

Note the _PAGE_BAP_SR, which is also defined in the book3E _PAGE_USER:

  #define _PAGE_USER    (_PAGE_BAP_UR | _PAGE_BAP_SR) /* Can be read */

So the possibility exists to wrongly assign the user MAS3_URWX bits
to kernel (PAGE_KERNEL_X) address space via the following code fragment:

if (flags & _PAGE_USER) {
   TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
   TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
}

Here is a dump of the TLB info from Simics with the above code present:
--
L2 TLB1
GT   SSS UUU V I
 Row  Logical   PhysicalSS TLPID  TID  WIMGE XWR XWR F P   V
- - --- -- - - - --- --- - -   -
  0   c000-cfff 0-00fff 00 0 0   M   XWR XWR 0 1   1
  1   d000-dfff 01000-01fff 00 0 0   M   XWR XWR 0 1   1
  2   e000-efff 02000-02fff 00 0 0   M   XWR XWR 0 1   1

Actually this conditional code was used for two legacy functions:

  1: support KGDB to set break point.
 KGDB already dropped this; now uses its core write to set break point.

  2: io_block_mapping() to create TLB in segmentation size (not PAGE_SIZE)
 for device IO space.
 This use case is also removed from the latest PowerPC kernel.

However, there may still be a use case for it in the future, like
large user pages, so we can't remove it entirely.  As an alternative,
we match on all bits of _PAGE_USER instead of just any bits, so the
case where just _PAGE_BAP_SR is set can't sneak through.

With this done, the TLB appears without U having XWR as below:

---
L2 TLB1
GT   SSS UUU V I
 Row  Logical   PhysicalSS TLPID  TID  WIMGE XWR XWR F P   V
- - --- -- - - - --- --- - -   -
  0   c000-cfff 0-00fff 00 0 0   M   XWR 0 1   1
  1   d000-dfff 

Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.

2010-09-24 Thread john stultz
On Fri, 2010-09-24 at 09:29 +0200, Richard Cochran wrote:
 On Thu, Sep 23, 2010 at 12:48:51PM -0700, john stultz wrote:
  So I'd still split this patch up a little bit more.
  
  1) Patch that implements the ADJ_SETOFFSET  (*and its implementation*)
  in do_adjtimex.
  
  2) Patch that adds the new syscall and clock_id multiplexing.
  
  3) Patches that wire it up to the rest of the architectures (there's
  still a bunch missing here).
 
 I was not sure what the policy is about adding syscalls. Is it the
 syscall author's responsibility to add it into every arch?
 
 The last time (see a2e2725541fad7) the commit only added half of some
 archs, and ignored others. In my patch, the syscall *really* works on
 the archs that are present in the patch.
 
 (Actually, I did not test blackfin, since I don't have one, but I
 included it since I know they have a PTP hardware clock.)

I'm not sure about policy, but I think for completeness sake you should
make sure every arch supports a new syscall. You're not expected to be
able to test every one, but getting the basic support patch sent to
maintainers should be done.

   +static inline int common_clock_adj(const clockid_t which_clock, struct 
   timex *t)
   +{
   + if (CLOCK_REALTIME == which_clock)
   + return do_adjtimex(t);
   + else
   + return -EOPNOTSUPP;
   +}
  
  
  Would it make sense to point to the do_adjtimex() in the k_clock
  definition for CLOCK_REALTIME rather then conditionalizing it here?
 
 But what about CLOCK_MONOTONIC_RAW, for example?

-EOPNOTSUPP

 Does it make sense to allow it to be adjusted?

No. I think only CLOCK_REALTIME would make sense of the existing clocks.

I'm just suggesting you conditionalize it from the function pointer,
rather then in the common function.

thanks
-john


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 0/2] powerpc/47x TLB optimization patches

2010-09-24 Thread Dave Kleikamp
These two patches reduce the frequency with which the TLB caches are flushed
in hardware: both the normal TLB cache and the shadow TLB cache, which
separates the TLBs for data and instruction access (dTLB and iTLB).

Dave Kleikamp (2):
  476: Set CCR2[DSTI] to prevent isync from flushing shadow TLB
  ppc: lazy flush_tlb_mm for nohash architectures

 arch/powerpc/include/asm/reg_booke.h  |4 +
 arch/powerpc/kernel/head_44x.S|   25 ++
 arch/powerpc/mm/mmu_context_nohash.c  |  154 ++---
 arch/powerpc/mm/mmu_decl.h|8 ++
 arch/powerpc/mm/tlb_nohash.c  |   28 +-
 arch/powerpc/mm/tlb_nohash_low.S  |   14 +++-
 arch/powerpc/platforms/44x/Kconfig|7 ++
 arch/powerpc/platforms/44x/misc_44x.S |   26 ++
 8 files changed, 249 insertions(+), 17 deletions(-)

-- 
1.7.2.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 1/2] 476: Set CCR2[DSTI] to prevent isync from flushing shadow TLB

2010-09-24 Thread Dave Kleikamp
When the DSTI (Disable Shadow TLB Invalidate) bit is set in the CCR2
register, the isync command does not flush the shadow TLB (iTLB & dTLB).

However, since the shadow TLB does not contain context information, we
want the shadow TLB flushed in situations where we are switching context.
In those situations, we explicitly clear the DSTI bit before performing
isync, and set it again afterward.  We also need to do the same when we
perform isync after explicitly flushing the TLB.

The setting of the DSTI bit is dependent on
CONFIG_PPC_47x_DISABLE_SHADOW_TLB_INVALIDATE.  When we are confident that
the feature works as expected, the option can probably be removed.

Signed-off-by: Dave Kleikamp sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/reg_booke.h  |4 
 arch/powerpc/kernel/head_44x.S|   25 +
 arch/powerpc/mm/tlb_nohash_low.S  |   14 +-
 arch/powerpc/platforms/44x/Kconfig|7 +++
 arch/powerpc/platforms/44x/misc_44x.S |   26 ++
 5 files changed, 75 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/reg_booke.h 
b/arch/powerpc/include/asm/reg_booke.h
index 667a498..a7ecbfe 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -120,6 +120,7 @@
 #define SPRN_TLB3CFG   0x2B3   /* TLB 3 Config Register */
 #define SPRN_EPR   0x2BE   /* External Proxy Register */
 #define SPRN_CCR1  0x378   /* Core Configuration Register 1 */
+#define SPRN_CCR2_476  0x379   /* Core Configuration Register 2 (476)*/
 #define SPRN_ZPR   0x3B0   /* Zone Protection Register (40x) */
 #define SPRN_MAS7  0x3B0   /* MMU Assist Register 7 */
 #define SPRN_MMUCR 0x3B2   /* MMU Control Register */
@@ -188,6 +189,9 @@
 #defineCCR1_DPC0x0100 /* Disable L1 I-Cache/D-Cache parity 
checking */
 #defineCCR1_TCS0x0080 /* Timer Clock Select */
 
+/* Bit definitions for CCR2. */
+#define CCR2_476_DSTI  0x0800 /* Disable Shadow TLB Invalidate */
+
 /* Bit definitions for the MCSR. */
 #define MCSR_MCS   0x8000 /* Machine Check Summary */
 #define MCSR_IB0x4000 /* Instruction PLB Error */
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 562305b..0c1b118 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -703,8 +703,23 @@ _GLOBAL(set_context)
stw r4, 0x4(r5)
 #endif
mtspr   SPRN_PID,r3
+BEGIN_MMU_FTR_SECTION
+   b   1f
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_47x)
isync   /* Force context change */
blr
+1:
+#ifdef CONFIG_PPC_47x
+   mfspr   r10,SPRN_CCR2_476
+   rlwinm  r11,r10,0,~CCR2_476_DSTI
+   mtspr   SPRN_CCR2_476,r11
+   isync   /* Force context change */
+   mtspr   SPRN_CCR2_476,r10
+#else /* CONFIG_PPC_47x */
+2: trap
+   EMIT_BUG_ENTRY 2b,__FILE__,__LINE__,0;
+#endif /* CONFIG_PPC_47x */
+   blr
 
 /*
  * Init CPU state. This is called at boot time or for secondary CPUs
@@ -861,6 +876,16 @@ skpinv:addir4,r4,1 /* 
Increment */
isync
 #endif /* CONFIG_PPC_EARLY_DEBUG_44x */
 
+   mfspr   r3,SPRN_CCR2_476
+#ifdef CONFIG_PPC_47x_DISABLE_SHADOW_TLB_INVALIDATE
+   /* With CCR2(DSTI) set, isync does not invalidate the shadow TLB */
+   oris    r3,r3,CCR2_476_DSTI@h
+#else
+   rlwinm  r3,r3,0,~CCR2_476_DSTI
+#endif
+   mtspr   SPRN_CCR2_476,r3
+   isync
+
/* Establish the interrupt vector offsets */
SET_IVOR(0,  CriticalInput);
SET_IVOR(1,  MachineCheck);
diff --git a/arch/powerpc/mm/tlb_nohash_low.S b/arch/powerpc/mm/tlb_nohash_low.S
index b9d9fed..f28fb52 100644
--- a/arch/powerpc/mm/tlb_nohash_low.S
+++ b/arch/powerpc/mm/tlb_nohash_low.S
@@ -112,7 +112,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_47x)
clrrwi  r4,r3,12/* get an EPN for the hashing with V = 0 */
ori r4,r4,PPC47x_TLBE_SIZE
tlbwe   r4,r7,0 /* write it */
+   mfspr   r8,SPRN_CCR2_476
+   rlwinm  r9,r8,0,~CCR2_476_DSTI
+   mtspr   SPRN_CCR2_476,r9
isync
+   mtspr   SPRN_CCR2_476,r8
wrtee   r10
blr
 #else /* CONFIG_PPC_47x */
@@ -180,7 +184,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_47x)
lwz r8,0(r10)   /* Load boltmap entry */
addir10,r10,4   /* Next word */
b   1b  /* Then loop */
-1: isync   /* Sync shadows */
+1: mfspr   r9,SPRN_CCR2_476
+   rlwinm  r10,r9,0,~CCR2_476_DSTI
+   mtspr   SPRN_CCR2_476,r10
+   isync   /* Sync shadows */
+   mtspr   SPRN_CCR2_476,r9
wrtee   r11
 #else /* CONFIG_PPC_47x */
 1: trap
@@ -203,7 +211,11 @@ _GLOBAL(_tlbivax_bcast)
isync
 /* tlbivax 0,r3 - use .long to avoid binutils deps */
.long 0x7c000624 | (r3  11)
+   mfspr   

[PATCH 2/2] ppc: lazy flush_tlb_mm for nohash architectures

2010-09-24 Thread Dave Kleikamp
On PPC_MMU_NOHASH processors that support a large number of contexts,
implement a lazy flush_tlb_mm() that switches to a free context, marking
the old one stale.  The tlb is only flushed when no free contexts are
available.

The lazy tlb flushing is controlled by the global variable tlb_lazy_flush
which is set during init, dependent upon MMU_FTR_TYPE_47x.
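
In outline, the lazy path works like the sketch below (idea only, not
the patch code; locking and the per-cpu stale_map bookkeeping are left
out):

void lazy_flush_tlb_mm(struct mm_struct *mm)    /* illustrative name */
{
        if (!tlb_lazy_flush) {
                /* old behaviour: flush the hardware TLB right away */
                __local_flush_tlb_mm(mm);
                return;
        }

        /*
         * Drop the context instead of flushing: clear it from
         * context_map so the mm gets a fresh id on its next use, but
         * leave it set in context_available_map so the stale id is not
         * handed out again.  Only when recycle_stale_contexts() finds
         * no free ids left is the TLB really flushed and context_map
         * copied over context_available_map.
         */
        __clear_bit(mm->context.id, context_map);
        mm->context.id = MMU_NO_CONTEXT;
        nr_stale_contexts++;
}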

Signed-off-by: Dave Kleikamp sha...@linux.vnet.ibm.com
---
 arch/powerpc/mm/mmu_context_nohash.c |  154 +++---
 arch/powerpc/mm/mmu_decl.h   |8 ++
 arch/powerpc/mm/tlb_nohash.c |   28 +-
 3 files changed, 174 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_nohash.c 
b/arch/powerpc/mm/mmu_context_nohash.c
index ddfd7ad..87c7dc2 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -17,10 +17,6 @@
  * TODO:
  *
  *   - The global context lock will not scale very well
- *   - The maps should be dynamically allocated to allow for processors
- * that support more PID bits at runtime
- *   - Implement flush_tlb_mm() by making the context stale and picking
- * a new one
  *   - More aggressively clear stale map bits and maybe find some way to
  * also clear mm-cpu_vm_mask bits when processes are migrated
  */
@@ -52,6 +48,8 @@
 #include asm/mmu_context.h
 #include asm/tlbflush.h
 
+#include mmu_decl.h
+
 static unsigned int first_context, last_context;
 static unsigned int next_context, nr_free_contexts;
 static unsigned long *context_map;
@@ -59,9 +57,31 @@ static unsigned long *stale_map[NR_CPUS];
 static struct mm_struct **context_mm;
 static DEFINE_RAW_SPINLOCK(context_lock);
 
+int tlb_lazy_flush;
+static int tlb_needs_flush[NR_CPUS];
+static unsigned long *context_available_map;
+static unsigned int nr_stale_contexts;
+
 #define CTX_MAP_SIZE   \
(sizeof(unsigned long) * (last_context / BITS_PER_LONG + 1))
 
+/*
+ * if another cpu recycled the stale contexts, we need to flush
+ * the local TLB, so that we may re-use those contexts
+ */
+void flush_recycled_contexts(int cpu)
+{
+   int i;
+
+   if (tlb_needs_flush[cpu]) {
+   pr_hard([%d] flushing tlb\n, cpu);
+   _tlbil_all();
+   for (i = cpu_first_thread_in_core(cpu);
+i = cpu_last_thread_in_core(cpu); i++) {
+   tlb_needs_flush[i] = 0;
+   }
+   }
+}
 
 /* Steal a context from a task that has one at the moment.
  *
@@ -147,7 +167,7 @@ static unsigned int steal_context_up(unsigned int id)
pr_hardcont( | steal %d from 0x%p, id, mm);
 
/* Flush the TLB for that context */
-   local_flush_tlb_mm(mm);
+   __local_flush_tlb_mm(mm);
 
/* Mark this mm has having no context anymore */
mm-context.id = MMU_NO_CONTEXT;
@@ -161,13 +181,19 @@ static unsigned int steal_context_up(unsigned int id)
 #ifdef DEBUG_MAP_CONSISTENCY
 static void context_check_map(void)
 {
-   unsigned int id, nrf, nact;
+   unsigned int id, nrf, nact, nstale;
 
-   nrf = nact = 0;
+   nrf = nact = nstale = 0;
for (id = first_context; id = last_context; id++) {
int used = test_bit(id, context_map);
-   if (!used)
-   nrf++;
+   int allocated = tlb_lazy_flush &&
+   test_bit(id, context_available_map);
+   if (!used) {
+   if (allocated)
+   nstale++;
+   else
+   nrf++;
+   }
if (used != (context_mm[id] != NULL))
pr_err(MMU: Context %d is %s and MM is %p !\n,
   id, used ? used : free, context_mm[id]);
@@ -179,6 +205,11 @@ static void context_check_map(void)
   nr_free_contexts, nrf);
nr_free_contexts = nrf;
}
+   if (nstale != nr_stale_contexts) {
+   pr_err(MMU: Stale context count out of sync ! (%d vs %d)\n,
+  nr_stale_contexts, nstale);
+   nr_stale_contexts = nstale;
+   }
if (nact  num_online_cpus())
pr_err(MMU: More active contexts than CPUs ! (%d vs %d)\n,
   nact, num_online_cpus());
@@ -189,6 +220,38 @@ static void context_check_map(void)
 static void context_check_map(void) { }
 #endif
 
+/*
+ * On architectures that support a large number of contexts, the tlb
+ * can be flushed lazily by picking a new context and making the stale
+ * context unusable until a lazy tlb flush has been issued.
+ *
+ * context_available_map keeps track of both active and stale contexts,
+ * while context_map continues to track only active contexts.  When the
+ * lazy tlb flush is triggered, context_map is copied to
+ * context_available_map, making the once-stale contexts available again
+ */
+static void recycle_stale_contexts(void)
+{
+   if 

RE: [PATCH 1/2] PPC4xx: Generelizing drivers/dma/ppc4xx/adma.c

2010-09-24 Thread Tirumala Marri

 It would be really preferable to support all those platforms in a
 single Linux
 image. If technically possible, please try to move this direction.
It is doable for a couple of SoCs. Other SoC DMA engines are quite a bit
different.
Let me do small steps first and slowly achieve some run-time
differentiation.

Thanks,
Marri
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH RFCv1 0/2] dma: add support for sg-to-sg transfers

2010-09-24 Thread Ira W. Snyder
This series adds support for scatterlist to scatterlist transfers to the
generic DMAEngine API. I have unconditionally enabled it when the fsldma
driver is used to make testing easier. This feature should probably be
selected by individual drivers.

This series is intended to lay the groundwork for further changes to the
series titled CARMA Board Support. That series will be updated when I
have time and hardware to test with.

This series has not been runtime tested yet. I am posting it only to
gain comments before I spend the effort to update the driver that
depends on this.

To help reviewers, I'd like to comment on the architecture of
dma_async_memcpy_sg_to_sg(). It explicitly avoids using descriptor
chaining due to the way that feature interacts with the fsldma
controller's external start feature. To use the external start feature
properly, the in-memory descriptor chain must not be fragmented into
multiple smaller chains. This is what is achieved by submitting all
descriptors without using chaining.
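
For the caller's side, usage is expected to look roughly like the
sketch below (illustrative only: mapping directions follow the other
dma_async_memcpy_*() helpers, and error handling is trimmed):

static int copy_sg_example(struct dma_chan *chan, struct device *dev,
                           struct scatterlist *dst_sg, unsigned int dst_nents,
                           struct scatterlist *src_sg, unsigned int src_nents,
                           dma_async_tx_callback done, void *done_arg)
{
        dma_cookie_t cookie;

        /* map both lists for DMA; real code must check the return values */
        dst_nents = dma_map_sg(dev, dst_sg, dst_nents, DMA_FROM_DEVICE);
        src_nents = dma_map_sg(dev, src_sg, src_nents, DMA_TO_DEVICE);

        cookie = dma_async_memcpy_sg_to_sg(chan, dst_sg, dst_nents,
                                           src_sg, src_nents, done, done_arg);
        if (dma_submit_error(cookie))
                return cookie;

        /* kick the channel; deferred when using external start */
        dma_async_issue_pending(chan);
        return 0;
}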

Ira W. Snyder (2):
  dmaengine: add support for scatterlist to scatterlist transfers
  fsldma: use generic support for scatterlist to scatterlist transfers

 arch/powerpc/include/asm/fsldma.h |  115 ++--
 drivers/dma/Kconfig   |4 +
 drivers/dma/dmaengine.c   |  119 
 drivers/dma/fsldma.c  |  219 +++--
 include/linux/dmaengine.h |   10 ++
 5 files changed, 181 insertions(+), 286 deletions(-)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers

2010-09-24 Thread Ira W. Snyder
This adds support for scatterlist to scatterlist DMA transfers. As
requested by Dan, this is hidden behind an ifdef so that it can be
selected by the drivers that need it.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/Kconfig   |4 ++
 drivers/dma/dmaengine.c   |  119 +
 include/linux/dmaengine.h |   10 
 3 files changed, 133 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 9520cf0..f688669 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -89,10 +89,14 @@ config AT_HDMAC
  Support the Atmel AHB DMA controller.  This can be integrated in
  chips such as the Atmel AT91SAM9RL.
 
+config DMAENGINE_SG_TO_SG
+   bool
+
 config FSL_DMA
tristate Freescale Elo and Elo Plus DMA support
depends on FSL_SOC
select DMA_ENGINE
+   select DMAENGINE_SG_TO_SG
---help---
  Enable support for the Freescale Elo and Elo Plus DMA controllers.
  The Elo is the DMA controller on some 82xx and 83xx parts, and the
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 9d31d5e..57ec1e5 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -972,10 +972,129 @@ dma_async_memcpy_pg_to_pg(struct dma_chan *chan, struct 
page *dest_pg,
 }
 EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
 
+#ifdef CONFIG_DMAENGINE_SG_TO_SG
+dma_cookie_t
+dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
+ struct scatterlist *dst_sg, unsigned int dst_nents,
+ struct scatterlist *src_sg, unsigned int src_nents,
+ dma_async_tx_callback cb, void *cb_param)
+{
+   struct dma_device *dev = chan-device;
+   struct dma_async_tx_descriptor *tx;
+   dma_cookie_t cookie = -ENOMEM;
+   size_t dst_avail, src_avail;
+   struct list_head tx_list;
+   size_t transferred = 0;
+   dma_addr_t dst, src;
+   size_t len;
+
+   if (dst_nents == 0 || src_nents == 0)
+   return -EINVAL;
+
+   if (dst_sg == NULL || src_sg == NULL)
+   return -EINVAL;
+
+   /* get prepared for the loop */
+   dst_avail = sg_dma_len(dst_sg);
+   src_avail = sg_dma_len(src_sg);
+
+   INIT_LIST_HEAD(tx_list);
+
+   /* run until we are out of descriptors */
+   while (true) {
+
+   /* create the largest transaction possible */
+   len = min_t(size_t, src_avail, dst_avail);
+   if (len == 0)
+   goto fetch;
+
+   dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) - dst_avail;
+   src = sg_dma_address(src_sg) + sg_dma_len(src_sg) - src_avail;
+
+   /* setup the transaction */
+   tx = dev-device_prep_dma_memcpy(chan, dst, src, len, 0);
+   if (!tx) {
+   dev_err(dev-dev, failed to alloc desc for memcpy\n);
+   return -ENOMEM;
+   }
+
+   /* keep track of the tx for later */
+   list_add_tail(tx-entry, tx_list);
+
+   /* update metadata */
+   transferred += len;
+   dst_avail -= len;
+   src_avail -= len;
+
+fetch:
+   /* fetch the next dst scatterlist entry */
+   if (dst_avail == 0) {
+
+   /* no more entries: we're done */
+   if (dst_nents == 0)
+   break;
+
+   /* fetch the next entry: if there are no more: done */
+   dst_sg = sg_next(dst_sg);
+   if (dst_sg == NULL)
+   break;
+
+   dst_nents--;
+   dst_avail = sg_dma_len(dst_sg);
+   }
+
+   /* fetch the next src scatterlist entry */
+   if (src_avail == 0) {
+
+   /* no more entries: we're done */
+   if (src_nents == 0)
+   break;
+
+   /* fetch the next entry: if there are no more: done */
+   src_sg = sg_next(src_sg);
+   if (src_sg == NULL)
+   break;
+
+   src_nents--;
+   src_avail = sg_dma_len(src_sg);
+   }
+   }
+
+   /* loop through the list of descriptors and submit them */
+   list_for_each_entry(tx, tx_list, entry) {
+
+   /* this is the last descriptor: add the callback */
+   if (list_is_last(tx-entry, tx_list)) {
+   tx-callback = cb;
+   tx-callback_param = cb_param;
+   }
+
+   /* submit the transaction */
+   cookie = tx-tx_submit(tx);
+   if (dma_submit_error(cookie)) {
+   dev_err(dev-dev, failed to submit desc\n);

[PATCH RFCv1 2/2] fsldma: use generic support for scatterlist to scatterlist transfers

2010-09-24 Thread Ira W. Snyder
The fsldma driver uses the DMA_SLAVE API to handle scatterlist to
scatterlist DMA transfers. For quite a while now, it has been possible
to mimic the operation by using the device_prep_dma_memcpy() routine
intelligently.

Now that the DMAEngine API has grown generic support for scatterlist to
scatterlist transfers, this operation is no longer needed. The generic
support is used for scatterlist to scatterlist transfers.
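
For reference, configuring the controller-specific extras through the
generic API would then look roughly like this (illustrative values; the
embedded dma_slave_config fields are filled in as for any other slave
driver):

static int fsl_external_start_example(struct dma_chan *chan)
{
        struct fsldma_slave_config cfg = {
                .request_count  = 1,
                .external_start = true,
                .external_pause = false,
        };

        /* fill cfg.config (addresses, widths, direction) as usual */

        return chan->device->device_control(chan, DMA_SLAVE_CONFIG,
                                            (unsigned long)&cfg);
}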

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 arch/powerpc/include/asm/fsldma.h |  115 ++--
 drivers/dma/fsldma.c  |  219 +++--
 2 files changed, 48 insertions(+), 286 deletions(-)

diff --git a/arch/powerpc/include/asm/fsldma.h 
b/arch/powerpc/include/asm/fsldma.h
index debc5ed..dc0bd27 100644
--- a/arch/powerpc/include/asm/fsldma.h
+++ b/arch/powerpc/include/asm/fsldma.h
@@ -1,7 +1,7 @@
 /*
  * Freescale MPC83XX / MPC85XX DMA Controller
  *
- * Copyright (c) 2009 Ira W. Snyder i...@ovro.caltech.edu
+ * Copyright (c) 2009-2010 Ira W. Snyder i...@ovro.caltech.edu
  *
  * This file is licensed under the terms of the GNU General Public License
  * version 2. This program is licensed as is without any warranty of any
@@ -11,127 +11,32 @@
 #ifndef __ARCH_POWERPC_ASM_FSLDMA_H__
 #define __ARCH_POWERPC_ASM_FSLDMA_H__
 
-#include linux/slab.h
 #include linux/dmaengine.h
 
 /*
- * Definitions for the Freescale DMA controller's DMA_SLAVE implemention
+ * The Freescale DMA controller has several features that are not accomodated
+ * in the Linux DMAEngine API. Therefore, the generic structure is expanded
+ * to allow drivers to use these features.
  *
- * The Freescale DMA_SLAVE implementation was designed to handle many-to-many
- * transfers. An example usage would be an accelerated copy between two
- * scatterlists. Another example use would be an accelerated copy from
- * multiple non-contiguous device buffers into a single scatterlist.
+ * This structure should be passed into the DMAEngine routine device_control()
+ * as in this example:
  *
- * A DMA_SLAVE transaction is defined by a struct fsl_dma_slave. This
- * structure contains a list of hardware addresses that should be copied
- * to/from the scatterlist passed into device_prep_slave_sg(). The structure
- * also has some fields to enable hardware-specific features.
+ * chan-device-device_control(chan, DMA_SLAVE_CONFIG, (unsigned long)cfg);
  */
 
 /**
- * struct fsl_dma_hw_addr
- * @entry: linked list entry
- * @address: the hardware address
- * @length: length to transfer
- *
- * Holds a single physical hardware address / length pair for use
- * with the DMAEngine DMA_SLAVE API.
- */
-struct fsl_dma_hw_addr {
-   struct list_head entry;
-
-   dma_addr_t address;
-   size_t length;
-};
-
-/**
  * struct fsl_dma_slave
- * @addresses: a linked list of struct fsl_dma_hw_addr structures
+ * @config: the standard Linux DMAEngine API DMA_SLAVE configuration
  * @request_count: value for DMA request count
- * @src_loop_size: setup and enable constant source-address DMA transfers
- * @dst_loop_size: setup and enable constant destination address DMA transfers
  * @external_start: enable externally started DMA transfers
  * @external_pause: enable externally paused DMA transfers
- *
- * Holds a list of address / length pairs for use with the DMAEngine
- * DMA_SLAVE API implementation for the Freescale DMA controller.
  */
-struct fsl_dma_slave {
+struct fsldma_slave_config {
+   struct dma_slave_config config;
 
-   /* List of hardware address/length pairs */
-   struct list_head addresses;
-
-   /* Support for extra controller features */
unsigned int request_count;
-   unsigned int src_loop_size;
-   unsigned int dst_loop_size;
bool external_start;
bool external_pause;
 };
 
-/**
- * fsl_dma_slave_append - add an address/length pair to a struct fsl_dma_slave
- * @slave: the struct fsl_dma_slave to add to
- * @address: the hardware address to add
- * @length: the length of bytes to transfer from @address
- *
- * Add a hardware address/length pair to a struct fsl_dma_slave. Returns 0 on
- * success, -ERRNO otherwise.
- */
-static inline int fsl_dma_slave_append(struct fsl_dma_slave *slave,
-  dma_addr_t address, size_t length)
-{
-   struct fsl_dma_hw_addr *addr;
-
-   addr = kzalloc(sizeof(*addr), GFP_ATOMIC);
-   if (!addr)
-   return -ENOMEM;
-
-   INIT_LIST_HEAD(addr-entry);
-   addr-address = address;
-   addr-length = length;
-
-   list_add_tail(addr-entry, slave-addresses);
-   return 0;
-}
-
-/**
- * fsl_dma_slave_free - free a struct fsl_dma_slave
- * @slave: the struct fsl_dma_slave to free
- *
- * Free a struct fsl_dma_slave and all associated address/length pairs
- */
-static inline void fsl_dma_slave_free(struct fsl_dma_slave *slave)
-{
-   struct fsl_dma_hw_addr *addr, *tmp;
-
-   if (slave) {
-   

Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers

2010-09-24 Thread Dan Williams
On Fri, Sep 24, 2010 at 12:46 PM, Ira W. Snyder i...@ovro.caltech.edu wrote:
 This adds support for scatterlist to scatterlist DMA transfers. As
 requested by Dan, this is hidden behind an ifdef so that it can be
 selected by the drivers that need it.

 Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
 ---
  drivers/dma/Kconfig       |    4 ++
  drivers/dma/dmaengine.c   |  119 
 +
  include/linux/dmaengine.h |   10 
  3 files changed, 133 insertions(+), 0 deletions(-)

 diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
 index 9520cf0..f688669 100644
 --- a/drivers/dma/Kconfig
 +++ b/drivers/dma/Kconfig
 @@ -89,10 +89,14 @@ config AT_HDMAC
          Support the Atmel AHB DMA controller.  This can be integrated in
          chips such as the Atmel AT91SAM9RL.

 +config DMAENGINE_SG_TO_SG
 +       bool
 +
  config FSL_DMA
        tristate Freescale Elo and Elo Plus DMA support
        depends on FSL_SOC
        select DMA_ENGINE
 +       select DMAENGINE_SG_TO_SG
        ---help---
          Enable support for the Freescale Elo and Elo Plus DMA controllers.
          The Elo is the DMA controller on some 82xx and 83xx parts, and the
 diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
 index 9d31d5e..57ec1e5 100644
 --- a/drivers/dma/dmaengine.c
 +++ b/drivers/dma/dmaengine.c
 @@ -972,10 +972,129 @@ dma_async_memcpy_pg_to_pg(struct dma_chan *chan, 
 struct page *dest_pg,
  }
  EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);

 +#ifdef CONFIG_DMAENGINE_SG_TO_SG
 +dma_cookie_t
 +dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
 +                         struct scatterlist *dst_sg, unsigned int dst_nents,
 +                         struct scatterlist *src_sg, unsigned int src_nents,
 +                         dma_async_tx_callback cb, void *cb_param)
 +{
 +       struct dma_device *dev = chan-device;
 +       struct dma_async_tx_descriptor *tx;
 +       dma_cookie_t cookie = -ENOMEM;
 +       size_t dst_avail, src_avail;
 +       struct list_head tx_list;
 +       size_t transferred = 0;
 +       dma_addr_t dst, src;
 +       size_t len;
 +
 +       if (dst_nents == 0 || src_nents == 0)
 +               return -EINVAL;
 +
 +       if (dst_sg == NULL || src_sg == NULL)
 +               return -EINVAL;
 +
 +       /* get prepared for the loop */
 +       dst_avail = sg_dma_len(dst_sg);
 +       src_avail = sg_dma_len(src_sg);
 +
 +       INIT_LIST_HEAD(tx_list);
 +
 +       /* run until we are out of descriptors */
 +       while (true) {
 +
 +               /* create the largest transaction possible */
 +               len = min_t(size_t, src_avail, dst_avail);
 +               if (len == 0)
 +                       goto fetch;
 +
 +               dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) - dst_avail;
 +               src = sg_dma_address(src_sg) + sg_dma_len(src_sg) - src_avail;
 +
 +               /* setup the transaction */
 +               tx = dev-device_prep_dma_memcpy(chan, dst, src, len, 0);
 +               if (!tx) {
 +                       dev_err(dev-dev, failed to alloc desc for 
 memcpy\n);
 +                       return -ENOMEM;

I don't think any dma channels gracefully handle descriptors that were
prepped but not submitted.  You would probably need to submit the
backlog, poll for completion, and then return the error.
Alternatively, the expectation is that descriptor allocations are
transient, i.e. once previously submitted transactions are completed
the descriptors will return to the available pool.  So you could do
what async_tx routines do and just poll for a descriptor.

 +               }
 +
 +               /* keep track of the tx for later */
 +               list_add_tail(tx-entry, tx_list);
 +
 +               /* update metadata */
 +               transferred += len;
 +               dst_avail -= len;
 +               src_avail -= len;
 +
 +fetch:
 +               /* fetch the next dst scatterlist entry */
 +               if (dst_avail == 0) {
 +
 +                       /* no more entries: we're done */
 +                       if (dst_nents == 0)
 +                               break;
 +
 +                       /* fetch the next entry: if there are no more: done */
 +                       dst_sg = sg_next(dst_sg);
 +                       if (dst_sg == NULL)
 +                               break;
 +
 +                       dst_nents--;
 +                       dst_avail = sg_dma_len(dst_sg);
 +               }
 +
 +               /* fetch the next src scatterlist entry */
 +               if (src_avail == 0) {
 +
 +                       /* no more entries: we're done */
 +                       if (src_nents == 0)
 +                               break;
 +
 +                       /* fetch the next entry: if there are no more: done */
 +                       src_sg = sg_next(src_sg);
 +                       if (src_sg == NULL)
 +                               break;
 +
 +                       src_nents--;

Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers

2010-09-24 Thread Ira W. Snyder
On Fri, Sep 24, 2010 at 01:40:56PM -0700, Dan Williams wrote:
 On Fri, Sep 24, 2010 at 12:46 PM, Ira W. Snyder i...@ovro.caltech.edu wrote:
  This adds support for scatterlist to scatterlist DMA transfers. As
  requested by Dan, this is hidden behind an ifdef so that it can be
  selected by the drivers that need it.
 
  Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
  ---
   drivers/dma/Kconfig       |    4 ++
   drivers/dma/dmaengine.c   |  119 
  +
   include/linux/dmaengine.h |   10 
   3 files changed, 133 insertions(+), 0 deletions(-)
 
  diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
  index 9520cf0..f688669 100644
  --- a/drivers/dma/Kconfig
  +++ b/drivers/dma/Kconfig
  @@ -89,10 +89,14 @@ config AT_HDMAC
           Support the Atmel AHB DMA controller.  This can be integrated in
           chips such as the Atmel AT91SAM9RL.
 
  +config DMAENGINE_SG_TO_SG
  +       bool
  +
   config FSL_DMA
         tristate Freescale Elo and Elo Plus DMA support
         depends on FSL_SOC
         select DMA_ENGINE
  +       select DMAENGINE_SG_TO_SG
         ---help---
           Enable support for the Freescale Elo and Elo Plus DMA controllers.
           The Elo is the DMA controller on some 82xx and 83xx parts, and the
  diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
  index 9d31d5e..57ec1e5 100644
  --- a/drivers/dma/dmaengine.c
  +++ b/drivers/dma/dmaengine.c
  @@ -972,10 +972,129 @@ dma_async_memcpy_pg_to_pg(struct dma_chan *chan, 
  struct page *dest_pg,
   }
   EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
 
  +#ifdef CONFIG_DMAENGINE_SG_TO_SG
  +dma_cookie_t
  +dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
  +                         struct scatterlist *dst_sg, unsigned int 
  dst_nents,
  +                         struct scatterlist *src_sg, unsigned int 
  src_nents,
  +                         dma_async_tx_callback cb, void *cb_param)
  +{
  +       struct dma_device *dev = chan-device;
  +       struct dma_async_tx_descriptor *tx;
  +       dma_cookie_t cookie = -ENOMEM;
  +       size_t dst_avail, src_avail;
  +       struct list_head tx_list;
  +       size_t transferred = 0;
  +       dma_addr_t dst, src;
  +       size_t len;
  +
  +       if (dst_nents == 0 || src_nents == 0)
  +               return -EINVAL;
  +
  +       if (dst_sg == NULL || src_sg == NULL)
  +               return -EINVAL;
  +
  +       /* get prepared for the loop */
  +       dst_avail = sg_dma_len(dst_sg);
  +       src_avail = sg_dma_len(src_sg);
  +
  +       INIT_LIST_HEAD(tx_list);
  +
  +       /* run until we are out of descriptors */
  +       while (true) {
  +
  +               /* create the largest transaction possible */
  +               len = min_t(size_t, src_avail, dst_avail);
  +               if (len == 0)
  +                       goto fetch;
  +
  +               dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) - 
  dst_avail;
  +               src = sg_dma_address(src_sg) + sg_dma_len(src_sg) - 
  src_avail;
  +
  +               /* setup the transaction */
  +               tx = dev-device_prep_dma_memcpy(chan, dst, src, len, 0);
  +               if (!tx) {
  +                       dev_err(dev-dev, failed to alloc desc for 
  memcpy\n);
  +                       return -ENOMEM;
 
 I don't think any dma channels gracefully handle descriptors that were
 prepped but not submitted.  You would probably need to submit the
 backlog, poll for completion, and then return the error.
 Alternatively, the expectation is that descriptor allocations are
 transient, i.e. once previously submitted transactions are completed
 the descriptors will return to the available pool.  So you could do
 what async_tx routines do and just poll for a descriptor.
 

Can you give me an example? Even some pseudocode would help.

The other DMAEngine functions (dma_async_memcpy_*()) don't do anything
with the descriptor if submit fails. Take for example
dma_async_memcpy_buf_to_buf(). If tx-tx_submit(tx); fails, any code
using it has no way to return the descriptor to the free pool.

Does tx_submit() implicitly return descriptors to the free pool if it
fails?

  +               }
  +
  +               /* keep track of the tx for later */
  +               list_add_tail(tx-entry, tx_list);
  +
  +               /* update metadata */
  +               transferred += len;
  +               dst_avail -= len;
  +               src_avail -= len;
  +
  +fetch:
  +               /* fetch the next dst scatterlist entry */
  +               if (dst_avail == 0) {
  +
  +                       /* no more entries: we're done */
  +                       if (dst_nents == 0)
  +                               break;
  +
  +                       /* fetch the next entry: if there are no more: done 
  */
  +                       dst_sg = sg_next(dst_sg);
  +                       if (dst_sg == NULL)
  +                               break;
  +
  +                       

[PATCH 1/1] Add config option for batched hcalls

2010-09-24 Thread Will Schmidt

Add a config option for the (batched) MULTITCE and BULK_REMOVE h-calls.

By default, these options are on and are beneficial for performance and
throughput reasons.   If disabled, the code will fall back to using less
optimal TCE and REMOVE hcalls.   The ability to easily disable these
options is useful for some of the PREEMPT_RT related investigation and
work occurring on Power.
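
The way this works is that callers gate the batched path on the
firmware feature bit, so compiling the entry out of
firmware_features_table makes firmware_has_feature() report it as
absent and the less optimal path is taken.  A sketch (the helper names
below are made up):

static void build_tce_range(struct iommu_table *tbl, long index, long npages)
{
        if (firmware_has_feature(FW_FEATURE_MULTITCE))
                build_tces_batched(tbl, index, npages);    /* one hcall, many TCEs */
        else
                build_tces_one_by_one(tbl, index, npages); /* one hcall per TCE */
}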


Signed-off-by: Will Schmidt will_schm...@vnet.ibm.com
cc: Anton Blanchard an...@samba.org
cc: Benjamin Herrenschmidt b...@kernel.crashing.org

---
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index f0e6f28..0b5e6a9 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -81,3 +81,23 @@ config DTL
  which are accessible through a debugfs file.
 
  Say N if you are unsure.
+
+config BULK_REMOVE
+   bool Enable BULK_REMOVE
+   depends on PPC_PSERIES
+   default y
+   help
+ Enable the BULK_REMOVE option for the hash page code.
+ This relies on a hcall-bulk firmware feature, and
+ should be enabled for performance throughput.
+
+config MULTITCE
+   bool Enable MultiTCE
+   depends on PPC_PSERIES
+   default y
+   help
+ Enable the Multi-TCE code, allowing a single hcall to
+ update multiple TCE entries at one time.  This relies
+ on a hcall-multi-tce firmware feature, and should be
+ enabled for performance throughput.
+
diff --git a/arch/powerpc/platforms/pseries/firmware.c 
b/arch/powerpc/platforms/pseries/firmware.c
index 0a4d8c..4327064 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -51,9 +51,13 @@ firmware_features_table[FIRMWARE_MAX_FEATURES] = {
{FW_FEATURE_VIO,hcall-vio},
{FW_FEATURE_RDMA,   hcall-rdma},
{FW_FEATURE_LLAN,   hcall-lLAN},
+#if defined(CONFIG_BULK_REMOVE)
{FW_FEATURE_BULK_REMOVE,hcall-bulk},
+#endif
{FW_FEATURE_XDABR,  hcall-xdabr},
+#if defined(CONFIG_MULTITCE)
{FW_FEATURE_MULTITCE,   hcall-multi-tce},
+#endif
{FW_FEATURE_SPLPAR, hcall-splpar},
 };
 
 /* Build up the firmware features bitmask using the contents of



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers

2010-09-24 Thread Dan Williams
On Fri, 2010-09-24 at 14:24 -0700, Ira W. Snyder wrote:
 On Fri, Sep 24, 2010 at 01:40:56PM -0700, Dan Williams wrote:
  I don't think any dma channels gracefully handle descriptors that were
  prepped but not submitted.  You would probably need to submit the
  backlog, poll for completion, and then return the error.
  Alternatively, the expectation is that descriptor allocations are
  transient, i.e. once previously submitted transactions are completed
  the descriptors will return to the available pool.  So you could do
  what async_tx routines do and just poll for a descriptor.
 
 
 Can you give me an example? Even some pseudocode would help.

Here is one from do_async_gen_syndrome() in crypto/async_tx/async_pq.c:

/* Since we have clobbered the src_list we are committed
 * to doing this asynchronously.  Drivers force forward
 * progress in case they can not provide a descriptor
 */
for (;;) {
tx = dma-device_prep_dma_pq(chan, dma_dest,
 dma_src[src_off],
 pq_src_cnt,
 coefs[src_off], len,
 dma_flags);
if (likely(tx))
break;  
async_tx_quiesce(submit-depend_tx);
dma_async_issue_pending(chan);
}   

 The other DMAEngine functions (dma_async_memcpy_*()) don't do anything
 with the descriptor if submit fails. Take for example
 dma_async_memcpy_buf_to_buf(). If tx-tx_submit(tx); fails, any code
 using it has no way to return the descriptor to the free pool.
 
 Does tx_submit() implicitly return descriptors to the free pool if it
 fails?

No, submit() failures are a hold over from when the ioatdma driver used
to perform additional descriptor allocation at -submit() time.  After
prep() the expectation is that the engine is just waiting to be told
go and can't fail.  The only reason -submit() retains a return code
is to support the cookie based method for polling for operation
completion.  A dma driver should handle all descriptor submission
failure scenarios at prep time.

 Ok, I thought the list was clearer, but this is equally easy. How about
 the following change that does away with the list completely. Then
 things should work on ioatdma as well.
 
 From d59569ff48a89ef5411af3cf2995af7b742c5cd3 Mon Sep 17 00:00:00 2001
 From: Ira W. Snyder i...@ovro.caltech.edu
 Date: Fri, 24 Sep 2010 14:18:09 -0700
 Subject: [PATCH] dma: improve scatterlist to scatterlist transfer
 
 This is an improved algorithm, to better support the Intel I/OAT
 driver.
 
 Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
 ---
  drivers/dma/dmaengine.c   |   52 +---
  include/linux/dmaengine.h |3 --
  2 files changed, 25 insertions(+), 30 deletions(-)
 
 diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
 index 57ec1e5..cde775c 100644
 --- a/drivers/dma/dmaengine.c
 +++ b/drivers/dma/dmaengine.c
 @@ -983,10 +983,13 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
 struct dma_async_tx_descriptor *tx;
 dma_cookie_t cookie = -ENOMEM;
 size_t dst_avail, src_avail;
 -   struct list_head tx_list;
 +   struct scatterlist *sg;
 size_t transferred = 0;
 +   size_t dst_total = 0;
 +   size_t src_total = 0;
 dma_addr_t dst, src;
 size_t len;
 +   int i;
 
 if (dst_nents == 0 || src_nents == 0)
 return -EINVAL;
 @@ -994,12 +997,17 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
 if (dst_sg == NULL || src_sg == NULL)
 return -EINVAL;
 
 +   /* get the total count of bytes in each scatterlist */
 +   for_each_sg(dst_sg, sg, dst_nents, i)
 +   dst_total += sg_dma_len(sg);
 +
 +   for_each_sg(src_sg, sg, src_nents, i)
 +   src_total += sg_dma_len(sg);
 +

What about overrun or underrun do we not care if src_total != dst_total?

Otherwise looks ok.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers

2010-09-24 Thread Ira W. Snyder
On Fri, Sep 24, 2010 at 02:53:14PM -0700, Dan Williams wrote:
 On Fri, 2010-09-24 at 14:24 -0700, Ira W. Snyder wrote:
  On Fri, Sep 24, 2010 at 01:40:56PM -0700, Dan Williams wrote:
   I don't think any dma channels gracefully handle descriptors that were
   prepped but not submitted.  You would probably need to submit the
   backlog, poll for completion, and then return the error.
   Alternatively, the expectation is that descriptor allocations are
   transient, i.e. once previously submitted transactions are completed
   the descriptors will return to the available pool.  So you could do
   what async_tx routines do and just poll for a descriptor.
  
  
  Can you give me an example? Even some pseudocode would help.
 
 Here is one from do_async_gen_syndrome() in crypto/async_tx/async_pq.c:
 
 /* Since we have clobbered the src_list we are committed
  * to doing this asynchronously.  Drivers force forward
  * progress in case they can not provide a descriptor
  */
 for (;;) {
 tx = dma-device_prep_dma_pq(chan, dma_dest,
  dma_src[src_off],
  pq_src_cnt,
  coefs[src_off], len,
  dma_flags);
 if (likely(tx))
 break;  
 async_tx_quiesce(submit-depend_tx);
 dma_async_issue_pending(chan);
 }   
 
  The other DMAEngine functions (dma_async_memcpy_*()) don't do anything
  with the descriptor if submit fails. Take for example
  dma_async_memcpy_buf_to_buf(). If tx-tx_submit(tx); fails, any code
  using it has no way to return the descriptor to the free pool.
  
  Does tx_submit() implicitly return descriptors to the free pool if it
  fails?
 
 No, submit() failures are a hold over from when the ioatdma driver used
 to perform additional descriptor allocation at -submit() time.  After
 prep() the expectation is that the engine is just waiting to be told
 go and can't fail.  The only reason -submit() retains a return code
 is to support the cookie based method for polling for operation
 completion.  A dma driver should handle all descriptor submission
 failure scenarios at prep time.
 

Ok, that's more like what I expected. So we still need the try forever
code similar to the above. I can add that for the next version.

  Ok, I thought the list was clearer, but this is equally easy. How about
  the following change that does away with the list completely. Then
  things should work on ioatdma as well.
  
  From d59569ff48a89ef5411af3cf2995af7b742c5cd3 Mon Sep 17 00:00:00 2001
  From: Ira W. Snyder i...@ovro.caltech.edu
  Date: Fri, 24 Sep 2010 14:18:09 -0700
  Subject: [PATCH] dma: improve scatterlist to scatterlist transfer
  
  This is an improved algorithm, to better support the Intel I/OAT
  driver.
  
  Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
  ---
   drivers/dma/dmaengine.c   |   52 
  +---
   include/linux/dmaengine.h |3 --
   2 files changed, 25 insertions(+), 30 deletions(-)
  
  diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
  index 57ec1e5..cde775c 100644
  --- a/drivers/dma/dmaengine.c
  +++ b/drivers/dma/dmaengine.c
  @@ -983,10 +983,13 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
  struct dma_async_tx_descriptor *tx;
  dma_cookie_t cookie = -ENOMEM;
  size_t dst_avail, src_avail;
  -   struct list_head tx_list;
  +   struct scatterlist *sg;
  size_t transferred = 0;
  +   size_t dst_total = 0;
  +   size_t src_total = 0;
  dma_addr_t dst, src;
  size_t len;
  +   int i;
  
  if (dst_nents == 0 || src_nents == 0)
  return -EINVAL;
  @@ -994,12 +997,17 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
  if (dst_sg == NULL || src_sg == NULL)
  return -EINVAL;
  
  +   /* get the total count of bytes in each scatterlist */
  +   for_each_sg(dst_sg, sg, dst_nents, i)
  +   dst_total += sg_dma_len(sg);
  +
  +   for_each_sg(src_sg, sg, src_nents, i)
  +   src_total += sg_dma_len(sg);
  +
 
 What about overrun or underrun do we not care if src_total != dst_total?
 
 Otherwise looks ok.
 

I don't know if we should care about that. The algorithm handles that
case just fine. It copies the maximum amount it can, which is exactly
min(src_total, dst_total). Whichever scatterlist runs out of entries
first is the shortest.

As a real world example, my driver verifies that both scatterlists have
exactly the right number of bytes available before trying to program the
hardware.

Ira
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: ppc44x - how do i optimize driver for tlb hits

2010-09-24 Thread Benjamin Herrenschmidt
On Fri, 2010-09-24 at 08:08 -0500, Ayman El-Khashab wrote:
 
 I suppose another option is to to use the kernel profiling option I 
 always see but have never used.  Is that a viable option to figure out
 what is happening here?  

With perf and stochastic sampling ? If you sample fast enough... but
you'll mostly point to your routine I suppose... though it might tell
you statistically where in your code, which -might- help.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.

2010-09-24 Thread Benjamin Herrenschmidt

  This list is getting way too much unrelated stuff, which I find
  annoying, it would be nice if we were all a bit more careful here with
  our CC lists.
 
 Sorry, I only added device-tree because someone asked me to do so.
 
 http://marc.info/?l=linux-netdevm=127273157912358
 
 I'll leave it off next time.

That's all right. I'd rather you just post the bindings there than the
whole patch, at least, but no big deal.

I was just fixing my email filters and noticed a lot of seemingly
unrelated stuff landing there :-)

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH (Option 1)] of/i2c: fix module load order issue caused by of_i2c.c

2010-09-24 Thread Grant Likely
Commit 959e85f7, i2c: add OF-style registration and binding caused a
module dependency loop where of_i2c.c calls functions in i2c-core, and
i2c-core calls of_i2c_register_devices() in of_i2c.  This means that
when i2c support is built as a module when CONFIG_OF is set, then
neither i2c_core nor of_i2c are able to be loaded.

This patch fixes the problem by moving the of_i2c_register_devices()
function into the body of i2c_core and renaming it to
i2c_scan_of_devices (of_i2c_register_devices is analogous to the
existing i2c_scan_static_board_info function and so should be named
similarly).  This function isn't called by any code outside of
i2c_core, and it must always be present when CONFIG_OF is selected, so
it makes sense to locate it there.  When CONFIG_OF is not selected,
of_i2c_register_devices() becomes a no-op.

Signed-off-by: Grant Likely grant.lik...@secretlab.ca
---
 drivers/i2c/i2c-core.c |   61 ++--
 drivers/of/of_i2c.c|   57 -
 include/linux/of_i2c.h |7 --
 3 files changed, 59 insertions(+), 66 deletions(-)

diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 6649176..64a261b 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -32,8 +32,8 @@
 #include linux/init.h
 #include linux/idr.h
 #include linux/mutex.h
-#include linux/of_i2c.h
 #include linux/of_device.h
+#include linux/of_irq.h
 #include linux/completion.h
 #include linux/hardirq.h
 #include linux/irqflags.h
@@ -818,6 +818,63 @@ static void i2c_scan_static_board_info(struct i2c_adapter 
*adapter)
up_read(__i2c_board_lock);
 }
 
+#ifdef CONFIG_OF
+void i2c_scan_of_devices(struct i2c_adapter *adap)
+{
+   void *result;
+   struct device_node *node;
+
+   /* Only register child devices if the adapter has a node pointer set */
+   if (!adap-dev.of_node)
+   return;
+
+   for_each_child_of_node(adap-dev.of_node, node) {
+   struct i2c_board_info info = {};
+   struct dev_archdata dev_ad = {};
+   const __be32 *addr;
+   int len;
+
+   dev_dbg(adap-dev, of_i2c: register %s\n, node-full_name);
+   if (of_modalias_node(node, info.type, sizeof(info.type))  0) {
+   dev_err(adap-dev, of_i2c: modalias failure on %s\n,
+   node-full_name);
+   continue;
+   }
+
+   addr = of_get_property(node, reg, len);
+   if (!addr || (len  sizeof(int))) {
+   dev_err(adap-dev, of_i2c: invalid reg on %s\n,
+   node-full_name);
+   continue;
+   }
+
+   info.addr = be32_to_cpup(addr);
+   if (info.addr  (1  10) - 1) {
+   dev_err(adap-dev, of_i2c: invalid addr=%x on %s\n,
+   info.addr, node-full_name);
+   continue;
+   }
+
+   info.irq = irq_of_parse_and_map(node, 0);
+   info.of_node = of_node_get(node);
+   info.archdata = dev_ad;
+
+   request_module(%s, info.type);
+
+   result = i2c_new_device(adap, info);
+   if (result == NULL) {
+   dev_err(adap-dev, of_i2c: Failure registering %s\n,
+   node-full_name);
+   of_node_put(node);
+   irq_dispose_mapping(info.irq);
+   continue;
+   }
+   }
+}
+#else
+static inline void i2c_scan_of_devices(struct i2c_adapter *adap) { }
+#endif
+
 static int i2c_do_add_adapter(struct i2c_driver *driver,
  struct i2c_adapter *adap)
 {
@@ -877,7 +934,7 @@ static int i2c_register_adapter(struct i2c_adapter *adap)
i2c_scan_static_board_info(adap);
 
/* Register devices from the device tree */
-   of_i2c_register_devices(adap);
+   i2c_scan_of_devices(adap);
 
/* Notify drivers */
mutex_lock(core_lock);
diff --git a/drivers/of/of_i2c.c b/drivers/of/of_i2c.c
index 0a694de..e0c3841 100644
--- a/drivers/of/of_i2c.c
+++ b/drivers/of/of_i2c.c
@@ -17,63 +17,6 @@
 #include linux/of_irq.h
 #include linux/module.h
 
-void of_i2c_register_devices(struct i2c_adapter *adap)
-{
-   void *result;
-   struct device_node *node;
-
-   /* Only register child devices if the adapter has a node pointer set */
-   if (!adap-dev.of_node)
-   return;
-
-   dev_dbg(adap-dev, of_i2c: walking child nodes\n);
-
-   for_each_child_of_node(adap-dev.of_node, node) {
-   struct i2c_board_info info = {};
-   struct dev_archdata dev_ad = {};
-   const __be32 *addr;
-   int len;
-
-   dev_dbg(adap-dev, of_i2c: register %s\n, node-full_name);
-
-   if 

Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop

2010-09-24 Thread Grant Likely
On Fri, Sep 24, 2010 at 7:48 AM, Grant Likely grant.lik...@secretlab.ca wrote:


 Jean Delvare kh...@linux-fr.org wrote:

Hi Mikael,

On Fri, 24 Sep 2010 12:50:01 +0200, Mikael Pettersson wrote:
 Jean Delvare writes:
   As far as I can see this is caused by this commit from Grant:
  
   
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5
  
   Mikael, can you please try reverting this patch and see if it solves
   your problem?

 Yes, reverting the above commit from 2.6.36-rc5 eliminated the warnings,
 and I was able to insmod the i2c-{core,dev,powermac}.ko modules.

Thanks for testing and reporting. Grant, unless you come up with a fix
very quickly, I'll have to revert
959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5 for 2.6.36.

 I'll get a fix out today.

I've got two different fixes that I'm about to send you.  You can
choose the fix that you prefer.  The first option moves the offending
function into i2c-core.c.  The function parses the device tree data
and creates i2c_device for each i2c device node that it finds.  This
is analogous to i2c_scan_static_board_info().

The second option reverts most of the 959e85f7 commit, but keeps the
line that allows of-style matching so that all i2c_devices
on powerpc machines will still bind correctly.

My preferred solution is the first option because the tested code path
does not change.  The offending function is simply moved verbatim.
The second option is a smaller patch, but I can only test one of the
affected drivers.  However, I'll let you make the decision.

Both have been build tested on PowerPC and ARM, and run tested on a
PowerPC MPC5200 board.

patches to follow in a few minutes..

g.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH (Option 2)] of/i2c: fix module load order issue caused by of_i2c.c

2010-09-24 Thread Grant Likely
Commit 959e85f7, i2c: add OF-style registration and binding caused a
module dependency loop where of_i2c.c calls functions in i2c-core, and
i2c-core calls of_i2c_register_devices() in of_i2c.  This means that
when i2c support is built as a module when CONFIG_OF is set, then
neither i2c_core nor of_i2c are able to be loaded.

This patch fixes the problem by moving the of_i2c_register_devices()
calls back into the device drivers.  Device drivers already
specifically request the core code to parse the device tree for
devices anyway by setting the of_node pointer, so it isn't a big
deal to also call the registration function.  The drivers just become
slightly more verbose.

Signed-off-by: Grant Likely grant.lik...@secretlab.ca
---
 drivers/i2c/busses/i2c-cpm.c |5 +
 drivers/i2c/busses/i2c-ibm_iic.c |3 +++
 drivers/i2c/busses/i2c-mpc.c |1 +
 drivers/i2c/i2c-core.c   |4 
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c
index f7bd261..f2de3be 100644
--- a/drivers/i2c/busses/i2c-cpm.c
+++ b/drivers/i2c/busses/i2c-cpm.c
@@ -677,6 +677,11 @@ static int __devinit cpm_i2c_probe(struct platform_device 
*ofdev,
dev_dbg(ofdev-dev, hw routines for %s registered.\n,
cpm-adap.name);
 
+   /*
+* register OF I2C devices
+*/
+   of_i2c_register_devices(cpm-adap);
+
return 0;
 out_shut:
cpm_i2c_shutdown(cpm);
diff --git a/drivers/i2c/busses/i2c-ibm_iic.c b/drivers/i2c/busses/i2c-ibm_iic.c
index 43ca32f..89eedf4 100644
--- a/drivers/i2c/busses/i2c-ibm_iic.c
+++ b/drivers/i2c/busses/i2c-ibm_iic.c
@@ -761,6 +761,9 @@ static int __devinit iic_probe(struct platform_device 
*ofdev,
dev_info(ofdev-dev, using %s mode\n,
 dev-fast_mode ? fast (400 kHz) : standard (100 kHz));
 
+   /* Now register all the child nodes */
+   of_i2c_register_devices(adap);
+
return 0;
 
 error_cleanup:
diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index a1c419a..b74e6dc 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -632,6 +632,7 @@ static int __devinit fsl_i2c_probe(struct platform_device 
*op,
dev_err(i2c-dev, failed to add adapter\n);
goto fail_add;
}
+   of_i2c_register_devices(i2c-adap);
 
return result;
 
diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 6649176..a9589f5 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -32,7 +32,6 @@
 #include linux/init.h
 #include linux/idr.h
 #include linux/mutex.h
-#include linux/of_i2c.h
 #include linux/of_device.h
 #include linux/completion.h
 #include linux/hardirq.h
@@ -876,9 +875,6 @@ static int i2c_register_adapter(struct i2c_adapter *adap)
if (adap-nr  __i2c_first_dynamic_bus_num)
i2c_scan_static_board_info(adap);
 
-   /* Register devices from the device tree */
-   of_i2c_register_devices(adap);
-
/* Notify drivers */
mutex_lock(core_lock);
bus_for_each_drv(i2c_bus_type, NULL, adap, __process_new_adapter);

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers

2010-09-24 Thread Dan Williams
On Fri, 2010-09-24 at 15:04 -0700, Ira W. Snyder wrote:
 On Fri, Sep 24, 2010 at 02:53:14PM -0700, Dan Williams wrote:
  What about overrun or underrun do we not care if src_total != dst_total?
  
  Otherwise looks ok.
  
 
 I don't know if we should care about that. The algorithm handles that
 case just fine. It copies the maximum amount it can, which is exactly
 min(src_total, dst_total). Whichever scatterlist runs out of entries
 first is the shortest.
 
 As a real world example, my driver verifies that both scatterlists have
 exactly the right number of bytes available before trying to program the
 hardware.

Ok, just handle the prep failure and I think we are good to go.

--
Dan


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers

2010-09-24 Thread Ira W. Snyder
On Fri, Sep 24, 2010 at 03:04:19PM -0700, Ira W. Snyder wrote:
 On Fri, Sep 24, 2010 at 02:53:14PM -0700, Dan Williams wrote:
  On Fri, 2010-09-24 at 14:24 -0700, Ira W. Snyder wrote:
   On Fri, Sep 24, 2010 at 01:40:56PM -0700, Dan Williams wrote:
I don't think any dma channels gracefully handle descriptors that were
prepped but not submitted.  You would probably need to submit the
backlog, poll for completion, and then return the error.
Alternatively, the expectation is that descriptor allocations are
transient, i.e. once previously submitted transactions are completed
the descriptors will return to the available pool.  So you could do
what async_tx routines do and just poll for a descriptor.
   
   
   Can you give me an example? Even some pseudocode would help.
  
  Here is one from do_async_gen_syndrome() in crypto/async_tx/async_pq.c:
  
  /* Since we have clobbered the src_list we are committed
   * to doing this asynchronously.  Drivers force forward
   * progress in case they can not provide a descriptor
   */
  for (;;) {
  tx = dma-device_prep_dma_pq(chan, dma_dest,
   dma_src[src_off],
   pq_src_cnt,
   coefs[src_off], len,
   dma_flags);
  if (likely(tx))
  break;  
  async_tx_quiesce(submit-depend_tx);
  dma_async_issue_pending(chan);
  }   
  
   The other DMAEngine functions (dma_async_memcpy_*()) don't do anything
   with the descriptor if submit fails. Take for example
   dma_async_memcpy_buf_to_buf(). If tx-tx_submit(tx); fails, any code
   using it has no way to return the descriptor to the free pool.
   
   Does tx_submit() implicitly return descriptors to the free pool if it
   fails?
  
  No, submit() failures are a hold over from when the ioatdma driver used
  to perform additional descriptor allocation at -submit() time.  After
  prep() the expectation is that the engine is just waiting to be told
  go and can't fail.  The only reason -submit() retains a return code
  is to support the cookie based method for polling for operation
  completion.  A dma driver should handle all descriptor submission
  failure scenarios at prep time.
  
 
 Ok, that's more like what I expected. So we still need the try forever
 code similar to the above. I can add that for the next version.
 

When coding this change, I've noticed one problem that would break my
driver. I cannot issue dma_async_issue_pending() on the channel while
creating the descriptors, since this will start transferring the
previously submitted DMA descriptors. This breaks the external hardware
control requirement.

Imagine this scenario:
1) device is not yet set up for external control (nothing is pulsing the pins)
2) dma_async_memcpy_sg_to_sg()

- this hits an allocation failure, which calls dma_async_issue_pending()
- this causes the DMA engine to start transferring to a device which is
  not ready yet
- memory pressure stops, and allocation succeeds again
- some descriptors have been transferred, but not the ones since the
  alloc failure
- now the first half of the descriptors (pre alloc failure) have been
  transferred
- the second half of the descriptors (post alloc failure) are still
  pending
- the dma_async_memcpy_sg_to_sg() returns success: all tx_submit()
  succeeded

3) device_control() - setup external control mode
4) dma_async_issue_pending() - start the externally controlled transfer
5) tell the external agent to start controlling the DMA transaction

- now there isn't enough data left, and the external agent fails to
  program the FPGAs

I don't mind adding it to the code, since I have enough memory that I
don't ever see allocation failures. It is an embedded system, and we've
been careful not to overcommit memory. I think for all other users, it
would be the appropriate thing to do. Most people don't care if the
scatterlist is copied in two chunks with a time gap in the middle.

An alternative implementation would be to implement
device_prep_sg_to_sg() that returned a struct dma_async_tx_descriptor,
which could then be used as normal by higher layers. This would allow
the driver to allocate / cleanup all descriptors in one shot. This would
be completely robust to this error situation.
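
Roughly, that alternative would add something like the following to
struct dma_device (the signature is illustrative, not an existing
DMAEngine hook):

        struct dma_async_tx_descriptor *(*device_prep_sg_to_sg)(
                        struct dma_chan *chan,
                        struct scatterlist *dst_sg, unsigned int dst_nents,
                        struct scatterlist *src_sg, unsigned int src_nents,
                        unsigned long flags);

The driver would build the whole descriptor chain inside that single
prep call, so an allocation failure could be reported before anything
has been submitted or issued.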

Is there one solution you'd prefer over the other? They're both similar
in the amount of code, though duplication would probably be increased in
the device_prep_sg_to_sg() case, if any other driver implements it.

Thanks,
Ira
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH RFCv2 0/2] dma: add support for sg-to-sg transfers

2010-09-24 Thread Ira W. Snyder
This series adds support for scatterlist to scatterlist transfers to the
generic DMAEngine API. I have hidden it behind a configuration option to
allow specific drivers that need this functionality to enable it.

This series is intended to lay the groundwork for further changes to the
series titled CARMA Board Support. That series will be updated when I
have time and hardware to test with.

This series has not been runtime tested yet. I am posting it only to
gain comments before I spend the effort to update the driver that
depends on this.

To help reviewers, I'd like to comment on the architecture of
dma_async_memcpy_sg_to_sg(). It explicitly avoids using descriptor
chaining due to the way that feature interacts with the fsldma
controller's external start feature. To use the external start feature
properly, the in-memory descriptor chain must not be fragmented into
multiple smaller chains. This is what is achieved by submitting all
descriptors without using chaining.

An alternative implementation would create a device_prep_sg_to_sg()
function, and use that to allocate all descriptors in one shot. That
implementation would be safer against allocation failures than this one.

I would recommend against committing this until I've tested it on real
hardware.

Ira W. Snyder (2):
  dmaengine: add support for scatterlist to scatterlist transfers
  fsldma: use generic support for scatterlist to scatterlist transfers

 arch/powerpc/include/asm/fsldma.h |  115 ++--
 drivers/dma/Kconfig   |4 +
 drivers/dma/dmaengine.c   |  119 
 drivers/dma/fsldma.c  |  219 +++--
 include/linux/dmaengine.h |   10 ++
 5 files changed, 181 insertions(+), 286 deletions(-)



[PATCH RFCv2 1/2] dmaengine: add support for scatterlist to scatterlist transfers

2010-09-24 Thread Ira W. Snyder
This adds support for scatterlist to scatterlist DMA transfers. It is
currently hidden behind a configuration option, which allows drivers
that need this functionality to select it individually.
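
For example, such a driver would opt in with a select in its own Kconfig
entry, roughly like this (FSL_DMA is shown only as an illustration; the
real hunk is in patch 2/2 and may differ):

config FSL_DMA
	tristate "Freescale Elo and Elo Plus DMA support"
	depends on FSL_SOC
	select DMA_ENGINE
	select DMAENGINE_SG_TO_SG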

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/Kconfig   |3 +
 drivers/dma/dmaengine.c   |  125 +
 include/linux/dmaengine.h |6 ++
 3 files changed, 134 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 9520cf0..82d2244 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -89,6 +89,9 @@ config AT_HDMAC
  Support the Atmel AHB DMA controller.  This can be integrated in
  chips such as the Atmel AT91SAM9RL.
 
+config DMAENGINE_SG_TO_SG
+   bool
+
 config FSL_DMA
tristate "Freescale Elo and Elo Plus DMA support"
depends on FSL_SOC
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 9d31d5e..9238b86 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -972,6 +972,131 @@ dma_async_memcpy_pg_to_pg(struct dma_chan *chan, struct page *dest_pg,
 }
 EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
 
+#ifdef CONFIG_DMAENGINE_SG_TO_SG
+dma_cookie_t
+dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
+ struct scatterlist *dst_sg, unsigned int dst_nents,
+ struct scatterlist *src_sg, unsigned int src_nents,
+ dma_async_tx_callback cb, void *cb_param)
+{
+   struct dma_device *dev = chan->device;
+   struct dma_async_tx_descriptor *tx;
+   dma_cookie_t cookie = -ENOMEM;
+   size_t dst_avail, src_avail;
+   struct scatterlist *sg;
+   size_t transferred = 0;
+   size_t dst_total = 0;
+   size_t src_total = 0;
+   dma_addr_t dst, src;
+   size_t len;
+   int i;
+
+   if (dst_nents == 0 || src_nents == 0)
+   return -EINVAL;
+
+   if (dst_sg == NULL || src_sg == NULL)
+   return -EINVAL;
+
+   /* get the total count of bytes in each scatterlist */
+   for_each_sg(dst_sg, sg, dst_nents, i)
+   dst_total += sg_dma_len(sg);
+
+   for_each_sg(src_sg, sg, src_nents, i)
+   src_total += sg_dma_len(sg);
+
+   /* get prepared for the loop */
+   dst_avail = sg_dma_len(dst_sg);
+   src_avail = sg_dma_len(src_sg);
+
+   /* run until we are out of descriptors */
+   while (true) {
+
+   /* create the largest transaction possible */
+   len = min_t(size_t, src_avail, dst_avail);
+   if (len == 0)
+   goto fetch;
+
+   dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) - dst_avail;
+   src = sg_dma_address(src_sg) + sg_dma_len(src_sg) - src_avail;
+
+   /*
+* get a descriptor
+*
+* we must poll for a descriptor here since the DMAEngine API
+* does not provide a way for external users to free previously
+* allocated descriptors
+*/
+   for (;;) {
+   tx = dev->device_prep_dma_memcpy(chan, dst, src, len, 0);
+   if (likely(tx))
+   break;
+
+   dma_async_issue_pending(chan);
+   }
+
+   /* update metadata */
+   transferred += len;
+   dst_avail -= len;
+   src_avail -= len;
+
+   /* if this is the last transfer, setup the callback */
+   if (dst_total == transferred || src_total == transferred) {
+   tx->callback = cb;
+   tx->callback_param = cb_param;
+   }
+
+   /* submit the transaction */
+   cookie = tx->tx_submit(tx);
+   if (dma_submit_error(cookie)) {
+   dev_err(dev->dev, "failed to submit desc\n");
+   return cookie;
+   }
+
+fetch:
+   /* fetch the next dst scatterlist entry */
+   if (dst_avail == 0) {
+
+   /* no more entries: we're done */
+   if (dst_nents == 0)
+   break;
+
+   /* fetch the next entry: if there are no more: done */
+   dst_sg = sg_next(dst_sg);
+   if (dst_sg == NULL)
+   break;
+
+   dst_nents--;
+   dst_avail = sg_dma_len(dst_sg);
+   }
+
+   /* fetch the next src scatterlist entry */
+   if (src_avail == 0) {
+
+   /* no more entries: we're done */
+   if (src_nents == 0)
+   break;
+
+   /* fetch the next entry: if there are no more: done */
+   src_sg = 

[PATCH RFCv2 2/2] fsldma: use generic support for scatterlist to scatterlist transfers

2010-09-24 Thread Ira W. Snyder
The fsldma driver uses the DMA_SLAVE API to handle scatterlist to
scatterlist DMA transfers. For quite a while now, it has been possible
to mimic the operation by using the device_prep_dma_memcpy() routine
intelligently.

Now that the DMAEngine API has grown generic support for scatterlist to
scatterlist transfers, this driver-specific operation is no longer
needed: the generic support is used instead.
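
As an untested illustration of the new interface (field values are
placeholders and fpga_fifo_phys is a made-up device address), a client
would fill in the standard dma_slave_config plus the Freescale-specific
fields and hand the whole thing to device_control():

struct fsldma_slave_config cfg = {
	.config = {
		.direction	= DMA_TO_DEVICE,
		.dst_addr	= fpga_fifo_phys,
		.dst_addr_width	= DMA_SLAVE_BUSWIDTH_4_BYTES,
	},
	.request_count	= 1,
	.external_start	= true,
	.external_pause	= false,
};

ret = chan->device->device_control(chan, DMA_SLAVE_CONFIG,
				   (unsigned long)&cfg);
if (ret)
	return ret;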

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 arch/powerpc/include/asm/fsldma.h |  115 ++--
 drivers/dma/fsldma.c  |  219 +++--
 2 files changed, 48 insertions(+), 286 deletions(-)

diff --git a/arch/powerpc/include/asm/fsldma.h b/arch/powerpc/include/asm/fsldma.h
index debc5ed..dc0bd27 100644
--- a/arch/powerpc/include/asm/fsldma.h
+++ b/arch/powerpc/include/asm/fsldma.h
@@ -1,7 +1,7 @@
 /*
  * Freescale MPC83XX / MPC85XX DMA Controller
  *
- * Copyright (c) 2009 Ira W. Snyder i...@ovro.caltech.edu
+ * Copyright (c) 2009-2010 Ira W. Snyder i...@ovro.caltech.edu
  *
  * This file is licensed under the terms of the GNU General Public License
  * version 2. This program is licensed as is without any warranty of any
@@ -11,127 +11,32 @@
 #ifndef __ARCH_POWERPC_ASM_FSLDMA_H__
 #define __ARCH_POWERPC_ASM_FSLDMA_H__
 
-#include <linux/slab.h>
 #include <linux/dmaengine.h>
 
 /*
- * Definitions for the Freescale DMA controller's DMA_SLAVE implemention
+ * The Freescale DMA controller has several features that are not accommodated
+ * in the Linux DMAEngine API. Therefore, the generic structure is expanded
+ * to allow drivers to use these features.
  *
- * The Freescale DMA_SLAVE implementation was designed to handle many-to-many
- * transfers. An example usage would be an accelerated copy between two
- * scatterlists. Another example use would be an accelerated copy from
- * multiple non-contiguous device buffers into a single scatterlist.
+ * This structure should be passed into the DMAEngine routine device_control()
+ * as in this example:
  *
- * A DMA_SLAVE transaction is defined by a struct fsl_dma_slave. This
- * structure contains a list of hardware addresses that should be copied
- * to/from the scatterlist passed into device_prep_slave_sg(). The structure
- * also has some fields to enable hardware-specific features.
+ * chan->device->device_control(chan, DMA_SLAVE_CONFIG, (unsigned long)cfg);
  */
 
 /**
- * struct fsl_dma_hw_addr
- * @entry: linked list entry
- * @address: the hardware address
- * @length: length to transfer
- *
- * Holds a single physical hardware address / length pair for use
- * with the DMAEngine DMA_SLAVE API.
- */
-struct fsl_dma_hw_addr {
-   struct list_head entry;
-
-   dma_addr_t address;
-   size_t length;
-};
-
-/**
  * struct fsl_dma_slave
- * @addresses: a linked list of struct fsl_dma_hw_addr structures
+ * @config: the standard Linux DMAEngine API DMA_SLAVE configuration
  * @request_count: value for DMA request count
- * @src_loop_size: setup and enable constant source-address DMA transfers
- * @dst_loop_size: setup and enable constant destination address DMA transfers
  * @external_start: enable externally started DMA transfers
  * @external_pause: enable externally paused DMA transfers
- *
- * Holds a list of address / length pairs for use with the DMAEngine
- * DMA_SLAVE API implementation for the Freescale DMA controller.
  */
-struct fsl_dma_slave {
+struct fsldma_slave_config {
+   struct dma_slave_config config;
 
-   /* List of hardware address/length pairs */
-   struct list_head addresses;
-
-   /* Support for extra controller features */
unsigned int request_count;
-   unsigned int src_loop_size;
-   unsigned int dst_loop_size;
bool external_start;
bool external_pause;
 };
 
-/**
- * fsl_dma_slave_append - add an address/length pair to a struct fsl_dma_slave
- * @slave: the struct fsl_dma_slave to add to
- * @address: the hardware address to add
- * @length: the length of bytes to transfer from @address
- *
- * Add a hardware address/length pair to a struct fsl_dma_slave. Returns 0 on
- * success, -ERRNO otherwise.
- */
-static inline int fsl_dma_slave_append(struct fsl_dma_slave *slave,
-  dma_addr_t address, size_t length)
-{
-   struct fsl_dma_hw_addr *addr;
-
-   addr = kzalloc(sizeof(*addr), GFP_ATOMIC);
-   if (!addr)
-   return -ENOMEM;
-
-   INIT_LIST_HEAD(&addr->entry);
-   addr->address = address;
-   addr->length = length;
-
-   list_add_tail(&addr->entry, &slave->addresses);
-   return 0;
-}
-
-/**
- * fsl_dma_slave_free - free a struct fsl_dma_slave
- * @slave: the struct fsl_dma_slave to free
- *
- * Free a struct fsl_dma_slave and all associated address/length pairs
- */
-static inline void fsl_dma_slave_free(struct fsl_dma_slave *slave)
-{
-   struct fsl_dma_hw_addr *addr, *tmp;
-
-   if (slave) {
-