Re: [PATCH 0/8] sdhci: Move real work out of an atomic context

2010-09-09 Thread Anton Vorontsov
On Thu, Sep 09, 2010 at 03:28:34AM +0100, Chris Ball wrote:
[...]
> [7.372843]  [] __might_sleep+0xd9/0xe0
> [7.387864]  [] mutex_lock+0x1c/0x2a
> [7.402576]  [] sdhci_led_control+0x1a/0x41
> [7.417727]  [] led_trigger_event+0x42/0x5c

led_trigger_event grabs a read lock. :-(

> [7.432807]  [] mmc_request_done+0x56/0x6f
> [7.447597]  [] sdhci_finish_work+0xc8/0xcd
> [7.462643]  [] ? sdhci_finish_work+0x0/0xcd
> [7.477941]  [] worker_thread+0x165/0x1ed
> [7.492856]  [] ? sdhci_finish_work+0x0/0xcd
> [7.508204]  [] ? autoremove_wake_function+0x0/0x34
> [7.524178]  [] ? worker_thread+0x0/0x1ed
> [7.538953]  [] kthread+0x63/0x68
> [7.552659]  [] ? kthread+0x0/0x68
> [7.566349]  [] kernel_thread_helper+0x6/0x10
> [7.709931] udev: starting version 141
> [7.940374] mmc2: new high speed SDHC card at address e4da
> [8.058165] mmcblk0: mmc2:e4da SU04G 3.69 GiB 
> [8.135730]  mmcblk0: p1 p2
> 
> Full dmesg is at http://chris.printf.net/anton-mutex-dmesg.txt. 
> Anton, the kernel is 2.6.35.4-olpc plus your patchset from -mm.
> I can think about how to test on an upstream kernel instead, but
> perhaps your own tests simply didn't hit sdhci_led_control(). 

Yep, LED support was turned off here.
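
To spell out the problem: led_trigger_event() invokes each LED's
brightness hook under a read lock, i.e. in atomic context, while this
patchset turns host->lock into a mutex. A simplified sketch of the two
sides (not verbatim kernel code):

void led_trigger_event(struct led_trigger *trig, enum led_brightness b)
{
	struct led_classdev *led_cdev;

	read_lock(&trig->leddev_list_lock);	/* atomic from here on */
	list_for_each_entry(led_cdev, &trig->led_cdevs, trig_list)
		led_set_brightness(led_cdev, b);	/* must not sleep */
	read_unlock(&trig->leddev_list_lock);
}

static void sdhci_led_control(struct led_classdev *led,
			      enum led_brightness brightness)
{
	struct sdhci_host *host = container_of(led, struct sdhci_host, led);

	/*
	 * With the patchset this lock is a mutex; mutex_lock() calls
	 * might_sleep(), which fires the BUG above.
	 */
	mutex_lock(&host->lock);
	/* ... set/clear the LED bit in SDHCI_HOST_CONTROL ... */
	mutex_unlock(&host->lock);
}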

> Andrew, if you want to drop this while the BUG() and potential
> performance regressions are worked out, I'd be happy to keep 
> testing patches from Anton until it's regression-free here.

Thanks Chris.

I also think it's better to drop this series now; meanwhile,
I'll try to prepare another patchset.

-- 
Anton Vorontsov
email: cbouatmai...@gmail.com
irc://irc.freenode.net/bd2


Re: [PATCH 0/8] sdhci: Move real work out of an atomic context

2010-09-08 Thread Chris Ball
Hi,

On Wed, Sep 08, 2010 at 10:37:41PM +0100, Chris Ball wrote:
> Hi Andrew,
> 
> On Tue, Sep 07, 2010 at 03:38:13PM -0700, Andrew Morton wrote:
> > > I noticed no throughput drop with either PIO transfers or
> > > DMA (tested on an MPC8569E CPU), while latencies should be
> > > greatly improved.
> > 
> > This patchset isn't causing any problems yet, but may do so in the
> > future and will impact the validity of any testing.  It seems to be
> > kind of stuck.  Should I drop it all?
> 
> I suggest keeping it -- I'll find time to test it out here soon, and
> will keep it in mind as a possible regression cause.

I'm running this now.  The first thing I'm noticing is a repeated BUG():

[7.288186] Write protecting the kernel read-only data: 1072k
[7.306446] BUG: sleeping function called from invalid context at kernel/mutex.c:94
[7.324375] in_atomic(): 1, irqs_disabled(): 0, pid: 532, name: mmc2/0
[7.340989] Pid: 532, comm: mmc2/0 Not tainted 2.6.35.4_xo1.5-20100908.2141.olpc.44f3b38_DIRTY #1
[7.360129] Call Trace:
[7.372843]  [] __might_sleep+0xd9/0xe0
[7.387864]  [] mutex_lock+0x1c/0x2a
[7.402576]  [] sdhci_led_control+0x1a/0x41
[7.417727]  [] led_trigger_event+0x42/0x5c
[7.432807]  [] mmc_request_done+0x56/0x6f
[7.447597]  [] sdhci_finish_work+0xc8/0xcd
[7.462643]  [] ? sdhci_finish_work+0x0/0xcd
[7.477941]  [] worker_thread+0x165/0x1ed
[7.492856]  [] ? sdhci_finish_work+0x0/0xcd
[7.508204]  [] ? autoremove_wake_function+0x0/0x34
[7.524178]  [] ? worker_thread+0x0/0x1ed
[7.538953]  [] kthread+0x63/0x68
[7.552659]  [] ? kthread+0x0/0x68
[7.566349]  [] kernel_thread_helper+0x6/0x10
[7.709931] udev: starting version 141
[7.940374] mmc2: new high speed SDHC card at address e4da
[8.058165] mmcblk0: mmc2:e4da SU04G 3.69 GiB 
[8.135730]  mmcblk0: p1 p2

Full dmesg is at http://chris.printf.net/anton-mutex-dmesg.txt. 
Anton, the kernel is 2.6.35.4-olpc plus your patchset from -mm.
I can think about how to test on an upstream kernel instead, but
perhaps your own tests simply didn't hit sdhci_led_control(). 

Andrew, if you want to drop this while the BUG() and potential
performance regressions are worked out, I'd be happy to keep 
testing patches from Anton until it's regression-free here.

Thanks,

-- 
Chris Ball  
One Laptop Per Child


Re: [PATCH 0/8] sdhci: Move real work out of an atomic context

2010-09-08 Thread Chris Ball
Hi Anton,

On Thu, Sep 09, 2010 at 01:57:50AM +0400, Anton Vorontsov wrote:
> Thanks!
> 
> It would also be great if you could point out which patch
> causes most of the performance drop (if any).
> 
> Albert, if you can find time, could you also "bisect" the
> patchset? I wouldn't want to buy a Nintendo Wii just to debug
> the perf regression. ;-) FWIW, I tried disabling multiblock
> reads/writes and testing with SD cards, and still didn't notice
> any performance drop.
> 
> Maybe it's the SDIO IRQs that cause the performance drop in
> the Wii case, since we delay them a little bit? Or it could be
> that the patch introducing the threaded IRQ handler causes it
> as a whole. If so, I guess we'd need to move some of the
> processing back into the hard IRQ context, keeping the handler
> lockless (if possible) or introducing very fine-grained locking.

I didn't know anything about a reported performance drop, and I don't
think Andrew did either -- Albert's test results don't seem to have
made it to this list, or anywhere else that I can see.  Could you 
link to/repost his comments?

(I'll be testing with libertas, so that will stress-test SDIO IRQs.)

Thanks,

-- 
Chris Ball  
One Laptop Per Child


Re: [PATCH 0/8] sdhci: Move real work out of an atomic context

2010-09-08 Thread Chris Ball
Hi Andrew,

On Tue, Sep 07, 2010 at 03:38:13PM -0700, Andrew Morton wrote:
> > I noticed no throughput drop with either PIO transfers or
> > DMA (tested on an MPC8569E CPU), while latencies should be
> > greatly improved.
> 
> This patchset isn't causing any problems yet, but may do so in the
> future and will impact the validity of any testing.  It seems to be
> kind of stuck.  Should I drop it all?

I suggest keeping it -- I'll find time to test it out here soon, and
will keep it in mind as a possible regression cause.

Thanks,

-- 
Chris Ball  
One Laptop Per Child


Re: [PATCH 0/8] sdhci: Move real work out of an atomic context

2010-09-08 Thread Anton Vorontsov
On Wed, Sep 08, 2010 at 11:05:48PM +0100, Chris Ball wrote:
> Hi Anton,
> 
> On Thu, Sep 09, 2010 at 01:57:50AM +0400, Anton Vorontsov wrote:
> > Thanks!
> > 
> > It would also be great if you could point out which patch
> > causes most of the performance drop (if any).
> > 
> > Albert, if you can find time, could you also "bisect" the
> > patchset? I wouldn't want to buy a Nintendo Wii just to debug
> > the perf regression. ;-) FWIW, I tried disabling multiblock
> > reads/writes and testing with SD cards, and still didn't notice
> > any performance drop.
> > 
> > Maybe it's the SDIO IRQs that cause the performance drop in
> > the Wii case, since we delay them a little bit? Or it could be
> > that the patch introducing the threaded IRQ handler causes it
> > as a whole. If so, I guess we'd need to move some of the
> > processing back into the hard IRQ context, keeping the handler
> > lockless (if possible) or introducing very fine-grained locking.
> 
> I didn't know anything about a reported performance drop, and I don't
> think Andrew did either -- Albert's test results don't seem to have
> made it to this list, or anywhere else that I can see.  Could you 
> link to/repost his comments?
> 
> (I'll be testing with libertas, so that will stress-test SDIO IRQs.)

Sure thing, here are Albert's results.

----- Forwarded message from Albert Herranz  -----

Date: Mon, 02 Aug 2010 21:23:51 +0200
From: Albert Herranz 
To: Anton Vorontsov 
CC: a...@linux-foundation.org, mm-comm...@vger.kernel.org,
ben-li...@fluff.org, m...@console-pimps.org, pie...@ossman.eu,
w.s...@pengutronix.de, m...@bu3sch.de
Subject: Re: + sdhci-use-work-structs-instead-of-tasklets.patch added to -mm tree

Hi,

Some initial numbers regarding performance: the patchset seems to cause a
noticeable performance drop (client throughput falls from ~870 Kbits/sec
to ~300-350 Kbits/sec, and server throughput roughly halves).
I've run two iperf client tests (the two invocations of iperf -c below)
and two iperf server tests (the iperf -s invocation).

== 2.6.33 ==

$ iperf -c 192.168.1.130 

Client connecting to 192.168.1.130, TCP port 5001
TCP window size: 16.0 KByte (default)

[  3] local 192.168.1.127 port 40119 connected with 192.168.1.130 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.1 sec  1.05 MBytes  872 Kbits/sec

$ iperf -c 192.168.1.130 

Client connecting to 192.168.1.130, TCP port 5001
TCP window size: 16.0 KByte (default)

[  3] local 192.168.1.127 port 40120 connected with 192.168.1.130 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  1.04 MBytes  870 Kbits/sec

$ iperf -s

Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)

[  4] local 192.168.1.127 port 5001 connected with 192.168.1.130 port 36691
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-10.2 sec  3.61 MBytes  2.98 Mbits/sec
[  5] local 192.168.1.127 port 5001 connected with 192.168.1.130 port 36692
[  5]  0.0-10.1 sec  4.94 MBytes  4.09 Mbits/sec


== 2.6.33 + "sdhci: Move real work out of an atomic context" patchset ==

$ iperf -c 192.168.1.130 

Client connecting to 192.168.1.130, TCP port 5001
TCP window size: 16.0 KByte (default)

[  3] local 192.168.1.127 port 39210 connected with 192.168.1.130 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  368 KBytes  301 Kbits/sec

$ iperf -c 192.168.1.130 

Client connecting to 192.168.1.130, TCP port 5001
TCP window size: 16.0 KByte (default)

[  3] local 192.168.1.127 port 39211 connected with 192.168.1.130 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.2 sec  440 KBytes  354 Kbits/sec

$ iperf -s

Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)

[  4] local 192.168.1.127 port 5001 connected with 192.168.1.130 port 57833
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-10.2 sec  2.37 MBytes  1.95 Mbits/sec
[  5] local 192.168.1.127 port 5001 connected with 192.168.1.130 port 57834
[  5]  0.0-10.2 sec  2.30 MBytes  1.90 Mbits/sec

The subjective feeling, too, is that the system is slower.

Cheers,
Albert

----- End forwarded message -----


Re: [PATCH 0/8] sdhci: Move real work out of an atomic context

2010-09-08 Thread Anton Vorontsov
On Wed, Sep 08, 2010 at 10:37:41PM +0100, Chris Ball wrote:
> Hi Andrew,
> 
> On Tue, Sep 07, 2010 at 03:38:13PM -0700, Andrew Morton wrote:
> > > I noticed no throughput drop with either PIO transfers or
> > > DMA (tested on an MPC8569E CPU), while latencies should be
> > > greatly improved.
> > 
> > This patchset isn't causing any problems yet, but may do so in the
> > future and will impact the validity of any testing.  It seems to be
> > kind of stuck.  Should I drop it all?
> 
> I suggest keeping it -- I'll find time to test it out here soon, and
> will keep it in mind as a possible regression cause.

Thanks!

It would also be great if you could point out which patch
causes most of the performance drop (if any).

Albert, if you can find time, could you also "bisect" the
patchset? I wouldn't want to buy a Nintendo Wii just to debug
the perf regression. ;-) FWIW, I tried disabling multiblock
reads/writes and testing with SD cards, and still didn't notice
any performance drop.

Maybe it's the SDIO IRQs that cause the performance drop in the
Wii case, since we delay them a little bit? Or it could be that
the patch introducing the threaded IRQ handler causes it as a
whole. If so, I guess we'd need to move some of the processing
back into the hard IRQ context, keeping the handler lockless (if
possible) or introducing very fine-grained locking.
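
Something along these lines is what I have in mind -- just a
hypothetical sketch (the pending_events field is made up, and a real
version would need to close the race between the two handlers, e.g.
with an atomic OR or IRQF_ONESHOT):

static irqreturn_t sdhci_hard_irq(int irq, void *dev_id)
{
	struct sdhci_host *host = dev_id;
	u32 intmask = sdhci_readl(host, SDHCI_INT_STATUS);

	if (!intmask || intmask == 0xffffffff)
		return IRQ_NONE;

	/* Ack and latch the status without taking any locks. */
	sdhci_writel(host, intmask, SDHCI_INT_STATUS);
	host->pending_events |= intmask;

	return IRQ_WAKE_THREAD;			/* defer the real work */
}

static irqreturn_t sdhci_thread_irq(int irq, void *dev_id)
{
	struct sdhci_host *host = dev_id;

	mutex_lock(&host->lock);	/* sleeping is fine in the thread */
	/* ... PIO transfers, mmc_request_done(), SDIO IRQ dispatch ... */
	mutex_unlock(&host->lock);

	return IRQ_HANDLED;
}

/* registered with something like: */
request_threaded_irq(host->irq, sdhci_hard_irq, sdhci_thread_irq,
		     IRQF_SHARED, mmc_hostname(host->mmc), host);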

Thanks,

-- 
Anton Vorontsov
email: cbouatmai...@gmail.com
irc://irc.freenode.net/bd2


Re: [PATCH 0/8] sdhci: Move real work out of an atomic context

2010-09-07 Thread Andrew Morton
On Wed, 14 Jul 2010 17:07:28 +0400
Anton Vorontsov  wrote:

> Hi all,
> 
> Currently the sdhci driver does everything in atomic context.
> And what is worse, PIO transfers are made from the IRQ handler.
> 
> This causes huge latencies (up to 120 ms). On some P2020 SoCs,
> DMA and card detection are broken, which means the kernel polls
> for the card via PIO transfers every second. Needless to say,
> this is quite bad.
> 
> So, this patch set reworks the sdhci code to avoid atomic context
> almost completely: we only do two device memory operations in
> atomic context, and all the rest is threaded.
> 
> I noticed no throughput drop with either PIO transfers or
> DMA (tested on an MPC8569E CPU), while latencies should be
> greatly improved.
> 

This patchset isn't causing any problems yet, but may do so in the
future and will impact the validity of any testing.  It seems to be
kind of stuck.  Should I drop it all?



Re: [PATCH 0/8] sdhci: Move real work out of an atomic context

2010-07-21 Thread Andrew Morton
On Wed, 14 Jul 2010 17:07:28 +0400
Anton Vorontsov  wrote:

> Hi all,
> 
> Currently the sdhci driver does everything in atomic context.
> And what is worse, PIO transfers are made from the IRQ handler.
> 
> This causes huge latencies (up to 120 ms). On some P2020 SoCs,
> DMA and card detection are broken, which means the kernel polls
> for the card via PIO transfers every second. Needless to say,
> this is quite bad.
> 
> So, this patch set reworks the sdhci code to avoid atomic context
> almost completely: we only do two device memory operations in
> atomic context, and all the rest is threaded.
> 
> I noticed no throughput drop with either PIO transfers or
> DMA (tested on an MPC8569E CPU), while latencies should be
> greatly improved.
> 

The patchset looks good to me, but it'd be nice to hear from the other
people who work on this code, please?



Re: [PATCH 0/8] sdhci: Move real work out of an atomic context

2010-07-15 Thread Matt Fleming
On Wed, 14 Jul 2010 17:07:28 +0400, Anton Vorontsov  wrote:
> Hi all,
> 
> Currently the sdhci driver does everything in atomic context.
> And what is worse, PIO transfers are made from the IRQ handler.
> 
> This causes huge latencies (up to 120 ms). On some P2020 SoCs,
> DMA and card detection are broken, which means the kernel polls
> for the card via PIO transfers every second. Needless to say,
> this is quite bad.
> 
> So, this patch set reworks the sdhci code to avoid atomic context
> almost completely: we only do two device memory operations in
> atomic context, and all the rest is threaded.
> 
> I noticed no throughput drop with either PIO transfers or
> DMA (tested on an MPC8569E CPU), while latencies should be
> greatly improved.

I haven't had time to read these patches in detail yet, but they all
seem to be sensible changes. A very nice series!


[PATCH 0/8] sdhci: Move real work out of an atomic context

2010-07-14 Thread Anton Vorontsov
Hi all,

Currently the sdhci driver does everything in atomic context.
And what is worse, PIO transfers are made from the IRQ handler.

This causes huge latencies (up to 120 ms). On some P2020 SoCs,
DMA and card detection are broken, which means the kernel polls
for the card via PIO transfers every second. Needless to say,
this is quite bad.

So, this patch set reworks the sdhci code to avoid atomic context
almost completely: we only do two device memory operations in
atomic context, and all the rest is threaded.

I noticed no throughput drop with either PIO transfers or
DMA (tested on an MPC8569E CPU), while latencies should be
greatly improved.
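
For reference, the interrupt path as it stands today is shaped
roughly like this (a heavily simplified sketch, not the exact code):

static irqreturn_t sdhci_irq(int irq, void *dev_id)
{
	struct sdhci_host *host = dev_id;
	u32 intmask;

	spin_lock(&host->lock);			/* hard IRQ context */

	intmask = sdhci_readl(host, SDHCI_INT_STATUS);
	sdhci_writel(host, intmask, SDHCI_INT_STATUS);

	/*
	 * The PIO loop runs right here, with interrupts off, until
	 * the FIFO is drained or filled -- this is where the huge
	 * latencies come from.
	 */
	if (intmask & (SDHCI_INT_DATA_AVAIL | SDHCI_INT_SPACE_AVAIL))
		sdhci_transfer_pio(host);

	spin_unlock(&host->lock);
	return IRQ_HANDLED;
}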

Thanks,

-- 
Anton Vorontsov
email: cbouatmai...@gmail.com
irc://irc.freenode.net/bd2