Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-07-03 Thread Shaohua Li
On Thu, Jun 20, 2013 at 04:17:13PM -0400, Matthew Wilcox wrote: > > A paper at FAST2012 > (http://static.usenix.org/events/fast12/tech/full_papers/Yang.pdf) pointed > out the performance overhead of taking interrupts for low-latency block > I/Os. The solution the author investigated was to spin w…

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-27 Thread Rik van Riel
On 06/20/2013 04:17 PM, Matthew Wilcox wrote: --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4527,6 +4527,36 @@ long __sched io_schedule_timeout(long timeout) return ret; } +/* + * Wait for an I/O to complete against this backing_dev_info. If the + * task exhausts its timesl…

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-27 Thread Rik van Riel
On 06/23/2013 02:29 PM, Linus Torvalds wrote: You could try to do that either *in* the idle thread (which would take the context switch overhead - maybe negating some of the advantages), or alternatively hook into the scheduler idle logic before actually doing the switch. But anything that star…

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-25 Thread Jens Axboe
On Mon, Jun 24 2013, Matthew Wilcox wrote: > On Mon, Jun 24, 2013 at 10:07:51AM +0200, Ingo Molnar wrote: > > I'm wondering, how will this scheme work if the IO completion latency is a > > lot more than the 5 usecs in the testcase? What if it takes 20 usecs or > > 100 usecs or more? > > There's

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-25 Thread Jens Axboe
On Mon, Jun 24 2013, Steven Rostedt wrote: > On Mon, Jun 24, 2013 at 09:17:18AM +0200, Jens Axboe wrote: > > On Sun, Jun 23 2013, Linus Torvalds wrote: > > > > > > You could try to do that either *in* the idle thread (which would take > > > the context switch overhead - maybe negating some of the

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-25 Thread Jens Axboe
On Mon, Jun 24 2013, Matthew Wilcox wrote: > On Mon, Jun 24, 2013 at 09:15:45AM +0200, Jens Axboe wrote: > > Willy, I think the general design is fine, hooking in via the bdi is the > > only way to get back to the right place from where you need to sleep. > > Some thoughts: > > > > - This should b

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-25 Thread Steven Rostedt
On Mon, 2013-06-24 at 23:07 -0400, Matthew Wilcox wrote: > On Mon, Jun 24, 2013 at 08:11:02PM -0400, Steven Rostedt wrote: > > What about hooking into the idle_balance code? That happens if we are > > about to go to idle but before the full schedule switch to the idle > > task. > > > > > > In __s…

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-25 Thread Bart Van Assche
On 06/25/13 05:18, Matthew Wilcox wrote: On Mon, Jun 24, 2013 at 10:07:51AM +0200, Ingo Molnar wrote: I'm wondering, how will this scheme work if the IO completion latency is a lot more than the 5 usecs in the testcase? What if it takes 20 usecs or 100 usecs or more? There's clearly a threshol…

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-24 Thread Matthew Wilcox
On Mon, Jun 24, 2013 at 10:07:51AM +0200, Ingo Molnar wrote: > I'm wondering, how will this scheme work if the IO completion latency is a > lot more than the 5 usecs in the testcase? What if it takes 20 usecs or > 100 usecs or more? There's clearly a threshold at which it stops making sense, and

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-24 Thread Matthew Wilcox
On Mon, Jun 24, 2013 at 08:11:02PM -0400, Steven Rostedt wrote: > What about hooking into the idle_balance code? That happens if we are > about to go to idle but before the full schedule switch to the idle > task. > > > In __schedule(void): > > if (unlikely(!rq->nr_running)) >…

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-24 Thread Matthew Wilcox
On Mon, Jun 24, 2013 at 09:15:45AM +0200, Jens Axboe wrote: > Willy, I think the general design is fine, hooking in via the bdi is the > only way to get back to the right place from where you need to sleep. > Some thoughts: > > - This should be hooked in via blk-iopoll, both of them should call in

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-24 Thread Steven Rostedt
On Mon, Jun 24, 2013 at 09:17:18AM +0200, Jens Axboe wrote: > On Sun, Jun 23 2013, Linus Torvalds wrote: > > > > You could try to do that either *in* the idle thread (which would take > > the context switch overhead - maybe negating some of the advantages), > > or alternatively hook into the sched…

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-24 Thread Ingo Molnar
* David Ahern wrote: > On 6/23/13 3:09 AM, Ingo Molnar wrote: > >If an IO driver is implemented properly then it will batch up requests for > >the controller, and gets IRQ-notified on a (sub-)batch of buffers > >completed. > > > >If there's any spinning done then it should be NAPI-alike polling:

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-24 Thread Ingo Molnar
* Jens Axboe wrote: > - With the former note, the app either needs to opt in (and hence > willingly sacrifice CPU cycles of its scheduling slice) or it needs to > be nicer in when it gives up and goes back to irq driven IO. The scheduler could look at sleep latency averages of the task in

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-24 Thread Ingo Molnar
* Linus Torvalds wrote: > On Sun, Jun 23, 2013 at 12:09 AM, Ingo Molnar wrote: > > > > The spinning approach you add has the disadvantage of actively wasting > > CPU time, which could be used to run other tasks. In general it's much > > better to make sure the completion IRQs are rate-limited

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-24 Thread Jens Axboe
On Sun, Jun 23 2013, Linus Torvalds wrote: > nothing in common. Networking very very seldom > has the kind of "submit and wait for immediate result" issues that > disk reads do. > > That said, I dislike the patch intensely. I do not think it's at all a > good idea to look at "need_resched" to say

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-24 Thread Jens Axboe
On Sun, Jun 23 2013, Ingo Molnar wrote: > I'm wondering why this makes such a performance difference. The key ingredient here is simply not going to sleep, only to get an IRQ and get woken up very shortly again. NAPI and similar approaches work great for high IOPS cases, where you maintain a cert…

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-23 Thread David Ahern
On 6/23/13 3:09 AM, Ingo Molnar wrote: If an IO driver is implemented properly then it will batch up requests for the controller, and gets IRQ-notified on a (sub-)batch of buffers completed. If there's any spinning done then it should be NAPI-alike polling: a single "is stuff completed" polling

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-23 Thread Linus Torvalds
On Sun, Jun 23, 2013 at 12:09 AM, Ingo Molnar wrote: > > The spinning approach you add has the disadvantage of actively wasting CPU > time, which could be used to run other tasks. In general it's much better > to make sure the completion IRQs are rate-limited and just schedule. This > (combined wi…

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-23 Thread Ingo Molnar
* Matthew Wilcox wrote: > > A paper at FAST2012 > (http://static.usenix.org/events/fast12/tech/full_papers/Yang.pdf) pointed > out the performance overhead of taking interrupts for low-latency block > I/Os. The solution the author investigated was to spin waiting for each > I/O to complete. T…

RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-20 Thread Matthew Wilcox
A paper at FAST2012 (http://static.usenix.org/events/fast12/tech/full_papers/Yang.pdf) pointed out the performance overhead of taking interrupts for low-latency block I/Os. The solution the author investigated was to spin waiting for each I/O to complete. This is inefficient as Linux submits man…