Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-20 Thread Johannes Thumshirn
On Tue, Jan 17, 2017 at 05:45:53PM +0200, Sagi Grimberg wrote: > > >-- > >[1] > >queue = b'nvme0q1' > > usecs : count distribution > > 0 -> 1 : 7310 || > > 2 -> 3 : 11 | | > > 4 -> 7

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-19 Thread Jens Axboe
On 01/18/2017 06:02 AM, Hannes Reinecke wrote: > On 01/17/2017 05:50 PM, Sagi Grimberg wrote: >> >>> So it looks like we are super not efficient because most of the >>> times we catch 1 >>> completion per interrupt and the whole point is that we need to find >>> more! This fio >>>

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-19 Thread Hannes Reinecke
On 01/19/2017 11:57 AM, Ming Lei wrote: > On Wed, Jan 11, 2017 at 11:07 PM, Jens Axboe wrote: >> On 01/11/2017 06:43 AM, Johannes Thumshirn wrote: >>> Hi all, >>> >>> I'd like to attend LSF/MM and would like to discuss polling for block >>> drivers. >>> >>> Currently there is

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-19 Thread Ming Lei
On Wed, Jan 11, 2017 at 11:07 PM, Jens Axboe wrote: > On 01/11/2017 06:43 AM, Johannes Thumshirn wrote: >> Hi all, >> >> I'd like to attend LSF/MM and would like to discuss polling for block >> drivers. >> >> Currently there is blk-iopoll but it is neither as widely used as NAPI

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-19 Thread Sagi Grimberg
Christoph suggested to me once that we could take a hybrid approach where we consume a small number of completions (say 4) right away from the interrupt handler and, if we have more, schedule irq-poll to reap the rest. But back then it didn't work better, which is not aligned with my observations
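
The hybrid approach described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the nvme driver code: `struct cq_model`, `model_irq()`, `model_poll()` and both budget macros are hypothetical names, the inline budget of 4 is the "say 4" from the mail, and 256 is the irq-poll budget mentioned elsewhere in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a completion queue; not a kernel structure. */
struct cq_model {
    size_t pending;        /* completions posted by the device, not yet reaped */
    size_t reaped_in_irq;  /* completions consumed directly in the IRQ handler */
    size_t reaped_in_poll; /* completions consumed later by the poll handler */
    int poll_scheduled;    /* nonzero while work is deferred to polling */
};

#define IRQ_INLINE_BUDGET 4   /* assumption: "say 4" from the mail */
#define POLL_BUDGET 256       /* assumption: budget discussed in the thread */

/* Consume at most 'budget' completions and return how many were consumed. */
static size_t reap(struct cq_model *cq, size_t budget)
{
    size_t n = cq->pending < budget ? cq->pending : budget;

    cq->pending -= n;
    return n;
}

/* Interrupt path: reap a few completions inline, defer whatever is left. */
static void model_irq(struct cq_model *cq)
{
    cq->reaped_in_irq += reap(cq, IRQ_INLINE_BUDGET);
    if (cq->pending > 0)
        cq->poll_scheduled = 1;  /* stands in for scheduling irq-poll */
}

/* Poll path: reap up to the poll budget, stop polling once the queue is empty. */
static size_t model_poll(struct cq_model *cq)
{
    size_t n;

    assert(cq->poll_scheduled);  /* only valid after the IRQ path deferred work */
    n = reap(cq, POLL_BUDGET);
    cq->reaped_in_poll += n;
    if (cq->pending == 0)
        cq->poll_scheduled = 0;
    return n;
}
```

In this model an interrupt that finds a single completion never schedules the poll handler at all, which is the case the thread reports as dominant.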

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-19 Thread Johannes Thumshirn
On Thu, Jan 19, 2017 at 10:23:28AM +0200, Sagi Grimberg wrote: > Christoph suggested to me once that we can take a hybrid > approach where we consume a small amount of completions (say 4) > right away from the interrupt handler and if we have more > we schedule irq-poll to reap the rest. But back

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-19 Thread Johannes Thumshirn
On Thu, Jan 19, 2017 at 10:12:17AM +0200, Sagi Grimberg wrote: > > >>>I think you missed: > >>>http://git.infradead.org/nvme.git/commit/49c91e3e09dc3c9dd1718df85112a8cce3ab7007 > >> > >>I indeed did, thanks. > >> > >But it doesn't help. > > > >We're still having to wait for the first interrupt,

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-19 Thread Sagi Grimberg
I think you missed: http://git.infradead.org/nvme.git/commit/49c91e3e09dc3c9dd1718df85112a8cce3ab7007 I indeed did, thanks. But it doesn't help. We're still having to wait for the first interrupt, and if we're really fast that's the only completion we have to process. Try this: diff

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Hannes Reinecke
On 01/18/2017 04:16 PM, Johannes Thumshirn wrote: > On Wed, Jan 18, 2017 at 05:14:36PM +0200, Sagi Grimberg wrote: >> >>> Hannes just spotted this: >>> static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx, >>> const struct blk_mq_queue_data *bd) >>> { >>> [...] >>>

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Andrey Kuzmin
On Wed, Jan 18, 2017 at 5:40 PM, Sagi Grimberg wrote: > >> Your report provided these stats with one-completion dominance for the >> single-threaded case. Does it also hold if you run multiple fio >> threads per core? > > > It's useless to run more threads on that core, it's

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Johannes Thumshirn
On Wed, Jan 18, 2017 at 05:14:36PM +0200, Sagi Grimberg wrote: > > >Hannes just spotted this: > >static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx, > > const struct blk_mq_queue_data *bd) > >{ > >[...] > >__nvme_submit_cmd(nvmeq, &cmnd); > >

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Sagi Grimberg
Hannes just spotted this:
static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
                         const struct blk_mq_queue_data *bd)
{
        [...]
        __nvme_submit_cmd(nvmeq, &cmnd);
        nvme_process_cq(nvmeq);
        spin_unlock_irq(&nvmeq->q_lock);
        return BLK_MQ_RQ_QUEUE_OK;

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Johannes Thumshirn
On Wed, Jan 18, 2017 at 04:27:24PM +0200, Sagi Grimberg wrote: > > >So what you say is you saw a consumed == 1 [1] most of the time? > > > >[1] from > >http://git.infradead.org/nvme.git/commitdiff/eed5a9d925c59e43980047059fde29e3aa0b7836 > > Exactly. By processing 1 completion per interrupt it

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Sagi Grimberg
Your report provided these stats with one-completion dominance for the single-threaded case. Does it also hold if you run multiple fio threads per core? It's useless to run more threads on that core, it's already fully utilized. That single thread is already posting a fair amount of

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Andrey Kuzmin
On Wed, Jan 18, 2017 at 5:27 PM, Sagi Grimberg wrote: > >> So what you say is you saw a consumed == 1 [1] most of the time? >> >> [1] from >> http://git.infradead.org/nvme.git/commitdiff/eed5a9d925c59e43980047059fde29e3aa0b7836 > > > Exactly. By processing 1 completion per

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Sagi Grimberg
So what you say is you saw a consumed == 1 [1] most of the time? [1] from http://git.infradead.org/nvme.git/commitdiff/eed5a9d925c59e43980047059fde29e3aa0b7836 Exactly. By processing 1 completion per interrupt it makes perfect sense why this performs poorly, it's not worth paying the

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Hannes Reinecke
On 01/17/2017 05:50 PM, Sagi Grimberg wrote: > >> So it looks like we are super not efficient because most of the >> times we catch 1 >> completion per interrupt and the whole point is that we need to find >> more! This fio >> is single threaded with QD=32 so I'd expect that

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Johannes Thumshirn
On Tue, Jan 17, 2017 at 06:38:43PM +0200, Sagi Grimberg wrote: > > >Just for the record, all tests you've run are with the upper irq_poll_budget > >of > >256 [1]? > > Yes, but that's the point, I never ever reach this budget because > I'm only processing 1-2 completions per interrupt. > > >We

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-17 Thread Sagi Grimberg
So it looks like we are very inefficient because most of the time we catch 1 completion per interrupt, and the whole point is that we need to find more! This fio is single threaded with QD=32, so I'd expect us to be somewhere in 8-31 almost all the time... I also

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-17 Thread Johannes Thumshirn
On Tue, Jan 17, 2017 at 06:15:43PM +0200, Sagi Grimberg wrote: > Oh, and the current code that was tested can be found at: > > git://git.infradead.org/nvme.git nvme-irqpoll Just for the record, all tests you've run are with the upper irq_poll_budget of 256 [1]? We (Hannes and me) recently

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-17 Thread Sagi Grimberg
Just for the record, all tests you've run are with the upper irq_poll_budget of 256 [1]? Yes, but that's the point, I never ever reach this budget because I'm only processing 1-2 completions per interrupt. We (Hannes and me) recently stumbled across this when trying to poll for more than

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-17 Thread Sagi Grimberg
Oh, and the current code that was tested can be found at: git://git.infradead.org/nvme.git nvme-irqpoll

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-17 Thread Sagi Grimberg
Hey, so I did some initial analysis of what's going on with irq-poll. First, I sampled how much time elapses between the interrupt in nvme_irq and the initial visit to nvme_irqpoll_handler. I ran a single threaded fio with QD=32 of 4K reads. These are two displays of a histogram of the

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-17 Thread Sagi Grimberg
-- [1]
queue = b'nvme0q1'
     usecs       : count    distribution
     0 -> 1      : 7310     ||
     2 -> 3      : 11       | |
     4 -> 7      : 10       | |
     8 -> 15     : 20       | |
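
The rows above (0 -> 1, 2 -> 3, 4 -> 7, 8 -> 15) follow power-of-two bucketing. A minimal sketch of that bucketing, assuming the tool that produced the histogram groups latencies by floor(log2(usecs)); the helper names and `NBUCKETS` are hypothetical and not taken from the tool:

```c
#include <stddef.h>

#define NBUCKETS 32  /* assumption: enough buckets for any latency of interest */

/* Map a value to its power-of-two bucket: 0..1 -> 0, 2..3 -> 1, 4..7 -> 2, ... */
static unsigned int log2_bucket(unsigned long long v)
{
    unsigned int b = 0;

    while (v > 1 && b < NBUCKETS - 1) {
        v >>= 1;
        b++;
    }
    return b;
}

/* Lowest value that falls into bucket b. */
static unsigned long long bucket_low(unsigned int b)
{
    return b == 0 ? 0 : 1ULL << b;
}

/* Highest value that falls into bucket b. */
static unsigned long long bucket_high(unsigned int b)
{
    return (1ULL << (b + 1)) - 1;
}

/* Account one latency sample (in microseconds) in the histogram. */
static void hist_record(unsigned long long hist[NBUCKETS], unsigned long long usecs)
{
    hist[log2_bucket(usecs)]++;
}
```

With this mapping, a sample of 1 usec lands in the first row and a sample of 9 usecs lands in the 8 -> 15 row.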

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-13 Thread Johannes Thumshirn
On Wed, Jan 11, 2017 at 08:13:02AM -0700, Jens Axboe wrote: > On 01/11/2017 08:07 AM, Jens Axboe wrote: > > On 01/11/2017 06:43 AM, Johannes Thumshirn wrote: > >> Hi all, > >> > >> I'd like to attend LSF/MM and would like to discuss polling for block > >> drivers. > >> > >> Currently there is

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-12 Thread Bart Van Assche
On Thu, 2017-01-12 at 10:41 +0200, Sagi Grimberg wrote: > First, when the nvme device fires an interrupt, the driver consumes > the completion(s) from the interrupt (usually there will be some more > completions waiting in the cq by the time the host starts processing it). > With irq-poll, we

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-12 Thread Johannes Thumshirn
On Thu, Jan 12, 2017 at 01:44:05PM +0200, Sagi Grimberg wrote: [...]
> It's pretty basic:
> --
> [global]
> group_reporting
> cpus_allowed=0
> cpus_allowed_policy=split
> rw=randrw
> bs=4k
> numjobs=4
> iodepth=32
> runtime=60
> time_based
> loops=1
> ioengine=libaio
> direct=1
> invalidate=1
>

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-12 Thread Sagi Grimberg
I agree with Jens that we'll need some analysis if we want the discussion to be effective, and I can spend some time on this if I can find volunteers with high-end nvme devices (I only have access to client nvme devices. I have a P3700 but somehow burned the FW. Let me see if I can bring it back

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-12 Thread Johannes Thumshirn
On Thu, Jan 12, 2017 at 10:23:47AM +0200, Sagi Grimberg wrote: > > >>>Hi all, > >>> > >>>I'd like to attend LSF/MM and would like to discuss polling for block > >>>drivers. > >>> > >>>Currently there is blk-iopoll but it is neither as widely used as NAPI in > >>>the > >>>networking field and

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-12 Thread sagi grimberg
A typical Ethernet network adapter delays the generation of an interrupt after it has received a packet. A typical block device or HBA does not delay the generation of an interrupt that reports an I/O completion. >>> >>> NVMe allows for configurable interrupt coalescing,

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-12 Thread Sagi Grimberg
I'd like to attend LSF/MM and would like to discuss polling for block drivers. Currently there is blk-iopoll but it is neither as widely used as NAPI in the networking field and according to Sagi's findings in [1] performance with polling is not on par with IRQ usage. On LSF/MM I'd like to

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-12 Thread Sagi Grimberg
Hi all, I'd like to attend LSF/MM and would like to discuss polling for block drivers. Currently there is blk-iopoll but it is neither as widely used as NAPI in the networking field and according to Sagi's findings in [1] performance with polling is not on par with IRQ usage. On LSF/MM I'd

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Stephen Bates
> > This is a separate topic. The initial proposal is for polling for > interrupt mitigation, you are talking about polling in the context of > polling for completion of an IO. > > We can definitely talk about this form of polling as well, but it should > be a separate topic and probably proposed

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Stephen Bates
>> >> I'd like to attend LSF/MM and would like to discuss polling for block >> drivers. >> >> Currently there is blk-iopoll but it is neither as widely used as NAPI >> in the networking field and according to Sagi's findings in [1] >> performance with polling is not on par with IRQ usage. >> >> On

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Jens Axboe
On 01/11/2017 09:36 PM, Stephen Bates wrote: >>> >>> I'd like to attend LSF/MM and would like to discuss polling for block >>> drivers. >>> >>> Currently there is blk-iopoll but it is neither as widely used as NAPI >>> in the networking field and according to Sagi's findings in [1] >>> performance

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Hannes Reinecke
On 01/11/2017 05:26 PM, Bart Van Assche wrote: > On Wed, 2017-01-11 at 17:22 +0100, Hannes Reinecke wrote: >> On 01/11/2017 05:12 PM, h...@infradead.org wrote: >>> On Wed, Jan 11, 2017 at 04:08:31PM +, Bart Van Assche wrote: A typical Ethernet network adapter delays the generation of an

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Bart Van Assche
On Wed, 2017-01-11 at 17:22 +0100, Hannes Reinecke wrote: > On 01/11/2017 05:12 PM, h...@infradead.org wrote: > > On Wed, Jan 11, 2017 at 04:08:31PM +, Bart Van Assche wrote: > > > A typical Ethernet network adapter delays the generation of an > > > interrupt > > > after it has received a

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Hannes Reinecke
On 01/11/2017 05:12 PM, h...@infradead.org wrote: > On Wed, Jan 11, 2017 at 04:08:31PM +, Bart Van Assche wrote: >> A typical Ethernet network adapter delays the generation of an interrupt >> after it has received a packet. A typical block device or HBA does not delay >> the generation of an

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Jens Axboe
On 01/11/2017 09:12 AM, h...@infradead.org wrote: > On Wed, Jan 11, 2017 at 04:08:31PM +, Bart Van Assche wrote: >> A typical Ethernet network adapter delays the generation of an interrupt >> after it has received a packet. A typical block device or HBA does not delay >> the generation of an

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Johannes Thumshirn
On Wed, Jan 11, 2017 at 04:08:31PM +, Bart Van Assche wrote: [...] > A typical Ethernet network adapter delays the generation of an interrupt > after it has received a packet. A typical block device or HBA does not delay > the generation of an interrupt that reports an I/O completion. I

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Bart Van Assche
On Wed, 2017-01-11 at 14:43 +0100, Johannes Thumshirn wrote: > I'd like to attend LSF/MM and would like to discuss polling for block > drivers. > > Currently there is blk-iopoll but it is neither as widely used as NAPI in > the networking field and according to Sagi's findings in [1] performance >

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread h...@infradead.org
On Wed, Jan 11, 2017 at 04:08:31PM +, Bart Van Assche wrote: > A typical Ethernet network adapter delays the generation of an interrupt > after it has received a packet. A typical block device or HBA does not delay > the generation of an interrupt that reports an I/O completion. NVMe allows

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Hannes Reinecke
On 01/11/2017 04:07 PM, Jens Axboe wrote: > On 01/11/2017 06:43 AM, Johannes Thumshirn wrote: >> Hi all, >> >> I'd like to attend LSF/MM and would like to discuss polling for block >> drivers. >> >> Currently there is blk-iopoll but it is neither as widely used as NAPI in the >> networking field

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Jens Axboe
On 01/11/2017 08:07 AM, Jens Axboe wrote: > On 01/11/2017 06:43 AM, Johannes Thumshirn wrote: >> Hi all, >> >> I'd like to attend LSF/MM and would like to discuss polling for block >> drivers. >> >> Currently there is blk-iopoll but it is neither as widely used as NAPI in the >> networking field

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Jens Axboe
On 01/11/2017 06:43 AM, Johannes Thumshirn wrote: > Hi all, > > I'd like to attend LSF/MM and would like to discuss polling for block drivers. > > Currently there is blk-iopoll but it is neither as widely used as NAPI in the > networking field and according to Sagi's findings in [1] performance

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Hannes Reinecke
On 01/11/2017 02:43 PM, Johannes Thumshirn wrote: > Hi all, > > I'd like to attend LSF/MM and would like to discuss polling for block drivers. > > Currently there is blk-iopoll but it is neither as widely used as NAPI in the > networking field and according to Sagi's findings in [1] performance