Re: [PATCH 3/4] nvme: use blk_rq_payload_bytes

2017-01-18 Thread Sagi Grimberg
@@ -1014,9 +1013,9 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue, } Christoph, a little above here we still look at blk_rq_bytes(), shouldn't that look at blk_rq_payload_bytes() too? The check is ok for now as it's just zero vs non-zero. It's somewhat broken for

[bug report] blk-mq-sched: add framework for MQ capable IO schedulers

2017-01-18 Thread Dan Carpenter
Hello Jens Axboe, This is a semi-automatic email about new static checker warnings. The patch bd166ef183c2: "blk-mq-sched: add framework for MQ capable IO schedulers" from Jan 17, 2017, leads to the following Smatch complaint: block/elevator.c:234 elevator_init() error: we previously

Re: [LSF/MM TOPIC] block level event logging for storage media management

2017-01-18 Thread Hannes Reinecke
On 01/19/2017 12:34 AM, Song Liu wrote: > > Media health monitoring is very important for large scale distributed storage > systems. > Traditionally, enterprise storage controllers maintain event logs for > attached storage > devices. However, these controller managed logs do not scale well

Re: [LSF/MM TOPIC] block level event logging for storage media management

2017-01-18 Thread Coly Li
On 2017/1/19 7:34 AM, Song Liu wrote: > > Media health monitoring is very important for large scale distributed storage > systems. > Traditionally, enterprise storage controllers maintain event logs for > attached storage > devices. However, these controller managed logs do not scale well for

Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-18 Thread Darrick J. Wong
On Wed, Jan 18, 2017 at 03:39:17PM -0500, Jeff Moyer wrote: > Jan Kara writes: > > > On Tue 17-01-17 15:14:21, Vishal Verma wrote: > >> Your note on the online repair does raise another tangentially related > >> topic. Currently, if there are badblocks, writes via the bio

RE: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-18 Thread Slava Dubeyko
-Original Message- From: Jeff Moyer [mailto:jmo...@redhat.com] Sent: Wednesday, January 18, 2017 12:48 PM To: Slava Dubeyko Cc: Jan Kara ; linux-nvd...@lists.01.org ; linux-block@vger.kernel.org; Viacheslav Dubeyko

Re: [LSF/MM TOPIC] block level event logging for storage media management

2017-01-18 Thread Bart Van Assche
On Wed, 2017-01-18 at 23:34 +, Song Liu wrote: > Media health monitoring is very important for large scale distributed storage > systems. > Traditionally, enterprise storage controllers maintain event logs for > attached storage > devices. However, these controller managed logs do not scale

Re: [PATCH for-4.10] blk-mq: Remove unused variable

2017-01-18 Thread Keith Busch
On Wed, Jan 18, 2017 at 02:21:48PM -0800, Jens Axboe wrote: > On 01/18/2017 02:16 PM, Jens Axboe wrote: > > On 01/18/2017 02:21 PM, Keith Busch wrote: > >> Signed-off-by: Keith Busch > >> Reviewed-by: Christoph Hellwig > >> Reviewed-by: Sagi Grimberg

Re: [PATCH for-4.10] blk-mq: Remove unused variable

2017-01-18 Thread Jens Axboe
On 01/18/2017 02:16 PM, Jens Axboe wrote: > On 01/18/2017 02:21 PM, Keith Busch wrote: >> Signed-off-by: Keith Busch >> Reviewed-by: Christoph Hellwig >> Reviewed-by: Sagi Grimberg > > Does it cause a warning anywhere? If not, I'd rather

[PATCH for-4.10] blk-mq: Remove unused variable

2017-01-18 Thread Keith Busch
Signed-off-by: Keith Busch Reviewed-by: Christoph Hellwig Reviewed-by: Sagi Grimberg --- block/blk-mq.c | 1 - 1 file changed, 1 deletion(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index a8e67a1..c3400b5 100644 --- a/block/blk-mq.c

Re: [PATCH 1/2] sbitmap: use smp_mb__after_atomic() in sbq_wake_up()

2017-01-18 Thread Jens Axboe
On 01/18/2017 11:55 AM, Omar Sandoval wrote: > From: Omar Sandoval > > We always do an atomic clear_bit() right before we call sbq_wake_up(), > so we can use smp_mb__after_atomic(). While we're here, comment the > memory barriers in here a little more. Thanks Omar, applied 1-2.

Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-18 Thread Jeff Moyer
Jan Kara writes: > On Tue 17-01-17 15:14:21, Vishal Verma wrote: >> Your note on the online repair does raise another tangentially related >> topic. Currently, if there are badblocks, writes via the bio submission >> path will clear the error (if the hardware is able to remap the

Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-18 Thread Jeff Moyer
Slava Dubeyko writes: >> Well, the situation with NVM is more like with DRAM AFAIU. It is quite >> reliable >> but given the size the probability *some* cell has degraded is quite high. >> And similar to DRAM you'll get MCE (Machine Check Exception) when you try >>

[PATCH 1/2] sbitmap: use smp_mb__after_atomic() in sbq_wake_up()

2017-01-18 Thread Omar Sandoval
From: Omar Sandoval We always do an atomic clear_bit() right before we call sbq_wake_up(), so we can use smp_mb__after_atomic(). While we're here, comment the memory barriers in here a little more. Signed-off-by: Omar Sandoval --- lib/sbitmap.c | 13
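The ordering the patch relies on — an atomic clear_bit() immediately followed by a barrier before reading the waiter state — can be sketched as a userspace analogue with C11 atomics. This is illustrative only: smp_mb__after_atomic() is a kernel primitive, and all names below (word, wait_cnt, clear_tag_and_wake) are hypothetical, not the sbitmap API.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace sketch of the sbq_wake_up() pattern: the tag bit is cleared
 * with an atomic read-modify-write, so the kernel only needs the cheaper
 * smp_mb__after_atomic() (which upgrades the preceding atomic op to a
 * full barrier) rather than an unconditional smp_mb(). Here the seq_cst
 * RMW already provides that ordering. */
static atomic_ulong word;      /* tag bitmap analogue */
static atomic_int wait_cnt;    /* waiters analogue */
static atomic_int wakeups;     /* counts wakeups issued */

static void clear_tag_and_wake(int bit)
{
    /* atomic clear_bit() analogue */
    atomic_fetch_and_explicit(&word, ~(1UL << bit), memory_order_seq_cst);
    /* kernel: smp_mb__after_atomic() would go here, pairing with the
     * barrier on the waiter side, so the wait_cnt read below cannot be
     * reordered before the bit clear. */
    if (atomic_load(&wait_cnt) > 0)
        atomic_fetch_add(&wakeups, 1);  /* wake a waiter (elided) */
}
```

The point of the patch is exactly this substitution: since a full-barrier-strength atomic already precedes the read, a standalone smp_mb() would pay for ordering twice.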

[PATCH 2/2] sbitmap: fix wakeup hang after sbq resize

2017-01-18 Thread Omar Sandoval
From: Omar Sandoval When we resize a struct sbitmap_queue, we update the wakeup batch size, but we don't update the wait count in the struct sbq_wait_states. If we resized down from a size which could use a bigger batch size, these counts could be too large and cause us to miss

Re: [LSF/MM TOPIC] Future direction of DAX

2017-01-18 Thread Ross Zwisler
On Tue, Jan 17, 2017 at 09:25:33PM -0800, wi...@bombadil.infradead.org wrote: > On Fri, Jan 13, 2017 at 05:20:08PM -0700, Ross Zwisler wrote: > > We still have a lot of work to do, though, and I'd like to propose a > > discussion > > around what features people would like to see enabled in the

Re: [PATCHSET v4] blk-mq-scheduling framework

2017-01-18 Thread Jens Axboe
On 01/18/2017 08:14 AM, Paolo Valente wrote: > according to the function blk_mq_sched_put_request, the > mq.completed_request hook seems to always be invoked (if set) for a > request for which the mq.put_rq_priv is invoked (if set). Correct, any request that came out of blk_mq_sched_get_request()

Re: [PATCHSET v4] blk-mq-scheduling framework

2017-01-18 Thread Paolo Valente
> On 17 Jan 2017, at 11:49, Paolo Valente > wrote: > > [NEW RESEND ATTEMPT] > >> On 17 Jan 2017, at 03:47, Jens Axboe wrote: >> >> On 12/22/2016 08:28 AM, Paolo Valente wrote: >>> On 19 Dec 2016, at

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Hannes Reinecke
On 01/18/2017 04:16 PM, Johannes Thumshirn wrote: > On Wed, Jan 18, 2017 at 05:14:36PM +0200, Sagi Grimberg wrote: >> >>> Hannes just spotted this: >>> static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx, >>> const struct blk_mq_queue_data *bd) >>> { >>> [...] >>>

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Sagi Grimberg
Hannes just spotted this: static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd) { [...] __nvme_submit_cmd(nvmeq, &cmnd); nvme_process_cq(nvmeq); spin_unlock_irq(&nvmeq->q_lock); return BLK_MQ_RQ_QUEUE_OK;

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Sagi Grimberg
Your report provided these stats with one-completion dominance for the single-threaded case. Does it also hold if you run multiple fio threads per core? It's useless to run more threads on that core, it's already fully utilized. That single thread is already posting a fair amount of

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Andrey Kuzmin
On Wed, Jan 18, 2017 at 5:27 PM, Sagi Grimberg wrote: > >> So what you say is you saw a consumed == 1 [1] most of the time? >> >> [1] from >> http://git.infradead.org/nvme.git/commitdiff/eed5a9d925c59e43980047059fde29e3aa0b7836 > > > Exactly. By processing 1 completion per

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Sagi Grimberg
So what you say is you saw a consumed == 1 [1] most of the time? [1] from http://git.infradead.org/nvme.git/commitdiff/eed5a9d925c59e43980047059fde29e3aa0b7836 Exactly. By processing 1 completion per interrupt it makes perfect sense why this performs poorly, it's not worth paying the

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Hannes Reinecke
On 01/17/2017 05:50 PM, Sagi Grimberg wrote: > >> So it looks like we are super not efficient because most of the >> times we catch 1 >> completion per interrupt and the whole point is that we need to find >> more! This fio >> is single threaded with QD=32 so I'd expect that

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Johannes Thumshirn
On Tue, Jan 17, 2017 at 06:38:43PM +0200, Sagi Grimberg wrote: > > >Just for the record, all tests you've run are with the upper irq_poll_budget of 256 [1]? > > Yes, but that's the point, I never ever reach this budget because > I'm only processing 1-2 completions per interrupt. > > >We
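The problem being debated above — a poll budget of 256 that is never reached because each interrupt finds only 1-2 completions — comes down to a budgeted poll loop. A toy userspace model (all names illustrative, not the irq_poll API; see lib/irq_poll.c for the real thing):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a budgeted completion-poll pass: consume at most `budget`
 * pending completions per invocation. When the return value is only 1-2
 * against a budget of 256, the fixed per-pass overhead is not amortized,
 * which is the inefficiency discussed in the thread. */
static int poll_completions(int *queue, size_t depth, size_t *head, int budget)
{
    int consumed = 0;

    while (consumed < budget && *head < depth && queue[*head] != 0) {
        queue[*head] = 0;   /* "complete" the entry */
        (*head)++;
        consumed++;
    }
    return consumed;
}
```

The thread's observation is that the loop body is fine; what is missing is enough pending completions per pass to make polling cheaper than per-completion interrupts.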

Re: kernel oops with blk-mq-sched latest

2017-01-18 Thread Hannes Reinecke
On 01/17/2017 02:00 PM, Jens Axboe wrote: > On 01/17/2017 04:47 AM, Jens Axboe wrote: >> On 01/17/2017 12:57 AM, Hannes Reinecke wrote: >>> Hi Jens, >>> >>> I gave your latest patchset from >>> >>> git.kernel.dk/linux-block blk-mq-sched >>> >>> I see a kernel oops when shutting down: >>> >>> [

Re: [Lsf-pc] [LSF/MM ATTEND] Un-addressable device memory and block/fs implications

2017-01-18 Thread Jan Kara
On Fri 16-12-16 08:44:11, Aneesh Kumar K.V wrote: > Jerome Glisse writes: > > > I would like to discuss un-addressable device memory in the context of > > filesystem and block device. Specifically how to handle write-back, read, > > ... when a filesystem page is migrated to

Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-18 Thread Jan Kara
On Tue 17-01-17 15:14:21, Vishal Verma wrote: > Your note on the online repair does raise another tangentially related > topic. Currently, if there are badblocks, writes via the bio submission > path will clear the error (if the hardware is able to remap the bad > locations). However, if the

[PATCH] genhd: Do not hold event lock when scheduling workqueue elements

2017-01-18 Thread Hannes Reinecke
When scheduling workqueue elements the callback function might be called directly, so holding the event lock is potentially dangerous as it might lead to a deadlock: [ 989.542827] INFO: task systemd-udevd:459 blocked for more than 480 seconds. [ 989.609721] Not tainted 4.10.0-rc4+ #546 [
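The hazard described in the patch — scheduling a workqueue element while holding the event lock, when the callback might run directly and take the same lock — is a classic self-deadlock shape. A pthread sketch of the pattern and the fix (drop the lock before the callback can run); all names here are illustrative, not the genhd API:

```c
#include <assert.h>
#include <pthread.h>

/* Toy model: schedule_event() used to hold ev_lock across the point
 * where the callback could be invoked directly; event_callback() also
 * takes ev_lock, so the direct-call path deadlocked. The fix mirrored
 * here releases the lock before the callback may run. */
static pthread_mutex_t ev_lock = PTHREAD_MUTEX_INITIALIZER;
static int events_seen;

static void event_callback(void)
{
    pthread_mutex_lock(&ev_lock);   /* callback needs the event lock too */
    events_seen++;
    pthread_mutex_unlock(&ev_lock);
}

static void schedule_event(void)
{
    pthread_mutex_lock(&ev_lock);
    /* ... update event state under the lock ... */
    pthread_mutex_unlock(&ev_lock); /* release BEFORE scheduling */
    event_callback();               /* may be called directly, as in the bug */
}
```

With the unlock moved before the (possibly direct) callback invocation, the lock is never held recursively, which is the shape of the fix the patch describes.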

Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-18 Thread Jan Kara
On Tue 17-01-17 15:37:05, Vishal Verma wrote: > I do mean that in the filesystem, for every IO, the badblocks will be > checked. Currently, the pmem driver does this, and the hope is that the > filesystem can do a better job at it. The driver unconditionally checks > every IO for badblocks on the

Re: [PATCH 3/4] nvme: use blk_rq_payload_bytes

2017-01-18 Thread Christoph Hellwig
On Tue, Jan 17, 2017 at 10:06:51PM +0200, Sagi Grimberg wrote: > >> @@ -1014,9 +1013,9 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue >> *queue, >> } >> > > Christoph, a little above here we still look at blk_rq_bytes(), > shouldn't that look at blk_rq_payload_bytes() too? The
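The distinction Sagi is asking about — blk_rq_bytes() (the logical size of the request) versus blk_rq_payload_bytes() (the bytes actually carried as data payload, which differ for special-payload requests such as Write Same) — can be modeled in a few lines. This is a toy model with illustrative field names, not the real struct request:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: for a special-payload request (e.g. Write Same), the
 * logical request size can be large while the data actually transferred
 * is a single block. A zero-vs-nonzero check agrees between the two
 * helpers (as the thread notes, "ok for now"), but any code using the
 * value as a transfer length must use the payload size. */
struct toy_request {
    unsigned int logical_bytes;   /* blk_rq_bytes() analogue */
    bool has_special_payload;
    unsigned int special_bytes;   /* payload for Write Same, etc. */
};

static unsigned int toy_payload_bytes(const struct toy_request *rq)
{
    return rq->has_special_payload ? rq->special_bytes
                                   : rq->logical_bytes;
}
```

For an ordinary read/write the two values coincide, which is why the zero-vs-nonzero test above the hunk is harmless today but would be wrong if used as a length.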