Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-02-28 Thread Omar Sandoval
On Tue, Feb 28, 2017 at 11:25:28PM -0800, Omar Sandoval wrote: > On Wed, Feb 22, 2017 at 11:24:25AM +0100, Jan Kara wrote: > > On Tue 21-02-17 10:19:28, Jens Axboe wrote: > > > On 02/21/2017 10:09 AM, Jan Kara wrote: > > > > Hello, > > > > > > > > this is a second revision of the patch set to fix

Re: connect cmd error for nvme-rdma with eventual kernel crash

2017-02-28 Thread Omar Sandoval
On Wed, Mar 01, 2017 at 04:55:23AM +, Parav Pandit wrote: > Hi Jens, > > > -Original Message- > > From: Jens Axboe [mailto:ax...@kernel.dk] > > Subject: Re: connect cmd error for nvme-rdma with eventual kernel crash > > > > > On Feb 28, 2017, at 5:57 PM, Parav Pandit

Re: [PATCH 0/3] Separate zone requests from medium access requests

2017-02-28 Thread Martin K. Petersen
> "Damien" == Damien Le Moal writes: Damien, Damien> The problem remains that the mpt3sas driver needs fixing. As you Damien> suggest, we can do that in sd, or directly in mpt3sas. I tried Damien> to do a clean fix in sd, but always end up consuming a lot of Damien>

RE: connect cmd error for nvme-rdma with eventual kernel crash

2017-02-28 Thread Parav Pandit
Hi Jens, > -Original Message- > From: Jens Axboe [mailto:ax...@kernel.dk] > Subject: Re: connect cmd error for nvme-rdma with eventual kernel crash > > > On Feb 28, 2017, at 5:57 PM, Parav Pandit wrote: > > > > Hi Jens, > > > > With your commit

Re: [PATCH 3/8] nowait aio: return if direct write will trigger writeback

2017-02-28 Thread Matthew Wilcox
On Tue, Feb 28, 2017 at 05:36:05PM -0600, Goldwyn Rodrigues wrote: > Find out if the write will trigger a wait due to writeback. If yes, > return -EAGAIN. > > This introduces a new function filemap_range_has_page() which > returns true if the file's mapping has a page within the range >

Re: [PATCH 0/3] Separate zone requests from medium access requests

2017-02-28 Thread Damien Le Moal
Martin, On 3/1/17 11:52, Martin K. Petersen wrote: >> "Christoph" == Christoph Hellwig writes: > > Christoph> I don't really like this too much - this is too many SCSI > Christoph> specifics for the block layer to care. Maybe using bios for > Christoph> the zone ops was a

Re: [PATCH 0/3] Separate zone requests from medium access requests

2017-02-28 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig writes: Christoph> I don't really like this too much - this is too many SCSI Christoph> specifics for the block layer to care. Maybe using bios for Christoph> the zone ops was a mistake after all, and we should just have Christoph> operations

[PATCH] cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode

2017-02-28 Thread Hou Tao
When adding a cfq_group into the cfq service tree, we use CFQ_IDLE_DELAY as the delay of cfq_group's vdisktime if there have been other cfq_groups already. When cfq is under iops mode, commit 9a7f38c42c2b ("cfq-iosched: Convert from jiffies to nanoseconds") could result in a large iops delay and

Re: connect cmd error for nvme-rdma with eventual kernel crash

2017-02-28 Thread Jens Axboe
> On Feb 28, 2017, at 5:57 PM, Parav Pandit wrote: > > Hi Jens, > > With your commit 2af8cbe30531eca73c8f3ba277f155fc0020b01a in linux-block git > tree, > There are two request tables, static and dynamic, of the same size. > However, the function blk_mq_tag_to_rq() always tries to

[PATCH] libsas: add sas_unregister_devs_sas_addr in flutter case

2017-02-28 Thread Jason Yan
In sas_rediscover_dev(), sas_get_phy_attached_dev() finds that the device is OK, but in the flutter case, by the time sas_ex_phy_discover() is called the device is gone and the sas_addr has been changed to zero. [300247.584696] sas: ex 500e004aaa1f phy0 originated BROADCAST(CHANGE) [300247.663516] sas:

connect cmd error for nvme-rdma with eventual kernel crash

2017-02-28 Thread Parav Pandit
Hi Jens, With your commit 2af8cbe30531eca73c8f3ba277f155fc0020b01a in linux-block git tree, There are two request tables, static and dynamic, of the same size. However, the function blk_mq_tag_to_rq() always tries to get the tag from the dynamic table, which doesn't seem to be always initialized. I am

[PATCH 4/8] nowait aio: Introduce IOMAP_NOWAIT

2017-02-28 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues IOCB_NOWAIT translates to IOMAP_NOWAIT for iomaps. This is used by XFS in the XFS patch. Signed-off-by: Goldwyn Rodrigues --- fs/iomap.c| 2 ++ include/linux/iomap.h | 1 + 2 files changed, 3 insertions(+) diff --git

[PATCH 7/8] nowait aio: xfs

2017-02-28 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable immediately. If IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin if it needs allocation, either due to file extending, writing to a hole, or COW. Signed-off-by: Goldwyn Rodrigues

[PATCH 8/8] nowait aio: btrfs

2017-02-28 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues Return EAGAIN if any of the following checks fail + i_rwsem is not lockable + NODATACOW or PREALLOC is not set + Cannot nocow at the desired location + Writing beyond end of file which is not allocated Signed-off-by: Goldwyn Rodrigues

[PATCH 3/8] nowait aio: return if direct write will trigger writeback

2017-02-28 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues Find out if the write will trigger a wait due to writeback. If yes, return -EAGAIN. This introduces a new function filemap_range_has_page() which returns true if the file's mapping has a page within the range mentioned. Return -EINVAL for buffered

[PATCH 0/8 v2] Non-blocking AIO

2017-02-28 Thread Goldwyn Rodrigues
This series adds a nonblocking feature to asynchronous I/O writes. io_submit() can be delayed for a number of reasons: - Block allocation for files - Data writebacks for direct I/O - Sleeping because of waiting to acquire i_rwsem - Congested block device The goal of the patch series is to
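As an illustration of the submit-or-EAGAIN semantics the cover letter above describes, here is a hedged userspace sketch using pwritev2() with RWF_NOWAIT. Note this is an assumption for illustration: the series itself introduces the in-kernel IOCB_NOWAIT flag, and RWF_NOWAIT as a userspace-visible flag is assumed to exist on the running kernel, not something this patch set adds.

```c
/* Sketch of non-blocking write semantics: the write either completes
 * immediately or is refused, instead of sleeping in the kernel.
 * RWF_NOWAIT is an assumption here, not part of this patch series. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008   /* per-call "don't block" flag */
#endif

/* Returns 0 if the write completed without blocking, or was cleanly
 * refused (EAGAIN), or the kernel/fs doesn't support nowait writes
 * (EOPNOTSUPP/EINVAL/ENOSYS on older kernels); -1 on other errors. */
static int try_nowait_write(const char *path)
{
    char buf[512];
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
    ssize_t n;
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(buf, 'x', sizeof(buf));
    n = pwritev2(fd, &iov, 1, 0, RWF_NOWAIT);
    close(fd);
    if (n == (ssize_t)sizeof(buf))
        return 0;               /* completed without blocking */
    if (n < 0 && (errno == EAGAIN || errno == EOPNOTSUPP ||
                  errno == EINVAL || errno == ENOSYS))
        return 0;               /* caller would fall back or retry */
    return -1;
}
```

A caller that gets EAGAIN would typically resubmit through a path that is allowed to block, which matches the retry model the series proposes for io_submit().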

Re: [PATCH 2/3] block: Separate zone requests from medium access requests

2017-02-28 Thread Bart Van Assche
On Tue, 2017-02-28 at 19:25 +0900, Damien Le Moal wrote: > From: Bart Van Assche > > Use blk_rq_accesses_medium() instead of !blk_rq_is_passthrough() to > ensure that code that is intended for normal medium access requests, > e.g. DISCARD, READ and WRITE requests, is

Re: [PATCH v2 11/13] md: raid10: don't use bio's vec table to manage resync pages

2017-02-28 Thread Shaohua Li
On Tue, Feb 28, 2017 at 11:41:41PM +0800, Ming Lei wrote: > Now we allocate one page array for managing resync pages, instead > of using bio's vec table to do that, and the old way is very hacky > and won't work any more if multipage bvec is enabled. > > The introduced cost is that we need to

Re: [RFC] failure atomic writes for file systems and block devices

2017-02-28 Thread Chris Mason
On 02/28/2017 09:57 AM, Christoph Hellwig wrote: Hi all, this series implements a new O_ATOMIC flag for failure atomic writes to files. It is based on and tries to unify two earlier proposals, the first one for block devices by Chris Mason:

Re: [PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way

2017-02-28 Thread Shaohua Li
On Tue, Feb 28, 2017 at 11:41:34PM +0800, Ming Lei wrote: > Now resync I/O uses bio's bvec table to manage pages, > this way is very hacky, and may not work any more > once multipage bvec is introduced. > > So introduce helpers and new data structure for > managing resync I/O pages more cleanly. >

Re: [PATCH 07/12] xfs: implement failure-atomic writes

2017-02-28 Thread Darrick J. Wong
On Tue, Feb 28, 2017 at 06:57:32AM -0800, Christoph Hellwig wrote: > If O_ATOMIC is specified in the open flags this will cause XFS to > allocate new extents in the COW for even if overwriting existing data, "COW fork"^^^ The previous patch's commit message also has that

Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock

2017-02-28 Thread Tahsin Erdogan
On Tue, Feb 28, 2017 at 2:47 PM, Tejun Heo wrote: >> + if (!blkcg_policy_enabled(q, pol)) { >> + ret = -EOPNOTSUPP; >> + goto fail; > > Pulling this out of the queue_lock doesn't seem safe to me. This > function may end up calling into callbacks of
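The patch under review above applies the classic allocate-outside-the-lock pattern: do the may-sleep allocation before taking the lock, then publish the result under it. A minimal userspace analogue (toy code, not the blkcg implementation; a pthread mutex stands in for the request queue spinlock, and malloc() for a GFP_KERNEL allocation):

```c
/* Toy model of "allocate outside the lock, publish under it".
 * Names and structure are invented for illustration only. */
#include <pthread.h>
#include <stdlib.h>

struct node {
    int val;
    struct node *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *list_head;

static int add_value(int val)
{
    /* May-sleep allocation done before taking the lock; in the
     * kernel this is the GFP_KERNEL-outside-spinlock analogue. */
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return -1;
    n->val = val;

    /* Only the cheap, non-sleeping publish step happens locked. */
    pthread_mutex_lock(&list_lock);
    n->next = list_head;
    list_head = n;
    pthread_mutex_unlock(&list_lock);
    return 0;
}
```

The subtlety Tejun points at in the review is the other half of the pattern: any check that was previously protected by the lock (like blkcg_policy_enabled()) can race once it is moved outside, so it may need to be re-validated after the lock is retaken.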

[PATCH 2/8] nowait aio: Return if cannot get hold of i_rwsem

2017-02-28 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues A failure to lock i_rwsem would mean there is I/O being performed by another thread. So, let's bail. Signed-off-by: Goldwyn Rodrigues --- mm/filemap.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git

[PATCH 5/8] nowait aio: return on congested block device

2017-02-28 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues A new flag BIO_NOWAIT is introduced to identify bios originating from an iocb with IOCB_NOWAIT. This flag indicates that we should return immediately if a request cannot be made, instead of retrying. Signed-off-by: Goldwyn Rodrigues ---

Re: [PATCH v2 06/13] md: raid1: don't use bio's vec table to manage resync pages

2017-02-28 Thread Shaohua Li
On Tue, Feb 28, 2017 at 11:41:36PM +0800, Ming Lei wrote: > Now we allocate one page array for managing resync pages, instead > of using bio's vec table to do that, and the old way is very hacky > and won't work any more if multipage bvec is enabled. > > The introduced cost is that we need to

Re: [PATCH v2 08/13] md: raid1: use bio helper in process_checks()

2017-02-28 Thread Shaohua Li
On Tue, Feb 28, 2017 at 11:41:38PM +0800, Ming Lei wrote: > Avoid to direct access to bvec table. > > Signed-off-by: Ming Lei > --- > drivers/md/raid1.c | 12 > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/drivers/md/raid1.c

Re: [PATCH v2 05/13] md: raid1: simplify r1buf_pool_free()

2017-02-28 Thread Shaohua Li
On Tue, Feb 28, 2017 at 11:41:35PM +0800, Ming Lei wrote: > This patch gets each page's reference of each bio for resync, > then r1buf_pool_free() gets simplified a lot. > > The same policy has been taken in raid10's buf pool allocation/free > too. We are going to delete the code, this simplify

Re: [RFC] failure atomic writes for file systems and block devices

2017-02-28 Thread Darrick J. Wong
On Tue, Feb 28, 2017 at 06:57:25AM -0800, Christoph Hellwig wrote: > Hi all, > > this series implements a new O_ATOMIC flag for failure atomic writes > to files. It is based on and tries to unify two earlier proposals, > the first one for block devices by Chris Mason: > >

Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock

2017-02-28 Thread Tejun Heo
Hello, Overall, the approach looks good to me but please see below. On Mon, Feb 27, 2017 at 06:49:57PM -0800, Tahsin Erdogan wrote: > @@ -806,44 +807,99 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct > blkcg_policy *pol, > if (!disk) > return -ENODEV; > if

[PATCH 4/6] nbd: set queue timeout properly

2017-02-28 Thread Josef Bacik
We can't just set the timeout on the tagset, we have to set it on the queue as it would have been set up already at this point. Signed-off-by: Josef Bacik --- drivers/block/nbd.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/block/nbd.c

Re: [Lsf-pc] [LSF/MM TOPIC] do we really need PG_error at all?

2017-02-28 Thread NeilBrown
On Mon, Feb 27 2017, Jeff Layton wrote: > On Tue, 2017-02-28 at 10:32 +1100, NeilBrown wrote: >> On Mon, Feb 27 2017, Andreas Dilger wrote: >> >> > >> > My thought is that PG_error is definitely useful for applications to get >> > correct errors back when doing write()/sync_file_range() so that

Re: [PATCH 11/16] mmc: block: shuffle retry and error handling

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:33:58 PM Linus Walleij wrote: > Instead of doing retries at the same time as trying to submit new > requests, do the retries when the request is reported as completed > by the driver, in the finalization worker. > > This is achieved by letting the core worker

Re: [PATCH 11/13] block: Fix oops in locked_inode_to_wb_and_lock_list()

2017-02-28 Thread Tejun Heo
On Tue, Feb 21, 2017 at 06:09:56PM +0100, Jan Kara wrote: > When block device is closed, we call inode_detach_wb() in __blkdev_put() > which sets inode->i_wb to NULL. That is contrary to expectations that > inode->i_wb stays valid once set during the whole inode's lifetime and > leads to oops in

Re: [PATCH 13/16] mmc: queue: issue struct mmc_queue_req items

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:34:00 PM Linus Walleij wrote: > Instead of passing two pointers around and messing and reassigning > to the left and right, issue mmc_queue_req and dereference > the queue from the request where needed. The struct mmc_queue_req > is the thing that has a lifecycle

Re: [PATCH 14/16] mmc: queue: get/put struct mmc_queue_req

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:34:01 PM Linus Walleij wrote: > The per-hardware-transaction struct mmc_queue_req is assigned > from a pool of 2 requests using a current/previous scheme and > then swapped around. > > This is confusing, especially if we need more than two to make > our work

Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-02-28 Thread Tejun Heo
Hello, It generally looks good to me. The only worry I have is around wb_shutdown() synchronization, and if that is actually an issue it shouldn't be too difficult to fix. The other thing which came to mind is the congested->__bdi sever semantics. IIRC, that one was also to support the

Re: [PATCH 12/16] mmc: queue: stop flushing the pipeline with NULL

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:33:59 PM Linus Walleij wrote: > Remove all the pipeline flush: i.e. repeatedly sending NULL > down to the core layer to flush out asynchronous requests, > and also sending NULL after "special" commands to achieve the > same flush. > > Instead: let the "special"

[PATCH 2/6] nbd: ref count the nbd device

2017-02-28 Thread Josef Bacik
In preparation for seamless reconnects and the netlink configuration interface we need a way to make sure our nbd device configuration doesn't disappear until we are finished with it. So add a ref counter, and on the final put we do all of the cleanup work on the nbd device. At configuration time

Re: [PATCHv2 2/2] nvme: Complete all stuck requests

2017-02-28 Thread Keith Busch
On Tue, Feb 28, 2017 at 08:42:19AM +0100, Artur Paszkiewicz wrote: > > I'm observing the same thing when hibernating during mdraid resync on > nvme - it hangs in blk_mq_freeze_queue_wait() after "Disabling non-boot > CPUs ...". The patch guarantees forward progress for blk-mq's hot-cpu notifier

[PATCH 3/6] nbd: stop using the bdev everywhere

2017-02-28 Thread Josef Bacik
In preparation for the upcoming netlink interface we need to not rely on already having the bdev for the NBD device we are doing operations on. Instead of passing the bdev around, just use it in places where we know we already have the bdev. Signed-off-by: Josef Bacik ---

[PATCH 0/6] Lots of NBD fixes and enhancements

2017-02-28 Thread Josef Bacik
This is kind of a big batch of patches, but they all depend on each other, so it was hard to tease out the fixes from the enhancements without making my life miserable. FIXES: nbd: set queue timeout properly nbd: handle ERESTARTSYS properly The ERESTARTSYS one in particular is pretty awful as we

[PATCH 1/6] nbd: handle single path failures gracefully

2017-02-28 Thread Josef Bacik
Currently if we have multiple connections and one of them goes down we will tear down the whole device. However there's no reason we need to do this as we could have other connections that are working fine. Deal with this by keeping track of the state of the different connections, and if we lose

Re: [PATCH 10/13] bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister()

2017-02-28 Thread Tejun Heo
On Tue, Feb 21, 2017 at 06:09:55PM +0100, Jan Kara wrote: > Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() as it gets called > from bdi_unregister() which is not necessarily called from bdi_destroy() > and thus the name is somewhat misleading. > > Signed-off-by: Jan Kara

Re: [PATCH 05/12] fs: add a F_IOINFO fcntl

2017-02-28 Thread Darrick J. Wong
On Tue, Feb 28, 2017 at 06:57:30AM -0800, Christoph Hellwig wrote: > This fcntl can be used to query I/O parameters for the given file > descriptor. Initially it is used for the I/O alignment and atomic > write parameters. > > Signed-off-by: Christoph Hellwig > --- > fs/fcntl.c

Re: [PATCH 09/13] bdi: Do not wait for cgwbs release in bdi_unregister()

2017-02-28 Thread Tejun Heo
Hello, On Tue, Feb 21, 2017 at 06:09:54PM +0100, Jan Kara wrote: > @@ -726,14 +718,6 @@ static void cgwb_bdi_destroy(struct backing_dev_info > *bdi) > } > > spin_unlock_irq(_lock); > - > - /* > - * All cgwb's and their congested states must be shutdown and > - *

Re: [PATCH 08/13] bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()

2017-02-28 Thread Tejun Heo
Hello, On Tue, Feb 21, 2017 at 06:09:53PM +0100, Jan Kara wrote: > Currently we waited for all cgwbs to get freed in cgwb_bdi_destroy() > which also means that writeback has been shutdown on them. Since this > wait is going away, directly shutdown writeback on cgwbs from > cgwb_bdi_destroy() to

Re: [PATCH 0/3] Separate zone requests from medium access requests

2017-02-28 Thread Bart Van Assche
On Tue, 2017-02-28 at 17:02 +0100, Christoph Hellwig wrote: > I don't really like this too much - this is too many SCSI specifics > for the block layer to care. Maybe using bios for the zone ops was a > mistake after all, and we should just have operations in struct block_device > instead..

Re: [PATCH 06/13] bdi: Make wb->bdi a proper reference

2017-02-28 Thread Tejun Heo
On Tue, Feb 21, 2017 at 06:09:51PM +0100, Jan Kara wrote: > Make wb->bdi a proper refcounted reference to bdi for all bdi_writeback > structures except for the one embedded inside struct backing_dev_info. > That will allow us to simplify bdi unregistration. > > Signed-off-by: Jan Kara

Re: [PATCH 08/16] mmc: core: do away with is_new_req

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:33:55 PM Linus Walleij wrote: > The host context member "is_new_req" is only assigned values, > never checked. Delete it. > > Signed-off-by: Linus Walleij Reviewed-by: Bartlomiej Zolnierkiewicz Best regards,

Re: [PATCH 06/16] mmc: core: replace waitqueue with worker

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:33:53 PM Linus Walleij wrote: > The waitqueue in the host context is there to signal back from > mmc_request_done() through mmc_wait_data_done() that the hardware > is done with a command, and when the wait is over, the core > will typically submit the next

Re: [PATCH 07/16] mmc: core: do away with is_done_rcv

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:33:54 PM Linus Walleij wrote: > The "is_done_rcv" in the context info for the host is no longer > needed: it is clear from context (ha!) that as long as we are > waiting for the asynchronous request to come to completion, > we are not done receiving data, and when

Re: [PATCH 09/16] mmc: core: kill off the context info

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:33:56 PM Linus Walleij wrote: > The last member of the context info: is_waiting_last_req is > just assigned values, never checked. Delete that and the whole > context info as a result. > > Signed-off-by: Linus Walleij Reviewed-by:

Re: [PATCH 02/16] mmc: core: refactor asynchronous request finalization

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:33:49 PM Linus Walleij wrote: > mmc_wait_for_data_req_done() is called in exactly one place, > and having it spread out is making things hard to oversee. > Factor this function into mmc_finalize_areq(). > > Signed-off-by: Linus Walleij

Re: [PATCH 10/16] mmc: queue: simplify queue logic

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:33:57 PM Linus Walleij wrote: > The if() statement checking if there is no current or previous > request is now just looking ahead at something that will be > concluded a few lines below. Simplify the logic by moving the > assignment of .asleep. > > Signed-off-by:

Re: [PATCH 05/13] bdi: Mark congested->bdi as internal

2017-02-28 Thread Tejun Heo
On Tue, Feb 21, 2017 at 06:09:50PM +0100, Jan Kara wrote: > congested->bdi pointer is used only to be able to remove congested > structure from bdi->cgwb_congested_tree on structure release. Moreover > the pointer can become NULL when we unregister the bdi. Rename the field > to __bdi and add a

Re: [PATCH 0/3] Separate zone requests from medium access requests

2017-02-28 Thread Christoph Hellwig
I don't really like this too much - this is too many SCSI specifics for the block layer to care. Maybe using bios for the zone ops was a mistake after all, and we should just have operations in struct block_device instead..

[PATCH v2 02/13] md: raid1/raid10: don't handle failure of bio_add_page()

2017-02-28 Thread Ming Lei
Every bio_add_page() call here adds one page into a resync bio, which is big enough to hold RESYNC_PAGES pages, and the current bio_add_page() doesn't check the queue limit any more, so it won't fail at all. Signed-off-by: Ming Lei --- drivers/md/raid1.c | 21

[PATCH v2 09/13] md: raid1: use bio_segments_all()

2017-02-28 Thread Ming Lei
Use this helper, instead of direct access to .bi_vcnt. Signed-off-by: Ming Lei --- drivers/md/raid1.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 316bd6dd6cc1..7396c99ff7b1 100644 ---

[PATCH v2 11/13] md: raid10: don't use bio's vec table to manage resync pages

2017-02-28 Thread Ming Lei
Now we allocate one page array for managing resync pages, instead of using bio's vec table to do that, and the old way is very hacky and won't work any more if multipage bvec is enabled. The introduced cost is that we need to allocate (128 + 16) * copies bytes per r10_bio, and it is fine because

[PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way

2017-02-28 Thread Ming Lei
Now resync I/O uses bio's bvec table to manage pages; this way is very hacky and may not work any more once multipage bvec is introduced. So introduce helpers and a new data structure for managing resync I/O pages more cleanly. Signed-off-by: Ming Lei --- drivers/md/md.h |

[PATCH v2 03/13] md: move two macros into md.h

2017-02-28 Thread Ming Lei
Both raid1 and raid10 share common resync block size and page count, so move them into md.h. Signed-off-by: Ming Lei --- drivers/md/md.h | 5 + drivers/md/raid1.c | 2 -- drivers/md/raid10.c | 3 --- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git

[PATCH v2 12/13] md: raid10: retrieve page from preallocated resync page array

2017-02-28 Thread Ming Lei
Now one page array is allocated for each resync bio, and we can retrieve page from this table directly. Signed-off-by: Ming Lei --- drivers/md/raid10.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/md/raid10.c

[PATCH v2 05/13] md: raid1: simplify r1buf_pool_free()

2017-02-28 Thread Ming Lei
This patch takes a reference on each page of each resync bio, after which r1buf_pool_free() gets simplified a lot. The same policy has been taken in raid10's buf pool allocation/free too. Signed-off-by: Ming Lei --- drivers/md/raid1.c | 15 +++ 1 file changed, 7

[PATCH v2 10/13] md: raid10: refactor code of read reshape's .bi_end_io

2017-02-28 Thread Ming Lei
Reshape read requests are a bit special and require one extra bio which isn't allocated from r10buf_pool. Refactor the .bi_end_io for read reshape, so that we can use raid10's resync page management approach easily in the following patches. Signed-off-by: Ming Lei ---

[PATCH v2 13/13] md: raid10: avoid direct access to bvec table in handle_reshape_read_error

2017-02-28 Thread Ming Lei
The cost is 128 bytes (8*16) of stack space in kernel thread context, and we just use the bio helper to retrieve pages from the bio. Signed-off-by: Ming Lei --- drivers/md/raid10.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/md/raid10.c

[PATCH v2 01/13] block: introduce bio_segments_all()

2017-02-28 Thread Ming Lei
So that we can replace the direct access to .bi_vcnt. Signed-off-by: Ming Lei --- include/linux/bio.h | 7 +++ 1 file changed, 7 insertions(+) diff --git a/include/linux/bio.h b/include/linux/bio.h index 8e521194f6fc..3364b3ed90e7 100644 --- a/include/linux/bio.h +++

[PATCH v2 07/13] md: raid1: retrieve page from pre-allocated resync page array

2017-02-28 Thread Ming Lei
Now one page array is allocated for each resync bio, and we can retrieve page from this table directly. Signed-off-by: Ming Lei --- drivers/md/raid1.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/md/raid1.c

[PATCH v2 00/14] md: cleanup on direct access to bvec table

2017-02-28 Thread Ming Lei
In MD's resync I/O path, there are lots of direct accesses to bio's bvec table. This patchset kills almost all of them, and the conversion is quite straightforward. One root cause of direct access to the bvec table is that resync I/O uses the bio's bvec to manage pages. In V1, as suggested by Shaohua, a new

[PATCH 07/12] xfs: implement failure-atomic writes

2017-02-28 Thread Christoph Hellwig
If O_ATOMIC is specified in the open flags this will cause XFS to allocate new extents in the COW fork even if overwriting existing data, and not remap them into the data fork until ->fsync is called, at which point the whole range will be atomically remapped into the data fork. This allows

[PATCH 03/12] iomap: add a IOMAP_ATOMIC flag

2017-02-28 Thread Christoph Hellwig
To pass through O_ATOMIC to the iomap_begin methods. Signed-off-by: Christoph Hellwig --- fs/iomap.c| 13 +++-- include/linux/iomap.h | 1 + 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/fs/iomap.c b/fs/iomap.c index 16a9d2b89cb6..096cbf573932

[PATCH 06/12] xfs: cleanup is_reflink checks

2017-02-28 Thread Christoph Hellwig
We'll soon need to distinguish between inodes that actually are reflinked, and those that just use the COW fork for atomic write operations. Switch a few places to check for the existence of a COW fork instead of the reflink to prepare for that. Signed-off-by: Christoph Hellwig ---

[PATCH 12/12] nvme: export the atomic write limit

2017-02-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig --- drivers/nvme/host/core.c | 10 ++ drivers/nvme/host/nvme.h | 1 + 2 files changed, 11 insertions(+) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 8a3c3e32a704..e86d07589f18 100644 --- a/drivers/nvme/host/core.c

[PATCH 05/12] fs: add a F_IOINFO fcntl

2017-02-28 Thread Christoph Hellwig
This fcntl can be used to query I/O parameters for the given file descriptor. Initially it is used for the I/O alignment and atomic write parameters. Signed-off-by: Christoph Hellwig --- fs/fcntl.c | 18 ++ include/linux/fs.h | 1 +

[PATCH 11/12] block_dev: implement the F_IOINFO fcntl

2017-02-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig --- fs/block_dev.c | 21 + 1 file changed, 21 insertions(+) diff --git a/fs/block_dev.c b/fs/block_dev.c index 4dd5c54cdefb..48a799964e1d 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -2116,6 +2116,26 @@ static long

[PATCH 01/12] uapi/fs: add O_ATOMIC to the open flags

2017-02-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig --- fs/fcntl.c | 3 ++- include/uapi/asm-generic/fcntl.h | 2 ++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/fcntl.c b/fs/fcntl.c index e1c54f20325c..ca5d228be7ea 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@

[PATCH 04/12] fs: add a BH_Atomic flag

2017-02-28 Thread Christoph Hellwig
This allows us to propagate the O_ATOMIC flag through the writeback code. Signed-off-by: Christoph Hellwig --- fs/buffer.c | 13 + fs/internal.h | 2 +- fs/iomap.c | 4 ++-- include/linux/buffer_head.h | 2 ++ 4 files

Re: [PATCH 05/16] mmc: core: add a kthread for completing requests

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:33:52 PM Linus Walleij wrote: > As we want to complete requests autonomously from feeding the > host with new requests, we create a worker thread to deal with > this specifically in response to the callback from a host driver. > > This patch just adds the

[PATCH 08/12] xfs: implement the F_IOINFO fcntl

2017-02-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig --- fs/xfs/xfs_file.c | 34 ++ 1 file changed, 34 insertions(+) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index a7d8324b59c5..4d955b3266df 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -898,6

[PATCH 10/12] block_dev: set REQ_NOMERGE for O_ATOMIC writes

2017-02-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig --- fs/block_dev.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/block_dev.c b/fs/block_dev.c index 3c47614a4b32..4dd5c54cdefb 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -242,6 +242,10 @@ __blkdev_direct_IO_simple(struct

[PATCH 02/12] iomap: pass IOMAP_* flags to actors

2017-02-28 Thread Christoph Hellwig
This will be needed to implement O_ATOMIC. Signed-off-by: Christoph Hellwig --- fs/dax.c | 2 +- fs/internal.h | 2 +- fs/iomap.c| 39 +-- 3 files changed, 23 insertions(+), 20 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index

[PATCH 09/12] block: advertize max atomic write limit

2017-02-28 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig --- block/blk-settings.c | 22 ++ block/blk-sysfs.c | 12 include/linux/blkdev.h | 9 + 3 files changed, 43 insertions(+) diff --git a/block/blk-settings.c b/block/blk-settings.c index

Re: [PATCH 03/16] mmc: core: refactor mmc_request_done()

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:33:50 PM Linus Walleij wrote: > We have this construction: > > if (a && b && !c) >finalize; > else >block; >finalize; > > Which is equivalent by boolean logic to: > > if (!a || !b || c) >block; > finalize; > > Which is simpler code. > >

[RFC] failure atomic writes for file systems and block devices

2017-02-28 Thread Christoph Hellwig
Hi all, this series implements a new O_ATOMIC flag for failure atomic writes to files. It is based on and tries to unify two earlier proposals, the first one for block devices by Chris Mason: https://lwn.net/Articles/573092/ and the second one for regular files, published by HP

Re: [PATCH 01/16] mmc: core: move some code in mmc_start_areq()

2017-02-28 Thread Bartlomiej Zolnierkiewicz
On Thursday, February 09, 2017 04:33:48 PM Linus Walleij wrote: > "previous" is a better name for the variable storing the previous > asynchronous request, better than the opaque name "data" at least. > We see that we assign the return status to the returned variable > on all code paths, so we

Re: [PATCH v1 01/14] block: introduce bio_segments_all()

2017-02-28 Thread Ming Lei
On Sun, Feb 26, 2017 at 2:22 AM, Christoph Hellwig wrote: >> +static inline unsigned bio_segments_all(struct bio *bio) >> +{ >> + WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)); >> + >> + return bio->bi_vcnt; >> +} > > I don't think this helpers really adds any benefit.

Re: [PATCH v1 02/14] block: introduce bio_remove_last_page()

2017-02-28 Thread Ming Lei
On Sun, Feb 26, 2017 at 2:23 AM, Christoph Hellwig wrote: > On Fri, Feb 24, 2017 at 11:42:39PM +0800, Ming Lei wrote: >> MD needs this helper to remove the last added page, so introduce >> it. > > If MD really has a valid use case for this it should open code the > operation.

Re: [PATCHv2 2/2] nvme: Complete all stuck requests

2017-02-28 Thread Sagi Grimberg
OK, I think we can get it for fabrics too, need to figure out how to handle it there too. Do you have a reproducer? To repro, I have to run a buffered writer workload then put the system into S3. This fio job seems to reproduce for me: fio --name=global --filename=/dev/nvme0n1

Re: [Lsf-pc] [LSF/MM TOPIC] do we really need PG_error at all?

2017-02-28 Thread Jeff Layton
On Tue, 2017-02-28 at 12:12 +0200, Boaz Harrosh wrote: > On 02/28/2017 03:11 AM, Jeff Layton wrote: > <> > > > > I'll probably have questions about the read side as well, but for now it > > looks like it's mostly used in an ad-hoc way to communicate errors > > across subsystems (block to fs

Re: [blk_mq_register_hctx] 29dee3c03a WARNING: CPU: 0 PID: 5 at lib/refcount.c:114 refcount_inc

2017-02-28 Thread Peter Zijlstra
On Tue, Feb 28, 2017 at 09:38:04AM +0100, Peter Zijlstra wrote: > On Tue, Feb 28, 2017 at 09:17:11AM +0100, Peter Zijlstra wrote: > > On Tue, Feb 28, 2017 at 12:11:17PM +0800, Fengguang Wu wrote: > > > Hello, > > > > > > FYI, an old blk_mq bug triggers new warnings on this commit. It's very > > >

[PATCH 3/3] mpt3sas: Do not check resid for non medium access commands

2017-02-28 Thread Damien Le Moal
From: Bart Van Assche Commit f2e767bb5d6e ("mpt3sas: Force request partial completion alignment") introduced a forced alignment of resid to the device logical block size, to fix bogus HBA firmware that sometimes returns an unaligned value. This fix however did not

[PATCH 1/3] block: Introduce blk_rq_accesses_medium()

2017-02-28 Thread Damien Le Moal
From: Bart Van Assche A medium access request is defined as an internal regular request that operates on a whole number of logical blocks of the storage medium. These include REQ_OP_READ, REQ_OP_WRITE, REQ_OP_FLUSH, REQ_OP_DISCARD, REQ_OP_SECURE_ERASE,

[PATCH 0/3] Separate zone requests from medium access requests

2017-02-28 Thread Damien Le Moal
This series introduces blk_rq_accesses_medium(), which is equivalent to !blk_rq_is_passthrough() minus the zone request operations REQ_OP_ZONE_REPORT and REQ_OP_ZONE_RESET. This new helper allows avoiding problems due to the non-standard nature of these commands (report zones does not operate on

[PATCH 2/3] block: Separate zone requests from medium access requests

2017-02-28 Thread Damien Le Moal
From: Bart Van Assche Use blk_rq_accesses_medium() instead of !blk_rq_is_passthrough() to ensure that code that is intended for normal medium access requests, e.g. DISCARD, READ and WRITE requests, is not applied to REQ_OP_ZONE_REPORT requests nor to REQ_OP_ZONE_RESET

Re: [PATCHv2 2/2] nvme: Complete all stuck requests

2017-02-28 Thread Artur Paszkiewicz
On 02/27/2017 08:15 PM, Keith Busch wrote: > On Mon, Feb 27, 2017 at 07:27:51PM +0200, Sagi Grimberg wrote: >> OK, I think we can get it for fabrics too, need to figure out how to >> handle it there too. >> >> Do you have a reproducer? > > To repro, I have to run a buffered writer workload then

Re: [Lsf-pc] [LSF/MM TOPIC] do we really need PG_error at all?

2017-02-28 Thread Boaz Harrosh
On 02/28/2017 03:11 AM, Jeff Layton wrote: <> > > I'll probably have questions about the read side as well, but for now it > looks like it's mostly used in an ad-hoc way to communicate errors > across subsystems (block to fs layer, for instance). If memory does not fail me it used to be checked

Re: [PATCH 0/4] blk-mq: cleanup on all kinds of kobjects

2017-02-28 Thread Peter Zijlstra
On Wed, Feb 22, 2017 at 06:13:58PM +0800, Ming Lei wrote: > This patchset cleans up on kobjects of request_queue.mq_kobj, > sw queue's kobject and hw queue's kobject. > > The 1st patch initializes the kobject of request_queue and sw queue > in blk_mq_init_allocated_queue(), so we can avoid to

Re: [blk_mq_register_hctx] 29dee3c03a WARNING: CPU: 0 PID: 5 at lib/refcount.c:114 refcount_inc

2017-02-28 Thread Peter Zijlstra
On Tue, Feb 28, 2017 at 09:17:11AM +0100, Peter Zijlstra wrote: > On Tue, Feb 28, 2017 at 12:11:17PM +0800, Fengguang Wu wrote: > > Hello, > > > > FYI, an old blk_mq bug triggers new warnings on this commit. It's very > > reproducible and you may try the attached reproduce-* script. > > > [

Re: [blk_mq_register_hctx] 29dee3c03a WARNING: CPU: 0 PID: 5 at lib/refcount.c:114 refcount_inc

2017-02-28 Thread Fengguang Wu
On Tue, Feb 28, 2017 at 09:17:11AM +0100, Peter Zijlstra wrote: > On Tue, Feb 28, 2017 at 12:11:17PM +0800, Fengguang Wu wrote: > > Hello, > > > > FYI, an old blk_mq bug triggers new warnings on this commit. It's very > > reproducible and you may try the attached reproduce-* script. > > [4.447772] kobject

Re: blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES

2017-02-28 Thread Ilya Dryomov
On Fri, Feb 24, 2017 at 12:49 AM, Martin K. Petersen wrote: >> "Ilya" == Ilya Dryomov writes: > > Ilya, > > Ilya> Well, blk_integrity_revalidate() doesn't clear the profile, it > Ilya> just clears the stable pages flag. Whoever calls > Ilya>

Re: [blk_mq_register_hctx] 29dee3c03a WARNING: CPU: 0 PID: 5 at lib/refcount.c:114 refcount_inc

2017-02-28 Thread Peter Zijlstra
On Tue, Feb 28, 2017 at 12:11:17PM +0800, Fengguang Wu wrote: > Hello, > > FYI, an old blk_mq bug triggers new warnings on this commit. It's very > reproducible and you may try the attached reproduce-* script. > [4.447772] kobject (88001c041f10): tried to init an initialized > object,
