[PATCH 2/2] nbd: handle racing with error'ed out commands

2019-10-21 Thread Josef Bacik
this command out, which would indicate that we've completed it as well. Signed-off-by: Josef Bacik --- drivers/block/nbd.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 8fb8913074b8..e9f5d4e476e7 100644 --- a/drivers/block/nbd.c +++

[PATCH 0/2] fix double completion of timed out commands

2019-10-21 Thread Josef Bacik
We noticed a problem where NBD sometimes double completes the same request when things go wrong and we time out the request. If the other side goes out to lunch but happens to reply just as we're timing out the requests we can end up with a double completion on the request. We already keep track

[PATCH 1/2] nbd: protect cmd->status with cmd->lock

2019-10-21 Thread Josef Bacik
se this is initiated by the user, so again is safe. Signed-off-by: Josef Bacik --- drivers/block/nbd.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index a8e3815295fe..8fb8913074b8 100644 --- a/drivers/block/nbd.c +++ b/dri

Re: [PATCH] nbd: fix possible sysfs duplicate warning

2019-10-10 Thread Josef Bacik
ill make sure all the disk add/remove stuff are done > by holding the nbd_index_mutex lock. > > Signed-off-by: Xiubo Li > Reported-by: Mike Christie Sorry, don't know how I missed this. Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH v4 2/2] nbd: fix possible page fault for nbd disk

2019-09-17 Thread Josef Bacik
On Tue, Sep 17, 2019 at 02:44:09PM -0500, Mike Christie wrote: > On 09/17/2019 01:40 PM, Josef Bacik wrote: > >>> + nbd->destroy_complete = &destroy_complete; > >> > >> Also, without the mutex part of the v3 patch, we could race and &

Re: [PATCH v4 2/2] nbd: fix possible page fault for nbd disk

2019-09-17 Thread Josef Bacik
On Tue, Sep 17, 2019 at 01:31:05PM -0500, Mike Christie wrote: > On 09/17/2019 06:56 AM, xiu...@redhat.com wrote: > > From: Xiubo Li > > > > When the NBD_CFLAG_DESTROY_ON_DISCONNECT flag is set and at the same > > time when the socket is closed due to the server daemon is restarted, > > just befo

Re: [PATCH v4 0/2] nbd: fix possible page fault for nbd disk

2019-09-17 Thread Josef Bacik
fter free bug from Mike's comments > - This has been test for 3 days, works well. > > You can add Reviewed-by: Josef Bacik to the series, Thanks, Josef

Re: io.latency controller apparently not working

2019-08-19 Thread Josef Bacik
>> > >> > >> > >>> On 16 Aug 2019, at 19:59, Josef Bacik > >>> wrote: > >>> > >>> On Fri, Aug 16, 2019 at 07:52:40PM +0200, Paolo Valente wrote: > >>>> > >>>> > >>

Re: io.latency controller apparently not working

2019-08-16 Thread Josef Bacik
On Fri, Aug 16, 2019 at 07:52:40PM +0200, Paolo Valente wrote: > > > > On 16 Aug 2019, at 15:21, Josef Bacik > > wrote: > > > > On Fri, Aug 16, 2019 at 12:57:41PM +0200, Paolo Valente wrote: > >> Hi, > >> I happened to test

Re: io.latency controller apparently not working

2019-08-16 Thread Josef Bacik
On Fri, Aug 16, 2019 at 12:57:41PM +0200, Paolo Valente wrote: > Hi, > I happened to test the io.latency controller, to make a comparison > between this controller and BFQ. But io.latency seems not to work, > i.e., not to reduce latency compared with what happens with no I/O > control at all. Her

Re: [PATCH 1/1] nbd: fix max number of supported devs

2019-08-13 Thread Josef Bacik
On Sun, Aug 04, 2019 at 02:10:06PM -0500, Mike Christie wrote: > This fixes a bug added in 4.10 with commit: > > commit 9561a7ade0c205bc2ee035a2ac880478dcc1a024 > Author: Josef Bacik > Date: Tue Nov 22 14:04:40 2016 -0500 > > nbd: add multi-connection support > &

Re: [PATCH 4/4] nbd: fix zero cmd timeout handling

2019-08-13 Thread Josef Bacik
On Tue, Aug 13, 2019 at 10:45:55AM -0500, Mike Christie wrote: > On 08/13/2019 08:13 AM, Josef Bacik wrote: > > On Fri, Aug 09, 2019 at 04:26:10PM -0500, Mike Christie wrote: > >> This fixes a regression added in 4.9 with commit: > >> > >> commit 0ead

Re: [PATCH 4/4] nbd: fix zero cmd timeout handling

2019-08-13 Thread Josef Bacik
On Fri, Aug 09, 2019 at 04:26:10PM -0500, Mike Christie wrote: > This fixes a regression added in 4.9 with commit: > > commit 0eadf37afc2500e1162c9040ec26a705b9af8d47 > Author: Josef Bacik > Date: Thu Sep 8 12:33:40 2016 -0700 > > nbd: allow block mq to deal wit

Re: [PATCH 2/4] nbd: add function to convert blk req op to nbd cmd

2019-08-13 Thread Josef Bacik
On Fri, Aug 09, 2019 at 04:26:08PM -0500, Mike Christie wrote: > This adds a helper function to convert a block req op to a nbd cmd type. > It will be used in the last patch to log the type in the timeout > handler. > > Signed-off-by: Mike Christie Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 3/4] nbd: add missing config put

2019-08-13 Thread Josef Bacik
On Fri, Aug 09, 2019 at 04:26:09PM -0500, Mike Christie wrote: > Fix bug added with the patch: > > commit 8f3ea35929a0806ad1397db99a89ffee0140822a > Author: Josef Bacik > Date: Mon Jul 16 12:11:35 2018 -0400 > > nbd: handle unexpected replies better > > where

Re: [PATCH 1/4] nbd: add set cmd timeout helper

2019-08-13 Thread Josef Bacik
On Fri, Aug 09, 2019 at 04:26:07PM -0500, Mike Christie wrote: > Add a helper to set the cmd timeout. It does not really do a lot now, > but will be more useful in the next patches. > > Signed-off-by: Mike Christie Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH v2] nbd: replace kill_bdev() with __invalidate_device() again

2019-07-31 Thread Josef Bacik
bd2] >? remove_wait_queue+0x60/0x60 >kthread+0xf8/0x130 >? commit_timeout+0x10/0x10 [jbd2] >? kthread_bind+0x10/0x10 >ret_from_fork+0x35/0x40 > > With __invalidate_device(), I no longer hit the BUG_ON with sync or > unmount on the disconnected device. > Jeeze I swear I see this same patch go by every 6 months or so, not sure what happens to it. Anyway Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH] nbd_genl_status: null check for nla_nest_start

2019-07-29 Thread Josef Bacik
sk_buff *skb, struct > genl_info *info) > } > > dev_list = nla_nest_start_noflag(reply, NBD_ATTR_DEVICE_LIST); > + No newline here, once you fix that nit you can add Reviewed-by: Josef Bacik Thanks, Josef

[PATCH 5/5] rq-qos: use a mb for got_token

2019-07-16 Thread Josef Bacik
Oleg noticed that our checking of data.got_token is unsafe in the cleanup case, and should really use a memory barrier. Use a wmb on the write side, and a rmb() on the read side. We don't need one in the main loop since we're saved by set_current_state(). Signed-off-by: Josef Bacik
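The write-barrier/read-barrier pairing described in this message can be sketched in userspace C11. This is only an analogy, not the kernel code: `struct wait_data`, `waker_grant()` and `waiter_check()` are hypothetical names invented for illustration, and C11 release/acquire fences stand in for the kernel's write and read barriers (they are somewhat stronger than the kernel primitives).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-waiter data; not the kernel's structure. */
struct wait_data {
    int token_payload;      /* plain data published before the flag */
    atomic_bool got_token;  /* flag set by the waker, checked by the waiter */
};

/* Waker side: write the payload, then the flag. The release fence plays the
 * role of the write barrier: the payload store cannot be reordered after
 * the flag store. */
static void waker_grant(struct wait_data *d, int payload)
{
    d->token_payload = payload;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&d->got_token, true, memory_order_relaxed);
}

/* Waiter side: read the flag, then the payload. The acquire fence plays the
 * role of the read barrier: the payload load cannot be reordered before
 * the flag load. Returns false (and leaves *payload_out alone) if no token
 * has been granted yet. */
static bool waiter_check(struct wait_data *d, int *payload_out)
{
    if (!atomic_load_explicit(&d->got_token, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *payload_out = d->token_payload;
    return true;
}
```

In this sketch the two fences must be used as a pair: a barrier on only one side would not order the payload access against the flag access on the other side.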

[PATCH 3/5] rq-qos: don't reset has_sleepers on spurious wakeups

2019-07-16 Thread Josef Bacik
spurious wakeups we'd still want this to be the case. So set has_sleepers to true if we went to sleep to make sure we're woken up the proper way. Signed-off-by: Josef Bacik --- block/blk-rq-qos.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-rq-qos.c b

[PATCH 4/5] rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule

2019-07-16 Thread Josef Bacik
In case we get a spurious wakeup we need to make sure to re-set ourselves to TASK_UNINTERRUPTIBLE so we don't busy wait. Signed-off-by: Josef Bacik --- block/blk-rq-qos.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c index 69a0f0b77795..c450b89

[PATCH 2/5] rq-qos: fix missed wake-ups in rq_qos_throttle

2019-07-16 Thread Josef Bacik
sleeper on the list after we add ourselves, that way we have an uptodate view of the list. Signed-off-by: Josef Bacik --- block/blk-rq-qos.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c index 659ccb8b693f..67a0a4c07060 100644 --- a/block/blk-rq-q

[PATCH 1/5] wait: add wq_has_single_sleeper helper

2019-07-16 Thread Josef Bacik
e are existing waiters locklessly we need to be able to update our view of the waitqueue list after we've added ourselves to the waitqueue. Accomplish this by adding this helper to see if there is more than just ourselves on the list. Signed-off-by: Josef Bacik --- include/linux/w

[PATCH 0/5][v3] rq-qos memory barrier shenanigans

2019-07-16 Thread Josef Bacik
This is the patch series to address the hang we saw in production because of missed wakeups, and the other issues that Oleg noticed while reviewing the code. v2->v3: - apparently I don't understand what READ/WRITE_ONCE does - set ourselves to TASK_UNINTERRUPTIBLE on wakeup just in case - add a com

[PATCH 1/4] wait: add wq_has_single_sleeper helper

2019-07-15 Thread Josef Bacik
e are existing waiters locklessly we need to be able to update our view of the waitqueue list after we've added ourselves to the waitqueue. Accomplish this by adding this helper to see if there is more than just ourselves on the list. Signed-off-by: Josef Bacik --- include/linux/w

[PATCH 4/4] rq-qos: don't reset has_sleepers on spurious wakeups

2019-07-15 Thread Josef Bacik
ng isn't needed. Signed-off-by: Josef Bacik --- block/blk-rq-qos.c | 1 - 1 file changed, 1 deletion(-) diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c index f4aa7b818cf5..35bc6f54d088 100644 --- a/block/blk-rq-qos.c +++ b/block/blk-rq-qos.c @@ -261,7 +261,6 @@ void rq_qos_wait(struc

[PATCH 2/4] rq-qos: fix missed wake-ups in rq_qos_throttle

2019-07-15 Thread Josef Bacik
sleeper on the list after we add ourselves, that way we have an uptodate view of the list. Signed-off-by: Josef Bacik --- block/blk-rq-qos.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c index 659ccb8b693f..67a0a4c07060 100644 --- a/block/blk-rq-q

[PATCH 3/4] rq-qos: use READ_ONCE/WRITE_ONCE for got_token

2019-07-15 Thread Josef Bacik
Oleg noticed that our checking of data.got_token is unsafe in the cleanup case, and should really use a memory barrier. Use the READ_ONCE/WRITE_ONCE helpers on got_token so we can be sure we're always safe. Signed-off-by: Josef Bacik --- block/blk-rq-qos.c | 6 +++--- 1 file chang

[PATCH 0/4][v2] rq-qos memory barrier shenanigans

2019-07-15 Thread Josef Bacik
This is the patch series to address the hang we saw in production because of missed wakeups, and the other issues that Oleg noticed while reviewing the code. v1->v2: - rename wq_has_multiple_sleepers to wq_has_single_sleeper - fix the check for has_sleepers in the missed wake-ups patch - fix the b

Re: [PATCH 1/2] wait: add wq_has_multiple_sleepers helper

2019-07-11 Thread Josef Bacik
On Thu, Jul 11, 2019 at 03:40:06PM +0200, Oleg Nesterov wrote: > On 07/11, Oleg Nesterov wrote: > > > > Jens, > > > > I managed to convince myself I understand why 2/2 needs this change... > > But rq_qos_wait() still looks suspicious to me. Why can't the main loop > > "break" right after io_schedul

[PATCH 1/2] wait: add wq_has_multiple_sleepers helper

2019-07-10 Thread Josef Bacik
e are existing waiters locklessly we need to be able to update our view of the waitqueue list after we've added ourselves to the waitqueue. Accomplish this by adding this helper to see if there are more than two waiters on the waitqueue. Suggested-by: Jens Axboe Signed-off-by: Josef Bacik --

[PATCH 2/2] rq-qos: fix missed wake-ups in rq_qos_throttle

2019-07-10 Thread Josef Bacik
e sleepers after we add ourselves to the list, that way we have an uptodate view of the list. Signed-off-by: Josef Bacik --- block/blk-rq-qos.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c index 659ccb8b693f..b39b5f3fb01b 100644 --- a/block/blk-rq-q

[PATCH] rq-qos: fix missed wake-ups in rq_qos_throttle

2019-07-10 Thread Josef Bacik
(yes, yes, I know) in order to get a real value for has_sleepers. This way we keep our optimization in place and avoid hanging forever if there are no longer any waiters on the list. Signed-off-by: Josef Bacik --- block/blk-rq-qos.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) di

Re: [PATCH 8/8] Btrfs: extent_write_locked_range() should attach inode->i_wb

2019-06-14 Thread Josef Bacik
ure the IO goes down from > the correct cgroup. > > Signed-off-by: Chris Mason Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 7/8] Btrfs: use REQ_CGROUP_PUNT for worker thread submitted bios

2019-06-14 Thread Josef Bacik
async crcs, the bio already has the correct css, we just need to > tell the block layer to use REQ_CGROUP_PUNT. > > Signed-off-by: Chris Mason > Modified-and-reviewed-by: Tejun Heo > --- Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 6/8] Btrfs: only associate the locked page with one async_cow struct

2019-06-14 Thread Josef Bacik
> [ 8308.623451] entry_SYSCALL_64_after_hwframe+0x42/0xb7 > > The fix here is to make async_cow->locked_page NULL everywhere but the > one async_cow struct that's allowed to do things to the locked page. > > Signed-off-by: Chris Mason > Fixes: 771ed689d2cd ("Btrfs: Optimize compressed writeback and reads") > --- Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 5/8] Btrfs: delete the entire async bio submission framework

2019-06-14 Thread Josef Bacik
On Thu, Jun 13, 2019 at 05:33:47PM -0700, Tejun Heo wrote: > From: Chris Mason > > Now that we're not using btrfs_schedule_bio() anymore, delete all the > code that supported it. > > Signed-off-by: Chris Mason Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 4/8] Btrfs: stop using btrfs_schedule_bio()

2019-06-14 Thread Josef Bacik
switch during IO submission, > and doesn't fit well with the modern blkmq IO stack. So, this commit stops > using btrfs_schedule_bio(). We may need to adjust the number of async > helper threads for crcs and compression, but long term it's a better > path. > > Signed-off-by: Chris Mason Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 3/8] blkcg: implement REQ_CGROUP_PUNT

2019-06-14 Thread Josef Bacik
> > Signed-off-by: Tejun Heo > Cc: Chris Mason Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 2/8] blkcg, writeback: Implement wbc_blkcg_css()

2019-06-14 Thread Josef Bacik
On Thu, Jun 13, 2019 at 05:33:44PM -0700, Tejun Heo wrote: > Add a helper to determine the target blkcg from wbc. > > Signed-off-by: Tejun Heo Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 1/8] blkcg, writeback: Add wbc->no_wbc_acct

2019-06-14 Thread Josef Bacik
allow disabling wbc accounting. This will be used > make btrfs compression work well with cgroup IO control. > > Signed-off-by: Tejun Heo Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 0/2] Fix misuse of blk_rq_stats in blk-iolatency

2019-06-14 Thread Josef Bacik
prevent recurrences. > > > Pavel Begunkov (2): > blk-iolatency: Fix zero mean in previous stats > blk-stats: Introduce explicit stat staging buffers > I don't have a problem with this, but it's up to Jens I suppose Acked-by: Josef Bacik Thanks, Josef

Re: [PATCH 2/2] nbd: add support for nbd as root device

2019-06-14 Thread Josef Bacik
On Fri, Jun 14, 2019 at 12:33:43PM +0200, Wouter Verhelst wrote: > On Thu, Jun 13, 2019 at 10:55:36AM -0400, Josef Bacik wrote: > > Also I mean that there are a bunch of different nbd servers out there. We > > have > > our own here at Facebook, qemu has one, IIRC there'

Re: [PATCH 2/2] nbd: add netlink reconfigure resize support v3

2019-06-13 Thread Josef Bacik
On Thu, Jun 13, 2019 at 12:35:27PM -0500, Mike Christie wrote: > On 06/13/2019 12:01 PM, Josef Bacik wrote: > > On Wed, May 29, 2019 at 03:16:06PM -0500, Mike Christie wrote: > >> If the device is setup with ioctl we can resize the device after the > >> initial setup,

Re: [PATCH 2/2] nbd: add netlink reconfigure resize support v3

2019-06-13 Thread Josef Bacik
On Wed, May 29, 2019 at 03:16:06PM -0500, Mike Christie wrote: > If the device is setup with ioctl we can resize the device after the > initial setup, but if the device is setup with netlink we cannot use the > resize related ioctls and there is no netlink reconfigure size ATTR > handling code. >

Re: [PATCH 1/2] nbd: fix crash when the blksize is zero v2

2019-06-13 Thread Josef Bacik
Christie > --- Sorry I missed this second go around Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 2/2] nbd: add support for nbd as root device

2019-06-13 Thread Josef Bacik
On Thu, Jun 13, 2019 at 07:21:43PM +0300, Roman Stratiienko wrote: > > I don't doubt you have a good reason to want it, I'm just not clear on why > > an > > initramfs isn't an option? You have this special kernel with your special > > option, and you manage to get these things to boot your specia

Re: [PATCH 2/2] nbd: add support for nbd as root device

2019-06-13 Thread Josef Bacik
On Wed, Jun 12, 2019 at 07:31:44PM +0300, roman.stratiie...@globallogic.com wrote: > From: Roman Stratiienko > > Adding support to nbd to use it as a root device. This code essentially > provides a minimal nbd-client implementation within the kernel. It opens > a socket and makes the negotiation

Re: [PATCH 2/2] nbd: add support for nbd as root device

2019-06-13 Thread Josef Bacik
On Thu, Jun 13, 2019 at 05:45:13PM +0300, Roman Stratiienko wrote: > On Thu, Jun 13, 2019 at 4:52 PM Josef Bacik wrote: > > > > On Wed, Jun 12, 2019 at 07:31:44PM +0300, roman.stratiie...@globallogic.com > > wrote: > > > From: Roman Stratiienko > > > >

Re: [PATCH] nbd: fix crash when the blksize is zero

2019-05-29 Thread Josef Bacik
On Mon, May 27, 2019 at 01:44:38PM +0800, xiu...@redhat.com wrote: > From: Xiubo Li > > This will allow the blksize to be set zero and then use 1024 as > default. > > Signed-off-by: Xiubo Li Hmm sorry I missed this somehow Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 1/3] nbd: fix connection timed out error after reconnecting to server

2019-05-29 Thread Josef Bacik
On Wed, May 29, 2019 at 03:04:46AM +0800, Yao Liu wrote: > On Tue, May 28, 2019 at 12:57:59PM -0400, Josef Bacik wrote: > > On Tue, May 28, 2019 at 02:07:43AM +0800, Yao Liu wrote: > > > On Fri, May 24, 2019 at 09:07:42AM -0400, Josef Bacik wrote: > > > > On Fri, Ma

Re: [RFC PATCH] nbd: set the default nbds_max to 0

2019-05-29 Thread Josef Bacik
On Wed, May 29, 2019 at 04:08:36PM +0800, xiu...@redhat.com wrote: > From: Xiubo Li > > There is one problem that when trying to check the nbd device > NBD_CMD_STATUS and at the same time insert the nbd.ko module, > we can randomly get some of the 16 /dev/nbd{0~15} are connected, > but they are n

Re: [PATCH 1/3] nbd: fix connection timed out error after reconnecting to server

2019-05-28 Thread Josef Bacik
On Tue, May 28, 2019 at 02:07:43AM +0800, Yao Liu wrote: > On Fri, May 24, 2019 at 09:07:42AM -0400, Josef Bacik wrote: > > On Fri, May 24, 2019 at 05:43:54PM +0800, Yao Liu wrote: > > > Some I/O requests that have been sent successfully but have not yet been > > > r

Re: [PATCH 2/3] nbd: notify userland even if nbd has already disconnected

2019-05-28 Thread Josef Bacik
On Tue, May 28, 2019 at 02:23:23AM +0800, Yao Liu wrote: > On Fri, May 24, 2019 at 09:08:58AM -0400, Josef Bacik wrote: > > On Fri, May 24, 2019 at 05:43:55PM +0800, Yao Liu wrote: > > > Some nbd client implementations have a userland's daemon, so we should > > >

Re: [PATCH 3/3] nbd: mark sock as dead even if it's the last one

2019-05-24 Thread Josef Bacik
On Fri, May 24, 2019 at 05:43:56PM +0800, Yao Liu wrote: > When sock dead, nbd_read_stat should return a ERR_PTR and then we should > mark sock as dead and wait for a reconnection if the dead sock is the last > one, because nbd_xmit_timeout won't resubmit while num_connections <= 1. num_connection

Re: [PATCH 2/3] nbd: notify userland even if nbd has already disconnected

2019-05-24 Thread Josef Bacik
On Fri, May 24, 2019 at 05:43:55PM +0800, Yao Liu wrote: > Some nbd client implementations have a userland's daemon, so we should > inform client daemon to clean up and exit. > > Signed-off-by: Yao Liu Except the nbd_disconnected() check is for the case that the client told us specifically to di

Re: [PATCH 1/3] nbd: fix connection timed out error after reconnecting to server

2019-05-24 Thread Josef Bacik
On Fri, May 24, 2019 at 05:43:54PM +0800, Yao Liu wrote: > Some I/O requests that have been sent successfully but have not yet been > replied won't be resubmitted after reconnecting because of server restart, > so we add a list to track them. > > Signed-off-by: Yao Liu Nack, this is what the tim

Re: CFQ idling kills I/O performance on ext4 with blkio cgroup controller

2019-05-21 Thread Josef Bacik
On Tue, May 21, 2019 at 12:48:14PM -0400, Theodore Ts'o wrote: > On Mon, May 20, 2019 at 11:15:58AM +0200, Jan Kara wrote: > > But this makes priority-inversion problems with ext4 journal worse, doesn't > > it? If we submit journal commit in blkio cgroup of some random process, it > > may get throt

Re: [PATCH 3/3] block: rename BIO_QUEUE_ENTERED as BIO_SPLITTED

2019-05-15 Thread Josef Bacik
re. The only > one use is for cgroup accounting on splitted bio, so rename it > as BIO_SPLITTED. > > Cc: Josef Bacik > Cc: Christoph Hellwig > Cc: Bart Van Assche > Signed-off-by: Ming Lei Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH] nbd:clear NBD_BOUND flag when NBD connection is closed

2019-04-03 Thread Josef Bacik
On Wed, Apr 03, 2019 at 12:13:53PM -0500, Adriana Kobylak wrote: > Adding Josef (updated email address in the maintainers file). > > On 2018-12-13 08:21, Adriana Kobylak wrote: > > On 2018-12-11 00:17, medadyo...@gmail.com wrote: > > > From: Medad > > > > > > If we do NOT clear NBD_BOUND fla

[PATCH] nbd: add a round robin client option

2019-04-03 Thread Josef Bacik
issue. Signed-off-by: Josef Bacik --- drivers/block/nbd.c | 14 ++ include/uapi/linux/nbd.h | 2 ++ 2 files changed, 16 insertions(+) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 90ba9f4c03f3..53463217fbe9 100644 --- a/drivers/block/nbd.c +++ b/drivers/block

Re: [PATCH v2 0/3] blkcg: sync() isolation

2019-03-08 Thread Josef Bacik
On Thu, Mar 07, 2019 at 07:08:31PM +0100, Andrea Righi wrote: > = Problem = > > When sync() is executed from a high-priority cgroup, the process is forced to > wait the completion of the entire outstanding writeback I/O, even the I/O that > was originally generated by low-priority cgroups potentia

Re: [PATCH v2 2/3] blkcg: introduce io.sync_isolation

2019-03-07 Thread Josef Bacik
ng the > previous behavior by default). > > When this flag is enabled any cgroup can write out only dirty pages that > belong to the cgroup itself (except for the root cgroup that would still > be able to write out all pages globally). > > Signed-off-by: Andrea Righi Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH v2 1/3] blkcg: prevent priority inversion problem during sync()

2019-03-07 Thread Josef Bacik
On Thu, Mar 07, 2019 at 07:08:32PM +0100, Andrea Righi wrote: > Prevent priority inversion problem when a high-priority blkcg issues a > sync() and it is forced to wait the completion of all the writeback I/O > generated by any other low-priority blkcg, causing massive latencies to > processes that

Re: [PATCH v2 3/3] blkcg: implement sync() isolation

2019-03-07 Thread Josef Bacik
On Thu, Mar 07, 2019 at 07:08:34PM +0100, Andrea Righi wrote: > Keep track of the inodes that have been dirtied by each blkcg cgroup and > make sure that a blkcg issuing a sync() can trigger the writeback + wait > of only those pages that belong to the cgroup itself. > > This behavior is applied o

[PATCH] block: init flush rq ref count to 1

2019-03-07 Thread Josef Bacik
sted this with a nbd-server that dropped flush requests to verify that it hung, and then tested with this patch to verify I got the timeout as expected and the error handling kicked in. Thanks, Signed-off-by: Josef Bacik --- block/blk-core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/b

Re: [RFC PATCH v2] blkcg: prevent priority inversion problem during sync()

2019-02-11 Thread Josef Bacik
On Mon, Feb 11, 2019 at 09:40:29PM +0100, Andrea Righi wrote: > On Mon, Feb 11, 2019 at 10:39:34AM -0500, Josef Bacik wrote: > > On Sat, Feb 09, 2019 at 03:07:49PM +0100, Andrea Righi wrote: > > > This is an attempt to mitigate the priority inversion problem of a > > >

Re: [RFC PATCH v2] blkcg: prevent priority inversion problem during sync()

2019-02-11 Thread Josef Bacik
On Sat, Feb 09, 2019 at 03:07:49PM +0100, Andrea Righi wrote: > This is an attempt to mitigate the priority inversion problem of a > high-priority blkcg issuing a sync() and being forced to wait the > completion of all the writeback I/O generated by any other low-priority > blkcg, causing massive l

Re: [RFC PATCH 0/3] cgroup: fsio throttle controller

2019-01-29 Thread Josef Bacik
On Tue, Jan 29, 2019 at 07:39:38PM +0100, Andrea Righi wrote: > On Mon, Jan 28, 2019 at 02:26:20PM -0500, Vivek Goyal wrote: > > On Mon, Jan 28, 2019 at 06:41:29PM +0100, Andrea Righi wrote: > > > Hi Vivek, > > > > > > sorry for the late reply. > > > > > > On Mon, Jan 21, 2019 at 04:47:15PM -0500

Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter

2019-01-23 Thread Josef Bacik
On Tue, Jan 22, 2019 at 11:52:06PM -0800, Liu Bo wrote: > On Fri, Jan 18, 2019 at 5:51 PM Jens Axboe wrote: > > > > On 1/18/19 6:39 PM, Liu Bo wrote: > > > On Fri, Jan 18, 2019 at 8:43 AM Josef Bacik wrote: > > >> > > >> On Fri, Jan 18, 2019 at 09:

Re: [RFC PATCH 0/3] cgroup: fsio throttle controller

2019-01-18 Thread Josef Bacik
On Fri, Jan 18, 2019 at 07:44:03PM +0100, Andrea Righi wrote: > On Fri, Jan 18, 2019 at 11:35:31AM -0500, Josef Bacik wrote: > > On Fri, Jan 18, 2019 at 11:31:24AM +0100, Andrea Righi wrote: > > > This is a redesign of my old cgroup-io-throttle controller: > > > http

Re: [RFC PATCH 0/3] cgroup: fsio throttle controller

2019-01-18 Thread Josef Bacik
On Fri, Jan 18, 2019 at 06:07:45PM +0100, Paolo Valente wrote: > > > > On 18 Jan 2019, at 17:35, Josef Bacik > > wrote: > > > > On Fri, Jan 18, 2019 at 11:31:24AM +0100, Andrea Righi wrote: > >> This is a redesign of my old cgroup-io-th

Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter

2019-01-18 Thread Josef Bacik
On Fri, Jan 18, 2019 at 09:28:06AM -0700, Jens Axboe wrote: > On 1/18/19 9:21 AM, Josef Bacik wrote: > > On Fri, Jan 18, 2019 at 05:58:18AM -0700, Jens Axboe wrote: > >> On 1/14/19 12:21 PM, Liu Bo wrote: > >>> Our test reported the following stack, and vmcore showed

Re: [RFC PATCH 0/3] cgroup: fsio throttle controller

2019-01-18 Thread Josef Bacik
On Fri, Jan 18, 2019 at 11:31:24AM +0100, Andrea Righi wrote: > This is a redesign of my old cgroup-io-throttle controller: > https://lwn.net/Articles/330531/ > > I'm resuming this old patch to point out a problem that I think is still > not solved completely. > > = Problem = > > The io.max cont

Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter

2019-01-18 Thread Josef Bacik
On Fri, Jan 18, 2019 at 05:58:18AM -0700, Jens Axboe wrote: > On 1/14/19 12:21 PM, Liu Bo wrote: > > Our test reported the following stack, and vmcore showed that > > ->inflight counter is -1. > > > > [c9003fcc38d0] __schedule at 8173d95d > > [c9003fcc3958] schedule at 8173

Re: [PATCH 1/3] blktests: add cgroup2 infrastructure

2019-01-17 Thread Josef Bacik
On Wed, Jan 16, 2019 at 06:31:51PM -0800, Bart Van Assche wrote: > On 1/16/19 5:40 PM, Omar Sandoval wrote: > > On Tue, Jan 15, 2019 at 08:40:41AM -0800, Bart Van Assche wrote: > > > On Tue, 2019-01-01 at 19:13 -0800, Bart Van Assche wrote: > > > > On 12/4

Re: [PATCH v2] block: fix iolat timestamp and restore accounting semantics

2018-12-13 Thread Josef Bacik
On Thu, Dec 13, 2018 at 12:59:03PM -0700, Jens Axboe wrote: > On 12/13/18 12:52 PM, Josef Bacik wrote: > > On Thu, Dec 13, 2018 at 12:48:11PM -0700, Jens Axboe wrote: > >> On 12/11/18 4:01 PM, Dennis Zhou wrote: > >>> The blk-iolatency controller measures the time f

Re: [PATCH v2] block: fix iolat timestamp and restore accounting semantics

2018-12-13 Thread Josef Bacik
On Thu, Dec 13, 2018 at 12:48:11PM -0700, Jens Axboe wrote: > On 12/11/18 4:01 PM, Dennis Zhou wrote: > > The blk-iolatency controller measures the time from rq_qos_throttle() to > > rq_qos_done_bio() and attributes this time to the first bio that needs > > to create the request. This means if a bi

[PATCH] blktests: test turning wbt on and off

2018-12-12 Thread Josef Bacik
There have been a few issues with turning wbt on and off while IO is in flight, so add a test that just does some random rw IO and has a background thread that toggles wbt on and off. Signed-off-by: Josef Bacik --- tests/block/027 | 47 +++ tests

[PATCH] blktests: add a test for wbt

2018-12-12 Thread Josef Bacik
There are currently no tests to verify wbt is working properly; this patch fixes that. Simply run a varied workload and measure the read latencies with wbt off, and then turn it on and verify that the read latencies go down. Signed-off-by: Josef Bacik --- common/fio | 8 + c

[PATCH] blktests: include cgroup in rc

2018-12-12 Thread Josef Bacik
I added the cgroup cleanup code to the main loop, but didn't test any other test so I didn't notice it'll error out if we haven't included the cgroup helpers. Do this so the cleanup can happen appropriately. Signed-off-by: Josef Bacik --- common/rc | 1 + 1 file changed,

[PATCH] blktests: make block/026 run in constant time

2018-12-12 Thread Josef Bacik
. Signed-off-by: Josef Bacik --- tests/block/026 | 77 +++-- 1 file changed, 47 insertions(+), 30 deletions(-) diff --git a/tests/block/026 b/tests/block/026 index d56fabfcd880..88113a99bd28 100644 --- a/tests/block/026 +++ b/tests/block/026

Re: [PATCH] block: fix iolat timestamp and restore accounting semantics

2018-12-10 Thread Josef Bacik
ors by accounting time separately in a bio > adding the field bi_start. If this field is set, the bio should be > processed by blk-iolatency in rq_qos_done_bio(). > > [1] https://lore.kernel.org/lkml/20181205171039.73066-1-den...@kernel.org/ > > Signed-off-by: Dennis Zhou

[PATCH 2/2] blktests: block/025: an io.latency test

2018-12-05 Thread Josef Bacik
slow group get throttled until the first cgroup is able to finish, and then the slow cgroup will be allowed to finish. Signed-off-by: Josef Bacik --- tests/block/025 | 133 tests/block/025.out | 1 + 2 files changed, 134 insertions

[PATCH 1/2] blktests: add cgroup2 infrastructure

2018-12-05 Thread Josef Bacik
is always in a clean state. Signed-off-by: Josef Bacik --- check | 2 ++ common/rc | 48 2 files changed, 50 insertions(+) diff --git a/check b/check index ebd87c097e25..1c9dbc518fa1 100755 --- a/check +++ b/check @@ -294,6 +294,8 @@ _cleanup

[PATCH 0/2][V2] io.latency test for blktests

2018-12-05 Thread Josef Bacik
v1->v2: - dropped my python library, TIL about jq. - fixed the spelling mistakes in the test. -- Original message -- This patchset is to add a test to verify io.latency is working properly, and to add all the supporting code to run that test. First is the cgroup2 infrastructure which is fairly s

Re: [PATCH 13/14] blkcg: change blkg reference counting to use percpu_ref

2018-12-04 Thread Josef Bacik
On Tue, Dec 04, 2018 at 01:35:59PM -0500, Dennis Zhou wrote: > Every bio is now associated with a blkg putting blkg_get, blkg_try_get, > and blkg_put on the hot path. Switch over the refcnt in blkg to use > percpu_ref. > > Signed-off-by: Dennis Zhou > Acked-by: Tejun Heo

Re: [PATCH 12/14] blkcg: remove bio_disassociate_task()

2018-12-04 Thread Josef Bacik
by: Dennis Zhou > Acked-by: Tejun Heo Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 11/14] blkcg: remove additional reference to the css

2018-12-04 Thread Josef Bacik
by: Dennis Zhou > Acked-by: Tejun Heo Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 07/14] blkcg: consolidate bio_issue_init() to be a part of core

2018-12-04 Thread Josef Bacik
by: Dennis Zhou > Acked-by: Tejun Heo > Reviewed-by: Liu Bo > --- Reviewed-by: Josef Bacik Thanks, Josef

Re: [PATCH 06/14] blkcg: associate blkg when associating a device

2018-12-04 Thread Josef Bacik
st_queue *q, > struct bio *bio) > { > - struct blkcg *blkcg; > struct blkcg_gq *blkg; > bool throtl = false; > > - rcu_read_lock(); > + if (!bio->bi_blkg) > + bio_associate_blkg(bio); > Should we maybe WARN_ON_ONCE() here since this really shouldn't happen? Otherwise you can add Reviewed-by: Josef Bacik Thanks, Josef

[PATCH 0/3] Unify the throttling code for wbt and io-latency

2018-12-04 Thread Josef Bacik
Originally when I wrote io-latency and the rq_qos code to provide a common base between wbt and io-latency I left out the throttling part. These were basically the same, but slightly different in both cases. The difference was enough and the code wasn't too complicated that I just copied it into

[PATCH 1/3] block: add rq_qos_wait to rq_qos

2018-12-04 Thread Josef Bacik
do their own thing as appropriate. Signed-off-by: Josef Bacik --- block/blk-rq-qos.c | 86 ++ block/blk-rq-qos.h | 6 2 files changed, 92 insertions(+) diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c index 80f603b76f61..e932ef9d2718

[PATCH 2/3] block: convert wbt_wait() to use rq_qos_wait()

2018-12-04 Thread Josef Bacik
Now that we have rq_qos_wait() in place, convert wbt_wait() over to using it with its specific callbacks. Signed-off-by: Josef Bacik --- block/blk-wbt.c | 65 ++--- 1 file changed, 11 insertions(+), 54 deletions(-) diff --git a/bloc

[PATCH 3/3] block: convert io-latency to use rq_qos_wait

2018-12-04 Thread Josef Bacik
Now that we have this common helper, convert io-latency over to use it as well. Signed-off-by: Josef Bacik --- block/blk-iolatency.c | 31 --- 1 file changed, 8 insertions(+), 23 deletions(-) diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c index

[PATCH 1/3] blktests: add cgroup2 infrastructure

2018-12-04 Thread Josef Bacik
is always in a clean state. Signed-off-by: Josef Bacik --- check | 2 ++ common/rc | 48 2 files changed, 50 insertions(+) diff --git a/check b/check index ebd87c097e25..1c9dbc518fa1 100755 --- a/check +++ b/check @@ -294,6 +294,8 @@ _cleanup

[PATCH 0/3] io.latency test for blktests

2018-12-04 Thread Josef Bacik
This patchset is to add a test to verify io.latency is working properly, and to add all the supporting code to run that test. First is the cgroup2 infrastructure which is fairly straightforward. Just verifies we have cgroup2, and gives us the helpers to check and make sure we have the right contr

[PATCH 3/3] blktests: block/025: an io.latency test

2018-12-04 Thread Josef Bacik
slow group get throttled until the first cgroup is able to finish, and then the slow cgroup will be allowed to finish. Signed-off-by: Josef Bacik --- tests/block/025 | 133 tests/block/025.out | 1 + 2 files changed, 134 insertions

[PATCH 2/3] blktests: add python scripts for parsing fio json output

2018-12-04 Thread Josef Bacik
sults data. Signed-off-by: Josef Bacik --- src/FioResultDecoder.py | 64 + src/fio-key-value.py| 28 ++ 2 files changed, 92 insertions(+) create mode 100644 src/FioResultDecoder.py create mode 100644 src/fio-key-value.py

Re: [PATCH 00/13 v4] block: always associate blkg and refcount cleanup

2018-11-27 Thread Josef Bacik
On Mon, Nov 26, 2018 at 04:19:33PM -0500, Dennis Zhou wrote: > Hi everyone, > > This is respin of v3 [1] with fixes for the errors reported in [2] and > [3]. v3 was reverted in [4]. > > The issue in [3] was that bio->bi_disk->queue and blkg->q were out > of sync. So when I changed blk_get_rl() to
