Re: [Nbd] NBD: exported files over something around 1 TiB get an insane device size on the client side and are actually empty

2017-01-14 Thread Alex Bligh
> On 15 Jan 2017, at 01:03, Josef Bacik wrote: >> Yeah I noticed this in testing, there's a bug with NBD since its inception where it uses a 32-bit number for keeping track of the size of the device. I fixed it with: nbd: use loff_t for blocksize and nbd_set_size args
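
For illustration only, a minimal userspace sketch of the failure class described above (not the nbd driver code itself): computing the device size in a 32-bit type wraps for a 1 TiB export, while a 64-bit signed type such as loff_t holds it fine.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t blksize   = 4096;
	uint32_t nr_blocks = (uint32_t)((1ULL << 40) / 4096);  /* blocks in a 1 TiB export */

	/* Size tracked in a 32-bit variable: the multiply wraps and the
	 * device appears to have a nonsensical size. */
	uint32_t size32 = blksize * nr_blocks;

	/* Size tracked in a 64-bit signed type (what loff_t provides in the kernel). */
	int64_t size64 = (int64_t)blksize * nr_blocks;

	printf("32-bit size: %u bytes\n", size32);
	printf("64-bit size: %lld bytes\n", (long long)size64);
	return 0;
}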

[PATCH V6 05/18] blk-throttle: add upgrade logic for LIMIT_LOW state

2017-01-14 Thread Shaohua Li
When the queue is in the LIMIT_LOW state and all cgroups with a low limit cross their bps/iops limits, we upgrade the queue's state to LIMIT_MAX. To determine whether a cgroup exceeds its limit, we check whether the cgroup has pending requests. Since the cgroup is throttled according to the limit, pending request
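
A hedged sketch of the upgrade condition as described above; the struct and helper names below are illustrative, not the actual blk-throttle code.

#include <stdbool.h>
#include <stdint.h>

struct tg_sketch {
	bool     has_low_limit;
	uint64_t bps_dispatched;   /* bytes dispatched in the current slice */
	uint64_t low_bps;          /* configured low bps limit */
	bool     has_pending_io;   /* requests queued behind the throttle */
};

static bool tg_crossed_low(const struct tg_sketch *tg)
{
	/* A cgroup only "counts" if it is actually limited by its low
	 * limit, i.e. it reached the limit and still has IO waiting. */
	return tg->bps_dispatched >= tg->low_bps && tg->has_pending_io;
}

static bool queue_can_upgrade(const struct tg_sketch *groups, int n)
{
	for (int i = 0; i < n; i++)
		if (groups[i].has_low_limit && !tg_crossed_low(&groups[i]))
			return false;   /* someone is still under its low limit */
	return true;                    /* every low-limited cgroup crossed: go to LIMIT_MAX */
}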

[PATCH V6 04/18] blk-throttle: configure bps/iops limit for cgroup in low limit

2017-01-14 Thread Shaohua Li
Each queue will have a state machine. Initially the queue is in the LIMIT_LOW state, which means all cgroups will be throttled according to their low limits. After all cgroups with a low limit cross the limit, the queue state gets upgraded to the LIMIT_MAX state. For the max limit, a cgroup will use the limit
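
As a rough illustration of the per-queue state machine described above (names are indicative only, not lifted from the patch):

#include <stdint.h>

enum limit_state {
	LIMIT_LOW,   /* every cgroup is throttled to its low limit */
	LIMIT_MAX,   /* low limits are satisfied; enforce max limits instead */
};

struct queue_sketch {
	enum limit_state state;
};

/* Which of a cgroup's two limits applies depends on the queue state. */
static uint64_t effective_bps(const struct queue_sketch *q,
			      uint64_t low_bps, uint64_t max_bps)
{
	return q->state == LIMIT_LOW ? low_bps : max_bps;
}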

[PATCH V6 03/18] blk-throttle: add .low interface

2017-01-14 Thread Shaohua Li
Add a low limit for cgroups and the corresponding cgroup interface. To be consistent with memcg, we allow users to configure a .low limit higher than the .max limit, but the internal logic always assumes the .low limit is lower than the .max limit. So we add extra bps/iops_conf fields in throtl_grp for userspace
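
A hedged sketch of why separate configuration fields are useful here; field and index names are made up for illustration, not taken from the patch.

#include <stdint.h>

enum { SK_LOW, SK_MAX, SK_CNT };

struct tg_limits_sketch {
	uint64_t bps_conf[SK_CNT];  /* exactly what the user configured */
	uint64_t bps[SK_CNT];       /* what the throttler actually enforces */
};

static void tg_update_effective(struct tg_limits_sketch *tg)
{
	tg->bps[SK_MAX] = tg->bps_conf[SK_MAX];
	/* Clamp the enforced low limit so it never exceeds max. */
	tg->bps[SK_LOW] = tg->bps_conf[SK_LOW] < tg->bps_conf[SK_MAX] ?
			  tg->bps_conf[SK_LOW] : tg->bps_conf[SK_MAX];
}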

[PATCH V6 00/18] blk-throttle: add .low limit

2017-01-14 Thread Shaohua Li
Hi, cgroup still lacks a good IO controller. CFQ works well for hard disks, but not so well for SSDs. This patch set tries to add a conservative limit to blk-throttle. It isn't proportional scheduling, but it can help prioritize cgroups. There are several advantages to choosing blk-throttle: - blk-throttle

[PATCH V6 13/18] blk-throttle: add interface to configure idle time threshold

2017-01-14 Thread Shaohua Li
Add an interface to configure the threshold. The io.low interface will look like: echo "8:16 rbps=2097152 wbps=max idle=2000" > io.low idle is in microseconds. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 41 - 1 file changed, 28

[PATCH V6 10/18] blk-throttle: detect completed idle cgroup

2017-01-14 Thread Shaohua Li
A cgroup could be assigned a limit but not dispatch enough IO, e.g. the cgroup is idle. When this happens, the cgroup doesn't hit its limit, so we can't move the state machine to a higher level, and all cgroups will be throttled to their low limit, so we waste bandwidth. Detecting an idle cgroup is
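
One plausible shape for the idle test described above, with illustrative names (not the patch's code):

#include <stdbool.h>
#include <stdint.h>

struct tg_idle_sketch {
	uint64_t last_io_time_us;    /* timestamp of the most recent IO */
	uint64_t idle_threshold_us;  /* e.g. the value from "idle=" in io.low */
};

/* If a cgroup hasn't issued IO for longer than its idle threshold, treat it
 * as idle and don't let its unreached low limit block the queue upgrade. */
static bool tg_is_idle(const struct tg_idle_sketch *tg, uint64_t now_us)
{
	return now_us - tg->last_io_time_us > tg->idle_threshold_us;
}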

[PATCH V6 11/18] blk-throttle: make bandwidth change smooth

2017-01-14 Thread Shaohua Li
When all cgroups reach their low limit, cgroups can dispatch more IO. This could make some cgroups dispatch more IO while others don't, and some cgroups could even dispatch less IO than their low limit. For example, cg1 has a low limit of 10MB/s and cg2 a limit of 80MB/s; assume the disk's maximum bandwidth is 120MB/s for the
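
A speculative sketch of one way to ramp a cgroup's enforced bandwidth smoothly after an upgrade instead of jumping straight to max; this is an illustration of the idea, not the patch's actual formula.

#include <stdint.h>

static uint64_t ramped_bps(uint64_t low_bps, uint64_t max_bps,
			   unsigned int slices_since_upgrade)
{
	uint64_t bps = low_bps;

	if (!bps)
		return max_bps;   /* no low limit configured: nothing to ramp */

	/* Double the allowance each slice until it reaches max. */
	while (slices_since_upgrade-- && bps < max_bps)
		bps *= 2;

	return bps < max_bps ? bps : max_bps;
}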

[PATCH V6 09/18] blk-throttle: choose a small throtl_slice for SSD

2017-01-14 Thread Shaohua Li
The throtl_slice is 100ms by default. This is a long time for an SSD; a lot of IO can run in that window. To give cgroups smoother throughput, we choose a small value (20ms) for SSDs. Signed-off-by: Shaohua Li --- block/blk-sysfs.c | 2 ++ block/blk-throttle.c | 18 +++---
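
The constants come straight from the message; the helper below is only an illustrative sketch of the selection, not the patch itself.

#include <stdbool.h>

#define DFL_THROTL_SLICE_HD_MS  100  /* default slice: fine for rotating disks */
#define DFL_THROTL_SLICE_SSD_MS  20  /* shorter slice for SSDs -> smoother throughput */

static unsigned int pick_throtl_slice_ms(bool queue_is_nonrot)
{
	return queue_is_nonrot ? DFL_THROTL_SLICE_SSD_MS : DFL_THROTL_SLICE_HD_MS;
}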

[PATCH V6 14/18] blk-throttle: ignore idle cgroup limit

2017-01-14 Thread Shaohua Li
The last patch introduces a way to detect an idle cgroup. We use it to make upgrade/downgrade decisions. The new algorithm can detect a completely idle cgroup too, so we can delete the corresponding code. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 40

[PATCH V6 17/18] blk-throttle: add a mechanism to estimate IO latency

2017-01-14 Thread Shaohua Li
The user configures a latency target, but the latency threshold for each request size isn't fixed. For an SSD, the IO latency highly depends on the request size. To calculate the latency threshold, we sample some data, e.g. the average latency for request sizes of 4k, 8k, 16k, 32k .. 1M. The latency threshold of each
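
A hedged sketch of per-request-size latency sampling along the lines described; the bucket layout and averaging scheme are illustrative assumptions, not the patch's implementation.

#include <stdint.h>

#define NR_LAT_BUCKETS 9   /* 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1M */

struct lat_bucket {
	uint64_t total_lat_ns;
	uint64_t nr_samples;
};

static struct lat_bucket buckets[NR_LAT_BUCKETS];

/* Map a request size to its bucket: 4k -> 0, 8k -> 1, ..., >= 1M -> 8. */
static int size_to_bucket(uint64_t bytes)
{
	int idx = 0;

	for (uint64_t sz = 4096; sz < bytes && idx < NR_LAT_BUCKETS - 1; sz <<= 1)
		idx++;
	return idx;
}

static void record_sample(uint64_t bytes, uint64_t lat_ns)
{
	struct lat_bucket *b = &buckets[size_to_bucket(bytes)];

	b->total_lat_ns += lat_ns;
	b->nr_samples++;
}

static uint64_t avg_latency_ns(uint64_t bytes)
{
	struct lat_bucket *b = &buckets[size_to_bucket(bytes)];

	return b->nr_samples ? b->total_lat_ns / b->nr_samples : 0;
}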

Re: NBD: exported files over something around 1 TiB get an insane device size on the client side and are actually empty

2017-01-14 Thread Christoph Anton Mitterer
On Sat, 2017-01-14 at 20:03 -0500, Josef Bacik wrote: > nbd: use loff_t for blocksize and nbd_set_size args I'm just trying your patch, which however does not apply to 4.9.2 (but I've adapted it)... will tell later if it worked for me. Cheers, Chris.

Re: [PATCH] nbd: use an idr to keep track of nbd devices

2017-01-14 Thread Josef Bacik
On Sat, Jan 14, 2017 at 4:10 PM, Sagi Grimberg wrote: Hey Josef, To prepare for dynamically adding new nbd devices to the system, switch from using an array for the nbd devices to using an idr. This copies what loop does for keeping track of its devices. I think
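
For readers unfamiliar with the API, a minimal kernel-side sketch of handing out device indices with an idr, in the spirit of what the commit message describes; names are illustrative and this is not the patch itself.

#include <linux/idr.h>
#include <linux/mutex.h>
#include <linux/slab.h>

static DEFINE_IDR(nbd_index_idr);
static DEFINE_MUTEX(nbd_index_mutex);

struct nbd_device_sketch {
	int index;
	/* ... the rest of the per-device state ... */
};

static int nbd_sketch_add(struct nbd_device_sketch *nbd)
{
	int index;

	mutex_lock(&nbd_index_mutex);
	/* Allocate the lowest free index >= 0 and map it to the device. */
	index = idr_alloc(&nbd_index_idr, nbd, 0, 0, GFP_KERNEL);
	mutex_unlock(&nbd_index_mutex);

	if (index < 0)
		return index;
	nbd->index = index;
	return 0;
}

static void nbd_sketch_remove(struct nbd_device_sketch *nbd)
{
	mutex_lock(&nbd_index_mutex);
	idr_remove(&nbd_index_idr, nbd->index);
	mutex_unlock(&nbd_index_mutex);
}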

Re: NBD: exported files over something around 1 TiB get an insane device size on the client side and are actually empty

2017-01-14 Thread Josef Bacik
On Sat, Jan 14, 2017 at 6:31 PM, Christoph Anton Mitterer wrote: Hi. On advice from Alex Bligh I'd like to ping linux-block and nbd-general about the issue described here: https://github.com/NetworkBlockDevice/nbd/issues/44 What basically happens is that with a recent

[GIT PULL] Block fixes for 4.10-rc

2017-01-14 Thread Jens Axboe
Hi Linus, Here's a set of fixes for the current series. This pull request contains: - The virtio_blk stack DMA corruption fix from Christoph, fixing an issue with VMAP stacks. - O_DIRECT blkbits calculation fix from Chandan. - Discard regression fix from Christoph. - Queue init error

NBD: exported files over something around 1 TiB get an insane device size on the client side and are actually empty

2017-01-14 Thread Christoph Anton Mitterer
Hi. On advice from Alex Bligh I'd like to ping linux-block and nbd-general about the issue described here: https://github.com/NetworkBlockDevice/nbd/issues/44 What basically happens is that with a recent kernel (Linux heisenberg 4.9.0-1-amd64 #1 SMP Debian 4.9.2-2 (2017-01-12) x86_64

Re: [PATCH] nbd: create a recv workqueue per nbd device

2017-01-14 Thread Josef Bacik
> On Jan 14, 2017, at 4:15 PM, Sagi Grimberg wrote: > Hey Josef, >> Since we are in the memory reclaim path we need our recv work to be on a workqueue that has WQ_MEM_RECLAIM set so we can avoid deadlocks. Also set WQ_HIGHPRI since we are in the
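
A minimal sketch of the workqueue setup under discussion; the helper and naming are illustrative, and whether the workqueue should be per device or shared is exactly the open question in this thread.

#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *nbd_recv_wq;

static int nbd_sketch_create_recv_wq(int index)
{
	/* WQ_MEM_RECLAIM guarantees forward progress when the recv work runs
	 * in the memory-reclaim path; WQ_HIGHPRI because it sits on the IO
	 * completion path. */
	nbd_recv_wq = alloc_workqueue("nbd%d-recv",
				      WQ_MEM_RECLAIM | WQ_HIGHPRI, 0, index);
	return nbd_recv_wq ? 0 : -ENOMEM;
}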

Re: [PATCH] nbd: create a recv workqueue per nbd device

2017-01-14 Thread Sagi Grimberg
Hey Josef, Since we are in the memory reclaim path, we need our recv work to be on a workqueue that has WQ_MEM_RECLAIM set so we can avoid deadlocks. Also set WQ_HIGHPRI since we are in the completion path for IO. Really, a workqueue per device?? Did this really give a performance advantage?

Re: [PATCH] nbd: use an idr to keep track of nbd devices

2017-01-14 Thread Sagi Grimberg
Hey Josef, To prepare for dynamically adding new nbd devices to the system, switch from using an array for the nbd devices to using an idr. This copies what loop does for keeping track of its devices. I think ida_simple_* is simpler and sufficient here, isn't it? I use more of the