Re: [PATCH] blk-merge: fix blk_recount_segments

2014-10-14 Thread Rusty Russell
Ming Lei  writes:
> Hi Rusty,

Hi Ming! 

Sorry, I was on vacation.  I'm back and slowly working through
all my mail...

> 1, FIO script
> [global]
> direct=1
> size=128G
> bsrange=${BS}-${BS}
> timeout=60
> numjobs=4
> ioengine=libaio
> iodepth=64
> filename=/dev/vdb #backed by /dev/nullb0, 4 virtqueues per virtio-blk
> group_reporting=1
>
> [f]
> rw=randread

> 5, result
> 5.1 without Rusty's virtio-vring patch
> - BS=4K, throughput: 179K
> - BS=256K, throughput: 27540

(ie. always using indirect)

> 5.2 with Rusty's virtio-vring patch
> - BS=4K, throughput: 173K
> - BS=256K, throughput: 25350

(ie. usually using indirect).

> Looks like throughput decreases if BS is 256K with your patch.

Interesting.  Looks like we're ending up with fewer descs in flight,
though I'm surprised: with 256K blocks I'd expect us to hit the
steady state of "all indirect" almost immediately.

Hmm... I think the heuristic in my patch is flawed.  Let me try again,
and get back to you.

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-13 Thread Ming Lei
Hi Rusty,

On Fri, Sep 12, 2014 at 9:43 AM, Rusty Russell  wrote:
> Ming Lei  writes:
>> On Thu, Sep 11, 2014 at 7:38 AM, Rusty Russell  wrote:
>>> Rusty Russell  writes:
>>>> Rusty Russell  writes:
>>>> Here's what I have.  It seems to work as expected, but I haven't
>>>> benchmarked it.
>>>
>>> Hi Ming,
>>>
>>> Any chance you can run your benchmarks against this patch?
>>
>> I can run the previous benchmark against this patch, but I am wondering
>> if it is enough, since the change needs lots of test cases to verify.
>
> Indeed, I'll particularly need to run network tests as well, but you're
> the one who suggested the fix for block, so I'm relying on you to see
> if this helps :)
>
>> BTW, it looks like your patch isn't against the upstream kernel, since I
>> can't find alloc_indirect() in drivers/virtio/virtio_ring.c
>
> Yes, the code in my pending-rebases tree has been reworked.  Here's
> the backport for you:

I didn't know about your virtio tree, otherwise I could have pulled it myself, :-)

My virtio-blk test and results follow:

1, FIO script
[global]
direct=1
size=128G
bsrange=${BS}-${BS}
timeout=60
numjobs=4
ioengine=libaio
iodepth=64
filename=/dev/vdb #backed by /dev/nullb0, 4 virtqueues per virtio-blk
group_reporting=1

[f]
rw=randread

2, hypervisor (patched QEMU with multi-virtqueue support)

git://kernel.ubuntu.com/ming/linux.git v3.17-block-dev_v3

3, host
- 2 sockets, 8 physical cores (16 threads), Intel Xeon 2.13GHz
- 24G RAM

4, QEMU command line
$QEMUPATH \
-name 'kvm-test'  \
-M pc  \
-vga none  \
-drive 
id=drive_image1,if=none,format=raw,cache=none,aio=native,file=rootfs.img
\
-device 
virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1,scsi=off,config-wce=off,x-data-plane=on,bus=pci.0,addr=02
\
-drive 
id=drive_image3,if=none,format=raw,cache=none,aio=native,file=/dev/nullb0
\
-device 
virtio-blk-pci,id=image3,drive=drive_image3,bootindex=3,scsi=off,config-wce=off,x-data-plane=on,vectors=5,num_queues=4,bus=pci.0,addr=04
\
-device 
virtio-net-pci,mac=9a:be:bf:c0:c1:c2,id=idDyAIbK,vectors=4,netdev=idabMX4S,bus=pci.0,addr=05
\
-netdev user,id=idabMX4S,hostfwd=tcp::5000-:22  \
-m 8192  \
-smp 4,maxcpus=4  \
-kernel $KERNEL_IMG \
-append 'earlyprintk console=ttyS0 mem=8192M rootfstype=ext4
root=/dev/vda rw virtio_blk.queue_depth=128 loglevel=9
no_console_suspend ip=dhcp'  \
-nographic  \
-rtc base=utc,clock=host,driftfix=none \
-enable-kvm

5, result
5.1 without Rusty's virtio-vring patch
- BS=4K, throughput: 179K
- BS=256K, throughput: 27540

5.2 with Rusty's virtio-vring patch
- BS=4K, throughput: 173K
- BS=256K, throughput: 25350

Looks like throughput decreases if BS is 256K with your patch.

Thanks,
--
Ming Lei
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 4d08f45a9c29..5a911e1f7462 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -78,6 +78,11 @@ struct vring_virtqueue
> /* Number we've added since last sync. */
> unsigned int num_added;
>
> +   /* How many descriptors have been added. */
> +   unsigned int num_in_use;
> +   /* Maximum descriptors in use (degrades over time). */
> +   unsigned int max_in_use;
> +
> /* Last used index we've seen. */
> u16 last_used_idx;
>
> @@ -184,6 +189,31 @@ static inline int vring_add_indirect(struct 
> vring_virtqueue *vq,
> return head;
>  }
>
> +static bool try_indirect(struct vring_virtqueue *vq, unsigned int total_sg)
> +{
> +   unsigned long num_expected;
> +
> +   if (!vq->indirect)
> +   return false;
> +
> +   /* Completely full?  Don't even bother with indirect alloc */
> +   if (!vq->vq.num_free)
> +   return false;
> +
> +   /* We're not going to fit?  Try indirect. */
> +   if (total_sg > vq->vq.num_free)
> +   return true;
> +
> +   /* We should be tracking this. */
> +   BUG_ON(vq->max_in_use < vq->num_in_use);
> +
> +   /* How many more descriptors do we expect at peak usage? */
> +   num_expected = vq->max_in_use - vq->num_in_use;
> +
> +   /* If each were this size, would they overflow? */
> +   return (total_sg * num_expected > vq->vq.num_free);
> +}
> +
>  static inline int virtqueue_add(struct virtqueue *_vq,
> struct scatterlist *sgs[],
> struct scatterlist *(*next)
> @@ -226,7 +256,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>
> /* If the host supports indirect descriptor tables, and we have 
> multiple
>  * buffers, then go indirect. FIXME: tune this threshold */
> -   if (vq->indirect && total_sg > 1 && vq->vq.num_free) {
> +   if (try_indirect(vq, total_sg)) {
> head = vring_add_indirect(vq, sgs, next, total_sg, total_out,
>   total_in,
>   out_sgs, in_sgs, gfp);

Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-12 Thread Rusty Russell
Ming Lei  writes:
> On Thu, Sep 11, 2014 at 7:38 AM, Rusty Russell  wrote:
>> Rusty Russell  writes:
>>> Rusty Russell  writes:
>>> Here's what I have.  It seems to work as expected, but I haven't
>>> benchmarked it.
>>
>> Hi Ming,
>>
>> Any chance you can run your benchmarks against this patch?
>
> I can run the previous benchmark against this patch, but I am wondering
> if it is enough, since the change needs lots of test cases to verify.

Indeed, I'll particularly need to run network tests as well, but you're
the one who suggested the fix for block, so I'm relying on you to see
if this helps :)

> BTW, it looks like your patch isn't against the upstream kernel, since I
> can't find alloc_indirect() in drivers/virtio/virtio_ring.c

Yes, the code in my pending-rebases tree has been reworked.  Here's
the backport for you:

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4d08f45a9c29..5a911e1f7462 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -78,6 +78,11 @@ struct vring_virtqueue
/* Number we've added since last sync. */
unsigned int num_added;
 
+   /* How many descriptors have been added. */
+   unsigned int num_in_use;
+   /* Maximum descriptors in use (degrades over time). */
+   unsigned int max_in_use;
+
/* Last used index we've seen. */
u16 last_used_idx;
 
@@ -184,6 +189,31 @@ static inline int vring_add_indirect(struct 
vring_virtqueue *vq,
return head;
 }
 
+static bool try_indirect(struct vring_virtqueue *vq, unsigned int total_sg)
+{
+   unsigned long num_expected;
+
+   if (!vq->indirect)
+   return false;
+
+   /* Completely full?  Don't even bother with indirect alloc */
+   if (!vq->vq.num_free)
+   return false;
+
+   /* We're not going to fit?  Try indirect. */
+   if (total_sg > vq->vq.num_free)
+   return true;
+
+   /* We should be tracking this. */
+   BUG_ON(vq->max_in_use < vq->num_in_use);
+
+   /* How many more descriptors do we expect at peak usage? */
+   num_expected = vq->max_in_use - vq->num_in_use;
+
+   /* If each were this size, would they overflow? */
+   return (total_sg * num_expected > vq->vq.num_free);
+}
+
 static inline int virtqueue_add(struct virtqueue *_vq,
struct scatterlist *sgs[],
struct scatterlist *(*next)
@@ -226,7 +256,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 
/* If the host supports indirect descriptor tables, and we have multiple
 * buffers, then go indirect. FIXME: tune this threshold */
-   if (vq->indirect && total_sg > 1 && vq->vq.num_free) {
+   if (try_indirect(vq, total_sg)) {
head = vring_add_indirect(vq, sgs, next, total_sg, total_out,
  total_in,
  out_sgs, in_sgs, gfp);
@@ -291,6 +321,14 @@ add_head:
virtio_wmb(vq->weak_barriers);
vq->vring.avail->idx++;
vq->num_added++;
+   vq->num_in_use++;
+
+   /* Every vq->vring.num descriptors, decay the maximum value */
+   if (unlikely(avail == 0))
+   vq->max_in_use >>= 1;
+
+   if (vq->num_in_use > vq->max_in_use)
+   vq->max_in_use = vq->num_in_use;
 
/* This is very unlikely, but theoretically possible.  Kick
 * just in case. */
@@ -569,6 +607,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int 
*len)
virtio_mb(vq->weak_barriers);
}
 
+   vq->num_in_use--;
 #ifdef DEBUG
vq->last_add_time_valid = false;
 #endif
@@ -791,6 +830,8 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
vq->last_used_idx = 0;
vq->num_added = 0;
list_add_tail(>vq.list, >vqs);
+   vq->num_in_use = 0;
+   vq->max_in_use = 0;
 #ifdef DEBUG
vq->in_use = false;
vq->last_add_time_valid = false;




Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-10 Thread Ming Lei
On Thu, Sep 11, 2014 at 7:38 AM, Rusty Russell  wrote:
> Rusty Russell  writes:
>> Rusty Russell  writes:
>> Here's what I have.  It seems to work as expected, but I haven't
>> benchmarked it.
>
> Hi Ming,
>
> Any chance you can run your benchmarks against this patch?

I can run the previous benchmark against this patch, but I am wondering
if it is enough, since the change needs lots of test cases to verify.

BTW, it looks like your patch isn't against the upstream kernel, since I
can't find alloc_indirect() in drivers/virtio/virtio_ring.c

Thanks,


Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-10 Thread Rusty Russell
Rusty Russell  writes:
> Rusty Russell  writes:
> Here's what I have.  It seems to work as expected, but I haven't
> benchmarked it.

Hi Ming,

Any chance you can run your benchmarks against this patch?

Thanks,
Rusty.




Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-05 Thread Rusty Russell
Rusty Russell  writes:
> Ming Lei  writes:
>> On Tue, 02 Sep 2014 10:24:24 -0600
>> Jens Axboe  wrote:
>>
>>> On 09/02/2014 10:21 AM, Christoph Hellwig wrote:
>>> > Btw, one thing we should reconsider is where we set
>>> > QUEUE_FLAG_NO_SG_MERGE.  At least for virtio-blk it seems to me that
>>> > doing the S/G merge should be a lot cheaper than fanning out into the
>>> > indirect descriptors.
>>
>> Indirect is always considered first no matter whether SG merge is off or
>> on, at least in the current virtio-blk implementation.
>>
>> But it is a good idea to try direct descriptors first; the simple change
>> below improves randread (libaio, O_DIRECT, multi-queue) by 7% in my test,
>> and 77% of transfers start to use direct descriptors, whereas almost all
>> transfers use indirect descriptors in the current upstream implementation.
>
> Hi Ming!
>
> In general, we want to use direct descriptors if we have plenty
> of descriptors, and indirect if the ring is going to fill up.  I was
> thinking about this just yesterday, in fact.
>
> I've been trying to use EWMA to figure out how full the ring gets, but
> so far it's not working well.  I'm still hacking on a solution though,
> and your thoughts would be welcome.

Here's what I have.  It seems to work as expected, but I haven't
benchmarked it.

Subject: virtio_ring: try to use direct descriptors when we're not likely to 
fill ring

Indirect virtio descriptors allow us to use a single ring entry for a
large scatter-gather list, at the cost of a kmalloc.  If our ring
isn't heavily used, there's no point preserving descriptors.

This patch tracks the maximum number of descriptors in the ring, with
a slow decay.  When we add a new buffer, we assume there will be that
maximum number of descriptors, and use a direct buffer if there would
be room for that many descriptors of this size.

Signed-off-by: Rusty Russell 

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6d2b5310c991..2ff583477139 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -78,6 +78,11 @@ struct vring_virtqueue
/* Number we've added since last sync. */
unsigned int num_added;
 
+   /* How many descriptors have been added. */
+   unsigned int num_in_use;
+   /* Maximum descriptors in use (degrades over time). */
+   unsigned int max_in_use;
+
/* Last used index we've seen. */
u16 last_used_idx;
 
@@ -120,6 +125,31 @@ static struct vring_desc *alloc_indirect(unsigned int 
total_sg, gfp_t gfp)
return desc;
 }
 
+static bool try_indirect(struct vring_virtqueue *vq, unsigned int total_sg)
+{
+   unsigned long num_expected;
+
+   if (!vq->indirect)
+   return false;
+
+   /* Completely full?  Don't even bother with indirect alloc */
+   if (!vq->vq.num_free)
+   return false;
+
+   /* We're not going to fit?  Try indirect. */
+   if (total_sg > vq->vq.num_free)
+   return true;
+
+   /* We should be tracking this. */
+   BUG_ON(vq->max_in_use < vq->num_in_use);
+
+   /* How many more descriptors do we expect at peak usage? */
+   num_expected = vq->max_in_use - vq->num_in_use;
+
+   /* If each were this size, would they overflow? */
+   return (total_sg * num_expected > vq->vq.num_free);
+}
+
 static inline int virtqueue_add(struct virtqueue *_vq,
struct scatterlist *sgs[],
unsigned int total_sg,
@@ -162,9 +192,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 
head = vq->free_head;
 
-   /* If the host supports indirect descriptor tables, and we have multiple
-* buffers, then go indirect. FIXME: tune this threshold */
-   if (vq->indirect && total_sg > 1 && vq->vq.num_free)
+   if (try_indirect(vq, total_sg))
desc = alloc_indirect(total_sg, gfp);
else
desc = NULL;
@@ -243,6 +271,14 @@ static inline int virtqueue_add(struct virtqueue *_vq,
virtio_wmb(vq->weak_barriers);
vq->vring.avail->idx++;
vq->num_added++;
+   vq->num_in_use++;
+
+   /* Every vq->vring.num descriptors, decay the maximum value */
+   if (unlikely(avail == 0))
+   vq->max_in_use >>= 1;
+
+   if (vq->num_in_use > vq->max_in_use)
+   vq->max_in_use = vq->num_in_use;
 
/* This is very unlikely, but theoretically possible.  Kick
 * just in case. */
@@ -515,6 +551,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int 
*len)
virtio_mb(vq->weak_barriers);
}
 
+   vq->num_in_use--;
 #ifdef DEBUG
vq->last_add_time_valid = false;
 #endif
@@ -737,6 +774,8 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
vq->last_used_idx = 0;
vq->num_added = 0;
list_add_tail(>vq.list, >vqs);
+   vq->num_in_use = 0;
+   vq->max_in_use = 0;
 #ifdef DEBUG
vq->in_use = false;

Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-05 Thread Ming Lei
On Fri, Sep 5, 2014 at 2:26 PM, Ming Lei  wrote:
> On Fri, Sep 5, 2014 at 1:43 PM, Rusty Russell  wrote:
>
> The main problem is the extra kmalloc(), which might be
> improved by a memory pool.

Or use some kind of EWMA model to cache previously allocated indirect descriptors.


Thanks,


Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-05 Thread Ming Lei
On Fri, Sep 5, 2014 at 1:43 PM, Rusty Russell  wrote:
> Ming Lei  writes:
>> On Tue, 02 Sep 2014 10:24:24 -0600
>> Jens Axboe  wrote:
>>
>>> On 09/02/2014 10:21 AM, Christoph Hellwig wrote:
>>> > Btw, one thing we should reconsider is where we set
>>> > QUEUE_FLAG_NO_SG_MERGE.  At least for virtio-blk it seems to me that
>>> > doing the S/G merge should be a lot cheaper than fanning out into the
>>> > indirect descriptors.
>>
>> Indirect is always considered first no matter whether SG merge is off or
>> on, at least in the current virtio-blk implementation.
>>
>> But it is a good idea to try direct descriptors first; the simple change
>> below improves randread (libaio, O_DIRECT, multi-queue) by 7% in my test,
>> and 77% of transfers start to use direct descriptors, whereas almost all
>> transfers use indirect descriptors in the current upstream implementation.
>
> Hi Ming!
>
> In general, we want to use direct descriptors if we have plenty
> of descriptors, and indirect if the ring is going to fill up.  I was
> thinking about this just yesterday, in fact.

I thought about the idea further, and it looks like it isn't mature, at least
for virtio-blk:

- the queue size is a bit small, for example 128 returned from QEMU
- so it is easy to exhaust all direct descriptors, and the queue has to be
stopped

My previous test was based on null_blk, which is quite fast, so the above
problem didn't show up.

IMO, there are at least two advantages to using indirect descriptors:

- the queue won't be stopped because there are enough descriptors (this may
not be true for other virtio devices; for virtio-net, rx traffic is
difficult to predict)
- good cache utilization because all descriptors are put together

The main problem is the extra kmalloc(), which might be
improved by a memory pool.

>
> I've been trying to use EWMA to figure out how full the ring gets, but

Shouldn't how full the ring gets be figured out from num_free?

> so far it's not working well.  I'm still hacking on a solution though,
> and your thoughts would be welcome.

I am wondering if it is easy to predict how many transfers will be coming
with some mathematical model. My concern is that the cost of stopping the
queue may overwhelm the advantage of using direct descriptors.


Thanks,


Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-05 Thread Ming Lei
On Fri, Sep 5, 2014 at 1:43 PM, Rusty Russell ru...@rustcorp.com.au wrote:
 Ming Lei ming@canonical.com writes:
 On Tue, 02 Sep 2014 10:24:24 -0600
 Jens Axboe ax...@kernel.dk wrote:

 On 09/02/2014 10:21 AM, Christoph Hellwig wrote:
  Btw, one thing we should reconsider is where we set
  QUEUE_FLAG_NO_SG_MERGE.  At least for virtio-blk it seems to me that
  doing the S/G merge should be a lot cheaper than fanning out into the
  indirect descriptors.

 Indirect is always considered first no matter SG merge is off or on,
 at least from current virtio-blk implementation.

 But it is a good idea to try direct descriptor first, the below simple
 change can improve randread(libaio, O_DIRECT, multi-queue) 7% in my test,
 and 77% transfer starts to use direct descriptor, and almost all transfer
 uses indirect descriptor only in current upstream implementation.

 Hi Ming!

 In general, we want to use direct descriptors of we have plenty
 of descriptors, and indirect if the ring is going to fill up.  I was
 thinking about this just yesterday, in fact.

I thought about the idea further, and it looks like it isn't mature, at
least for virtio-blk:

- the queue size is a bit small (for example, 128 as returned from QEMU),
so it is easy to exhaust all direct descriptors, and the queue has to be
stopped

My previous test was based on null_blk, which is quite fast, so it didn't
hit the above problem.

IMO, there are at least two advantages to using indirect descriptors:

- the queue won't be stopped, because there are enough descriptors (this
may not be true for other virtio devices; for virtio-net, rx traffic is
difficult to predict)
- good cache utilization, because all descriptors are put together

The main problem is the extra kmalloc(), which might be
improved by a memory pool.
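For what it's worth, the pooling idea could look roughly like this in user
space (a purely hypothetical sketch, not kernel code; a real version would
presumably use a kmem_cache or a per-virtqueue freelist, and the pool size
here is an invented value):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: cache the last few indirect descriptor tables
 * instead of calling malloc()/free() on every request. */
#define POOL_SIZE 4

struct desc_pool {
	void *slot[POOL_SIZE];
	size_t len[POOL_SIZE];
	int count;
};

static void *pool_alloc(struct desc_pool *p, size_t len)
{
	for (int i = 0; i < p->count; i++) {
		if (p->len[i] >= len) {	/* reuse a big-enough cached table */
			void *buf = p->slot[i];
			p->slot[i] = p->slot[--p->count];
			p->len[i] = p->len[p->count];
			return buf;
		}
	}
	return malloc(len);	/* fall back to a fresh allocation */
}

static void pool_free(struct desc_pool *p, void *buf, size_t len)
{
	if (p->count < POOL_SIZE) {	/* keep the table for reuse */
		p->slot[p->count] = buf;
		p->len[p->count++] = len;
	} else {
		free(buf);
	}
}
```

The point is only that the kmalloc() per indirect table can be amortized;
the eviction and sizing policy would need real tuning.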


> I've been trying to use EWMA to figure out how full the ring gets, but

Couldn't how full the ring gets be figured out from num_free?

> so far it's not working well.  I'm still hacking on a solution though,
> and your thoughts would be welcome.

I am wondering if it is easy to predict how many transfers will be coming
with some mathematical model. My concern is that the cost of stopping
the queue may outweigh the advantage of using direct descriptors.


Thanks,


Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-05 Thread Ming Lei
On Fri, Sep 5, 2014 at 2:26 PM, Ming Lei ming@canonical.com wrote:
> On Fri, Sep 5, 2014 at 1:43 PM, Rusty Russell ru...@rustcorp.com.au wrote:
>
> The main problem is the extra kmalloc(), which might be
> improved by a memory pool.

Or use some kind of EWMA model to cache previously allocated indirect
descriptors.
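As an illustration only, a fixed-point EWMA of ring occupancy might look
like this (the 1/8 weight and the 3-bit shift are invented values, not
anything taken from the virtio code):

```c
#include <assert.h>

/* Hypothetical sketch of the EWMA idea: keep a fixed-point running
 * average of descriptors in flight, updated on each add. */
#define EWMA_SHIFT 3	/* fixed point: stored value is avg << EWMA_SHIFT */

static unsigned int ewma_update(unsigned int avg_fp, unsigned int sample)
{
	/* avg += (sample - avg) / 8, done in fixed point */
	avg_fp -= avg_fp >> EWMA_SHIFT;
	avg_fp += sample;
	return avg_fp;
}

static unsigned int ewma_read(unsigned int avg_fp)
{
	return avg_fp >> EWMA_SHIFT;
}
```

With a steady stream of equal-sized requests the estimate converges to the
sample size, while bursts decay away over a few updates.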


Thanks,


Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-05 Thread Rusty Russell
Rusty Russell ru...@rustcorp.com.au writes:
> Ming Lei ming@canonical.com writes:
>> On Tue, 02 Sep 2014 10:24:24 -0600
>> Jens Axboe ax...@kernel.dk wrote:
>>
>>> On 09/02/2014 10:21 AM, Christoph Hellwig wrote:
>>>> Btw, one thing we should reconsider is where we set
>>>> QUEUE_FLAG_NO_SG_MERGE.  At least for virtio-blk it seems to me that
>>>> doing the S/G merge should be a lot cheaper than fanning out into the
>>>> indirect descriptors.
>>
>> Indirect is always considered first no matter SG merge is off or on,
>> at least in the current virtio-blk implementation.
>>
>> But it is a good idea to try direct descriptors first; the simple change
>> below improves randread (libaio, O_DIRECT, multi-queue) by 7% in my test,
>> and 77% of transfers start to use direct descriptors, whereas almost all
>> transfers use indirect descriptors in the current upstream implementation.
>
> Hi Ming!
>
> In general, we want to use direct descriptors if we have plenty
> of descriptors, and indirect if the ring is going to fill up.  I was
> thinking about this just yesterday, in fact.
>
> I've been trying to use EWMA to figure out how full the ring gets, but
> so far it's not working well.  I'm still hacking on a solution though,
> and your thoughts would be welcome.

Here's what I have.  It seems to work as expected, but I haven't
benchmarked it.

Subject: virtio_ring: try to use direct descriptors when we're not likely to fill ring

Indirect virtio descriptors allow us to use a single ring entry for a
large scatter-gather list, at the cost of a kmalloc.  If our ring
isn't heavily used, there's no point preserving descriptors.

This patch tracks the maximum number of descriptors in the ring, with
a slow decay.  When we add a new buffer, we assume there will be that
maximum number of descriptors, and use a direct buffer if there would
be room for that many descriptors of this size.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6d2b5310c991..2ff583477139 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -78,6 +78,11 @@ struct vring_virtqueue
 	/* Number we've added since last sync. */
 	unsigned int num_added;
 
+	/* How many descriptors have been added. */
+	unsigned int num_in_use;
+	/* Maximum descriptors in use (degrades over time). */
+	unsigned int max_in_use;
+
 	/* Last used index we've seen. */
 	u16 last_used_idx;
 
@@ -120,6 +125,31 @@ static struct vring_desc *alloc_indirect(unsigned int total_sg, gfp_t gfp)
 	return desc;
 }
 
+static bool try_indirect(struct vring_virtqueue *vq, unsigned int total_sg)
+{
+	unsigned long num_expected;
+
+	if (!vq->indirect)
+		return false;
+
+	/* Completely full?  Don't even bother with indirect alloc */
+	if (!vq->vq.num_free)
+		return false;
+
+	/* We're not going to fit?  Try indirect. */
+	if (total_sg > vq->vq.num_free)
+		return true;
+
+	/* We should be tracking this. */
+	BUG_ON(vq->max_in_use < vq->num_in_use);
+
+	/* How many more descriptors do we expect at peak usage? */
+	num_expected = vq->max_in_use - vq->num_in_use;
+
+	/* If each were this size, would they overflow? */
+	return (total_sg * num_expected > vq->vq.num_free);
+}
+
 static inline int virtqueue_add(struct virtqueue *_vq,
 				struct scatterlist *sgs[],
 				unsigned int total_sg,
@@ -162,9 +192,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 
 	head = vq->free_head;
 
-	/* If the host supports indirect descriptor tables, and we have multiple
-	 * buffers, then go indirect. FIXME: tune this threshold */
-	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
+	if (try_indirect(vq, total_sg))
 		desc = alloc_indirect(total_sg, gfp);
 	else
 		desc = NULL;
@@ -243,6 +271,14 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	virtio_wmb(vq->weak_barriers);
 	vq->vring.avail->idx++;
 	vq->num_added++;
+	vq->num_in_use++;
+
+	/* Every vq->vring.num descriptors, decay the maximum value */
+	if (unlikely(avail == 0))
+		vq->max_in_use >>= 1;
+
+	if (vq->num_in_use > vq->max_in_use)
+		vq->max_in_use = vq->num_in_use;
 
 	/* This is very unlikely, but theoretically possible.  Kick
 	 * just in case. */
@@ -515,6 +551,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 		virtio_mb(vq->weak_barriers);
 	}
 
+	vq->num_in_use--;
 #ifdef DEBUG
 	vq->last_add_time_valid = false;
 #endif
@@ -737,6 +774,8 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 	vq->last_used_idx = 0;
 	vq->num_added = 0;
 	list_add_tail(&vq->vq.list, &vdev->vqs);
+	vq->num_in_use = 0;
+	vq->max_in_use = 0;
 #ifdef DEBUG
 	vq->in_use = false;
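For reference, the decision logic of the patch can be exercised outside the
kernel; this is a user-space sketch mirroring try_indirect() (the struct and
field names are simplified stand-ins for struct vring_virtqueue, and the
vq->indirect capability check is folded out):

```c
#include <assert.h>
#include <stdbool.h>

/* User-space sketch of the heuristic: go indirect when the request
 * alone won't fit, or when the expected peak load would overflow. */
struct ring {
	unsigned int num_free;   /* free direct descriptors */
	unsigned int num_in_use; /* descriptors currently occupied */
	unsigned int max_in_use; /* decaying peak occupancy */
};

static bool try_indirect(const struct ring *r, unsigned int total_sg)
{
	unsigned int num_expected;

	/* Completely full?  An indirect table wouldn't fit either. */
	if (!r->num_free)
		return false;

	/* The request alone doesn't fit directly?  Go indirect. */
	if (total_sg > r->num_free)
		return true;

	/* Expected further adds at peak, each of this size: overflow? */
	num_expected = r->max_in_use - r->num_in_use;
	return total_sg * num_expected > r->num_free;
}
```

A lightly used ring thus keeps taking direct descriptors, while a ring whose
decaying peak is close to its size switches to indirect tables.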

Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-04 Thread Rusty Russell
Ming Lei  writes:
> On Tue, 02 Sep 2014 10:24:24 -0600
> Jens Axboe  wrote:
>
>> On 09/02/2014 10:21 AM, Christoph Hellwig wrote:
>> > Btw, one thing we should reconsider is where we set
>> > QUEUE_FLAG_NO_SG_MERGE.  At least for virtio-blk it seems to me that
>> > doing the S/G merge should be a lot cheaper than fanning out into the
>> > indirect descriptors.
>
> Indirect is always considered first no matter SG merge is off or on,
> at least from current virtio-blk implementation.
>
> But it is a good idea to try direct descriptors first; the simple change
> below improves randread (libaio, O_DIRECT, multi-queue) by 7% in my test,
> and 77% of transfers start to use direct descriptors, whereas almost all
> transfers use indirect descriptors in the current upstream implementation.

Hi Ming!

In general, we want to use direct descriptors if we have plenty
of descriptors, and indirect if the ring is going to fill up.  I was
thinking about this just yesterday, in fact.

I've been trying to use EWMA to figure out how full the ring gets, but
so far it's not working well.  I'm still hacking on a solution though,
and your thoughts would be welcome.

Cheers,
Rusty.


Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-03 Thread Ming Lei
On Wed, Sep 3, 2014 at 1:01 AM, Jeff Moyer  wrote:
> Jens Axboe  writes:
>
>> On 09/02/2014 09:02 AM, Ming Lei wrote:
>>> QUEUE_FLAG_NO_SG_MERGE is set at default for blk-mq devices,
>>> so bio->bi_phys_segment computed may be bigger than
>>> queue_max_segments(q) for blk-mq devices, then drivers will
>>> fail to handle the case, for example, BUG_ON() in
>>> virtio_queue_rq() can be triggered for virtio-blk:
>>>
>>>  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1359146
>>>
>>> This patch fixes the issue by ignoring the QUEUE_FLAG_NO_SG_MERGE
>>> flag if the computed bio->bi_phys_segment is bigger than
>>> queue_max_segments(q), and the regression is caused by commit
>>> 05f1dd53152173(block: add queue flag for disabling SG merging).
>>>
>>> Reported-by: Kick In 
>>> Tested-by: Chris J Arges 
>>> Signed-off-by: Ming Lei 
>>
>> Thanks Ming, this looks nice and clean. Will apply for 3.17.
>
> This sounds an awful lot like the bug I fixed in dm:

Anyway, the block layer should respect the max segments constraint even
when QUEUE_FLAG_NO_SG_MERGE is set; that is what this patch does.

>
> commit 200612ec33e555a356eebc717630b866ae2b694f
> Author: Jeff Moyer 
> Date:   Fri Aug 8 11:03:41 2014 -0400
>
> dm table: propagate QUEUE_FLAG_NO_SG_MERGE
>
> Commit 05f1dd5 ("block: add queue flag for disabling SG merging")
> introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.  This gets set by
> default in blk_mq_init_queue for mq-enabled devices.  The effect of
> the flag is to bypass the SG segment merging.  Instead, the
> bio->bi_vcnt is used as the number of hardware segments.
>
> With a device mapper target on top of a device with
> QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
> than a driver is prepared to handle.  I ran into this when backporting
> the virtio_blk mq support.  It triggered this BUG_ON, in
> virtio_queue_rq:
>
> BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
>
> ...
>
> Is bcache a stacking driver in this case?  Should it not inherit this
> flag, just as DM now does?

IMO your patch might make things complicated because:

- the max segment constraint is a HW-related thing, so DM/bcache shouldn't
touch the related flag

- it isn't good to have both DM and the underlying device decide whether to
do SG merging; performance may suffer if both do it.

So why not let the driver and the block core handle it? If the driver has a
max segment constraint, the block core just meets that requirement, as this
patch does.

Thanks,


Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-03 Thread Ming Lei
On Wed, Sep 3, 2014 at 12:19 PM, Ming Lei  wrote:
> On Tue, 02 Sep 2014 10:24:24 -0600
> Jens Axboe  wrote:
>
>> On 09/02/2014 10:21 AM, Christoph Hellwig wrote:
>> > Btw, one thing we should reconsider is where we set
>> > QUEUE_FLAG_NO_SG_MERGE.  At least for virtio-blk it seems to me that
>> > doing the S/G merge should be a lot cheaper than fanning out into the
>> > indirect descriptors.
>
> Indirect is always considered first no matter SG merge is off or on,
> at least from current virtio-blk implementation.
>
> But it is a good idea to try direct descriptors first; the simple change
> below improves randread (libaio, O_DIRECT, multi-queue) by 7% in my test,
> and 77% of transfers start to use direct descriptors, whereas almost all
> transfers use indirect descriptors in the current upstream implementation.
>
> From 8803341503dedab282c15f2801fdb6d7420cea6f Mon Sep 17 00:00:00 2001
> From: Ming Lei 
> Date: Wed, 3 Sep 2014 10:42:32 +0800
> Subject: [PATCH] virtio_ring: use direct descriptor as far as possible
>
> Direct descriptor can avoid one kmalloc() and should have
> better cache utilization, so try to use it if there are enough
> free direct descriptors.
>
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Ming Lei 
> ---
>  drivers/virtio/virtio_ring.c |   10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 4d08f45a..c76b7a4 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -224,9 +224,13 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>
> total_sg = total_in + total_out;
>
> -   /* If the host supports indirect descriptor tables, and we have multiple
> -* buffers, then go indirect. FIXME: tune this threshold */
> -   if (vq->indirect && total_sg > 1 && vq->vq.num_free) {
> +   /*
> +* If the host supports indirect descriptor tables, and we have
> +* multiple buffers, then go indirect only if free descriptors
> +* are less than 16.
> +*/
> +   if (vq->indirect && total_sg > 1 && vq->vq.num_free &&
> +   vq->vq.num_free < 16) {

Oops, the condition above should have been the following:

if (vq->indirect && total_sg > 1 && vq->vq.num_free &&
(vq->vq.num_free < 16 ||
 (vq->vq.num_free < total_sg))) {
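Read as a standalone predicate, the corrected condition is (a sketch; the
threshold of 16 is the ad-hoc value from the mail above, not a tuned
constant):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the corrected check: only go indirect when the host
 * supports it, the request spans multiple buffers, and the ring is
 * either nearly exhausted or too small for this request. */
static bool go_indirect(bool indirect, unsigned int total_sg,
			unsigned int num_free)
{
	return indirect && total_sg > 1 && num_free &&
	       (num_free < 16 || num_free < total_sg);
}
```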


Thanks,


Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-02 Thread Ming Lei
On Tue, 02 Sep 2014 10:24:24 -0600
Jens Axboe  wrote:

> On 09/02/2014 10:21 AM, Christoph Hellwig wrote:
> > Btw, one thing we should reconsider is where we set
> > QUEUE_FLAG_NO_SG_MERGE.  At least for virtio-blk it seems to me that
> > doing the S/G merge should be a lot cheaper than fanning out into the
> > indirect descriptors.

Indirect is always considered first no matter whether SG merge is off or on,
at least in the current virtio-blk implementation.

But it is a good idea to try direct descriptors first; the simple change
below improves randread (libaio, O_DIRECT, multi-queue) by 7% in my test,
and 77% of transfers start to use direct descriptors, whereas almost all
transfers use indirect descriptors in the current upstream implementation.

From 8803341503dedab282c15f2801fdb6d7420cea6f Mon Sep 17 00:00:00 2001
From: Ming Lei 
Date: Wed, 3 Sep 2014 10:42:32 +0800
Subject: [PATCH] virtio_ring: use direct descriptor as far as possible

Direct descriptor can avoid one kmalloc() and should have
better cache utilization, so try to use it if there are enough
free direct descriptors.

Suggested-by: Christoph Hellwig 
Signed-off-by: Ming Lei 
---
 drivers/virtio/virtio_ring.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4d08f45a..c76b7a4 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -224,9 +224,13 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 
total_sg = total_in + total_out;
 
-   /* If the host supports indirect descriptor tables, and we have multiple
-* buffers, then go indirect. FIXME: tune this threshold */
-   if (vq->indirect && total_sg > 1 && vq->vq.num_free) {
+   /*
+* If the host supports indirect descriptor tables, and we have
+* multiple buffers, then go indirect only if free descriptors
+* are less than 16.
+*/
+   if (vq->indirect && total_sg > 1 && vq->vq.num_free &&
+   vq->vq.num_free < 16) {
head = vring_add_indirect(vq, sgs, next, total_sg, total_out,
  total_in,
  out_sgs, in_sgs, gfp);
-- 
1.7.9.5

 

> 
> And we might consider flipping the default, as most would probably like
> to have the sg merging on.
>

With this patch, I think SG merge can be kept off by default if there is
no extra cost introduced for handling more segments in the driver or HW.

If SG merge is turned on in my virtio-blk fio test above, a small
throughput drop can be observed.


Thanks,
-- 
Ming Lei


Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-02 Thread Jeff Moyer
Jens Axboe  writes:

> On 09/02/2014 09:02 AM, Ming Lei wrote:
>> QUEUE_FLAG_NO_SG_MERGE is set at default for blk-mq devices,
>> so bio->bi_phys_segment computed may be bigger than
>> queue_max_segments(q) for blk-mq devices, then drivers will
>> fail to handle the case, for example, BUG_ON() in
>> virtio_queue_rq() can be triggered for virtio-blk:
>> 
>>  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1359146
>> 
>> This patch fixes the issue by ignoring the QUEUE_FLAG_NO_SG_MERGE
>> flag if the computed bio->bi_phys_segment is bigger than
>> queue_max_segments(q), and the regression is caused by commit
>> 05f1dd53152173(block: add queue flag for disabling SG merging).
>> 
>> Reported-by: Kick In 
>> Tested-by: Chris J Arges 
>> Signed-off-by: Ming Lei 
>
> Thanks Ming, this looks nice and clean. Will apply for 3.17.

This sounds an awful lot like the bug I fixed in dm:

commit 200612ec33e555a356eebc717630b866ae2b694f
Author: Jeff Moyer 
Date:   Fri Aug 8 11:03:41 2014 -0400

dm table: propagate QUEUE_FLAG_NO_SG_MERGE

Commit 05f1dd5 ("block: add queue flag for disabling SG merging")
introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.  This gets set by
default in blk_mq_init_queue for mq-enabled devices.  The effect of
the flag is to bypass the SG segment merging.  Instead, the
bio->bi_vcnt is used as the number of hardware segments.

With a device mapper target on top of a device with
QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
than a driver is prepared to handle.  I ran into this when backporting
the virtio_blk mq support.  It triggered this BUG_ON, in
virtio_queue_rq:

BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);

...

Is bcache a stacking driver in this case?  Should it not inherit this
flag, just as DM now does?

-Jeff


Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-02 Thread Jens Axboe
On 09/02/2014 10:21 AM, Christoph Hellwig wrote:
> Btw, one thing we should reconsider is where we set
> QUEUE_FLAG_NO_SG_MERGE.  At least for virtio-blk it seems to me that
> doing the S/G merge should be a lot cheaper than fanning out into the
> indirect descriptors.

And we might consider flipping the default, as most would probably like
to have the sg merging on.

-- 
Jens Axboe



Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-02 Thread Jens Axboe
On 09/02/2014 09:02 AM, Ming Lei wrote:
> QUEUE_FLAG_NO_SG_MERGE is set at default for blk-mq devices,
> so bio->bi_phys_segment computed may be bigger than
> queue_max_segments(q) for blk-mq devices, then drivers will
> fail to handle the case, for example, BUG_ON() in
> virtio_queue_rq() can be triggerd for virtio-blk:
> 
>   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1359146
> 
> This patch fixes the issue by ignoring the QUEUE_FLAG_NO_SG_MERGE
> flag if the computed bio->bi_phys_segment is bigger than
> queue_max_segments(q), and the regression is caused by commit
> 05f1dd53152173(block: add queue flag for disabling SG merging).
> 
> Reported-by: Kick In 
> Tested-by: Chris J Arges 
> Signed-off-by: Ming Lei 

Thanks Ming, this looks nice and clean. Will apply for 3.17.

-- 
Jens Axboe



Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-02 Thread Christoph Hellwig
Btw, one thing we should reconsider is where we set
QUEUE_FLAG_NO_SG_MERGE.  At least for virtio-blk it seems to me that
doing the S/G merge should be a lot cheaper than fanning out into the
indirect descriptors.



[PATCH] blk-merge: fix blk_recount_segments

2014-09-02 Thread Ming Lei
QUEUE_FLAG_NO_SG_MERGE is set by default for blk-mq devices,
so the computed bio->bi_phys_segments may be bigger than
queue_max_segments(q) for blk-mq devices; drivers will then
fail to handle that case. For example, the BUG_ON() in
virtio_queue_rq() can be triggered for virtio-blk:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1359146

This patch fixes the issue by ignoring the QUEUE_FLAG_NO_SG_MERGE
flag if the computed bio->bi_phys_segments is bigger than
queue_max_segments(q). The regression was caused by commit
05f1dd53152173 ("block: add queue flag for disabling SG merging").

Reported-by: Kick In 
Tested-by: Chris J Arges 
Signed-off-by: Ming Lei 
---
 block/blk-merge.c |   17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 5453583..7788179 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -10,10 +10,11 @@
 #include "blk.h"
 
 static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
-struct bio *bio)
+struct bio *bio,
+bool no_sg_merge)
 {
struct bio_vec bv, bvprv = { NULL };
-   int cluster, high, highprv = 1, no_sg_merge;
+   int cluster, high, highprv = 1;
unsigned int seg_size, nr_phys_segs;
struct bio *fbio, *bbio;
struct bvec_iter iter;
@@ -35,7 +36,6 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
cluster = blk_queue_cluster(q);
seg_size = 0;
nr_phys_segs = 0;
-   no_sg_merge = test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags);
high = 0;
for_each_bio(bio) {
bio_for_each_segment(bv, bio, iter) {
@@ -88,18 +88,23 @@ new_segment:
 
 void blk_recalc_rq_segments(struct request *rq)
 {
-   rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio);
+   bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE,
+   &rq->q->queue_flags);
+
+   rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio,
+   no_sg_merge);
 }
 
 void blk_recount_segments(struct request_queue *q, struct bio *bio)
 {
-   if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
+   if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
+   bio->bi_vcnt < queue_max_segments(q))
bio->bi_phys_segments = bio->bi_vcnt;
else {
struct bio *nxt = bio->bi_next;
 
bio->bi_next = NULL;
-   bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio);
+   bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio, false);
bio->bi_next = nxt;
}
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] blk-merge: fix blk_recount_segments

2014-09-02 Thread Ming Lei
QUEUE_FLAG_NO_SG_MERGE is set at default for blk-mq devices,
so bio-bi_phys_segment computed may be bigger than
queue_max_segments(q) for blk-mq devices, then drivers will
fail to handle the case, for example, BUG_ON() in
virtio_queue_rq() can be triggerd for virtio-blk:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1359146

This patch fixes the issue by ignoring the QUEUE_FLAG_NO_SG_MERGE
flag if the computed bio-bi_phys_segment is bigger than
queue_max_segments(q), and the regression is caused by commit
05f1dd53152173(block: add queue flag for disabling SG merging).

Reported-by: Kick In pierre-andre.mo...@canonical.com
Tested-by: Chris J Arges chris.j.ar...@canonical.com
Signed-off-by: Ming Lei ming@canonical.com
---
 block/blk-merge.c |   17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 5453583..7788179 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -10,10 +10,11 @@
 #include "blk.h"
 
 static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
-					     struct bio *bio)
+					     struct bio *bio,
+					     bool no_sg_merge)
 {
 	struct bio_vec bv, bvprv = { NULL };
-	int cluster, high, highprv = 1, no_sg_merge;
+	int cluster, high, highprv = 1;
 	unsigned int seg_size, nr_phys_segs;
 	struct bio *fbio, *bbio;
 	struct bvec_iter iter;
@@ -35,7 +36,6 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	cluster = blk_queue_cluster(q);
 	seg_size = 0;
 	nr_phys_segs = 0;
-	no_sg_merge = test_bit(QUEUE_FLAG_NO_SG_MERGE, q->queue_flags);
 	high = 0;
 	for_each_bio(bio) {
 		bio_for_each_segment(bv, bio, iter) {
@@ -88,18 +88,23 @@ new_segment:
 
 void blk_recalc_rq_segments(struct request *rq)
 {
-	rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio);
+	bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE,
+				      rq->q->queue_flags);
+
+	rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio,
+							no_sg_merge);
 }
 
 void blk_recount_segments(struct request_queue *q, struct bio *bio)
 {
-	if (test_bit(QUEUE_FLAG_NO_SG_MERGE, q->queue_flags))
+	if (test_bit(QUEUE_FLAG_NO_SG_MERGE, q->queue_flags) &&
+	    bio->bi_vcnt < queue_max_segments(q))
 		bio->bi_phys_segments = bio->bi_vcnt;
 	else {
 		struct bio *nxt = bio->bi_next;
 
 		bio->bi_next = NULL;
-		bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio);
+		bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio, false);
 		bio->bi_next = nxt;
 	}
 
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-02 Thread Christoph Hellwig
Btw, one thing we should reconsider is where we set
QUEUE_FLAG_NO_SG_MERGE.  At least for virtio-blk it seems to me that
doing the S/G merge should be a lot cheaper than fanning out into the
indirect descriptors.



Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-02 Thread Jens Axboe
On 09/02/2014 09:02 AM, Ming Lei wrote:
> QUEUE_FLAG_NO_SG_MERGE is set at default for blk-mq devices,
> so bio->bi_phys_segments computed may be bigger than
> queue_max_segments(q) for blk-mq devices, then drivers will
> fail to handle the case, for example, BUG_ON() in
> virtio_queue_rq() can be triggered for virtio-blk:
> 
>   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1359146
> 
> This patch fixes the issue by ignoring the QUEUE_FLAG_NO_SG_MERGE
> flag if the computed bio->bi_phys_segments is bigger than
> queue_max_segments(q), and the regression is caused by commit
> 05f1dd53152173 ("block: add queue flag for disabling SG merging").
> 
> Reported-by: Kick In pierre-andre.mo...@canonical.com
> Tested-by: Chris J Arges chris.j.ar...@canonical.com
> Signed-off-by: Ming Lei ming@canonical.com

Thanks Ming, this looks nice and clean. Will apply for 3.17.

-- 
Jens Axboe



Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-02 Thread Jens Axboe
On 09/02/2014 10:21 AM, Christoph Hellwig wrote:
> Btw, one thing we should reconsider is where we set
> QUEUE_FLAG_NO_SG_MERGE.  At least for virtio-blk it seems to me that
> doing the S/G merge should be a lot cheaper than fanning out into the
> indirect descriptors.

And we might consider flipping the default, as most would probably like
to have the sg merging on.

-- 
Jens Axboe



Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-02 Thread Jeff Moyer
Jens Axboe ax...@kernel.dk writes:

> On 09/02/2014 09:02 AM, Ming Lei wrote:
>> QUEUE_FLAG_NO_SG_MERGE is set at default for blk-mq devices,
>> so bio->bi_phys_segments computed may be bigger than
>> queue_max_segments(q) for blk-mq devices, then drivers will
>> fail to handle the case, for example, BUG_ON() in
>> virtio_queue_rq() can be triggered for virtio-blk:
>> 
>>   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1359146
>> 
>> This patch fixes the issue by ignoring the QUEUE_FLAG_NO_SG_MERGE
>> flag if the computed bio->bi_phys_segments is bigger than
>> queue_max_segments(q), and the regression is caused by commit
>> 05f1dd53152173 ("block: add queue flag for disabling SG merging").
>> 
>> Reported-by: Kick In pierre-andre.mo...@canonical.com
>> Tested-by: Chris J Arges chris.j.ar...@canonical.com
>> Signed-off-by: Ming Lei ming@canonical.com
>
> Thanks Ming, this looks nice and clean. Will apply for 3.17.

This sounds an awful lot like the bug I fixed in dm:

commit 200612ec33e555a356eebc717630b866ae2b694f
Author: Jeff Moyer jmo...@redhat.com
Date:   Fri Aug 8 11:03:41 2014 -0400

dm table: propagate QUEUE_FLAG_NO_SG_MERGE

Commit 05f1dd5 (block: add queue flag for disabling SG merging)
introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.  This gets set by
default in blk_mq_init_queue for mq-enabled devices.  The effect of
the flag is to bypass the SG segment merging.  Instead, the
bio->bi_vcnt is used as the number of hardware segments.

With a device mapper target on top of a device with
QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
than a driver is prepared to handle.  I ran into this when backporting
the virtio_blk mq support.  It triggered this BUG_ON in
virtio_queue_rq:

BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);

...

Is bcache a stacking driver in this case?  Should it not inherit this
flag, just as DM now does?

-Jeff


Re: [PATCH] blk-merge: fix blk_recount_segments

2014-09-02 Thread Ming Lei
On Tue, 02 Sep 2014 10:24:24 -0600
Jens Axboe ax...@kernel.dk wrote:

> On 09/02/2014 10:21 AM, Christoph Hellwig wrote:
>> Btw, one thing we should reconsider is where we set
>> QUEUE_FLAG_NO_SG_MERGE.  At least for virtio-blk it seems to me that
>> doing the S/G merge should be a lot cheaper than fanning out into the
>> indirect descriptors.

Indirect descriptors are always considered first no matter whether SG merge
is off or on, at least in the current virtio-blk implementation.

But it is a good idea to try direct descriptors first: the simple change
below improves randread (libaio, O_DIRECT, multi-queue) by 7% in my test,
with 77% of transfers starting to use direct descriptors, whereas almost
all transfers use indirect descriptors in the current upstream implementation.

From 8803341503dedab282c15f2801fdb6d7420cea6f Mon Sep 17 00:00:00 2001
From: Ming Lei ming@canonical.com
Date: Wed, 3 Sep 2014 10:42:32 +0800
Subject: [PATCH] virtio_ring: use direct descriptor as far as possible

Direct descriptor can avoid one kmalloc() and should have
better cache utilization, so try to use it if there are enough
free direct descriptors.

Suggested-by: Christoph Hellwig h...@lst.de
Signed-off-by: Ming Lei ming@canonical.com
---
 drivers/virtio/virtio_ring.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4d08f45a..c76b7a4 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -224,9 +224,13 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 
 	total_sg = total_in + total_out;
 
-	/* If the host supports indirect descriptor tables, and we have multiple
-	 * buffers, then go indirect. FIXME: tune this threshold */
-	if (vq->indirect && total_sg > 1 && vq->vq.num_free) {
+	/*
+	 * If the host supports indirect descriptor tables, and we have
+	 * multiple buffers, then go indirect only if free descriptors
+	 * are less than 16.
+	 */
+	if (vq->indirect && total_sg > 1 && vq->vq.num_free &&
+	    vq->vq.num_free < 16) {
 		head = vring_add_indirect(vq, sgs, next, total_sg, total_out,
 					  total_in,
 					  out_sgs, in_sgs, gfp);
1.7.9.5

 

 
> And we might consider flipping the default, as most would probably like
> to have the sg merging on.


With this patch, I think SG merge can be kept off by default if there is
no extra cost introduced for handling more segments by the driver or HW.

If SG merge is turned on in my virtio-blk fio test above, a small
throughput drop can be observed.


Thanks,
-- 
Ming Lei