Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver

2014-08-30 Thread Richard W.M. Jones
On Tue, Jul 01, 2014 at 12:11:14PM -0700, Andy Grover wrote:
 Describes the driver and its interface to make it possible for user
 programs to back a LIO-exported LUN.
 
 Signed-off-by: Andy Grover <agro...@redhat.com>
 ---
  Documentation/target/tcmu-design.txt | 210 +++
  1 file changed, 210 insertions(+)
  create mode 100644 Documentation/target/tcmu-design.txt
 
 diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt
 new file mode 100644
 index 0000000..200ff3e
 --- /dev/null
 +++ b/Documentation/target/tcmu-design.txt
 @@ -0,0 +1,210 @@
 +TCM Userspace Design
 +====================
 +
 +
 +Background:
 +
 +In addition to modularizing the transport protocol used for carrying
 +SCSI commands (fabrics), the Linux kernel target, LIO, also modularizes
 +the actual data storage as well. These are referred to as backstores
 +or storage engines. The target comes with backstores that allow a
 +file, a block device, RAM, or another SCSI device to be used for the
 +local storage needed for the exported SCSI LUN. Like the rest of LIO,
 +these are implemented entirely as kernel code.
 +
 +These backstores cover the most common use cases, but not all. One new
 +use case that other non-kernel target solutions, such as tgt, are able
 +to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
 +target then serves as a translator, allowing initiators to store data
 +in these non-traditional networked storage systems, while still only
 +using standard protocols themselves.
 +
 +If the target is a userspace process, supporting these is easy. tgt,
 +for example, needs only a small adapter module for each, because the
 +modules just use the available userspace libraries for RBD and GLFS.
 +
 +Adding support for these backstores in LIO is considerably more
 +difficult, because LIO is entirely kernel code. Instead of undertaking
 +the significant work to port the GLFS or RBD APIs and protocols to the
 +kernel, another approach is to create a userspace pass-through
 +backstore for LIO, TCMU.

It has to be said that this documentation is terrible.

Jumping in medias res[1] is great for fiction, awful for technical
documentation.

I would recommend the Economist Style Guide[2].  They always say
"Barack Obama, President of the United States" the first time he is
mentioned in an article, even though almost everyone knows who Barack
Obama is.

In this case you're leaping into something .. "fabrics", LIO,
"backstores", "target solutions", ... aargh.  Explain what you mean by
each term and how it all fits together.

Thanks,
Rich.

[1] https://en.wikipedia.org/wiki/In_medias_res

[2] http://www.economist.com/styleguide/introduction

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html


Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver

2014-08-31 Thread Richard W.M. Jones
On Sun, Aug 31, 2014 at 12:49:26PM -0700, Andy Grover wrote:
 Thanks for the feedback. I am undoubtedly too close to the details,
 because I thought I *was* explaining things :)

Yeah, sorry it came across as a bit harsh.

Benoit did explain it to me so I understood it in the end (I think!)

 This doc is for people like you -- tech-savvy but unfamiliar with
 this specific area. Would you be so kind as to point out exactly the
 terms this document should explain? Should it explain SCSI and SCSI
 commands? What a SCSI target is? Say "target implementations" rather
 than "target solutions"? Do I need some ASCII art?

So I can only speak for myself here, but I'm pretty familiar with
iSCSI, using it, and some of the internals -- in fact I'm using the
Linux kernel target daily.

 TCM Userspace Design
 In addition to modularizing the transport protocol used for carrying
 SCSI commands (fabrics), the Linux kernel target, LIO, also
 modularizes the actual data storage as well.  These are referred to
 as "backstores" or "storage engines".

Reading this several times, I now think I get what it's trying to say,
but I think it needs to introduce the terms (as the Economist style
does).  Something like this:

  TCM is the new name for LIO, an in-kernel iSCSI target (server).
  Existing TCM targets run in the kernel.  TCMU (TCM in Userspace)
  allows userspace programs to be written which act as iSCSI targets.
  This document describes the design.

  The existing kernel provides modules for different SCSI transport
  protocols.  TCM also modularizes the data storage.  There are
  existing modules for file, block device, RAM or using another SCSI
  device as storage.  These are called "backstores" or "storage
  engines".  These built-in modules are implemented entirely as kernel
  code.

And hopefully having defined a bit of background, the rest of the
document just flows nicely:

 These backstores cover the most common use cases, but not all. One new
 use case that other non-kernel target solutions, such as tgt, are able
 to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
 target then serves as a translator, allowing initiators to store data
 in these non-traditional networked storage systems, while still only
 using standard protocols themselves.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/


Re: [PATCH] [SCSI] virtio-scsi: Initialize scatterlist structure.

2012-10-01 Thread Richard W.M. Jones
On Mon, Oct 01, 2012 at 03:13:01PM +0200, Paolo Bonzini wrote:
 On 20/08/2012 16:05, Paolo Bonzini wrote:
  On 20/08/2012 16:04, Richard W.M. Jones wrote:
  From: Richard W.M. Jones <rjo...@redhat.com>
 
  The sg struct is used without being initialized.
 
  https://bugzilla.redhat.com/show_bug.cgi?id=847548
 
  Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
  ---
   drivers/scsi/virtio_scsi.c |2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
  index c7030fb..8a66f83 100644
  --- a/drivers/scsi/virtio_scsi.c
  +++ b/drivers/scsi/virtio_scsi.c
  @@ -219,7 +219,7 @@ static int virtscsi_kick_event(struct virtio_scsi *vscsi,
 struct scatterlist sg;
 unsigned long flags;
   
  -  sg_set_buf(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
  +  sg_init_one(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
   
   spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
   
 
  
   Acked-by: Paolo Bonzini <pbonz...@redhat.com>
  
  Paolo
  
 
 Ping, is this patch going into 3.7?

I was a bit surprised to see this wasn't upstream since it's vital for
using virtio-scsi reliably (as well as being a one line fix for a
blatant bug).

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora


[PATCH] [SCSI] virtio-scsi: Initialize scatterlist structure.

2012-08-20 Thread Richard W.M. Jones
From: Richard W.M. Jones <rjo...@redhat.com>

The sg struct is used without being initialized.

https://bugzilla.redhat.com/show_bug.cgi?id=847548
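
For reference, sg_init_one() is roughly equivalent to the following
(paraphrasing lib/scatterlist.c; the in-tree helper may differ in
detail):

    void sg_init_one(struct scatterlist *sg, const void *buf,
                     unsigned int buflen)
    {
            sg_init_table(sg, 1);        /* zero the entry, mark it last */
            sg_set_buf(sg, buf, buflen); /* point it at the buffer */
    }

sg_set_buf() alone leaves the on-stack scatterlist's flags and end
marker uninitialized, which is the bug being fixed here.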

Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
---
 drivers/scsi/virtio_scsi.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index c7030fb..8a66f83 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -219,7 +219,7 @@ static int virtscsi_kick_event(struct virtio_scsi *vscsi,
struct scatterlist sg;
unsigned long flags;
 
-   sg_set_buf(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
+   sg_init_one(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
 
	spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
 
-- 
1.7.10.4



Re: Increased memory usage with scsi-mq

2017-08-10 Thread Richard W.M. Jones
OK this is looking a bit better now.

  With scsi-mq enabled:   175 disks
  virtqueue_size=64:      318 disks *
  virtqueue_size=16:      775 disks *
  With scsi-mq disabled: 1755 disks
 * = new results

I also ran the whole libguestfs test suite with virtqueue_size=16
(with no failures shown).  As this tests many different disk I/O
operations, it gives me some confidence that things generally work.

Do you have any other comments about the patches?  I'm not sure I know
enough to write an intelligent commit message for the kernel patch.

Rich.


--- kernel patch ---

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9be211d68b15..d6b4ff634c0d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -972,6 +972,8 @@ static int virtscsi_probe(struct virtio_device *vdev)
if (err)
goto virtscsi_init_failed;
 
+   shost->can_queue = virtqueue_get_vring_size(vscsi->req_vqs[0].vq);
+
cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0xFFFF;
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5e1b548828e6..2d7509da9f39 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -296,7 +296,6 @@ static inline int virtqueue_add(struct virtqueue *_vq,
}
 #endif
 
-   BUG_ON(total_sg > vq->vring.num);
BUG_ON(total_sg == 0);
 
head = vq->free_head;
@@ -305,8 +304,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 * buffers, then go indirect. FIXME: tune this threshold */
if (vq->indirect && total_sg > 1 && vq->vq.num_free)
desc = alloc_indirect(_vq, total_sg, gfp);
-   else
+   else {
desc = NULL;
+   WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
+   }
 
if (desc) {
/* Use a single buffer which doesn't continue */

--- qemu patch ---

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index eb639442d1..aadd99aad1 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -867,10 +867,10 @@ void virtio_scsi_common_realize(DeviceState *dev,
 s->sense_size = VIRTIO_SCSI_SENSE_DEFAULT_SIZE;
 s->cdb_size = VIRTIO_SCSI_CDB_DEFAULT_SIZE;
 
-s->ctrl_vq = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE, ctrl);
-s->event_vq = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE, evt);
+s->ctrl_vq = virtio_add_queue(vdev, s->conf.virtqueue_size, ctrl);
+s->event_vq = virtio_add_queue(vdev, s->conf.virtqueue_size, evt);
 for (i = 0; i < s->conf.num_queues; i++) {
-s->cmd_vqs[i] = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE, cmd);
+s->cmd_vqs[i] = virtio_add_queue(vdev, s->conf.virtqueue_size, cmd);
 }
 }
 
@@ -917,6 +917,7 @@ static void virtio_scsi_device_unrealize(DeviceState *dev, Error **errp)
 
 static Property virtio_scsi_properties[] = {
 DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues, 
1),
+DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSI, 
parent_obj.conf.virtqueue_size, 128),
 DEFINE_PROP_UINT32("max_sectors", VirtIOSCSI, parent_obj.conf.max_sectors,
                       0xFFFF),
 DEFINE_PROP_UINT32("cmd_per_lun", VirtIOSCSI, parent_obj.conf.cmd_per_lun,
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index de6ae5a9f6..e30a92d3e7 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -48,6 +48,7 @@ typedef struct virtio_scsi_config VirtIOSCSIConfig;
 
 struct VirtIOSCSIConf {
 uint32_t num_queues;
+uint32_t virtqueue_size;
 uint32_t max_sectors;
 uint32_t cmd_per_lun;
 #ifdef CONFIG_VHOST_SCSI



-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top


Re: Increased memory usage with scsi-mq

2017-08-10 Thread Richard W.M. Jones
On Thu, Aug 10, 2017 at 02:53:58PM +0200, Paolo Bonzini wrote:
> can_queue and cmd_per_lun are different.  can_queue should be set to the
> value of vq->vring.num where vq is the command virtqueue (the first one
> is okay if there's >1).
> 
> If you want to change it, you'll have to do so in QEMU.

Here are a couple more patches I came up with, the first to Linux,
the second to qemu.

There are a few problems ...

(1) Setting virtqueue_size to < 128 causes a BUG_ON failure in
virtio_ring.c:virtqueue_add at:

BUG_ON(total_sg > vq->vring.num);

I initially thought that I should also set cmd_per_lun to the same
value, but that doesn't help.  Is there some relationship between
virtqueue_size and other settings?

(2) Can/should the ctrl, event and cmd queue sizes be set to the same
values?  Or should ctrl & event be left at 128?

(3) It seems like it might be a problem that virtqueue_size is not
passed through the virtio_scsi_conf struct (which is ABI between the
kernel and qemu AFAICT and so not easily modifiable).  However I think
this is not a problem because virtqueue size is stored in the
Virtqueue Descriptor table, which is how the kernel gets the value.
Is that right?

Rich.


--- kernel patch ---

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9be211d68b15..d6b4ff634c0d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -972,6 +972,8 @@ static int virtscsi_probe(struct virtio_device *vdev)
if (err)
goto virtscsi_init_failed;
 
+   shost->can_queue = virtqueue_get_vring_size(vscsi->req_vqs[0].vq);
+
cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0xFFFF;


--- qemu patch ---

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index eb639442d1..aadd99aad1 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -867,10 +867,10 @@ void virtio_scsi_common_realize(DeviceState *dev,
 s->sense_size = VIRTIO_SCSI_SENSE_DEFAULT_SIZE;
 s->cdb_size = VIRTIO_SCSI_CDB_DEFAULT_SIZE;
 
-s->ctrl_vq = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE, ctrl);
-s->event_vq = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE, evt);
+s->ctrl_vq = virtio_add_queue(vdev, s->conf.virtqueue_size, ctrl);
+s->event_vq = virtio_add_queue(vdev, s->conf.virtqueue_size, evt);
 for (i = 0; i < s->conf.num_queues; i++) {
-s->cmd_vqs[i] = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE, cmd);
+s->cmd_vqs[i] = virtio_add_queue(vdev, s->conf.virtqueue_size, cmd);
 }
 }
 
@@ -917,6 +917,7 @@ static void virtio_scsi_device_unrealize(DeviceState *dev, Error **errp)
 
 static Property virtio_scsi_properties[] = {
 DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues, 
1),
+DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSI, 
parent_obj.conf.virtqueue_size, 128),
 DEFINE_PROP_UINT32("max_sectors", VirtIOSCSI, parent_obj.conf.max_sectors,
                       0xFFFF),
 DEFINE_PROP_UINT32("cmd_per_lun", VirtIOSCSI, parent_obj.conf.cmd_per_lun,
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index de6ae5a9f6..e30a92d3e7 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -48,6 +48,7 @@ typedef struct virtio_scsi_config VirtIOSCSIConfig;
 
 struct VirtIOSCSIConf {
 uint32_t num_queues;
+uint32_t virtqueue_size;
 uint32_t max_sectors;
 uint32_t cmd_per_lun;
 #ifdef CONFIG_VHOST_SCSI


-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html


[PATCH 2/2] virtio: virtio_scsi: Set can_queue to the length of the virtqueue.

2017-08-10 Thread Richard W.M. Jones
Since switching to blk-mq as the default in commit 5c279bd9e406
("scsi: default to scsi-mq"), virtio-scsi LUNs consume about 10x as
much kernel memory.

qemu currently allocates a fixed 128 entry virtqueue.  can_queue
currently is set to 1024.  But with indirect descriptors, each command
in the queue takes 1 virtqueue entry, so the number of commands which
can be queued is equal to the length of the virtqueue.

Note I intend to send a patch to qemu to allow the virtqueue size to
be configured from the qemu command line.

Thanks Paolo Bonzini, Christoph Hellwig.

Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9be211d68b15..d6b4ff634c0d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -972,6 +972,8 @@ static int virtscsi_probe(struct virtio_device *vdev)
if (err)
goto virtscsi_init_failed;
 
+   shost->can_queue = virtqueue_get_vring_size(vscsi->req_vqs[0].vq);
+
cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0xFFFF;
-- 
2.13.1
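
To illustrate the relationship the commit message describes (a sketch
under the stated assumption that indirect descriptors are negotiated;
illustration only, not part of the patch):

    /* Depth a virtqueue can sustain: with VIRTIO_RING_F_INDIRECT_DESC
     * each command occupies one ring slot however many data segments
     * it has; without it, a command with total_sg segments needs
     * total_sg slots. */
    static unsigned int vq_cmds_queueable(struct virtqueue *vq,
                                          unsigned int total_sg,
                                          bool indirect)
    {
            unsigned int num = virtqueue_get_vring_size(vq);

            return indirect ? num : num / total_sg;
    }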



[PATCH v2 0/2] virtio_scsi: Set can_queue based on size of virtqueue.

2017-08-10 Thread Richard W.M. Jones
v1 was here:

  https://lkml.org/lkml/2017/8/10/689

v1 -> v2:

  Remove .can_queue field from the templates.

Rich.



Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-10 Thread Richard W.M. Jones
On Fri, Aug 11, 2017 at 12:21:16AM +0300, Michael S. Tsirkin wrote:
> On Thu, Aug 10, 2017 at 05:40:34PM +0100, Richard W.M. Jones wrote:
> > If using indirect descriptors, you can make the total_sg as large as
> > you want.
> 
> That would be a spec violation though, even if it happens to
> work on current QEMU.
> 
> The spec says:
>   A driver MUST NOT create a descriptor chain longer than the Queue Size 
> of the device.
> 
> What prompted this patch?
> Do we ever encounter this situation?

This patch is needed because the following (2/2) patch will trigger
that BUG_ON if I set virtqueue_size=64 or any smaller value.

The precise backtrace is below.

Rich.

[4.029510] ------------[ cut here ]------------
[4.030127] kernel BUG at drivers/virtio/virtio_ring.c:299!
[4.030834] invalid opcode: 0000 [#1] SMP
[4.031340] Modules linked in: libcrc32c crc8 crc7 crc4 crc_itu_t virtio_pci 
virtio_mmio virtio_input virtio_balloon virtio_scsi nd_pmem nd_btt virtio_net 
virtio_crypto crypto_engine virtio_console virtio_rng virtio_blk virtio_ring 
virtio nfit crc32_generic crct10dif_pclmul crc32c_intel crc32_pclmul
[4.034606] CPU: 0 PID: 1 Comm: init Not tainted 4.13.0-rc4+ #100
[4.035354] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[4.036770] task: 9a859e243300 task.stack: a731c00cc000
[4.037506] RIP: 0010:virtqueue_add_sgs+0x23d/0x460 [virtio_ring]
[4.038250] RSP: :a731c00cf6e0 EFLAGS: 00010097
[4.038898] RAX:  RBX: 0011 RCX: dd0680646c40
[4.039762] RDX:  RSI: a731c00cf7b8 RDI: 9a85945c4d68
[4.040634] RBP: a731c00cf748 R08: 9a85945c4a78 R09: 01080020
[4.041508] R10: a731c00cf788 R11: 9a859b3d3120 R12: a731c00cf7d0
[4.042382] R13: a731c00cf7d0 R14: 0001 R15: 9a859b3f8200
[4.043248] FS:  () GS:9a859e60() 
knlGS:
[4.044232] CS:  0010 DS:  ES:  CR0: 80050033
[4.044942] CR2: 7fcff02e931c CR3: 1d23b000 CR4: 003406f0
[4.045815] DR0:  DR1:  DR2: 
[4.046684] DR3:  DR6: fffe0ff0 DR7: 0400
[4.047559] Call Trace:
[4.047876]  virtscsi_add_cmd+0x1c9/0x280 [virtio_scsi]
[4.048528]  virtscsi_kick_cmd+0x38/0x90 [virtio_scsi]
[4.049161]  virtscsi_queuecommand+0x104/0x280 [virtio_scsi]
[4.049875]  virtscsi_queuecommand_single+0x38/0x40 [virtio_scsi]
[4.050628]  scsi_dispatch_cmd+0xf9/0x390
[4.051128]  scsi_queue_rq+0x5e5/0x6f0
[4.051602]  blk_mq_dispatch_rq_list+0x1ff/0x3c0
[4.052175]  blk_mq_sched_dispatch_requests+0x199/0x1f0
[4.052824]  __blk_mq_run_hw_queue+0x11d/0x1b0
[4.053382]  __blk_mq_delay_run_hw_queue+0x8d/0xa0
[4.053972]  blk_mq_run_hw_queue+0x14/0x20
[4.054485]  blk_mq_sched_insert_requests+0x96/0x120
[4.055095]  blk_mq_flush_plug_list+0x19d/0x410
[4.055661]  blk_flush_plug_list+0xf9/0x2b0
[4.056182]  blk_finish_plug+0x2c/0x40
[4.056655]  __do_page_cache_readahead+0x32e/0x400
[4.057261]  filemap_fault+0x2fb/0x890
[4.057731]  ? filemap_fault+0x2fb/0x890
[4.058220]  ? find_held_lock+0x3c/0xb0
[4.058714]  ext4_filemap_fault+0x34/0x50
[4.059212]  __do_fault+0x1e/0x110
[4.059644]  __handle_mm_fault+0x6b2/0x1080
[4.060167]  handle_mm_fault+0x178/0x350
[4.060662]  __do_page_fault+0x26e/0x510
[4.061152]  trace_do_page_fault+0x9d/0x290
[4.061677]  do_async_page_fault+0x51/0xa0
[4.062189]  async_page_fault+0x28/0x30
[4.062667] RIP: 0033:0x7fcff030a24f
[4.063113] RSP: 002b:7ffefc2ad078 EFLAGS: 00010206
[4.063760] RAX: 7fcff02e931c RBX: 7fcff050f660 RCX: 7fcff02e935c
[4.064648] RDX: 0664 RSI:  RDI: 7fcff02e931c
[4.065519] RBP: 7ffefc2ad320 R08: 7fcff02e931c R09: 00027000
[4.066392] R10: 7fcff02e9980 R11: 0206 R12: 7ffefc2ad0b0
[4.067263] R13: 7ffefc2ad408 R14: 0002 R15: 87f0
[4.068135] Code: af 01 c7 45 c8 01 00 00 00 45 31 ed e9 9b fe ff ff 31 db 
48 83 7d d0 00 0f 85 3f fe ff ff 0f 0b 48 8b 7d b8 e8 e5 fd 22 de eb 8b <0f> 0b 
0f 0b 44 89 6d a8 45 89 f5 48 8b 45 a0 44 8d 70 01 48 83 
[4.070506] RIP: virtqueue_add_sgs+0x23d/0x460 [virtio_ring] RSP: 
a731c00cf6e0
[4.071434] ---[ end trace 02532659840e2a64 ]---


-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org


Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-10 Thread Richard W.M. Jones
On Fri, Aug 11, 2017 at 12:31:44AM +0300, Michael S. Tsirkin wrote:
> Then we probably should fail probe if vq size is too small.

What does this mean?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html


Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-10 Thread Richard W.M. Jones
On Fri, Aug 11, 2017 at 12:41:47AM +0300, Michael S. Tsirkin wrote:
> On Thu, Aug 10, 2017 at 10:35:11PM +0100, Richard W.M. Jones wrote:
> > On Fri, Aug 11, 2017 at 12:31:44AM +0300, Michael S. Tsirkin wrote:
> > > Then we probably should fail probe if vq size is too small.
> > 
> > What does this mean?
> > 
> > Rich.
> 
> We must prevent driver from submitting s/g lists > vq size to device.

Fair point.  I would have thought (not knowing anything about how this
works in the kernel) that the driver should be able to split up
scatter-gather lists if they are larger than what the driver supports?

Anyway the commit message is derived from what Paolo told me on IRC,
so let's see what he says.

Rich.

> Either tell linux to avoid s/g lists that are too long, or
> simply fail request if this happens, or refuse to attach driver to device.
> 
> Later option would look something like this within probe:
> 
> for (i = VIRTIO_SCSI_VQ_BASE; i < num_vqs; i++)
>   if (vqs[i]->num < MAX_SG_USED_BY_LINUX)
>   goto err;
> 
> 
> I don't know what's MAX_SG_USED_BY_LINUX though.
> 
> > -- 
> > Richard Jones, Virtualization Group, Red Hat 
> > http://people.redhat.com/~rjones
> > Read my programming and virtualization blog: http://rwmj.wordpress.com
> > virt-builder quickly builds VMs from scratch
> > http://libguestfs.org/virt-builder.1.html

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html


[PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-10 Thread Richard W.M. Jones
If using indirect descriptors, you can make the total_sg as large as
you want.  If not, BUG is too serious because the function later
returns -ENOSPC.

Thanks Paolo Bonzini, Christoph Hellwig.

Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
---
 drivers/virtio/virtio_ring.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5e1b548828e6..27cbc1eab868 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -296,7 +296,6 @@ static inline int virtqueue_add(struct virtqueue *_vq,
}
 #endif
 
-   BUG_ON(total_sg > vq->vring.num);
BUG_ON(total_sg == 0);
 
head = vq->free_head;
@@ -305,8 +304,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 * buffers, then go indirect. FIXME: tune this threshold */
if (vq->indirect && total_sg > 1 && vq->vq.num_free)
desc = alloc_indirect(_vq, total_sg, gfp);
-   else
+   else {
desc = NULL;
+   WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
+   }
 
if (desc) {
/* Use a single buffer which doesn't continue */
-- 
2.13.1
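
For reference, the graceful failure path that makes the BUG unnecessary
looks roughly like this (paraphrasing virtqueue_add(); the in-tree code
differs in detail):

    if (vq->vq.num_free < descs_used) {
            pr_debug("Can't add buf len %i - avail = %i\n",
                     descs_used, vq->vq.num_free);
            END_USE(vq);
            return -ENOSPC;
    }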



[PATCH 0/2] virtio_scsi: Set can_queue based on size of virtqueue.

2017-08-10 Thread Richard W.M. Jones
Earlier discussion:
https://lkml.org/lkml/2017/8/4/601
"Increased memory usage with scsi-mq"

Downstream bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1478201





[PATCH v2 2/2] virtio: virtio_scsi: Set can_queue to the length of the virtqueue.

2017-08-10 Thread Richard W.M. Jones
Since switching to blk-mq as the default in commit 5c279bd9e406
("scsi: default to scsi-mq"), virtio-scsi LUNs consume about 10x as
much kernel memory.

qemu currently allocates a fixed 128 entry virtqueue.  can_queue
currently is set to 1024.  But with indirect descriptors, each command
in the queue takes 1 virtqueue entry, so the number of commands which
can be queued is equal to the length of the virtqueue.

Note I intend to send a patch to qemu to allow the virtqueue size to
be configured from the qemu command line.

Thanks Paolo Bonzini, Christoph Hellwig.

Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9be211d68b15..7c28e8d4955a 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -818,7 +818,6 @@ static struct scsi_host_template virtscsi_host_template_single = {
.eh_timed_out = virtscsi_eh_timed_out,
.slave_alloc = virtscsi_device_alloc,
 
-   .can_queue = 1024,
.dma_boundary = UINT_MAX,
.use_clustering = ENABLE_CLUSTERING,
.target_alloc = virtscsi_target_alloc,
@@ -839,7 +839,6 @@ static struct scsi_host_template virtscsi_host_template_multi = {
.eh_timed_out = virtscsi_eh_timed_out,
.slave_alloc = virtscsi_device_alloc,
 
-   .can_queue = 1024,
.dma_boundary = UINT_MAX,
.use_clustering = ENABLE_CLUSTERING,
.target_alloc = virtscsi_target_alloc,
@@ -972,6 +970,8 @@ static int virtscsi_probe(struct virtio_device *vdev)
if (err)
goto virtscsi_init_failed;
 
+   shost->can_queue = virtqueue_get_vring_size(vscsi->req_vqs[0].vq);
+
cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0xFFFF;
-- 
2.13.1



[PATCH v2 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-10 Thread Richard W.M. Jones
If using indirect descriptors, you can make the total_sg as large as
you want.  If not, BUG is too serious because the function later
returns -ENOSPC.

Thanks Paolo Bonzini, Christoph Hellwig.

Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
---
 drivers/virtio/virtio_ring.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5e1b548828e6..27cbc1eab868 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -296,7 +296,6 @@ static inline int virtqueue_add(struct virtqueue *_vq,
}
 #endif
 
-   BUG_ON(total_sg > vq->vring.num);
BUG_ON(total_sg == 0);
 
head = vq->free_head;
@@ -305,8 +304,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 * buffers, then go indirect. FIXME: tune this threshold */
if (vq->indirect && total_sg > 1 && vq->vq.num_free)
desc = alloc_indirect(_vq, total_sg, gfp);
-   else
+   else {
desc = NULL;
+   WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
+   }
 
if (desc) {
/* Use a single buffer which doesn't continue */
-- 
2.13.1



Re: Increased memory usage with scsi-mq

2017-08-10 Thread Richard W.M. Jones
On Wed, Aug 09, 2017 at 06:50:10PM +0200, Paolo Bonzini wrote:
> On 09/08/2017 18:01, Christoph Hellwig wrote:
> > On Mon, Aug 07, 2017 at 03:07:48PM +0200, Paolo Bonzini wrote:
> >> can_queue should depend on the virtqueue size, which unfortunately can
> >> vary for each virtio-scsi device in theory.  The virtqueue size is
> >> retrieved by drivers/virtio prior to calling vring_create_virtqueue, and
> >> in QEMU it is the second argument to virtio_add_queue.
> > 
> > Why is that unfortunate?  We don't even have to set can_queue in
> > the host template, we can dynamically set it on per-host basis.
> 
> Ah, cool, I thought allocations based on can_queue happened already in
> scsi_host_alloc, but they happen at scsi_add_host time.

I think I've decoded all that information into the patch below.

I tested it, and it appears to work: when I set cmd_per_lun on the
qemu command line, I see that the guest can add more disks:

  With scsi-mq enabled:   175 disks
  cmd_per_lun not set:    177 disks  *
  cmd_per_lun=16:         776 disks  *
  cmd_per_lun=4:         1160 disks  *
  With scsi-mq disabled: 1755 disks
 * = new result

From my point of view, this is a good result, but you should be warned
that I don't fully understand what's going on here and I may have made
obvious or not-so-obvious mistakes.

I tested the performance impact and it's not noticable in the
libguestfs case even with very small cmd_per_lun settings, but
libguestfs is largely serial and so this result won't be applicable to
guests in general.

Also, should the guest kernel validate cmd_per_lun to make sure it's
not too small or large?  And if so, what would the limits be?

Rich.

From e923e49846189b2f55f3f02b70a290d4be237ed5 Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjo...@redhat.com>
Date: Thu, 10 Aug 2017 12:21:47 +0100
Subject: [PATCH] scsi: virtio_scsi: Set can_queue based on cmd_per_lun passed
 by hypervisor.

Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9be211d68b15..b22591e9b16b 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -973,7 +973,7 @@ static int virtscsi_probe(struct virtio_device *vdev)
goto virtscsi_init_failed;
 
cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
-   shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
+   shost->cmd_per_lun = shost->can_queue = cmd_per_lun;
	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0xFFFF;
 
/* LUNs > 256 are reported with format 1, so they go in the range
-- 
2.13.1


-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW


Increased memory usage with scsi-mq

2017-08-04 Thread Richard W.M. Jones
https://bugzilla.redhat.com/show_bug.cgi?id=1478201

We have a libguestfs test which adds 256 virtio-scsi disks to a qemu
virtual machine.  The VM has 500 MB of RAM, 1 vCPU and no swap.

This test has been failing for a little while.  It runs out of memory
during SCSI enumeration in early boot.

Tonight I bisected the cause to:

  5c279bd9e40624f4ab6e688671026d6005b066fa is the first bad commit
  commit 5c279bd9e40624f4ab6e688671026d6005b066fa
  Author: Christoph Hellwig 
  Date:   Fri Jun 16 10:27:55 2017 +0200

scsi: default to scsi-mq

Remove the SCSI_MQ_DEFAULT config option and default to the blk-mq I/O
path now that we had plenty of testing, and have I/O schedulers for
blk-mq.  The module option to disable the blk-mq path is kept around for
now.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Martin K. Petersen 

  :040000 040000 57ec7d5d2ba76592a695f533a69f747700c31966 c79f6ecb070acc4fadf6fc05ca9ba32bc9c0c665 M drivers

I also wrote a small test to see the maximum number of virtio-scsi
disks I could add to the above VM.  The results were very surprising
(to me anyhow):

  With scsi-mq enabled:   175 disks
  With scsi-mq disabled: 1755 disks

I don't know why the ratio is almost exactly 10 times.

I read your slides about scsi-mq and it seems like a significant
benefit to large machines, but could the out of the box defaults be
made more friendly for small memory machines?

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org


Re: Increased memory usage with scsi-mq

2017-08-05 Thread Richard W.M. Jones
On Sat, Aug 05, 2017 at 10:44:36AM +0200, Christoph Hellwig wrote:
> On Fri, Aug 04, 2017 at 10:00:47PM +0100, Richard W.M. Jones wrote:
> > I read your slides about scsi-mq and it seems like a significant
> > benefit to large machines, but could the out of the box defaults be
> > made more friendly for small memory machines?
> 
> The default number of queues and queue depth and thus memory usage is
> set by the LLDD.
> 
> Try to reduce the can_queue value in virtio_scsi and/or make sure
> you use the single queue variant in your VM (which should be tunable
> in qemu).

Thanks, this is interesting.

Virtio-scsi seems to have a few settable parameters that might be
related to this:

  DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues, 1),
  DEFINE_PROP_UINT32("max_sectors", VirtIOSCSI, parent_obj.conf.max_sectors,
                     0xFFFF),
  DEFINE_PROP_UINT32("cmd_per_lun", VirtIOSCSI, parent_obj.conf.cmd_per_lun,
128),

Unfortunately (assuming I'm setting them right - see below), none of
them have any effect on the number of disks that I can add to the VM.

I am testing them by placing them in the ‘-device virtio-scsi-pci’
parameter, ie. as a property of the controller, not a property of the
LUN, eg:

-device virtio-scsi-pci,cmd_per_lun=32,id=scsi \
-drive file=/home/rjones/d/libguestfs/tmp/libguestfshXImTv/scratch.1,cache=unsafe,format=raw,id=hd0,if=none \
-device scsi-hd,drive=hd0 \

The debugging output is a bit too large to attach to this email, but I
have placed it at the link below.  It contains (if you scroll down a
bit) the full qemu command line and full kernel output.

  http://oirase.annexia.org/tmp/bz1478201-log.txt

I can add some extra debugging into the kernel if you like.  Just
point me to the right place.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html


Re: Increased memory usage with scsi-mq

2017-08-07 Thread Richard W.M. Jones
On Mon, Aug 07, 2017 at 02:11:39PM +0200, Paolo Bonzini wrote:
> You could also add a module parameter to the driver, and set it to 64 on
> the kernel command line (there is an example in
> drivers/scsi/vmw_pvscsi.c of how to do it).

[Proviso: I've not tested the performance of difference values, nor do
I have any particular knowledge in this area]

can_queue is documented as:

 * This determines if we will use a non-interrupt driven
 * or an interrupt driven scheme.  It is set to the maximum number
 * of simultaneous commands a given host adapter will accept.

Wouldn't it be better to make it default to k * number of CPUs for
some small integer k?
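
Something like this, say (hypothetical and untested, using the kernel's
clamp_t() and num_online_cpus() helpers):

    /* hypothetical default: scale queue depth with CPU count, k = 4 */
    shost->can_queue = clamp_t(unsigned int,
                               4 * num_online_cpus(), 64, 1024);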

I looked at the other scsi drivers and they set it to all kinds of
values.  1, small integers, large integers, configurable values.

Also I noticed this code in virtio_scsi.c:

cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);

but setting cmd_per_lun (as a qemu virtio-scsi-pci parameter) didn't
seem to make any difference to memory usage.
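
Presumably (and this is only my reading of it) the min_t clamp can only
ever lower the per-LUN depth, never the per-host allocation.  A worked
example with the template's default:

    /* cmd_per_lun=32 supplied by qemu, template .can_queue = 1024: */
    shost->cmd_per_lun = min_t(u32, 32, 1024);   /* = 32 */
    /* shost->can_queue stays at 1024, and it is can_queue -- not
     * cmd_per_lun -- that sizes the per-host request preallocation,
     * hence no visible change in memory usage. */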

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/


Re: Increased memory usage with scsi-mq

2017-08-05 Thread Richard W.M. Jones
On Sat, Aug 05, 2017 at 03:39:54PM +0200, Christoph Hellwig wrote:
> For now can you apply this testing patch to the guest kernel?
> 
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 9be211d68b15..0cbe2c882e1c 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -818,7 +818,7 @@ static struct scsi_host_template virtscsi_host_template_single = {
>   .eh_timed_out = virtscsi_eh_timed_out,
>   .slave_alloc = virtscsi_device_alloc,
>  
> - .can_queue = 1024,
> + .can_queue = 64,
>   .dma_boundary = UINT_MAX,
>   .use_clustering = ENABLE_CLUSTERING,
>   .target_alloc = virtscsi_target_alloc,
> @@ -839,7 +839,7 @@ static struct scsi_host_template virtscsi_host_template_multi = {
>   .eh_timed_out = virtscsi_eh_timed_out,
>   .slave_alloc = virtscsi_device_alloc,
>  
> - .can_queue = 1024,
> + .can_queue = 64,
>   .dma_boundary = UINT_MAX,
>   .use_clustering = ENABLE_CLUSTERING,
>   .target_alloc = virtscsi_target_alloc,
> @@ -983,7 +983,7 @@ static int virtscsi_probe(struct virtio_device *vdev)
>   shost->max_id = num_targets;
>   shost->max_channel = 0;
>   shost->max_cmd_len = VIRTIO_SCSI_CDB_SIZE;
> - shost->nr_hw_queues = num_queues;
> + shost->nr_hw_queues = 1;
>  
>  #ifdef CONFIG_BLK_DEV_INTEGRITY
>   if (virtio_has_feature(vdev, VIRTIO_SCSI_F_T10_PI)) {

Yes, that's an improvement, although it's still a little way off the
density possible the old way:

  With scsi-mq enabled:   175 disks
  With this patch:        319 disks *
  With scsi-mq disabled: 1755 disks

Also only the first two hunks are necessary.  The kernel behaves
exactly the same way with or without the third hunk (ie. num_queues
must already be 1).

Can I infer from this that qemu needs a way to specify the can_queue
setting to the virtio-scsi driver in the guest kernel?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org